Updated documentation
11
docs/README.md
Normal file
@@ -0,0 +1,11 @@
# Documentation
In this folder you can find the technical documentation of the
`Datum` library as well as practical details on how to use it
efficiently and safely.

At the time being, this documentation includes the following pages:

- [vector.md](vector.md): vector documentation;
- [map.md](map.md): map documentation;
- [sort.md](sort.md): how to use the `vector_sort` method.
75
docs/map.md
Normal file
@@ -0,0 +1,75 @@
# Map Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Map` data structure.

`Map` is a hash table that uses open addressing with linear probing for collision
resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function) as its hashing function. Resizing is performed
automatically by doubling the capacity when the load factor exceeds 75%. Internally,
this data structure is represented by the following two structures:

```c
typedef struct {
    char *key;
    void *value;
    element_state_t state;
} map_element_t;

typedef struct {
    map_element_t *elements;
    size_t capacity;
    size_t size;
    size_t tombstone_count;
} map_t;
```

where the `key` variable represents a string used to index the `value`. The `state` field, instead, indicates whether the entry is empty, occupied or deleted and is primarily used
by the garbage collector for internal memory management. An array of `map_element_t`,
together with the variables indicating the *capacity*, the *current size* and
the *tombstone count* (that is, the number of deleted entries), forms the `map_t` data type.
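
To make the hashing and probing scheme described above more concrete, here is a minimal, self-contained sketch. It is not Datum's actual code: the `sketch_*` names are hypothetical, the 64-bit FNV-1a variant is an assumption (the text does not say which width the library uses), and the load-factor check assumes that only live entries are counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a parameters (the hash width is an assumption). */
#define FNV_OFFSET_BASIS 14695981039346656037ULL
#define FNV_PRIME        1099511628211ULL

/* Hash a NUL-terminated key: XOR each byte in, then multiply by the prime. */
uint64_t sketch_fnv1a(const char *key) {
    assert(key != NULL);

    uint64_t hash = FNV_OFFSET_BASIS;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= (uint64_t)*p;
        hash *= FNV_PRIME;
    }

    return hash;
}

/* First slot to inspect for a given key. */
size_t sketch_home_slot(const char *key, size_t capacity) {
    assert(capacity > 0);
    return (size_t)(sketch_fnv1a(key) % capacity);
}

/* Linear probing: on a collision, try the next slot and wrap around. */
size_t sketch_next_slot(size_t slot, size_t capacity) {
    assert(capacity > 0);
    return (slot + 1) % capacity;
}

/* Non-zero when size / capacity exceeds 75%, using integer arithmetic only. */
int sketch_needs_resize(size_t size, size_t capacity) {
    return size * 4 > capacity * 3;
}
```

A lookup would start at `sketch_home_slot(key, capacity)` and keep calling `sketch_next_slot` until it finds the key or reaches an empty slot. Deleted slots (tombstones) must not stop the search, which is presumably why the `state` field distinguishes them from empty ones.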

The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
responsible for managing their memory. Values, on the other hand,
**are stored as pointers**. This means that the hashmap **does NOT own them** and that
the caller is responsible for managing their memory; this includes allocating
enough memory for them, ensuring that the pointers remain valid for their whole lifecycle
in the map, deleting old values when updating a key and, if the values were heap-allocated,
freeing them before removing the keys or destroying the map.
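
The ownership rules above can be illustrated with a toy, single-entry container. This is only a sketch under stated assumptions: `toy_entry_t`, `toy_entry_set` and `toy_entry_clear` are hypothetical names that do not exist in Datum, and the entry is assumed to be zero-initialized before its first use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry container, NOT part of Datum. */
typedef struct {
    char *key;   /* owned: a private copy of the caller's string */
    void *value; /* borrowed: the caller's pointer, stored as-is */
} toy_entry_t;

/* Copy the key, store the value pointer. Returns 0 on success, -1 on failure. */
int toy_entry_set(toy_entry_t *entry, const char *key, void *value) {
    assert(entry != NULL && key != NULL);

    size_t len = strlen(key) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, key, len);

    free(entry->key);     /* the container releases its own old key...    */
    entry->key = copy;
    entry->value = value; /* ...but never touches the old or new value    */

    return 0;
}

/* Release what the container owns (the key) and nothing else. */
void toy_entry_clear(toy_entry_t *entry) {
    assert(entry != NULL);

    free(entry->key);
    entry->key = NULL;
    entry->value = NULL;
}
```

After `toy_entry_set`, the caller may freely modify or free its own key buffer, while the value must stay alive for as long as the entry refers to it and must be freed by the caller afterwards.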

The `Map` data structure supports the following methods:

- `map_result_t map_new()`: initialize a new map;
- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
- `map_result_t map_get(map, key)`: retrieve the value indexed by `key` if it exists;
- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
- `map_result_t map_clear(map)`: reset the map state;
- `map_result_t map_destroy(map)`: delete the map;
- `size_t map_size(map)`: return map size (i.e., the number of elements);
- `size_t map_capacity(map)`: return map capacity (i.e., map total size).

As you can see from the previous function signatures, most methods that operate
on the `Map` data type return a custom type called `map_result_t` which is
defined as follows:

```c
typedef enum {
    MAP_OK = 0x0,
    MAP_ERR_ALLOCATE,
    MAP_ERR_INVALID,
    MAP_ERR_NOT_FOUND
} map_status_t;

typedef struct {
    map_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        map_t *map;
        void *element;
    } value;
} map_result_t;
```

Each method that returns such a type indicates whether the operation was successful or not by setting
the `status` field and by providing a descriptive message in the `message` field. If the operation was
successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
the returned value from the `value` union (the sum data type). Of course, you can choose to ignore the return value (if you're brave enough :D), as illustrated
in the first part of the README.
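
As a rough illustration of checking `status` before reading the union, the sketch below repeats the two type definitions so that it compiles on its own. The value of `RESULT_MSG_SIZE`, the opaque `map_t` declaration and the `sketch_element_or` helper are assumptions made for this example and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESULT_MSG_SIZE 64 /* assumed value, for illustration only */

typedef struct map map_t;  /* opaque stand-in for the real map_t */

typedef enum {
    MAP_OK = 0x0,
    MAP_ERR_ALLOCATE,
    MAP_ERR_INVALID,
    MAP_ERR_NOT_FOUND
} map_status_t;

typedef struct {
    map_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        map_t *map;
        void *element;
    } value;
} map_result_t;

/*
 * Hypothetical helper: return the element carried by `res` only when
 * the operation succeeded, otherwise return `fallback`.
 */
void *sketch_element_or(const map_result_t *res, void *fallback) {
    assert(res != NULL);
    return res->status == MAP_OK ? res->value.element : fallback;
}
```

The point of the pattern is that `value` is only meaningful when `status == MAP_OK`; on any other status, the caller should rely on `message` instead of the union.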
173
docs/sort.md
Normal file
@@ -0,0 +1,173 @@
# Sorting
As indicated in [its documentation](/docs/vector.md), the `Vector` data type
provides an efficient in-place sorting function called `vector_sort` that uses
a builtin implementation of the [Quicksort algorithm](https://en.wikipedia.org/wiki/Quicksort). This method requires a user-defined comparison procedure which allows the
caller to customize the sorting behavior. The comparison procedure must adhere to the
following specification:

1. Must return `vector_order_t`, which is defined as follows:

```c
typedef enum {
    VECTOR_ORDER_LT = 0x0, // First element should come before the second
    VECTOR_ORDER_EQ,       // The two elements are equivalent
    VECTOR_ORDER_GT        // First element should come after the second
} vector_order_t;
```

and indicates the ordering relationship between any two elements.

2. Must accept two `const void*` parameters representing two elements to compare;
3. Must be self-contained and handle all its own resources.
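
To show how a comparison procedure of this shape is consumed by a sorting routine, here is a minimal generic Quicksort driven by such a callback. This is a sketch and not Datum's implementation: `sketch_quicksort`, `cmp_fn_t` and `cmp_double_asc` are hypothetical names, the Lomuto partition scheme with the last element as pivot is an assumption, and `vector_order_t` is repeated only to keep the block self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    VECTOR_ORDER_LT = 0x0,
    VECTOR_ORDER_EQ,
    VECTOR_ORDER_GT
} vector_order_t;

typedef vector_order_t (*cmp_fn_t)(const void *x, const void *y);

/* Swap two elements of `size` bytes, one byte at a time. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size) {
    while (size-- > 0) {
        unsigned char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* In-place Quicksort over `count` elements of `size` bytes each. */
void sketch_quicksort(void *base, size_t count, size_t size, cmp_fn_t cmp) {
    assert(base != NULL && cmp != NULL && size > 0);
    if (count < 2) return;

    unsigned char *data = base;
    unsigned char *pivot = data + (count - 1) * size; /* last element */
    size_t store = 0;

    /* Move every element that orders before the pivot to the front. */
    for (size_t idx = 0; idx + 1 < count; idx++) {
        if (cmp(data + idx * size, pivot) == VECTOR_ORDER_LT) {
            swap_bytes(data + idx * size, data + store * size, size);
            store++;
        }
    }

    /* Put the pivot in its final position, then sort both sides. */
    swap_bytes(data + store * size, pivot, size);
    sketch_quicksort(data, store, size, cmp);
    sketch_quicksort(data + (store + 1) * size, count - store - 1, size, cmp);
}

/* Example comparison procedure for doubles, in ascending order. */
vector_order_t cmp_double_asc(const void *x, const void *y) {
    const double a = *(const double *)x;
    const double b = *(const double *)y;

    if (a < b) return VECTOR_ORDER_LT;
    if (a > b) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}
```

Because the routine only ever sees `const void*` pointers and the returned `vector_order_t`, the same code sorts any element type; only the comparison procedure changes.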

Let's look at some examples. For instance, let's say that we want to sort an array
of integers in ascending and descending order:

```c
#include <stdio.h>
#include "src/vector.h"

vector_order_t cmp_int_asc(const void *x, const void *y) {
    int x_int = *(const int*)x;
    int y_int = *(const int*)y;

    if (x_int < y_int) return VECTOR_ORDER_LT;
    if (x_int > y_int) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}

vector_order_t cmp_int_desc(const void *x, const void *y) {
    return cmp_int_asc(y, x);
}

/*
 * Compile with: gcc main.c src/vector.c
 * Output: Before sorting: -8 20 -10 125 34 9
 *         After sorting (ascending order): -10 -8 9 20 34 125
 *         After sorting (descending order): 125 34 20 9 -8 -10
 */
int main(void) {
    vector_t *v = vector_new(5, sizeof(int)).value.vector;

    int values[] = { -8, 20, -10, 125, 34, 9 };
    for (size_t idx = 0; idx < 6; idx++) {
        vector_push(v, &values[idx]);
    }

    // Print unsorted array
    printf("Before sorting: ");
    for (size_t idx = 0; idx < vector_size(v); idx++) {
        printf("%d ", *(int*)vector_get(v, idx).value.element);
    }

    // Sort array in ascending order
    vector_sort(v, cmp_int_asc);

    // Print sorted array
    printf("\nAfter sorting (ascending order): ");
    for (size_t idx = 0; idx < vector_size(v); idx++) {
        printf("%d ", *(int*)vector_get(v, idx).value.element);
    }

    // Sort array in descending order
    vector_sort(v, cmp_int_desc);

    // Print sorted array
    printf("\nAfter sorting (descending order): ");
    for (size_t idx = 0; idx < vector_size(v); idx++) {
        printf("%d ", *(int*)vector_get(v, idx).value.element);
    }

    printf("\n");

    vector_destroy(v);

    return 0;
}
```

You can use the `vector_sort` method on custom data types as well.
For instance, let's suppose that you have a structure representing the employees of
a company and you wish to sort them by age and by name (lexicographic sort):

```c
#include <stdio.h>
#include <string.h>
#include "src/vector.h"

typedef struct {
    char name[256];
    int age;
} Employee;

vector_order_t cmp_person_by_age(const void *x, const void *y) {
    const Employee *x_person = (const Employee*)x;
    const Employee *y_person = (const Employee*)y;

    if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
    if (x_person->age > y_person->age) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}

vector_order_t cmp_person_by_name(const void *x, const void *y) {
    const Employee *x_person = (const Employee*)x;
    const Employee *y_person = (const Employee*)y;

    const int result = strcmp(x_person->name, y_person->name);

    if (result < 0) return VECTOR_ORDER_LT;
    if (result > 0) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}

/*
 * Compile with: gcc main.c src/vector.c
 * Output: Sort by age:
 *         Name: Marco, Age: 25
 *         Name: Alice, Age: 28
 *         Name: Bob, Age: 45
 *
 *         Sort by name:
 *         Name: Alice, Age: 28
 *         Name: Bob, Age: 45
 *         Name: Marco, Age: 25
 */
int main(void) {
    vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;

    Employee e1 = { .name = "Bob", .age = 45 };
    Employee e2 = { .name = "Alice", .age = 28 };
    Employee e3 = { .name = "Marco", .age = 25 };

    vector_push(employees, &e1);
    vector_push(employees, &e2);
    vector_push(employees, &e3);

    // Sort array by age
    vector_sort(employees, cmp_person_by_age);

    // Print sorted array
    printf("Sort by age:\n");
    for (size_t idx = 0; idx < vector_size(employees); idx++) {
        Employee *p = (Employee*)vector_get(employees, idx).value.element;
        printf("Name: %s, Age: %d\n", p->name, p->age);
    }

    // Sort array by name
    vector_sort(employees, cmp_person_by_name);

    // Print sorted array
    printf("\nSort by name:\n");
    for (size_t idx = 0; idx < vector_size(employees); idx++) {
        Employee *p = (Employee*)vector_get(employees, idx).value.element;
        printf("Name: %s, Age: %d\n", p->name, p->age);
    }

    vector_destroy(employees);

    return 0;
}
```
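
The two criteria used in the employee example can also be combined into a single comparison procedure, which is handy when several employees share the same age. The following is a hedged sketch: `cmp_employee_by_age_then_name` is a hypothetical name, and the `vector_order_t` and `Employee` definitions are repeated here only so that the block compiles on its own.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    VECTOR_ORDER_LT = 0x0,
    VECTOR_ORDER_EQ,
    VECTOR_ORDER_GT
} vector_order_t;

typedef struct {
    char name[256];
    int age;
} Employee;

/* Order by age first; break ties by comparing names lexicographically. */
vector_order_t cmp_employee_by_age_then_name(const void *x, const void *y) {
    assert(x != NULL && y != NULL);

    const Employee *a = (const Employee*)x;
    const Employee *b = (const Employee*)y;

    if (a->age < b->age) return VECTOR_ORDER_LT;
    if (a->age > b->age) return VECTOR_ORDER_GT;

    // Same age: fall back to the name
    const int result = strcmp(a->name, b->name);

    if (result < 0) return VECTOR_ORDER_LT;
    if (result > 0) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}
```

Since Quicksort is not a stable algorithm, sorting by name and then by age in two separate passes does not guarantee that names stay ordered within each age group; a combined procedure like this one avoids relying on stability.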
70
docs/vector.md
Normal file
@@ -0,0 +1,70 @@
# Vector Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Vector` data structure.

`Vector` is a dynamic array with generic data type support; this means that you can store
any kind of homogeneous value in this data structure. Resizing is performed automatically
by increasing the capacity by 1.5 times when the array becomes full. Internally, this
data structure is represented by the following structure:

```c
typedef struct {
    size_t size;
    size_t capacity;
    size_t data_size;
    void *elements;
} vector_t;
```

where the `elements` variable represents the actual dynamic and generic array, the
`data_size` variable indicates the size (in bytes) of the data type, while `size`
and `capacity` represent the number of stored elements and the total number of
allocated slots, respectively. The dynamic array copies the values upon insertion,
thus **it owns the data** and is therefore responsible for its allocation and its
deletion.
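
The two mechanisms described above, growth by a factor of 1.5 and access to elements of an arbitrary type through `data_size`, can be sketched as follows. This is only an illustration: the `sketch_*` names are hypothetical, and the rounding rule for small capacities is an assumption, since the text does not specify how the library rounds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Next capacity under a 1.5x growth policy. Integer division rounds
 * down, so very small capacities are bumped by one to guarantee
 * progress (this rounding rule is an assumption).
 */
size_t sketch_next_capacity(size_t capacity) {
    size_t grown = capacity + capacity / 2;
    return grown > capacity ? grown : capacity + 1;
}

/*
 * Address of the element at `index` inside a type-erased buffer:
 * the byte offset is simply index * data_size.
 */
void *sketch_element_at(void *elements, size_t data_size, size_t index) {
    assert(elements != NULL && data_size > 0);
    return (unsigned char *)elements + index * data_size;
}
```

Copying a value into the slot returned by `sketch_element_at` (for example with `memcpy` and `data_size` bytes) is what makes the array own its data: the caller's original variable can go out of scope without affecting the stored copy.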

At the time being, `Vector` supports the following methods:

- `vector_result_t vector_new(size, data_size)`: create a new vector;
- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
- `vector_result_t vector_sort(vector, cmp)`: sort the array using the `cmp` function;
- `vector_result_t vector_pop(vector)`: pop the last element from the vector following the LIFO policy;
- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
  will overwrite the memory;
- `vector_result_t vector_destroy(vector)`: delete the vector;
- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).

As you can see from the previous function signatures, most methods that operate
on the `Vector` data type return a custom type called `vector_result_t` which is
defined as follows:

```c
typedef enum {
    VECTOR_OK = 0x0,
    VECTOR_ERR_ALLOCATE,
    VECTOR_ERR_OVERFLOW,
    VECTOR_ERR_UNDERFLOW,
    VECTOR_ERR_INVALID
} vector_status_t;

typedef struct {
    vector_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        vector_t *vector;
        void *element;
    } value;
} vector_result_t;
```

Each method that returns such a type indicates whether the operation was successful or not
by setting the `status` field and by providing a descriptive message in the `message`
field. If the operation was successful (that is, `status == VECTOR_OK`), you can either
move on with the rest of the program or read the returned value from the `value` union (the sum data type). Of course, you can choose to
ignore the return value (if you're brave enough :D), as illustrated in the first part of the README.

The documentation for the `vector_sort(vector, cmp)` method can be found
in [the following document](/docs/sort.md).