From 1293006eba699cefec7014e439f3366a3fc3598e Mon Sep 17 00:00:00 2001
From: Marco Cetica
Date: Mon, 10 Nov 2025 10:49:23 +0100
Subject: [PATCH] Updated documentation

---
 README.md      | 320 ++-----------------------------------------------
 docs/README.md |  11 ++
 docs/map.md    |  75 ++++++++++++
 docs/sort.md   | 173 ++++++++++++++++++++++++++
 docs/vector.md |  70 +++++++++++
 5 files changed, 336 insertions(+), 313 deletions(-)
 create mode 100644 docs/README.md
 create mode 100644 docs/map.md
 create mode 100644 docs/sort.md
 create mode 100644 docs/vector.md

diff --git a/README.md b/README.md
index 5193b4f..303eb64 100644
--- a/README.md
+++ b/README.md
@@ -8,8 +8,8 @@
 Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond the standard library. It currently features:
-- **Vector**: a growable, contiguous array of homogenous generic data types;
-- **Map**: an associative array that handles generic heterogenous data types;
+- [**Vector**](/docs/vector.md): a growable, contiguous array of homogenous generic data types;
+- [**Map**](/docs/map.md): an associative array that handles generic heterogenous data types;
 
 ## Usage
 At its simplest, you can use this library as follows:
@@ -23,7 +23,7 @@ At its simplest, you can use this library as follows:
 /*
  * Compile with: gcc main.c src/vector.c
  * Output: First element: 5
- *         Head of vector 6, size is now: 1
+ *         Head of vector: 6, size is now: 1
  */
 int main(void) {
@@ -42,7 +42,7 @@ int main(void) {
     // Pop second element using LIFO policy
     const int head = *(int*)vector_pop(vec).value.element;
-    printf("Head of vector %d, size is now: %zu\n", head, vector_size(vec));
+    printf("Head of vector: %d, size is now: %zu\n", head, vector_size(vec));
 
     // Remove vector from memory
     vector_destroy(vec);
@@ -104,315 +104,9 @@ $ make clean all
 This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`.
-## Technical Details
-In this section, you can find a quick overview of the technical aspects (internal design, memory layout, etc.) of this library as well as an
-overview about the design choices behind Datum. While both structures use `void*` to represent values, the way they manage memory is orthogonally different
-from one another. Let's start with the `Map` data type.
-
-### Map
-`Map` is an hash table implementation that uses open addressing with linear probing for collision resolution and the
-[FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed automatically
-by doubling the capacity when load factor exceeds 75%. Internally, this data structure is represented
-by the following structures:
-
-```c
-typedef struct {
-    char *key;
-    void *value;
-    element_state_t state;
-} map_element_t;
-
-typedef struct {
-    map_element_t *elements;
-    size_t capacity;
-    size_t size;
-    size_t tombstone_count;
-} map_t;
-```
-where the `key` represent a string used to index the `value`. The state, instead, indicates
-whether the entry is empty, occupied or deleted and is primarily used by the garbage collector
-for internal memory management. An array of `map_element_t` as well as variables indicating
-the capacity, the current size and the tombstone count (that is, the number of deleted entries)
-forms a `map_t` data type.
-
-The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
-to manage their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own them** and that the caller is responsible
-for managing their memory; this includes: allocate enough memory for them, ensure that the pointers remain valid for their whole lifecycle on the map,
-delete old values when updating a key and, if the values were heap-allocated, free them before removing them or before destroying the map.
-
-The `Map` data structures supports the following methods:
-
-- `map_result_t map_new()`: initialize a new map;
-- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
-- `map_result_t map_get(map, key)`: retrieve a values indexed by `key` if it exists;
-- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
-- `map_result_t map_clear(map)`: reset the map state;
-- `map_result_t map_destroy(map)`: delete the map;
-- `size_t map_size(map)`: returns map size (i.e., the number of elements);
-- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
-
-As you can see, most methods that operates on the `Map` data type return a custom type called `map_result_t` which is defined as follows:
-
-```c
-typedef enum {
-    MAP_OK = 0x0,
-    MAP_ERR_ALLOCATE,
-    MAP_ERR_INVALID,
-    MAP_ERR_NOT_FOUND
-} map_status_t;
-
-typedef struct {
-    map_status_t status;
-    uint8_t message[RESULT_MSG_SIZE];
-    union {
-        map_t *map;
-        void *element;
-    } value;
-} map_result_t;
-```
-
-Each method that returns a `map_result_t` indicates whether the operation was successful or not by setting the `status` field and by providing a descriptive message on the `message` field.
-If the operation was successful (that is, `status == MAP_OK`), you can either move on with the flow
-of the program or read the returned
-value from the sum data type. Of course,
-you can choose to ignore the return value (if you're brave enough :D), as illustrated in the first example of this document.
-
-### Vector
-`Vector` is a dynamic array with generic data type support, this means that you can store any kind of homogenous value on this data structure. As in the `Map`'s case,
-resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. Internally, this data structure is represented as follows:
-
-```c
-typedef struct {
-    size_t count;
-    size_t capacity;
-    size_t data_size;
-    void *elements;
-} vector_t;
-```
-
-where the `elements` represents the actual dynamic and generic array, the `data_size`
-variable indicates the size (in bytes) of the data type while the count and
-the capacity represent the number of stored elements and the total
-size of the structure, respectively. The dynamic array copies the values upon
-insertion, thus **it owns the data** and is therefore responsible for their
-allocation and their deletion.
-
-The dynamic array copies the values upon insertion, thus it is responsible
-for their allocation and their deletion.
-
-The `Vector` data structure supports the following methods:
-
-- `vector_result_t vector_new(size, data_size)`: create a new vector;
-- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
-- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
-- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
-- `map_result_t vector_sort(map, cmp)`: sort array using `cmp` function;
-- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;
-- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
-will overwrite the memory;
-- `vector_result_t vector_destroy(vector)`: delete the vector;
-- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
-- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
-
-As you can see, most methods that operates on the `Vector` data type return a custom type called
-`vector_result_t` which is defined as follows:
-
-```c
-typedef enum {
-    VECTOR_OK = 0x0,
-    VECTOR_ERR_ALLOCATE,
-    VECTOR_ERR_OVERFLOW,
-    VECTOR_ERR_UNDERFLOW,
-    VECTOR_ERR_INVALID
-} vector_status_t;
-
-typedef struct {
-    vector_status_t status;
-    uint8_t message[RESULT_MSG_SIZE];
-    union {
-        vector_t *vector;
-        void *element;
-    } value;
-} vector_result_t;
-```
-
-Each method that returns such type indicates whether the operation was successful or not by
-setting the `status` field and by providing a descriptive message on the `message` field.
-Just like for the `Map` data structure, if the operation was successful
-(that is, `status == VECTOR_OK`), you can either move on with the rest of the program
-or read the returned value from the sum data type.
-
-## Sorting
-The `Vector` data structure provides an efficient in-place sorting method called `vector_sort`
-which uses a builtin [Quicksort](https://en.wikipedia.org/wiki/Quicksort) implementation. This
-function requires an user-defined comparison procedure as its second parameter, which allows
-the caller to customize the sorting behavior. It must adhere to the following specification:
-
-1. Must return `vector_order_t`, which is defined as follows:
-
-```c
-typedef enum {
-    VECTOR_ORDER_LT = 0x0, // First element should come before the second
-    VECTOR_ORDER_EQ, // The two elements are equivalent
-    VECTOR_ORDER_GT // First element should come after the second
-} vector_order_t;
-```
-
-and indicates the ordering relationship between any two elements.
-
-2. Must accept two `const void*` parameters representing the two elements to compare;
-3. Must be self-contained and handle all its own resources.
-
-Let's look at some examples; for instance, let's sort an integer array in ascending and
-descending order:
-
-```c
-#include <stdio.h>
-#include "src/vector.h"
-
-vector_order_t cmp_int_asc(const void *x, const void *y) {
-    int x_int = *(const int*)x;
-    int y_int = *(const int*)y;
-
-    if (x_int < y_int) return VECTOR_ORDER_LT;
-    if (x_int > y_int) return VECTOR_ORDER_GT;
-
-    return VECTOR_ORDER_EQ;
-}
-
-vector_order_t cmp_int_desc(const void *x, const void *y) {
-    return cmp_int_asc(y, x);
-}
-
-/*
- * Compile with: gcc main.c src/vector.h
- * Output: Before sorting: -8 20 -10 125 34 9
- *         After sorting (ascending order): -10 -8 9 20 34 125
- *         After sorting (descending order): 125 34 20 9 -8 -10
- */
-int main(void) {
-    vector_t *v = vector_new(5, sizeof(int)).value.vector;
-
-    int values[] = { -8, 20, -10, 125, 34, 9 };
-    for (size_t idx = 0; idx < 6; idx++) {
-        vector_push(v, &values[idx]);
-    }
-
-    // Print unsorted array
-    printf("Before sorting: ");
-    for (size_t idx = 0; idx < vector_size(v); idx++) {
-        printf("%d ", *(int*)vector_get(v, idx).value.element);
-    }
-
-    // Sort array in ascending order
-    vector_sort(v, cmp_int_asc);
-
-    // Print sorted array
-    printf("\nAfter sorting (ascending order): ");
-    for (size_t idx = 0; idx < vector_size(v); idx++) {
-        printf("%d ", *(int*)vector_get(v, idx).value.element);
-    }
-
-    // Sort array in descending order
-    vector_sort(v, cmp_int_desc);
-
-    // Print sorted array
-    printf("\nAfter sorting (descending order): ");
-    for (size_t idx = 0; idx < vector_size(v); idx++) {
-        printf("%d ", *(int*)vector_get(v, idx).value.element);
-    }
-
-    printf("\n");
-
-    vector_destroy(v);
-
-    return 0;
-}
-```
-
-Obviously, you can use the `vector_sort` method on custom data types as well. For instance, let's suppose that you have a
-struct representing employees and you want to sort them based on their age and based on their name (lexicographic sort):
-
-```c
-#include <stdio.h>
-#include <string.h>
-#include "src/vector.h"
-
-typedef struct {
-    char name[256];
-    int age;
-} Employee;
-
-vector_order_t cmp_person_by_age(const void *x, const void *y) {
-    const Employee *x_person = (const Employee*)x;
-    const Employee *y_person = (const Employee*)y;
-
-    if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
-    if (x_person->age > y_person->age) return VECTOR_ORDER_GT;
-
-    return VECTOR_ORDER_EQ;
-}
-
-vector_order_t cmp_person_by_name(const void *x, const void *y) {
-    const Employee *x_person = (const Employee*)x;
-    const Employee *y_person = (const Employee*)y;
-
-    const int result = strcmp(x_person->name, y_person->name);
-
-    if(result < 0) return VECTOR_ORDER_LT;
-    if(result > 0) return VECTOR_ORDER_GT;
-
-    return VECTOR_ORDER_EQ;
-}
-
-/*
- * Compile with: gcc main.c src/vector.h
- * Output: Sort by age:
- *         Name: Marco, Age: 25
- *         Name: Alice, Age: 28
- *         Name: Bob, Age: 45
- *
- *         Sort by name:
- *         Name: Alice, Age: 28
- *         Name: Bob, Age: 45
- *         Name: Marco, Age: 25
- */
-int main(void) {
-    vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;
-
-    Employee e1 = { .name = "Bob", .age = 45 };
-    Employee e2 = { .name = "Alice", .age = 28 };
-    Employee e3 = { .name = "Marco", .age = 25 };
-
-    vector_push(employees, &e1);
-    vector_push(employees, &e2);
-    vector_push(employees, &e3);
-
-    // Sort array by age
-    vector_sort(employees, cmp_person_by_age);
-
-    // Print sorted array
-    printf("Sort by age:\n");
-    for (size_t idx = 0; idx < vector_size(employees); idx++) {
-        Employee *p = (Employee*)vector_get(employees, idx).value.element;
-        printf("Name: %s, Age: %d\n", p->name, p->age);
-    }
-
-    // Sort array by name
-    vector_sort(employees, cmp_person_by_name);
-
-    // Print sorted array
-    printf("\nSort by name:\n");
-    for (size_t idx = 0; idx < vector_size(employees); idx++) {
-        Employee *p = (Employee*)vector_get(employees, idx).value.element;
-        printf("Name: %s, Age: %d\n", p->name, p->age);
-    }
-
-    vector_destroy(employees);
-
-    return 0;
-}
-```
+## Documentation
+For additional details about this library (internal design, memory
+management, data ownership, etc.) go to the `docs/` folder.
 
 ## Unit tests
 Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands:
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..6483877
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,11 @@
+# Documentation
+In this folder you can find the technical documentation of the
+`Datum` library as well as practical details on how to use it
+efficiently and safely.
+
+At the time being, this documentation includes the following pages:
+
+- [vector.md](vector.md): vector documentation;
+- [map.md](map.md): map documentation;
+- [sort.md](sort.md): how to use the `vector_sort` method.
+
diff --git a/docs/map.md b/docs/map.md
new file mode 100644
index 0000000..f061d10
--- /dev/null
+++ b/docs/map.md
@@ -0,0 +1,75 @@
+# Map Technical Details
+In this document you can find a quick overview of the technical
+aspects (internal design, memory layout, etc.) of the `Map` data structure.
+
+`Map` is a hash table that uses open addressing with linear probing for collision
+resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function) as its hashing function. Resizing is performed
+automatically by doubling the capacity when the load factor exceeds 75%. Internally,
+this data structure is represented by the following two structures:
+
+```c
+typedef struct {
+    char *key;
+    void *value;
+    element_state_t state;
+} map_element_t;
+
+typedef struct {
+    map_element_t *elements;
+    size_t capacity;
+    size_t size;
+    size_t tombstone_count;
+} map_t;
+```
+
+where the `key` variable represents a string used to index the `value`. The `state`, instead, indicates whether the entry is empty, occupied or deleted and is primarily used
+by the garbage collector for internal memory management. An array of `map_element_t`,
+with the variables indicating the *capacity*, the *current size* and
+the *tombstone count* (that is, the number of deleted entries), form a `map_t` data type.
+
+The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
+responsible for managing their memory. Values, on the other hand,
+**are stored as pointers**. This means that the hashmap **does NOT own them** and that
+the caller is responsible for managing their memory; this includes allocating
+enough memory for them, ensuring that the pointers remain valid for their whole lifecycle
+on the map, deleting old values when updating a key and, if the values were heap-allocated,
+freeing them before removing the keys or destroying the map.
+
+The `Map` data structure supports the following methods:
+
+- `map_result_t map_new()`: initialize a new map;
+- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
+- `map_result_t map_get(map, key)`: retrieve the value indexed by `key` if it exists;
+- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
+- `map_result_t map_clear(map)`: reset the map state;
+- `map_result_t map_destroy(map)`: delete the map;
+- `size_t map_size(map)`: returns map size (i.e., the number of elements);
+- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
+
+As you can see from the previous function signatures, most methods that operate
+on the `Map` data type return a custom type called `map_result_t` which is
+defined as follows:
+
+```c
+typedef enum {
+    MAP_OK = 0x0,
+    MAP_ERR_ALLOCATE,
+    MAP_ERR_INVALID,
+    MAP_ERR_NOT_FOUND
+} map_status_t;
+
+typedef struct {
+    map_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        map_t *map;
+        void *element;
+    } value;
+} map_result_t;
+```
+
+Each method that returns such a type indicates whether the operation was successful or not by setting
+the `status` field and by providing a descriptive message on the `message` field. If the operation was
+successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
+the returned value from the sum data type. Of course, you can choose to ignore the return value (if you're brave enough :D), as illustrated
+in the first part of the README.
\ No newline at end of file
diff --git a/docs/sort.md b/docs/sort.md
new file mode 100644
index 0000000..a76c3c3
--- /dev/null
+++ b/docs/sort.md
@@ -0,0 +1,173 @@
+# Sorting
+As indicated in [its documentation](/docs/vector.md), the `Vector` data type
+provides an efficient in-place sorting function called `vector_sort` that uses
+a built-in implementation of the [Quicksort algorithm](https://en.wikipedia.org/wiki/Quicksort). This method requires a user-defined comparison procedure, which allows the
+caller to customize the sorting behavior. The comparison procedure must adhere to the
+following specification:
+
+1. Must return `vector_order_t`, which is defined as follows:
+
+```c
+typedef enum {
+    VECTOR_ORDER_LT = 0x0, // First element should come before the second
+    VECTOR_ORDER_EQ, // The two elements are equivalent
+    VECTOR_ORDER_GT // First element should come after the second
+} vector_order_t;
+```
+
+and indicates the ordering relationship between any two elements.
+
+2. Must accept two `const void*` parameters representing two elements to compare;
+3. Must be self-contained and handle all its own resources.
+
+Let's look at some examples. For instance, let's say that we want to sort an array
+of integers in ascending and descending order:
+
+```c
+#include <stdio.h>
+#include "src/vector.h"
+
+vector_order_t cmp_int_asc(const void *x, const void *y) {
+    int x_int = *(const int*)x;
+    int y_int = *(const int*)y;
+
+    if (x_int < y_int) return VECTOR_ORDER_LT;
+    if (x_int > y_int) return VECTOR_ORDER_GT;
+
+    return VECTOR_ORDER_EQ;
+}
+
+vector_order_t cmp_int_desc(const void *x, const void *y) {
+    return cmp_int_asc(y, x);
+}
+
+/*
+ * Compile with: gcc main.c src/vector.c
+ * Output: Before sorting: -8 20 -10 125 34 9
+ *         After sorting (ascending order): -10 -8 9 20 34 125
+ *         After sorting (descending order): 125 34 20 9 -8 -10
+ */
+int main(void) {
+    vector_t *v = vector_new(5, sizeof(int)).value.vector;
+
+    int values[] = { -8, 20, -10, 125, 34, 9 };
+    for (size_t idx = 0; idx < 6; idx++) {
+        vector_push(v, &values[idx]);
+    }
+
+    // Print unsorted array
+    printf("Before sorting: ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    // Sort array in ascending order
+    vector_sort(v, cmp_int_asc);
+
+    // Print sorted array
+    printf("\nAfter sorting (ascending order): ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    // Sort array in descending order
+    vector_sort(v, cmp_int_desc);
+
+    // Print sorted array
+    printf("\nAfter sorting (descending order): ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    printf("\n");
+
+    vector_destroy(v);
+
+    return 0;
+}
+```
+
+Obviously, you can use the `vector_sort` method on custom data types as well.
+For instance, let's suppose that you have a structure representing the employees of
+a company and you wish to sort them based on their age and on their name (lexicographic sort):
+
+```c
+#include <stdio.h>
+#include <string.h>
+#include "src/vector.h"
+
+typedef struct {
+    char name[256];
+    int age;
+} Employee;
+
+vector_order_t cmp_person_by_age(const void *x, const void *y) {
+    const Employee *x_person = (const Employee*)x;
+    const Employee *y_person = (const Employee*)y;
+
+    if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
+    if (x_person->age > y_person->age) return VECTOR_ORDER_GT;
+
+    return VECTOR_ORDER_EQ;
+}
+
+vector_order_t cmp_person_by_name(const void *x, const void *y) {
+    const Employee *x_person = (const Employee*)x;
+    const Employee *y_person = (const Employee*)y;
+
+    const int result = strcmp(x_person->name, y_person->name);
+
+    if(result < 0) return VECTOR_ORDER_LT;
+    if(result > 0) return VECTOR_ORDER_GT;
+
+    return VECTOR_ORDER_EQ;
+}
+
+/*
+ * Compile with: gcc main.c src/vector.c
+ * Output: Sort by age:
+ *         Name: Marco, Age: 25
+ *         Name: Alice, Age: 28
+ *         Name: Bob, Age: 45
+ *
+ *         Sort by name:
+ *         Name: Alice, Age: 28
+ *         Name: Bob, Age: 45
+ *         Name: Marco, Age: 25
+ */
+int main(void) {
+    vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;
+
+    Employee e1 = { .name = "Bob", .age = 45 };
+    Employee e2 = { .name = "Alice", .age = 28 };
+    Employee e3 = { .name = "Marco", .age = 25 };
+
+    vector_push(employees, &e1);
+    vector_push(employees, &e2);
+    vector_push(employees, &e3);
+
+    // Sort array by age
+    vector_sort(employees, cmp_person_by_age);
+
+    // Print sorted array
+    printf("Sort by age:\n");
+    for (size_t idx = 0; idx < vector_size(employees); idx++) {
+        Employee *p = (Employee*)vector_get(employees, idx).value.element;
+        printf("Name: %s, Age: %d\n", p->name, p->age);
+    }
+
+    // Sort array by name
+    vector_sort(employees, cmp_person_by_name);
+
+    // Print sorted array
+    printf("\nSort by name:\n");
+    for (size_t idx = 0; idx < vector_size(employees); idx++) {
+        Employee *p = (Employee*)vector_get(employees, idx).value.element;
+        printf("Name: %s, Age: %d\n", p->name, p->age);
+    }
+
+    vector_destroy(employees);
+
+    return 0;
+}
+```
\ No newline at end of file
diff --git a/docs/vector.md b/docs/vector.md
new file mode 100644
index 0000000..477b97a
--- /dev/null
+++ b/docs/vector.md
@@ -0,0 +1,70 @@
+# Vector Technical Details
+In this document you can find a quick overview of the technical
+aspects (internal design, memory layout, etc.) of the `Vector` data structure.
+
+`Vector` is a dynamic array with generic data type support; this means that you can store
+any kind of homogenous value in this data structure. Resizing is performed automatically
+by increasing the capacity by 1.5 times when the array becomes full. Internally, this
+data structure is represented by the following structure:
+
+```c
+typedef struct {
+    size_t size;
+    size_t capacity;
+    size_t data_size;
+    void *elements;
+} vector_t;
+```
+
+where the `elements` variable represents the actual dynamic and generic array, the
+`data_size` variable indicates the size (in bytes) of the data type while the `size`
+and the `capacity` represent the number of stored elements and the total size of
+the structure, respectively. The dynamic array copies the values upon insertion,
+thus **it owns the data** and is therefore responsible for its allocation and its
+deletion.
+
+At the time being, `Vector` supports the following methods:
+
+- `vector_result_t vector_new(size, data_size)`: create a new vector;
+- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
+- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
+- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
+- `vector_result_t vector_sort(vector, cmp)`: sort array using `cmp` function;
+- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;
+- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
+will overwrite the memory;
+- `vector_result_t vector_destroy(vector)`: delete the vector;
+- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
+- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
+
+As you can see from the previous function signatures, most methods that operate
+on the `Vector` data type return a custom type called `vector_result_t` which is
+defined as follows:
+
+```c
+typedef enum {
+    VECTOR_OK = 0x0,
+    VECTOR_ERR_ALLOCATE,
+    VECTOR_ERR_OVERFLOW,
+    VECTOR_ERR_UNDERFLOW,
+    VECTOR_ERR_INVALID
+} vector_status_t;
+
+typedef struct {
+    vector_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        vector_t *vector;
+        void *element;
+    } value;
+} vector_result_t;
+```
+
+Each method that returns such a type indicates whether the operation was successful or not
+by setting the `status` field and by providing a descriptive message on the `message`
+field. If the operation was successful (that is, `status == VECTOR_OK`), you can either
+move on with the rest of the program or read the returned value from the sum data type. Of course, you can choose to
+ignore the return value (if you're brave enough :D), as illustrated in the first part of the README.
+
+The documentation for the `vector_sort(vector, cmp)` method can be found
+in [the following document](/docs/sort.md).
\ No newline at end of file