Updated documentation

2025-11-10 10:49:23 +01:00
parent 1589a7d84f
commit 1293006eba
5 changed files with 336 additions and 313 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,11 @@
+# Documentation
+In this folder you can find the technical documentation of the
+`Datum` library as well as practical details on how to use it
+efficiently and safely.
+
+For the time being, this documentation includes the following pages:
+
+- [vector.md](vector.md): vector documentation;  
+- [map.md](map.md): map documentation;   
+- [sort.md](sort.md): how to use the `vector_sort` method.
+
--- a/docs/map.md
+++ b/docs/map.md
@@ -0,0 +1,75 @@
+# Map Technical Details
+In this document you can find a quick overview of the technical
+aspects (internal design, memory layout, etc.) of the `Map` data structure. 
+
+`Map` is a hash table that uses open addressing with linear probing for collision
+resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function) as its hashing function. Resizing is performed
+automatically by doubling the capacity when the load factor exceeds 75%. Internally,
+this data structure is represented by the following two structures:
+
+```c
+typedef struct {
+    char *key;
+    void *value;
+    element_state_t state;
+} map_element_t;
+
+typedef struct {
+    map_element_t *elements;
+    size_t capacity;
+    size_t size;
+    size_t tombstone_count;
+} map_t;
+```
+
+where the `key` variable represents a string used to index the `value`. The `state`, instead, indicates whether the entry is empty, occupied or deleted and is primarily used
+by the garbage collector for internal memory management. An array of `map_element_t`,
+together with the variables indicating the *capacity*, the *current size* and
+the *tombstone count* (that is, the number of deleted entries), forms a `map_t` data type.
+
+The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
+responsible for managing their memory. Values, on the other hand,
+**are stored as pointers**. This means that the hashmap **does NOT own them** and that
+the caller is responsible for managing their memory. This includes allocating
+enough memory for them, ensuring that the pointers remain valid for their whole lifetime
+in the map, deleting old values when updating a key and, if the values were heap-allocated,
+freeing them before removing the keys or destroying the map.
+
+The `Map` data structure supports the following methods:
+
+- `map_result_t map_new()`: initialize a new map;  
+- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;  
+- `map_result_t map_get(map, key)`: retrieve the value indexed by `key` if it exists;  
+- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;  
+- `map_result_t map_clear(map)`: reset the map state;  
+- `map_result_t map_destroy(map)`: delete the map;  
+- `size_t map_size(map)`: return map size (i.e., the number of elements);  
+- `size_t map_capacity(map)`: return map capacity (i.e., map total size).
+
+As you can see from the previous function signatures, most methods that operate
+on the `Map` data type return a custom type called `map_result_t` which is
+defined as follows:
+
+```c
+typedef enum {
+    MAP_OK = 0x0,
+    MAP_ERR_ALLOCATE,
+    MAP_ERR_INVALID,
+    MAP_ERR_NOT_FOUND
+} map_status_t;
+
+typedef struct {
+    map_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        map_t *map;
+        void *element;
+    } value;
+} map_result_t;
+```
+
+Each method that returns this type indicates whether the operation was successful or not by setting
+the `status` field and by providing a descriptive message in the `message` field. If the operation was
+successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
+the returned value from the `value` union. Of course, you can choose to ignore the return value (if you're brave enough :D), as illustrated
+in the first part of the README.
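
As a rough illustration of the scheme that docs/map.md describes above (FNV-1a hashing, linear probing and growth once the load factor exceeds 75%), the sketch below shows the arithmetic involved. It is not taken from the `Datum` sources: the 64-bit hash width and the helper names `fnv1a_64`, `probe_index` and `should_grow` are assumptions made for this example, and whether the real library counts tombstones towards the load factor is not stated in the documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a NUL-terminated key (the hash width is an assumption). */
uint64_t fnv1a_64(const char *key) {
    uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */

    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= (uint64_t)*p;     /* XOR the byte first (the "1a" variant) */
        hash *= 1099511628211ULL; /* then multiply by the FNV prime */
    }

    return hash;
}

/* Slot visited on a given probing attempt: home slot plus attempt, wrapped.
 * The capacity must be greater than zero. */
size_t probe_index(uint64_t hash, size_t attempt, size_t capacity) {
    return (size_t)((hash % capacity + attempt) % capacity);
}

/* True when size / capacity is strictly above 75%.
 * Hypothetical helper: only live entries are counted here. */
bool should_grow(size_t size, size_t capacity) {
    return size * 4 > capacity * 3;
}
```

With these helpers, a lookup would start at `probe_index(fnv1a_64(key), 0, capacity)` and keep increasing `attempt` until it reaches the key or an empty slot.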
--- a/docs/sort.md
+++ b/docs/sort.md
@@ -0,0 +1,173 @@
+# Sorting
+As indicated in [its documentation](/docs/vector.md), the `Vector` data type
+provides an efficient in-place sorting function called `vector_sort` that uses
+a built-in implementation of the [Quicksort algorithm](https://en.wikipedia.org/wiki/Quicksort). This method requires a user-defined comparison procedure which allows the
+caller to customize the sorting behavior. The comparison procedure must adhere to the
+following specification:
+
+1. Must return `vector_order_t`, which is defined as follows:
+
+```c
+typedef enum {
+    VECTOR_ORDER_LT = 0x0, // First element should come before the second
+    VECTOR_ORDER_EQ, // The two elements are equivalent
+    VECTOR_ORDER_GT // First element should come after the second
+} vector_order_t;
+```
+
+and indicates the ordering relationship between any two elements.
+
+2. Must accept two `const void*` parameters representing two elements to compare;  
+3. Must be self-contained and handle all its own resources.
+
+Let's look at some examples. For instance, let's say that we want to sort an array
+of integers in ascending and descending order:
+
+```c
+#include <stdio.h>
+#include "src/vector.h"
+
+vector_order_t cmp_int_asc(const void *x, const void *y) {
+    int x_int = *(const int*)x;
+    int y_int = *(const int*)y;
+
+    if (x_int < y_int) return VECTOR_ORDER_LT;
+    if (x_int > y_int) return VECTOR_ORDER_GT;
+
+    return VECTOR_ORDER_EQ;
+}
+
+vector_order_t cmp_int_desc(const void *x, const void *y) {
+    return cmp_int_asc(y, x);
+}
+
+/*
+ * Compile with: gcc main.c src/vector.c
+ * Output: Before sorting: -8 20 -10 125 34 9 
+ *         After sorting (ascending order): -10 -8 9 20 34 125 
+ *         After sorting (descending order): 125 34 20 9 -8 -10 
+ */
+int main(void) {
+    vector_t *v = vector_new(5, sizeof(int)).value.vector;
+
+    int values[] = { -8, 20, -10, 125, 34, 9 };
+    for (size_t idx = 0; idx < 6; idx++) {
+        vector_push(v, &values[idx]);
+    }
+
+    // Print unsorted array
+    printf("Before sorting: ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    // Sort array in ascending order
+    vector_sort(v, cmp_int_asc);
+
+    // Print sorted array
+    printf("\nAfter sorting (ascending order): ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    // Sort array in descending order
+    vector_sort(v, cmp_int_desc);
+
+    // Print sorted array
+    printf("\nAfter sorting (descending order): ");
+    for (size_t idx = 0; idx < vector_size(v); idx++) {
+        printf("%d ", *(int*)vector_get(v, idx).value.element);
+    }
+
+    printf("\n");
+
+    vector_destroy(v);
+
+    return 0;
+}
+```
+
+Obviously, you can use the `vector_sort` method on custom data types as well.
+For instance, let's suppose that you have a structure representing the employees of
+a company and you wish to sort them based on their age and on their name (lexicographic sort):
+
+```c
+#include <stdio.h>
+#include <string.h>
+#include "src/vector.h"
+
+typedef struct {
+    char name[256];
+    int age;
+} Employee;
+
+vector_order_t cmp_person_by_age(const void *x, const void *y) {
+    const Employee *x_person = (const Employee*)x;
+    const Employee *y_person = (const Employee*)y;
+
+    if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
+    if (x_person->age > y_person->age) return VECTOR_ORDER_GT;
+
+    return VECTOR_ORDER_EQ;
+}
+
+vector_order_t cmp_person_by_name(const void *x, const void *y) {
+    const Employee *x_person = (const Employee*)x;
+    const Employee *y_person = (const Employee*)y;
+
+    const int result = strcmp(x_person->name, y_person->name);
+
+    if (result < 0) return VECTOR_ORDER_LT;
+    if (result > 0) return VECTOR_ORDER_GT;
+    
+    return VECTOR_ORDER_EQ;
+}
+
+/*
+ * Compile with: gcc main.c src/vector.c
+ * Output: Sort by age:
+ *         Name: Marco, Age: 25
+ *         Name: Alice, Age: 28
+ *         Name: Bob, Age: 45
+ * 
+ *         Sort by name:
+ *         Name: Alice, Age: 28
+ *         Name: Bob, Age: 45
+ *         Name: Marco, Age: 25
+ */
+int main(void) {
+    vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;
+
+    Employee e1 = { .name = "Bob", .age = 45 };
+    Employee e2 = { .name = "Alice", .age = 28 };
+    Employee e3 = { .name = "Marco", .age = 25 };
+    
+    vector_push(employees, &e1);
+    vector_push(employees, &e2);
+    vector_push(employees, &e3);
+
+    // Sort array by age
+    vector_sort(employees, cmp_person_by_age);
+
+    // Print sorted array
+    printf("Sort by age:\n");
+    for (size_t idx = 0; idx < vector_size(employees); idx++) {
+        Employee *p = (Employee*)vector_get(employees, idx).value.element;
+        printf("Name: %s, Age: %d\n", p->name, p->age);
+    }
+
+    // Sort array by name
+    vector_sort(employees, cmp_person_by_name);
+    
+    // Print sorted array
+    printf("\nSort by name:\n");
+    for (size_t idx = 0; idx < vector_size(employees); idx++) {
+        Employee *p = (Employee*)vector_get(employees, idx).value.element;
+        printf("Name: %s, Age: %d\n", p->name, p->age);
+    }
+
+    vector_destroy(employees);
+
+    return 0;
+}
+```
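
The two comparators in docs/sort.md above each look at a single field. Since Quicksort is generally not a stable sort, sorting by name and then by age will not necessarily keep same-age employees in name order; a single comparator with a tie-breaker avoids that problem. The sketch below is a hypothetical example (`cmp_employee_by_age_then_name` is not part of the library), and it redeclares `vector_order_t` and `Employee` only so that it stands on its own; real code would include `src/vector.h` instead.

```c
#include <string.h>

/* Redeclared here only to keep the sketch self-contained. */
typedef enum {
    VECTOR_ORDER_LT = 0x0, // First element should come before the second
    VECTOR_ORDER_EQ,       // The two elements are equivalent
    VECTOR_ORDER_GT        // First element should come after the second
} vector_order_t;

typedef struct {
    char name[256];
    int age;
} Employee;

/* Hypothetical comparator: youngest first, ties broken alphabetically. */
vector_order_t cmp_employee_by_age_then_name(const void *x, const void *y) {
    const Employee *a = (const Employee *)x;
    const Employee *b = (const Employee *)y;

    // Primary key: age
    if (a->age < b->age) return VECTOR_ORDER_LT;
    if (a->age > b->age) return VECTOR_ORDER_GT;

    // Secondary key: name, compared only when the ages are equal
    const int by_name = strcmp(a->name, b->name);
    if (by_name < 0) return VECTOR_ORDER_LT;
    if (by_name > 0) return VECTOR_ORDER_GT;

    return VECTOR_ORDER_EQ;
}
```

Assuming the same setup as the employee example, it would be passed to the sort routine as `vector_sort(employees, cmp_employee_by_age_then_name);`.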
--- a/docs/vector.md
+++ b/docs/vector.md
@@ -0,0 +1,70 @@
+# Vector Technical Details
+In this document you can find a quick overview of the technical
+aspects (internal design, memory layout, etc.) of the `Vector` data structure. 
+
+`Vector` is a dynamic array with generic data type support; this means that you can store
+any kind of homogeneous value in this data structure. Resizing is performed automatically
+by growing the capacity by a factor of 1.5 when the array becomes full. Internally, this
+data structure is represented by the following structure:
+
+```c
+typedef struct {
+    size_t size;
+    size_t capacity;
+    size_t data_size;
+    void *elements;
+} vector_t;
+```
+
+where the `elements` variable represents the actual dynamic and generic array, the
+`data_size` variable indicates the size (in bytes) of the data type while the `size`
+and the `capacity` represent the number of stored elements and the total size of
+the structure, respectively. The dynamic array copies the values upon insertion,
+thus **it owns the data** and is therefore responsible for its allocation and its
+deletion.
+
+For the time being, `Vector` supports the following methods:
+
+- `vector_result_t vector_new(size, data_size)`: create a new vector;  
+- `vector_result_t vector_push(vector, value)`: add a new value to the vector;  
+- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;  
+- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;  
+- `vector_result_t vector_sort(vector, cmp)`: sort array using `cmp` function;  
+- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;  
+- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
+will overwrite the memory;  
+- `vector_result_t vector_destroy(vector)`: delete the vector;  
+- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);  
+- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
+
+As you can see from the previous function signatures, most methods that operate
+on the `Vector` data type return a custom type called `vector_result_t` which is
+defined as follows:
+
+```c
+typedef enum {
+    VECTOR_OK = 0x0,
+    VECTOR_ERR_ALLOCATE,
+    VECTOR_ERR_OVERFLOW,
+    VECTOR_ERR_UNDERFLOW,
+    VECTOR_ERR_INVALID
+} vector_status_t;
+
+typedef struct {
+    vector_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        vector_t *vector;
+        void *element;
+    } value;
+} vector_result_t;
+```
+
+Each method that returns this type indicates whether the operation was successful or not
+by setting the `status` field and by providing a descriptive message in the `message`
+field. If the operation was successful (that is, `status == VECTOR_OK`), you can either
+move on with the rest of the program or read the returned value from the `value` union. Of course, you can choose to
+ignore the return value (if you're brave enough :D), as illustrated in the first part of the README.
+
+The documentation for the `vector_sort(vector, cmp)` method can be found
+in [the following document](/docs/sort.md).
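
The 1.5x growth policy mentioned in docs/vector.md above can be sketched with plain integer arithmetic. This is only an illustration under stated assumptions: the helper names `next_capacity` and `buffer_bytes`, the rounding (integer division with a minimum step of 1) and the overflow handling are choices made for this example, not the library's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Next capacity under a 1.5x policy; returns 0 if it would overflow size_t.
 * The rounding behavior is an assumption made for this sketch. */
size_t next_capacity(size_t capacity) {
    size_t step = capacity / 2;

    if (step == 0) step = 1; /* capacities 0 and 1 must still grow */
    if (capacity > SIZE_MAX - step) return 0;

    return capacity + step;
}

/* Bytes needed for `capacity` elements of `data_size` bytes each;
 * returns 0 if the multiplication would overflow size_t. */
size_t buffer_bytes(size_t capacity, size_t data_size) {
    if (data_size != 0 && capacity > SIZE_MAX / data_size) return 0;

    return capacity * data_size;
}
```

Under these assumptions, a vector created with an initial capacity of 5 would grow to 7, then 10, then 15 elements, and the size passed to the reallocation would be `buffer_bytes(next_capacity(capacity), data_size)`.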