diff --git a/README.md b/README.md
index 548305b..bc913b1 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,30 @@
-# Datum [![](https://github.com/ceticamarco/datum/actions/workflows/datum.yml/badge.svg)](https://github.com/ceticamarco/datum/actions/workflows/datum.yml)
+
+

Datum

+
Collection of dynamic and generic data structures.
+
+[![](https://github.com/ceticamarco/datum/actions/workflows/datum.yml/badge.svg)](https://github.com/ceticamarco/datum/actions/workflows/datum.yml)
+
+
 Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond the standard library. It currently features:
-- **Vector**: a growable, contiguous array supporting homogenous data types (both primitives and user-defined types);
+- **Vector**: a growable, contiguous array of homogenous generic data types;
 - **Map**: an associative array that handles generic heterogenous data types;
-To learn more about the memory model of this library as well as the technical details
-on how to use it efficiently and safely, be sure to read [the design manual](docs/manual.pdf).
-
 ## Usage
 At its simplest, you can use this library as follows:
-### `Vector`
+### `Vector`'s usage
 ```c
 #include <stdio.h>
 #include "src/vector.h"
 /*
-* Compile with: gcc main.c src/vector.c
-* Output: First element: 5
-* Head of vector 6, size is now: 1
-*/
+ * Compile with: gcc main.c src/vector.c
+ * Output: First element: 5
+ * Head of vector 6, size is now: 1
+ */
 int main(void) {
     // Create an integer vector of initial capacity equal to 5
@@ -30,7 +33,8 @@ int main(void) {
     // Add two numbers
     int val = 5;
     vector_push(vec, &val);
-    vector_push(vec, &(int){6}); // Equivalent as above
+    // Equivalent as above
+    vector_push(vec, &(int){6});
     // Print 1st element
     const int first = *(int*)vector_get(vec, 0).value.element;
@@ -47,7 +51,7 @@ int main(void) {
 }
 ```
-### `Map`
+### `Map`'s usage
 ```c
 #include <stdio.h>
@@ -60,16 +64,15 @@ typedef struct {
 } Person;
 /*
-* Compile with: gcc main.c src/map.c
-* Output: Name: Bob, Surname: Smith, Age: 34
-*/
+ * Compile with: gcc main.c src/map.c
+ * Output: Name: Bob, Surname: Smith, Age: 34
+ */
 int main(void) {
     // Create a new map
     map_t *map = map_new().value.map;
-    const Person bob = { .name = "Bob", .surname = "Smith", .age = 34 };
     // Add a key to the map
+    const Person bob = { .name = "Bob", .surname = "Smith", .age = 34 };
     map_add(map, "bob", (void*)&bob);
     // Retrieve 'Bob' and check if it exists
@@ -77,8 +80,12 @@ int main(void) {
     if (bob_res.status == MAP_ERR_NOT_FOUND) {
         puts("This key does not exist.");
     } else {
-        const Person *retr = (const Person*)bob_res.value.element;
-        printf("Name: %s, Surname: %s, Age: %d\n", retr->name, retr->surname, retr->age);
+        const Person *ret = (const Person*)bob_res.value.element;
+        printf("Name: %s, Surname: %s, Age: %d\n",
+            ret->name,
+            ret->surname,
+            ret->age
+        );
     }
     // Remove map from memory
@@ -98,21 +105,142 @@ $ make clean all
 This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`.
 ## Technical Details
-As stated earlier, refer to [the design manual](docs/manual.pdf) for a comprehensive documentation of this library. Below, there's a quick
+In this section, you can find a quick
 overview of the technical aspects (internal design, memory layout, etc.) of this library as well as an overview about the design choices behind Datum. While both structures use `void*` to represent values, the way they manage memory is orthogonally different from one another. Let's start with the `Map` data type.
+### Map
 `Map` is an hash table implementation that uses open addressing with linear probing for collision resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed automatically
-by doubling the capacity when load factor exceeds 75%. The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
-to manage their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own** them and the caller is responsible
-to manage their memory; this includes: allocate enough memory for them, ensure that the pointers remain valid for their whole lifecycle on the map,
+by doubling the capacity when load factor exceeds 75%. Internally, this data structure is represented
+by the following structures:
+
+```c
+typedef struct {
+    char *key;
+    void *value;
+    element_state_t state;
+} map_element_t;
+
+typedef struct {
+    map_element_t *elements;
+    size_t capacity;
+    size_t size;
+    size_t tombstone_count;
+} map_t;
+```
+where the `key` represent a string used to index the `value`. The state, instead, indicates
+whether the entry is empty, occupied or deleted and is primarily used by the garbage collector
+for internal memory management. An array of `map_element_t` as well as variables indicating
+the capacity, the current size and the tombstone count (that is, the number of deleted entries)
+forms a `map_t` data type.
+
+The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
+to manage their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own them** and that the caller is responsible
+for managing their memory; this includes: allocate enough memory for them, ensure that the pointers remain valid for their whole lifecycle on the map,
 delete old values when updating a key and, if the values were heap-allocated, free them before removing them or before destroying the map.
-`Vector`, instead, is a dynamic array with generic data type support. This means that you can store any kind of homogenous value on the data structure. As in the `Map`'s case,
-resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. The dynamic array copies the values upon insertion, thus it is responsible
+The `Map` data structures supports the following methods:
+
+- `map_result_t map_new()`: initialize a new map;
+- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
+- `map_result_t map_get(map, key)`: retrieve a values indexed by `key` if it exists;
+- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
+- `map_result_t map_clear(map)`: reset the map state;
+- `map_result_t map_destroy(map)`: delete the map;
+- `size_t map_size(map)`: returns map size (i.e., the number of elements);
+- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
+
+As you can see, most methods that operates on the `Map` data type return a custom type called `map_result_t` which is defined as follows:
+
+```c
+typedef enum {
+    MAP_OK = 0x0,
+    MAP_ERR_ALLOCATE,
+    MAP_ERR_INVALID,
+    MAP_ERR_NOT_FOUND
+} map_status_t;
+
+typedef struct {
+    map_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        map_t *map;
+        void *element;
+    } value;
+} map_result_t;
+```
+
+Each method that returns a `map_result_t` indicates whether the operation was successful or not by setting the `status` field and by providing a descriptive message on the `message` field.
+If the operation was successful (that is, `status == MAP_OK`), you can either move on with the flow
+of the program or read the returned
+value from the sum data type. Of course,
+you can choose to ignore the return value (if you're brave enough :D), as illustrated in the first example of this document.
+
+### Vector
+`Vector` is a dynamic array with generic data type support, this means that you can store any kind of homogenous value on this data structure. As in the `Map`'s case,
+resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. Internally, this data structure is represented as follows:
+
+```c
+typedef struct {
+    size_t count;
+    size_t capacity;
+    size_t data_size;
+    void *elements;
+} vector_t;
+```
+
+where the `elements` represents the actual dynamic and generic array, the `data_size`
+variable indicates the size (in bytes) of the data type while the count and
+the capacity represent the number of stored elements and the total
+size of the structure, respectively. The dynamic array copies the values upon
+insertion, thus **it owns the data** and is therefore responsible for their
+allocation and their deletion.
+
+The dynamic array copies the values upon insertion, thus it is responsible
 for their allocation and their deletion.
+The `Vector` data structure supports the following methods:
+
+- `vector_result_t vector_new(size, data_size)`: create a new vector;
+- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
+- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
+- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
+- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;
+- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
+will overwrite the memory;
+- `vector_result_t vector_destroy(vector)`: delete the vector;
+- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
+- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
+
+As you can see, most methods that operates on the `Vector` data type return a custom type called
+`vector_result_t` which is defined as follows:
+
+```c
+typedef enum {
+    VECTOR_OK = 0x0,
+    VECTOR_ERR_ALLOCATE,
+    VECTOR_ERR_OVERFLOW,
+    VECTOR_ERR_UNDERFLOW,
+    VECTOR_ERR_INVALID
+} vector_status_t;
+
+typedef struct {
+    vector_status_t status;
+    uint8_t message[RESULT_MSG_SIZE];
+    union {
+        vector_t *vector;
+        void *element;
+    } value;
+} vector_result_t;
+```
+
+Each method that returns such type indicates whether the operation was successful or not by
+setting the `status` field and by providing a descriptive message on the `message` field.
+Just like for the `Map` data structure, if the operation was successful
+(that is, `status == VECTOR_OK`), you can either move on with the rest of the program
+or read the returned value from the sum data type.
+
 ## Unit tests
 Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands:
@@ -124,4 +252,4 @@ $ ./test_map
 ## License
 This library is released under the GPLv3 license. You can find a copy of the license with this repository or by visiting
-[the following link](https://choosealicense.com/licenses/gpl-3.0/).
\ No newline at end of file
+[the following link](https://choosealicense.com/licenses/gpl-3.0/).
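
The `Map` section added by this patch names FNV-1a hashing, linear probing and a 75% load-factor threshold, but the diff does not include the implementation. The following is a minimal, self-contained sketch of those three ideas. It assumes the 64-bit FNV-1a variant, which the README does not specify. The function names (`fnv1a_64`, `probe_start`, `probe_next`, `exceeds_load_factor`) are hypothetical and do not come from Datum's sources. Tombstone handling is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a NUL-terminated key (hypothetical helper). */
uint64_t fnv1a_64(const char *key) {
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= (uint64_t)*p;      /* XOR the byte in first ("1a" ordering) */
        hash *= 0x100000001b3ULL;  /* then multiply by the 64-bit FNV prime */
    }
    return hash;
}

/* First slot to inspect for a key in a table of `capacity` slots. */
size_t probe_start(const char *key, size_t capacity) {
    assert(capacity > 0);
    return (size_t)(fnv1a_64(key) % capacity);
}

/* Next slot under linear probing, wrapping around at the end of the table. */
size_t probe_next(size_t index, size_t capacity) {
    assert(capacity > 0);
    return (index + 1) % capacity;
}

/* Non-zero when size / capacity is strictly above 0.75 (integer arithmetic). */
int exceeds_load_factor(size_t size, size_t capacity) {
    return size * 4 > capacity * 3;
}
```

For instance, `probe_next(15, 16)` wraps around to slot `0`, and `exceeds_load_factor(13, 16)` is true because 13/16 is above 0.75, whereas 12/16 sits exactly on the threshold and does not trigger a resize in this sketch.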