Updated documentation

This commit is contained in:
2025-11-10 10:49:23 +01:00
parent 1589a7d84f
commit 1293006eba
5 changed files with 336 additions and 313 deletions

320
README.md

@@ -8,8 +8,8 @@
Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond
the standard library. It currently features: the standard library. It currently features:
- **Vector**: a growable, contiguous array of homogenous generic data types; - [**Vector**](/docs/vector.md): a growable, contiguous array of homogenous generic data types;
- **Map**: an associative array that handles generic heterogenous data types; - [**Map**](/docs/map.md): an associative array that handles generic heterogenous data types;
## Usage ## Usage
At its simplest, you can use this library as follows: At its simplest, you can use this library as follows:
@@ -23,7 +23,7 @@ At its simplest, you can use this library as follows:
/* /*
* Compile with: gcc main.c src/vector.c * Compile with: gcc main.c src/vector.c
* Output: First element: 5 * Output: First element: 5
* Head of vector 6, size is now: 1 * Head of vector: 6, size is now: 1
*/ */
int main(void) { int main(void) {
@@ -42,7 +42,7 @@ int main(void) {
// Pop second element using LIFO policy // Pop second element using LIFO policy
const int head = *(int*)vector_pop(vec).value.element; const int head = *(int*)vector_pop(vec).value.element;
printf("Head of vector %d, size is now: %zu\n", head, vector_size(vec)); printf("Head of vector: %d, size is now: %zu\n", head, vector_size(vec));
// Remove vector from memory // Remove vector from memory
vector_destroy(vec); vector_destroy(vec);
@@ -104,315 +104,9 @@ $ make clean all
This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`. This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`.
## Technical Details ## Documentation
In this section, you can find a quick overview of the technical aspects (internal design, memory layout, etc.) of this library as well as an For additional details about this library (internal design, memory
overview about the design choices behind Datum. While both structures use `void*` to represent values, the way they manage memory is orthogonally different management, data ownership, etc.) go to the `docs/` folder.
from one another. Let's start with the `Map` data type.
### Map
`Map` is an hash table implementation that uses open addressing with linear probing for collision resolution and the
[FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed automatically
by doubling the capacity when load factor exceeds 75%. Internally, this data structure is represented
by the following structures:
```c
typedef struct {
char *key;
void *value;
element_state_t state;
} map_element_t;
typedef struct {
map_element_t *elements;
size_t capacity;
size_t size;
size_t tombstone_count;
} map_t;
```
where the `key` represent a string used to index the `value`. The state, instead, indicates
whether the entry is empty, occupied or deleted and is primarily used by the garbage collector
for internal memory management. An array of `map_element_t` as well as variables indicating
the capacity, the current size and the tombstone count (that is, the number of deleted entries)
forms a `map_t` data type.
The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
to manage their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own them** and that the caller is responsible
for managing their memory; this includes: allocate enough memory for them, ensure that the pointers remain valid for their whole lifecycle on the map,
delete old values when updating a key and, if the values were heap-allocated, free them before removing them or before destroying the map.
The `Map` data structures supports the following methods:
- `map_result_t map_new()`: initialize a new map;
- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
- `map_result_t map_get(map, key)`: retrieve a values indexed by `key` if it exists;
- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
- `map_result_t map_clear(map)`: reset the map state;
- `map_result_t map_destroy(map)`: delete the map;
- `size_t map_size(map)`: returns map size (i.e., the number of elements);
- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
As you can see, most methods that operates on the `Map` data type return a custom type called `map_result_t` which is defined as follows:
```c
typedef enum {
MAP_OK = 0x0,
MAP_ERR_ALLOCATE,
MAP_ERR_INVALID,
MAP_ERR_NOT_FOUND
} map_status_t;
typedef struct {
map_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
map_t *map;
void *element;
} value;
} map_result_t;
```
Each method that returns a `map_result_t` indicates whether the operation was successful or not by setting the `status` field and by providing a descriptive message on the `message` field.
If the operation was successful (that is, `status == MAP_OK`), you can either move on with the flow
of the program or read the returned
value from the sum data type. Of course,
you can choose to ignore the return value (if you're brave enough :D), as illustrated in the first example of this document.
### Vector
`Vector` is a dynamic array with generic data type support, this means that you can store any kind of homogenous value on this data structure. As in the `Map`'s case,
resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. Internally, this data structure is represented as follows:
```c
typedef struct {
size_t count;
size_t capacity;
size_t data_size;
void *elements;
} vector_t;
```
where the `elements` represents the actual dynamic and generic array, the `data_size`
variable indicates the size (in bytes) of the data type while the count and
the capacity represent the number of stored elements and the total
size of the structure, respectively. The dynamic array copies the values upon
insertion, thus **it owns the data** and is therefore responsible for their
allocation and their deletion.
The dynamic array copies the values upon insertion, thus it is responsible
for their allocation and their deletion.
The `Vector` data structure supports the following methods:
- `vector_result_t vector_new(size, data_size)`: create a new vector;
- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
- `map_result_t vector_sort(map, cmp)`: sort array using `cmp` function;
- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;
- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
will overwrite the memory;
- `vector_result_t vector_destroy(vector)`: delete the vector;
- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
As you can see, most methods that operates on the `Vector` data type return a custom type called
`vector_result_t` which is defined as follows:
```c
typedef enum {
VECTOR_OK = 0x0,
VECTOR_ERR_ALLOCATE,
VECTOR_ERR_OVERFLOW,
VECTOR_ERR_UNDERFLOW,
VECTOR_ERR_INVALID
} vector_status_t;
typedef struct {
vector_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
vector_t *vector;
void *element;
} value;
} vector_result_t;
```
Each method that returns such type indicates whether the operation was successful or not by
setting the `status` field and by providing a descriptive message on the `message` field.
Just like for the `Map` data structure, if the operation was successful
(that is, `status == VECTOR_OK`), you can either move on with the rest of the program
or read the returned value from the sum data type.
## Sorting
The `Vector` data structure provides an efficient in-place sorting method called `vector_sort`
which uses a builtin [Quicksort](https://en.wikipedia.org/wiki/Quicksort) implementation. This
function requires an user-defined comparison procedure as its second parameter, which allows
the caller to customize the sorting behavior. It must adhere to the following specification:
1. Must return `vector_order_t`, which is defined as follows:
```c
typedef enum {
VECTOR_ORDER_LT = 0x0, // First element should come before the second
VECTOR_ORDER_EQ, // The two elements are equivalent
VECTOR_ORDER_GT // First element should come after the second
} vector_order_t;
```
and indicates the ordering relationship between any two elements.
2. Must accept two `const void*` parameters representing the two elements to compare;
3. Must be self-contained and handle all its own resources.
Let's look at some examples; for instance, let's sort an integer array in ascending and
descending order:
```c
#include <stdio.h>
#include "src/vector.h"
vector_order_t cmp_int_asc(const void *x, const void *y) {
int x_int = *(const int*)x;
int y_int = *(const int*)y;
if (x_int < y_int) return VECTOR_ORDER_LT;
if (x_int > y_int) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
vector_order_t cmp_int_desc(const void *x, const void *y) {
return cmp_int_asc(y, x);
}
/*
* Compile with: gcc main.c src/vector.h
* Output: Before sorting: -8 20 -10 125 34 9
* After sorting (ascending order): -10 -8 9 20 34 125
* After sorting (descending order): 125 34 20 9 -8 -10
*/
int main(void) {
vector_t *v = vector_new(5, sizeof(int)).value.vector;
int values[] = { -8, 20, -10, 125, 34, 9 };
for (size_t idx = 0; idx < 6; idx++) {
vector_push(v, &values[idx]);
}
// Print unsorted array
printf("Before sorting: ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
// Sort array in ascending order
vector_sort(v, cmp_int_asc);
// Print sorted array
printf("\nAfter sorting (ascending order): ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
// Sort array in descending order
vector_sort(v, cmp_int_desc);
// Print sorted array
printf("\nAfter sorting (descending order): ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
printf("\n");
vector_destroy(v);
return 0;
}
```
Obviously, you can use the `vector_sort` method on custom data types as well. For instance, let's suppose that you have a
struct representing employees and you want to sort them based on their age and based on their name (lexicographic sort):
```c
#include <stdio.h>
#include <string.h>
#include "src/vector.h"
typedef struct {
char name[256];
int age;
} Employee;
vector_order_t cmp_person_by_age(const void *x, const void *y) {
const Employee *x_person = (const Employee*)x;
const Employee *y_person = (const Employee*)y;
if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
if (x_person->age > y_person->age) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
vector_order_t cmp_person_by_name(const void *x, const void *y) {
const Employee *x_person = (const Employee*)x;
const Employee *y_person = (const Employee*)y;
const int result = strcmp(x_person->name, y_person->name);
if(result < 0) return VECTOR_ORDER_LT;
if(result > 0) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
/*
* Compile with: gcc main.c src/vector.h
* Output: Sort by age:
* Name: Marco, Age: 25
* Name: Alice, Age: 28
* Name: Bob, Age: 45
*
* Sort by name:
* Name: Alice, Age: 28
* Name: Bob, Age: 45
* Name: Marco, Age: 25
*/
int main(void) {
vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;
Employee e1 = { .name = "Bob", .age = 45 };
Employee e2 = { .name = "Alice", .age = 28 };
Employee e3 = { .name = "Marco", .age = 25 };
vector_push(employees, &e1);
vector_push(employees, &e2);
vector_push(employees, &e3);
// Sort array by age
vector_sort(employees, cmp_person_by_age);
// Print sorted array
printf("Sort by age:\n");
for (size_t idx = 0; idx < vector_size(employees); idx++) {
Employee *p = (Employee*)vector_get(employees, idx).value.element;
printf("Name: %s, Age: %d\n", p->name, p->age);
}
// Sort array by name
vector_sort(employees, cmp_person_by_name);
// Print sorted array
printf("\nSort by name:\n");
for (size_t idx = 0; idx < vector_size(employees); idx++) {
Employee *p = (Employee*)vector_get(employees, idx).value.element;
printf("Name: %s, Age: %d\n", p->name, p->age);
}
vector_destroy(employees);
return 0;
}
```
## Unit tests ## Unit tests
Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands: Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands:

11
docs/README.md Normal file

@@ -0,0 +1,11 @@
# Documentation
In this folder you can find the technical documentation of the
`Datum` library as well as practical details on how to use it
efficiently and safely.
Currently, this documentation includes the following pages:
- [vector.md](vector.md): vector documentation;
- [map.md](map.md): map documentation;
- [sort.md](sort.md): how to use the `vector_sort` method.

75
docs/map.md Normal file

@@ -0,0 +1,75 @@
# Map Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Map` data structure.
`Map` is a hash table that uses open addressing with linear probing for collision
resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed
automatically by doubling the capacity when the load factor exceeds 75%. Internally,
this data structure is represented by the following two structures:
```c
typedef struct {
char *key;
void *value;
element_state_t state;
} map_element_t;
typedef struct {
map_element_t *elements;
size_t capacity;
size_t size;
size_t tombstone_count;
} map_t;
```
where the `key` variable represents the string used to index the `value`. The `state` field indicates whether the entry is empty, occupied or deleted, and is primarily used
by the garbage collector for internal memory management. An array of `map_element_t`,
together with the variables holding the *capacity*, the *current size* and
the *tombstone count* (that is, the number of deleted entries), forms the `map_t` data type.
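The hashing and probing scheme described above can be sketched as follows. This is a minimal illustration, not Datum's actual code: the 64-bit hash width, the helper names (`fnv1a_64`, `probe_slot`, `needs_resize`) and the choice to count tombstones toward the load factor are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
// (The hash width used by the library is an assumption.)
static uint64_t fnv1a_64(const char *key) {
    assert(key != NULL);
    uint64_t hash = 0xcbf29ce484222325ULL; // FNV offset basis
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= (uint64_t)*p;
        hash *= 0x100000001b3ULL;          // FNV prime
    }
    return hash;
}

// Linear probing: slot visited at probe step 0, 1, 2, ... wrapping around.
static size_t probe_slot(uint64_t hash, size_t step, size_t capacity) {
    assert(capacity > 0);
    return (size_t)((hash % capacity + step % capacity) % capacity);
}

// True when the load exceeds 75% of the capacity.
// Counting tombstones as load is an assumption of this sketch.
static int needs_resize(size_t size, size_t tombstones, size_t capacity) {
    return (size + tombstones) * 4 > capacity * 3;
}
```

A lookup would start at `probe_slot(hash, 0, capacity)` and advance the step until it finds the key or an empty slot; deleted slots (tombstones) must not stop the search, which is why the `state` field distinguishes them from empty ones.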
The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
responsible for managing their memory. Values, on the other hand,
**are stored as pointers**. This means that the hashmap **does NOT own them** and that
the caller is responsible for managing their memory. This includes allocating
enough memory for them, ensuring that the pointers remain valid for their whole lifecycle
in the map, deleting old values when updating a key and, if the values were heap-allocated,
freeing them before removing the keys or destroying the map.
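To make the ownership rules concrete, here is a small self-contained sketch. It does not use Datum itself: `toy_entry_t`, `toy_entry_set` and `toy_entry_clear` are hypothetical names for a one-slot container that follows the same policy (keys copied and owned, values borrowed).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical one-slot container that mimics the ownership rules above.
typedef struct {
    char *key;   // owned: private copy of the caller's string
    void *value; // borrowed: the caller keeps ownership
} toy_entry_t;

// Store a (key, value) pair: the key is duplicated, the value pointer is not.
static int toy_entry_set(toy_entry_t *entry, const char *key, void *value) {
    assert(entry != NULL && key != NULL);
    const size_t len = strlen(key) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, key, len);

    free(entry->key);     // release the previously owned key, if any
    entry->key = copy;
    entry->value = value; // the old value, if any, is NOT freed here
    return 0;
}

// Release only what the container owns (the key); the value is left untouched.
static void toy_entry_clear(toy_entry_t *entry) {
    assert(entry != NULL);
    free(entry->key);
    entry->key = NULL;
    entry->value = NULL;
}
```

With this policy the caller may modify or free the original key buffer right after insertion, but must keep the value alive for as long as the container refers to it and free it afterwards.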
The `Map` data structure supports the following methods:
- `map_result_t map_new()`: initialize a new map;
- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
- `map_result_t map_get(map, key)`: retrieve the value indexed by `key` if it exists;
- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
- `map_result_t map_clear(map)`: reset the map state;
- `map_result_t map_destroy(map)`: delete the map;
- `size_t map_size(map)`: return map size (i.e., the number of elements);
- `size_t map_capacity(map)`: return map capacity (i.e., map total size).
As the previous function signatures show, most methods that operate
on the `Map` data type return a custom type called `map_result_t`, which is
defined as follows:
```c
typedef enum {
MAP_OK = 0x0,
MAP_ERR_ALLOCATE,
MAP_ERR_INVALID,
MAP_ERR_NOT_FOUND
} map_status_t;
typedef struct {
map_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
map_t *map;
void *element;
} value;
} map_result_t;
```
Each method that returns such a type indicates whether the operation was successful by setting
the `status` field and by providing a descriptive message in the `message` field. If the operation was
successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
the returned value from the sum data type (the `value` union). Of course, you can choose to ignore the return value (if you're brave enough :D), as illustrated
in the first part of the README.
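The pattern of checking `status` before reading the union can be sketched as follows. The type definitions mirror the ones above so that the snippet compiles on its own; the value of `RESULT_MSG_SIZE`, the `stub_get` stand-in and the `get_int_or` helper are assumptions made for illustration and are not part of Datum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RESULT_MSG_SIZE 64 // assumed value; the real constant lives in the library

typedef struct map_t map_t; // opaque in this sketch

typedef enum {
    MAP_OK = 0x0,
    MAP_ERR_ALLOCATE,
    MAP_ERR_INVALID,
    MAP_ERR_NOT_FOUND
} map_status_t;

typedef struct {
    map_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        map_t *map;
        void *element;
    } value;
} map_result_t;

// Hypothetical stand-in for a lookup: only the key "answer" exists.
static map_result_t stub_get(const char *key) {
    static int answer = 42;
    map_result_t res = { .status = MAP_ERR_NOT_FOUND };
    if (key != NULL && strcmp(key, "answer") == 0) {
        res.status = MAP_OK;
        res.value.element = &answer;
        snprintf((char *)res.message, RESULT_MSG_SIZE, "Value found");
    } else {
        snprintf((char *)res.message, RESULT_MSG_SIZE, "Key not found");
    }
    return res;
}

// Read the union only after the status has been checked.
static int get_int_or(const char *key, int fallback) {
    const map_result_t res = stub_get(key);
    if (res.status != MAP_OK) {
        fprintf(stderr, "lookup failed: %s\n", (const char *)res.message);
        return fallback;
    }
    assert(res.value.element != NULL);
    return *(const int *)res.value.element;
}
```

The same shape should apply to the real functions: inspect `status` first, report `message` on failure, and dereference `value.element` only on `MAP_OK`.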

173
docs/sort.md Normal file

@@ -0,0 +1,173 @@
# Sorting
As indicated in [its documentation](/docs/vector.md), the `Vector` data type
provides an efficient in-place sorting function called `vector_sort` that uses
a builtin implementation of the [Quicksort algorithm](https://en.wikipedia.org/wiki/Quicksort). This method requires a user-defined comparison procedure, which allows the
caller to customize the sorting behavior. The comparison procedure must adhere to the
following specification:
1. Must return `vector_order_t`, which is defined as follows:
```c
typedef enum {
VECTOR_ORDER_LT = 0x0, // First element should come before the second
VECTOR_ORDER_EQ, // The two elements are equivalent
VECTOR_ORDER_GT // First element should come after the second
} vector_order_t;
```
and indicates the ordering relationship between any two elements.
2. Must accept two `const void*` parameters representing two elements to compare;
3. Must be self-contained and handle all its own resources.
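To show why the comparator takes two `const void*` arguments, here is a sketch of how an in-place quicksort over untyped elements might call it. This is not Datum's implementation: `sketch_quicksort`, `swap_bytes`, the `cmp_fn_t` alias and the Lomuto partition scheme are assumptions chosen for brevity, and `vector_order_t` is redeclared so that the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    VECTOR_ORDER_LT = 0x0,
    VECTOR_ORDER_EQ,
    VECTOR_ORDER_GT
} vector_order_t;

// Hypothetical alias for the comparison procedure described above.
typedef vector_order_t (*cmp_fn_t)(const void *x, const void *y);

// Swap two elements of elem_size bytes each, one byte at a time.
static void swap_bytes(unsigned char *a, unsigned char *b, size_t elem_size) {
    for (size_t i = 0; i < elem_size; i++) {
        const unsigned char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

// Recursive quicksort with Lomuto partitioning; the last element is the pivot.
static void sketch_quicksort(void *base, size_t count, size_t elem_size, cmp_fn_t cmp) {
    assert(base != NULL && cmp != NULL && elem_size > 0);
    if (count < 2) return;

    unsigned char *bytes = base;
    unsigned char *pivot = bytes + (count - 1) * elem_size;
    size_t store = 0; // elements before this index compare less than the pivot

    for (size_t i = 0; i + 1 < count; i++) {
        if (cmp(bytes + i * elem_size, pivot) == VECTOR_ORDER_LT) {
            swap_bytes(bytes + i * elem_size, bytes + store * elem_size, elem_size);
            store++;
        }
    }
    swap_bytes(bytes + store * elem_size, pivot, elem_size); // pivot to final slot

    sketch_quicksort(bytes, store, elem_size, cmp);
    sketch_quicksort(bytes + (store + 1) * elem_size, count - store - 1, elem_size, cmp);
}

// Example comparator for ascending integers.
static vector_order_t cmp_int(const void *x, const void *y) {
    const int a = *(const int *)x;
    const int b = *(const int *)y;
    if (a < b) return VECTOR_ORDER_LT;
    if (a > b) return VECTOR_ORDER_GT;
    return VECTOR_ORDER_EQ;
}
```

The sort routine only knows the element size and never the element type, so all type-specific knowledge has to live in the comparator.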
Let's look at some examples. For instance, let's say that we want to sort an array
of integers in ascending and descending order:
```c
#include <stdio.h>
#include "src/vector.h"
vector_order_t cmp_int_asc(const void *x, const void *y) {
int x_int = *(const int*)x;
int y_int = *(const int*)y;
if (x_int < y_int) return VECTOR_ORDER_LT;
if (x_int > y_int) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
vector_order_t cmp_int_desc(const void *x, const void *y) {
return cmp_int_asc(y, x);
}
/*
* Compile with: gcc main.c src/vector.c
* Output: Before sorting: -8 20 -10 125 34 9
* After sorting (ascending order): -10 -8 9 20 34 125
* After sorting (descending order): 125 34 20 9 -8 -10
*/
int main(void) {
vector_t *v = vector_new(5, sizeof(int)).value.vector;
int values[] = { -8, 20, -10, 125, 34, 9 };
for (size_t idx = 0; idx < 6; idx++) {
vector_push(v, &values[idx]);
}
// Print unsorted array
printf("Before sorting: ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
// Sort array in ascending order
vector_sort(v, cmp_int_asc);
// Print sorted array
printf("\nAfter sorting (ascending order): ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
// Sort array in descending order
vector_sort(v, cmp_int_desc);
// Print sorted array
printf("\nAfter sorting (descending order): ");
for (size_t idx = 0; idx < vector_size(v); idx++) {
printf("%d ", *(int*)vector_get(v, idx).value.element);
}
printf("\n");
vector_destroy(v);
return 0;
}
```
You can use the `vector_sort` method on custom data types as well.
For instance, suppose that you have a structure representing the employees of
a company and you wish to sort them by age and by name (lexicographic sort):
```c
#include <stdio.h>
#include <string.h>
#include "src/vector.h"
typedef struct {
char name[256];
int age;
} Employee;
vector_order_t cmp_person_by_age(const void *x, const void *y) {
const Employee *x_person = (const Employee*)x;
const Employee *y_person = (const Employee*)y;
if (x_person->age < y_person->age) return VECTOR_ORDER_LT;
if (x_person->age > y_person->age) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
vector_order_t cmp_person_by_name(const void *x, const void *y) {
const Employee *x_person = (const Employee*)x;
const Employee *y_person = (const Employee*)y;
const int result = strcmp(x_person->name, y_person->name);
if(result < 0) return VECTOR_ORDER_LT;
if(result > 0) return VECTOR_ORDER_GT;
return VECTOR_ORDER_EQ;
}
/*
* Compile with: gcc main.c src/vector.c
* Output: Sort by age:
* Name: Marco, Age: 25
* Name: Alice, Age: 28
* Name: Bob, Age: 45
*
* Sort by name:
* Name: Alice, Age: 28
* Name: Bob, Age: 45
* Name: Marco, Age: 25
*/
int main(void) {
vector_t *employees = vector_new(5, sizeof(Employee)).value.vector;
Employee e1 = { .name = "Bob", .age = 45 };
Employee e2 = { .name = "Alice", .age = 28 };
Employee e3 = { .name = "Marco", .age = 25 };
vector_push(employees, &e1);
vector_push(employees, &e2);
vector_push(employees, &e3);
// Sort array by age
vector_sort(employees, cmp_person_by_age);
// Print sorted array
printf("Sort by age:\n");
for (size_t idx = 0; idx < vector_size(employees); idx++) {
Employee *p = (Employee*)vector_get(employees, idx).value.element;
printf("Name: %s, Age: %d\n", p->name, p->age);
}
// Sort array by name
vector_sort(employees, cmp_person_by_name);
// Print sorted array
printf("\nSort by name:\n");
for (size_t idx = 0; idx < vector_size(employees); idx++) {
Employee *p = (Employee*)vector_get(employees, idx).value.element;
printf("Name: %s, Age: %d\n", p->name, p->age);
}
vector_destroy(employees);
return 0;
}
```
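Quicksort implementations are typically not stable, so elements that compare as equal may end up in any relative order. If that matters, one option is a comparator that breaks ties on a second field. The sketch below is an illustration only: `cmp_employee_by_age_then_name` is a hypothetical name, and `vector_order_t` and `Employee` are redeclared so that the snippet stands alone.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    VECTOR_ORDER_LT = 0x0,
    VECTOR_ORDER_EQ,
    VECTOR_ORDER_GT
} vector_order_t;

typedef struct {
    char name[256];
    int age;
} Employee;

// Order by age first; employees of the same age are ordered by name.
static vector_order_t cmp_employee_by_age_then_name(const void *x, const void *y) {
    assert(x != NULL && y != NULL);
    const Employee *a = (const Employee *)x;
    const Employee *b = (const Employee *)y;

    if (a->age < b->age) return VECTOR_ORDER_LT;
    if (a->age > b->age) return VECTOR_ORDER_GT;

    // Same age: fall back to a lexicographic comparison of the names
    const int result = strcmp(a->name, b->name);
    if (result < 0) return VECTOR_ORDER_LT;
    if (result > 0) return VECTOR_ORDER_GT;
    return VECTOR_ORDER_EQ;
}
```

Such a comparator can be passed to `vector_sort` in the same way as the single-key comparators shown earlier.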

70
docs/vector.md Normal file

@@ -0,0 +1,70 @@
# Vector Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Vector` data structure.
`Vector` is a dynamic array with generic data type support; this means that you can store
any kind of homogeneous value in this data structure. Resizing is performed automatically
by growing the capacity by a factor of 1.5 when the array becomes full. Internally, this
data structure is represented by the following structure:
```c
typedef struct {
size_t size;
size_t capacity;
size_t data_size;
void *elements;
} vector_t;
```
where the `elements` variable represents the actual dynamic and generic array, the
`data_size` variable indicates the size (in bytes) of the data type, while `size`
and `capacity` represent the number of stored elements and the total size of
the structure, respectively. The dynamic array copies the values upon insertion,
thus **it owns the data** and is therefore responsible for its allocation and its
deletion.
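The growth policy and the copy-on-insert behavior described above can be sketched as follows. This is not Datum's code: `sketch_vector_t`, `grow_capacity` and `sketch_push` are hypothetical names, the integer rounding of the 1.5 factor is an assumption, and overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Same fields as the structure above, under a different name to avoid confusion.
typedef struct {
    size_t size;
    size_t capacity;
    size_t data_size;
    void *elements;
} sketch_vector_t;

// Next capacity under 1.5x growth. The "+ 1" fallback (an assumption) makes
// sure that capacities 0 and 1 still grow despite integer division.
static size_t grow_capacity(size_t capacity) {
    const size_t grown = capacity + capacity / 2;
    return grown > capacity ? grown : capacity + 1;
}

// Append a copy of *value; returns 0 on success, -1 on allocation failure.
static int sketch_push(sketch_vector_t *vec, const void *value) {
    assert(vec != NULL && value != NULL && vec->data_size > 0);
    if (vec->size == vec->capacity) {
        const size_t new_capacity = grow_capacity(vec->capacity);
        void *grown = realloc(vec->elements, new_capacity * vec->data_size);
        if (grown == NULL) return -1; // the old buffer is still valid
        vec->elements = grown;
        vec->capacity = new_capacity;
    }
    // Copy data_size bytes into the next free slot: the vector owns this copy
    memcpy((unsigned char *)vec->elements + vec->size * vec->data_size,
           value, vec->data_size);
    vec->size++;
    return 0;
}
```

Because the bytes are copied, the caller can reuse or discard the original variable right after the push without affecting the stored element.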
Currently, `Vector` supports the following methods:
- `vector_result_t vector_new(size, data_size)`: create a new vector;
- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
- `vector_result_t vector_sort(vector, cmp)`: sort the array using the `cmp` function;
- `vector_result_t vector_pop(vector)`: pop last element from the vector following the LIFO policy;
- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
will overwrite the memory;
- `vector_result_t vector_destroy(vector)`: delete the vector;
- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
As the previous function signatures show, most methods that operate
on the `Vector` data type return a custom type called `vector_result_t`, which is
defined as follows:
```c
typedef enum {
VECTOR_OK = 0x0,
VECTOR_ERR_ALLOCATE,
VECTOR_ERR_OVERFLOW,
VECTOR_ERR_UNDERFLOW,
VECTOR_ERR_INVALID
} vector_status_t;
typedef struct {
vector_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
vector_t *vector;
void *element;
} value;
} vector_result_t;
```
Each method that returns such a type indicates whether the operation was successful
by setting the `status` field and by providing a descriptive message in the `message`
field. If the operation was successful (that is, `status == VECTOR_OK`), you can either
move on with the rest of the program or read the returned value from the sum data type (the `value` union). Of course, you can choose to
ignore the return value (if you're brave enough :D), as illustrated in the first part of the README.
The documentation for the `vector_sort(vector, cmp)` method can be found
in [the following document](/docs/sort.md).