Completed documentation
180
README.md
@@ -1,27 +1,30 @@
|
|||||||
# Datum [](https://github.com/ceticamarco/datum/actions/workflows/datum.yml)
|
<div align="center">
|
||||||
|
<h1>Datum</h1>
|
||||||
|
<h6><i>Collection of dynamic and generic data structures.</i></h6>
|
||||||
|
|
||||||
|
[](https://github.com/ceticamarco/datum/actions/workflows/datum.yml)
|
||||||
|
</div>
|
||||||
|
|
||||||
Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond
|
Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond
|
||||||
the standard library. It currently features:
|
the standard library. It currently features:
|
||||||
|
|
||||||
- **Vector**: a growable, contiguous array supporting homogeneous data types (both primitives and user-defined types);
|
- **Vector**: a growable, contiguous array of homogeneous generic data types;
|
||||||
- **Map**: an associative array that handles generic heterogeneous data types;
|
- **Map**: an associative array that handles generic heterogeneous data types;
|
||||||
|
|
||||||
To learn more about the memory model of this library as well as the technical details
|
|
||||||
on how to use it efficiently and safely, be sure to read [the design manual](docs/manual.pdf).
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
At its simplest, you can use this library as follows:
|
At its simplest, you can use this library as follows:
|
||||||
|
|
||||||
### `Vector`
|
### `Vector`'s usage
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
#include "src/vector.h"
|
#include "src/vector.h"
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Compile with: gcc main.c src/vector.c
|
* Compile with: gcc main.c src/vector.c
|
||||||
* Output: First element: 5
|
* Output: First element: 5
|
||||||
* Head of vector 6, size is now: 1
|
* Head of vector 6, size is now: 1
|
||||||
*/
|
*/
|
||||||
|
|
||||||
int main(void) {
|
int main(void) {
|
||||||
// Create an integer vector of initial capacity equal to 5
|
// Create an integer vector of initial capacity equal to 5
|
||||||
@@ -30,7 +33,8 @@ int main(void) {
|
|||||||
// Add two numbers
|
// Add two numbers
|
||||||
int val = 5;
|
int val = 5;
|
||||||
vector_push(vec, &val);
|
vector_push(vec, &val);
|
||||||
vector_push(vec, &(int){6}); // Equivalent to the push above
|
// Equivalent to the push above
|
||||||
|
vector_push(vec, &(int){6});
|
||||||
|
|
||||||
// Print 1st element
|
// Print 1st element
|
||||||
const int first = *(int*)vector_get(vec, 0).value.element;
|
const int first = *(int*)vector_get(vec, 0).value.element;
|
||||||
@@ -47,7 +51,7 @@ int main(void) {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### `Map`
|
### `Map`'s usage
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
@@ -60,16 +64,15 @@ typedef struct {
|
|||||||
} Person;
|
} Person;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Compile with: gcc main.c src/map.c
|
* Compile with: gcc main.c src/map.c
|
||||||
* Output: Name: Bob, Surname: Smith, Age: 34
|
* Output: Name: Bob, Surname: Smith, Age: 34
|
||||||
*/
|
*/
|
||||||
int main(void) {
|
int main(void) {
|
||||||
// Create a new map
|
// Create a new map
|
||||||
map_t *map = map_new().value.map;
|
map_t *map = map_new().value.map;
|
||||||
|
|
||||||
const Person bob = { .name = "Bob", .surname = "Smith", .age = 34 };
|
|
||||||
|
|
||||||
// Add a key to the map
|
// Add a key to the map
|
||||||
|
const Person bob = { .name = "Bob", .surname = "Smith", .age = 34 };
|
||||||
map_add(map, "bob", (void*)&bob);
|
map_add(map, "bob", (void*)&bob);
|
||||||
|
|
||||||
// Retrieve 'Bob' and check if it exists
|
// Retrieve 'Bob' and check if it exists
|
||||||
@@ -77,8 +80,12 @@ int main(void) {
|
|||||||
if (bob_res.status == MAP_ERR_NOT_FOUND) {
|
if (bob_res.status == MAP_ERR_NOT_FOUND) {
|
||||||
puts("This key does not exist.");
|
puts("This key does not exist.");
|
||||||
} else {
|
} else {
|
||||||
const Person *retr = (const Person*)bob_res.value.element;
|
const Person *ret = (const Person*)bob_res.value.element;
|
||||||
printf("Name: %s, Surname: %s, Age: %d\n", retr->name, retr->surname, retr->age);
|
printf("Name: %s, Surname: %s, Age: %d\n",
|
||||||
|
ret->name,
|
||||||
|
ret->surname,
|
||||||
|
ret->age
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Remove map from memory
|
// Remove map from memory
|
||||||
@@ -98,21 +105,142 @@ $ make clean all
|
|||||||
This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`.
|
This will compile the library as well as the `usage.c` file and the unit tests. After that, you can run it by typing `./usage`.
|
||||||
|
|
||||||
## Technical Details
|
## Technical Details
|
||||||
As stated earlier, refer to [the design manual](docs/manual.pdf) for a comprehensive documentation of this library. Below, there's a quick
|
In this section, you can find a quick overview of the technical aspects (internal design, memory layout, etc.) of this library as well as an
|
||||||
overview of the design choices behind Datum. While both structures use `void*` to represent values, the way they manage memory is fundamentally different
|
overview of the design choices behind Datum. While both structures use `void*` to represent values, the way they manage memory is fundamentally different
|
||||||
from one another. Let's start with the `Map` data type.
|
from one another. Let's start with the `Map` data type.
|
||||||
|
|
||||||
|
### Map
|
||||||
`Map` is a hash table implementation that uses open addressing with linear probing for collision resolution and the
|
`Map` is a hash table implementation that uses open addressing with linear probing for collision resolution and the
|
||||||
[FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed automatically
|
[FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) as its hashing function. Resizing is performed automatically
|
||||||
by doubling the capacity when the load factor exceeds 75%. The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
|
by doubling the capacity when the load factor exceeds 75%. Internally, this data structure is represented
|
||||||
to manage their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own** them and the caller is responsible
|
by the following structures:
|
||||||
to manage their memory; this includes: allocate enough memory for them, ensure that the pointers remain valid for their whole lifecycle on the map,
|
|
||||||
|
```c
|
||||||
|
typedef struct {
|
||||||
|
char *key;
|
||||||
|
void *value;
|
||||||
|
element_state_t state;
|
||||||
|
} map_element_t;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
map_element_t *elements;
|
||||||
|
size_t capacity;
|
||||||
|
size_t size;
|
||||||
|
size_t tombstone_count;
|
||||||
|
} map_t;
|
||||||
|
```
|
||||||
|
where `key` is the string used to index `value`. The `state` field, instead, indicates
|
||||||
|
whether the entry is empty, occupied or deleted and is primarily used by the garbage collector
|
||||||
|
for internal memory management. An array of `map_element_t`, together with variables indicating
|
||||||
|
the capacity, the current size and the tombstone count (that is, the number of deleted entries),
|
||||||
|
forms a `map_t` data type.
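
The lookup path described above (FNV-1a hashing, linear probing, tombstones and the 75% load factor) can be sketched as follows. This is only an illustration under stated assumptions, not the library's actual code: the names `slot_t`, `fnv1a_64`, `probe_find` and `needs_resize` are hypothetical, the 64-bit FNV-1a variant is assumed, and the load factor is assumed to count live entries only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: the library's own identifiers may differ. */
typedef enum { SLOT_EMPTY = 0, SLOT_OCCUPIED, SLOT_DELETED } slot_state_t;

typedef struct {
    const char *key;
    void *value;
    slot_state_t state;
} slot_t;

/* 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime. */
static uint64_t fnv1a_64(const char *key) {
    uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= (uint64_t)*p;
        hash *= 1099511628211ULL; /* FNV prime */
    }
    return hash;
}

/* Linear probing lookup: returns the slot index of `key`,
 * or `capacity` when the key is not present. */
static size_t probe_find(const slot_t *slots, size_t capacity, const char *key) {
    if (capacity == 0) {
        return capacity;
    }
    const size_t home = (size_t)(fnv1a_64(key) % capacity);
    for (size_t i = 0; i < capacity; i++) {
        const size_t idx = (home + i) % capacity;
        if (slots[idx].state == SLOT_EMPTY) {
            return capacity; /* an empty slot ends the probe chain */
        }
        if (slots[idx].state == SLOT_OCCUPIED && strcmp(slots[idx].key, key) == 0) {
            return idx;
        }
        /* SLOT_DELETED is a tombstone: skip it and keep probing */
    }
    return capacity;
}

/* True when size / capacity exceeds 75%, using integer arithmetic only.
 * Assumption: tombstones are not counted towards the load factor. */
static int needs_resize(size_t size, size_t capacity) {
    return size * 4 > capacity * 3;
}
```

Tombstones matter here because a deleted slot must not be treated as empty: doing so would cut the probe chain and hide any key stored further along it.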
|
||||||
|
|
||||||
|
The keys are **copied** by the hashmap. This means that the hashmap **owns** them and is responsible
|
||||||
|
for managing their memory. Values, on the other hand, **are stored as pointers**. This means that the hashmap **does NOT own them** and that the caller is responsible
|
||||||
|
for managing their memory; this includes allocating enough memory for them, ensuring that the pointers remain valid for their whole lifecycle in the map,
|
||||||
delete old values when updating a key and, if the values were heap-allocated, free them before removing them or before destroying the map.
|
deleting old values when updating a key and, if the values were heap-allocated, freeing them before removing them or before destroying the map.
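
To make the ownership rule concrete, here is a minimal sketch built around a hypothetical single-entry container (`entry_t`, `entry_set` and `entry_clear` are illustrative names, not part of Datum). It assumes the same policy as described above: the key is duplicated and owned by the container, while the value pointer is only borrowed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry container, used only to illustrate ownership. */
typedef struct {
    char *key;   /* owned: a private copy of the caller's string */
    void *value; /* borrowed: the caller keeps ownership */
} entry_t;

/* Store a pair: the key is duplicated, the value pointer is kept as-is.
 * The entry must be zero-initialized before the first call.
 * Returns 0 on success and -1 on allocation failure. */
static int entry_set(entry_t *entry, const char *key, void *value) {
    const size_t len = strlen(key) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, key, len);
    free(entry->key);     /* release the previous key copy, if any */
    entry->key = copy;
    entry->value = value; /* the old value is NOT freed: that is the caller's job */
    return 0;
}

/* Release only what the container owns, that is, the key copy. */
static void entry_clear(entry_t *entry) {
    free(entry->key);
    entry->key = NULL;
    entry->value = NULL;
}
```

With this policy, the caller may modify or free its key buffer right after insertion, but a heap-allocated value has to outlive its entry and be freed by the caller once it has been removed.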
|
||||||
|
|
||||||
`Vector`, instead, is a dynamic array with generic data type support. This means that you can store any kind of homogenous value on the data structure. As in the `Map`'s case,
|
The `Map` data structure supports the following methods:
|
||||||
resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. The dynamic array copies the values upon insertion, thus it is responsible
|
|
||||||
|
- `map_result_t map_new()`: initialize a new map;
|
||||||
|
- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
|
||||||
|
- `map_result_t map_get(map, key)`: retrieve the value indexed by `key` if it exists;
|
||||||
|
- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
|
||||||
|
- `map_result_t map_clear(map)`: reset the map state;
|
||||||
|
- `map_result_t map_destroy(map)`: delete the map;
|
||||||
|
- `size_t map_size(map)`: return the map size (i.e., the number of elements);
|
||||||
|
- `size_t map_capacity(map)`: return the map capacity (i.e., the number of allocated slots).
|
||||||
|
|
||||||
|
As you can see, most methods that operate on the `Map` data type return a custom type called `map_result_t`, which is defined as follows:
|
||||||
|
|
||||||
|
```c
|
||||||
|
typedef enum {
|
||||||
|
MAP_OK = 0x0,
|
||||||
|
MAP_ERR_ALLOCATE,
|
||||||
|
MAP_ERR_INVALID,
|
||||||
|
MAP_ERR_NOT_FOUND
|
||||||
|
} map_status_t;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
map_status_t status;
|
||||||
|
uint8_t message[RESULT_MSG_SIZE];
|
||||||
|
union {
|
||||||
|
map_t *map;
|
||||||
|
void *element;
|
||||||
|
} value;
|
||||||
|
} map_result_t;
|
||||||
|
```
|
||||||
|
|
||||||
|
Each method that returns a `map_result_t` indicates whether the operation was successful by setting the `status` field and by providing a descriptive message in the `message` field.
|
||||||
|
If the operation was successful (that is, `status == MAP_OK`), you can either move on with the flow
|
||||||
|
of the program or read the returned
|
||||||
|
value from the `value` union. Of course,
|
||||||
|
you can choose to ignore the return value (if you're brave enough :D), as illustrated in the first example of this document.
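
The result-type pattern can be sketched in isolation as follows. All names here (`result_t`, `res_status_t`, `MSG_SIZE`, `find_int`) are hypothetical and only mirror the shape of `map_result_t`; the message buffer is declared as `char` rather than `uint8_t` purely to keep the `snprintf` calls cast-free.

```c
#include <stddef.h>
#include <stdio.h>

#define MSG_SIZE 64 /* hypothetical; the library uses RESULT_MSG_SIZE */

typedef enum { RES_OK = 0, RES_ERR_INVALID, RES_ERR_NOT_FOUND } res_status_t;

typedef struct {
    res_status_t status;
    char message[MSG_SIZE];
    union {
        void *element;
    } value;
} result_t;

/* Hypothetical lookup: on success, value.element points to the first
 * item equal to `target`; otherwise it stays NULL. */
static result_t find_int(int *items, size_t count, int target) {
    result_t res = { .status = RES_ERR_NOT_FOUND };

    if (items == NULL) {
        res.status = RES_ERR_INVALID;
        snprintf(res.message, sizeof res.message, "Invalid array");
        return res;
    }

    for (size_t i = 0; i < count; i++) {
        if (items[i] == target) {
            res.status = RES_OK;
            res.value.element = &items[i];
            snprintf(res.message, sizeof res.message, "Element found");
            return res;
        }
    }

    snprintf(res.message, sizeof res.message, "Element not found");
    return res;
}
```

A caller would then test `status` before touching the union, for instance `if (res.status == RES_OK) { int v = *(int *)res.value.element; }`, and print `res.message` otherwise.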
|
||||||
|
|
||||||
|
### Vector
|
||||||
|
`Vector` is a dynamic array with generic data type support; this means that you can store any kind of homogeneous value in this data structure. As in the `Map`'s case,
|
||||||
|
resizing is performed automatically by increasing the capacity by 1.5 times when the array is full. Internally, this data structure is represented as follows:
|
||||||
|
|
||||||
|
```c
|
||||||
|
typedef struct {
|
||||||
|
size_t count;
|
||||||
|
size_t capacity;
|
||||||
|
size_t data_size;
|
||||||
|
void *elements;
|
||||||
|
} vector_t;
|
||||||
|
```
|
||||||
|
|
||||||
|
where `elements` points to the actual dynamic and generic array, the `data_size`
|
||||||
|
field indicates the size (in bytes) of each element, while `count` and
|
||||||
|
`capacity` represent the number of stored elements and the number of
|
||||||
|
elements the array can hold before resizing, respectively. The dynamic array copies the values upon
|
||||||
|
insertion, thus **it owns the data** and is therefore responsible for their
|
||||||
|
allocation and their deletion.
|
||||||
|
|
||||||
|
The dynamic array copies the values upon insertion, thus it is responsible
|
||||||
for their allocation and their deletion.
|
for their allocation and their deletion.
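
As a rough sketch of copy-on-insert with 1.5x growth, consider the following. The names `vec_t`, `vec_init`, `vec_push` and `vec_at` are hypothetical and the rounding rule for tiny capacities is an assumption; only the struct fields are taken from `vector_t` above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as vector_t; the functions below are hypothetical. */
typedef struct {
    size_t count;
    size_t capacity;
    size_t data_size;
    void *elements;
} vec_t;

/* Allocate room for `capacity` elements of `data_size` bytes each. */
static int vec_init(vec_t *vec, size_t capacity, size_t data_size) {
    if (capacity == 0 || data_size == 0 || capacity > SIZE_MAX / data_size) {
        return -1;
    }
    vec->elements = malloc(capacity * data_size);
    if (vec->elements == NULL) {
        return -1;
    }
    vec->count = 0;
    vec->capacity = capacity;
    vec->data_size = data_size;
    return 0;
}

/* Copy `value` into the array, growing the buffer by 1.5x when it is full. */
static int vec_push(vec_t *vec, const void *value) {
    if (vec->count == vec->capacity) {
        if (vec->capacity > SIZE_MAX - vec->capacity / 2) {
            return -1; /* capacity computation would overflow */
        }
        size_t new_capacity = vec->capacity + vec->capacity / 2;
        if (new_capacity == vec->capacity) {
            new_capacity++; /* a capacity of 1 must still grow */
        }
        if (new_capacity > SIZE_MAX / vec->data_size) {
            return -1;
        }
        void *grown = realloc(vec->elements, new_capacity * vec->data_size);
        if (grown == NULL) {
            return -1; /* the old buffer is still valid */
        }
        vec->elements = grown;
        vec->capacity = new_capacity;
    }
    unsigned char *dest = (unsigned char *)vec->elements + vec->count * vec->data_size;
    memcpy(dest, value, vec->data_size); /* the vector keeps its own copy */
    vec->count++;
    return 0;
}

/* Return a pointer to the element at `index`, or NULL when out of range. */
static void *vec_at(const vec_t *vec, size_t index) {
    if (index >= vec->count) {
        return NULL;
    }
    return (unsigned char *)vec->elements + index * vec->data_size;
}
```

Because the bytes are copied, the caller can reuse or discard the pushed variable immediately, which is what makes passing `&(int){6}` in the usage example safe.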
|
||||||
|
|
||||||
|
The `Vector` data structure supports the following methods:
|
||||||
|
|
||||||
|
- `vector_result_t vector_new(size, data_size)`: create a new vector;
|
||||||
|
- `vector_result_t vector_push(vector, value)`: add a new value to the vector;
|
||||||
|
- `vector_result_t vector_set(vector, index, value)`: update the value of a given index if it exists;
|
||||||
|
- `vector_result_t vector_get(vector, index)`: return the value indexed by `index` if it exists;
|
||||||
|
- `vector_result_t vector_pop(vector)`: pop the last element from the vector following the LIFO policy;
|
||||||
|
- `vector_result_t vector_clear(vector)`: logically reset the vector. That is, new pushes
|
||||||
|
will overwrite the memory;
|
||||||
|
- `vector_result_t vector_destroy(vector)`: delete the vector;
|
||||||
|
- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
|
||||||
|
- `size_t vector_capacity(vector)`: return vector capacity (i.e., the number of elements it can hold before resizing).
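
The difference between a logical reset and a LIFO pop can be illustrated with a deliberately simplified integer stack. This is a sketch under assumptions: `int_stack_t`, `stack_clear` and `stack_pop` are hypothetical names, and the real `vector_clear` and `vector_pop` report errors through `vector_result_t` instead of plain integers.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vector of integers. */
typedef struct {
    size_t count;
    size_t capacity;
    int *elements;
} int_stack_t;

/* Logical reset: forget the elements but keep the allocation, so that
 * later pushes overwrite the old memory instead of reallocating. */
static void stack_clear(int_stack_t *stack) {
    stack->count = 0;
}

/* LIFO pop: writes the last element to *out and returns 0,
 * or returns -1 when the stack is empty (underflow). */
static int stack_pop(int_stack_t *stack, int *out) {
    if (stack->count == 0) {
        return -1;
    }
    stack->count--;
    *out = stack->elements[stack->count];
    return 0;
}
```

Note that after `stack_clear` the capacity is unchanged and the old bytes are still in the buffer; they are simply no longer reachable through the container.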
|
||||||
|
|
||||||
|
As you can see, most methods that operate on the `Vector` data type return a custom type called
|
||||||
|
`vector_result_t` which is defined as follows:
|
||||||
|
|
||||||
|
```c
|
||||||
|
typedef enum {
|
||||||
|
VECTOR_OK = 0x0,
|
||||||
|
VECTOR_ERR_ALLOCATE,
|
||||||
|
VECTOR_ERR_OVERFLOW,
|
||||||
|
VECTOR_ERR_UNDERFLOW,
|
||||||
|
VECTOR_ERR_INVALID
|
||||||
|
} vector_status_t;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
vector_status_t status;
|
||||||
|
uint8_t message[RESULT_MSG_SIZE];
|
||||||
|
union {
|
||||||
|
vector_t *vector;
|
||||||
|
void *element;
|
||||||
|
} value;
|
||||||
|
} vector_result_t;
|
||||||
|
```
|
||||||
|
|
||||||
|
Each method that returns such type indicates whether the operation was successful or not by
|
||||||
|
setting the `status` field and by providing a descriptive message in the `message` field.
|
||||||
|
Just like for the `Map` data structure, if the operation was successful
|
||||||
|
(that is, `status == VECTOR_OK`), you can either move on with the rest of the program
|
||||||
|
or read the returned value from the `value` union.
|
||||||
|
|
||||||
## Unit tests
|
## Unit tests
|
||||||
Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands:
|
Datum provides some unit tests for both the `Vector` and the `Map` data types. To run them, you can issue the following commands:
|
||||||
|
|
||||||
@@ -124,4 +252,4 @@ $ ./test_map
|
|||||||
|
|
||||||
## License
|
## License
|
||||||
This library is released under the GPLv3 license. You can find a copy of the license with this repository or by visiting
|
This library is released under the GPLv3 license. You can find a copy of the license with this repository or by visiting
|
||||||
[the following link](https://choosealicense.com/licenses/gpl-3.0/).
|
[the following link](https://choosealicense.com/licenses/gpl-3.0/).
|
||||||
|
|||||||