Datum is a collection of dynamic and generic data structures implemented from scratch in C with no external dependencies beyond the standard library. It currently features:
- Vector: a growable, contiguous array of homogeneous generic data types;
- Map: an associative array that handles generic heterogeneous data types.
Usage
At its simplest, you can use this library as follows:
Vector's usage
#include <stdio.h>
#include "src/vector.h"

/*
 * Compile with: gcc main.c src/vector.c
 * Output: First element: 5
 *         Head of vector 6, size is now: 1
 */
int main(void) {
    // Create an integer vector with an initial capacity of 5
    vector_t *vec = vector_new(5, sizeof(int)).value.vector;

    // Add two numbers
    int val = 5;
    vector_push(vec, &val);
    // Equivalent to the above
    vector_push(vec, &(int){6});

    // Print the first element
    const int first = *(int*)vector_get(vec, 0).value.element;
    printf("First element: %d\n", first);

    // Pop the second (last) element, following the LIFO policy
    const int head = *(int*)vector_pop(vec).value.element;
    printf("Head of vector %d, size is now: %zu\n", head, vector_size(vec));

    // Remove the vector from memory
    vector_destroy(vec);

    return 0;
}
Map's usage
#include <stdio.h>
#include "src/map.h"

typedef struct {
    char name[256];
    char surname[256];
    short age;
} Person;

/*
 * Compile with: gcc main.c src/map.c
 * Output: Name: Bob, Surname: Smith, Age: 34
 */
int main(void) {
    // Create a new map
    map_t *map = map_new().value.map;

    // Add a key to the map
    const Person bob = { .name = "Bob", .surname = "Smith", .age = 34 };
    map_add(map, "bob", (void*)&bob);

    // Retrieve 'Bob' and check if it exists
    map_result_t bob_res = map_get(map, "bob");
    if (bob_res.status == MAP_ERR_NOT_FOUND) {
        puts("This key does not exist.");
    } else {
        const Person *ret = (const Person*)bob_res.value.element;
        printf("Name: %s, Surname: %s, Age: %d\n",
            ret->name,
            ret->surname,
            ret->age
        );
    }

    // Remove the map from memory
    map_destroy(map);

    return 0;
}
For a more exhaustive example, refer to the usage.c file. There, you will find a program with proper error management
and a sample usage for every available method. To run it, first issue the following command:
$ make clean all
This will compile the library as well as the usage.c file and the unit tests. After that, you can run it by typing ./usage.
Technical Details
This section gives a quick overview of the technical aspects of this library (internal design, memory layout, etc.) and of
the design choices behind Datum. While both structures use void* to represent values, they manage memory in fundamentally different
ways. Let's start with the Map data type.
Map
Map is a hash table implementation that uses open addressing with linear probing for collision resolution and the
FNV-1a algorithm as its hashing function. Resizing is performed automatically
by doubling the capacity when the load factor exceeds 75%. Internally, this data structure is represented
by the following structures:
typedef struct {
    char *key;
    void *value;
    element_state_t state;
} map_element_t;

typedef struct {
    map_element_t *elements;
    size_t capacity;
    size_t size;
    size_t tombstone_count;
} map_t;
where key is a string used to index the value. The state field indicates
whether the entry is empty, occupied or deleted, and is primarily used by the garbage collector
for internal memory management. A map_t combines an array of map_element_t with variables tracking
the capacity, the current size and the tombstone count (that is, the number of deleted entries).
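To make the lookup strategy more concrete, here is a minimal sketch of 64-bit FNV-1a hashing combined with a linear probing search and a 75% load factor check. It is not taken from Datum's source: slot_t, slot_state_t, fnv1a_64, probe_find and exceeds_load_factor are hypothetical names, the 64-bit FNV variant is an assumption, and whether Datum counts tombstones toward the load factor is not stated above (this sketch does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; not Datum's actual definitions. */
typedef enum { SLOT_EMPTY, SLOT_OCCUPIED, SLOT_DELETED } slot_state_t;

typedef struct {
    const char *key;
    slot_state_t state;
} slot_t;

/* 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime. */
uint64_t fnv1a_64(const char *key) {
    assert(key != NULL);
    uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= *p;
        hash *= 1099511628211ULL; /* FNV prime */
    }
    return hash;
}

/* Linear probing: start at hash % capacity and walk forward one slot at a time.
 * Returns the index holding `key`, or `capacity` if the key is absent.
 * Tombstones (SLOT_DELETED) are skipped; an empty slot ends the search. */
size_t probe_find(const slot_t *slots, size_t capacity, const char *key) {
    assert(slots != NULL && capacity > 0);
    size_t start = (size_t)(fnv1a_64(key) % capacity);
    for (size_t i = 0; i < capacity; i++) {
        size_t idx = (start + i) % capacity;
        if (slots[idx].state == SLOT_EMPTY) {
            return capacity;
        }
        if (slots[idx].state == SLOT_OCCUPIED && strcmp(slots[idx].key, key) == 0) {
            return idx;
        }
    }
    return capacity;
}

/* True when size / capacity > 0.75, computed in integer arithmetic. */
int exceeds_load_factor(size_t size, size_t capacity) {
    return size * 4 > capacity * 3;
}
```

Skipping tombstones instead of stopping at them is what keeps keys reachable after an earlier entry in the same probe sequence has been removed.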
The keys are copied by the hashmap. This means that the hashmap owns them and is responsible for managing their memory. Values, on the other hand, are stored as pointers. This means that the hashmap does NOT own them and that the caller is responsible for managing their memory. This includes allocating enough memory for them, ensuring that the pointers remain valid for as long as they are stored in the map, deleting old values when updating a key and, if the values were heap-allocated, freeing them before removing them or before destroying the map.
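The ownership split can be illustrated with a deliberately tiny, single-entry container. This is only a sketch of the "copied key, borrowed value" idea: entry_t, entry_set and entry_clear are hypothetical names and do not exist in Datum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry container used only to illustrate ownership. */
typedef struct {
    char *key;   /* owned: a private copy made on insertion */
    void *value; /* borrowed: the caller keeps ownership */
} entry_t;

/* Store a copy of the key and the caller's pointer. Returns 0 on success. */
int entry_set(entry_t *entry, const char *key, void *value) {
    assert(entry != NULL && key != NULL);
    char *copy = malloc(strlen(key) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, key);
    free(entry->key);     /* release the previous key copy, if any */
    entry->key = copy;
    entry->value = value; /* stored as-is: never copied, never freed here */
    return 0;
}

/* Release what the container owns, which is the key only. */
void entry_clear(entry_t *entry) {
    assert(entry != NULL);
    free(entry->key);
    entry->key = NULL;
    entry->value = NULL;
}
```

With this split, the caller may modify or free its own key buffer right after insertion, but a heap-allocated value must outlive the entry and be freed by the caller afterwards.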
The Map data structure supports the following methods:
- map_result_t map_new(): initialize a new map;
- map_result_t map_add(map, key, value): add a (key, value) pair to the map;
- map_result_t map_get(map, key): retrieve the value indexed by key, if it exists;
- map_result_t map_remove(map, key): remove a key from the map, if it exists;
- map_result_t map_clear(map): reset the map state;
- map_result_t map_destroy(map): delete the map;
- size_t map_size(map): return the map size (i.e., the number of elements);
- size_t map_capacity(map): return the map capacity (i.e., the map's total size).
As you can see, most methods that operate on the Map data type return a custom type called map_result_t, which is defined as follows:
typedef enum {
    MAP_OK = 0x0,
    MAP_ERR_ALLOCATE,
    MAP_ERR_INVALID,
    MAP_ERR_NOT_FOUND
} map_status_t;

typedef struct {
    map_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        map_t *map;
        void *element;
    } value;
} map_result_t;
Each method that returns a map_result_t indicates whether the operation was successful by setting the status field and by providing a descriptive message in the message field.
If the operation was successful (that is, status == MAP_OK), you can either move on with the flow
of the program or read the returned value from the value union. Of course,
you can choose to ignore the return value (if you're brave enough :D), as illustrated in the first example of this document.
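The "check the status first, then read the union" pattern can be sketched in isolation as follows. Everything prefixed with demo_ or DEMO_ is hypothetical and merely mirrors the shape of map_result_t; in particular, DEMO_MSG_SIZE is an assumed value, since RESULT_MSG_SIZE is not shown in this document.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MSG_SIZE 64 /* assumption: the real RESULT_MSG_SIZE is not shown */

typedef enum { DEMO_OK = 0x0, DEMO_ERR_NOT_FOUND } demo_status_t;

typedef struct {
    demo_status_t status;
    uint8_t message[DEMO_MSG_SIZE];
    union {
        void *element;
    } value;
} demo_result_t;

/* Hypothetical lookup over a single key/value pair, returning a result struct. */
demo_result_t demo_get(const char *stored_key, void *stored_value, const char *key) {
    assert(stored_key != NULL && key != NULL);
    demo_result_t res = { .status = DEMO_OK };
    if (strcmp(stored_key, key) != 0) {
        res.status = DEMO_ERR_NOT_FOUND;
        snprintf((char *)res.message, sizeof res.message, "key '%s' not found", key);
        return res;
    }
    snprintf((char *)res.message, sizeof res.message, "value retrieved");
    res.value.element = stored_value;
    return res;
}

/* Caller side: inspect status first, and only then touch the union. */
int demo_read_int(demo_result_t res, int fallback) {
    if (res.status != DEMO_OK) {
        fprintf(stderr, "error: %s\n", (const char *)res.message);
        return fallback;
    }
    return *(int *)res.value.element;
}
```

Reading value.element without checking status would dereference a null pointer in the failure branch of this sketch, which is why the check comes first.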
Vector
Vector is a dynamic array with generic data type support; this means that you can store any kind of homogeneous value in this data structure. As with Map,
resizing is performed automatically, by growing the capacity by a factor of 1.5 when the array is full. Internally, this data structure is represented as follows:
typedef struct {
    size_t count;
    size_t capacity;
    size_t data_size;
    void *elements;
} vector_t;
where elements is the actual dynamic and generic array, the data_size
variable indicates the size (in bytes) of the data type, and count and
capacity represent the number of stored elements and the total
size of the structure, respectively. The dynamic array copies the values upon
insertion; it therefore owns the data and is responsible for their
allocation and their deletion.
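As a rough illustration of copy-on-insert combined with 1.5x growth, consider the sketch below. It is not Datum's implementation: demo_vec_t, demo_next_capacity and demo_push are hypothetical names, and the rule of growing by at least one slot (so that a capacity of 0 or 1 still makes progress) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in mirroring the fields of vector_t. */
typedef struct {
    size_t count;
    size_t capacity;
    size_t data_size;
    void *elements;
} demo_vec_t;

/* Grow by a factor of 1.5 (integer arithmetic), but always by at least one slot. */
size_t demo_next_capacity(size_t capacity) {
    size_t grown = capacity + capacity / 2;
    return grown > capacity ? grown : capacity + 1;
}

/* Append a copy of *value. Returns 0 on success, -1 on overflow or allocation failure. */
int demo_push(demo_vec_t *vec, const void *value) {
    assert(vec != NULL && value != NULL && vec->data_size > 0);
    if (vec->count == vec->capacity) {
        size_t new_cap = demo_next_capacity(vec->capacity);
        if (new_cap > SIZE_MAX / vec->data_size) {
            return -1; /* the byte count would overflow size_t */
        }
        void *grown = realloc(vec->elements, new_cap * vec->data_size);
        if (grown == NULL) {
            return -1; /* the old buffer is still valid */
        }
        vec->elements = grown;
        vec->capacity = new_cap;
    }
    /* Byte-wise copy into slot `count`: the vector now holds its own copy. */
    memcpy((unsigned char *)vec->elements + vec->count * vec->data_size,
           value, vec->data_size);
    vec->count++;
    return 0;
}
```

Because the bytes are copied, the caller can reuse or discard the source variable immediately after the push, which is what allows passing a compound literal such as &(int){6}.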
The Vector data structure supports the following methods:
- vector_result_t vector_new(size, data_size): create a new vector;
- vector_result_t vector_push(vector, value): add a new value to the vector;
- vector_result_t vector_set(vector, index, value): update the value of a given index if it exists;
- vector_result_t vector_get(vector, index): return the value indexed by index if it exists;
- vector_result_t vector_pop(vector): pop last element from the vector following the LIFO policy;
- vector_result_t vector_clear(vector): logically reset the vector. That is, new pushes will overwrite the memory;
- vector_result_t vector_destroy(vector): delete the vector;
- size_t vector_size(vector): return vector size (i.e., the number of elements);
- size_t vector_capacity(vector): return vector capacity (i.e., vector total size).
As you can see, most methods that operate on the Vector data type return a custom type called
vector_result_t, which is defined as follows:
typedef enum {
    VECTOR_OK = 0x0,
    VECTOR_ERR_ALLOCATE,
    VECTOR_ERR_OVERFLOW,
    VECTOR_ERR_UNDERFLOW,
    VECTOR_ERR_INVALID
} vector_status_t;

typedef struct {
    vector_status_t status;
    uint8_t message[RESULT_MSG_SIZE];
    union {
        vector_t *vector;
        void *element;
    } value;
} vector_result_t;
Each method that returns this type indicates whether the operation was successful by
setting the status field and by providing a descriptive message in the message field.
Just like for the Map data structure, if the operation was successful
(that is, status == VECTOR_OK), you can either move on with the rest of the program
or read the returned value from the value union.
Unit tests
Datum provides some unit tests for both the Vector and the Map data types. To run them, you can issue the following commands:
$ make clean all
$ ./test_vector
$ ./test_map
License
This library is released under the GPLv3 license. You can find a copy of the license with this repository or by visiting the following link.