Files
datum/docs/map.md
Marco Cetica bbae7c587e
Some checks failed
clang-build / clang-build (push) Has been cancelled
gcc-build / gcc-build (push) Has been cancelled
Fixed multiple bugs in hashmap implementation.
1. Fixed infinite loop issue that occurred when map was full of
   tombstones;
2. Fixed bug related to tombstone count decrement when adding a new
   key on a deleted slot;
3. Added proper map resize when keys are added or removed;
4. Fixed inconsistent error messages;
5. Added proper NULL check on map_remove method.
2025-11-25 12:11:26 +01:00

76 lines
3.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Map Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Map` data structure.
`Map` is an hash table that uses open addressing with linear probing for collision
resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/FowlerNollVo_hash_function) as its hashing function. Resizing is performed
automatically by doubling the capacity when the load factor exceeds 75%. Internally,
this data structure is represented by the following two structures:
```c
typedef struct {
char *key;
void *value;
element_state_t state;
} map_element_t;
typedef struct {
map_element_t *elements;
size_t capacity;
size_t size;
size_t tombstone_count;
} map_t;
```
where the `key` variable represent a string used to index the `value`. The `state`, instead, indicates whether the entry is empty, occupied or deleted and is primarily used
by the garbage collector for internal memory management. An array of `map_element_t`,
with the variables indicating the *capacity*, the *current size* and
the *tombstone count* (that is, the number of delete entries), form a `map_t` data type.
The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
responsible for managing their memory. Values, on the other hand,
**are stored as pointers**. This means that the hashmap **does NOT own them** and that
the caller is responsible for managing their memory; this includes: allocate
enough memory for them, ensure that the pointers remain valid for their whole lifecycle
on the map, delete old values when updating a key and, if the values were heap-allocated,
free them before removing the keys or destroying the map.
The `Map` data structure supports the following methods:
- `map_result_t map_new()`: initialize a new map;
- `map_result_t map_add(map, key, value)`: add a `(key, value)` pair to the map;
- `map_result_t map_get(map, key)`: retrieve a values indexed by `key` if it exists;
- `map_result_t map_remove(map, key)`: remove a key from the map if it exists;
- `map_result_t map_clear(map)`: reset the map state;
- `map_result_t map_destroy(map)`: delete the map;
- `size_t map_size(map)`: returns map size (i.e., the number of elements);
- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
As you can see by the previous function signatures, most methods that operate
on the `Map` data type return a custom type called `map_result_t` which is
defined as follows:
```c
typedef enum {
MAP_OK = 0x0,
MAP_ERR_ALLOCATE,
MAP_ERR_OVERFLOW,
MAP_ERR_INVALID,
MAP_ERR_NOT_FOUND
} map_status_t;
typedef struct {
map_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
map_t *map;
void *element;
} value;
} map_result_t;
```
Each method that returns such type indicates whether the operation was successful or not by setting
the `status` field and by providing a descriptive message on the `message` field. If the operation was
successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
the returned value from the sum data type. Of course, you can choose to ignore the return value (if you're brave enough :D) as illustrated
in the first part of the README.