Refactored string_set_at to be immutable and added String documentation

2026-01-12 11:58:32 +01:00
parent 0f8378bf75
commit ead8a6e04e
11 changed files with 221 additions and 55 deletions

@@ -7,4 +7,5 @@ At the time being, this documentation includes the following pages:
- [vector.md](vector.md): vector documentation;
- [map.md](map.md): map documentation;
- [bigint.md](bigint.md): bigint documentation.
- [bigint.md](bigint.md): bigint documentation;
- [string.md](string.md): string documentation.

@@ -46,7 +46,7 @@ The `BigInt` data structure supports the following methods:
- `bigint_result_t bigint_destroy(number)`: delete the big number;
- `bigint_result_t bigint_printf(format, ...)`: `printf` wrapper that introduces the `%B` placeholder to print big numbers. It supports variadic parameters.
As you can see by the previous function signatures, methods that operate on the
As you can see from the previous function signatures, methods that operate on the
`BigInt` data type return a custom type called `bigint_result_t` which is defined as
follows:
@@ -80,7 +80,7 @@ by setting the `status` field and by providing a descriptive message on the `mes
field. If the operation was successful (that is, `status == BIGINT_OK`), you can either
move on with the rest of the program or read the returned value from the sum data type.
Of course, you can choose to ignore the return value (if you're brave enough :D) as
illustrated in the first part of the README.
illustrated on the first part of the README.
The sum data type (i.e., the `value` union) defines four different variables. Each
of them has a unique scope as described below:

@@ -5,7 +5,7 @@ aspects (internal design, memory layout, etc.) of the `Map` data structure.
`Map` is a hash table that uses open addressing with linear probing for collision
resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function) as its hashing function. Resizing is performed
automatically by doubling the capacity when the load factor exceeds 75%. Internally,
this data structure is represented by the following two structures:
this data structure is represented by the following two layouts:
```c
typedef struct {
@@ -46,7 +46,7 @@ The `Map` data structure supports the following methods:
- `size_t map_size(map)`: returns map size (i.e., the number of elements);
- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
As you can see by the previous function signatures, most methods that operate
As you can see from the previous function signatures, most methods that operate
on the `Map` data type return a custom type called `map_result_t` which is
defined as follows:
@@ -73,4 +73,4 @@ Each method that returns such type indicates whether the operation was successfu
the `status` field and by providing a descriptive message on the `message` field. If the operation was
successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
the returned value from the sum data type. Of course, you can choose to ignore the return value (if you're brave enough :D) as illustrated
in the first part of the README.
on the first part of the README.
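The hashing and probing scheme described at the beginning of the `Map` documentation can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes NUL-terminated string keys, the 64-bit variant of FNV-1a, slot index `(hash + i) % capacity` for the i-th probe, and an integer-only check of the 75% load-factor threshold. The function names `fnv1a_64`, `probe_slot` and `exceeds_load_factor` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
 * Assumption: keys are NUL-terminated strings. */
uint64_t fnv1a_64(const char *key) {
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash ^= *p;
        hash *= 0x100000001b3ULL;          /* FNV 64-bit prime */
    }
    return hash;
}

/* Linear probing: the i-th probe looks at the slot right after the previous
 * one, wrapping around at the end of the table. Requires capacity > 0. */
size_t probe_slot(uint64_t hash, size_t i, size_t capacity) {
    return (size_t)((hash + i) % capacity);
}

/* True when count / capacity > 0.75, computed with integers only. */
bool exceeds_load_factor(size_t count, size_t capacity) {
    return count * 4 > capacity * 3;
}
```

During insertion, a map built this way would try `probe_slot(h, 0, cap)`, `probe_slot(h, 1, cap)` and so on until it finds a free slot, and would double the capacity and reinsert every entry once `exceeds_load_factor` reports true.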

docs/string.md

@@ -0,0 +1,96 @@
# String Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `String` data structure.
`String` is an immutable string data type with partial UTF-8 support.
This means that methods that transform a string return a new string instance rather than modifying the original in place.
Internally, this data structure is represented by the following layout:
```c
typedef struct {
char *data;
size_t byte_size;
size_t byte_capacity;
size_t char_count;
} string_t;
```
where the `data` variable points to the string contents (a pointer to `char`),
the `byte_size` variable indicates the size (in bytes) of the string, the
`byte_capacity` variable represents the total amount of allocated memory (in bytes) and the
`char_count` variable represents the number of logical characters, that is, the number of
symbols (Unicode code points).
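The relation between `byte_size` and `char_count` can be illustrated with a minimal sketch (not the library's actual code) that counts code points by skipping UTF-8 continuation bytes, i.e. bytes of the form `10xxxxxx`. The helper name `utf8_count` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the Unicode code points in a NUL-terminated,
 * valid UTF-8 string. Each code point starts with exactly one byte that is
 * not a continuation byte (continuation bytes match 10xxxxxx). */
size_t utf8_count(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") occupies 6 bytes according to `strlen` but contains 5 code points, while the emoji U+1F600 (`"\xF0\x9F\x98\x80"`) takes 4 bytes and counts as a single code point.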
As mentioned earlier, this library provides partial UTF-8 support. It recognizes
UTF-8 byte sequences as individual Unicode code points, which allows it to correctly distinguish
between byte length and character count. It can store and index any Unicode code point, emojis
included, while remaining backward compatible with ASCII strings. Since counting happens at the
code point level, an emoji made of several code points (for instance, a flag) counts as several symbols.
However, this data structure does not support localization. In particular, it does not perform
locale-aware conversion; for instance, uppercase/lowercase transformations are limited to ASCII
characters. As a result, the German scharfes S (`ß`) is not converted to `SS` and the Spanish
`Ñ` is not converted to `ñ`. Likewise, the Italian `é` (and its variants) is recognized as a single
symbol only when it is encoded as one precomposed code point (U+00E9); in its decomposed form it is
treated as a base letter combined with an accent.
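The ASCII-only behaviour described above can be pictured with a short sketch. It is hypothetical (the name `ascii_to_lower` is not part of the library) and only mirrors the documented rules: a new string is returned, only the bytes `'A'` to `'Z'` are changed, and every byte of a multi-byte UTF-8 sequence is copied unchanged, which is why letters such as `Ñ` or `ß` pass through as they are.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of ASCII-only lowercasing that returns a new string
 * (the input is never modified). Only bytes in 'A'..'Z' are changed; bytes
 * >= 0x80, i.e. all bytes of multi-byte UTF-8 sequences, are copied as-is.
 * Returns NULL if allocation fails. */
char *ascii_to_lower(const char *s) {
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i <= len; i++) { /* i == len copies the '\0' */
        char c = s[i];
        out[i] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
    }
    return out;
}
```

Applied to `"ÑANDU"`, this yields `"Ñandu"`: the ASCII letters are lowered, while the two bytes that encode `Ñ` are left untouched.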
At the moment, `String` supports the following methods:
- `string_result_t string_new(c_str)`: create a new string;
- `string_result_t string_clone(str)`: clone an existing string;
- `string_result_t string_concat(x, y)`: concatenate two strings together;
- `string_result_t string_contains(haystack, needle)`: check whether the `haystack` string contains `needle` (the result is reported as an index);
- `string_result_t string_slice(str, start, end)`: return a slice (a new string) from `str` between `start` and `end` indices (inclusive);
- `string_result_t string_eq(x, y, case_sensitive)`: check whether `x` and `y` are equal;
- `string_result_t string_get_at(str, position)`: get the UTF-8 symbol indexed by `position` from `str`;
- `string_result_t string_set_at(str, position, utf8_char)`: return a new string in which the symbol at index `position` of `str` is replaced by the UTF-8 symbol `utf8_char`;
- `string_result_t string_to_lower(str)`: convert a string to lowercase;
- `string_result_t string_to_upper(str)`: convert a string to uppercase;
- `string_result_t string_reverse(str)`: reverse a string;
- `string_result_t string_trim(str)`: remove leading and trailing white space from a string;
- `string_result_t string_split(str, delim)`: split a string into an array of `string_t` using `delim` as the separator;
- `string_result_t string_destroy(str)`: remove a string from memory;
- `string_result_t string_split_destroy(split, count)`: remove an array of strings from memory;
- `size_t string_size(str)`: return string character count.
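Since `String` is immutable, a method such as `string_set_at` has to build a new string instead of writing into `str`. The sketch below shows one way to do that on a plain NUL-terminated buffer: walk the code points to find the byte offset of index `pos`, then copy the prefix, the new symbol and the tail into a fresh allocation. It is not the library's implementation: the helper names are hypothetical, it does not use `string_t` or `string_result_t`, and it assumes valid UTF-8 input.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 sequence whose lead byte
 * is `b` (assumes valid UTF-8). */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;           /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    return 4;                         /* 11110xxx */
}

/* Hypothetical immutable "set at": returns a newly allocated copy of `s`
 * in which the code point at index `pos` is replaced by the UTF-8 symbol
 * `sym`. `s` is never modified. Returns NULL if `pos` is out of range or
 * allocation fails. */
char *utf8_replace_at(const char *s, size_t pos, const char *sym) {
    size_t off = 0;
    /* Advance one code point at a time until index `pos` is reached. */
    for (size_t i = 0; i < pos; i++) {
        if (s[off] == '\0') return NULL;
        off += utf8_seq_len((unsigned char)s[off]);
    }
    if (s[off] == '\0') return NULL; /* pos equals the code point count */

    size_t old_len = utf8_seq_len((unsigned char)s[off]);
    size_t sym_len = strlen(sym);
    size_t total = strlen(s);
    size_t tail_len = total - off - old_len;
    char *out = malloc(total - old_len + sym_len + 1);
    if (out == NULL) return NULL;

    memcpy(out, s, off);                                          /* prefix */
    memcpy(out + off, sym, sym_len);                              /* new symbol */
    memcpy(out + off + sym_len, s + off + old_len, tail_len + 1); /* tail + '\0' */
    return out;
}
```

Replacing index 1 of `"héllo"` with `"a"` returns a new buffer holding `"hallo"`, while the original `"héllo"` is left intact; the caller owns the result and must `free` it.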
As you can see from the previous function signatures, most methods that operate on the `String`
data type return a custom type called `string_result_t` which is defined as follows:
```c
typedef enum {
STRING_OK = 0x0,
STRING_ERR_ALLOCATE,
STRING_ERR_INVALID,
STRING_ERR_INVALID_UTF8,
STRING_ERR_OVERFLOW
} string_status_t;
typedef struct {
string_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
string_t *string; // For new, clone, concat, slice, set_at, to_lower, to_upper, reverse, trim
char *symbol; // For get_at
int64_t idx; // For contains
bool is_equ; // For comparison
struct { // For split
string_t **strings;
size_t count;
} split;
} value;
} string_result_t;
```
Each method that returns such a type indicates whether the operation was successful or not
by setting the `status` field and by providing a descriptive message in the `message`
field. If the operation was successful (that is, `status == STRING_OK`), you can either
move on with the rest of your program or read the returned value from the sum data type.
Of course, you can choose to ignore the return value (if you're brave enough :D) as illustrated
in the first part of the README.
The sum data type (i.e., the `value` union) defines five different members.
Each of them has a specific scope, as described below:
- `string`: result of the `new`, `clone`, `concat`, `slice`, `set_at`, `to_lower`, `to_upper`, `reverse` and `trim` functions;
- `symbol`: result of the `get_at` function;
- `idx`: result of the `contains` function;
- `is_equ`: result of the `eq` function. It is true when the two strings are equal, false otherwise;
- `split`: result of the `split` function. It contains an array of `string_t` pointers and its number of elements.
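The "check `status` first, then read the union member" pattern can be shown with a self-contained sketch. To keep it compilable without the library, it uses simplified, hypothetical stand-ins (`demo_status_t`, `demo_result_t`, `demo_eq`) that keep only what the example needs; the real `string_result_t` has more members and stores the message as `uint8_t`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for string_status_t and
 * string_result_t: only the members used by this example are kept. */
typedef enum { DEMO_OK = 0x0, DEMO_ERR_INVALID } demo_status_t;

typedef struct {
    demo_status_t status;
    char message[64];
    union {
        bool is_equ; /* mirrors the `is_equ` member */
    } value;
} demo_result_t;

/* Hypothetical case-sensitive equality check that reports its outcome
 * through the result type instead of a bare return value. */
demo_result_t demo_eq(const char *x, const char *y) {
    demo_result_t res = {0};
    if (x == NULL || y == NULL) {
        res.status = DEMO_ERR_INVALID;
        snprintf(res.message, sizeof res.message, "invalid argument");
        return res; /* value is left unset on error */
    }
    res.status = DEMO_OK;
    snprintf(res.message, sizeof res.message, "comparison completed");
    res.value.is_equ = (strcmp(x, y) == 0);
    return res;
}
```

A caller reads `value.is_equ` only after confirming `status == DEMO_OK`; on failure, `message` explains what went wrong, e.g. `demo_result_t r = demo_eq("abc", "abc"); if (r.status != DEMO_OK) { puts(r.message); }`.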

@@ -5,7 +5,7 @@ aspects (internal design, memory layout, etc.) of the `Vector` data structure.
`Vector` is a dynamic array with generic data type support; this means that you can store
homogeneous values of any type in this data structure. Resizing is performed automatically
by increasing the capacity by a factor of 1.5 when the array becomes full. Internally, this
data structure is represented by the following structure:
data structure is represented by the following layout:
```c
typedef struct {
@@ -39,7 +39,7 @@ At the time being, `Vector` supports the following methods:
- `size_t vector_size(vector)`: return vector size (i.e., the number of elements);
- `size_t vector_capacity(vector)`: return vector capacity (i.e., vector total size).
As you can see by the previous function signatures, most methods that operate
As you can see from the previous function signatures, most methods that operate
on the `Vector` data type return a custom type called `vector_result_t` which is
defined as follows:
@@ -66,7 +66,7 @@ Each method that returns such type indicates whether the operation was successfu
by setting the `status` field and by providing a descriptive message on the `message`
field. If the operation was successful (that is, `status == VECTOR_OK`), you can either
move on with the rest of the program or read the returned value from the sum data type. Of course, you can choose to
ignore the return value (if you're brave enough :D) as illustrated in the first part of the README.
ignore the return value (if you're brave enough :D) as illustrated on the first part of the README.
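The 1.5x growth policy mentioned earlier for `Vector` can be sketched as a single helper. This is a hypothetical illustration, not the library's code: it assumes integer arithmetic (`cap + cap / 2`) and forces growth of at least one slot for capacities 0 and 1, where `cap / 2` would be 0; overflow for very large capacities is not handled.

```c
#include <stddef.h>

/* Hypothetical growth rule: new capacity = old capacity * 1.5 using integer
 * arithmetic (cap + cap / 2). For cap 0 and 1, cap / 2 is 0, so at least
 * one extra slot is added to guarantee progress. */
size_t next_capacity(size_t cap) {
    size_t grown = cap + cap / 2;
    return grown > cap ? grown : cap + 1;
}
```

Starting from a capacity of 4, repeated growth gives 6, 9, 13, 19, and so on, so the number of reallocations grows only logarithmically with the number of pushed elements.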
## Functional methods
`Vector` provides three functional methods called `map`, `filter` and `reduce` which allow the caller to apply a computation to the vector,