Optimized extraction operation and improved CLI UI.
All checks were successful
backup.py / unit-tests (push) Successful in 2m11s
All checks were successful
backup.py / unit-tests (push) Successful in 2m11s
This commit is contained in:
69
README.md
69
README.md
@@ -24,23 +24,18 @@ Then, you can start the backup process with the following command:
|
||||
|
||||
```sh
|
||||
$ sudo ./backup.py --verbose --checksum --backup sources.ini $PWD "very_bad_pw"
|
||||
Copying photos (1/3)...DONE (0.02s)
|
||||
Copying photos (1/3)...DONE (0 seconds)
|
||||
Computing checksums...DONE (0.01s)
|
||||
computing [██████████████████████████████] 100.0% (5/5): 'Screenshot From 2026-01-22....png'
|
||||
|
||||
Copying documents (2/3)...DONE (3.39s)
|
||||
Computing checksums...DONE (1.26s)
|
||||
Copying documents (2/3)...DONE (3 seconds)
|
||||
Computing checksums...DONE (1 seconds)
|
||||
computing [██████████████████████████████] 100.0% (7881/7881): 'master'
|
||||
|
||||
Copying wireguard (3/3)...DONE (0.00s)
|
||||
Computing checksums...DONE (0.00s)
|
||||
Copying wireguard (3/3)...DONE (0 seconds)
|
||||
Computing checksums...DONE (0 seconds)
|
||||
computing [██████████████████████████████] 100.0% (1/1): 'wg0.conf'
|
||||
|
||||
Compressing backup...DONE (22.52s)
|
||||
Compressing backup...DONE (22 seconds)
|
||||
compressing [██████████████████████████████] 100.0% (8355/8354): 'rec2.jpg'
|
||||
|
||||
Encrypting backup...DONE (0.90s)
|
||||
|
||||
Encrypting backup...DONE (0 seconds)
|
||||
+---------------+------------------------------------------------------------------+
|
||||
| File name | '/home/marco/Projects/backup.py/backup-wood-20260129.tar.gz.enc' |
|
||||
+---------------+------------------------------------------------------------------+
|
||||
@@ -59,10 +54,10 @@ To extract an existing backup, you can instead issue the following command:
|
||||
|
||||
```sh
|
||||
$ ./backup.py --verbose --checksum --extract backup-wood-20260129.tar.gz.enc "very_bad_pw" backup-wood-20260129.sha256
|
||||
Decrypting backup...DONE (0.76s)
|
||||
Extracting backup...DONE (6.93s)
|
||||
Decrypting backup...DONE (0 seconds)
|
||||
Extracting backup...DONE (6 seconds)
|
||||
extracting [██████████████████████████████] 100.0% (8355/8355): 'rec2.jpg'
|
||||
Verifying backup...DONE (0.89s)
|
||||
Verifying backup...DONE (0 seconds)
|
||||
verifying [██████████████████████████████] 100.0% (7887/7887): 'master'
|
||||
|
||||
Backup extracted to: '/home/marco/Projects/backup.py/backup.py.tmp'
|
||||
@@ -119,13 +114,55 @@ it follows the procedure listed below:
|
||||
|
||||
1. **Copy phase**: uses Python `shutil.copytree()` to copy files while preserving metadata and
|
||||
symlinks (without following them) and by ignoring special files;
|
||||
2. **Compression**: creates a gzip-compressed tar archive using GNU tar;
|
||||
2. **Compression**: creates a gzip-compressed tar archive using GNU tar and GZIP;
|
||||
3. **Encryption**: encrypts the archive with GPG using AES-256 symmetric encryption;
|
||||
4. **Checksum** (optional): computes SHA256 hashes for each file in the backup archive.
|
||||
|
||||
The backup process creates temporary files in `backup.py.tmp` and `backup.py.tar.gz`, which are
|
||||
automatically cleaned up on completion or interruption (i.e., `C-c`).
|
||||
|
||||
The final backup file consists in two separate parts: an *header* and a *payload*. The former is used
|
||||
to determine whether a given backup archive is valid or not and to store various metadata values while the latter
|
||||
is used to store the actual encrypted content. In particular, the header (16 bytes) consists of the following layout:
|
||||
|
||||
| Offset | Size | Field |
|
||||
|--------|-----------------------|--------------|
|
||||
| 0 | 8 Bytes (`uint64`) | Magic number |
|
||||
| 8 | 8 Bytes (`uint64`) | file count |
|
||||
|
||||
The magic number, which is used to recognize whether a backup file is valid, is equal to `0x424B5F50595F4844` (that is, `BK_PY_HD`). The last
|
||||
8 bytes are instead used to store the element count of the backup file (which is then used on the extraction phase). In other words, this is the
|
||||
structure of a backup file:
|
||||
|
||||
```shell
|
||||
$ xxd -l 42 backup-foo-20260417.tar.gz.enc
|
||||
00000000: 424b 5f50 595f 4844 0000 0000 0000 000a BK_PY_HD........
|
||||
00000010: 2d2d 2d2d 2d42 4547 494e 2050 4750 204d -----BEGIN PGP M
|
||||
00000020: 4553 5341 4745 2d2d 2d2d ESSAGE----
|
||||
```
|
||||
|
||||
As you can see, the first 8 bytes (`424b 5f50 595f 4844`) represents the magic number, while the last 8 represent the number of elements inside
|
||||
the backup file (`0000 0000 0000 000a`), which in this example is equal to `0xA`. On extraction, we should see exactly this number of elements:
|
||||
|
||||
```shell
|
||||
$ ./backup.py -Vce backup-arxtop-20260417.tar.gz.enc "test" backup-arxtop-20260417.sha256
|
||||
Decrypting backup...DONE (0 seconds)
|
||||
Extracting backup...DONE (0 seconds)
|
||||
extracting [██████████████████████████████] 100.0% (10/10): 'ufetch.c'
|
||||
Verifying backup...DONE (0 seconds)
|
||||
verifying [██████████████████████████████] 100.0% (7/7): 'ufetch.c'
|
||||
Backup extracted to: '/home/marco/Projects/backup.py/backup.py.tmp'
|
||||
Elapsed time: 0 seconds
|
||||
```
|
||||
|
||||
You may also be wondering why the count of the two operations (*extracting* and *verifying*) differs. This is due to the fact that the first
|
||||
also includes the directories, while the latter only the actual files.
|
||||
|
||||
## Security concerns
|
||||
This tool is not intended to provide advanced security features such as plausible deniability. Many of the steps of the backup process invalidates
|
||||
this concept *by design*. Additionally, the structure of the final backup file contains some metadata that can be used to determine some information
|
||||
about the backup (namely, the number of elements).
|
||||
|
||||
## Old version
|
||||
This implementation of `backup.py` is a porting of an old backup script originally written in Bash
|
||||
that I developed back in 2018. While this new version should be compatible with old backup archives,
|
||||
|
||||
Reference in New Issue
Block a user