# Pkglists
Sources:
- https://git.thurstylark.com/vcsh/pkglists.git/
- https://git.thurstylark.com/vcsh/systemd-user.git/
- https://github.com/gs93/pakbak
These used to be actual lists created by [an admittedly terrible script](https://git.thurstylark.com/vcsh/pkglists.git/tree/.pkglists/pkglistbu.sh?id=62cc7e34c1354900cf7cc58480d9b4db2cd7309a) that was run nightly by a `systemd --user` timer, but now I back up `pacman`'s whole local database whenever there's a change, using pakbak and `systemd.path` units.
## Pakbak
Pakbak is pretty straightforward: you configure where you want the backups stored and the number of backups you want to keep, then enable `pakbak.path`. Whenever there is a change to `/var/lib/pacman/local` in the filesystem, `pakbak.path` triggers `pakbak.service`, which runs `/usr/lib/systemd/scripts/pakbak`. The pakbak script checks for pacman's lock file before continuing, then creates a tar archive of `/var/lib/pacman/local`.
If later access is needed, untar the archive, and point pacman at that dir (e.g.: to get the package lists from an unrecoverable system). The archive includes the dirs leading up to the actual database, so if simple recovery is the goal, just untar at `/`. For accessing the database in other situations, it might be prudent to add `--strip-components=3` in order to get just the `local` subdir.
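As a rough sketch of the second case, the extraction could be wrapped in a small shell function. `extract_local_db` is a hypothetical helper name and the paths in the usage comment are placeholders; the only assumption about the archive is that its members start with `var/lib/pacman/local`, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack only the `local` database dir from a pakbak archive.
extract_local_db() {
    local archive=$1 dest=$2
    mkdir -p "$dest"
    # Strip the leading var/lib/pacman components so that $dest/local remains.
    tar -xf "$archive" -C "$dest" --strip-components=3
}

# Usage (placeholder paths):
#   extract_local_db /path/to/backup.tar /tmp/pacdb
#   pacman --dbpath /tmp/pacdb -Qqe   # list explicitly installed packages
```

`pacman --dbpath` expects the directory that contains `local`, which is why the function leaves `local` as a subdirectory of the destination.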
### Config
```prettyprint
# Backup the database to this folder
target_folder=/home/thurstylark/.pkglists/$(hostname)/
# Define how long backups should be kept
# Can be a number of days or empty to disable
keep_days=
# Define how many backups should be kept
# If more backup are found, the oldest are deleted
# Can be a number of file or empty to disable
keep_number=1
```
This will keep only one copy of the database around, and delete all the others. Since they are being committed to a git repo, there's no need to keep several copies to have access to the history.
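To illustrate what a `keep_number`-style rule amounts to, here is a minimal sketch of "keep the newest N files, delete the rest". This is not pakbak's actual code: `prune_backups` is a hypothetical function, and it assumes GNU `find` and file names without embedded newlines.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a keep_number-style retention rule.
# prune_backups DIR KEEP: delete all but the KEEP newest regular files in DIR.
prune_backups() {
    local dir=$1 keep=$2
    # An empty KEEP disables pruning, mirroring the config comment.
    [ -n "$keep" ] || return 0
    # Print "mtime<TAB>path", sort newest first, skip the first KEEP entries,
    # then remove whatever is left.
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\n' \
        | sort -rn \
        | tail -n +"$((keep + 1))" \
        | cut -f2- \
        | while IFS= read -r file; do rm -- "$file"; done
}

# Usage (placeholder path):
#   prune_backups "$HOME/.pkglists/$(hostname)" 1
```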
## Backing up
Once pakbak has created the archive, it is added, committed, and pushed to a vcsh git repo by a couple of systemd --user services. This is triggered by any change to the directory that pakbak writes its output to for that host. Activation of this process is handled by `pkglists-commit.path`:
```prettyprint
[Unit]
Description=Path activation for pkglists-commit.service
[Path]
PathChanged=%h/.pkglists/%H
MakeDirectory=true
[Install]
WantedBy=default.target
```
This unit watches `$HOME/.pkglists/$HOSTNAME` for changes, and on any activity, activates `pkglists-commit.service`:
```prettyprint
[Unit]
Description=Add, commit, and push pacman db backups
[Service]
Type=oneshot
RemainAfterExit=no
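# `wait` only works on the shell's own child processes, so poll until pakbak exits
ExecStartPre=/usr/bin/bash -c 'while pgrep pakbak >/dev/null; do sleep 1; done'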
ExecStartPre=/usr/bin/vcsh pkglists pull
ExecStartPre=/usr/bin/vcsh pkglists add -A %h/.pkglists/%H/*
ExecStartPre=/usr/bin/vcsh pkglists commit -m "Auto-committing %H pacman db"
ExecStart=/usr/bin/vcsh pkglists push
```
This service waits for pakbak to complete, pulls the pkglists repo, adds any changes to `$HOME/.pkglists/$HOSTNAME`, commits to the repo, then pushes. Thanks to `git-add`'s `-A` option, the add step also includes removal, so that git will remove the old database from the branch, which avoids the possibility of having multiple databases exist when pulling from or cloning the repo.
This is where the main magic happens that allows automatic distributed backups: every host that has this set up also houses a copy of the backups. The origin branch is _technically_ the canonical master, but since this is git, the backups can easily be recovered from any machine. Also, since the trigger is literally any change to pacman's local database (read: installation or removal of a package), the chances of all the clients having the latest revision of the repo become much higher.
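For experimenting outside of systemd, the add and commit steps of `pkglists-commit.service` can be sketched as a shell function. `commit_backup` is a hypothetical helper, not part of the repos linked above. It takes the VCS command prefix as trailing arguments (for example `vcsh pkglists`, or `git -C <repo>` for a plain git checkout), and it adds one thing the unit does not do: it skips the commit when nothing is staged, since `git commit` exits non-zero in that case.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the add/commit steps of the service.
# commit_backup DIR HOST CMD...: stage DIR and commit it using the CMD prefix.
commit_backup() {
    local dir=$1 host=$2
    shift 2
    # -A also stages deletions, so a replaced archive drops out of the branch.
    "$@" add -A -- "$dir" || return
    # `diff --cached --quiet` exits 0 when the index matches HEAD.
    if "$@" diff --cached --quiet; then
        echo "nothing to commit"
        return 0
    fi
    "$@" commit -q -m "Auto-committing $host pacman db"
}

# Usage (assumes a vcsh repo named pkglists, as in the unit above):
#   commit_backup "$HOME/.pkglists/$(hostname)" "$(hostname)" vcsh pkglists
```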
## Caveats
- I'm not a big fan of using my own `systemd.path` unit since one is already provided by pakbak. However, `--user` units can't depend on `--system` units, and since A) this process deals with files in a home folder, and B) there isn't any foreseeable situation where a `systemd --user` instance _won't_ be running when the database gets updated, I opted for `systemd --user` units to manage this part of the solution. This could easily be fixed by installing a copy of pakbak's units to `/usr/lib/systemd/user/` in the `PKGBUILD` and running it from there.
- All of this could probably be replaced with my own `systemd --user` units or a pacman hook, but for the moment, I'm more concerned with getting it working.