Unlike flatpak, the apk packages + appstream approach does not make it possible to
conveniently roll back a single app after an upgrade went wrong, because updating
an app may have updated the libraries it depends on at the same time, and other
programs may by then depend on the new library versions.
But I argue that this is not something we should optimize for. I propose a
rollback of the whole (package manager maintained) system with btrfs and
snapshots. Snapper seems to be the appropriate tool, and it is packaged in
Alpine. I have not tested it personally.
I imagine that we would allow bringing up a rollback menu during boot with some key combination. If activated, it allows booting into a previous btrfs snapshot.
So this is just the bare bones idea, but what do you think about it, before we get into all the details and edge cases?
EDIT: How about we make this all optional, in case postmarketOS is running with a downstream kernel where btrfs isn't stable, or in case users just don't need this feature for their use case?
One downside I see is the increased disk usage. For example, it would keep older versions of the Firefox profile (that includes copies of every Firefox sqlite db) and older versions of every package you installed in the past. Phones don't have unlimited disk space!
You also suddenly can't delete something easily or reliably: another copy may be kept in btrfs snapshots. You don't get free space again when removing something! Worse: should you want to really delete something, you suddenly have to remember to remove it in other subvolumes. You can't shred a file anymore!
I'm not sure I want to handle cleaning up btrfs subvolumes manually.
Regarding the older version of the packages, we could simply add a symlink from /etc/apk/cache (IIRC) to /var/cache/apk or wherever we want. But we are back to the same problem: it requires cleaning up manually. It creates a new chore!
A better solution would be to store the older versions of the packages. I think upstream did try to do that for some time, but apparently gave up because of the disk size requirements.
I would exclude the /home partition from the snapshot, possibly keep using ext4 for that.
Agreed with the space issue. The motivation was, to provide a fallback in case something severely goes wrong during the update and you can't boot anymore. So maybe we could simply remove the last snapshot after being booted into the new system for 10 minutes?
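A rough sketch of that "delete after 10 minutes" idea, assuming snapper manages the snapshots and that the number of the pre-upgrade snapshot was written to a state file beforehand. The function names and the state file are hypothetical; only `snapper delete` is an existing snapper subcommand.

```sh
#!/bin/sh
# Sketch: drop the fallback snapshot once the new system has stayed up long enough.
# ASSUMPTIONS: snapper manages the snapshots; the state file and the
# function names are hypothetical.

MIN_UPTIME=600  # 10 minutes, in seconds

# Succeeds if the uptime in the given file (default: /proc/uptime) is at
# least $1 seconds. /proc/uptime looks like "1234.56 7890.12".
uptime_at_least() {
    threshold=$1
    uptime_file=${2:-/proc/uptime}
    read -r up _ < "$uptime_file" || return 1
    [ "${up%%.*}" -ge "$threshold" ]
}

# Delete the fallback snapshot whose number is stored in file $1,
# but only once the uptime threshold has been reached.
cleanup_fallback() {
    state_file=$1
    [ -r "$state_file" ] || return 0            # nothing to clean up
    uptime_at_least "$MIN_UPTIME" || return 0   # too early, try again later
    snapper delete "$(cat "$state_file")" && rm -f "$state_file"
}
```

Something like `cleanup_fallback` would have to be called from a timer or service after boot. Note that 10 minutes of uptime only shows that the system booted, not that every app still works.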
How exactly? It seems incompatible with the package manager approach to me.
There is even an Alpine Linux on ostree, but as it says in the description, it is about "atomic upgrades". So it seems to replace apk with ostree.
What I would like to have is an approach that works together with apk, it would let apk just do its job of updating everything and roll the whole package-manager based system back in case of emergency so users don't need to reflash.
However, I've changed the title of this issue to remove btrfs and snapper. The underlying technology to implement this feature is not so important to me, as long as it is reliable, maintainable and takes a reasonable amount of effort.
AFAIK journaling (ext4) and copy on write (btrfs) are already protecting against power loss.
In any case, I think it is important not only to protect against power loss. I really want to avoid the situation that something goes wrong after upgrading and then people can't boot their phones anymore and must reflash. This is not acceptable for end-users.
In case it's useful, I want to note that OpenSUSE has been using btrfs for this purpose for quite some time, so perhaps some of its ideas and tools could be useful.
By default the installer suggests / on btrfs and /home on xfs.
The update program zypper has a plugin that creates pre- and post- snapshots of / under /.snapshots named with incrementing numbers. A snapshot basically has a small file containing some metadata (date it was created, whether it's a pre- or post- snapshot, text comment) and then the actual btrfs subvolume.
grub is made aware of all snapshots so it presents them in the boot menu as read-only snapshots to boot from.
snapper is used to manually list the snapshots, delete unneeded ones, etc. It's equivalent to doing it directly with btrfs subvol plus metadata from the metadata file. It also has a snapper rollback # subcommand that copies the snapshot # into a new writable snapshot and marks it as the default for the next boot.
There are systemd timer jobs that can be used to invoke snapper to take periodic snapshots based on time (daily, weekly, etc) if the user wants, and to clean up / retain old snapshots if the user wants.
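To make the pre-/post-snapshot flow described above concrete, here is a minimal sketch of a wrapper that does by hand what the zypper plugin does automatically. It assumes a snapper config for / already exists; `with_snapshots` is a made-up name, and the `SNAPPER` variable is only there so the wrapper can be tried without a real snapper setup.

```sh
#!/bin/sh
# Sketch: wrap a command in a snapper pre/post snapshot pair.
# ASSUMPTION: a snapper config for / already exists. SNAPPER is overridable
# only so the wrapper can be exercised without touching a real filesystem.
SNAPPER=${SNAPPER:-snapper}

with_snapshots() {
    desc=$1; shift
    # --print-number makes snapper print the number of the new snapshot
    pre=$($SNAPPER create --type pre --print-number \
        --cleanup-algorithm number --description "$desc") || return 1
    "$@"
    status=$?
    # Pair the post snapshot with the pre snapshot taken above
    $SNAPPER create --type post --pre-number "$pre" \
        --cleanup-algorithm number --description "$desc" || return 1
    return "$status"
}
```

Usage would look like `with_snapshots "system upgrade" apk upgrade`, followed by `snapper list` to inspect the pair and `snapper rollback <number>` if the upgrade went wrong.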
So I am trying to implement this on my phone using p-boot as a replacement for grub and came upon a stumbling block: the current initramfs does not support passing in mount options, which would be needed for selecting the btrfs boot/root subvolumes.
The p-boot demo image completely sidesteps this issue by using megi's custom kernel without an initramfs, booting straight to the init process on the root fs. I reckon it has at least btrfs baked in. Unfortunately this approach does not work with the latest pmOS kernel out of the box and is not really compatible with FDE.
So I would rather add mount options to mkinitfs' init_functions.sh script, which allow selecting a btrfs subvolume/snapshot.
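A minimal sketch of how the initramfs could pick up such mount options from the kernel command line. The parameter name `rootflags` is borrowed from the kernel's own convention; whether mkinitfs would use that name is an assumption, and `cmdline_value`, `$root_dev` and `/sysroot` are hypothetical names rather than anything that exists in init_functions.sh.

```sh
#!/bin/sh
# Sketch: read root mount options from the kernel command line.
# ASSUMPTION: the parameter is called "rootflags", following the kernel's
# convention; the actual name used by mkinitfs is undecided.

# Print the value of key $1 from a command line string ($2, default:
# contents of /proc/cmdline). Fails and prints nothing if the key is absent.
cmdline_value() {
    key=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case "$word" in
            "$key"=*) printf '%s\n' "${word#"$key"=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical use inside the initramfs:
#   opts=$(cmdline_value rootflags) && mount -o "$opts" "$root_dev" /sysroot
```

With `rootflags=subvol=@/.snapshots/1/snapshot` on the command line, the boot menu would only need to change that one parameter to select a different snapshot.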
A strong motivation for me is to be able to experiment on my daily driver phone with some reassurance if things go sideways.
Another motivation is to multi-boot with a stable release pmos channel without having to reflash.
First order of business: add bootfs/rootfs btrfs support to initramfs
Next up: add mount options
Extras: add btrfs options to pmbootstrap, add some scripts to handle automated snapshots and retention/cleanup, package p-boot for pinephone
There is some complicated logic around setup, fde and resizing the root partition. Adding to that without breaking anything would be hard without proper testing.
Heck, doing anything to the boot process is a potential disaster.
Are automated tests for the boot process already a thing with pmos? Or is there a test plan for manual tests?
Also if I add a PR, what needs to happen for it to be accepted?
This is not going to fly. Packaging for it might be accepted, but a solution to this should be as device-independent as possible. We have already decided to not use p-boot on the PinePhone. See #977 (closed)
> There is some complicated logic around setup, fde and resizing the root partition. Adding to that without breaking anything would be hard without proper testing. Heck, doing anything to the boot process is a potential disaster.
Yes, but that's why we have edge for testing changes before they go into stable releases, so don't worry about that.
> Are automated tests for the boot process already a thing with pmos? Or is there a test plan for manual tests?
We are just doing manual testing still. I think there are some ideas floating around for automated testing, but nothing that's actually deployed. That said, opening an MR (merge request) for people to try out is a great way to get people to test it!
> Also if I add a PR, what needs to happen for it to be accepted?
If it looks good as-is, nothing. That said, usually people will have comments and discuss your changes and might request changes, in which case you can push your new changes to the MR and they get added there (no need to open a new MR just for that). But apart from that, nothing really. We don't use contributor licence agreements or anything like that.
> This is not going to fly. Packaging for it might be accepted, but a solution to this should be as device-independent as possible. We have already decided to not use p-boot on the PinePhone. See #977 (closed)
Good point.
But to support booting different snapshots we will need some sort of boot menu.
The options I see so far are:
extending p-boot to work on other platforms (lots of work and lots of testing I guess)
chain-loading from u-boot into something like grub2 (slower than it already is and needs work on grub2 to support selection via hardware keys and even just to show something graphical..)
hacking the u-boot bootmenu to work with hardware keys or even touch (I guess this requires some work which would be hard to upstream and the result would be rather ugly)
make the graphical choice a feature of the initrd (makes sense since it already can do complicated things to enable fde, would also be the easiest path to enable this cross-platform)
I would tend to go with the last option. Arguably an upgrade which breaks the initrd might break the roll-back mechanism, but the same would be true for upgrades to any of the other options. (the initrd has more moving parts than something specialised would have though)
I had F2FS eat two disks on two different systems using fde, presumably when power was lost suddenly on them. "Ate" as in, it corrupted the luks header and there was no option for recovering it. Yes, I know F2FS and btrfs are not at all the same filesystem, and yes I know that suddenly losing power is not an ideal scenario for any filesystem, but my point is that these new filesystems are likely unproven in many situations that devices may face in the real world.
I'm not going to invoke btrfs's long history of eating data (I assume folks are familiar with that), but I think we should really test the stability of this filesystem with things like fde before making any decision to design major pmOS components/features around it.
That testing should include no fewer than 3 people running pmOS on btrfs w/ fde, 3 different devices full-time, for... I don't know.. 6 months? (1 full stable release cycle)
My experience with btrfs only concerns laptops and servers (so grain of salt), all using LUKS FDE.
My daily laptop has been on btrfs since early 2021, specifically due to multiple lost ext4 filesystems caused by sudden poweroffs. (I blame a buggy motherboard)
In the first 1.5 years, btrfs had some serious space constraints. I got ENOSPC errors (out of space) when the filesystem was 75-80% full, leaving me with a read-only system until I chrooted into it and removed files while offline.
It could also get rather slow back then, but the situation's gotten better (especially for ssd).
Last night, I tried installing a game that turned out to be rather large. Larger than the available free space. The installation process gracefully exited when the FS had 256 KiB of free space left. And importantly, I could still use the system. No more ENOSPC.
I volunteer for testing btrfs w/FDE on my Oneplus 6 running Edge with PlaMo
Your ENOSPC was probably because you were not running periodic balance. E.g. OpenSUSE by default has a cronjob (systemd service + timer to be precise) to do that once a month.
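For reference, a minimal sketch of what such a periodic balance job might run. The `usage` filter restricts the balance to block groups that are less than the given percentage full, which keeps the run short. The 50% default and the function name are arbitrary choices for illustration, not what OpenSUSE ships, and `BTRFS` is overridable only so the function can be tried without a btrfs filesystem.

```sh
#!/bin/sh
# Sketch: partial balance that only rewrites mostly-empty block groups.
# ASSUMPTIONS: the 50% default is an arbitrary example value; BTRFS is
# overridable only for dry runs.
BTRFS=${BTRFS:-btrfs}

balance_partial() {
    mnt=${1:-/}
    usage=${2:-50}
    # -dusage / -musage: data / metadata block groups below $usage percent full
    $BTRFS balance start -dusage="$usage" -musage="$usage" "$mnt"
}
```

On a system without systemd timers this could be called from a monthly cron script, e.g. `balance_partial / 50`.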
True, I didn't run periodic balances. At the time there was a belief that balancing a single-disk filesystem was unnecessary. I did run weekly scrubs though.
On the most recent version of my system (I run Fedora Atomic KDE) I can't find systemd timers for either scrub or balance, so I suppose it's not as necessary anymore in any case.
btrfs subvol list -a / is correct, and there is a subvol by default, but you'll have to create additional subvols for it to show that default subvol.
But also you should create a separate root subvol anyway for the reason in this link as OpenSUSE does. And then you'll also need to create a volume to store snapshots. So putting it all together:
```sh
# Create directories for installing base system
mkdir -p /mnt/full/ /mnt/root/

# Create device for installing base system
truncate --size=1G fs
losetup --find --show fs # /dev/loop0

# Prepare base system fs
mkfs.btrfs /dev/loop0
mount /dev/loop0 /mnt/full

# Create root subvol
btrfs subvol create /mnt/full/@

# Create snapshots subvol
btrfs subvol create /mnt/full/@/.snapshots

# Mount root subvol
mount /dev/loop0 /mnt/root -o subvol=/@

# Install base system
mkdir -p /mnt/root/usr/ /mnt/root/etc/ # ...

# Unmount root subvol
umount /mnt/root

# Create snapshot of base system
mkdir -p /mnt/full/@/.snapshots/1/
btrfs subvol snapshot /mnt/full/@ /mnt/full/@/.snapshots/1/snapshot

# Remount root subvol from snapshot
mount /dev/loop0 /mnt/root -o subvol=/@/.snapshots/1/snapshot
mount /dev/loop0 /mnt/root/.snapshots -o subvol=/@/.snapshots

# Use it as usual
btrfs subvol list -a /mnt/root/
# ID 256 gen 8 top level 5 path <FS_TREE>/@
# ID 257 gen 8 top level 256 path <FS_TREE>/@/.snapshots
# ID 258 gen 8 top level 257 path <FS_TREE>/@/.snapshots/1/snapshot
```
I see, thanks both! The subvol thing halfway revealed itself once I made a snapper snapshot.
@Arnavion Are we supposed to run this on the device?
Some of the commands rely on options that are not available in our stripped-down busybox utils, only in their full GNU counterparts.
btw, I've got a working port of snap-pac in pmOS (no repo yet), but it takes two pre- and two post- snapshots, rather than one of each.
It seems apk executes every script in /etc/apk/commit_hooks.d/ both before and after a commit, unless the hook has some specific sign telling apk to run it only pre or only post commit.
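One way to avoid the doubled snapshots would be to let the hook itself decide what to do per stage. The sketch below assumes that apk passes the stage name (`pre-commit` or `post-commit`) as the first argument to each hook; that is my understanding of apk-tools, but it should be verified against the installed version. `hook_action` is a hypothetical name and only prints what a real hook would do.

```sh
#!/bin/sh
# Sketch of a hook for /etc/apk/commit_hooks.d/.
# ASSUMPTION: apk passes the stage ("pre-commit" or "post-commit") as the
# first argument; verify this against your apk-tools version.

# Map the stage to the kind of snapshot to take. A real hook would create
# the matching snapshot here instead of printing.
hook_action() {
    case "$1" in
        pre-commit)  echo "pre" ;;
        post-commit) echo "post" ;;
        *)           echo "skip" ;;
    esac
}

# In the hook itself this would be called as: hook_action "$1"
```

With this dispatch, each invocation takes exactly one snapshot of the right type, so a commit ends up with one pre- and one post-snapshot.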
Well it's not so much about "automating" as much as it has to be done by the OS installer. The "Install base system" step in my list is what populates root with pmOS files, ie /mnt/root is where pmbootstrap would set up the chroot and install alpine-base etc.
> Your ENOSPC was probably because you were not running periodic balance. Eg OpenSUSE by default has a cronjob (systemd service + timer to be precise) to do that once a month.
Since kernel 5.19 it can do this automatically.
Since Linux kernel 5.19 there is a sysfs knob to enable automatic block group reclaim. This is essentially the kernel automatically balancing individual block groups as they fall under a certain threshold.
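A minimal sketch of setting that knob, assuming it lives at `/sys/fs/btrfs/<UUID>/allocation/data/bg_reclaim_threshold` and takes a percentage. The value 30 in the example is arbitrary, `set_reclaim_threshold` is a made-up name, and `SYSFS_ROOT` exists only so the function can be tried against a fake directory tree.

```sh
#!/bin/sh
# Sketch: enable automatic block group reclaim for data block groups.
# ASSUMPTIONS: the knob lives at
#   /sys/fs/btrfs/<UUID>/allocation/data/bg_reclaim_threshold
# and takes a percentage. SYSFS_ROOT is overridable only for testing.

set_reclaim_threshold() {
    uuid=$1
    percent=$2
    knob="${SYSFS_ROOT:-/sys}/fs/btrfs/$uuid/allocation/data/bg_reclaim_threshold"
    [ -w "$knob" ] || { echo "no such knob: $knob" >&2; return 1; }
    echo "$percent" > "$knob"
}

# Example (needs root; findmnt is from util-linux):
#   set_reclaim_threshold "$(findmnt -no UUID /)" 30
```

The setting does not persist across reboots, so it would have to be applied by an init script or similar on every boot.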
According to snapper.io snapper can also work with thin-provisioned LVM volumes. While I don't know if pmOS' filesystem-in-an-image approach plays nice with LVM, that seems like a possible pathway to enable rollback-ability also on kernels not supporting btrfs.
I ported snap-pac from Arch and named the fork apk-snap. It's still under development, but should cover our need for automatic snapshots triggered by apk.