Right now, as far as I know, building the packages in our aports tree is done manually: @ollieparanoid pulls the latest changes, triggers some build bot on his PC, and manually uploads the packages to postmarketos.brixit.nl.
This works, but it creates an unnecessary dependency on a single person and can delay the inclusion of new updates and packages. It would be preferable to have a system like Alpine's: an always-running build bot which constantly checks the aports tree for changes and builds them immediately. It would also upload the packages automatically to the main repository, which then gets mirrored to the other mirrors.
So basically we need:
A build server. Preferably one for each architecture, of course, but an x86 machine doing all architectures is good enough for the foreseeable future
A bot always running on said server, checking the aports tree for changes and building them automatically for each architecture
Guess what: today I started writing a huge issue about the direction of postmarketOS, and the number one thing that needs to be addressed, in my opinion, is the build infrastructure. Let me just copy-paste what I wrote here, and when I open the new issue (might take me a few days), I'll refer to this one.
Building Infrastructure That Scales
Right now, the "build infrastructure" consists of one dedicated computer that builds packages with an i3 CPU located at my home. This is a rather slow CPU for this purpose, building a plasma mobile or KDE Framework update for all supported architectures may take a whole day. The software that builds the packages are the pmOS-repo-scripts, of which I am the only user. This means, maintenance is up to me alone.
The packages are only built when I trigger the build manually. I try to do that ASAP after new commits have been merged to the pmaports repository, but when I'm away from my laptop with the SSH keys, I can't do it. It's also not really transparent to everybody else whether the builds have been started, are running, have failed or have completed. And if something fails, again it's up to me to fix it, which cuts into my total postmarketOS development time.
So this is a great improvement over having everybody build the packages themselves like in the early days (remember how @PureTryOut provided a repository for all the KDE packages, because they took forever to build?). But it still does not scale enough. What I would like to improve here:
Use a battle-tested, widely used "repository build program" instead of the pmOS repo scripts, so we can ~~yell at somebody else~~ gratefully collaborate on fixing bugs and get upstream support in case something breaks
That repository build program should have a transparent status page where everybody can see what the build computer(s) are doing, whether they are stuck, and what the build log looks like where they are stuck (something like build.alpinelinux.org)
Scale over multiple machines easily
(Only after we have a better build program running:) get more powerful server(s) up and give a few (but more than one) people access to maintain them. And I would really prefer to have that in a trusted environment, not just "somewhere in the cloud".
I think something like openSUSE's OBS would be nice, but we would need to add apk support ourselves (they support rpm and deb already). Maybe we could even use their hosted infrastructure for building then, if they were fine with that.
There's of course Alpine's build system, and this might be our most realistic bet for now. But it has its shortcomings (we can't cross compile with it like we do with the pmOS build scripts, and compared with OBS, builds are not sandboxed from each other - ncopa made a wishlist somewhere on the mailing list, but I can't find the thread right now). Also, it seems Alpine would like to replace it sooner rather than later with a new system.
kaniini and soracle from Adélie have started work on abuildd, but from what I can tell it is not even at the proof-of-concept stage yet.
(Also great point with the package database site, I did not write that down yet, and that's very important for transparency and usability!)
So looking at this alone, I think the fastest way to deal with it would be setting up Alpine's stack on multiple arches and using that as our repository. It would be amazing if someone could set it up for x86_64 in a virtual machine, document the steps if they are not well documented already, and generally learn how it works. Once that works, we can do the real setup together. Hardware will probably not be the problem; multiple people have offered that we can build on their machines. Any heroes around who feel like they're up for this task?
I have not, thanks for linking that :D
Actually it would be great if someone could evaluate these tools: drone.io, as well as sircmpwn's sr.ht thing that he's promoting every now and then. If it fulfills the requirements, I'd be happy to use that as well.
Sure thing, @ollieparanoid. If anyone working with postmarket wants an account on sr.ht to experiment with, please send me an email: sir@cmpwn.com. Meanwhile, I'll address the specifics from your post...
This is a rather slow CPU for this purpose; building a Plasma Mobile or KDE Frameworks update for all supported architectures may take a whole day.
In theory I have the infrastructure to support building large packages reasonably quickly, but in practice not all of it is up and running. I look to use-cases like yours to set priorities, so I can focus on bringing up more build cycles sooner rather than later if needed. I also currently have a hard limit on build times at 45 minutes, but that is easily addressed. You can run some test builds on builds.sr.ht if you'd like to get an idea for performance and identify the bottlenecks, and I'm happy to help here as necessary.
One hang-up will be multi-arch builds. Today only amd64 is supported. I intend to add support for multiple architectures, with emulated ARM support coming soon, and RISC-V support as well. If you want native ARM builds on hardware, that might be a tall order, but assuming you use cross compilers you can get set up on amd64 no problem.
I try to do that ASAP after new commits have been merged to the pmaports repository, but when I'm away from my laptop with the SSH keys, I can't do it.
It should be possible to wire up Gitlab commits and merge requests to builds.sr.ht. When you get a merge request, we can run a clean build of the affected packages to make sure it works and you can merge with confidence. Then, once merged, we can automatically run a fresh build and push the new packages to your repository.
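Something along these lines would do it - a rough sketch only, with a placeholder manifest, token, and branch check (the endpoint is builds.sr.ht's job submission API):

```python
# Rough sketch: react to a GitLab push webhook by submitting a job to
# builds.sr.ht (POST /api/jobs). The manifest, token, and branch check
# are placeholders, not a finished postmarketOS build recipe.
import json
import urllib.request

SRHT_TOKEN = "..."  # personal access token

MANIFEST = """\
image: alpine/edge
sources:
  - https://gitlab.com/postmarketOS/pmaports
tasks:
  - build: |
      echo "build the affected packages here"
"""

def submit_job(note):
    req = urllib.request.Request(
        "https://builds.sr.ht/api/jobs",
        data=json.dumps({"manifest": MANIFEST, "note": note}).encode(),
        headers={"Authorization": "token " + SRHT_TOKEN,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]

def handle_push(payload):
    # only react to pushes on branches that publish packages
    if payload.get("ref") == "refs/heads/master":
        return submit_job("pmaports push " + payload["after"][:8])
```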
That repository build program should have a transparent status page where everybody can see what the build computer(s) are doing, whether they are stuck, and what the build log looks like where they are stuck
✓
Scale over multiple machines easily
✓
(Only after we have a better build program running:) get more powerful server(s) up and give a few (but more than one) people access to maintain them. And I would really prefer to have that in a trusted environment, not just "somewhere in the cloud".
✓
I colocate the infrastructure in a local datacenter here in Philadelphia. They're locked in a rack, behind three locked doors (with different pins), and a sophisticated alarm system. They're under my personal supervision, and I can happily take the sysadmin and hardware maintenance burdens off of your shoulders.
If you want to maintain it yourself, though, sr.ht is 100% open source so that would be entirely possible.
So looking at this alone, I think the fastest way to deal with it would be setting up Alpine's stack on multiple arches and using that as our repository. It would be amazing if someone could set it up for x86_64 in a virtual machine, document the steps if they are not well documented already, and generally learn how it works.
Here's a script for making an Alpine Linux VM suitable for booting with qemu/KVM:
I don't think the build part is the hardest part of this; it could work on the existing gitlab-ci, but with runners hosted by the project, or with something like sr.ht (which looks awesome).
The hard part is having the chain of trust to correctly sign the packages in the repository. I have no idea if sr.ht deals with this.
I'm not sure that integration with gitlab-ci is in the cards right now; it would be pretty complicated and have little value-add for most sr.ht users.
Regarding trust, though, sr.ht does deal with that. I use builds.sr.ht to automate building, signing, and publishing Alpine packages myself; here's an example:
builds.sr.ht can store secrets for you (like private keys) and populate the build environment with them only if the build was submitted by an authorized user.
In theory I have the infrastructure to support building large packages reasonably quickly, but in practice not all of it is up and running. I look to use-cases like yours to set priorities, so I can focus on bringing up more build cycles sooner rather than later if needed. I also currently have a hard limit on build times at 45 minutes, but that is easily addressed. You can run some test builds on builds.sr.ht if you'd like to get an idea for performance and identify the bottlenecks, and I'm happy to help here as necessary.
This sounds great! Sent you an e-mail regarding an account for test builds.
One hang-up will be multi-arch builds. Today only amd64 is supported. I intend to add support for multiple architectures, with emulated ARM support coming soon, and RISC-V support as well. If you want native ARM builds on hardware, that might be a tall order, but assuming you use cross compilers you can get set up on amd64 no problem.
Your build system seems to be quite flexible - maybe it makes sense to hook up pmbootstrap in the scripts that we would use for building postmarketOS packages. That's what we are using right now, after all, and it handles cross compiling (pmbootstrap build --arch=armhf hello-world, wiring up distcc and cross compilers together with qemu - see the linked wiki page for details if interested).
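To illustrate, the build job could boil down to a small wrapper around that command - just a sketch, with a placeholder arch list and intentionally minimal error handling:

```python
# Sketch of a build-job wrapper around the pmbootstrap command quoted above
# (pmbootstrap build --arch=... <pkgname>). The arch list is a placeholder.
import subprocess
import sys

ARCHES = ["x86_64", "armhf", "aarch64"]

def build_package(pkgname):
    for arch in ARCHES:
        print("building " + pkgname + " for " + arch)
        if subprocess.run(["pmbootstrap", "build", "--arch=" + arch, pkgname]).returncode != 0:
            sys.exit("build of " + pkgname + " for " + arch + " failed")

if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        build_package(pkg)
```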
What I don't understand yet is how dependency resolving would work across multiple architectures (a noarch package device-example-example may depend on linux-example-example, which is exclusive to armhf). Does sr.ht know which packages it needs to build, or would we just say: "build the whole binary packages repository"?
If the latter is the case, then we would still need the pmOS-repo-scripts (or replace them with something equivalent, but then we wouldn't be able to cross compile that easily). But nevertheless, using sr.ht would still be a huge improvement: the output would be visible for everyone, it would be easy to start new builds, we could actually implement triggers (new commit to master -> start building), and you've even offered to let us use your hardware.
It should be possible to wire up Gitlab commits and merge requests to builds.sr.ht. When you get a merge request, we can run a clean build of the affected packages to make sure it works and you can merge with confidence. Then, once merged, we can automatically run a fresh build and push the new packages to your repository.
I'm not sure that integration with gitlab-ci is in the cards right now; it would be pretty complicated and have little value-add for most sr.ht users.
The GitLab CI scripts we have set up right now to build packages from new merge requests do their job well (example). It might be a nice addition to build merge requests on sr.ht as well, but it's not important at all, as what we have is working. The important part is that building the binary packages starts as soon as something gets pushed to relevant branches (currently master only, but we're aiming to have multiple branches).
It would be cool to have a badge (a dynamically generated SVG image, like Travis CI and other services offer) that reflects the sr.ht build status, if that's possible. GitLab offers a nice way to integrate them nowadays: https://gitlab.com/help/user/project/badges
I colocate the infrastructure in a local datacenter here in Philadelphia. They're locked in a rack, behind three locked doors (with different pins), and a sophisticated alarm system. They're under my personal supervision, and I can happily take the sysadmin and hardware maintenance burdens off of your shoulders.
Perfect!
Regarding trust, though, sr.ht does deal with that. I use builds.sr.ht to automate building, signing, and publishing Alpine packages myself
builds.sr.ht can store secrets for you (like private keys) and populate the build environment with them only if the build was submitted by an authorized user.
This sounds great.
I'm really excited to try out sr.ht; right now this sounds like a magic silver bullet that would solve all our binary repository problems
I don't think the build part is the hardest part of this; it could work on the existing gitlab-ci, but with runners hosted by the project, or with something like sr.ht (which looks awesome).
The hard part is having the chain of trust to correctly sign the packages in the repository. I have no idea if sr.ht deals with this.
To be frank, that was my idea as well. We could set up GitLab runners on various machines owned by trusted pmOS contributors, and maybe some build bots.
The build jobs would become part of .gitlab-ci.yml, maybe with a filter to only build master, or something like that. Smart use of the GitLab cache and Makefiles could avoid complete recompilations. Having multiple jobs allows them to run in parallel on multiple runners, so if there's a faster runner, it should receive more jobs (as far as I know - needs testing).
For example, I have quite a beastly Ryzen 2700X and wouldn't mind sharing some idle time to help with compilation. Runners would come and go, and the ones online when a commit is pushed would pick up the jobs.
Now, maybe the GitLab/distributed build infrastructure isn't robust enough yet, or adapted to this kind of setup, and using a monolithic build system would be fine as well (as was proposed in the thread). As is, I can see a couple of issues with the approach I suggested:
If a runner gets shut down during compilation, I don't think GitLab handles it gracefully
I would like to set up a GitLab runner with a very low priority on my computer, but I don't know whether this is possible or easy to do.
There's no telling if someone tampers with the code, so restricting this to trusted contributors sounds like a good idea (we would need a proof of work/correctness for compilers, which is off-topic, but very interesting).
Regarding chain-of-trust issues, a dedicated runner could perform the signing job once the packages are built, probably regardless of whether we use GitLab runners or a true build farm. This could be achieved through tagging a special runner that is the only one with access to the secrets and is hosted in a safe place.
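To illustrate, the signing job on that special runner could be as small as this sketch (abuild-sign is Alpine's standard tool for signing an APKINDEX; the key and repository paths are placeholders):

```python
# Sketch of the dedicated signing step: only the trusted runner has the
# private key, and it signs the repository index after the build jobs finish.
import subprocess

PRIVATE_KEY = "/etc/apk/keys/build@postmarketos.org.rsa"  # placeholder path

def sign_index(index_path):
    subprocess.run(["abuild-sign", "-k", PRIVATE_KEY, index_path], check=True)

sign_index("packages/master/armhf/APKINDEX.tar.gz")  # placeholder path
```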
Your build system seems to be quite flexible - maybe it makes sense to hook up pmbootstrap in the scripts that we would use for building postmarketOS packages. That's what we are using right now, after all, and it handles cross compiling (pmbootstrap build --arch=armhf hello-world, wiring up distcc and cross compilers together with qemu - see the linked wiki page for details if interested).
Yeah, that seems like it would be pretty straightforward.
What I don't understand yet is how dependency resolving would work across multiple architectures (a noarch package device-example-example may depend on linux-example-example, which is exclusive to armhf). Does sr.ht know which packages it needs to build, or would we just say: "build the whole binary packages repository"?
sr.ht basically gives you a way of writing scripts, installing packages on the host system (e.g. Alpine packages), examining the environment, etc. So it can do whatever you want. You can specify a list of packages and build just those, or you can build everything (though imo you're better off submitting separate jobs for building each package or set of packages), or you can do some more complex dependency resolution type of thing. If you can write a script which builds packages in the manner you wish, you can run it on builds.sr.ht. I figure what you probably want to do is some minor git magic to find out which packages were modified in the last commit and rebuild those automatically.
Also, if the sr.ht build manifests and hooks are not sophisticated enough for your needs, you can also use the API to submit jobs and generate the manifests. You could, for example, use the basic hooks to submit a job which examines the last commit and does some thinking about which packages to build, which can then submit new jobs with generated manifests, perhaps so that you can distribute the load of building packages into several jobs (to avoid overwhelming builds.sr.ht - it is shared infrastructure so mega-builds are not appreciated).
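For illustration, that git magic could be as simple as this sketch, reusing the submit_job() idea from above (the pmaports-style tree layout is an assumption, and a real version would also need to handle renames, removals, and shared dependencies):

```python
# Illustrative only: find the packages touched by the last commit in a
# pmaports-style tree (category/pkgname/APKBUILD etc.) and submit one small
# builds.sr.ht job per package, reusing submit_job() from the sketch above.
import subprocess

def changed_packages(repo="."):
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    pkgs = set()
    for path in out.splitlines():
        parts = path.split("/")
        # e.g. "main/hello-world/APKBUILD" or "main/hello-world/some.patch"
        if len(parts) >= 3:
            pkgs.add(parts[1])
    return sorted(pkgs)

for pkg in changed_packages():
    submit_job("rebuild " + pkg)  # one small job per package, no mega-builds
```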
It would be cool to have a badge
Yeah, this is planned but not a priority.
I'm really excited to try out sr.ht; right now this sounds like a magic silver bullet that would solve all our binary repository problems
Be aware that I've made some promises which aren't backed up by implementation yet - some of these things are possible in theory, and I'd like to accelerate working on them if it would help postmarket out. Bear in mind that sr.ht is alpha-quality software, and while it may be a magic bullet eventually, we're going to have to work together to get it there. Projects like postmarketOS are squarely in my target demographic, so I would be happy to work closely with you on making sure it suits your needs. For now, I sent you an invite - register an account, poke around, ask questions, and start experimenting. As you hit roadblocks, let's talk about them and work on getting them removed.
Now, maybe the GitLab/distributed build infrastructure isn't robust enough yet, or adapted to this kind of setup
That's my opinion, TBH. It is a creative idea, but I'd rather go with something straightforward where we get support with the infrastructure.
@SirCmpwn:
You could, for example, use the basic hooks to submit a job which examines the last commit and does some thinking about which packages to build, which can then submit new jobs with generated manifests, perhaps so that you can distribute the load of building packages into several jobs (to avoid overwhelming builds.sr.ht - it is shared infrastructure so mega-builds are not appreciated).
Seems like that's what we need to do then, because otherwise we will have mega-builds for sure (right now it works like this: try to build all packages, and let it automatically skip the packages which already exist in the binary repository). But this is feasible with some code changes.
Be aware that I've made some promises which aren't backed up by implementation yet - some of these things are possible in theory, and I'd like to accelerate working on them if it would help postmarket out. Bear in mind that sr.ht is alpha-quality software, and while it may be a magic bullet eventually, we're going to have to work together to get it there. Projects like postmarketOS are squarely in my target demographic, so I would be happy to work closely with you on making sure it suits your needs. For now, I sent you an invite - register an account, poke around, ask questions, and start experimenting. As you hit roadblocks, let's talk about them and work on getting them removed.
Thanks for the heads up, I'll adjust my expectations. Nevertheless, this sounds great and I appreciate your willingness to work together on this and that you're providing this infrastructure at all.
I've registered with the link you gave me and clicked around a bit. Next week I'll take a deeper look and start figuring out in detail the tasks that would need to be done on our end to migrate to sr.ht.
Alright, I gave this some thought and came up with the following concept. To properly manage and display the current queue, we will need to implement a few things on our side. I've tried to find a balance between making it as simple as possible while still supporting multiple branches and arches.
What do you think, everyone?
@sircmpwn: I guess sr.ht already provides everything we need, assuming there is a button in place that can restart jobs (let's say downloading sources failed; then we'd want to simply restart the job). But it would be nice if you could read through it once, look out for things that might be missing in sr.ht, and give feedback in general.
@MartijnBraam: I guess we already have a MySQL database or similar for the wiki, which we could also use for the queue? And would you be interested in helping out with this?
components

sr.ht jobs:
- queue_update (parameter: branch)
- build_package (parameters: pkgname, arch, branch)

postmarketos.org:
- queue_server with API calls (new Python script):
  - update queue
  - update job status
  - list queue
- database
- web interface

web interface

This can be implemented once everything else is working. A simple JS frontend that queries the "list queue" API call from the queue_server and displays the information nicely:

build_package

- let it access the branch+arch temp folder, so it can build upon updated dependencies that are not yet published (additional /etc/apk/repositories line?)
- on success: upload the package to the branch+arch temp folder
- tell the queue_server whether it was successful or not

queue_server (a sketch follows below this outline)

- on error: set the queue entry status to FAILED
- on success: remove the entry from the queue
- if it was not the last entry for this branch+arch: start the next one
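To make the queue_server half more concrete, here is a minimal sketch - Flask and SQLite are placeholders, nothing is decided yet; only the API call names match the outline above:

```python
# Sketch of the queue_server from the outline above. Flask + SQLite are
# assumptions, not decisions. Statuses: QUEUED, FAILED.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
db = sqlite3.connect("queue.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS queue
              (pkgname TEXT, arch TEXT, branch TEXT, status TEXT)""")

@app.route("/api/queue/update", methods=["POST"])
def update_queue():
    # the queue_update job posts the packages that need to be (re)built
    for entry in request.json["entries"]:
        db.execute("INSERT INTO queue VALUES (?, ?, ?, 'QUEUED')",
                   (entry["pkgname"], entry["arch"], entry["branch"]))
    db.commit()
    return jsonify(ok=True)

@app.route("/api/job/status", methods=["POST"])
def update_job_status():
    # the build_package job reports back: on error mark FAILED, on success
    # remove the entry (the caller then starts the next one, if any)
    j = request.json
    if j["success"]:
        db.execute("DELETE FROM queue WHERE pkgname=? AND arch=? AND branch=?",
                   (j["pkgname"], j["arch"], j["branch"]))
    else:
        db.execute("UPDATE queue SET status='FAILED' "
                   "WHERE pkgname=? AND arch=? AND branch=?",
                   (j["pkgname"], j["arch"], j["branch"]))
    db.commit()
    return jsonify(ok=True)

@app.route("/api/queue", methods=["GET"])
def list_queue():
    # the JS frontend polls this to display the queue
    rows = db.execute("SELECT pkgname, arch, branch, status FROM queue").fetchall()
    return jsonify([dict(zip(("pkgname", "arch", "branch", "status"), r))
                    for r in rows])
```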
After stepping away from this big write-up and coming back, I think it would be useful to explicitly state in the web interface how many packages are not yet published as binary packages, and which commit was the last one where all packages had been published. Something like this:
But in the end, these are details, and it would be easy to change the database layout etc. from above into one that holds the information for this table. The API calls etc. stay the same, so I'm interested in what you all think of the big picture instead.
This looks like it would work to me. The only hangup on my side is:
if it's running: stop via sr.ht api
Jobs can be stopped, but not via the API. This would be trivial for me to add, though.
Also, you might be able to get away with not having a database or web view if you want, with a few changes you should be able to store the extra state you need on sr.ht and query it via the API. Would take some more work on my end, though, while the design you wrote up would require very few changes to sr.ht.
Also, you might be able to get away with not having a database or web view if you want, with a few changes you should be able to store the extra state you need on sr.ht and query it via the API. Would take some more work on my end, though, while the design you wrote up would require very few changes to sr.ht.
Thanks for the offer and for reading through the draft above. Could you write up what it would look like to have the state stored in sr.ht? I wonder if this could be made generic enough to be useful for other use cases besides the postmarketOS binary repository. To me it looks like it's specific to our use case, hence it would make more sense to do it outside of sr.ht.
Well, builds.sr.ht has a list of builds both running and historical, which you can organize with tags and such. I can easily add an API which lets you append arbitrary metadata, like links to the published package on your mirrors, which would show up in the build detail page. Could also let you do arbitrary metadata like a JSON payload, which could be redownloaded and used by other tools and possibly queried against in searches both on the web and with the API.
I've been working on the pmbootstrap code. Getting this integrated properly will need a lot of refactoring. I'd roughly estimate this will take me 2 weeks (as there are other postmarketOS tasks I need to take care of).
@MartijnBraam: would you like to implement the shell or Python scripts running on sr.ht for the jobs as well? Right now your example job ends in "pmbootstrap: command not found". You could install it the way the pmaports CI script does.
@ollieparanoid the build order thing is easiest done using a graph library. I've implemented basically the same thing before for my configuration manager. Dependency resolution on a graph is a reverse topological sort of the dependency graph
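For example (a sketch using Python's stdlib graphlib - since the mapping goes from package to its dependencies, a plain static_order() already yields a valid build order):

```python
# Sketch: compute a build order with a graph library (graphlib is in the
# Python 3.9+ stdlib). Because this mapping goes package -> dependencies,
# static_order() emits dependencies before their dependents.
from graphlib import TopologicalSorter

deps = {
    # hypothetical example: a noarch device package pulling in an armhf kernel
    "device-example-example": {"linux-example-example"},
    "linux-example-example": set(),
    "hello-world": set(),
}

build_order = list(TopologicalSorter(deps).static_order())
# one valid order: linux-example-example, hello-world, device-example-example
print(build_order)
```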
the build order thing is easiest done using a graph library. I've implemented basically the same thing before for my configuration manager. Dependency resolution on a graph is a reverse topological sort of the dependency graph
The actual sorting is the easy part - the harder part is getting pmbootstrap to output the packages that are needed before building anything. Right now it looks up the dependencies and starts building the necessary packages at the same time, but what we need is to have them ahead of time, which is quite a big change. And in order not to duplicate a lot of code, we would want to use the ahead-of-time dependency look-up for regular builds as well; that's where the rewriting comes in. If you are interested, I could upload my WIP version to a separate branch.
@SirCmpwn: the screenshot looks cool - but what do you want to say with it? (In general, I like how the build steps are clearly separated. It's much harder to see what's going on in Travis, GitLab CI and Jenkins, where it's all in one listing.)
Oh, now I see it - you have aarch64 support now? awesome!
Just to make sure: are your aarch64 machines capable of compiling for armhf/armv7 as well? AFAIK this is possible with almost every aarch64 CPU, but there are a few where it does not work.
It only supports aarch64 for now, and I haven't pushed the changes to production yet, but I will be adding armhf soon. My goal is to support every architecture each upstream distro supports, when practical.