Troubleshooting guide for end-users (and related wiki cleanups)
TL;DR - I've been working on a new troubleshooting page: Draft:Troubleshooting. I've found that there are many more-or-less related things that are worth discussing, and I'd like to gather some feedback on the form and layout of the Troubleshooting page; both of these will be discussed in this issue.
Introduction
At the moment, the Troubleshooting page is a mix of development advice and user advice, mostly focusing on the development/pmbootstrap side of things. Some attempts have been made to add more user-facing documentation to it (see e.g. Caleb's work on this), but it's still a bit of a mess.
Currently, information that is actually useful for debugging issues with a running system is scattered across various pages, and many things are simply not mentioned (e.g. where to look for log files for UIs, how to debug crashing programs, etc.).
Historically, much of the wiki is dedicated to either development info or things related to downstream kernels; see e.g.
- The WiFi page which is 90% "how to get WiFi working on downstream or find firmware",
- The Audio page which talks about getting audio routes from Android and such,
- ...and much of the Troubleshooting page being for porting advice or pmbootstrap debugging.
As we move towards the goal of making pmOS daily-drivable, I think it's time to focus more on the user-facing documentation:
- It gives new users (and existing users - I learned a lot myself while writing my proposed rewrite!) the instructions and information necessary for troubleshooting their own systems;
- With this information, we can point them to the right places to report issues and tell them what to include in the report;
- The load on support channels could be lightened a bit, or at least supported by good documentation.
The proposal
The Troubleshooting page is rewritten to contain information useful to end-users: how to debug a running install, as well as some help for issues that one might encounter while installing with pmbootstrap. (In my draft, this is the Draft:Troubleshooting page.)
The old Troubleshooting page is moved to a subpage (currently named Draft:Troubleshooting/Development issues, name is subject to change). Some information only relevant to building downstream kernels is moved to Troubleshooting/Downstream kernels (currently named Troubleshooting:kernel, see next paragraph).
The existing troubleshooting subpages are:
- Cleaned up and in most cases merged into the relevant articles for the component they concern (e.g. troubleshooting for audio components would either be in the Audio article, or in an Audio/Troubleshooting subpage - for my reasoning, see the "Component articles" section),
- or, if they're kept, they're renamed to get rid of the `:` separator in favor of the more correct `/` separator.
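As a rough illustration of the rename described in the list above, here is a minimal Python sketch that maps colon-separated titles to slash-separated subpage titles. The function name `slash_title` and the sample title list are hypothetical and only for illustration; the real set of pages to rename would come from the wiki itself, and the final page names are still subject to change.

```python
def slash_title(title: str, root: str = "Troubleshooting") -> str:
    """Rewrite 'Root:sub' as 'Root/sub'; leave any other title unchanged."""
    prefix = root + ":"
    if title.startswith(prefix):
        return root + "/" + title[len(prefix):]
    return title


# Hypothetical sample of page titles (not a complete list of wiki pages).
old_titles = [
    "Troubleshooting:boot",
    "Troubleshooting:audio",
    "Troubleshooting:touchscreen",
    "Audio",
]

# Only pages whose title actually changes need to be moved.
renames = {t: slash_title(t) for t in old_titles if slash_title(t) != t}

for old, new in renames.items():
    print(f"{old} -> {new}")
```

This only computes the new titles; actually moving the pages (and deciding whether to leave redirects behind) would still be done by hand or with whatever tooling the wiki admins prefer.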
Information only relevant to downstream kernels is kept separate from information relevant to either mainline kernels or both downstream and mainline (think userspace daemons like PulseAudio/PipeWire). This could be done in various ways and it's not set in stone yet; see the "Downstream kernel-specific information" section.
Troubleshooting for end-users
The very first step was to re-organize the Troubleshooting page, though it ended up being a complete rewrite. My draft can be found at User:Knuxify/Draft:Troubleshooting.
My idea was to split the page up into separate sections for the different things that one might try to troubleshoot: separate sections for app/interface issues, boot issues, pmbootstrap issues, etc. I initially considered whether or not to split them up into separate pages, but at the moment there just isn't enough content to warrant it.
I also wanted to minimize the amount of switching between pages that one has to do to find the right information, but it's a careful balancing act to not accidentally put everything on one page.
Nonetheless, I think some things might benefit from having their own page; notably, I've considered rolling pmbootstrap troubleshooting into a separate subpage of the main pmbootstrap article, with maybe a few common issues on the Troubleshooting page for folks installing postmarketOS with pmbootstrap, but not necessarily interested in the development bits.
Boot issues are another example, since we already have Troubleshooting:boot for those, though it doesn't go much further than "determine if it actually boots or not". Probably we could either move everything to the boot subpage, or move some stuff from the boot subpage to the main page and leave the detailed instructions to the subpage.
I also couldn't decide what to do with the "Getting logs" section; I didn't want to roll up every single log location into one section, since e.g. somebody trying to debug an application doesn't need to know how to find logs for boot failures, and vice versa. I ended up putting just the kernel log and syslog in the main "Getting logs" section, since they're useful to pretty much everyone, while leaving links to separate "getting logs" subsections for boot and applications. This makes the section less wordy, but it does make me wonder if it's forcing people to click around too much.
In general, I'd like feedback on the layout of the page. How does it feel to browse? I tried to make the order of sections flow somewhat naturally. The one exception is pmbootstrap: as an install step, one would expect it near the start, but since it's irrelevant to on-device issues, which the page focuses on, I moved it to the end. Maybe the fact that it isn't on-device is a good argument for moving it out entirely?
Troubleshooting for developers
The draft for the new main Troubleshooting page is done, but I haven't really worked on touching up the old Troubleshooting page (the one that would become the development subpage) - I haven't yet had a proper idea about what to do with it, and I would like to hear others' opinions. My WIP page with some TODOs can be found at User:Knuxify/Draft:Troubleshooting/Development issues. As mentioned earlier, the name is subject to change.
I made a spreadsheet with some personal notes on the Troubleshooting page sections as well as the Troubleshooting:* pages: https://docs.google.com/spreadsheets/d/1Xwldd9005cj1T04r02N5zUKbDImAE6XsKrW8_xZYYIY/edit?usp=sharing (see tabs at the bottom for the two pages). These are all just my personal opinions, some of the notes could be wrong...
I think the second page in particular ties into another point I wanted to make:
Component pages
The first thing to mention about component pages is something I've already described in the linked spreadsheet. To quote:
Worth noting is the phenomenon of Troubleshooting pages for specific components (e.g. HID buttons), while other components have entirely separate articles (e.g. WiFi, Audio). Some have both! (Troubleshooting:audio and Audio (though the Troubleshooting page for it should be dropped), Troubleshooting:Display and Display, etc.) We should find and integrate these together. Maybe make a separate article for downstream kernel quirks and move all of these articles under it (since much of the advice is downstream-specific).
I've kept a list of these pages in the "Issues with particular components" section of my Troubleshooting rewrite, along with the state of these pages in my view.
It's probably worth looking at what other wikis do for this. Out of curiosity I've checked the Arch wiki (generally considered to be one of the better sources for Linux troubleshooting information (citation needed)). Take audio as an example; it's a particularly difficult one because of the number of parts involved. The Arch wiki has a Sound system page that links out to other pages; there's a page for ALSA, with Troubleshooting as a subpage; on the other hand, the page for PipeWire only has an inline "Troubleshooting" section, and IIRC that's the case for many more components, including those outside of audio.
I'm not saying that we should copy the Arch wiki wholesale - notably, we have a lot of information specific to pmOS, like mobile device quirks or downstream kernel notes. But we can definitely take some hints from their page organization (...though maybe someone disagrees with this opinion? Again, feedback appreciated).
And speaking of downstream kernels:
Downstream kernel-specific information
As I mentioned at the beginning of this issue, there are plenty of places on the wiki that focus primarily on downstream kernel specifics. Other than the Audio and WiFi pages, many of the Troubleshooting subpages like Troubleshooting:touchscreen contain downstream kernel info as well.
I want to be clear: this is fine! Most ports in pmOS are downstream, although there are plenty of efforts to make things run (close to) mainline where possible. This information is useful to people who want to get their downstream ports working to at least some ability.
Nonetheless, in some cases, these details are unnecessary to regular users; think how the WiFi page is 90% notes on getting WiFi working on many downstream kernels with many vendors, with only some small notes on the userspace portions, few troubleshooting instructions, and not even making mention of e.g. NetworkManager vs iwd.
Much of this is a matter of reorganization, but I mention downstream kernels specifically because they require a lot of stuff that doesn't apply to mainline: things related to ugly hacks on the part of the vendors, references to Android stuff that is irrelevant without a downstream kernel, etc. Historically, these made up the majority of pmOS ports, and pmOS porters made up most of the wiki contributors, so the information about downstream ports is mixed into regular articles.
I guess it's similar to the troubleshooting subpage/section choice from earlier: if there isn't much downstream-specific information, it can be put in a small section at the end of the article; if there's more, it can be moved to a subpage.
In closing
I've been planning this out for the past few months, and my opinions have shifted around a bit during that time, so sorry if some things seem a bit disjointed. Hopefully you can piece together what I was trying to articulate :)
My opinions are definitely skewed here (and I admit, I might've colored the "issue" a bit too much), so I'd appreciate hearing others' point of view on this.
What do you think about these proposals? What do you think about the Troubleshooting page rewrite? Do you have any suggestions/ideas for the issues I mentioned above? Do you agree/disagree with my reasoning, and why?