~$ uname-aLinux pine64-pinephone 5.17.0-rc8 #1-postmarketos-allwinner SMP PREEMPT Mon Mar 21 08:33:05 UTC 20 aarch64 GNU/Linux~$ doas rtcwake -m mem -s 60doas (user@pine64-pinephone) password: rtcwake: assuming RTC uses UTC ...rtcwake: wakeup from "mem" using /dev/rtc0 at Mon Mar 28 11:23:45 2022rtcwake: write error
using other parameters for -m shows that the timer is set, but the suspending to ram fails.
What device are you using?
pine64-pinephone
On what postmarketOS version did you encounter the issue?
As a workaround (to keep battery life) I've replaced the call to rtcwake by zzz
Suspend to ram with zzz only works if the phone was powered through usb while booting (there are some error messages in dmesg when suspend fails but I don't remember and am currently using my phone ;)
Administratorchanged title from suspend to ram fails on new pinephone kernel to suspend to ram fails on new pinephone kernel when charger is not plugged in·
Imported
changed title from suspend to ram fails on new pinephone kernel to suspend to ram fails on new pinephone kernel when charger is not plugged in
Administratorchanged title from suspend to ram fails on new pinephone kernel when charger is not plugged in to suspend to ram fails on new pinephone kernel (often when charger is not plugged in)·
Imported
changed title from suspend to ram fails on new pinephone kernel when charger is not plugged in to suspend to ram fails on new pinephone kernel (often when charger is not plugged in)
The issue remains the same for me after installing that and rebooting: rtcwake -m mem fails maybe half of the time (with write error), but will always succeed if I plug in the charger.
It does seem like whether the charger was plugged in when it booted and/or having a high battery percentage makes it more likely that it will work.
Doing a cold boot doesn't seem to have an effect. (If this is relevant data: My modem switch is switched off).
In dmesg, I get a bunch of
[ XX.XXXXXX] musb_bus_suspend 2521⋮ trying to suspend as a_wait_bcon while active
(Which it seems @fdlamotte isn't getting, except when using zzz?)
Plugging in the charger stops all those messages from endlessly flowing and emits some anx7688 messages about HARD_RST-ing some "idiot firmware is bored" (which I guess I find mildly amusing).
I now have an hint on why rtcwake did not work, I was not using the -m mem flag, now I have the exact same behavior with rtcwake and zzz => time to get rid of zzz ;)
I was about to say that I only tested loginctl suspend and not rtcwake. So I tested both now... and both are working for me. I've tried multiple times in a row.
user@pine64-pinephone ~ % doas rtcwake -m mem -s 5rtcwake: assuming RTC uses UTC ...rtcwake: wakeup from "mem" using /dev/rtc0 at Sun Apr 3 16:37:25 2022user@pine64-pinephone ~ %
user@pine64-pinephone ~ % cat /sys/firmware/devicetree/base/modelPine64 PinePhone (1.2)% user@pine64-pinephone ~ % uname -a Linux pine64-pinephone 5.17.0-rc8 #2-postmarketos-allwinner SMP PREEMPT Sat Apr 2 15:09:08 UTC 202 aarch64 GNU/Linux
I ran these tests on phosh, installed rtcwake from sxmo-utils.
I powered off the phone and powered it on again without being connected to the charger, as it was mentioned above that this may influence it. Then I tested again, and same result, it works flawlessly for me.
I wanted to try loginctl but I figured out that I didn't have elogind package installed.
After install, before pluging usb-c loginctl "suspend does" nothing (not even messages in dmesg), after pluging the phone it works.
I have not dig much more but just wanted to report that if that helps. Don't know about all these daemons but it could be something with sxmo-images base install ...
Just did some tests on another pinephone (also 1.2b with no SIM).
First updated the os, kernel to 5.17 and with sxmo same symptoms as with the other phone.
Installed phosh over sxmo, after reboot "rtcwake -m mem" did the same as with sxmo, same messages in dmesg. Also tried to wait for the phone to suspend and it never happened (always possible to ssh into the phone). After plugging usb-c, the phone correctly enters crust (and can't ssh). So I have the same behavior with phosh and sxmo ...
I don't think we should downgrade the kernel. This would only be a workaround, and there's already another workaround (using zzz instead of rtcwake) that doesn't have such a big impact.
But the proper thing would be figuring out why this is caused.
So I can't reproduce it on phosh (see above)... some ideas:
Could it be a bug in rtcwake? If zzz works and rtcwake does not, one way to look further at it could be running strace on both and figuring out that way what they do differently.
Could it be a bug in sxmo? Just guessing, but maybe there's logic to prevent suspend when certain processes are running (like the sleep-inhibitor on which we depend in postmarketos-base-ui), and that doesn't work right and therefore prevents suspend? It might be worth testing with phosh too booted from sd card for the ones who can reproduce this, just to rule this out.
Are you sure that the problem is caused by a kernel upgrade? Meaning, can somebody downgrade the kernel on just their phone and verify that it works again?
Maybe bisect the kernel to figure out which patch caused this?
You're probably right, it must be our fault, as we are not that many having the issue ... I may try with a fresh install if I find some time and dig in the differences ... what I am sure is that the issue appeared right after upgrading into 5.17, as I use my phone a lot, it appeared immediately.
Over the difference between rtcwake and zzz, on my install rtcwake does not work worse or better than zzz, I was only using rtcwake badly (without the -m mem option) ...
The only workaround that I found (apart from downgrading the kernel) is to plug the charging cable to stop the messages with musb_bus_suspend. But then it works ok for me so it is ok ... I just prey for not having to reboot my phone without a usb-c plug at hand ;)
In my case suspend to RAM does work with version 5.17.0-rc8, from the linux-postmarketos-allwinner-5.17.0_git20220313 -r0 package, but it does not work with version -r1.
I am using Phosh and I have waked enabled to make gnome-clocks wake the phone when an alarm has to ring.
Suspend to RAM is managed by pmos-tweaks.
Edit: @earboxer I have a 1.2b board.
Edit2: I am getting trying to suspend as a_wait_bcon while active messages with -r0 too. I probably didn't notice before because the phone suspends in the end and the only symptom is that it asks for the SIM PIN maybe once per day.
With phosh suspend works (with loginctl suspend as I didn't want to mess with wifi and install things), with sxmo it doesn't (with rtcwake -m mem -s 10)
I downgraded 'linux-postmarketos-allwinner=5.16.4_git20220131-r2' and I confirm I dont have the issue anymore.
It looks like we have multiple problems with this 5.17.0_git20220313-r1. Dmesg output lots of trying to suspend as a_wait_bcon while active and also some of OOM killer triggers.
Here are the full dmesg output before and after the downgrade (containing two output files):
amended: as you can see, connecting the usb charger made the trying to suspend as a_wait_bcon while active to stop. It allowed me to suspend. After that I may suspend 1/2 times if I disconnect the charger.
This particular driver (musb) logs debug messages via Linux tracing infrastructure,and not to dmesg.https://elixir.bootlin.com/linux/latest/source/drivers/usb/musb/musb_trace.c#L13So you'd need to have a kernel with tracing enabled, and then enable it fromkernel command line arguments, via option: trace_event=musb:*Then you can get the trace from /sys/kernel/debug/trace/...It will be hard to get the trace, due to the nature of the bug. Trace bufferis a cyclic buffer, so it will quickly overflow, and you'll just get thelog for the loop itself, not what happened before it was entered.So you'd have to patch the kernel and add some mechanism to hard-breakthe loop, when you detect it happening inside the musb_bus_suspendfunction. (for example via a static counter variable, allow to run thatfunction 100 times, and then preventing the loop somehow, which willdepend on what info is gathered from the trace.
Here's a patch that may prevent the loop after a few cycles, to be able to get useful trace:
You'll have to disable all logging to all consoles, for it to not disturb the USB timings and introduce its own issues, and just dump the log via dmesg.
If loop stopping code will not work, then it will be hard to get a useful log of what happened prior to the loop.
So for everybody who can reproduce this problem, I've updated !3050 (closed) to temporarily include 0001-usb-musb-add-loop-blocker.patch and set the kernel parameter. We already had CONFIG_TRACING_SUPPORT=y in the kernel config.
It might be that adding the kernel parameter didn't work since there (at least used to be) a limit for max characters we can put in the u-boot config for the kernel command-line, and we were close to the limit before. So it might get cut off and not work, I didn't have time to test that.
Please attempt to create a useful trace:
wait until !3050 (closed) is built (or build it yourself)
mrtest add 3050 to install the new kernel and device package
reboot
verify that the kernel parameter was added as intended (if it didn't work, attempt to fix that... maybe using the musb-printk.patch is easier?)
reproduce the bug
get the trace from /sys/kernel/debug/trace/...
EDIT: the kernel from the MR is lacking tracing kernel config options, so the trace can't be produced with it yet. see below.
So I tried the patch above via mrtest and it works:
pine64-pinephone:~# dmesg | grep "Kernel command line"[ 0.000000] Kernel command line: init=/init.sh rw console=tty0 console=ttyS0,115200 earlycon=uart,mmio32,0x01c28000 panic=10 consoleblank=0 loglevel=1 PMOS_NO_OUTPUT_REDIRECT PMOS_FORCE_PARTITION_RESIZE pmos_boot=/dev/mmcblk2p1 pmos_root=/dev/mmcblk2p2 trace_event=musb:*pine64-pinephone:~# dmesg | grep -A 2 -B 2 "loop prevented"[ 193.203584] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 193.203644] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 193.203652] musb_bus_suspend 2531: loop prevented[ 193.203709] done.[ 193.203760] PM: suspend exit
But no trace event is recorded:
pine64-pinephone:~# ls /sys/kernel/debug/tracels: cannot access '/sys/kernel/debug/trace': No such file or directorypine64-pinephone:~# gunzip -c /proc/config.gz | grep CONFIG_TRACING_SUPPORTCONFIG_TRACING_SUPPORT=y
Though as I am new to kernel debugging I may have overlooked something trivial.
I too got no trace. This seems to suspend appropriately as well. But my modem doesn't come back after suspend now.
[ 448.403779] usb 2-1: device descriptor read/64, error -110[ 464.024009] usb 2-1: device descriptor read/64, error -110[ 464.260046] usb 2-1: new high-speed USB device number 13 using ehci-platform[ 469.652086] usb 2-1: device descriptor read/64, error -110[ 485.272315] usb 2-1: device descriptor read/64, error -110[ 485.380382] usb usb2-port1: attempt power cycle[ 485.588310] usb 2-1: new high-speed USB device number 14 using ehci-platform[ 496.356474] usb 2-1: device not accepting address 14, error -110[ 496.484492] usb 2-1: new high-speed USB device number 15 using ehci-platform
edit - modem came back just as I wrote this. On my main install, I've kept the kernel at 5.16.4, and have had no suspend or modem loss problems.
My experience was pretty much the same as @edorigatti's, but the suspend only happens if I call rtcwake -m mem -s 60 a second time. (not included in the above log).
Looks like it blocks the suspend initially, and then musb_bus_suspend keeps getting called after PM: suspend exit. Hrm. -16 is EBUSY from musb_bus_suspend
[ 399.878155] PM: suspend entry (deep)[ 399.878351] Filesystems sync: 0.000 seconds[ 399.879284] Freezing user space processes ... (elapsed 0.002 seconds) done.[ 399.882041] OOM killer disabled.[ 399.882047] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.[ 399.883528] printk: Suspending console(s) (use no_console_suspend to debug)[ 399.885172] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 399.885220] PM: dpm_run_callback(): usb_dev_suspend+0x0/0x20 returns -16[ 399.885259] usb usb5: PM: failed to suspend async: error -16[ 400.405015] PM: Some devices failed to suspend, or early wake event detected[ 400.425746] OOM killer enabled.[ 400.426003] Restarting tasks ... done.[ 400.428614] PM: suspend exit[ 400.432661] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 400.433007] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 400.433077] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active
I guess you don't have tracing properly enabled in the kernel, or you may need to mount tracefs somewhere manually. For me it's enough to mount debugfs and it contains tracing subdirectory. I'm not sure what's up with your kernel.
I enabled CONFIG_TRACING (see !3063 (53b64bff)), and built (with .ci/build_changed_aports.py aarch64) and installed it, but I still don't have a trace anywhere in /sys/kernel/debug.
When I try to mount it, I get
doas mount -t tracefs tracefs /tmp/tracingmount: /tmp/tracing: unknown filesystem type 'tracefs'. dmesg(1) may have more information after failed mount system call.
Just because you added it to config file, doesn't mean ther kenel config system will accept it. It's best enabled via menuconfig, because these options have some interdependncies, which are easier handled that way, rather than manually. (If you search for it in /proc/config.gz you'll likely find it still disabled)
Anyway, insetead of fighting with tracing, it may be easier to use the patch to enabled logging to kernel log.
[ 55.761452] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 55.761466] port status 00000104[ 55.761529] Root port suspended, power e0[ 55.761537] bogus rh suspend? a_wait_bcon[ 55.761542] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 55.761556] port status 00000104[ 55.761620] Root port suspended, power e0[ 55.761628] bogus rh suspend? a_wait_bcon[ 55.761634] musb_bus_suspend 2523: trying to suspend as a_wait_bcon while active[ 55.761643] musb_bus_suspend 2531: loop prevented
Yes. I compared it with a working suspend/resume and it's pretty much the same other than ->is_active being true for you, which makes the initial suspend fail and then produces the loop somehow. Next step would be figuring out why is_active is true.
I've updated !3050 (closed) to just log to dmesg. I think we should merge this to master in the meantime since this obviously affects a lot of people, but waiting a bit so people can review the MR.
So it happened to me again today with 0.5.9, so it doesn't fix the issue, but it seems way more stable (one problem over 4 days, while it was more than daily with 0.6.1)
Now that's an interesting coincidence, because the modem is on another USB port that doesn't have anything to do with musb. (Another data point: I don't run the modem at all in my testing.)
In dmesg posted by @earboxer there are no entries that would show startup of the modem. So that's another weirdness.
Yeah, I was running with the modem switched off (and could pretty reliably reproduce the issue). I've since switched to using this as my main phone (with SIM card and modem on) so I can test again and get another dmesg log for you if you want.
(but currently the workaround patch has presumably been running fine for me, though maybe I'm more proactive about keeping it plugged in since I rely on it more).
I've spent an evening figuring out how to move forward with this and concluded that it's best to downgrade the kernel on edge from 5.17 to the 5.15 based kernel we use on stable (v21.12).
Summary:
I'm seeing this myself now, sometimes. It seems that it doesn't happen when the battery is nearly fully charged, only when it's about half empty. This lines up with reports here and here.
Looking at git log; as megi said, there's no relevant changes. The only change in the musb dir touches omap2430.c (23907106, which we don't use on the pinephone), everything else is the same since December 2021 (ce1d37cb).
The "loop prevented" hack catches the issue sometimes, but not always as reported here. So this means there's at least one other code path where this can happen.
I've also read some of the musb kernel driver code and the traces posted above, and given the above problems it really doesn't seem easy to figure this out. A bisect would probably help if somebody could do it, of course it takes quite some time and one would need to be able to reliably test the problem on each revision of the bisect.
I think we should stay on the 5.15 based kernel in edge, until we have reason to believe that the bug was fixed (hopefully somebody can do the bisect), or until some time passes and we just try the next kernel. Then give that next kernel a lot of testing, especially related to suspend, before merging it. This is not a great solution, but given that we've been trying to figure this out collectively for 3 weeks already and all have limited time to spend on this, it seems the best we can do.
I'd find the common ancestor, make sure that the bug can't be reproduced there, switch to 5.17-megi, make sure the bug can be reproduced, and then start the bisect.
try with mainline, if you can reproduce it (might be possible, because it seems to happen in peripheral mode of the USB port, which I think is default on mainline - you'd just have to make sure to compile both host and peripheral modes into the musb driver)
if yes, then bisect
otherwise:
pick out only minimal relevant changes for making pp USB work (anx7688 driver + some small relevant changes + latest DT for pinephone from my branch)
make it into a single patch
bisect mainline as above, but apply this patch at each step (making it minimal will ensure it's unlikely to cause conflicts)
I can replicate the behavior reliably (boot with a USB cable connected, disconnect the cable, reboot the phone and wait for suspend to be triggered) and my rig can build a kernel in less than 15 minutes.
However, I'm not proficient with git and not a kernel expert either.
So, if someone can take care of a branch with the bisected kernel sources, I could handle the building and testing part of each bisected rev.
I'll leave this issue intentionally open since the downgrade is of course only a workaround.
By Oliver Smith on 2022-04-21T21:08:14
Administratorchanged title from suspend to ram fails on new pinephone kernel (often when charger is not plugged in) to suspend to ram fails on new pinephone kernel >= 5.16.x (often when charger is not plugged in)·
Imported
changed title from suspend to ram fails on new pinephone kernel (often when charger is not plugged in) to suspend to ram fails on new pinephone kernel >= 5.16.x (often when charger is not plugged in)
For what it's worth I've been on 5.16.4 and haven't experienced it for the past week. I'm not sure what triggered it the first time but it has not happened since. I am also on 0.6.1 of the FOSS modem firmware.
What's weird is that HCI side of the musb driver where this bug happens supposedly should not be used at all, because A64 uses a switch that delegates USB host mode to standard ehci0/ohci0 HW interfaces, which are implemented in a completely different driver.
In fact HCI side of musb HW block is cut off by the HW switch when USB0 is in host mode. And in peripheral mode, I don't see why musb driver would be calling some functions for host side implementation at all, or why they would not be gated somehow at runtime. I believe musb HCI should not be active at all ever, in host or peripheral modes. Maybe
It doesn't help that there are no docs for musb HW, and the driver is not exaclty grokable without understanding how USB OTG is supposed to work in detail.
What might be interesting to try is to cut out host mode code, but still allow to set peripheral-mode = "otg" in DT.
My issue is maybe not the same, but it could help for testing and debugging.
I have noticed that with FOSS Firmware 0.6.1 and the latest kernels (both the 5.17 and the recently downgraded 5.16 version), that my phone was warming up while being idle, and the battery was draining very fast, or draining while plugged in. The only "clue" I was able to get is the following "loop" logs from the firmware linux
It is just an extract, it continues until I was actually connecting via adb shell. What I did then and that seems to mitigate the issue is to change the power control option to on in /usr/lib/udev/rules.d/80-modem-eg25.rules (the default installed by the package is auto).
For me this problem is almost then always reproducible by :
having FOSS 0.6.1 firmware
having auto in /usr/lib/udev/rules.d/80-modem-eg25.rules
turning off completely the phone
plugin in the charging USB cable
plugin in the cable results the phone to boot
then once booter, pressing the power button to "dim" the screen
after some time, it will get warmer and the battery is not charging/discharging
Heating without any apparent system load visible in htop can be comming up from interrupt "storms", or some other hidden things inside the kernel that are not accounted for.
You can check /proc/interrupts to see if there are some interrupts that are accumulating very quickly (at rate >1000/s) when there's no apparent system activity. (both in the pinephone and in the modem)
The messages you posted are separated by many seconds apart. Likely not enough to indicate any problem that would result in heating te phone too much.
@megous Thank you a lot for the patch. This could get my PinePhone daily-driver ready again.
One question about the kernel version: I dual-use pmOS edge and 21.12 on two different SD-cards with SWMO installed. The edge version shows the bug as expected. But the 21.12 version shows the bug too with a 5.15.13 kernel. Is it possible that the bug also affects kernel versions <= 5.16.4?
In this case the downgrade to 5.16.4 would be useless and it should be considered to patch the stable version of pmOS too, if this makes sense.