Applications disappearing/crashing with weird rcu error message on PinePhone [possible BTRFS corruption maybe related? or not, who knows]
Describe your issue
Every hour or so, I see a heavier application disappear without a trace. There is no SIGSEGV info in dmesg, but instead I see this in the kernel log:
[Wed Jul 3 20:45:37 2024] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P16405 } 6 jiffies s: 4605 root: 0x0/T
[Wed Jul 3 20:45:37 2024] rcu: blocking rcu_node structures (internal RCU debug):
[Wed Jul 3 20:45:42 2024] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P12530 } 6 jiffies s: 4641 root: 0x0/T
[Wed Jul 3 20:45:42 2024] rcu: blocking rcu_node structures (internal RCU debug):
This seems to cause the tasks to die immediately, whatever it is, and it's happening a lot. Is it some kind of kernel bug or did I do something wrong?
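To check whether the tasks named in these messages really are the applications that vanish, the process IDs can be pulled out of the log and compared against what was running. Below is a minimal sketch, assuming that each `P<number>` entry in the braces is the PID of a task the stall was attributed to (that is my reading of the message format, not something I have confirmed). The helper name `stalled_pids` is made up for illustration.

```python
import re

# Matches the braces part of an expedited-stall line, e.g. "{ P4822 P4933 }".
STALL_RE = re.compile(
    r"rcu_preempt detected expedited stalls on CPUs/tasks: \{([^}]*)\}"
)

def stalled_pids(log_text):
    """Return the numbers from P<number> entries in rcu stall lines, in log order.

    Assumption: P<number> denotes a task PID. Entries that do not look
    like P<number> (for example CPU entries) are skipped.
    """
    pids = []
    for match in STALL_RE.finditer(log_text):
        for token in match.group(1).split():
            if token.startswith("P") and token[1:].isdigit():
                pids.append(int(token[1:]))
    return pids

# Two of the lines quoted in this report, used as sample input.
sample = (
    "[Wed Jul 3 20:45:37 2024] rcu: INFO: rcu_preempt detected expedited stalls "
    "on CPUs/tasks: { P16405 } 6 jiffies s: 4605 root: 0x0/T\n"
    "[Wed Jul 3 20:45:42 2024] rcu: INFO: rcu_preempt detected expedited stalls "
    "on CPUs/tasks: { P12530 } 6 jiffies s: 4641 root: 0x0/T\n"
)

print(stalled_pids(sample))
```

The resulting numbers could then be compared with the PIDs of the applications that were running before they disappeared, provided those were recorded beforehand.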
Update: Sadly, I also keep seeing new BTRFS corruption pop up, but I tried different brands of SD card and ran memtester, and I don't think it's an SD card or RAM issue. So I'm wondering if there is still some deep chipset timing bug that corrupts data and causes these rcu stalls:
[ 174.297869] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4822 P4933 } 6 jiffies s: 253 root: 0x0/T
[ 174.297949] rcu: blocking rcu_node structures (internal RCU debug):
[ 367.507066] BTRFS info (device dm-0): scrub: started on devid 1
[ 1897.775836] hrtimer: interrupt took 46667 ns
[ 4972.991891] BTRFS error (device dm-0): unable to fixup (regular) error at logical 97421819904 on dev /dev/mapper/root physical 98605006848
[ 4972.992594] BTRFS warning (device dm-0): checksum error at logical 97421819904 on dev /dev/mapper/root, physical 98605006848, root 257, inode 150744, offset 2215936, length 4096, links 1 (path: ellie/Videos/<path removed since irrelevant to issue>)
[ 6614.136596] BTRFS error (device dm-0): unable to fixup (regular) error at logical 133518196736 on dev /dev/mapper/root physical 134701383680
[ 6614.349954] BTRFS warning (device dm-0): checksum error at logical 133518196736 on dev /dev/mapper/root, physical 134701383680, root 257, inode 149666, offset 655360, length 4096, links 1 (path: ellie/Music/<path removed since irrelevant to issue>)
[ 9736.481347] BTRFS info (device dm-0): scrub: finished on devid 1 with status: 0
Update: It may be related to the instability issues seen here: #805 (closed). But as @Arnavion points out, the cause may not be the GPU but some other component.
What's the expected behaviour?
Applications don't disappear along with an rcu error being logged.
What's the current behaviour?
Applications die along with an rcu error, as seen above.
How to reproduce your issue?
I'm really not sure how or why this started happening, but I wonder if it's related to these fixes here: #805 (closed). While I did freshly reinstall postmarketOS to switch to btrfs to avoid silent SD card corruption, there are no btrfs errors or warnings happening. Also, during all these events syncthing was running and heavily pushing tens of thousands of files, so it should have been hitting the filesystem way more than firefox and VLC, and it never died. It only seems to affect graphical things, like some GPU driver error.
What device are you using?
PinePhone Allwinner 3GB RAM version
On what postmarketOS version did you encounter the issue?
- edge (master branch)
- v24.06
- v23.12 (supported until 2024-07-16)
- I confirm that the issue still is present after running sudo apk upgrade -a
On what environment did you encounter the issue?
Environments
- GNOME Shell on Mobile
- Phosh
- Plasma Mobile
- Sxmo (Wayland/Sway) Please post the output of sxmo_version.sh
- Other: Please fill out
How did you get postmarketOS image?
- from https://images.postmarketos.org
- I built it using pmbootstrap
- It was preinstalled on my device
What's the build date of the image? (in yyyy-mm-dd format)
A few days ago