What's the build date of the image? (in yyyy-mm-dd format)
Additional information
In addition, this device is affected by this modification:
#3363 (closed)
but I don't think it's the reason for this crash. It's "less serious" than the touchscreen freeze, which lasts at least 10 minutes. It doesn't crash every time after the screen wakes up.
As usual: backtraces with debug symbols please. @lm2lm2 it would make sense to provide them upfront as otherwise there's basically no way to make that actionable.
@agx so if I understand correctly, I have to recompile some packages with pmbootstrap? Or just run gdb on the specified process? As the whole GUI crashes, I don't know which process I'm supposed to chase. Thanks!
First step would be to enable coredumps as described in the link. It really is the same as with calls back then, please refer to that discussion as I don't have the bandwidth to guide you through that again atm.
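For reference, a minimal sketch of what enabling coredumps usually boils down to on Linux. The directory, the file name pattern and the function names are assumptions for illustration, not taken from the wiki page; the wiki remains the authoritative description for pmOS.

```shell
#!/bin/sh
# Sketch only: CORE_DIR and the "core.%e.%p" naming are assumptions.
CORE_DIR="${CORE_DIR:-/tmp}"

# Build a kernel core_pattern string: %e = executable name, %p = PID.
core_pattern_for() {
    printf '%s/core.%%e.%%p\n' "$1"
}

# Apply the settings. Writing core_pattern requires root, and the ulimit
# only affects processes started from the shell that ran this function.
enable_coredumps() {
    ulimit -c unlimited
    core_pattern_for "$CORE_DIR" > /proc/sys/kernel/core_pattern
}
```

Because the ulimit is inherited, it has to be set in the environment that starts the session, which is why such lines typically end up in a profile file rather than in an ssh shell.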
I think I did what the wiki link describes, but without installing all of gcc (it requires >100 MB on a device that doesn't have much disk space).
What I don't know is the name of the process I should backtrace, how to do it, and whether it's possible (like for gnome-calls) without killing the GUI or something.
I remember that last time, on phosh, two or three lines in the .profile file made some "logs" get collected, but I don't know at all whether that's the right process.
I don't think you need to build anything. Get the coredump, check which process it belongs to. Install debug packages (or use debuginfod). Run gdb on the core dump to provide the backtrace.
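A rough sketch of the "check which process it belongs to" step. It assumes the `file` utility is installed and that its description of a core file contains a `from '...'` part; the helper names are made up for illustration.

```shell
#!/bin/sh
# Extract the program name from `file` output read on stdin.
# Assumes the output contains: from '<program>'
parse_core_owner() {
    sed -n "s/.*from '\([^']*\)'.*/\1/p"
}

# Print the program that produced the given core file.
core_owner() {
    file "$1" | parse_core_owner
}
```

Usage would be something like `core_owner /tmp/<corefile>`. Loading the core in gdb also prints a "Core was generated by" line, which answers the same question.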
Progress. It doesn't tell us very much yet but it tells us something:
- the crash is likely in phosh (I say likely as, from the information available, it could still be that the session exits for another reason and the phosh crash is just an artifact; unlikely but possible)
- the crash is not in phosh itself but in GTK's gtk_widget_device_is_shadowed
That could still mean that the crash is triggered by phosh but it could also be a bug in GTK. Given that we've not heard from other platforms experiencing the crash it could be one of pmOS's GTK patches but that is just guessing at this point in time.
Could you load the core file again and type `bt full` so we can see where the crash originates (along with some details)? As you reported touch related problems with the kernel driver it could also be that e.g. a stuck touch point triggers this (that would also explain why we're not seeing it elsewhere).
(I'm very much looking forward to when we have systemd here too as that makes this kind of debugging an order of magnitude simpler).
> Given that we've not heard from other platforms experiencing the crash it could be one of pmOSes GTK patches but that is just guessing at this point in time.
Hi, just jumping in here to say that we don't have any patches on top of GTK anymore
Thanks for chiming in. I was imprecise here, sorry. I meant to say Alpine/pmOS. There are two touch related patches still at https://github.com/alpinelinux/aports/tree/master/main/gtk%2B3.0 . I think both are needed (and look o.k. from a quick scan) and just wanted to point out that e.g. a patch getting somehow mangled during rebasing or a similar mishap could cause that. It's highly unlikely as we're not seeing it on other devices, but before reporting e.g. upstream we would want to check it still crashes without those patches. Sorry for causing confusion.
@agx
I figured out that the "blocked" environment in the video comes from the fact that gdb was launched from ssh/console with phosh as its target while phosh was still running.
When I stop gdb, it resumes properly.
What I don't understand is:
- how to run gdb on the phosh process without having to grab it "on the run", i.e. while phosh is already running, since after "killall" it immediately restarts
- how to pass both "bt all" and "run" directly on the gdb command line (is that possible?)
- how to start it under gdb from the postmarketOS boot process, with |tee to capture everything to a file,
a bit like a script called from rc.local that makes gdb run phosh and output everything needed.
Does that look right?
Thank you
attachment : the "#0 __syscall_cp_asm () at src/thread/aarch64/syscall_cp.s:28"
with "bt full" (after it froze)
Only if the information in there is not enough is attaching to the running process needed. That said, if you want to attach gdb to the running process, do so over ssh, as then there's no problem if the process freezes for a moment. This also allows you to e.g. interrupt the process.
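As a sketch of how attaching over ssh can be done non-interactively: gdb accepts commands on its command line via `-ex`, and `-batch` makes it exit (and detach) once they have run. The function name and the default output path are assumptions for illustration.

```shell
#!/bin/sh
# Attach gdb to an already running process, dump backtraces of all
# threads, then detach so the process keeps running.
attach_backtrace() {
    proc_name="$1"                    # e.g. phosh
    out_file="${2:-/tmp/$1-bt.txt}"   # hypothetical default location
    pid="$(pidof -s "$proc_name")" || return 1
    gdb -p "$pid" -batch -ex "thread apply all bt full" 2>&1 | tee "$out_file"
}
```

Run from an ssh session as `attach_backtrace phosh`. The process is paused only while gdb collects the backtraces, and the output is also saved to the file.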
Seems you didn't pick up the core file. That said: I'd not focus on this bug but rather on the kernel touch problems. My assumption is that the fact that you remove the module and reinsert it triggers a bug in GTK. That is still worth fixing but fixing the kernel side would likely resolve your touch not working issues and this crash.
The core file is the file that contains the core dump. See my link to the pmOS documentation above. You can read up more on that e.g. on wikipedia https://en.wikipedia.org/wiki/Core_dump
The thing is, from my point of view, I did exactly what the wiki says, so I don't understand why the output doesn't meet the requirement.
So now, apart from uploading all the files generated in /tmp/, I admit that I'm lost.
The pulseaudio ones look better now (since pa is in `$PATH`). `phosh` isn't in `$PATH` hence:
`gdb </path/to/binary/phosh> <corefile>` for the phosh ones. (I recommend spending a couple of minutes reading up on how gdb works, e.g. https://cgi.cse.unsw.edu.au/~learn/debugging/modules/gdb_coredumps/ , as that makes this way easier and faster for you with fewer turnarounds; a web search brings up more good tutorials.)
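The command above can be wrapped so that the backtrace lands in a file that is easy to attach. This is only a sketch: the function name and the `.bt.txt` suffix are made up, and the binary path has to be whatever it is on the device.

```shell
#!/bin/sh
# Produce a full backtrace from a core file without an interactive session.
core_backtrace() {
    binary="$1"   # full path, needed when the binary is not in $PATH
    core="$2"
    gdb -batch -ex "bt full" "$binary" "$core" 2>&1 | tee "$core.bt.txt"
}
```

Called as `core_backtrace </path/to/binary/phosh> <corefile>`, it prints the backtrace and stores a copy next to the core file.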
That looks better but it still lacks debug symbols for most of the libraries. With those installed (and thus more function names in there) we could try to make a better guess. Note that there's no need to redo the crashes / core files; adding the debug symbols and redoing the backtraces is enough.
That said my comment from above still holds: it'll likely go away when fixing the kernel side.
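A small sketch for the debug symbol part, assuming Alpine's usual convention of shipping symbols in `-dbg` subpackages. Whether each package listed actually has one is an assumption, as is the helper name.

```shell
#!/bin/sh
# Map package names to their presumed debug symbol subpackages.
dbg_packages() {
    for pkg in "$@"; do
        printf '%s-dbg\n' "$pkg"
    done
}

# Example (as root), package list is a guess:
#   apk add $(dbg_packages phosh gtk+3.0 glib musl)
```

After installing them, rerunning gdb on the existing core files is enough; the crash itself does not need to be reproduced.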
Just checked various things: during a SIP call, the crash happened within a few minutes of starting it.
Strange: just before that I had a cellphone call of about 15 min (mobile line, not the SIP line) without any interruption. I don't know if that's normal.
It still doesn't contain a backtrace with debug symbols. Also note: #3371 (comment 460052).
EDIT: You had better backtraces before that had function names instead of `?? ()`, see #3371 (comment 461291) and my comment on that:
Note that there's no need to redo the crashes / core files, adding the debug symbols and redoing the backtraces is enough.
I recommend learning how to do these things with your own custom GTK programs: crash them, apply gdb, etc. It will then be easy to reproduce for phosh once you know how it is supposed to work.
Looks much better. The interesting bit is gtk_menu_button_toggled. Does this maybe only happen after you suspend the device via the top left power menu? (The power menu is the only GtkMenuButton we have in phosh.)
> you suspend the device via the top left power menu?
Well, I rarely use it to suspend the device: either I let the inactivity timer do it, or I use the power button itself. Sometimes I just use the menu to reboot, when I'm not in ssh/root.
As for the "crash", it happens when I'm not using the device or when I try to unlock it, within 15 seconds. The touchscreen is then weird: only the top part responds, as I can pull the notification bar down, but I can't swipe the other way, e.g. to make the keypad for the PIN code appear.
It then takes about 1 min for the GUI to become completely available. Maybe I should make a video of this?
In the first place I'd look at the kernel in case the driver still has the issues you mentioned in #3363 (closed). That will fix a heap of problems (likely including this one).
If you rather want to continue on the Phosh/GTK side:
If your line numbers in GTK match mine we're crashing on
which is taken from the gesture in event_controller_grab_notify.
So the device isn't valid anymore (which would match the fact that you forcefully add/remove the touch driver in your workaround, #3363 (closed)): GTK freed its memory but this part of GTK doesn't know about it.
But the actual question is why we end up in `gtk_popover_show ()` at all. We have gtk_toggle_button_toggled further up in the backtrace. But there's no toggle button on the actual lock screen, though there is one in the audio settings (when you pull down the top-bar). Is that somehow involved in what you do? (EDIT: but that button has no menu so I still assume what you're triggering is the power menu as that is a GtkMenuButton and has a popover so would match the backtrace.) Likely you have a stuck touch point, maybe try https://gitlab.gnome.org/World/Phosh/phoc/-/merge_requests/615 , we might be lucky and spot that on screen.
(But let me repeat: fixing (or adding a workaround) to the kernel will likely make this disappear too.)
It seems the backtrace didn't pick up the gdb macros shipped by glib (Debian has them at `/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.8200.4-gdb.py`). This would give us more information about the types and signals involved. Is there an extra package needed to get those? Maybe the glib-dev package? If so could you redo the backtrace with that installed (no need to repeat the crash, just need to rerun the gdb command).
> But there's no toggle button on the actual lock screen but there is one in the audio settings (when you pull down the top-bar). Is that somehow involved in what you do?
Well, when the crash happens, I have to check whether the touchscreen works, so I swipe either bottom>top or top>bottom. Most often it's top>bottom, because bottom>top sometimes takes several tries to "grab" the keypad. So I swipe top>bottom more often and I guess that's causing the crash, but sometimes it also crashes on the unlocked screen, just after unlocking, on the main screen with several apps open.
What is also really strange is that when the bug occurs, only swiping top>bottom from the edge of the screen works; touches on the screen itself don't, so there's no way to toggle the buttons in the notification panel.
Please note that this device was restarted a few hours ago and the volup/power combination has not been used since, even though the root crontab turns the screen on every 3 minutes. I mainly use the volup thing only when it doesn't respond for minutes, without doing what happens in the video.
As noted above: if you enable touch point debugging you'd see if the compositor recognizes the touch point (i.e. if the kernel is behaving correctly). Phoc 0.45 will allow toggling that at runtime.
That worked, nice! This shows us that PhoshTopPanel is the parent of our Popover menu so our guess with the power button menu was right. Other parts show that PhoshHome is in the mix too when notifying the grab.
Can you check which variable is actually triggering the issue? I assume it's device. A `print *device` should tell.
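A sketch of how that check could be run against the existing core file. The frame number that holds `device` is an assumption and has to be read off the backtrace first; the helper name is made up.

```shell
#!/bin/sh
# Select a stack frame in a core file, list its locals and print one expression.
inspect_frame() {
    binary="$1"   # full path to the crashed binary
    core="$2"     # core file
    frame="$3"    # frame number taken from the backtrace (assumption)
    expr="$4"     # expression to print, e.g. '*device'
    gdb -batch -ex "frame $frame" -ex "info locals" -ex "print $expr" "$binary" "$core"
}
```

For example `inspect_frame </path/to/binary/phosh> <corefile> 0 '*device'`. If gdb reports that it cannot access the memory, that would fit the freed device theory.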
Is that still with the touch controller hack to reload the module?