List items
Items from the current list are shown below.
Gecko
13 Oct 2023 : Day 58 #
This morning I'm on the train to work after mulling the crash from yesterday. The line that's causing trouble, right at the top of the stack is this one:
Looking at the parameters passed on the stack to PresShell::Observe() we can see that aData is a null pointer. So dereferencing it is going to cause a crash.
This is a "look-and-feel-changed" message, and the fact this has only started happening since we sorted out some of the JavaScript suggests that it's being triggered by something in the JavaScript. But that doesn't mean that's where the error is emanating from of course. It pleases me that the "look and feel" code is part of the code I had to mess around with earlier. That means it's somewhat of a known quantity.
The fact this is a "received" message means its origin is going to be masked by the message queue. This means there's no point looking further down the call stack to try to determine where it was sent from. Instead I'm going to grep the code for "look-and-feel-changed" and see what the possibilities are.
It's worth noting that there aren't any instances of this in the embedlite-embedding code, so this didn't come directly from the JavaScript we were just hacking around with.
There are quite a few in the gecko coded though:
The line in nsLookAndFeel.cpp that also happens to be one of the lines added by the patch, is notable in that it sends in a nullptr for the data value. That could be the null value that PresShell is choking on.
The patch hasn't changed since ESR 78 so I'm curious to know why it didn't cause trouble in previous versions. The answer to this lies in the PressShell code from ESR 78:
Rather than setting it explicitly, there is now this method that's part of the LookAndFeel class. It's static but also public and inherited by nsXPLookAndFeel which is itself inherited by our Qt nsLookAndFeel class, so we should be able to access it directly.
So I have replaced this: obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr); With this new call that essentially wraps the same code in a function:
The build is indeed quicker, although copying the library, including all the debug symbols, over to my phone took an age. Admittedly not a seven-hour age, but still quite some time.
On executing the browser many (but not all) of the JavaScript errors are now gone. There's no segfault any more and no other crashes. The rendering still doesn't work, but we do at least now also have logging:
As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
auto kind = widget::ThemeChangeKind(aData[0]); ThemeChanged(kind);I've intentionally included the line before as well here, because it doesn't look like the segmentation fault is happening inside ThemeChanged(), but instead is happening on the line before. Maybe the compiler optimised them together?
Looking at the parameters passed on the stack to PresShell::Observe() we can see that aData is a null pointer. So dereferencing it is going to cause a crash.
This is a "look-and-feel-changed" message, and the fact this has only started happening since we sorted out some of the JavaScript suggests that it's being triggered by something in the JavaScript. But that doesn't mean that's where the error is emanating from of course. It pleases me that the "look and feel" code is part of the code I had to mess around with earlier. That means it's somewhat of a known quantity.
The fact this is a "received" message means its origin is going to be masked by the message queue. This means there's no point looking further down the call stack to try to determine where it was sent from. Instead I'm going to grep the code for "look-and-feel-changed" and see what the possibilities are.
It's worth noting that there aren't any instances of this in the embedlite-embedding code, so this didn't come directly from the JavaScript we were just hacking around with.
There are quite a few in the gecko coded though:
$ grep -rIn ""look-and-feel-changed"" * gecko-dev/layout/base/PresShell.cpp:1004: os->AddObserver(this, "look-and-feel-changed", false); gecko-dev/layout/base/PresShell.cpp:1337: os->RemoveObserver(this, "look-and-feel-changed"); gecko-dev/layout/base/PresShell.cpp:9873: if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) { gecko-dev/browser/base/content/browser.js:8677: Services.obs.addObserver(this, "look-and-feel-changed"); gecko-dev/browser/base/content/browser.js:8694: Services.obs.removeObserver(this, "look-and-feel-changed"); gecko-dev/browser/base/content/browser.js:8702: if (topic != "look-and-feel-changed") { gecko-dev/widget/nsXPLookAndFeel.cpp:902: obs->NotifyObservers(nullptr, "look-and-feel-changed", kind); gecko-dev/widget/qt/nsLookAndFeel.cpp:94: obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr); rpm/0093-sailfishos-gecko-Add-support-for-prefers-color-schem.patch:95: + obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);Only the last three of these are actually of interest. It's worth noting patch 0093 "Add support for prefers-color-scheme" is one of the ones we've applied. It adds support for dark mode, which doesn't seem essential for the build to complete so I'm not sure why I applied it, but there it is.
The line in nsLookAndFeel.cpp that also happens to be one of the lines added by the patch, is notable in that it sends in a nullptr for the data value. That could be the null value that PresShell is choking on.
The patch hasn't changed since ESR 78 so I'm curious to know why it didn't cause trouble in previous versions. The answer to this lies in the PressShell code from ESR 78:
if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) { ThemeChanged(); return NS_OK; }Notice there's no kin parameter being passed in here. That means that what wasn't expecting a data parameter before, is expecting one now. Let's find out why that is.
$ git blame layout/base/PresShell.cpp -L 9873 bb457d5b09e9a layout/base/PresShell.cpp (Emilio Cobos Álvarez 2020-05-22 23:03:43 +0000 9873) if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) { d622f54db0014 layout/base/PresShell.cpp (Emilio Cobos Álvarez 2020-10-27 10:24:40 +0000 9874) // See how LookAndFeel::NotifyChangedAllWindows encodes this. 0080b2b390d92 layout/base/PresShell.cpp (Emilio Cobos Álvarez 2021-05-13 01:53:08 +0200 9875) auto kind = widget::ThemeChangeKind(aData[0]); d622f54db0014 layout/base/PresShell.cpp (Emilio Cobos Álvarez 2020-10-27 10:24:40 +0000 9876) ThemeChanged(kind); 7f498141e7939 layout/base/PresShell.cpp (Adam Gashlin 2019-09-19 19:05:01 +0000 9877) return NS_OK; 7f498141e7939 layout/base/PresShell.cpp (Adam Gashlin 2019-09-19 19:05:01 +0000 9878) } $ git log -1 d622f54db0014 commit d622f54db00143472ac60aecf7e18728c6a5908c Author: Emilio Cobos ÁlvarezFrom this diff we can see there's a new file widget/ThemeChangeKind.h with a new enum:Date: Tue Oct 27 10:24:40 2020 +0000 Bug 1668875 - Distinguish theme changes that can and cannot affect style/layout. r=tnikkel This should make the optimization landed earlier in this bug apply for some of the NotifyThemeChanged() calls in nsWindow.cpp which are causing all the extra invalidations. If we know that system colors/fonts didn't change, we can avoid doing a bunch of reflow work and the patch from earlier in the bug can avoid re-rasterizing images too. Differential Revision: https://phabricator.services.mozilla.com/D94425
enum class ThemeChangeKind : uint8_t { // This is the cheapest change, no need to forcibly recompute style and/or // layout. MediaQueriesOnly = 0, // Style needs to forcibly be recomputed because some of the stuff that may // have changed, like system colors, are reflected in the computed style but // not in the specified style. Style = 1 << 0, // Layout needs to forcibly be recomputed because some of the stuff that may // have changed is layout-dependent, like system font. Layout = 1 << 1, // The union of the two flags above. StyleAndLayout = Style | Layout, // For IPC serialization purposes. AllBits = Style | Layout, };The new parameter is supposed to be set as one of these enum values.
Rather than setting it explicitly, there is now this method that's part of the LookAndFeel class. It's static but also public and inherited by nsXPLookAndFeel which is itself inherited by our Qt nsLookAndFeel class, so we should be able to access it directly.
So I have replaced this:
NotifyChangedAllWindows(widget::ThemeChangeKind::StyleAndLayout);It's time to set things compiling. Building using mach will take at least a couple of hours, so I'm attempting to use the shortcut described on the "Working with Browser" page. Not just the compile step but also the libxul.so creation step. This should be a lot quicker.
The build is indeed quicker, although copying the library, including all the debug symbols, over to my phone took an age. Admittedly not a seven-hour age, but still quite some time.
On executing the browser many (but not all) of the JavaScript errors are now gone. There's no segfault any more and no other crashes. The rendering still doesn't work, but we do at least now also have logging:
$ EMBED_CONSOLE=1 sailfish-browser [D] unknown:0 - Using Wayland-EGL library "libGLESv2_adreno.so" not found library "eglSubDriverAndroid.so" not found greHome from GRE_HOME:/usr/bin libxul.so is not found, in /usr/bin/libxul.so Created LOG for EmbedLiteTrace [...] CONSOLE message: OpenGL compositor Initialized Succesfully. Version: OpenGL ES 3.2 V@415.0 (GIT@248cd04, I42b5383e2c, 1569430435) (Date:09/25/19) Vendor: Qualcomm Renderer: Adreno (TM) 610 FBO Texture Target: TEXTURE_2D JavaScript error: file:///usr/lib64/mozembedlite/components/ EmbedLiteChromeManager.js, line 213: NS_ERROR_FILE_NOT_FOUND: JSScript: ContextMenuHandler.js loaded JSScript: SelectionPrototype.js loaded JSScript: SelectionHandler.js loaded JSScript: SelectAsyncHelper.js loaded JSScript: FormAssistant.js loaded JSScript: InputMethodHandler.js loaded EmbedHelper init called [...]Tomorrow I'll continue working on the rendering pipeline.
As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comments
Uncover Disqus comments