List items

Items from the current list are shown below.


13 Oct 2023 : Day 58 #
This morning I'm on the train to work after mulling the crash from yesterday. The line that's causing trouble, right at the top of the stack is this one:
    auto kind = widget::ThemeChangeKind(aData[0]);
I've intentionally included the line before as well here, because it doesn't look like the segmentation fault is happening inside ThemeChanged(), but instead is happening on the line before. Maybe the compiler optimised them together?

Looking at the parameters passed on the stack to PresShell::Observe() we can see that aData is a null pointer. So dereferencing it is going to cause a crash.

This is a "look-and-feel-changed" message, and the fact this has only started happening since we sorted out some of the JavaScript suggests that it's being triggered by something in the JavaScript. But that doesn't mean that's where the error is emanating from of course. It pleases me that the "look and feel" code is part of the code I had to mess around with earlier. That means it's somewhat of a known quantity.

The fact this is a "received" message means its origin is going to be masked by the message queue. This means there's no point looking further down the call stack to try to determine where it was sent from. Instead I'm going to grep the code for "look-and-feel-changed" and see what the possibilities are.

It's worth noting that there aren't any instances of this in the embedlite-embedding code, so this didn't come directly from the JavaScript we were just hacking around with.

There are quite a few in the gecko coded though:
$ grep -rIn ""look-and-feel-changed"" *
  os->AddObserver(this, "look-and-feel-changed", false);
  os->RemoveObserver(this, "look-and-feel-changed");
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
  Services.obs.addObserver(this, "look-and-feel-changed");
  Services.obs.removeObserver(this, "look-and-feel-changed");
  if (topic != "look-and-feel-changed") {
  obs->NotifyObservers(nullptr, "look-and-feel-changed", kind);
  obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);
  +            obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);
Only the last three of these are actually of interest. It's worth noting patch 0093 "Add support for prefers-color-scheme" is one of the ones we've applied. It adds support for dark mode, which doesn't seem essential for the build to complete so I'm not sure why I applied it, but there it is.

The line in nsLookAndFeel.cpp that also happens to be one of the lines added by the patch, is notable in that it sends in a nullptr for the data value. That could be the null value that PresShell is choking on.

The patch hasn't changed since ESR 78 so I'm curious to know why it didn't cause trouble in previous versions. The answer to this lies in the PressShell code from ESR 78:
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
    return NS_OK;
Notice there's no kin parameter being passed in here. That means that what wasn't expecting a data parameter before, is expecting one now. Let's find out why that is.
$ git blame layout/base/PresShell.cpp -L 9873
bb457d5b09e9a layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-05-22 23:03:43 +0000  9873)
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
d622f54db0014 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-10-27 10:24:40 +0000  9874)
  // See how LookAndFeel::NotifyChangedAllWindows encodes this.
0080b2b390d92 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2021-05-13 01:53:08 +0200  9875)
  auto kind = widget::ThemeChangeKind(aData[0]);
d622f54db0014 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-10-27 10:24:40 +0000  9876)
7f498141e7939 layout/base/PresShell.cpp
  (Adam Gashlin          2019-09-19 19:05:01 +0000  9877)
  return NS_OK;
7f498141e7939 layout/base/PresShell.cpp
  (Adam Gashlin          2019-09-19 19:05:01 +0000  9878)

$ git log -1 d622f54db0014
commit d622f54db00143472ac60aecf7e18728c6a5908c
Author: Emilio Cobos Álvarez 
Date:   Tue Oct 27 10:24:40 2020 +0000

    Bug 1668875 - Distinguish theme changes that can and cannot affect style/layout. r=tnikkel
    This should make the optimization landed earlier in this bug apply for
    some of the NotifyThemeChanged() calls in nsWindow.cpp which are causing
    all the extra invalidations.
    If we know that system colors/fonts didn't change, we can avoid doing a
    bunch of reflow work and the patch from earlier in the bug can avoid
    re-rasterizing images too.
    Differential Revision:
From this diff we can see there's a new file widget/ThemeChangeKind.h with a new enum:
enum class ThemeChangeKind : uint8_t {
  // This is the cheapest change, no need to forcibly recompute style and/or
  // layout.
  MediaQueriesOnly = 0,
  // Style needs to forcibly be recomputed because some of the stuff that may
  // have changed, like system colors, are reflected in the computed style but
  // not in the specified style.
  Style = 1 << 0,
  // Layout needs to forcibly be recomputed because some of the stuff that may
  // have changed is layout-dependent, like system font.
  Layout = 1 << 1,
  // The union of the two flags above.
  StyleAndLayout = Style | Layout,
  // For IPC serialization purposes.
  AllBits = Style | Layout,
The new parameter is supposed to be set as one of these enum values.

Rather than setting it explicitly, there is now this method that's part of the LookAndFeel class. It's static but also public and inherited by nsXPLookAndFeel which is itself inherited by our Qt nsLookAndFeel class, so we should be able to access it directly.

So I have replaced this: obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr); With this new call that essentially wraps the same code in a function:
It's time to set things compiling. Building using mach will take at least a couple of hours, so I'm attempting to use the shortcut described on the "Working with Browser" page. Not just the compile step but also the creation step. This should be a lot quicker.

The build is indeed quicker, although copying the library, including all the debug symbols, over to my phone took an age. Admittedly not a seven-hour age, but still quite some time.

On executing the browser many (but not all) of the JavaScript errors are now gone. There's no segfault any more and no other crashes. The rendering still doesn't work, but we do at least now also have logging:
$ EMBED_CONSOLE=1 sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "" not found
library "" not found
greHome from GRE_HOME:/usr/bin is not found, in /usr/bin/
Created LOG for EmbedLiteTrace
CONSOLE message:
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@415.0 (GIT@248cd04, I42b5383e2c, 1569430435) (Date:09/25/19)
Vendor: Qualcomm
Renderer: Adreno (TM) 610
FBO Texture Target: TEXTURE_2D
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteChromeManager.js, line 213: NS_ERROR_FILE_NOT_FOUND: 
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Tomorrow I'll continue working on the rendering pipeline.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.


Uncover Disqus comments