flypig.co.uk

12 Oct 2023 : Day 57
We continue today with the render pipeline, following application of a few additional patches yesterday. We also set the values correctly for a couple of preferences (or, more accurately, changed the hard-coded values that are currently acting as proxies for them).

After building packages overnight I've been testing them this morning. Yesterday the initial crash happened due to an attempt to call a SwapChain method before having instantiated the class. Recall that we hit a segmentation fault as a result with the following backtrace.
Thread 35 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9321]
mozilla::gl::SwapChain::OffscreenSize (this=0x0) at 
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
129       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
#1  0x0000007fbcc8149c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7f8859d7f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88737560, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#3  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8859d7f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8859d7f0, x=, y=, 
    width=, height=) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/CompositorBridgeParent.cpp:794
I wasn't able to replicate this backtrace on ESR 78, and the reason turned out to be that ESR 91 was set to render off-screen.

After applying the patches and preference fixes there's no longer a crash. But do we still get this backtrace? I've abridged the output for clarity, but this is what happens:
$ EMBED_CONSOLE=1 gdb sailfish-browser
GNU gdb (GDB) Mer (8.2.1+git9)
(gdb) b EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
(gdb) r
Starting program: /usr/bin/sailfish-browser 
[...]
Thread 34 "Compositor" hit Breakpoint 1, non-virtual thunk to mozilla::
  embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
  (mozilla::layers::BaseTransactionId) ()
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.h:58
58	  virtual void CompositeToDefaultTarget(VsyncId aId) override;
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget(mozilla::layers::BaseTransactionId
    ) () at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/
    mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.h:58
#1  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88709ef0, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#2  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8877d1c0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#3  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8877d1c0, x=,
    y=, width=, height=) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:794
#4  0x0000007fba8ca870 in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
(gdb) 
I'm a little surprised to see that this still matches up with the path from yesterday: ResumeCompositionAndResize() calling ResumeComposition() calling ForceComposeToTarget() calling CompositeToDefaultTarget().

In comparison, on ESR 78 the first hit is ResumeComposition() calling CompositeToDefaultTarget() directly:
Thread 40 "Compositor" hit Breakpoint 1, non-virtual thunk to mozilla::
    embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
    (mozilla::layers::BaseTransactionId) ()
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/
    sailfishos/embedthread/EmbedLiteCompositorBridgeParent.h:58
58        virtual void CompositeToDefaultTarget(VsyncId aId) override;
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
    (mozilla::layers::BaseTransactionId) () at /usr/src/
    debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.h:58
#1  0x0000007ff2a729b0 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=0x7fb89be110)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#2  0x0000007ff2a662ec in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1188
[...]
#15 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
It's always hard to follow the call stack back before any of the Runnable steps, because these are events that have potentially been scheduled on a separate thread. Looking at the code, though, it seems the resize may be triggered by a call to CompositorBridgeParent::ScheduleResumeOnCompositorThread(). This resizes depending on whether or not it's given dimension arguments.
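
To make the two points above concrete, here's a toy model written in JavaScript. It's only a sketch and none of it is Gecko code: the class, the method names and the queue are all hypothetical stand-ins. It shows a scheduling call choosing between a plain resume and a resume-with-resize depending on whether dimensions were passed in, and it shows why the scheduling function is no longer on the stack by the time the queued task actually runs.

```javascript
// Toy model only: none of this is Gecko code and every name is hypothetical.
class CompositorModel {
  constructor() {
    this.pending = []; // stands in for the compositor thread's event queue
    this.calls = [];   // records what eventually ran, in order
    this.size = null;
  }

  // Loosely mirrors the idea of ScheduleResumeOnCompositorThread(): queue one
  // of two tasks depending on whether dimensions were supplied.
  scheduleResume(x, y, width, height) {
    const hasDimensions = [x, y, width, height].every(Number.isInteger);
    if (hasDimensions) {
      this.pending.push(() => this.resumeAndResize(x, y, width, height));
    } else {
      this.pending.push(() => this.resume());
    }
  }

  resumeAndResize(x, y, width, height) {
    this.calls.push("resumeAndResize");
    this.size = { x, y, width, height };
    this.resume();
  }

  resume() {
    this.calls.push("resume");
  }

  // Stands in for the compositor thread draining its queue. By the time a
  // task runs here, the function that scheduled it has already returned.
  drain() {
    while (this.pending.length > 0) {
      this.pending.shift()();
    }
  }
}

const model = new CompositorModel();
model.scheduleResume(0, 0, 1080, 2520); // with dimensions: resume and resize
model.scheduleResume();                 // without dimensions: plain resume
const beforeDrain = model.calls.length; // nothing has run yet
model.drain();
console.log(beforeDrain, model.calls, model.size);
```

In this model a debugger stopped inside resume() would see drain() on the stack but never scheduleResume(), which is the same reason the real backtraces bottom out at the Runnable machinery.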

So I've placed a breakpoint on ScheduleResumeOnCompositorThread() and kicked off the execution again for both versions.

On the ESR 91 side I get this:
Thread 1 "sailfish-browse" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7f8877d670, 
    x=0, y=0, width=1080, height=2520) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/CompositorBridgeParent.cpp:829
829	                                                              int height) {
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7f8877d670, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:829
#1  0x0000007fbcc81b74 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    ResumeRendering (this=0x7f8877d670)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:295
#2  0x0000007fbcc99ccc in mozilla::embedlite::EmbedLiteWindowParent::
    ResumeRendering (this=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/EmbedLiteWindowParent.cpp:100
#3  0x0000007fbcc84688 in mozilla::embedlite::EmbedLiteWindow::
    ResumeRendering (this=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    EmbedLiteWindow.cpp:70
#4  0x000000555559004c in _start ()
(gdb) 
While on the ESR 78 side I get this:
Thread 1 "sailfish-browse" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7fb89c5e20, 
    x=0, y=0, width=1080, height=2520) at /usr/src/debug/
    xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:814
814       MonitorAutoLock lock(mResumeCompositionMonitor);
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7fb89c5e20, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
    layers/ipc/CompositorBridgeParent.cpp:814
#1  0x0000007ff4d83188 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    ResumeRendering (this=0x7fb89c5e20)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/
    sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:279
#2  0x000000555559004c in _start ()
(gdb) 
It's also notable that on the ESR 78 side I'm getting this log output, which isn't appearing for ESR 91:
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@0502.0 (GIT@704ecd9a2b, Ib3f3e69395, 1609240670) (Date:12/29/20)
Vendor: Qualcomm
Renderer: Adreno (TM) 619
FBO Texture Target: TEXTURE_2D
These are all hinting at important differences between the two. Plenty to look into.

But I have to stop for a bit; I'll return to this later.

[...]

Now I'm back and trying to focus again. I've placed a couple of breakpoints on CompositorOGL::Initialize(), which is where the "OpenGL compositor Initialized Succesfully" text is being printed from. My guess is that this will fire on ESR 78 but not on ESR 91. I'm hoping that the ESR 78 backtrace will offer up some hints.

In fact the breakpoint fires for both executables. Here's the ESR 91 backtrace:
Thread 34 "Compositor" hit Breakpoint 2, mozilla::layers::CompositorOGL::
    Initialize (this=0x7eb8110650, out_failureReason=0x7f9c078560)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:380
380	bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
(gdb) bt
#0  mozilla::layers::CompositorOGL::Initialize (this=0x7eb8110650,
    out_failureReason=0x7f9c078560)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:380
#1  0x0000007fba8e03c4 in mozilla::layers::CompositorBridgeParent::NewCompositor
    (this=this@entry=0x7f884c91c0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1493
#2  0x0000007fba8eb440 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f884c91c0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1436
#3  0x0000007fba8eb570 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f884c91c0,
    aBackendHints=..., aId=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1546
#4  0x0000007fbcc81dac in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f884c91c0, aBackendHints=..., 
    aId=...) at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:86
#5  0x0000007fba27f8a4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f884c91c0, msg__=...) at
    PCompositorBridgeParent.cpp:1285
[...]
#20 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
The backtrace from ESR 78 is basically identical, just with slightly different line numbers.

Stepping through the method makes clear that it is firing the same debug output on both after all. For some reason it's not showing on the ESR 91 version, but it is getting logged. So this isn't the smoking gun I thought it might be.

This isn't quite a dead end though: it raises a new question about log output. Since logging is clearly broken in ESR 91, it's worth getting it to work before proceeding. Having logging working could pay dividends in the long run. So I'm diverting briefly to try to address this.

Let's get straight into it. When we execute the browser we get immediate errors that look like this:
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteConsoleListener.js, line 251: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
  ContentPermissionManager.js, line 94: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
These are going to stop a lot of the front-end and logging capabilities from working, even if they wouldn't necessarily stop the browser from rendering.

Looking through the code it quickly becomes clear that XPCOMUtils.generateNSGetFactory() has been moved from the XPCOMUtils.jsm file to the ComponentUtils.jsm file. Therefore we have to add a line and amend a line, so that instead of this:
const { XPCOMUtils } = ChromeUtils.import("resource://gre/modules/XPCOMUtils.jsm");
const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
[...]
this.NSGetFactory = XPCOMUtils.generateNSGetFactory([$EmbedLiteConsoleListener]);
we have this:
const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");
const { XPCOMUtils } = ChromeUtils.import("resource://gre/modules/XPCOMUtils.jsm");
const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
[...]
this.NSGetFactory = ComponentUtils.generateNSGetFactory([$EmbedLiteConsoleListener]);
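
As an aside, the failure and the fix can be reproduced outside Gecko with a couple of stub objects. This is a minimal sketch under loud assumptions: the two module objects below are stand-ins for what ChromeUtils.import() returns, the body of generateNSGetFactory() is a drastic simplification rather than the real implementation, and resolveFactoryHelper() and the class ID are hypothetical names introduced purely for illustration.

```javascript
// Stand-ins for the two modules; in Gecko these come from ChromeUtils.import().
// The stub bodies are hypothetical and only mimic where the helper lives.
const XPCOMUtils = {}; // ESR 91 layout: the helper is no longer here
const ComponentUtils = {
  // Drastically simplified: returns a lookup by classID, not an XPCOM factory.
  generateNSGetFactory(components) {
    return (classID) => components.find((c) => c.prototype.classID === classID);
  },
};

function EmbedLiteConsoleListener() {}
EmbedLiteConsoleListener.prototype = { classID: "hypothetical-class-id" };

// Hypothetical helper: pick whichever module actually provides the function.
function resolveFactoryHelper(...candidates) {
  const provider = candidates.find(
    (mod) => typeof mod.generateNSGetFactory === "function"
  );
  if (!provider) {
    throw new TypeError("generateNSGetFactory is not available");
  }
  return provider.generateNSGetFactory.bind(provider);
}

// The old call site fails in the same way as the errors in the log.
let oldCallError = null;
try {
  XPCOMUtils.generateNSGetFactory([EmbedLiteConsoleListener]);
} catch (e) {
  oldCallError = e;
}

// The new call site works because the helper is found on ComponentUtils.
const generateNSGetFactory = resolveFactoryHelper(XPCOMUtils, ComponentUtils);
const NSGetFactory = generateNSGetFactory([EmbedLiteConsoleListener]);
console.log(
  oldCallError instanceof TypeError,
  NSGetFactory("hypothetical-class-id") === EmbedLiteConsoleListener
);
```

A lookup like resolveFactoryHelper() would only be worth having if the same file needed to load on both ESR 78 and ESR 91. For a straight port, the direct swap to ComponentUtils is simpler.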
Swapping in ComponentUtils in EmbedLiteConsoleListener.js alone is enough to get logging working. For good measure, though, I should quickly make the same change in all of the files shown in the logs. So that's all of these files that were generating the error:
EmbedLiteConsoleListener.js
ContentPermissionManager.js
EmbedLiteChromeManager.js
EmbedLiteErrorPageHandler.js
EmbedLiteFaviconService.js
EmbedLiteOrientationChangeHandler.js
EmbedLiteSearchEngine.js
EmbedLiteSyncService.js
EmbedLiteWebrtcUI.js
EmbedPrefService.js
EmbedliteDownloadManager.js
LoginsHelper.js
PrivateDataManager.js
UserAgentOverrideHelper.js
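
With this many files the edit is mechanical enough to script. The following is only a sketch of how that might look, not the method actually used here: migrateFactoryCall() is a hypothetical helper that works on a file's text, and it assumes each file calls the helper as XPCOMUtils.generateNSGetFactory( in exactly that form. It prepends the import at the very top of the text, so a file that opens with a licence header would need the new line moving by hand afterwards.

```javascript
const COMPONENT_UTILS_IMPORT =
  'const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");';

// Hypothetical helper: apply the two-part edit to one file's source text.
function migrateFactoryCall(source) {
  const oldCall = "XPCOMUtils.generateNSGetFactory(";
  if (!source.includes(oldCall)) {
    return source; // nothing to migrate, or already migrated
  }
  // Amend every call site to use the helper's new home.
  const migrated = source
    .split(oldCall)
    .join("ComponentUtils.generateNSGetFactory(");
  // Add the import once, ahead of the existing text.
  return migrated.includes(COMPONENT_UTILS_IMPORT)
    ? migrated
    : COMPONENT_UTILS_IMPORT + "\n" + migrated;
}

// Example input, using a made-up component name.
const before = [
  'const { XPCOMUtils } = ChromeUtils.import("resource://gre/modules/XPCOMUtils.jsm");',
  "this.NSGetFactory = XPCOMUtils.generateNSGetFactory([ExampleComponent]);",
].join("\n");

const after = migrateFactoryCall(before);
console.log(after);
```

The existing XPCOMUtils import is deliberately left in place, since the files may still use other parts of that module. Running the function a second time on its own output leaves the text unchanged.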
After updating these files the debug output is both cleaner and more informative. But the browser also now crashes with the following output:
$ EMBED_CONSOLE=1 sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[...]
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@415.0 (GIT@248cd04, I42b5383e2c, 1569430435) (Date:09/25/19)
Vendor: Qualcomm
Renderer: Adreno (TM) 610
FBO Texture Target: TEXTURE_2D
Segmentation fault (core dumped)
And with the following backtrace.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7247]
mozilla::PresShell::Observe (this=0x7f8895ec90, aSubject=,
    aTopic=0x7fbe27b780 "look-and-feel-changed", aData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/base/PresShell.cpp:9876
9876        ThemeChanged(kind);
(gdb) bt
#0  mozilla::PresShell::Observe (this=0x7f8895ec90, aSubject=,
    aTopic=0x7fbe27b780 "look-and-feel-changed", aData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/base/PresShell.cpp:9876
#1  0x0000007fb9dab038 in nsObserverList::NotifyObservers (this=,
    aSubject=aSubject@entry=0x0, 
    aTopic=aTopic@entry=0x7fbe27b780 "look-and-feel-changed", someData=someData@entry=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsTArray.h:413
#2  0x0000007fb9db6718 in nsObserverService::NotifyObservers (this=0x7f88046f30,
    aSubject=0x0, aTopic=0x7fbe27b780 "look-and-feel-changed", aSomeData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsObserverService.cpp:291
#3  0x0000007fbc0c0060 in nsLookAndFeel::Observer::Observe (this=,
    aTopic=, aData=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/widget/qt/nsLookAndFeel.cpp:94
#4  0x0000007fb9dab038 in nsObserverList::NotifyObservers (this=,
    aSubject=aSubject@entry=0x0, 
    aTopic=aTopic@entry=0x7f88a092d8 "ambience-theme-changed",
    someData=someData@entry=0x7f889b37c8 u"dark")
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsTArray.h:413

[...]
#34 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
I can only apologise for all of the backtraces today. These aren't so much rabbit holes as full-blown rabbit warrens. But this is the nature of the work right now, and I'm afraid it's likely to continue to be until this rendering is working. Probably even a lot beyond that.

As for now, these horrific backtraces have led us somewhere useful today, but actually getting to the bottom of it will have to wait until tomorrow.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
