flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.


Gecko


22 Feb 2024 : Day 164
While working on the WebView implementation yesterday, we reached the point where the WebView component no longer crashed the app hosting it. We did this by ensuring the correct layer manager was used for rendering.

But now we're left with a bunch of errors. The ones that need fixing immediately are the following:
[W] unknown:7 - file:///usr/share/harbour-webview/qml/harbour-webview.qml:7:30:
    Type WebViewPage unavailable 
         initialPage: Component { WebViewPage { } } 
                                  ^
[W] unknown:13 - file:///usr/share/harbour-webview/qml/pages/
    WebViewPage.qml:13:5: Type WebView unavailable 
         WebView { 
         ^
[W] unknown:141 - file:///usr/lib64/qt5/qml/Sailfish/WebView/
    WebView.qml:141:9: Type TextSelectionController unavailable 
             TextSelectionController { 
             ^
[W] unknown:14 - file:///usr/lib64/qt5/qml/Sailfish/WebView/Controls/
    TextSelectionController.qml:14:1: module "QOfono" is not installed 
     import QOfono 0.2 
     ^
This whole cascade of errors reduces to the last one:
[W] unknown:14 - file:///usr/lib64/qt5/qml/Sailfish/WebView/Controls/
    TextSelectionController.qml:14:1: module "QOfono" is not installed 
     import QOfono 0.2 
     ^
The reason for this is also clear. The spec file for sailfish-components-webview makes clear that libqofono 0.117 or above is needed. I don't have this on my system for whatever reason (I'll need to investigate), but to work around this I hacked the spec file so that it wouldn't refuse to install on a system with a lower version, like this:
diff --git a/rpm/sailfish-components-webview.spec
           b/rpm/sailfish-components-webview.spec
index 766933ba..c311ebcf 100644
--- a/rpm/sailfish-components-webview.spec
+++ b/rpm/sailfish-components-webview.spec
@@ -18,7 +18,7 @@ Requires: sailfishsilica-qt5 >= 1.1.123
 Requires: sailfish-components-media-qt5
 Requires: sailfish-components-pickers-qt5
 Requires: embedlite-components-qt5 >= 1.21.2
-Requires: libqofono-qt5-declarative >= 0.117
+Requires: libqofono-qt5-declarative >= 0.115
 
 %description
 %{summary}.
There's no build-time requirement, so I thought I might get away with it. But clearly not.

It seems a bit odd that a text selector component should be requiring an entire separate phone library in order to work. Let's take a look at why.

The ofono code comes at the end of the file. There are two OfonoNetworkRegistration components called cellular1Status and cellular2Status. These represent the state of the two SIM card slots in the device. You might ask why there are only two; can't you have more than two SIM card slots? Well, yes, but I guess this is a problem for future developers to deal with.

These two components feed into the following Boolean value at the top of the code:
    readonly property bool _canCall: cellular1Status.registered
        || cellular2Status.registered
Later on in the code we see this being used, like this:
        isPhoneNumber = _canCall && _phoneNumberSelected
So what's this all for? When you select some text the browser will present you with some options for what to do with it. Copy to clipboard? Open a link? If it thinks it's a phone number it will offer to make a call to it for you. Unless you don't have a SIM card installed. So that's why libqofono is needed here.
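To make the gating logic concrete, here's a small C++ sketch of the same two QML expressions. It's only an illustration: ModemRegistration, canCall() and offerCallAction() are hypothetical names, and the single `registered` flag stands in for the property each OfonoNetworkRegistration element exposes. Using a vector rather than two named slots also shows how the check would generalise beyond two SIM card slots.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the state exposed by one OfonoNetworkRegistration
// element (one per SIM slot).
struct ModemRegistration {
    bool registered = false;
};

// Mirrors `_canCall: cellular1Status.registered || cellular2Status.registered`,
// generalised to any number of SIM slots.
bool canCall(const std::vector<ModemRegistration>& slots) {
    return std::any_of(slots.begin(), slots.end(),
                       [](const ModemRegistration& m) { return m.registered; });
}

// Mirrors `isPhoneNumber = _canCall && _phoneNumberSelected`: the "call" option
// is only offered when the selection looks like a number AND a SIM is registered.
bool offerCallAction(const std::vector<ModemRegistration>& slots,
                     bool phoneNumberSelected) {
    return canCall(slots) && phoneNumberSelected;
}
```

With no registered SIM the call action never appears, however phone-like the selection is.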

You might wonder how it knows it's a phone number at all. The answer to this question isn't in the sailfish-components-webview code. The answer is in embedlite-components, in the SelectionPrototype.js file where we find this code:
  _phoneRegex: /^\+?[0-9\s,-.\(\)*#pw]{1,30}$/,

  _getSelectedPhoneNumber: function sh_getSelectedPhoneNumber() {
    return this._isPhoneNumber(this._getSelectedText().trim());
  },

  _isPhoneNumber: function sh_isPhoneNumber(selectedText) {
    return (this._phoneRegex.test(selectedText) ? selectedText : null);
  },
So the decision about whether something is a phone number or not comes down to whether it satisfies the regex /^\+?[0-9\s,-.\(\)*#pw]{1,30}$/ and whether you have a SIM card installed.
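As a cross-check of what that pattern accepts, here's a sketch porting the two JavaScript helpers to C++. std::regex uses the ECMAScript grammar by default, so the pattern can be copied unchanged. The function names are hypothetical, and trim() only strips ASCII whitespace, which is a simplification of JavaScript's String.prototype.trim().

```cpp
#include <cassert>
#include <regex>
#include <string>

// Same pattern as _phoneRegex in SelectionPrototype.js (ECMAScript grammar is
// std::regex's default, so the syntax carries over as-is).
const std::regex kPhoneRegex(R"(^\+?[0-9\s,-.\(\)*#pw]{1,30}$)");

// Simplified equivalent of String.prototype.trim(): ASCII whitespace only.
std::string trim(const std::string& s) {
    const char* ws = " \t\n\r\f\v";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Port of _getSelectedPhoneNumber/_isPhoneNumber: returns the trimmed text if it
// matches, otherwise an empty string (the JavaScript version returns null).
std::string selectedPhoneNumber(const std::string& selectedText) {
    const std::string t = trim(selectedText);
    return std::regex_match(t, kPhoneRegex) ? t : std::string();
}
```

One detail worth noticing: inside the brackets `,-.` is a character range from ',' to '.', which happens to cover exactly ',', '-' and '.', so the pattern behaves as if the three had been listed separately. The 'p' and 'w' are allowed as well, presumably for dial-string pause and wait characters.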

But that's a bit of a diversion. We only care about this new libqofono. Why is this newer version needed and why don't I have it on my system? Let's find out when and why it was changed.
$ git blame import/controls/TextSelectionController.qml -L 14,14
16ef5cdf4 (Pekka Vuorela 2023-01-05 12:09:27 +0200 14) import QOfono 0.2
$ git log -1 16ef5cdf4
commit 16ef5cdf44c2eafd7d93e17a41927ef5da700c2b
Author: Pekka Vuorela <pekka.vuorela@jolla.com>
Date:   Thu Jan 5 12:09:27 2023 +0200

    [components-webview] Migrate to new qofono import. JB#59690

    Also dependency was missing.
The actual change here was pretty small.
$ git diff 16ef5cdf44c2eafd7d93e17a41927ef5da700c2b~ \
    16ef5cdf44c2eafd7d93e17a41927ef5da700c2b
diff --git a/import/controls/TextSelectionController.qml
           b/import/controls/TextSelectionController.qml
index 5c8f2845..71bd83cc 100644
--- a/import/controls/TextSelectionController.qml
+++ b/import/controls/TextSelectionController.qml
@@ -11,7 +11,7 @@
 
 import QtQuick 2.1
 import Sailfish.Silica 1.0
-import MeeGo.QOfono 0.2
+import QOfono 0.2
 
 MouseArea {
     id: root
diff --git a/rpm/sailfish-components-webview.spec
           b/rpm/sailfish-components-webview.spec
index 9a2a3154..5729a8d9 100644
--- a/rpm/sailfish-components-webview.spec
+++ b/rpm/sailfish-components-webview.spec
@@ -18,6 +18,7 @@ Requires: sailfishsilica-qt5 >= 1.1.123
 Requires: sailfish-components-media-qt5
 Requires: sailfish-components-pickers-qt5
 Requires: embedlite-components-qt5 >= 1.21.2
+Requires: libqofono-qt5-declarative >= 0.117
 
 %description
 %{summary}.
The import has been updated, as have the requirements. But there's been no other change to the code, so the libqofono version requirement is probably only needed to deal with the renamed import.

None of this seems essential for ESR 91. My guess is that this change has gone into the development code but hasn't yet made it into a release. So I'm going to hack around it for now (being careful not to commit my hacked changes into the repository).

I've already amended the version number in the spec file, so to get things to work I should just have to reverse this change:
-import MeeGo.QOfono 0.2
+import QOfono 0.2
I can do that on-device. This should do it:
sed -i -e 's/QOfono/MeeGo.QOfono/g' \
    /usr/lib64/qt5/qml/Sailfish/WebView/Controls/TextSelectionController.qml
Great! That's removed the QML error. But now the app is back to crashing again before it gets to even try to render something on-screen:
$ harbour-webview 
[D] unknown:0 - QML debugging is enabled. Only use this in a safe environment.
[...]
UserAgentOverrideHelper app-startup
CONSOLE message:
[JavaScript Error: "Unexpected event profile-after-change" {file:
    "resource://gre/modules/URLQueryStrippingListService.jsm" line: 228}]
observe@resource://gre/modules/URLQueryStrippingListService.jsm:228:12

Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Segmentation fault
So it's back to the debugger again. But this will have to wait until this evening.

[...]

It's the evening and time to put the harbour-webview example through the debugger.
$ gdb harbour-webview 
GNU gdb (GDB) Mer (8.2.1+git9)
[...]
Thread 36 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 24061]
mozilla::gl::SwapChain::OffscreenSize (this=0x0)
    at gfx/gl/GLScreenBuffer.cpp:129
129       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0)
    at gfx/gl/GLScreenBuffer.cpp:129
#1  0x0000007ff3667930 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4be8da0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007ff12b808c in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7fc4d0c0b0, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    LayersTypes.h:82
#3  0x0000007ff12b80e8 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7fc4be8da0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007ff12b8174 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7fc4be8da0, x=<optimized out>,
    y=<optimized out>, width=<optimized out>, height=<optimized out>)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:794
#5  0x0000007ff12b0d10 in mozilla::detail::RunnableMethodArguments<int, int,
    int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void
    (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int),
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>,
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul,
    2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
#17 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) 
This is now a proper crash, not something induced intentionally by the code. Here's the actual code causing the crash, taken from GLScreenBuffer.cpp:
const gfx::IntSize& SwapChain::OffscreenSize() const {
  return mPresenter->mBackBuffer->mFb->mSize;
}
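The problem here is that the SwapChain object itself is null (note the this=0x0 in the backtrace), so the crash happens as soon as OffscreenSize() tries to read mPresenter. We should therefore look in the calling method to find out what's going on there. Here's the relevant code, this time from EmbedLiteCompositorBridgeParent.cpp: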
void
EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget(VsyncId aId)
{
  GLContext* context = static_cast<CompositorOGL*>(state->mLayerManager->
      GetCompositor())->gl();
[...]
  if (context->IsOffscreen()) {
    MutexAutoLock lock(mRenderMutex);
    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
      return;
    }
  }
With a bit of digging we can see that the value being returned by context->GetSwapChain() is null:
(gdb) frame 1
#1  0x0000007ff3667930 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4be8da0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) p context
$2 = (mozilla::gl::GLContext *) 0x7ed819ee00
(gdb) p context->GetSwapChain()
Cannot evaluate function -- may be inlined
(gdb) p context.mSwapChain
$3 = {
  mTuple = {<mozilla::detail::CompactPairHelper<mozilla::gl::SwapChain*,
    mozilla::DefaultDelete<mozilla::gl::SwapChain>,
    (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> =
    {<mozilla::DefaultDelete<mozilla::gl::SwapChain>> = {<No data fields>},
    mFirstA = 0x0}, <No data fields>}}
(gdb) p context.mSwapChain.mTuple
$4 = {<mozilla::detail::CompactPairHelper<mozilla::gl::SwapChain*,
    mozilla::DefaultDelete<mozilla::gl::SwapChain>,
    (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> =
    {<mozilla::DefaultDelete<mozilla::gl::SwapChain>> = {<No data fields>},
    mFirstA = 0x0}, <No data fields>}
(gdb) p context.mSwapChain.mTuple.mFirstA
$5 = (mozilla::gl::SwapChain *) 0x0
(gdb) 
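To illustrate the failure mode, here's a heavily simplified C++ sketch. The types are hypothetical stand-ins rather than the real mozilla::gl classes, and std::unique_ptr stands in for mozilla::UniquePtr. GetSwapChain() just returns whatever the pointer holds, so when nothing has created the swap chain the offscreen branch calls OffscreenSize() through a null pointer. The guard in PrepareOffscreen() is an assumption about a possible defensive check: it would avoid the crash but wouldn't make anything render, so it's no substitute for making sure the swap chain actually gets created.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, cut-down stand-ins; only the shape of the problem is modelled.
struct IntSize {
    IntSize(int w = 0, int h = 0) : width(w), height(h) {}
    bool operator!=(const IntSize& o) const {
        return width != o.width || height != o.height;
    }
    int width;
    int height;
};

struct SwapChain {
    IntSize size;
    // Reads a member, so calling this through a null SwapChain* is undefined
    // behaviour (in practice a segfault, as in the backtrace).
    const IntSize& OffscreenSize() const { return size; }
    bool Resize(const IntSize& s) { size = s; return true; }
};

struct GLContext {
    std::unique_ptr<SwapChain> mSwapChain;  // stays null unless something creates it
    SwapChain* GetSwapChain() const { return mSwapChain.get(); }
};

// Defensive variant of the offscreen branch of CompositeToDefaultTarget().
bool PrepareOffscreen(GLContext& context, const IntSize& eglSurfaceSize) {
    SwapChain* swapChain = context.GetSwapChain();
    if (!swapChain) {
        return false;  // nothing to render into yet
    }
    if (swapChain->OffscreenSize() != eglSurfaceSize &&
        !swapChain->Resize(eglSurfaceSize)) {
        return false;
    }
    return true;
}
```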
You may recall that way back in the first three weeks of working on Gecko I hit a problem with the rendering pipeline. The GLScreenBuffer structure that the WebView has been using for a long time had been completely removed and replaced with this SwapChain class.

At the time I struggled with how to rearrange the code so that it compiled. I made changes that I couldn't test. And while I did get it to compile, these changes are now coming back to haunt me. Now I need to actually fix this rendering pipeline properly.

There's a bit of me that is glad I'm finally having to do this. I really want to know how it's actually supposed to work.

Clearly the first task will be to figure out why the mSwapChain member of GLContext is never being set. With any luck this will be at the easier end of the difficulty spectrum.

I'm going to try to find where mSwapChain is being set, or where it should be set. To do that I'll need to find out where the context is coming from. The context is obtained from CompositorOGL, so that would seem to be a good place to start.

Looking through the CompositorOGL.cpp file we can see that the mGLContext member is being initialised from a value passed in to CompositorOGL::Initialize(). The debugger can help us work back from there.
(gdb) break CompositorOGL::Initialize
Breakpoint 1 at 0x7ff11b0c3c: file gfx/layers/opengl/CompositorOGL.cpp,
    line 380.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 36 "Compositor" hit Breakpoint 1, mozilla::layers::CompositorOGL::
    Initialize (this=0x7ee0002f50, out_failureReason=0x7f1faac520)
    at gfx/layers/opengl/CompositorOGL.cpp:380
380     bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
(gdb)
Ah! This is interesting. It's not being passed in because there are two different overloads of the CompositorOGL::Initialize() method and the code is using the other one. In this other piece of code the context is created directly:
bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
  ScopedGfxFeatureReporter reporter("GL Layers");

  // Do not allow double initialization
  MOZ_ASSERT(mGLContext == nullptr || !mOwnsGLContext,
             "Don't reinitialize CompositorOGL");

  if (!mGLContext) {
    MOZ_ASSERT(mOwnsGLContext);
    mGLContext = CreateContext();
  }
[...]
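The pattern in that snippet, where the compositor either uses a context it has been given or creates and owns one itself, can be sketched like this. CompositorSketch, its two Initialize() overloads and the use of std::shared_ptr for ownership are all hypothetical simplifications, not the real CompositorOGL API. The point is that on the path the breakpoint shows being taken, the context comes out of CreateContext(), so that's the code to follow.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct GLContext {};  // hypothetical stand-in for mozilla::gl::GLContext

class CompositorSketch {
 public:
    // Path 1 (hypothetical): the caller supplies a context it owns.
    bool Initialize(std::shared_ptr<GLContext> aContext) {
        mGLContext = std::move(aContext);
        mOwnsGLContext = false;
        return mGLContext != nullptr;
    }

    // Path 2: no context supplied, so create one, mirroring
    // `if (!mGLContext) { mGLContext = CreateContext(); }`.
    bool Initialize() {
        // Same intent as the MOZ_ASSERT: don't reinitialise an owned context.
        assert(mGLContext == nullptr || !mOwnsGLContext);
        if (!mGLContext) {
            mOwnsGLContext = true;
            mGLContext = CreateContext();
        }
        return mGLContext != nullptr;
    }

    bool OwnsContext() const { return mOwnsGLContext; }

 private:
    std::shared_ptr<GLContext> CreateContext() {
        return std::make_shared<GLContext>();
    }

    std::shared_ptr<GLContext> mGLContext;
    bool mOwnsGLContext = false;
};
```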
Let's see what happens with the context creation.
Thread 36 "Compositor" hit Breakpoint 5, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7ee0002f50)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() {
(gdb) n
231       nsIWidget* widget = mWidget->RealWidget();
(gdb) 
232       void* widgetOpenGLContext =
(gdb) 
234       if (widgetOpenGLContext) {
(gdb) 
248       if (!context && gfxEnv::LayersPreferOffscreen()) {
(gdb) b GLContextProviderEGL::CreateHeadless
Breakpoint 6 at 0x7ff1133740: file gfx/gl/GLContextProviderEGL.cpp, line 1245.
(gdb) c
Continuing.

Thread 36 "Compositor" hit Breakpoint 6, mozilla::gl::GLContextProviderEGL::
    CreateHeadless (desc=..., out_failureId=out_failureId@entry=0x7f1faed1c8)
    at gfx/gl/GLContextProviderEGL.cpp:1245
1245        const GLContextCreateDesc& desc, nsACString* const out_failureId) {
(gdb) n
1246      const auto display = DefaultEglDisplay(out_failureId);
(gdb) 
1247      if (!display) {
(gdb) p display
$8 = std::shared_ptr<mozilla::gl::EglDisplay> (use count 1, weak count 2)
    = {get() = 0x7ee0004cb0}
(gdb) n
1250      mozilla::gfx::IntSize dummySize = mozilla::gfx::IntSize(16, 16);
(gdb) b GLContextEGL::CreateEGLPBufferOffscreenContext
Breakpoint 7 at 0x7ff11335b8: file gfx/gl/GLContextProviderEGL.cpp, line 1233.
(gdb) c
Continuing.

Thread 36 "Compositor" hit Breakpoint 7, mozilla::gl::GLContextEGL::
    CreateEGLPBufferOffscreenContext (
    display=std::shared_ptr<mozilla::gl::EglDisplay> (use count 2, weak count 2)
    = {...}, desc=..., size=..., 
    out_failureId=out_failureId@entry=0x7f1faed1c8)
    at gfx/gl/GLContextProviderEGL.cpp:1233
1233        const mozilla::gfx::IntSize& size, nsACString* const
    out_failureId) {
(gdb) b GLContextEGL::CreateEGLPBufferOffscreenContextImpl
Breakpoint 8 at 0x7ff1133160: file gfx/gl/GLContextProviderEGL.cpp, line 1185.
(gdb) c
Continuing.

Thread 36 "Compositor" hit Breakpoint 8, mozilla::gl::GLContextEGL::
    CreateEGLPBufferOffscreenContextImpl (
    egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 2) =
    {...}, desc=..., size=..., useGles=useGles@entry=false, 
    out_failureId=out_failureId@entry=0x7f1faed1c8)
    at gfx/gl/GLContextProviderEGL.cpp:1185
1185        nsACString* const out_failureId) {
(gdb) n
1186      const EGLConfig config = ChooseConfig(*egl, desc, useGles);
(gdb) 
1187      if (config == EGL_NO_CONFIG) {
(gdb) 
1193      if (GLContext::ShouldSpew()) {
(gdb) 
1197      mozilla::gfx::IntSize pbSize(size);
(gdb) 
1307    include/c++/8.3.0/bits/shared_ptr_base.h: No such file or directory.
(gdb) 
1208      if (!surface) {
(gdb) 
1214      auto fullDesc = GLContextDesc{desc};
(gdb) 
1215      fullDesc.isOffscreen = true;
(gdb) 
1217          egl, fullDesc, config, surface, useGles, out_failureId);
(gdb) b GLContextEGL::CreateGLContext
Breakpoint 9 at 0x7ff1132548: file gfx/gl/GLContextProviderEGL.cpp, line 618.
(gdb) c
Continuing.

Thread 36 "Compositor" hit Breakpoint 9, mozilla::gl::GLContextEGL::
    CreateGLContext (egl=std::shared_ptr<mozilla::gl::EglDisplay>
    (use count 4, weak count 2) = {...}, desc=...,
    config=config@entry=0x55558fc450, surface=surface@entry=0x7ee0008f40,
    useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7f1faed1c8)
    at gfx/gl/GLContextProviderEGL.cpp:618
618         nsACString* const out_failureId) {
(gdb) n
621       std::vector<EGLint> required_attribs;
(gdb) 
We're getting down into the depths now. It's surprisingly thrilling to be seeing this code again. I recall that this GLContextEGL::CreateGLContext() method is where a lot of the action happens.

But my head is full and this feels like a good place to leave things. Inside this method might be the right place to initialise mSwapChain, but it's definitely not happening here.

Tomorrow I'll do a sweep of the other code to check whether any attempt is being made to initialise it somewhere else. If not I'll add in some initialisation code to see what happens.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
21 Feb 2024 : Day 163
We're making good progress with the WebView rendering pipeline. The first issue to fix, which we've been looking at for the last couple of days, has been ensuring the layer manager is of the Client type, rather than the WebRender type. There's a new WEBRENDER_SOFTWARE feature, introduced between ESR 78 and ESR 91, which is causing the trouble. In previous builds we disabled the WEBRENDER feature, but now this new feature is causing WebRender to be enabled again. We need to ensure it's not enabled.

So the key questions to answer today are: how was WEBRENDER being disabled on ESR 78; and can we do something equivalent for WEBRENDER_SOFTWARE on ESR 91.

In the gfxConfigManager.cpp file there are a couple of encouraging-looking methods called gfxConfigManager::ConfigureWebRender() and gfxConfigManager::ConfigureWebRenderSoftware(). These decide whether the WebRender and software WebRender features, respectively, end up enabled or disabled. Unsurprisingly, the latter is a new method for ESR 91, but the former is available in both ESR 78 and ESR 91, so I'll concentrate on that one first.

When looking at the code in these we also need to refer back to the initialisation method, because that's where some key variables are being created:
void gfxConfigManager::Init() {
[...]
  mFeatureWr = &gfxConfig::GetFeature(Feature::WEBRENDER);
[...]
  mFeatureWrSoftware = &gfxConfig::GetFeature(Feature::WEBRENDER_SOFTWARE);
[...]
So these two variables, mFeatureWr and mFeatureWrSoftware, are feature objects which we can then use to enable and disable various features.

In ESR 78 the logic for whether mFeatureWr should be enabled or not is serpentine. I'm not going to try to work through it by hand; instead I'll set the debugger on it and see which way it slithers.

Happily my debug session is still running from yesterday (I think it's been running for three days now), so I can continue straight on with it. I'll include the full step-through, but there's a lot of it, so don't feel you have to follow along; I'll summarise the important parts afterwards.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) b gfxConfigManager::ConfigureWebRender
Breakpoint 5 at 0x7fb90a8d88: file gfx/config/gfxConfigManager.cpp, line 194.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 5, mozilla::gfx::gfxConfigManager::
    ConfigureWebRender (this=this@entry=0x7fa7972598)
    at gfx/config/gfxConfigManager.cpp:194
194     void gfxConfigManager::ConfigureWebRender() {
(gdb) n
206       mFeatureWrCompositor->SetDefaultFromPref("gfx.webrender.compositor",
    true,
(gdb) n
209       if (mWrCompositorForceEnabled) {
(gdb) n
213       ConfigureFromBlocklist(nsIGfxInfo::FEATURE_WEBRENDER_COMPOSITOR,
(gdb) n
219       if (!mHwStretchingSupport && mScaledResolution) {
(gdb) n
225       bool guardedByQualifiedPref = ConfigureWebRenderQualified();
(gdb) n
300     obj-build-mer-qt-xr/dist/include/nsTStringRepr.h: No such file or directory.
(gdb) p *mFeatureWr
$15 = {mDefault = {mMessage = '\000' <repeats 63 times>, mStatus =
    mozilla::gfx::FeatureStatus::Unused}, mUser = {mMessage = '\000'
    <repeats 63 times>, mStatus = mozilla::gfx::FeatureStatus::Unused},
    mEnvironment = {mMessage = '\000' <repeats 63 times>,
    mStatus = mozilla::gfx::FeatureStatus::Unused}, mRuntime = {mMessage =
    '\000' <repeats 63 times>, mStatus = mozilla::gfx::FeatureStatus::Unused}, 
  mFailureId = {<nsTSubstring<char>> = {<mozilla::detail::nsTStringRepr<char>> =
    {mData = 0x7fbc7d4f42 <gNullChar> "", mLength = 0, mDataFlags =
    mozilla::detail::StringDataFlags::TERMINATED, mClassFlags =
    mozilla::detail::StringClassFlags::NULL_TERMINATED}, 
      static kMaxCapacity = 2147483637}, <No data fields>}}
(gdb) p mFeatureWr->GetValue()
$16 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWr->IsEnabled()
$17 = false
(gdb) p mFeatureWr->mDefault.mStatus
$30 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWr->mRuntime.mStatus
$31 = mozilla::gfx::FeatureStatus::Unused
(gdb) n
235       if (mWrEnvForceEnabled) {
(gdb) p mWrEnvForceEnabled
$18 = false
(gdb) n
237       } else if (mWrForceEnabled) {
(gdb) p mWrForceEnabled
$19 = false
(gdb) n
239       } else if (mFeatureWrQualified->IsEnabled()) {
(gdb) p mFeatureWrQualified->IsEnabled()
$20 = false
(gdb) n
253       if (mWrForceDisabled ||
(gdb) p mWrForceDisabled
$21 = false
(gdb) p mWrEnvForceDisabled
$22 = false
(gdb) p mWrQualifiedOverride.isNothing()
Cannot evaluate function -- may be inlined
(gdb) n
261       if (!mFeatureHwCompositing->IsEnabled()) {
(gdb) n
268       if (mSafeMode) {
(gdb) n
276       if (mIsWindows && !mIsWin10OrLater && !mDwmCompositionEnabled) {
(gdb) p mIsWindows
$23 = false
(gdb) p mIsWin10OrLater
$24 = false
(gdb) p mDwmCompositionEnabled
$25 = true
(gdb) n
283           NS_LITERAL_CSTRING("FEATURE_FAILURE_DEFAULT_OFF"));
(gdb) n
285       if (mFeatureD3D11HwAngle && mWrForceAngle) {
(gdb) n
301       if (!mFeatureWr->IsEnabled() && mDisableHwCompositingNoWr) {
(gdb) p mFeatureWr->IsEnabled()
$26 = false
(gdb) p mDisableHwCompositingNoWr
$27 = false
(gdb) n
324           NS_LITERAL_CSTRING("FEATURE_FAILURE_DEFAULT_OFF"));
(gdb) n
326       if (mWrDCompWinEnabled) {
(gdb) n
334       if (!mWrPictureCaching) {
(gdb) n
340       if (!mFeatureWrDComp->IsEnabled() && mWrCompositorDCompRequired) {
(gdb) n
348       if (mWrPartialPresent) {
(gdb) n
gfxPlatform::InitWebRenderConfig (this=<optimized out>)
    at gfx/thebes/gfxPlatform.cpp:2733
2733      if (Preferences::GetBool("gfx.webrender.program-binary-disk", false)) {
(gdb) c
[...]
That's a bit too much detail there, but the key conclusion is that mFeatureWr (which represents the state of the WEBRENDER feature) starts off disabled and the value is never changed. So by the end of the gfxConfigManager::ConfigureWebRender() method the feature remains disabled. It's not changed anywhere else, so we're left with our layer manager being created as a Client layer manager, which is what we need.

We can see that it's set to disabled from the following sequence, copied from the full debugging session above:
(gdb) p mFeatureWr->IsEnabled()
$17 = false
(gdb) p mFeatureWr->mDefault.mStatus
$30 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWr->mRuntime.mStatus
$31 = mozilla::gfx::FeatureStatus::Unused
Features are made from multiple layers of state. Each layer can be either set or Unused. To determine the state of a feature, each layer is examined in order until one of them is set to something other than Unused. The first layer that isn't Unused provides the actual state of the feature.

The layers are the following:
  1. mRuntime
  2. mUser
  3. mEnvironment
  4. mDefault
The mDefault layer provides a backstop: if all other layers are Unused then whatever value the mDefault layer takes is the value of the feature (even if that value is Unused).

So, to summarise and bring all this together, the mFeatureWr feature is enabled if both of the following hold:
  1. mFeatureWr->mDefault.mStatus is set to anything other than Unused.
  2. The resolved state, taken from the first layer whose mStatus isn't Unused (falling back to mDefault), is either Available or ForceEnabled.
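Here's a minimal C++ model of those rules, handy for checking the reasoning against the values printed in the debugger. It follows the simplified description above rather than the real gfx/config code, which has extra special cases, so the type names and the assumption that Disable() records its status in the runtime layer should be treated as illustrative only.

```cpp
#include <cassert>
#include <initializer_list>

// Subset of the FeatureStatus values seen in this post.
enum class FeatureStatus { Unused, Available, ForceEnabled, Disabled, Unavailable };

struct Layer {
    FeatureStatus mStatus = FeatureStatus::Unused;
};

// Simplified layered feature state: the first layer that isn't Unused wins,
// with mDefault as the backstop.
struct FeatureState {
    Layer mDefault, mUser, mEnvironment, mRuntime;

    // A feature whose default layer was never set counts as uninitialised.
    bool IsInitialized() const { return mDefault.mStatus != FeatureStatus::Unused; }

    FeatureStatus GetValue() const {
        if (!IsInitialized()) return FeatureStatus::Unused;
        for (const Layer* layer : {&mRuntime, &mUser, &mEnvironment}) {
            if (layer->mStatus != FeatureStatus::Unused) return layer->mStatus;
        }
        return mDefault.mStatus;
    }

    bool IsEnabled() const {
        const FeatureStatus v = GetValue();
        return v == FeatureStatus::Available || v == FeatureStatus::ForceEnabled;
    }

    void EnableByDefault() { mDefault.mStatus = FeatureStatus::Available; }

    // Assumption: a runtime disable overrides the default.
    void Disable(FeatureStatus status) { mRuntime.mStatus = status; }
};
```

With every layer Unused the model reports the feature as disabled, which is the ESR 78 situation for mFeatureWr; calling EnableByDefault() alone is enough to flip IsEnabled() to true.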
Looking at the values from the debugging session above, we can therefore see exactly why mFeatureWr->IsEnabled() is returning false: it's simply never had any other value set on it.

Now we need to compare this to the process for ESR 91. Before we get into it, it's worth noting that the WEBRENDER feature in ESR 91 is also (correctly) disabled, so we may not see any big differences here. Let's see.

Again, I can continue with the debugging session I've been running for the last few days:
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) b gfxConfigManager::ConfigureWebRender
Breakpoint 9 at 0x7ff138d708: file gfx/config/gfxConfigManager.cpp, line 215.
(gdb) b gfxConfigManager::ConfigureWebRenderSoftware
Breakpoint 10 at 0x7ff138d41c: file gfx/config/gfxConfigManager.cpp, line 125.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 9, mozilla::gfx::gfxConfigManager::
    ConfigureWebRender (this=this@entry=0x7fd7da72f8)
    at gfx/config/gfxConfigManager.cpp:215
215     void gfxConfigManager::ConfigureWebRender() {
(gdb) p mFeatureWr->IsEnabled()
$13 = false
(gdb) p mFeatureWr->mDefault.mStatus
$14 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWr->mRuntime.mStatus
$15 = mozilla::gfx::FeatureStatus::Unused
So as we go into the ConfigureWebRender() method the feature is disabled. This is the same as for ESR 78.
(gdb) n
230       mFeatureWrCompositor->SetDefaultFromPref("gfx.webrender.compositor",
    true,
(gdb)
233       if (mWrCompositorForceEnabled) {
(gdb)
237       ConfigureFromBlocklist(nsIGfxInfo::FEATURE_WEBRENDER_COMPOSITOR,
(gdb)
243       if (!mHwStretchingSupport.IsFullySupported() && mScaledResolution) {
(gdb)
253       ConfigureWebRenderSoftware();
(gdb) n
At this point we're jumping into the ConfigureWebRenderSoftware() method. We're going to continue into it, since we're interested to know what happens there. But it's worth noting that this is a departure from what happens on ESR 78.
Thread 7 "GeckoWorkerThre" hit Breakpoint 10, mozilla::gfx::gfxConfigManager::
    ConfigureWebRenderSoftware (this=this@entry=0x7fd7da72f8)
    at gfx/config/gfxConfigManager.cpp:125
125     void gfxConfigManager::ConfigureWebRenderSoftware() {
(gdb) p mFeatureWrSoftware->IsEnabled()
$16 = false
(gdb) p mFeatureWrSoftware->mDefault.mStatus
$17 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWrSoftware->mDefault.mStatus
$18 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWrSoftware->mRuntime.mStatus
$19 = mozilla::gfx::FeatureStatus::Unused
Going in we also see that the mFeatureWrSoftware feature is disabled.
(gdb) n
128       mFeatureWrSoftware->EnableByDefault();
(gdb) n
134       if (mWrSoftwareForceEnabled) {
(gdb) p mFeatureWrSoftware->IsEnabled()
$20 = true
(gdb) p mFeatureWrSoftware->mDefault.mStatus
$21 = mozilla::gfx::FeatureStatus::Available
(gdb) p mFeatureWrSoftware->mRuntime.mStatus
$22 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWrSoftware->mUser.mStatus
$23 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWrSoftware->mEnvironment.mStatus
$24 = mozilla::gfx::FeatureStatus::Unused
(gdb) p mFeatureWrSoftware->mDefault.mStatus
$25 = mozilla::gfx::FeatureStatus::Available
But it's immediately enabled, by having its default value set to Available. So far there have been no conditions on the execution, so we're guaranteed to reach this state every time. Let's continue.
(gdb) p mWrSoftwareForceEnabled
$33 = false
(gdb) n
136       } else if (mWrForceDisabled || mWrEnvForceDisabled) {
(gdb) p mWrForceDisabled
$26 = false
(gdb) p mWrEnvForceDisabled
$27 = false
Here there was an opportunity to disable the feature if either mWrForceDisabled or mWrEnvForceDisabled were set to true, but since both were set to false we skip over this possibility. This might be our way in to disabling it, so we may want to return to this. But let's continue on with the rest of the debugging for now.
(gdb) n
141       } else if (gfxPlatform::DoesFissionForceWebRender()) {
(gdb) n
145       if (!mHasWrSoftwareBlocklist) {
(gdb) p mHasWrSoftwareBlocklist
$28 = false
At this point the mHasWrSoftwareBlocklist variable is set to false which causes us to jump out of the ConfigureWebRenderSoftware() method early. So we'll return back up the stack to the ConfigureWebRender() method and continue from there.
(gdb) n
mozilla::gfx::gfxConfigManager::ConfigureWebRender
    (this=this@entry=0x7fd7da72f8)
    at gfx/config/gfxConfigManager.cpp:254
254       ConfigureWebRenderQualified();
(gdb) n
256       mFeatureWr->EnableByDefault();
(gdb) n
262       if (mWrSoftwareForceEnabled) {
(gdb) p mFeatureWr->IsEnabled()
$29 = true
(gdb) n
Here we see another change from ESR 78. The mFeatureWr feature is enabled here. We already know it's ultimately disabled so we should keep an eye out for where that happens.
266       } else if (mWrEnvForceEnabled) {
(gdb) 
268       } else if (mWrForceDisabled || mWrEnvForceDisabled) {
(gdb)
275       } else if (mWrForceEnabled) {
(gdb) p mWrForceEnabled
$30 = false
(gdb) n
279       if (!mFeatureWrQualified->IsEnabled()) {
(gdb) p mFeatureWrQualified->IsEnabled()
$31 = false
(gdb) n
282         mFeatureWr->Disable(FeatureStatus::Disabled, "Not qualified",
(gdb) n
287       if (!mFeatureHwCompositing->IsEnabled()) {
(gdb) p mFeatureWr->IsEnabled()
$32 = false
So here it gets disabled again, and the reason is that mFeatureWrQualified is disabled. Here's the comment text that goes alongside this in the code (the debugger skips these comments):
    // No qualified hardware. If we haven't allowed software fallback,
    // then we need to disable WR.
So we'll end up with this being disabled whatever happens. There's not much to see in the remainder of the method, but let's skip through the rest of the steps for completeness.
(gdb) n
293       if (mSafeMode) {
(gdb) n
302       if (mXRenderEnabled) {
(gdb) n
312       mFeatureWrAngle->EnableByDefault();
(gdb) n
313       if (mFeatureD3D11HwAngle) {
(gdb) n
335         mFeatureWrAngle->Disable(FeatureStatus::Unavailable,
    "OS not supported",
(gdb) n
339       if (mWrForceAngle && mFeatureWr->IsEnabled() &&
(gdb) n
347       if (!mFeatureWr->IsEnabled() && mDisableHwCompositingNoWr) {
(gdb) n
367       mFeatureWrDComp->EnableByDefault();
(gdb) n
368       if (!mWrDCompWinEnabled) {
(gdb) n
369         mFeatureWrDComp->UserDisable("User disabled via pref",
(gdb) n
373       if (!mIsWin10OrLater) {
(gdb) n
375         mFeatureWrDComp->Disable(FeatureStatus::Unavailable,
(gdb) n
380       if (!mIsNightly) {
(gdb) n
383         nsAutoString adapterVendorID;
(gdb) n
384         mGfxInfo->GetAdapterVendorID(adapterVendorID);
(gdb) n
385         if (adapterVendorID == u"0x10de") {
(gdb) n
383         nsAutoString adapterVendorID;
(gdb) n
396       mFeatureWrDComp->MaybeSetFailed(
(gdb) n
399       mFeatureWrDComp->MaybeSetFailed(mFeatureWrAngle->IsEnabled(),
(gdb) n
403       if (!mFeatureWrDComp->IsEnabled() && mWrCompositorDCompRequired) {
(gdb) n
411       if (mWrPartialPresent) {
(gdb) n
654     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPrefList_gfx.h:
    No such file or directory.
(gdb) n
433       ConfigureFromBlocklist(nsIGfxInfo::FEATURE_WEBRENDER_SHADER_CACHE,
(gdb) n
435       if (!mFeatureWr->IsEnabled()) {
(gdb) n
436         mFeatureWrShaderCache->ForceDisable(FeatureStatus::Unavailable,
(gdb) n
441       mFeatureWrOptimizedShaders->EnableByDefault();
(gdb) n
442       if (!mWrOptimizedShaders) {
(gdb) n
446       ConfigureFromBlocklist(nsIGfxInfo::FEATURE_WEBRENDER_OPTIMIZED_SHADERS,
(gdb) n
448       if (!mFeatureWr->IsEnabled()) {
(gdb) n
449         mFeatureWrOptimizedShaders->ForceDisable(FeatureStatus::Unavailable,
(gdb) n
And we're out of the method. So that's it: we can see that mFeatureWr is disabled here, as expected. However, when it comes to mFeatureWrSoftware it's a different story. That feature is enabled by default; to get it disabled we'll need to ensure one of mWrForceDisabled or mWrEnvForceDisabled is set to true.

Both of these are set in the initialisation method, like this:
void gfxConfigManager::Init() {
[...]
  mWrForceDisabled = StaticPrefs::gfx_webrender_force_disabled_AtStartup();
[...]
  mWrEnvForceDisabled = gfxPlatform::WebRenderEnvvarDisabled();
[...]
Here's the code that creates the former:
ONCE_PREF(
  "gfx.webrender.force-disabled",
   gfx_webrender_force_disabled,
   gfx_webrender_force_disabled_AtStartup,
  bool, false
)
That's from the autogenerated obj-build-mer-qt-xr/modules/libpref/init/StaticPrefList_gfx.h file. This is being generated from the gecko-dev/modules/libpref/init/StaticPrefList.yaml file, the relevant part of which looks like this:
# Also expose a pref to allow users to force-disable WR. This is exposed
# on all channels because WR can be enabled on qualified hardware on all
# channels.
- name: gfx.webrender.force-disabled
  type: bool
  value: false
  mirror: once
The latter is set using an environment variable:
/*static*/
bool gfxPlatform::WebRenderEnvvarDisabled() {
  const char* env = PR_GetEnv("MOZ_WEBRENDER");
  return (env && *env == '0');
}
Okay, we've reached the end of this piece of investigation. What's clear is that there may not be any Sailfish-specific code for disabling the web render layer manager because it's being disabled by default anyway.

For the software web render layer manager we could set the MOZ_WEBRENDER environment variable to 0 to force it to be disabled and this will be handy for testing. But in the longer term we should probably put some code into sailfish-browser to explicitly set the gfx.webrender.force-disabled static preference to true.

As I look into this I discover something surprising. Even though WebRender is disabled by default, some grepping around turned up the following in the sailfish-browser code:
void DeclarativeWebUtils::setRenderingPreferences()
{
    SailfishOS::WebEngineSettings *webEngineSettings =
        SailfishOS::WebEngineSettings::instance();

    // Use external Qt window for rendering content
    webEngineSettings->setPreference(
        QString("gfx.compositor.external-window"), QVariant(true));
    webEngineSettings->setPreference(
        QString("gfx.compositor.clear-context"), QVariant(false));
    webEngineSettings->setPreference(
        QString("gfx.webrender.force-disabled"), QVariant(true));
    webEngineSettings->setPreference(
        QString("embedlite.compositor.external_gl_context"), QVariant(true));
}
This is fine for the browser, but it's not going to get executed for the WebView, so I'll need to set this in WebEngineSettings::initialize() as well. Thankfully, making this change turns out to be pretty straightforward:
diff --git a/lib/webenginesettings.cpp b/lib/webenginesettings.cpp
index de9e4b86..13b21d5b 100644
--- a/lib/webenginesettings.cpp
+++ b/lib/webenginesettings.cpp
@@ -110,6 +110,10 @@ void SailfishOS::WebEngineSettings::initialize()
     engineSettings->setPreference(QStringLiteral("intl.accept_languages"),
                                   QVariant::fromValue<QString>(langs));
 
+    // Ensure the web renderer is disabled
+    engineSettings->setPreference(QStringLiteral("gfx.webrender.force-disabled"),
+                                  QVariant(true));
+
     Silica::Theme *silicaTheme = Silica::Theme::instance();
 
     // Notify gecko when the ambience switches between light and dark
As well as this change, I also had to amend the rawwebview.cpp file to accommodate some of the API changes I made to gecko earlier. I guess I've not built the sailfish-components-webview packages recently, or this would have come up. Nevertheless, the fix isn't anything too dramatic:
diff --git a/import/webview/rawwebview.cpp b/import/webview/rawwebview.cpp
index 1b1bb92a..2eab77f5 100644
--- a/import/webview/rawwebview.cpp
+++ b/import/webview/rawwebview.cpp
@@ -37,7 +37,7 @@ public:
     ViewCreator();
     ~ViewCreator();
 
-    quint32 createView(const quint32 &parentId, const uintptr_t &parentBrowsingContext) override;
+    quint32 createView(const quint32 &parentId, const uintptr_t &parentBrowsingContext, bool hidden) override;
 
     static std::shared_ptr<ViewCreator> instance();
 
@@ -54,9 +54,10 @@ ViewCreator::~ViewCreator()
     SailfishOS::WebEngine::instance()->setViewCreator(nullptr);
 }
 
-quint32 ViewCreator::createView(const quint32 &parentId, const uintptr_t &parentBrowsingContext)
+quint32 ViewCreator::createView(const quint32 &parentId, const uintptr_t &parentBrowsingContext, bool hidden)
 {
     Q_UNUSED(parentBrowsingContext)
+    Q_UNUSED(hidden)
 
     for (RawWebView *view : views) {
         if (view->uniqueId() == parentId) {
Having fixed all this, I've built and transferred the new packages over to my phone. Now when I run the harbour-webview example app I get something quite different to the crash we were seeing before:
[defaultuser@Xperia10III gecko]$ harbour-webview 
[D] unknown:0 - QML debugging is enabled. Only use this in a safe environment.
[D] main:30 - WebView Example
[D] main:44 - Using default start URL:  "https://www.flypig.co.uk/search/"
[D] main:47 - Opening webview
[D] unknown:0 - Using Wayland-EGL
library "libutils.so" not found
library "libcutils.so" not found
library "libhardware.so" not found
library "android.hardware.graphics.mapper@2.0.so" not found
library "android.hardware.graphics.mapper@2.1.so" not found
library "android.hardware.graphics.mapper@3.0.so" not found
library "android.hardware.graphics.mapper@4.0.so" not found
library "libc++.so" not found
library "libhidlbase.so" not found
library "libgralloctypes.so" not found
library "android.hardware.graphics.common@1.2.so" not found
library "libion.so" not found
library "libz.so" not found
library "libhidlmemory.so" not found
library "android.hidl.memory@1.0.so" not found
library "vendor.qti.qspmhal@1.0.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[W] unknown:7 - file:///usr/share/harbour-webview/qml/harbour-webview.qml:7:30:
    Type WebViewPage unavailable 
         initialPage: Component { WebViewPage { } } 
                                  ^
[W] unknown:13 - file:///usr/share/harbour-webview/qml/pages/
    WebViewPage.qml:13:5: Type WebView unavailable 
         WebView { 
         ^
[W] unknown:141 - file:///usr/lib64/qt5/qml/Sailfish/WebView/WebView.qml:141:9:
    Type TextSelectionController unavailable 
             TextSelectionController { 
             ^
[W] unknown:14 - file:///usr/lib64/qt5/qml/Sailfish/WebView/Controls/
    TextSelectionController.qml:14:1: module "QOfono" is not installed 
     import QOfono 0.2 
     ^
Created LOG for EmbedLite
JSComp: EmbedLiteConsoleListener.js loaded
JSComp: ContentPermissionManager.js loaded
JSComp: EmbedLiteChromeManager.js loaded
JSComp: EmbedLiteErrorPageHandler.js loaded
JSComp: EmbedLiteFaviconService.js loaded
JSComp: EmbedLiteGlobalHelper.js loaded
EmbedLiteGlobalHelper app-startup
JSComp: EmbedLiteOrientationChangeHandler.js loaded
JSComp: EmbedLiteSearchEngine.js loaded
JSComp: EmbedLiteSyncService.js loaded
EmbedLiteSyncService app-startup
JSComp: EmbedLiteWebrtcUI.js: loaded
JSComp: EmbedLiteWebrtcUI.js: got app-startup
JSComp: EmbedPrefService.js loaded
EmbedPrefService app-startup
JSComp: EmbedliteDownloadManager.js loaded
JSComp: LoginsHelper.js loaded
JSComp: PrivateDataManager.js loaded
JSComp: UserAgentOverrideHelper.js loaded
UserAgentOverrideHelper app-startup
CONSOLE message:
[JavaScript Error: "Unexpected event profile-after-change" {file:
    "resource://gre/modules/URLQueryStrippingListService.jsm" line: 228}]
observe@resource://gre/modules/URLQueryStrippingListService.jsm:228:12

Created LOG for EmbedPrefs
No crash, several errors, but (of course) still a blank screen: no actual rendering is taking place. But this is still really good progress. The WebView application, which was crashing completely before, is now running, just not rendering. That means we now have the opportunity to debug and fix it. One more step forwards.

I'll look into the rendering more tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
20 Feb 2024 : Day 162 #
Yesterday we were looking into the WebView rendering pipeline. We got to the point where we had a backtrace showing the flow that resulted in a WebRender layer manager being created, when the EmbedLite code was expecting a Client layer manager. The consequence was that the EmbedLite code forcefully killed itself.

That was on ESR 91. Today I want to find the equivalent flow on ESR 78 to see how it differs. To do this I first need to install the same harbour-webview-example code that I'm using for testing on my ESR 78 device, then set it running under the debugger:
$ gdb harbour-webview
[...]
(gdb) b nsBaseWidget::CreateCompositorSession
Function "nsBaseWidget::CreateCompositorSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (nsBaseWidget::CreateCompositorSession) pending.
(gdb) r
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 1, nsBaseWidget::
    CreateCompositorSession (this=this@entry=0x7f8ccbf3d0, aWidth=1080,
    aHeight=2520, aOptionsOut=aOptionsOut@entry=0x7fa7972ac0)
    at widget/nsBaseWidget.cpp:1176
1176        int aWidth, int aHeight, CompositorOptions* aOptionsOut) {
(gdb) n
1180        CreateCompositorVsyncDispatcher();
(gdb) n
1182        gfx::GPUProcessManager* gpu = gfx::GPUProcessManager::Get();
(gdb) n
1186        gpu->EnsureGPUReady();
(gdb) n
67      obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:
    No such file or directory.
(gdb) n
1193        bool enableAPZ = UseAPZ();
(gdb) n
1194        CompositorOptions options(enableAPZ, enableWR);
(gdb) n
1198        bool enableAL =
(gdb) n
1203        options.SetUseWebGPU(StaticPrefs::dom_webgpu_enabled());
(gdb) n
50      obj-build-mer-qt-xr/dist/include/mozilla/layers/CompositorOptions.h:
    No such file or directory.
(gdb) n
1210        options.SetInitiallyPaused(CompositorInitiallyPaused());
(gdb) n
53      obj-build-mer-qt-xr/dist/include/mozilla/layers/CompositorOptions.h:
    No such file or directory.
(gdb) 
39      in obj-build-mer-qt-xr/dist/include/mozilla/layers/CompositorOptions.h
(gdb) 
1217          lm = new ClientLayerManager(this);
(gdb) p enableWR
$1 = false
(gdb) p enableAPZ
$2 = <optimized out>
(gdb) p enableAL
$3 = <optimized out>
(gdb) p gfx::gfxConfig::IsEnabled(gfx::Feature::ADVANCED_LAYERS)
$4 = false
(gdb) p mFissionWindow
$5 = false
(gdb) p StaticPrefs::layers_advanced_fission_enabled()
No symbol "layers_advanced_fission_enabled" in namespace "mozilla::StaticPrefs".
(gdb) p StaticPrefs::dom_webgpu_enabled()
$6 = false
(gdb) p options.UseWebRender()
Cannot evaluate function -- may be inlined
(gdb) p options
$7 = {mUseAPZ = true, mUseWebRender = false, mUseAdvancedLayers = false,
    mUseWebGPU = false, mInitiallyPaused = false}
(gdb) 
As we can see, things are different on ESR 78: the options.mUseWebRender field is set to false, whereas on ESR 91 it's set to true. What's feeding into these values?

The options structure and its functionality are defined in CompositorOptions.h. Checking through the code there, we can see that mUseWebRender is set at initialisation: either to the default value of false if the default constructor is used, or to an explicit value if the following constructor overload is used:
  CompositorOptions(bool aUseAPZ, bool aUseWebRender,
                    bool aUseSoftwareWebRender)
      : mUseAPZ(aUseAPZ),
        mUseWebRender(aUseWebRender),
        mUseSoftwareWebRender(aUseSoftwareWebRender) {
    MOZ_ASSERT_IF(aUseSoftwareWebRender, aUseWebRender);
  }
It's never changed after that. So going back to our nsBaseWidget::CreateCompositorSession() code, the only part we need to concern ourselves with is the value that's passed in to the constructor.

For both ESR 78 and ESR 91, the value that's passed in is that of the local enableWR variable. The logic for this value is really straightforward for ESR 78:
    bool enableWR =
        gfx::gfxVars::UseWebRender() && WidgetTypeSupportsAcceleration();
Let's find out how this value is being set:
(gdb) p WidgetTypeSupportsAcceleration()
$8 = true
(gdb) p gfx::gfxVars::UseWebRender()
Cannot evaluate function -- may be inlined
We can't call the UseWebRender() method directly, but we can extract the value it would return by digging into the data structures. This is all following from the code in gfxVars.h:
(gdb) p gfx::gfxVars::sInstance.mRawPtr.mVarUseWebRender.mValue
$11 = false
That's useful, but it doesn't tell us everything we need to know. The next step is to find out where and why this value is being set to false.
$ grep -rIn "gfxVars::SetUseWebRender(" * --include="*.cpp"
gecko-dev/gfx/thebes/gfxPlatform.cpp:2750:    gfxVars::SetUseWebRender(true);
gecko-dev/gfx/thebes/gfxPlatform.cpp:3297:    gfxVars::SetUseWebRender(false);
gecko-dev/gfx/ipc/GPUProcessManager.cpp:479:  gfx::gfxVars::SetUseWebRender(false);
These are being set in gfxPlatform::InitWebRenderConfig(), gfxPlatform::NotifyGPUProcessDisabled() and GPUProcessManager::DisableWebRender() respectively.

Let's find out which is responsible.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break gfxPlatform::InitWebRenderConfig
Breakpoint 2 at 0x7fb9013328: file gfx/thebes/gfxPlatform.cpp, line 2691.
(gdb) b gfxPlatform::NotifyGPUProcessDisabled
Breakpoint 3 at 0x7fb9016fb0: file gfx/thebes/gfxPlatform.cpp, line 3291.
(gdb) b GPUProcessManager::DisableWebRender
Breakpoint 4 at 0x7fb907f858: GPUProcessManager::DisableWebRender. (3 locations)
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 2, gfxPlatform::InitWebRenderConfig
    (this=0x7f8c8bbf60)
    at gfx/thebes/gfxPlatform.cpp:2691
2691    void gfxPlatform::InitWebRenderConfig() {
(gdb) n
2692      bool prefEnabled = WebRenderPrefEnabled();
(gdb) n
2693      bool envvarEnabled = WebRenderEnvvarEnabled();
(gdb) n
2698      gfxVars::AddReceiver(&nsCSSProps::GfxVarReceiver());
(gdb) n
2708      ScopedGfxFeatureReporter reporter("WR", prefEnabled || envvarEnabled);
(gdb) n
2709      if (!XRE_IsParentProcess()) {
(gdb) n
2723      gfxConfigManager manager;
(gdb) n
2725      manager.ConfigureWebRender();
(gdb) n
2733      if (Preferences::GetBool("gfx.webrender.program-binary-disk", false)) {
(gdb) n
2738      if (StaticPrefs::gfx_webrender_use_optimized_shaders_AtStartup()) {
(gdb) n
2739        gfxVars::SetUseWebRenderOptimizedShaders(
(gdb) n
2743      if (Preferences::GetBool("gfx.webrender.software", false)) {
(gdb) p gfxConfig::IsEnabled(Feature::WEBRENDER)
$12 = false
(gdb) n
2749      if (gfxConfig::IsEnabled(Feature::WEBRENDER)) {
(gdb) n
2791      if (gfxConfig::IsEnabled(Feature::WEBRENDER_COMPOSITOR)) {
(gdb) p gfxConfig::IsEnabled(Feature::WEBRENDER_COMPOSITOR)
$13 = false
(gdb) n
2795      Telemetry::ScalarSet(
(gdb) n
2799      if (gfxConfig::IsEnabled(Feature::WEBRENDER_PARTIAL)) {
(gdb) n
2805      gfxVars::SetUseGLSwizzle(
(gdb) n
2810      gfxUtils::RemoveShaderCacheFromDiskIfNecessary();
(gdb) r
[...]
No other breakpoints are hit. So as we can see here, on ESR 78 the value for UseWebRender() is left as the default value of false. The reason for this is that gfxConfig::IsEnabled(Feature::WEBRENDER) is returning false. We might need to investigate further where this Feature::WEBRENDER configuration value is coming from or being set, but let's switch to ESR 91 now to find out how things are happening there.

The value of enableWR has a much more complex derivation in ESR 91 compared to that in ESR 78. Here's the logic (note that I've simplified the code to remove the unnecessary parts):
    bool supportsAcceleration = WidgetTypeSupportsAcceleration();
    bool enableWR;
    if (supportsAcceleration ||
        StaticPrefs::gfx_webrender_unaccelerated_widget_force()) {
      enableWR = gfx::gfxVars::UseWebRender();
    } else if (gfxPlatform::DoesFissionForceWebRender() ||
               StaticPrefs::
                   gfx_webrender_software_unaccelerated_widget_allow()) {
      enableWR = gfx::gfxVars::UseWebRender();
    } else {
      enableWR = false;
    }
In practice supportsAcceleration is going to be true, which simplifies things and reduces the logic to this single assignment:
      enableWR = gfx::gfxVars::UseWebRender();
Let's follow the same investigatory path that we did for ESR 78.
$ grep -rIn "gfxVars::SetUseWebRender(" * --include="*.cpp"
gecko-dev/gfx/thebes/gfxPlatform.cpp:2713:    gfxVars::SetUseWebRender(true);
gecko-dev/gfx/thebes/gfxPlatform.cpp:3435:      gfxVars::SetUseWebRender(true);
gecko-dev/gfx/thebes/gfxPlatform.cpp:3475:    gfxVars::SetUseWebRender(false);
The second of these appears in some code that's compile-time conditional on the platform being Windows XP, so we can ignore it. The other two appear in gfxPlatform::InitWebRenderConfig() and gfxPlatform::FallbackFromAcceleration() respectively. I'm going to go out on a limb and say that we're interested in the former, but let's check using the debugger to make sure.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) b gfxPlatform::InitWebRenderConfig
Breakpoint 7 at 0x7ff12ef954: file gfx/thebes/gfxPlatform.cpp, line 2646.
(gdb) b gfxPlatform::FallbackFromAcceleration
Breakpoint 8 at 0x7ff12f3048: file gfx/thebes/gfxPlatform.cpp, line 3381.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 7, gfxPlatform::InitWebRenderConfig
    (this=0x7fc4a48c90)
    at gfx/thebes/gfxPlatform.cpp:2646
2646    void gfxPlatform::InitWebRenderConfig() {
(gdb) n
2647      bool prefEnabled = WebRenderPrefEnabled();
(gdb) n
2648      bool envvarEnabled = WebRenderEnvvarEnabled();
(gdb)
[New LWP 27297]
2653      gfxVars::AddReceiver(&nsCSSProps::GfxVarReceiver());
(gdb) 
2663      ScopedGfxFeatureReporter reporter("WR", prefEnabled || envvarEnabled);
(gdb) 
32      ${PROJECT}/obj-build-mer-qt-xr/dist/include/gfxCrashReporterUtils.h:
    No such file or directory.
(gdb) 
2664      if (!XRE_IsParentProcess()) {
(gdb) 
2678      gfxConfigManager manager;
(gdb) 
2679      manager.Init();
(gdb) 
2680      manager.ConfigureWebRender();
(gdb) 
2682      bool hasHardware = gfxConfig::IsEnabled(Feature::WEBRENDER);
(gdb) 
2683      bool hasSoftware = gfxConfig::IsEnabled(Feature::WEBRENDER_SOFTWARE);
(gdb) 
2684      bool hasWebRender = hasHardware || hasSoftware;
(gdb) p hasHardware
$10 = false
(gdb) p hasSoftware
$11 = true
(gdb) p hasWebRender
$12 = <optimized out>
(gdb) n
2701      if (gfxConfig::IsEnabled(Feature::WEBRENDER_SHADER_CACHE)) {
(gdb) n
2705      gfxVars::SetUseWebRenderOptimizedShaders(
(gdb) n
2708      gfxVars::SetUseSoftwareWebRender(!hasHardware && hasSoftware);
(gdb) n
2712      if (hasWebRender) {
(gdb) n
2713        gfxVars::SetUseWebRender(true);
(gdb) c
[...]
So there we can see that the WebRender layer manager is being activated in ESR 91 due to Feature::WEBRENDER_SOFTWARE being enabled.

So we have a clear difference. In ESR 78 Feature::WEBRENDER is false, so UseWebRender() stays false. In ESR 91 Feature::WEBRENDER is also false, but the newly added Feature::WEBRENDER_SOFTWARE is enabled, and that's enough for the WebRender layer manager to be used.

This is good progress. The next step is to figure out where Feature::WEBRENDER_SOFTWARE is being set to enabled and find out how to disable it. I'll take a look at that tomorrow.

19 Feb 2024 : Day 161 #
Yesterday I was complaining about the difficulty debugging while travelling by train. Before I'd even posted the diary entry I'd received some beautiful new creations from Thigg to illustrate my experiences. I think he's captured it rather too well and it's a real joy to be able to share this creation with you.
 
Image: A pig with wings sitting in the storage department of a train with a laptop on its lap, entangled in way too many USB cables.

This is just so great! But although this was the most representative, it wasn't my favourite of the images Thigg created. I'll be sharing some of the others at other times when I have the pleasure of enjoying train-based-development, so watch out for more!

On to a fresh day, and this morning the package I started building yesterday evening on the train has finally finished. But that's not as helpful to me as I was hoping it would be when I kicked it off. The change I made was to annotate the code with some debug output. Since then I've been able to find out all the same information using the debugger.

To recap the situation, we've been looking at WebView rendering. Currently any attempt to use the WebView results in a crash. That's because the EmbedLite PuppetWidgetBase code, on discovering that the layer manager is of type LAYERS_WR (WebRender), intentionally triggers a crash. It requires the layer manager to be of type LAYERS_CLIENT to avoid this.

So my task for today is to find out where the layer manager is being created and establish why the wrong type is being used. To get a good handle on the situation I'll also need to compare this against the same paths in ESR 78 to find out why they're different.

Looking through the code, there are two obvious places where a WebRenderLayerManager is created. First, there's code in PuppetWidget that looks like this:
bool PuppetWidget::CreateRemoteLayerManager(
    const std::function<bool(LayerManager*)>& aInitializeFunc) {
  RefPtr<LayerManager> lm;
  MOZ_ASSERT(mBrowserChild);
  if (mBrowserChild->GetCompositorOptions().UseWebRender()) {
    lm = new WebRenderLayerManager(this);
  } else {
    lm = new ClientLayerManager(this);
  }
[...]
Second there's some code in nsBaseWidget that looks like this (I've left some of the comments in, since they're relevant):
already_AddRefed<LayerManager> nsBaseWidget::CreateCompositorSession(
    int aWidth, int aHeight, CompositorOptions* aOptionsOut) {
[...]
    gfx::GPUProcessManager* gpu = gfx::GPUProcessManager::Get();
    // Make sure GPU process is ready for use.
    // If it failed to connect to GPU process, GPU process usage is disabled in
    // EnsureGPUReady(). It could update gfxVars and gfxConfigs.
    gpu->EnsureGPUReady();

    // If widget type does not supports acceleration, we may be allowed to use
    // software WebRender instead. If not, then we use ClientLayerManager even
    // when gfxVars::UseWebRender() is true. WebRender could coexist only with
    // BasicCompositor.
[...]
    RefPtr<LayerManager> lm;
    if (options.UseWebRender()) {
      lm = new WebRenderLayerManager(this);
    } else {
      lm = new ClientLayerManager(this);
    }
[...]
It should be pretty easy to check using the debugger whether either of these are the relevant routes when setting up the layer manager. I still have the debugging session open from yesterday:
(gdb) break nsBaseWidget.cpp:1364
Breakpoint 3 at 0x7ff2a57b64: file widget/nsBaseWidget.cpp, line 1364.
(gdb) break PuppetWidget.cpp:616
Breakpoint 4 at 0x7ff2a67d48: file widget/PuppetWidget.cpp, line 616.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Created LOG for EmbedLiteLayerManager

Thread 7 "GeckoWorkerThre" hit Breakpoint 3, nsBaseWidget::
    CreateCompositorSession (this=this@entry=0x7fc4dad520,
    aWidth=aWidth@entry=1080, aHeight=aHeight@entry=2520,
    aOptionsOut=aOptionsOut@entry=0x7fd7da7770)
    at widget/nsBaseWidget.cpp:1364
1364        options.SetInitiallyPaused(CompositorInitiallyPaused());
(gdb) n
43      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    CompositorOptions.h: No such file or directory.
(gdb) 
1369          lm = new WebRenderLayerManager(this);
(gdb) p options
$4 = {mUseAPZ = true, mUseWebRender = true, mUseSoftwareWebRender = true,
    mAllowSoftwareWebRenderD3D11 = false, mAllowSoftwareWebRenderOGL = false, 
  mUseAdvancedLayers = false, mUseWebGPU = false, mInitiallyPaused = false}
(gdb) 
The options structure is really clean and it's helpful to be able to see all of the contents like this.

So we now know that the Web Render version of the layer manager is being created in nsBaseWidget::CreateCompositorSession(). There are two questions that immediately spring to mind: first, if the Client version of the layer manager were being created at this point, would it fix things? Second, is it possible to run with the Web Render layer manager instead?

I also want to know exactly what inputs are being used to decide which type of layer manager to use. Stepping through nsBaseWidget::CreateCompositorSession() is likely to help with this, so let's give that a go.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break nsBaseWidget::CreateCompositorSession
Breakpoint 5 at 0x7ff2a578f8: file widget/nsBaseWidget.cpp, line 1308.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]

Thread 7 "GeckoWorkerThre" hit Breakpoint 5, nsBaseWidget::
    CreateCompositorSession (this=this@entry=0x7fc4db8a30,
    aWidth=aWidth@entry=1080, aHeight=aHeight@entry=2520,
    aOptionsOut=aOptionsOut@entry=0x7fd7da7770)
    at widget/nsBaseWidget.cpp:1308
1308        int aWidth, int aHeight, CompositorOptions* aOptionsOut) {
(gdb) n
1312        CreateCompositorVsyncDispatcher();
(gdb) n
1314        gfx::GPUProcessManager* gpu = gfx::GPUProcessManager::Get();
(gdb) n
1318        gpu->EnsureGPUReady();
(gdb) n
1324        bool supportsAcceleration = WidgetTypeSupportsAcceleration();
(gdb) n
1327        if (supportsAcceleration ||
(gdb) n
1329          enableWR = gfx::gfxVars::UseWebRender();
(gdb) n
195     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/gfxVars.h:
    No such file or directory.
(gdb) n
1338        bool enableAPZ = UseAPZ();
(gdb) n
1339        CompositorOptions options(enableAPZ, enableWR, enableSWWR);
(gdb) p supportsAcceleration
$8 = <optimized out>
(gdb) p enableAPZ
$5 = true
(gdb) p enableWR
$6 = true
(gdb) p enableSWWR
$7 = true
(gdb) n
1357        options.SetUseWebGPU(StaticPrefs::dom_webgpu_enabled());
(gdb) p StaticPrefs::dom_webgpu_enabled()
$9 = false
(gdb) n
mozilla::Atomic<bool, (mozilla::MemoryOrdering)0, void>::operator bool
    (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    CompositorOptions.h:67
67      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    CompositorOptions.h: No such file or directory.
(gdb) n
nsBaseWidget::CreateCompositorSession (this=this@entry=0x7fc4db8a30,
    aWidth=aWidth@entry=1080, aHeight=aHeight@entry=2520, 
    aOptionsOut=aOptionsOut@entry=0x7fd7da7770)
    at widget/nsBaseWidget.cpp:1364
1364        options.SetInitiallyPaused(CompositorInitiallyPaused());
(gdb) n
43      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    CompositorOptions.h: No such file or directory.
(gdb) n
1369          lm = new WebRenderLayerManager(this);
(gdb) 
That gives us some things to work with, but to actually dig into what this all means will have to wait until the morning.

18 Feb 2024 : Day 160 #
It's been a long couple of days running an event at work, but now I'm on the train heading home and looking forward to a change of focus for a bit.

And part of that is getting the opportunity to take a look at the backtrace generated yesterday for the WebView rendering pipeline. I won't copy it out again in full, but it might be worth giving a high-level summary.
#0  PuppetWidgetBase::Invalidate (this=0x7fc4dac130, aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:274
#1  PuppetWidgetBase::UpdateBounds (...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:395
#2  EmbedLiteWindowChild::CreateWidget (this=0x7fc4d626d0)
    at xpcom/base/nsCOMPtr.h:851
#3  RunnableMethodArguments<>::applyImpl...
    at obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
#28 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
Now that I've mentally parsed the backtrace, it's clearly not as useful as I was hoping. But it is something to go on. The line that's causing the crash is the one with MOZ_CRASH() in it below.
void
PuppetWidgetBase::Invalidate(const LayoutDeviceIntRect &aRect)
{
[...]

  if (mozilla::layers::LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // No need to do anything, the compositor will handle drawing
  } else {
    MOZ_CRASH("Unexpected layer manager type");
  }
[...]
That means that lm->GetBackendType() is returning something other than LAYERS_CLIENT.

It would be nice to know what value is actually being returned, but it looks like this will be easier said than done with the code in its present form. There's nowhere to place the required breakpoint and no variable to extract it from. The LayerManager is an interface and it's not clear what will be inheriting it at this point.

While I'm on the train it's also particularly challenging for me to do any debugging. It is technically possible and I've done it before, but it requires me to attach USB cables between my devices, which is fine until I lose track of time and find I've arrived at my destination. I prefer to spend my time on the train coding, or reviewing code, if I can.

So I'm going to examine the code visually first. Let's suppose it's EmbedLiteAppProcessParentManager that's inheriting from LayerManager. This isn't an absurd suggestion; it's quite possibly the case. In that case the value returned will be a constant:
  virtual mozilla::layers::LayersBackend GetBackendType() override {
    return LayersBackend::LAYERS_OPENGL; }
Again, there's nothing to hang a breakpoint from there. So I've added a debug output so the value can be extracted explicitly.
  LOGW("WEBVIEW: Invalidate LAYERS_CLIENT: %d", static_cast<int>(lm->GetBackendType()));
  if (mozilla::layers::LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // No need to do anything, the compositor will handle drawing
  } else {
    MOZ_CRASH("Unexpected layer manager type");
  }
There's nothing wrong with this approach, except that it requires a rebuild of the code, which I've just set going. Hopefully it'll forge through the changes swiftly.

In the meantime, let's continue with our thought that the layer manager is of type EmbedLiteAppProcessParentManager and that the method is therefore returning LAYERS_OPENGL. The enum in LayersTypes.h shows that this definitely takes a different value from LAYERS_CLIENT:
enum class LayersBackend : int8_t {
  LAYERS_NONE = 0,
  LAYERS_BASIC,
  LAYERS_OPENGL,
  LAYERS_D3D11,
  LAYERS_CLIENT,
  LAYERS_WR,
  LAYERS_LAST
};
Which does make me wonder how this has come about. Isn't it inevitable that the code will crash in this case?
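Since the enumerators take implicit consecutive values starting from zero, LAYERS_OPENGL is 2, LAYERS_CLIENT is 4 and LAYERS_WR is 5. The sketch below pins those values down with static_assert and adds two hypothetical helpers, BackendName() and WouldHitMozCrash(), for interpreting the integer printed by a debug line like the LOGW call above. It assumes a C++17 compiler for constexpr std::string_view.

```cpp
#include <cstdint>
#include <string_view>

// Copy of mozilla::layers::LayersBackend from LayersTypes.h.
enum class LayersBackend : std::int8_t {
  LAYERS_NONE = 0,
  LAYERS_BASIC,
  LAYERS_OPENGL,
  LAYERS_D3D11,
  LAYERS_CLIENT,
  LAYERS_WR,
  LAYERS_LAST
};

// Each enumerator is one more than the previous, starting from zero.
static_assert(static_cast<int>(LayersBackend::LAYERS_OPENGL) == 2, "OPENGL is 2");
static_assert(static_cast<int>(LayersBackend::LAYERS_CLIENT) == 4, "CLIENT is 4");
static_assert(static_cast<int>(LayersBackend::LAYERS_WR) == 5, "WR is 5");

// Hypothetical helper: map an integer from a debug log back to its name.
constexpr std::string_view BackendName(int value) {
  switch (value) {
    case static_cast<int>(LayersBackend::LAYERS_NONE):   return "LAYERS_NONE";
    case static_cast<int>(LayersBackend::LAYERS_BASIC):  return "LAYERS_BASIC";
    case static_cast<int>(LayersBackend::LAYERS_OPENGL): return "LAYERS_OPENGL";
    case static_cast<int>(LayersBackend::LAYERS_D3D11):  return "LAYERS_D3D11";
    case static_cast<int>(LayersBackend::LAYERS_CLIENT): return "LAYERS_CLIENT";
    case static_cast<int>(LayersBackend::LAYERS_WR):     return "LAYERS_WR";
    case static_cast<int>(LayersBackend::LAYERS_LAST):   return "LAYERS_LAST";
    default:                                             return "unknown";
  }
}

// Hypothetical helper: only LAYERS_CLIENT gets past the check in
// PuppetWidgetBase::Invalidate(); every other value reaches MOZ_CRASH().
constexpr bool WouldHitMozCrash(int value) {
  return value != static_cast<int>(LayersBackend::LAYERS_CLIENT);
}
```

So if the debug line prints 2 (LAYERS_OPENGL) or 5 (LAYERS_WR), the MOZ_CRASH() branch is the one that gets taken.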

I'll need to check whether either the return value or the test condition has changed since ESR 78. The other possibility is that something else is inheriting from the LayerManager class.

[...]

Now I'm back home and have access to the debugger. The code is still building (no surprise there), so while I wait let's attach the debugger and see what it throws up.
(gdb) p lm->GetBackendType()
$2 = mozilla::layers::LayersBackend::LAYERS_WR
(gdb) ptype lm
type = class mozilla::layers::LayerManager : public mozilla::layers::FrameRecorder {
  protected:
    nsAutoRefCnt mRefCnt;
[...]
    virtual mozilla::layers::LayersBackend GetBackendType(void);
[...]
  protected:
    ~LayerManager();
[...]
} *
(gdb) p this->GetLayerManager(0, mozilla::layers::LayersBackend::LAYERS_NONE, LAYER_MANAGER_CURRENT)
$2 = (mozilla::layers::LayerManager *) 0x7fc4db1250
Direct examination of the LayerManager doesn't reveal the actual type of the object that's inheriting from it. But there is a trick you can use to get gdb to tell you:
(gdb) set print object on
(gdb) p this->GetLayerManager(0, mozilla::layers::LayersBackend::LAYERS_NONE, LAYER_MANAGER_CURRENT)
$3 = (mozilla::layers::WebRenderLayerManager *) 0x7fc4db1250
(gdb) set print object off
So the actual type of the layer manager is WebRenderLayerManager. This is clearly a problem, because it always returns LAYERS_WR as its backend type:
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_WR; }
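For comparison, here's roughly what gdb's set print object on does, expressed in C++. According to the gdb manual it identifies the derived type using the virtual function table; in code, the nearest equivalent is RTTI via dynamic_cast or typeid, which only works if the build has RTTI enabled (something I wouldn't assume for Gecko). The classes below are hypothetical stand-ins, not the real Gecko types.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-ins: a polymorphic base and two derived managers.
class LayerManagerBase {
 public:
  virtual ~LayerManagerBase() = default;
};

class ClientStyleManager : public LayerManagerBase {};
class WebRenderStyleManager : public LayerManagerBase {};

// True if the object behind the base-class pointer is (or derives from)
// WebRenderStyleManager. Requires RTTI.
bool IsWebRenderManager(LayerManagerBase* lm) {
  assert(lm != nullptr);
  return dynamic_cast<WebRenderStyleManager*>(lm) != nullptr;
}

// True only if the dynamic type is exactly Derived: typeid applied to a
// polymorphic reference inspects the most-derived object, not the static type.
template <typename Derived>
bool HasExactDynamicType(const LayerManagerBase& lm) {
  return typeid(lm) == typeid(Derived);
}
```

The declared type of the pointer is LayerManagerBase in both cases, just as gdb's ptype reported LayerManager; only the runtime lookup reveals the derived class.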
All this debugging has been useful; so useful in fact that it's made the debug prints I added on the train completely redundant. No matter, I'll leave the build running anyway.

Tomorrow I must find out where the layer manager is being created, and also what the layer manager type is on ESR 78 for comparison.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.