
5 Oct 2023 : Day 50
It's the big five-oh. The last few days things have moved frustratingly slowly, but not through lack of effort. There have been a bunch of glitches that have just taken longer to work through than I was expecting. Which means that the gecko-qtmozembed combination still isn't quite there yet.

There's a bit of compensation today with the result that this is a longer post than usual. I apologise for this, but it is rather the nature of the development process: sometimes it goes fast, sometimes it goes slow.

The QtMozEmbed build from yesterday didn't produce any obvious compilation errors, but there was an error during the moc pass that looked like this:
usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.:16: Parse error at "mozilla"
That's a strange error. The filename extension is missing and the path appears to be relative. I guess I was expecting the level of detail that comes from gcc, but this isn't gcc, it's moc, which has its own, terser approach.

Right now I'm confused by what's going on. But we're going to work through the investigation.

If you're unfamiliar with what the MOC is, now might be a good time to check out my short explanation from Day 15.

From the build command and error message we can immediately see that the error is happening when the moc is trying to consume the qmozview_p.h file. Within that file we have this line:
#include <mozilla/embedlite/EmbedLiteView.h>
This line appears to be the point at which the failure occurs within the file. If I remove it, the moc pass completes successfully. This file itself has changed very little since ESR 78:
$ diff ./embedding/embedlite/EmbedLiteView.h \
<   virtual void GoBack(bool aRequireUserInteraction, bool aUserActivation);
<   virtual void GoForward(bool aRequireUserInteraction, bool aUserActivation);
>   virtual void GoBack();
>   virtual void GoForward();
But that's not really the point, because the real issue isn't happening in this file. It's happening in a file included from it. Tracing these include chains is a bit of a pain (can anyone recommend a good command-line tool?), but with a bit of manual digging I came up with the following, which is at least one of the paths through which this problem file gets included.
At which point it hits this line inside MaybeStorageBase.h that causes the error:
namespace mozilla::detail {
This syntax, which avoids deep nesting by using the double-colon :: notation, is called a nested namespace definition and was introduced in C++17:
namespace ns-name :: member-name { declarations }
Which, by the standards of the Sailfish OS Qt tooling, is quite new. Could that be it?
$ moc --help
Usage: /usr/lib64/qt5/bin/moc [options] [header-file] [@option-file]
Qt Meta Object Compiler version 67 (Qt 5.6.3)
This file is part of Gecko and I don't want to have to do a complete rebuild to find out, so I'm going to hack around inside the SDK to edit this file manually. I do this by loading the file /usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.h into vim from within the SDK.
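Here's a minimal sketch of what the edit amounts to. The C++17 form, namespace mozilla::detail { ... }, is rewritten as two nested blocks, a spelling that every C++ standard accepts and that names exactly the same namespace, so nothing that uses it needs to change. The Placeholder() function is a hypothetical stand-in for the real declarations in MaybeStorageBase.h, which aren't reproduced here.

```cpp
// Pre-C++17 spelling of `namespace mozilla::detail { ... }`: one block per
// namespace level. This is the form the Qt 5.6 moc accepted after the edit.
namespace mozilla {
namespace detail {

// Hypothetical placeholder for the declarations in MaybeStorageBase.h.
inline int Placeholder() { return 1; }

}  // namespace detail
}  // namespace mozilla

// Callers still refer to mozilla::detail::Placeholder(), exactly as they
// would if the namespace had been opened with the C++17 syntax.
```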

Sure enough, after I edit the file to split the namespace into nested blocks, the moc command goes through. But I still have to try the full build. Let's see.
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.N1oCG5
+ umask 022
+ cd /home/flypig/Documents/Development/jolla/qtmozembed
+ /bin/rm -rf /home/deploy/installroot
+ RPM_EC=0
++ jobs -p
+ exit 0
Lovely! The full QtMozEmbed build now goes through without error and generates some nice solid rpm packages. I've updated the Gecko code to properly incorporate this change, which means I'll need to rebuild the Gecko rpm packages. But to avoid another multi-hour wait I should first try linking mapplauncherd-booster-browser against the new library too.

After manually installing its dependencies, including our new qtmozembed-qt5 and qtmozembed-qt5-devel rpms, the mapplauncherd-booster-browser package builds first time and without any issues. That's not so surprising, because we didn't make any changes to the QtMozEmbed API. But I'm still happy to see it.
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.BSqNUV
+ umask 022
+ cd /home/flypig/Documents/Development/jolla/mapplauncherd-booster-browser
+ /bin/rm -rf /home/deploy/installroot
+ RPM_EC=0
++ jobs -p
+ exit 0
This brings us back to where we were on Day 46, but this time armed with a bag of shiny new rpm packages to install alongside our Gecko build. But first, I need to do that Gecko rebuild. That'll need a bit of time to complete.


Build complete. Now it's time to scp them all over to my phone and give it a go.
$ devel-su rpm -U --oldpackage xulrunner-qt5-91.*.rpm qtmozembed-qt5-1.53.*.rpm \
$ sailfish-browser 
sailfish-browser: error while loading shared libraries: cannot open
  shared object file: No such file or directory
Once again, this isn't such a surprise. The library has moved from /usr/lib64/xulrunner-qt5-78.15.1/ to /usr/lib64/xulrunner-qt5-91.9.1/ and at the very least, some path or other likely needs updating. I'll probably need to build sailfish-browser and the other browser packages as well to avoid this. But my gecko build is currently running, so in the meantime I'll work around it like this:
mv /usr/lib64/xulrunner-qt5-91.9.1 /usr/lib64/xulrunner-qt5-78.15.1
Now when I run it, something else happens.
$ sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "" not found
library "" not found
greHome from GRE_HOME:/usr/bin is not found, in /usr/bin/ is not found, in /usr/lib64/xulrunner-qt5-91.9.1/ return fail
Couldn't load XPCOM from 
[F] unknown:0 - ASSERT failure in QMozContextPrivate::QMozContextPrivate(QObject*):
    "Failed load XPCOMGlue", file qmozcontext.cpp, line 67
Redirecting call to abort() to mozalloc_abort

Segmentation fault (core dumped)
There's a file and line specified for the error, which makes things easier. Let's try that in the debugger for good measure.
greHome from GRE_HOME:/usr/bin is not found, in /usr/bin/ is not found, in /usr/lib64/xulrunner-qt5-91.9.1/ return fail
Couldn't load XPCOM from 
[F] unknown:0 - ASSERT failure in QMozContextPrivate::QMozContextPrivate(QObject*):
    "Failed load XPCOMGlue", file qmozcontext.cpp, line 67
Redirecting call to abort() to mozalloc_abort

Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
0x0000007fbfb95d78 in mozalloc_abort () from /usr/lib64/
(gdb) bt
#0  0x0000007fbfb95d78 in mozalloc_abort () from /usr/lib64/
#1  0x0000007fbfb95d34 in abort () from /usr/lib64/
#2  0x0000007fb7dd7bec in QMessageLogger::fatal(char const*, ...) const () from
#3  0x0000007fb7de7f70 in qt_assert_x(char const*, char const*, char const*, int)
                          () from /usr/lib64/
#4  0x0000007fbfb666ec in QMozContextPrivate::QMozContextPrivate(QObject*) ()
                          from /usr/lib64/
#5  0x0000007fbfb66808 in QMozContextPrivate::instance() () from
#6  0x0000007fbfb668b8 in QMozContext::QMozContext(QObject*) () from
#7  0x0000007fbfc5378c in SailfishOS::WebEngine::WebEngine(QObject*) () from
#8  0x0000007fbfc537fc in SailfishOS::WebEngine::instance() () from
#9  0x0000007fbfc53a10 in SailfishOS::WebEngine::initialize(QString const&, bool)
                          () from /usr/lib64/
#10 0x0000005555583764 in _start ()
Here's the line causing the problem.
    Q_ASSERT_X(LoadEmbedLite(), __PRETTY_FUNCTION__, "Failed load XPCOMGlue");
The LoadEmbedLite() function lives inside EmbedInitGlue.cpp. It's actually the method that we were playing around with yesterday. It's apparently returning false.
bool LoadEmbedLite(int argc, char** argv)
{
  // start the glue, i.e. load and link against xpcom shared lib
  std::string xpcomPath = ResolveXPCOMPath(argc, argv);
  BootstrapResult bootstrapResult = mozilla::GetBootstrap(xpcomPath.c_str());
  if (bootstrapResult.isErr()) {
    printf("Couldn't load XPCOM from %s\n", xpcomPath.c_str());
    return false;
  }

  gBootstrap = bootstrapResult.unwrap();
  return true;
}
Now although this code is part of the gecko source, it's being compiled into the QtMozEmbed code, which is why the error is happening in the library.

Looking through the code where this error is happening (and especially the code in ResolveXPCOMPath) and comparing it against the output from the original failed load, it becomes clear that we need the library to be stored in /usr/lib64/xulrunner-qt5-91.9.1 as well. So let's try this another way.
mv /usr/lib64/xulrunner-qt5-78.15.1 /usr/lib64/xulrunner-qt5-91.9.1
ln -s /usr/lib64/xulrunner-qt5-91.9.1 /usr/lib64/xulrunner-qt5-78.15.1
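The log message from the failed run hints at how the lookup works: it tries GRE_HOME (here /usr/bin), then /usr/bin/, then /usr/lib64/xulrunner-qt5-91.9.1/, and gives up if none of them contains what it needs. The following is a hypothetical model of that kind of ordered search, not the actual ResolveXPCOMPath() code. FindGreHome() and its marker filename parameter are my own inventions for illustration, since the real library name isn't shown in the log output above.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical model of an ordered GRE home lookup (not the real
// ResolveXPCOMPath code): return the first candidate directory that
// contains markerFile, or nothing if every candidate fails.
std::optional<fs::path> FindGreHome(const std::vector<fs::path>& candidates,
                                    const std::string& markerFile) {
  for (const fs::path& dir : candidates) {
    if (dir.empty()) {
      continue;  // e.g. an unset environment variable
    }
    std::error_code ec;  // treat missing directories as "not found", no throw
    if (fs::exists(dir / markerFile, ec)) {
      return dir;
    }
  }
  return std::nullopt;  // the "return fail" case from the log
}
```

In this model, with the library sitting only in the renamed 78.15.1 directory, a search that only knows about the 91.9.1 directory fails, which matches the first crash. Moving the files back and leaving a symlink behind satisfies both the search and anything still using the old path.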
This simple change allows us to get a fair bit further.
$ sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "" not found
library "" not found
greHome from GRE_HOME:/usr/bin is not found, in /usr/bin/
Created LOG for EmbedLiteTrace
[W] unknown:0 - Unable to open bookmarks
[D] onCompleted:105 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  EmbedLiteConsoleListener.js, line 251: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  ContentPermissionManager.js, line 94: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  EmbedLiteChromeManager.js, line 226: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: resource://gre/modules/EnterprisePoliciesParent.jsm, line 500:
                  TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/AddonManager.jsm, line 1479:
                  NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized
JavaScript error: resource://gre/modules/URLQueryStrippingListService.jsm,
                  line 42: TypeError: Services.appinfo is undefined
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Segmentation fault (core dumped)
Running it in the debugger gives us more info.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 19550]
0x0000007fbcc9390c in mozilla::embedlite::PuppetWidgetBase::Invalidate
  (this=0x7f8849b2a0, aRect=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
274         MOZ_CRASH("Unexpected layer manager type");
(gdb) bt
#0  0x0000007fbcc9390c in mozilla::embedlite::PuppetWidgetBase::Invalidate
    (this=0x7f8849b2a0, aRect=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
#1  0x0000007fbcc980b8 in mozilla::embedlite::PuppetWidgetBase::UpdateBounds
    (this=0x7f8849b2a0, aRepaint=aRepaint@entry=true)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
#2  0x0000007fbcca12b0 in mozilla::embedlite::EmbedLiteWindowChild::CreateWidget
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/base/nsCOMPtr.h:851
#3  0x0000007fbcc918f8 in mozilla::detail::RunnableMethodArguments<>::applyImpl
    void (mozilla::embedlite::EmbedLiteWindowChild::*)(),
    mozilla::Tuple<>&, std::integer_sequence)
    (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#4  mozilla::detail::RunnableMethodArguments<>::apply
    m=, o=, this=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#5  mozilla::detail::RunnableMethodImpl::Run (this=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
#6  0x0000007fb9e02b60 in mozilla::RunnableTask::Run (this=0x5555d9fd50)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#28 0x0000007fb79cf89c in ?? () from /lib64/
The error is coming from this check here in PuppetWidgetBase.cpp:
  if (mozilla::layers::LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // No need to do anything, the compositor will handle drawing
  } else {
    MOZ_CRASH("Unexpected layer manager type");
  }
Happily we can coax the debugger into giving us the value returned by lm->GetBackendType():
(gdb) p lm
$1 = (nsIWidget::LayerManager *) 0x7f8869fae0
(gdb) p lm->GetBackendType()
$2 = mozilla::layers::LayersBackend::LAYERS_WR
So PuppetWidgetBase is expecting a layer backend of type LAYERS_CLIENT, but what we actually have is a backend of type LAYERS_WR. That is, we're getting a WebRender layer manager, when what we want is a Client layer manager. Here are the relevant bits of code. First from LayersTypes.h:
enum class LayersBackend : int8_t {
Then from WebRenderLayerManager.h:
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_WR; }
And finally from ClientLayerManager.h:
  LayersBackend GetBackendType() override {
    return LayersBackend::LAYERS_CLIENT;
  }
Clearly the wrong type of layer manager is being instantiated. This will take a bit more digging to get to the bottom of.
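To make the mechanism concrete, here's a reduced model of what's going on. Each layer manager reports its backend through a virtual GetBackendType(), and the check in PuppetWidgetBase::Invalidate() only accepts LAYERS_CLIENT. This is a simplified sketch rather than Gecko's real classes: the enum has just the two values that matter here (the real LayersBackend has more), and a thrown exception stands in for MOZ_CRASH.

```cpp
#include <cstdint>
#include <stdexcept>

// Reduced model: only the two backends relevant to this crash.
enum class LayersBackend : std::int8_t { LAYERS_CLIENT, LAYERS_WR };

struct LayerManager {
  virtual ~LayerManager() = default;
  virtual LayersBackend GetBackendType() = 0;
};

struct ClientLayerManager : LayerManager {
  LayersBackend GetBackendType() override {
    return LayersBackend::LAYERS_CLIENT;
  }
};

struct WebRenderLayerManager : LayerManager {
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_WR; }
};

// Mirrors the check in PuppetWidgetBase::Invalidate(); the exception
// plays the role of MOZ_CRASH("Unexpected layer manager type").
void Invalidate(LayerManager* lm) {
  if (LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // Nothing to do: the compositor handles drawing.
  } else {
    throw std::logic_error("Unexpected layer manager type");
  }
}
```

Passing a ClientLayerManager returns quietly, whereas a WebRenderLayerManager takes the crash path, which is exactly what the backtrace shows.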

To try to figure out why the wrong layer manager is being created I've put a breakpoint on the WebRenderLayerManager constructor. That should be enough to get the answer we need:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1,
    (this=0x7f887e5040, aWidget=0x7f887e0dc0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/wr/
38      WebRenderLayerManager::WebRenderLayerManager(nsIWidget* aWidget)
(gdb) bt
#0  mozilla::layers::WebRenderLayerManager::WebRenderLayerManager
    (this=0x7f887e5040, aWidget=0x7f887e0dc0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/wr/
#1  0x0000007fbc06fd98 in nsBaseWidget::CreateCompositorSession
    (this=this@entry=0x7f887e0dc0, aWidth=aWidth@entry=1080,
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007fbc074348 in nsBaseWidget::CreateCompositor
    (this=this@entry=0x7f887e0dc0, aWidth=aWidth@entry=1080,
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/widget/nsBaseWidget.cpp:1440
#3  0x0000007fbcc93d58 in mozilla::embedlite::nsWindow::CreateCompositor
    (this=0x7f887e0dc0, aWidth=1080, aHeight=2520)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
#4  0x0000007fbcc92e2c in mozilla::embedlite::nsWindow::CreateCompositor
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
#5  0x0000007fbcc95c40 in mozilla::embedlite::nsWindow::GetLayerManager
    (this=0x7f887e0dc0, aShadowManager=,
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT) at
#6  0x0000007fbcc9387c in nsIWidget::GetLayerManager (this=0x7f887e0dc0) at
#35 0x0000007fb79cf89c in ?? () from /lib64/
Following these breadcrumbs and checking inside nsBaseWidget::CreateCompositorSession() I see the following code, which looks very much like it may be causing this situation:
    RefPtr lm;
    if (options.UseWebRender()) {
      lm = new WebRenderLayerManager(this);
    } else {
      lm = new ClientLayerManager(this);
    }
Now it may be that we actually want the WebRender layer manager. But for now I'm going to stick with what we know and try to get it to produce a Client layer manager instead. At any rate, the bit of code above isn't new. What is new is the way the options.UseWebRender() is getting set. Here's the relevant code in ESR 78:
    bool enableWR =
        gfx::gfxVars::UseWebRender() && WidgetTypeSupportsAcceleration();
Pretty concise. Whereas here's the equivalent code from ESR 91.
    bool supportsAcceleration = WidgetTypeSupportsAcceleration();
    bool enableWR;
    bool enableSWWR;
    if (supportsAcceleration ||
        StaticPrefs::gfx_webrender_unaccelerated_widget_force()) {
      enableWR = gfx::gfxVars::UseWebRender();
      enableSWWR = gfx::gfxVars::UseSoftwareWebRender();
    } else if (gfxPlatform::DoesFissionForceWebRender() ||
               StaticPrefs::gfx_webrender_software_unaccelerated_widget_allow()) {
      enableWR = enableSWWR = gfx::gfxVars::UseWebRender();
    } else {
      enableWR = enableSWWR = false;
    }
Not so concise!
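Stripped of the Gecko plumbing, the ESR 91 logic is a small decision table. This sketch models it as a pure function, with each boolean input standing in for one of the calls in the code above, alongside the one-line ESR 78 expression and the layer manager choice from nsBaseWidget::CreateCompositorSession(). The struct and function names are hypothetical, and I'm assuming that enableWR is what ends up in options.UseWebRender(), as the debugger output suggests.

```cpp
// Hypothetical stand-ins for the calls in the ESR 91 code.
struct WrInputs {
  bool supportsAcceleration;   // WidgetTypeSupportsAcceleration()
  bool unaccelWidgetForce;     // StaticPrefs::gfx_webrender_unaccelerated_widget_force()
  bool fissionForcesWR;        // gfxPlatform::DoesFissionForceWebRender()
  bool swUnaccelWidgetAllow;   // StaticPrefs::gfx_webrender_software_unaccelerated_widget_allow()
  bool useWebRender;           // gfx::gfxVars::UseWebRender()
  bool useSoftwareWebRender;   // gfx::gfxVars::UseSoftwareWebRender()
};

struct WrDecision {
  bool enableWR;
  bool enableSWWR;
};

// Same branching as the ESR 91 snippet.
WrDecision DecideEsr91(const WrInputs& in) {
  if (in.supportsAcceleration || in.unaccelWidgetForce) {
    return {in.useWebRender, in.useSoftwareWebRender};
  }
  if (in.fissionForcesWR || in.swUnaccelWidgetAllow) {
    return {in.useWebRender, in.useWebRender};
  }
  return {false, false};
}

// The one-line ESR 78 equivalent.
bool DecideEsr78(bool useWebRender, bool supportsAcceleration) {
  return useWebRender && supportsAcceleration;
}

enum class LayerManagerKind { Client, WebRender };

// Mirrors the if/else in nsBaseWidget::CreateCompositorSession(),
// assuming useWebRenderOption carries enableWR through options.
LayerManagerKind ChooseLayerManager(bool useWebRenderOption) {
  return useWebRenderOption ? LayerManagerKind::WebRender
                            : LayerManagerKind::Client;
}
```

Feeding in the values the debugger reports below (WidgetTypeSupportsAcceleration() and UseWebRender() both true) selects the WebRender layer manager. The ESR 78 expression would also return true for those inputs, which points at the value of gfx::gfxVars::UseWebRender() itself, rather than this branching, as the thing to chase.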

Although there's a lot more code here, it's still quite simple. By stepping through the code using the debugger we can find out what some of these values are set to at runtime.
(gdb) p supportsAcceleration
$4 = 
(gdb) p WidgetTypeSupportsAcceleration()
$5 = true
(gdb) p StaticPrefs::gfx_webrender_unaccelerated_widget_force()
No symbol "gfx_webrender_unaccelerated_widget_force" in namespace
(gdb) p gfx::gfxVars::UseWebRender()
Cannot evaluate function -- may be inlined
(gdb) p gfx::gfxVars::sInstance.mRawPtr->mVarUseWebRender.mValue
$15 = true
(gdb) p enableWR
$6 = true
(gdb) p enableSWWR
$7 = 
(gdb) p enableAPZ
$9 = false
(gdb) p options
$10 = {mUseAPZ = false, mUseWebRender = true, mUseSoftwareWebRender = true,
  mAllowSoftwareWebRenderD3D11 = false, mAllowSoftwareWebRenderOGL = false,
  mUseAdvancedLayers = false, mUseWebGPU = false, mInitiallyPaused = false}
The crucial points here are that supportsAcceleration and gfx::gfxVars::UseWebRender() are both true. It looks to me like the latter should be false. According to the code in gfxVars.h the default value is false, so it must be getting set to true elsewhere at runtime.

There are only a couple of places where this can happen, so I'll stick breakpoints on them to see whether they get triggered.

The first one to get triggered is the call to InitWebRenderConfig():
Thread 8 "GeckoWorkerThre" hit Breakpoint 4, gfxPlatform::InitWebRenderConfig
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/thebes/gfxPlatform.cpp:2646
2646    void gfxPlatform::InitWebRenderConfig() {
As it happens, inside this method there's a clear difference in approach between the ESR 78 and ESR 91 versions of the code. The code that sets the WebRender variable in ESR 78 looks like this:
  if (gfxConfig::IsEnabled(Feature::WEBRENDER)) {
This suggests that the WEBRENDER feature has to be explicitly enabled for the WebRender variable to follow. In ESR 91, on the other hand, we have some code that looks like this:
  bool hasHardware = gfxConfig::IsEnabled(Feature::WEBRENDER);
  bool hasSoftware = gfxConfig::IsEnabled(Feature::WEBRENDER_SOFTWARE);
  bool hasWebRender = hasHardware || hasSoftware;
  if (hasWebRender) {
There's a new Feature::WEBRENDER_SOFTWARE feature which can now also cause the value to be set. And in fact, stepping through this code, we see exactly what we'd expect:
(gdb) p hasHardware
$16 = false
(gdb) p hasSoftware
$17 = true
(gdb) p hasWebRender
$19 = true
So there we have it.
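The change boils down to one extra operand in an OR. Here's a minimal model of the two versions, assuming (as the code above suggests) that the body of each if block is what sets the WebRender variable to true. FeatureState is a hypothetical stand-in for the gfxConfig feature state, not a real Gecko type.

```cpp
// Hypothetical stand-in for the two gfxConfig features involved.
struct FeatureState {
  bool webrender;          // gfxConfig::IsEnabled(Feature::WEBRENDER)
  bool webrenderSoftware;  // gfxConfig::IsEnabled(Feature::WEBRENDER_SOFTWARE)
};

// ESR 78: only the hardware WebRender feature turns the variable on.
bool UseWebRenderEsr78(const FeatureState& f) { return f.webrender; }

// ESR 91: either feature is enough.
bool UseWebRenderEsr91(const FeatureState& f) {
  bool hasHardware = f.webrender;
  bool hasSoftware = f.webrenderSoftware;
  bool hasWebRender = hasHardware || hasSoftware;
  return hasWebRender;
}
```

With hardware WebRender disabled and software WebRender enabled, which are the values the debugger just showed, the ESR 78 rule yields false while the ESR 91 rule yields true.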

This WEBRENDER_SOFTWARE feature is used in quite a few places, so the sensible thing to do would seem to be to flip the setting so it returns false everywhere. I don't yet know how to do this. But before I work that out, I can try flipping the value in just this spot using the debugger to see what happens.
(gdb) set hasWebRender = false
(gdb) p hasWebRender
$21 = false
(gdb) c
The thing is, at this point, it really does continue. There are no crashes or really serious errors. There's no rendering either! But a bunch of other stuff does still seem to be working.

For example, it downloads the pages, switches between mobile and desktop mode, even allows searching on the page.

The browser actually running. There's no rendering, but it is downloading something.

Make no mistake, this is a thoroughly broken version of the browser. The screen remains stubbornly blank no matter what I do. But given how much I had to cut out of the rendering pipeline in order to get it to build, this is no real surprise.

I'm pretty excited by this. I don't want to imply there isn't a huge amount of work still to be done. There really is. But the fact that it's not just flat-out crashing is a positive. It means we're on the right track.

But this is also enough for today. I need to figure out how to set this WEBRENDER_SOFTWARE variable properly, not just forcing it using the debugger. I also need to write up all the tickets that I've yet to detail. This second task is now all the more important given things are at the stage where others may be able to contribute as well. That's for tomorrow.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.

