List items

Items from the current list are shown below.


8 Sep 2023 : Day 23 #
It's a bit of a long post today, happily the result of some good progress. But let me warn you in advance that we're going to be using the debugger today!

Yesterday evening I was trying to find a way to replicate the functionality of the now-defunct OnChannelConnected() override, which has been removed upstream in an effort to reduce the requirement to spawn processes when sending messages.

This morning I've spent some time comparing OnChannelConnected() usage in ESR 78 with OnChannelConnected() usage in ESR 91. My thinking is that, if I can find how some other upstream code changed as a result of OnChannelConnected() having been removed, I should be able to apply similar changes to the EmbedLite code for ESR 91.

The problem I've hit is that OnChannelConnected() exists in other places (and for other users) and so is still present in quite a few places. That's making it hard to determine whether a change in the ESR 91 code relates to the same removal of OnChannelConnected() that's affecting the EmbedLite code. That, and the fact that I've not yet found a good example that shows a clear way to safely switch from OnChannelConnected() to something else.

What's more, just by looking at the code I'm not even able to determine sensible where exactly OnChannelConnected() gets called from.

So it's time to crack out a new tool: the debugger. Until our ESR 91 is being fully built without errors I won't be able to run the debugger on it. But I can run it on the ESR 78 code. That means I can use gdb to figure out where the method gets called in the current version of the browser and what the call stack is when it does. Since manually reviewing the code isn't giving me a sense of this, watching it run should give a much better idea of what's happening.

To do this I need to install the debug symbols for current version of the browser, then run it with debugging enabled. I'm also setting the EMBED_CONSOLE environment variable so that I get good debug output from xulrunner (which is the name of the binary output from all this gecko code).
$ devel_su zypper install xulrunner-qt5-debugsource xulrunner-qt5-debuginfo
$ EMBED_CONSOLE=1 gdb sailfish-browser
I set a breakpoint on EmbedLiteAppProcessParent::OnChannelConnected and a few other pertinent places to try to figure out what's going on. Here are the breakpoint locations:
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000007ff4d85588 in EmbedLiteAppProcessParent::OnChannelConnected(int)
                                                   at EmbedLiteAppProcessParent.cpp:162
2       breakpoint     keep y   0x0000007ff4d906f8 in EmbedLiteAppProcessParent::EmbedLiteAppProcessParent()
                                                   at EmbedLiteAppProcessParent.cpp:107
3       breakpoint     keep y            
3.1                         y   0x0000007ff4d90be8 in EmbedLiteApp::StartChild(EmbedLiteApp*)
                                                   at EmbedLiteApp.cpp:174
3.2                         y   0x0000007ff4d90c60 in EmbedLiteApp::StartChild(EmbedLiteApp*)
                                                   at EmbedLiteApp.cpp:177
4       breakpoint     keep y   0x0000007ff4d8a230 in EmbedLiteApp::StartWithCustomPump
                                                      (EmbedLiteApp::EmbedType, EmbedLiteMessagePump*)
                                                   at EmbedLiteApp.cpp:208
        breakpoint already hit 1 time
5       breakpoint     keep y   0x0000007ff4d8a230 in EmbedLiteApp::StartWithCustomPump
                                                      (EmbedLiteApp::EmbedType, EmbedLiteMessagePump*)
                                                   at EmbedLiteApp.cpp:208
        breakpoint already hit 1 time
The interesting thing is that when I run it like this OnChannelConnected() doesn't get called at all. Neither at start up nor during general use. In fact the EmbedLiteAppProcessParent class is never getting instantiated at all.

From the code I can see that if it were to be instantiated, this would happen in the EmbedLiteApp::StartChild() method. There's a condition in the code that means it's only used if aApp->mEmbedType == EMBED_PROCESS:
EmbedLiteApp::StartChild(EmbedLiteApp* aApp)
  NS_ASSERTION(aApp->mState == STARTING, "Wrong timing");
  if (aApp->mEmbedType == EMBED_THREAD) {
    if (!aApp->mListener ||
        !aApp->mListener->ExecuteChildThread()) {
      // If toolkit hasn't started a child thread we have to create the thread on our own
      aApp->mSubThread = new EmbedLiteSubThread(aApp);
      if (!aApp->mSubThread->StartEmbedThread()) {
        LOGE("Failed to start child thread");
  } else if (aApp->mEmbedType == EMBED_PROCESS) {
    aApp->mAppParent = EmbedLiteAppProcessParent::CreateEmbedLiteAppProcessParent();
But as the debugging shows, when we run the browser the value is set to EMBED_THREAD so the constructor never gets called. Here's the debugger output that shows us this when our breakpoint on StartChild() is triggered (I've chopped out some of the backtrace for brevity):
Thread 1 "sailfish-browse" hit Breakpoint 3, EmbedLiteApp::StartChild (aApp=0x555587e6e0)
at mobile/sailfishos/EmbedLiteApp.cpp:174
174       LOGT();
(gdb) bt
#0  EmbedLiteApp::StartChild (aApp=0x555587e6e0) at mobile/sailfishos/EmbedLiteApp.cpp:174
#1  0x0000007ff4d83d30 in details::CallFunction<0ul, void (*)(EmbedLiteApp*), EmbedLiteApp*>
    (arg=..., function=) at ipc/chromium/src/base/task.h:52
#2  DispatchTupleToFunction
    (arg=..., function=) at ipc/chromium/src/base/task.h:53
#3  RunnableFunction >::Run
    (this=) at ipc/chromium/src/base/task.h:324
#4  0x0000007ff22abd48 in MessageLoop::RunTask (aTask=..., this=0x55556d6030)
    at ipc/chromium/src/base/
#25 0x0000007fefc41608 in QEventLoop::exec(QFlags) ()
    from /usr/lib64/
#26 0x0000007fefc491d4 in QCoreApplication::exec() () from /usr/lib64/
#27 0x000000555557b360 in main ()
(gdb) p aApp->mEmbedType
$1 = EmbedLiteApp::EMBED_THREAD
This value is being passed in from code that's external to the library:
Thread 1 "sailfish-browse" hit Breakpoint 2, EmbedLiteApp::StartWithCustomPump
    aEmbedType=EmbedLiteApp::EMBED_THREAD, aEventLoop=0x55558804f0)
    at mobile/sailfishos/EmbedLiteApp.cpp:208
208     {
(gdb) bt
#0  EmbedLiteApp::StartWithCustomPump (this=0x555587d950,
    aEventLoop=0x55558804f0) at mobile/sailfishos/EmbedLiteApp.cpp:208
#1  0x0000007ff7ea2e24 in ?? () from /usr/lib64/
#2  0x0000007fefc6dc6c in QObject::event(QEvent*) () from /usr/lib64/
#11 0x0000007fefc41608 in QEventLoop::exec(QFlags) ()
    from /usr/lib64/
#12 0x0000007fefc491d4 in QCoreApplication::exec() () from /usr/lib64/
#13 0x000000555557b360 in main ()
(gdb) p aEmbedType
$1 = EmbedLiteApp::EMBED_THREAD
In fact, it's coming from QtMozEmbed which is a separate package. Looking through QtMozEmbed it's clear that it's always EMBED_THREAD that's used, never EMBED_PROCESS. Whichever way gecko is run on Sailfish OS, whether as the Sailfish Browser or as a WebView component, it's always wrapped with QtMozEmbed. So I'm pretty convinced from this that the process route is never used on Sailfish OS.

This is a useful realisation because it gives me a bit more leeway in handling this code. There might be an argument for us dropping the EmbedLiteAppProcessParent class entirely. But that's not the purpose of the work I'm doing right now, so I'll just log that as an issue and otherwise not worry too much about fixing any changes in the code that become too involved.

In particular, I think I'm just going to remove OnChannelConnected() entirely.

Now we can move on to the next error in the epic list of errors that were generated by the build yesterday.
229:05.23 $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/EmbedLiteAppProcessChild.cpp:
          In member function ‘bool EmbedLiteAppProcessChild::Init(MessageLoop*,
          base::ProcessId, PEmbedLiteAppChild::UniquePtr&>::type, base::ProcessId&, MessageLoop*&)’
229:05.24    if (!Open(std::move(aChannel), aParentPid, aIOLoop)) {
229:05.24                                                      ^
It appears from the ESR 78 code that the Open() method should be coming from IToplevelProtocol in ProtocolUtils.h.

The calling signature is this:
bool Open(UniquePtr, base::ProcessId, MessageLoop*)
And in Transport.h we find this:
typedef IPC::Channel Transport;
So that this matches the signature in the IToplevelProtocol class:
bool Open(UniquePtr aTransport, base::ProcessId aOtherPid,
          MessageLoop* aThread = nullptr,
          mozilla::ipc::Side aSide = mozilla::ipc::UnknownSide);
In the new code the signature of this method has changed to the following:
bool Open(ScopedPort aPort, base::ProcessId aOtherPid);
This change happened upstream in two stages. First the new version was added in diff D112775. Then the original was removed in diff D116671

The more interesting diff, however, is diff D112777, since that's where the changeover from one to the other seems to have happened. Looking at that we have an analogous situation where IOThreadChild::TakeChannel() is being replaced with IOThreadChild::TakeInitialPort() and the MessageLoop parameter is just dropped entirely. There's also this comment related to the latter:
Looks like the aIOLoop parameters have been useless for some time, although some places used to use it to open their MessageChannels here (always redundantly, since they always pass the IO thread and it is the default) and in the rest of these process classes. Can we drop these parameters, too?

This gives me confidence that it's safe to do the same here and drop the parameter. So that gives me a clear template for how to update the EmbedLite code.

Having made the changes based on this template, here's the next error in the list:
229:05.35 In file included from $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
229:05.35                  from Unified_cpp_mobile_sailfishos0.cpp:65:
229:05.35 $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
          EmbedLiteCompositorProcessParent.h: At global scope:
229:05.35 $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
          EmbedLiteCompositorProcessParent.h:21:102: error: ISO C++ forbids
          declaration of
          with no type [-fpermissive]
229:05.35                                                                                                       ^
229:05.35 $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
          EmbedLiteCompositorProcessParent.h:21:102: error: expected ‘;’ at end
          of member declaration
229:05.35                                                                                                       ^
229:05.35                                                                                                        ;
Wow, this is quite something:
That's one of the longest preprocessor defines I've yet come across. Let's find out what's wrong with it (apart from the length!).

Immediately I notice that it doesn't exist anywhere in the ESR 91 code apart from this one place. But it does exist in several other place in the ESR 78 code, such as in MediaManager.cpp. That's our route in.

Checking the difference between the two it looks very much like the name of this macro has been changed to the equally cumbersome NS_INLINE_DECL_THREADSAFE_REFCOUNTING_WITH_DELETE_ON_MAIN_THREAD. The git history should confirm this:
$ git log -1 6d17703514ba8
commit 6d17703514ba811f8800e2eb57c82df7ee0a08b2
Author: Nika Layzell 
Date:   Mon Dec 14 18:30:51 2020 +0000

    Bug 1678463 - Part 1: Add _WITH_DELETE_ON_EVENT_TARGET macros to nsISupportsImpl, r=mccr8

    This also migrates all existing users of _WITH_MAIN_THREAD_DESTRUCTION to the
    new macro in nsISupportsImpl.

    Differential Revision:
The related diff also makes this clear. Given the impressive length of the define, it's worth taking a moment to marvel at how large this header file naming is as well.
#include "ThreadSafeRefcountingWithMainThreadDestruction.h"
A sight to behold.

Sightseeing done, I can apply the same change to EmbedLiteCompositorProcessParent. The next error is this:
229:05.35 In file included from $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
229:05.35                  from Unified_cpp_mobile_sailfishos0.cpp:65:
229:05.35 $PROJECT/gecko-dev/mobile/sailfishos/embedprocess/
          EmbedLiteCompositorProcessParent.h:27:35: error: ‘virtual
          mozilla::ipc::IPCResult EmbedLiteCompositorProcessParent::
          marked ‘override’, but does not override
229:05.36    virtual mozilla::ipc::IPCResult RecvGetFrameUniformity(FrameUniformityData* aOutData) override { return IPC_OK(); }
229:05.36                                    ^~~~~~~~~~~~~~~~~~~~~~
This RecvGetFrameUniformity method is being inherited from CompositorBridgeParent in ESR 78, but it's been removed in the upstream ESR 91 code. Let's find out which commit removed it.
$ git log -1 -S "RecvGetFrameUniformity" gfx/layers/ipc/CompositorBridgeParent.h
commit 39167fa7cd9e9cc1c076ad8c1bcddeca1d4fd82e
Author: Kartikaya Gupta 
Date:   Wed Aug 5 21:42:06 2020 +0000

    Bug 1251612 - Support the GetFrameUniformity API in content processes. r=botond,froydnj
    This moves the IPC mechanism from PCompositorBridge to PLayerTransaction/
    PWebRenderBridge, so that it can be used by content processes like the other
    test APIs. It still only produces actual data for the layers backend; for
    WR it will just return empty datasets.
    Differential Revision:
Looking through these changes, it seems RecvGetFrameUniformity() has been removed, while GetFrameUniformity() has been added. The method in EmbedLiteCompositorProcessParent is just a stub anyway, so it looks safe to make this same change there.

But that's one for tomorrow. For this evening it's time to sign off! My apologies for the excessive length today, but it's good to make the progress.

As always, for other posts, check out my full Gecko Dev Diary.


Uncover Disqus comments