flypig.co.uk

Blog

All items from December 2023

31 Dec 2023 : Day 124 #
Here in the UK as I write this it's verging on 2024. From the fireworks outside I can tell that new year celebrations have already started for some. But I still have just enough time to squeeze in a final Gecko Dev Diary for 2023.


 

What I'm really looking forward to in 2024 is not having to worry about the printing pipeline any more. And thankfully we're reaching the end game for this particular issue. But there are still plenty more things to work on to get this ESR 91 version of the engine working as well as ESR 78, so this is far from being close to my final post on the topic.

So please expect posts to continue as we fly headlong into the new year. To everyone who has been following along over the last 124 days, thank you so much for your time, commitment, Mastodon favourites, boosts, generous comments and generally hugely motivational interactions. It's been a wonderful experience for me.

But I mustn't get carried away when there's development to be done, so let's continue with today's post.

Following the plan I cooked up to suspend rendering when the "hidden" window is created, yesterday I attempted to avoid the flicker at the point when the user selects the "Save page to PDF" option. The approach failed, so I'm dropping the idea. Dropping it doesn't change the functionality and, in retrospect, suspending rendering was always likely to leave the display showing the background colour rather than the last thing rendered, so without a fair bit more engineering it was always likely to fail.

That means I'm moving on to refactoring the QML and DownloadPDFSaver code today.

To kick things off I've worked through all the changes I made to gecko-dev to expose the isForPrinting flag, converting it to a hidden flag in the process. All of these changes live in the EmbedLite portion of the code, which is ideally what we want because it avoids having to patch the upstream gecko-dev code itself.

I've also committed the changes to qtmozembed and sailfish-browser that are needed to support this too.

The next step is to backtrack and take another look at the changes made to gecko-dev that we inserted in order to support printing. These are covered by two commits, the first of which, now I look at it, I made right at the start of December. It makes me think that I've been stalled on these printing changes for far too long.
$ git log -2
commit 2fb912372c0475c1ca84c381cf9927f75fe32595
    (HEAD -> FIREFOX_ESR_91_9_X_RELBRANCH_patches)
Author: David Llewellyn-Jones <david@flypig.co.uk>
Date:   Tue Dec 19 20:22:51 2023 +0000

    Call Print from the CanonicalBrowingContext
    
    Call Print from the CanonicalBrowsingContext rather than on the
    Ci.nsIWebBrowserPrint of the current window.

commit 26259399358f14e9695d7b9497aeb3a8577285a9
Author: David Llewellyn-Jones <david@flypig.co.uk>
Date:   Tue Dec 5 22:29:55 2023 +0000

    Reintroduce PDF printing code
    
    Reintroduces code to allow printing of a window. This essentially
    reverts the following three upstream commits:
    
    https://phabricator.services.mozilla.com/D84012
    
    https://phabricator.services.mozilla.com/D84137
    
    https://phabricator.services.mozilla.com/D83264
The plan was always to try to move the reverted changes in the "Reintroduce PDF printing code" commit out of the gecko-dev code and into the EmbedLite code. More specifically, moving the changes in DownloadCore.jsm to EmbedliteDownloadManager.js. This may turn out not to be practical, but I'd like to give it a go.

The DownloadPDFSaver class prototype itself looks pretty self-contained, so moving that to EmbedliteDownloadManager.js looks plausible. However there's also deserialisation code which looks a bit more baked in. In order to move the code, we'd have to perform the deserialisation in EmbedliteDownloadManager.js as well.

Thankfully I looked through this code quite carefully already on Day 99 at the start of December. It's these situations in which I'm glad to have recorded these notes, because reading through the post means I don't have to dig through the code all over again.

The flow is the following:
  1. EmbedliteDownloadManager.observe() in EmbedliteDownloadManager.js.
  2. Downloads.createDownload() in Downloads.jsm.
  3. Download.fromSerializable() in DownloadCore.jsm.
  4. DownloadSource.fromSerializable() and DownloadTarget.fromSerializable() in DownloadCore.jsm.
  5. DownloadSaver.fromSerializable() in DownloadCore.jsm.
  6. DownloadPDFSaver.fromSerializable() in DownloadCore.jsm.
The aim is to move DownloadPDFSaver into EmbedliteDownloadManager.js. But in order for this to work all of the steps in between will need moving there too. In practice most of the logic in the intermediate fromSerializable() methods consists of conditions on the contents of the data passed in. If we're moving DownloadPDFSaver into EmbedliteDownloadManager.js then most of that logic becomes redundant because we'll only have to cover the one case. Moreover we don't really need to perform any deserialisation: we can just configure the DownloadPDFSaver class with the values we have directly.

So it looks like it should be straightforward, but will need a little care and testing.

I've hatched a plan for how to proceed that will start with me moving the DownloadPDFSaver code, then collecting together the data to be configured into this, all of which comes from the following "serialised" data structure that's created in EmbedliteDownloadManager.js:
{
  source: Services.ww.activeWindow,
  target: data.to,
  saver: "pdf",
  contentType: "application/pdf"
}
Thankfully that's not a lot of data to have to deal with. We'll need to end up with a promise that resolves to a Download structure that will look like this:
download = Download
{
	source: {
	    url: Services.ww.activeWindow.location.href,
	    isPrivate: PrivateBrowsingUtils.isContentWindowPrivate(
	        Services.ww.activeWindow
	    ),
	    windowRef: Cu.getWeakReference(Services.ww.activeWindow),
	},
	target: {
	    path: data.to,
	},
	saver: DownloadPDFSaver(
	    download: download,
	),
	contentType: "application/pdf",
	saveAsPdf: true,
}
That's simply the structure you get when you follow all of the fromSerializable() steps through the code. Once DownloadPDFSaver has been moved it doesn't look like there's anything else there that can't be accessed from the EmbedliteDownloadManager.js code.
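As a rough illustration of skipping the deserialisation chain, here's a minimal plain-JavaScript sketch that builds an object of the shape shown above directly from the four "serialised" values. Everything in it is a stand-in so that it runs outside Gecko: createPdfDownload, FakePDFSaver, the isPrivate callback and the fake window object are hypothetical names, and WeakRef is used in place of Cu.getWeakReference().

```javascript
// Hypothetical stand-in for DownloadPDFSaver: it only records its owner.
class FakePDFSaver {
  constructor(download) {
    this.download = download;
  }
}

// Build the Download-shaped object directly, with no fromSerializable() calls.
// `data` mirrors the serialised structure: { source, target, saver, contentType }.
// `isPrivate` is an assumed callback standing in for
// PrivateBrowsingUtils.isContentWindowPrivate().
function createPdfDownload(data, isPrivate) {
  if (data.saver !== "pdf") {
    throw new Error("Only the PDF case is handled in this sketch");
  }
  const download = {
    source: {
      url: data.source.location.href,
      isPrivate: isPrivate(data.source),
      // WeakRef stands in for Cu.getWeakReference() outside Gecko.
      windowRef: new WeakRef(data.source),
    },
    target: { path: data.target },
    contentType: data.contentType,
    saveAsPdf: true,
  };
  // The saver needs a back-reference to the download that owns it.
  download.saver = new FakePDFSaver(download);
  return download;
}

// Usage with a fake window object in place of Services.ww.activeWindow.
const fakeWindow = { location: { href: "https://example.com/" } };
const pdfDownloadPromise = Promise.resolve(
  createPdfDownload(
    { source: fakeWindow, target: "/tmp/page.pdf", saver: "pdf",
      contentType: "application/pdf" },
    () => false
  )
);
```

The saver is attached after the object literal is created because it needs a reference back to the download that contains it. Whether the real code can be arranged this simply is exactly what still needs testing.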

But we'll find that out tomorrow when I actually try to implement this. Roll on 2024!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
30 Dec 2023 : Day 123 #
I finished yesterday in high spirits, having figured out what was causing the hang on switching to private mode although, to be clear, that's not the same thing as having a solution. Nevertheless this puts us in a good position with the printing situation, because it means the window hiding code isn't causing the hang and it should now be pretty straightforward to get it to a state where it's ready to be merged in.

I concluded my post yesterday by highlighting four tasks which I now need to focus on to get this over the line.
  1. Check whether the flicker can be removed by skipping the activation of the hidden window.
  2. Tidy up the QML implementation of the new proxy filter model.
  3. Move the DownloadPDFSaver class from DownloadCore.jsm to EmbedliteDownloadManager.js.
  4. Record an issue for the setBrowserCover() hang.
I'm going to spend a bit of time on the first of these today. Unfortunately I don't have as much time to spend on this today as I do usually, so this will be a terse investigation. And if it doesn't pan out then my plan is to simply drop the idea rather than spend any more time trying to figure something more complex out. So this may not work out and in case it doesn't, well, that's just fine.

The bits I'm interested in are the following, which are part of qtmozembed and can be found in the qopenglwebpage.cpp file:
void QOpenGLWebPage::suspendView()
{
    if (!d || !d->mViewInitialized) {
        return;
    }
    setActive(false);
    d->mView->SuspendTimeouts();
}

void QOpenGLWebPage::resumeView()
{
    if (!d || !d->mViewInitialized) {
        return;
    }
    setActive(true);

    // Setting view as active, will reset RefreshDriver()->SetThrottled at
    // PresShell::SetIsActive (nsPresShell). Thus, keep on throttling
    // if should keep on throttling.
    if (mThrottlePainting) {
        d->setThrottlePainting(true);
    }

    d->mView->ResumeTimeouts();
}
I'm interested in these because of the flicker that happens when the hidden page is initially created. If we can arrange things so that the page is in a suspended state when it's created and never resumed, then rendering may never take place and it's possible the flicker can be avoided. The aim here is to suspend the rendering but not execution of the page itself: we want the page to continue running JavaScript and doing whatever else it needs to do in the background. It's only the rendering we want to avoid.

These methods are called from a couple of places both of which appear to be in the webpages.cpp file of sailfish-browser. Here's one of them:
WebPageActivationData WebPages::page(const Tab& tab)
{
    const int tabId = tab.tabId();

    if (m_activePages.active(tabId)) {
        DeclarativeWebPage *activePage = m_activePages.activeWebPage();
        activePage->resumeView();
        return WebPageActivationData(activePage, false);
    }

    DeclarativeWebPage *webPage = 0;
    DeclarativeWebPage *oldActiveWebPage = m_activePages.activeWebPage();
    if (!m_activePages.alive(tabId)) {
        webPage = m_pageFactory->createWebPage(m_webContainer, tab);
        if (webPage) {
            m_activePages.prepend(tabId, webPage);
        } else {
            return WebPageActivationData(nullptr, false);
        }
    }

    DeclarativeWebPage *newActiveWebPage = m_activePages.activate(tabId);
    updateStates(oldActiveWebPage, newActiveWebPage);

    if (m_memoryLevel == MemCritical) {
        handleMemNotify(m_memoryLevel);
    }

    return WebPageActivationData(newActiveWebPage, true);
}
And here's the other (notice that this gets called in the code above):
void WebPages::updateStates(DeclarativeWebPage *oldActivePage,
    DeclarativeWebPage *newActivePage)
{
    if (oldActivePage) {
        // Allow suspending only the current active page if it is not the
        // creator (parent).
        if (newActivePage->parentId() != (int)oldActivePage->uniqueId()) {
            if (oldActivePage->loading()) {
                oldActivePage->stop();
            }
            oldActivePage->suspendView();
        } else {
            // Sets parent to inactive and suspends rendering keeping
            // timeouts running.
            oldActivePage->setActive(false);
        }
    }

    if (newActivePage) {
        newActivePage->resumeView();
        newActivePage->update();
    }
}
The second of these we've seen before (I was playing around with this block of code on Day 118). In order to try to get the page to stay inactive I've amended the WebPages::page() method so that rather than calling updateStates() in all circumstances, it now does something slightly different if the tab is hidden:
    DeclarativeWebPage *newActiveWebPage = m_activePages.activate(tabId);
    if (tab.hidden()) {
        newActiveWebPage->setActive(false);
        newActiveWebPage->suspendView();
    } else {
        updateStates(oldActiveWebPage, newActiveWebPage);
    }
The idea here is that if the tab is hidden it will be set to inactive and suspended rather than being activated through the call to updateStates().
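To make the intended state changes easier to follow, here's a small plain-JavaScript model of the logic. It isn't the sailfish-browser code: FakePage and activate are hypothetical stand-ins that track only an active flag and a suspended flag, and the parent check from updateStates() is left out for brevity.

```javascript
// Hypothetical page model tracking only the flags touched by the C++ above.
class FakePage {
  constructor(name) {
    this.name = name;
    this.active = false;
    this.suspended = false;
  }
  suspendView() {
    this.active = false;
    this.suspended = true;
  }
  resumeView() {
    this.active = true;
    this.suspended = false;
  }
}

// Models the amended branch: a hidden tab is suspended straight away and the
// old page is left untouched; a visible tab goes through the usual hand-over.
function activate(oldPage, newPage, hidden) {
  if (hidden) {
    newPage.suspendView();
    return;
  }
  if (oldPage) {
    oldPage.suspendView();
  }
  newPage.resumeView();
}

const visiblePage = new FakePage("visible");
visiblePage.resumeView();
const hiddenPage = new FakePage("hidden");
activate(visiblePage, hiddenPage, true);
console.log(visiblePage.active, hiddenPage.active, hiddenPage.suspended);
// → true false true
```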

The theory looks plausible, but the practice shows otherwise: after making this change there's still a visible flicker when the hidden page is created and then immediately hidden.

However, while investigating this I also notice that the WebView.qml has this readyToPaint value that also controls whether rendering occurs:
    readyToPaint: resourceController.videoActive ? webView.visible
        && !resourceController.displayOff : webView.visible
        && webView.contentItem
        && (webView.contentItem.domContentLoaded || webView.contentItem.painted)
If I comment out this line, so that readyToPaint is never set to true then I find none of the pages render at all. That's good, because it means that using this flag it may be possible to avoid the rendering of the page that's causing the flicker.

I've added some extra variables into the class so that when a hidden page is created the readyToPaint value is skipped on the next occasion it's set. This is a hack and if this works I'll need to figure out a more satisfactory approach, but for testing this might be enough.

Unfortunately my attempts to control it fail: it has precisely the opposite effect, so that the page turns white and then rendering never starts up again. I'm left with a blank screen and no page being rendered at all.

I give it one more go, this time with a little more sophistication in my approach. Essentially I'm recording the hidden state, skipping any readyToPaint updates while it's set, then restoring the flag as soon as the page has reverted back to the non-hidden page.
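The record, skip and restore idea can be sketched in plain JavaScript as below. This is a model of the approach only, not the actual QML: PaintGate, setHidden and setReadyToPaint are hypothetical names, and applying the last skipped value on restore is an assumption about how the restore step behaves.

```javascript
// Hypothetical gate that defers readyToPaint updates while a hidden page is
// current, then applies the last deferred value once it goes away.
class PaintGate {
  constructor() {
    this.readyToPaint = false; // value the renderer would see
    this.hidden = false;       // true while the hidden page is current
    this.pending = null;       // last update skipped while hidden
  }
  setReadyToPaint(value) {
    if (this.hidden) {
      this.pending = value;    // remember it, but don't apply it
    } else {
      this.readyToPaint = value;
    }
  }
  setHidden(hidden) {
    this.hidden = hidden;
    if (!hidden && this.pending !== null) {
      this.readyToPaint = this.pending; // restore on return to the visible page
      this.pending = null;
    }
  }
}

const gate = new PaintGate();
gate.setReadyToPaint(true);
gate.setHidden(true);
gate.setReadyToPaint(false);    // skipped while hidden
console.log(gate.readyToPaint); // → true
gate.setHidden(false);
console.log(gate.readyToPaint); // → false
```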

Now the rendering state is at least restored afterwards, but there's still a flicker on screen as the page appears and then is hidden. And when I check the debug output it's clear that there's no change of readyToPaint state occurring between the time the new page is created and the old page is reinstated. During this time the rendering state is set to false.

So I don't think there's anything more to test here. Suspending the page appears to still render to the screen as a white page, rather than simply leaving behind the page that was there before. This shouldn't be such a surprise; the texture used for rendering is almost certainly getting cleared somewhere, even when rendering is suspended.

But this task simply isn't worth spending more time on. Maybe at some point in the future the path to avoiding the new page render for a frame will become clearer, but in the meantime the impact on the user is minimal.

So, tomorrow I'll get started on cleaning up the QML and JavaScript code so that this can all be finalised.

29 Dec 2023 : Day 122 #
Over the last couple of days the plan had been to wrap up creating a QSortFilterProxyModel to hide the "hidden" tabs, stick it in the QML and call it a good 'un. But while doing that I discovered that switching from persistent to private tab mode causes the user interface thread of the browser to hang.

It's turning out to be quite nasty because, since there's no crash, the debugger won't tell us where the problem is. Looking through the code carefully hasn't revealed anything obvious either.

So to try to figure out what's going on I have to wait until the hang occurs, manually break the execution (using Ctrl-c) and then check each of the threads to find out what they're up to.

First of all, let's find out what threads are running.
(gdb) c
Continuing.
[Parent 23887: Unnamed thread 7f88002670]: E/EmbedLite NON_IMPL:
    EmbedLite::virtual nsresult WebBrowserChrome::GetDimensions(uint32_t,
    int32_t*, int32_t*, int32_t*, int32_t*):542 GetView dimensitions
[New LWP 26091]
^C
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fb7bcf718 in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) info thread
  Id   Target Id                   Frame 
* 1    LWP 23887 "sailfish-browse" 0x0000007fb7bcf718 in pthread_cond_wait ()
  2    LWP 24132 "QQmlThread"      0x0000007fb78a8740 in poll ()
  3    LWP 24133 "QDBusConnection" 0x0000007fb78a8740 in poll ()
  4    LWP 24134 "gmain"           0x0000007fb78a8740 in poll ()
  5    LWP 24135 "dconf worker"    0x0000007fb78a8740 in poll ()
  6    LWP 24137 "gdbus"           0x0000007fb78a8740 in poll ()
  7    LWP 24147 "QThread"         0x0000007fb78a8740 in poll ()
  8    LWP 24149 "GeckoWorkerThre" StringMatch (text=0x7f9df3bf40,
                                   pat=0x7f9e0400c0, start=start@entry=0)
                                   at js/src/builtin/String.cpp:1944
  10   LWP 24151 "IPC I/O Parent"  0x0000007fb78ade24 in syscall ()
  11   LWP 24152 "QSGRenderThread" 0x0000007fb7bcf718 in pthread_cond_wait ()
  12   LWP 24153 "Netlink Monitor" 0x0000007fb78a8740 in poll ()
  13   LWP 24154 "Socket Thread"   0x0000007fb78a8740 in poll ()
  15   LWP 24156 "TaskCon~read #0" 0x0000007fb7bcf718 in pthread_cond_wait ()
  16   LWP 24157 "TaskCon~read #1" 0x0000007fb7bcf718 in pthread_cond_wait ()
  17   LWP 24158 "TaskCon~read #2" 0x0000007fb7bcf718 in pthread_cond_wait ()
  18   LWP 24159 "TaskCon~read #3" 0x0000007fb7bcf718 in pthread_cond_wait ()
  19   LWP 24160 "TaskCon~read #4" 0x0000007fb7bcf718 in pthread_cond_wait ()
  20   LWP 24161 "TaskCon~read #5" 0x0000007fb7bcf718 in pthread_cond_wait ()
  21   LWP 24162 "TaskCon~read #6" 0x0000007fb7bcf718 in pthread_cond_wait ()
  22   LWP 24163 "TaskCon~read #7" 0x0000007fb7bcf718 in pthread_cond_wait ()
  24   LWP 24165 "Timer"           0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  25   LWP 24167 "IPDL Background" 0x0000007fb7bcf718 in pthread_cond_wait ()
  26   LWP 24168 "Cache2 I/O"      0x0000007fb7bcf718 in pthread_cond_wait ()
  27   LWP 24169 "Cookie"          0x0000007fb7bcf718 in pthread_cond_wait ()
  32   LWP 24174 "Worker Launcher" 0x0000007fb7bcf718 in pthread_cond_wait ()
  33   LWP 24175 "QuotaManager IO" 0x0000007fb7bcf718 in pthread_cond_wait ()
  35   LWP 24177 "Softwar~cThread" 0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  36   LWP 24178 "Compositor"      0x0000007fb7bcf718 in pthread_cond_wait ()
  37   LWP 24179 "ImageIO"         0x0000007fb7bcf718 in pthread_cond_wait ()
  38   LWP 24181 "DOM Worker"      0x0000007fb7bcf718 in pthread_cond_wait ()
  40   LWP 24183 "ImageBridgeChld" 0x0000007fb7bcf718 in pthread_cond_wait ()
  42   LWP 24185 "Permission"      0x0000007fb7bcf718 in pthread_cond_wait ()
  43   LWP 24186 "TRR Background"  0x0000007fb7bcf718 in pthread_cond_wait ()
  44   LWP 24187 "URL Classifier"  0x0000007fb7bcf718 in pthread_cond_wait ()
  48   LWP 24191 "ProxyResolution" 0x0000007fb7bcf718 in pthread_cond_wait ()
  49   LWP 24193 "mozStorage #1"   0x0000007fb7bcf718 in pthread_cond_wait ()
  50   LWP 24195 "HTML5 Parser"    0x0000007fb7bcf718 in pthread_cond_wait ()
  51   LWP 24196 "localStorage DB" 0x0000007fb7bcf718 in pthread_cond_wait ()
  53   LWP 24198 "StyleThread#0"   0x0000007fb7bcf718 in pthread_cond_wait ()
  54   LWP 24199 "StyleThread#1"   0x0000007fb7bcf718 in pthread_cond_wait ()
  55   LWP 24200 "StyleThread#2"   0x0000007fb7bcf718 in pthread_cond_wait ()
  56   LWP 24202 "StyleThread#3"   0x0000007fb7bcf718 in pthread_cond_wait ()
  57   LWP 24203 "StyleThread#4"   0x0000007fb7bcf718 in pthread_cond_wait ()
  58   LWP 24204 "StyleThread#5"   0x0000007fb7bcf718 in pthread_cond_wait ()
  65   LWP 24293 "mozStorage #2"   0x0000007fb7bcf718 in pthread_cond_wait ()
  66   LWP 24294 "mozStorage #3"   0x0000007fb7bcf718 in pthread_cond_wait ()
  69   LWP 24942 "Backgro~Pool #5" 0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  70   LWP 26091 "QSGRenderThread" 0x0000007fb78a8740 in poll ()
That's rather a lot of threads to check through, but nonetheless it's still likely to be our most fruitful approach right now. We can ask the debugger to print a backtrace for every single thread by calling thread apply all bt.

I won't copy out all of the resulting output here. Instead I'll include just the interesting backtraces and summarise the others.
(gdb) thread apply all bt

Thread 70 (LWP 26091):
#0  0x0000007fb78a8740 in poll () from /lib64/libc.so.6
#1  0x0000007fafb38bfc in ?? () from /usr/lib64/libwayland-client.so.0
#2  0x0000007fafb3a258 in wl_display_dispatch_queue ()
    from /usr/lib64/libwayland-client.so.0
#3  0x0000007faf885204 in WaylandNativeWindow::readQueue(bool) ()
    from /usr/lib64/libhybris//eglplatform_wayland.so
#4  0x0000007faf8843ec in WaylandNativeWindow::finishSwap() ()
    from /usr/lib64/libhybris//eglplatform_wayland.so
#5  0x0000007fb73f9210 in _my_eglSwapBuffersWithDamageEXT ()
    from /usr/lib64/libEGL.so.1
#6  0x0000007fafa4e080 in ?? () from
    /usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
#7  0x0000007fb88e5180 in QOpenGLContext::swapBuffers(QSurface*) ()
    from /usr/lib64/libQt5Gui.so.5
#8  0x0000007fb8e64c68 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fb8e6ac10 in ?? () from /usr/lib64/libQt5Quick.so.5
#10 0x0000007fb7ce20e8 in ?? () from /usr/lib64/libQt5Core.so.5
#11 0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#12 0x0000007fb78b289c in ?? () from /lib64/libc.so.6

[...]

Thread 11 (LWP 24152):
#0  0x0000007fb7bcf718 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000007fb7ce2924 in QWaitCondition::wait(QMutex*, unsigned long) ()
    from /usr/lib64/libQt5Core.so.5
#2  0x0000007fb8e6a7cc in ?? () from /usr/lib64/libQt5Quick.so.5
#3  0x0000007fb8e6ac60 in ?? () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fb7ce20e8 in ?? () from /usr/lib64/libQt5Core.so.5
#5  0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#6  0x0000007fb78b289c in ?? () from /lib64/libc.so.6

Thread 10 (LWP 24151):
#0  0x0000007fb78ade24 in syscall () from /lib64/libc.so.6
#1  0x0000007fba03cdd0 in epoll_wait (epfd=<optimized out>,
    events=events@entry=0x7f8c0018a0, maxevents=<optimized out>,
    timeout=timeout@entry=-1)
    at ipc/chromium/src/third_party/libevent/epoll_sub.c:62
#2  0x0000007fba03f7a0 in epoll_dispatch (base=0x7f8c0015e0, tv=<optimized out>)
    at ipc/chromium/src/third_party/libevent/epoll.c:462
#3  0x0000007fba041568 in event_base_loop (base=0x7f8c0015e0,
    flags=flags@entry=1) at ipc/chromium/src/third_party/libevent/event.c:1947
#4  0x0000007fba01e248 in base::MessagePumpLibevent::Run (this=0x7f8c001560,
    delegate=0x7f9eeb9de0) at ipc/chromium/src/base/message_pump_libevent.cc:346
#5  0x0000007fba01f5bc in MessageLoop::RunInternal (this=this@entry=0x7f9eeb9de0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#6  0x0000007fba01f800 in MessageLoop::RunHandler (this=0x7f9eeb9de0)
    at ipc/chromium/src/base/message_loop.cc:352
#7  MessageLoop::Run (this=this@entry=0x7f9eeb9de0)
    at ipc/chromium/src/base/message_loop.cc:334
#8  0x0000007fba0336d8 in base::Thread::ThreadMain (this=0x7f88031040)
    at ipc/chromium/src/base/thread.cc:187
#9  0x0000007fba01dba0 in ThreadFunc (closure=<optimized out>)
    at ipc/chromium/src/base/platform_thread_posix.cc:40
#10 0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#11 0x0000007fb78b289c in ?? () from /lib64/libc.so.6

Thread 8 (LWP 24149):
#0  StringMatch (text=0x7f9df3bf40, pat=0x7f9e0400c0, start=start@entry=0)
    at js/src/builtin/String.cpp:1944
#1  0x0000007fbcc68b0c in js::str_indexOf (cx=<optimized out>,
    argc=<optimized out>, vp=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/CallArgs.h:245
#2  0x0000007f00708994 in ?? ()
#3  0x0000007fbd165538 in js::jit::MaybeEnterJit (cx=0xc021, state=...)
    at js/src/jit/Jit.cpp:207
#4  0x0000007fbd165538 in js::jit::MaybeEnterJit (cx=0x7f881cd720, state=...)
    at js/src/jit/Jit.cpp:207
#5  0x0000007ef4101401 in ?? ()
Backtrace stopped: Cannot access memory at address 0x84041a32f7d6

[...]
(gdb)
Of the other threads 35 are in a pthread_cond_wait() state and three are in a pthread_cond_timedwait() state. Nine are in a poll() state.
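Counting states by hand across this many threads is error-prone, so a tally like this can also be scripted. Here's a minimal plain-JavaScript sketch, run outside gdb against text copied from the listing. The tallyFrames helper is hypothetical, and only five of the lines from above are included as sample input.

```javascript
// A few lines copied from the "info thread" listing above.
const sampleThreads = `
* 1    LWP 23887 "sailfish-browse" 0x0000007fb7bcf718 in pthread_cond_wait ()
  2    LWP 24132 "QQmlThread"      0x0000007fb78a8740 in poll ()
  11   LWP 24152 "QSGRenderThread" 0x0000007fb7bcf718 in pthread_cond_wait ()
  24   LWP 24165 "Timer"           0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  70   LWP 26091 "QSGRenderThread" 0x0000007fb78a8740 in poll ()
`;

// Count threads by the function named in their innermost frame.
// Lines without an "in <function> (" part are ignored.
function tallyFrames(listing) {
  const counts = new Map();
  for (const line of listing.split("\n")) {
    const match = line.match(/ in (\w+) \(/);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return counts;
}

console.log(Object.fromEntries(tallyFrames(sampleThreads)));
```

For the five sample lines this reports two threads in pthread_cond_wait, two in poll and one in pthread_cond_timedwait. A frame shown without an address, such as the StringMatch one for thread 8, doesn't match the pattern and so would need handling separately.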

That leaves thread 8 and thread 10 in slightly different states, but in practice their backtraces still don't tell us anything particularly interesting.

The one that looks most interesting to me is thread 70. It looks to be a rendering thread, but thread 11 is already there as a rendering thread and during normal operation I'd only expect to see one of these.

Thread 70 is doing something to do with EGL and Wayland.

Unfortunately this still doesn't give me anything actionable. So I'm going to try something else.

The hang occurs when we switch to private browsing mode. So let's take a look at the code that happens as a result of pressing the button to do this in the user interface. This can be found in declarativewebcontainer.cpp and looks like this:
void DeclarativeWebContainer::updateMode()
{
    setTabModel((BrowserApp::captivePortal() || m_privateMode)
        ? m_privateTabModel.data() : m_persistentTabModel.data());
    emit tabIdChanged();

    // Reload active tab from new mode
    if (m_model->count() > 0) {
        reload(false);
    } else {
        setWebPage(NULL);
        emit contentItemChanged();
    }
}
By commenting out different bits of the code and rebuilding I should be able to narrow the issue down.

And it turns out that here it's the call to setTabModel() that triggers the hang. Comment this line out and everything continues without hanging. Of course the functionality isn't all working (it doesn't correctly switch to private browsing mode), but this does at least get us somewhere to start. Let's dig deeper into this setTabModel() method. The implementation for this method looks like this:
void DeclarativeWebContainer::setTabModel(DeclarativeTabModel *model)
{
    if (m_model != model) {
        int oldCount = 0;
        if (m_model) {
            disconnect(m_model, 0, 0, 0);
            oldCount = m_model->count();
        }

        m_model = model;
        int newCount = 0;
        if (m_model) {
            connect(m_model.data(), &DeclarativeTabModel::activeTabChanged,
                    this, &DeclarativeWebContainer::onActiveTabChanged);
            connect(m_model.data(), &DeclarativeTabModel::activeTabChanged,
                    this, &DeclarativeWebContainer::tabIdChanged);
            connect(m_model.data(), &DeclarativeTabModel::loadedChanged,
                    this, &DeclarativeWebContainer::initialize);
            connect(m_model.data(), &DeclarativeTabModel::tabClosed,
                    this, &DeclarativeWebContainer::releasePage);
            connect(m_model.data(), &DeclarativeTabModel::newTabRequested,
                    this, &DeclarativeWebContainer::onNewTabRequested);
            newCount = m_model->count();
        }
        emit tabModelChanged();
        if (m_model && oldCount != newCount) {
            emit m_model->countChanged();
        }
    }
}
Again, by commenting out different parts of the code it should be possible to narrow down what's causing the problem. And indeed in this case the only line that consistently causes the hang to happen is the following:
        emit tabModelChanged();
If I remove just this line and leave all of the others in, the hang disappears. But keep this line in, even with the others removed, and the hang returns.

So that means there's something hooked into this signal that's causing the hang. There's only one connection made to this in the C++ code and commenting that out doesn't make any difference. So the problem must be in the QML code.

Unfortunately there are many different bits of code that hang off this signal. I've commented large chunks of code related to this signal out to see whether skipping them prevents the hang. Most seem to have no effect on the outcome. But if I comment out enough stuff the hang no longer occurs.

Now I'm working backwards adding code back in until the hang returns. It's laborious, but at least guaranteed to move things forwards one step at a time.
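This add-back process is essentially a linear search. The sketch below models it in plain JavaScript. The findFirstHangingBlock function, the block names and the hangs predicate are all hypothetical, with the predicate standing in for the manual step of rebuilding and testing on the device.

```javascript
// Re-enable blocks one at a time; return the first block whose
// re-enabling makes the hang come back, or null if none does.
function findFirstHangingBlock(blocks, hangs) {
  const enabled = [];
  for (const block of blocks) {
    enabled.push(block);
    if (hangs(enabled)) {
      return block;
    }
  }
  return null;
}

// Usage: pretend the hang appears whenever "blockB" is enabled.
const commentedOutBlocks = ["blockA", "blockB", "blockC"];
const culprit = findFirstHangingBlock(
  commentedOutBlocks,
  (enabled) => enabled.includes("blockB")
);
console.log(culprit); // → blockB
```

Each call to the predicate costs a full rebuild and test cycle in practice, which is why it's laborious. The upside is that every step either rules a block out or finds the culprit.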

Eventually, after a lot of trial and error, I've been able to narrow down the problem to this small snippet of code in BrowserPage.qml:
    // Use Connections so that target updates when model changes.
    Connections {
        target: AccessPolicy.browserEnabled && webView
                && webView.tabModel || null
        ignoreUnknownSignals: true
        // Animate overlay to top if needed.
        onCountChanged: {
            if (webView.tabModel.count === 0) {
                webView.handleModelChanges(false)
            }
            window.setBrowserCover(webView.tabModel)
        }
    }
Comment this code out and everything seems to work okay. At least, everything except the cover preview. In particular, the proxy filter model doesn't seem to be causing any issues.

With a bit more trial and error I'm able to narrow it down even further, to just this line:
            window.setBrowserCover(webView.tabModel)
With this line commented out things are looking up: the print to PDF functionality works nicely; there are no extraneous tabs added during printing; there's just the slightest of flickers when the printing starts; but crucially there are no other obvious issues and no hanging.

I'm really happy with this. It means that the code for getting the print to PDF functionality working is now all there. It'll be important to deal with the hang, but it's actually unrelated to these changes.

Given all this, that leaves me four things to deal with:
  1. Check whether the flicker can be removed by skipping the activation of the hidden window.
  2. Tidy up the QML implementation of the new proxy filter model.
  3. Move the DownloadPDFSaver class from DownloadCore.jsm to EmbedliteDownloadManager.js.
  4. Record an issue for the setBrowserCover() hang we were just looking at.
That's still quite a lot to do. But at least I'm happy that everything is now under control. What's now clear is that it'll definitely be possible to restore the PDF printing functionality: all I really have to do now is clean up the implementation. Fixing the hang can go in a separate issue.

Today I'm happy: it's been a productive day, we got to the bottom of the hang, and all of the printing changes are coming together into something usable. Now it's time for bed and hopefully a very sound sleep as a result.

28 Dec 2023 : Day 121 #
Yesterday we looked at creating a DeclarativeTabFilterModel that allowed the hidden windows to be filtered out. The implementation turned out to be really straightforward, although to be fair that's because Qt is doing the majority of the hard work for us.

Today I'm going to try adding it in to the QML code to see what happens. The interesting file here is TabView.qml. This is the component that displays the tabs to the user. The design of the user interface means it has to handle multiple models. First there's a model to allow switching between standard and private tabs, called the modeModel, which is defined directly in the QML. Second there's a model for handling the standard tab list, accessed as webView.persistentTabModel. The use of "persistent" in the name is a reference to the fact that these tabs will survive the browser being shut down and restarted. Finally there are the private tabs, accessed through webView.privateTabModel. These lose the "persistent" moniker because the private tabs are all discarded when the user closes the browser.

We're interested in the last two of these models because we want to wrap them in our filtering proxy. The key bit of code for this is the following (found in the same TabView.qml file):
    TabGridView {
        id: _tabView

        portrait: tabView.portrait
        model: tabItem.privateMode ? webView.privateTabModel
                                   : webView.persistentTabModel
        header: Item {
            width: 1
            height: Theme.paddingLarge
        }

        onHide: tabView.hide()
        onEnterNewTabUrl: tabView.enterNewTabUrl()
        onActivateTab: tabView.activateTab(index)
        onCloseTab: tabView.closeTab(index)
        onCloseAll: tabView.closeAll()
        onCloseAllCanceled: tabView.closeAllCanceled()
        onCloseAllPending: tabView.closeAllPending()
    }
As you can see the tabs are set to use a model that depends on whether private mode is active or not. There's a ternary operator there that chooses between the two models:
        model: tabItem.privateMode ? webView.privateTabModel
                                   : webView.persistentTabModel
This line hooks the correct model up to the TabGridView. The code here looks pretty slick and straightforward, but I recall when it was first implemented it caused a lot of trouble. The difficulty is that when the component switches between persistent and private tabs there's a short period during the animation when both models are used simultaneously. Ensuring that they can both exist without the tabs suddenly switching from one model to the other in an instant needed some work.

It wasn't me that had to implement that, but I probably reviewed the code at some stage.

So now let's add in our filtering proxy model.
    TabGridView {
        id: _tabView

        portrait: tabView.portrait
        model: TabFilterModel {
            sourceModel: tabItem.privateMode ? webView.privateTabModel
                                             : webView.persistentTabModel
            showHidden: false
        }
        header: Item {
            width: 1
            height: Theme.paddingLarge
        }
[...]
When I test this out on-device it works surprisingly well. The hidden tab is indeed hidden in the tab view. I need to sort the tab numbering out, but that's expected because I've not started using the filter model for the numbering yet. As soon as I switch to using it for the count as well, it should all fall into line.

However, there is one more serious problem. Switching between persistent and private modes causes the browser to hang. My guess is that this is to do with the code used to switch between the two, but I'm not certain yet.

But that's okay, this is what building this sort of stuff is all about: try something out, find the glitches, fix the glitches. It's not quite as clean and logical as we might always like, but this is software engineering, not computer science.

As we can see from the code shown above, the model is being used in a TabGridView component. When the tab switches it comes up with this error:
[W] unknown:65 - file:///usr/share/sailfish-browser/pages/components/
    TabGridView.qml:65:19: Unable to assign [undefined] to int
When we look inside the code for TabGridView we can see that this relates to this line:
    currentIndex: model.activeTabIndex
This is making use of the activeTabIndex property, which is a member of DeclarativeTabModel (and so therefore also a member of the persistent and private models that inherit from it) but not a member of our filter proxy model. So that would explain why it's so unhappy.

I can pass through the property quite easily. It looks like the count property is also used, but not the loaded property, so probably we just need to pass through those two.
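
To make the idea of passing properties through a bit more concrete, here's a minimal sketch in plain C++ rather than Qt, so none of it is the real sailfish-browser code. SourceTabs and FilteredTabs are hypothetical stand-ins, and translating the active index into filtered coordinates is an assumption about what a view would want, not something taken from the actual implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for DeclarativeTabModel: a list of hidden flags plus
// the extra activeTabIndex property that a view reads directly.
struct SourceTabs {
    std::vector<bool> hidden;   // hidden[i] is true if source row i is hidden
    int activeTabIndex = -1;    // index into the unfiltered list, -1 for none
};

// Hypothetical filtering wrapper. A view bound to the wrapper can only read
// count and activeTabIndex if the wrapper exposes them itself.
class FilteredTabs {
public:
    explicit FilteredTabs(const SourceTabs &source) : m_source(source) {}

    // Number of rows left once the hidden tabs have been removed.
    int count() const
    {
        int visible = 0;
        for (bool isHidden : m_source.hidden) {
            if (!isHidden) {
                ++visible;
            }
        }
        return visible;
    }

    // The source's active index translated into filtered coordinates, or -1
    // if there is no active tab or the active tab is itself hidden.
    int activeTabIndex() const
    {
        const int active = m_source.activeTabIndex;
        assert(active < static_cast<int>(m_source.hidden.size()));
        if (active < 0 || m_source.hidden[active]) {
            return -1;
        }
        int index = 0;
        for (int row = 0; row < active; ++row) {
            if (!m_source.hidden[row]) {
                ++index;
            }
        }
        return index;
    }

private:
    const SourceTabs &m_source;
};
```

With three tabs where the middle one is hidden and the last one is active, the wrapper reports a count of two and an active index of one.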

Having added the required properties and tested out the code, it seems the problem still persists: switching from persistent to private tabs causes the browser to hang. I tried out a few changes to my code to see if that would help, including setting the filter flag to false, but that still doesn't fix it.

Thinking that my changes might have corrupted the tab database details stored on disk I also tried removing the profile data stored at ~/.local/share/org.sailfishos/browser. This cleared all of the tabs, but to my surprise the browser now hangs when creating the first persistent tab as well. So the issue isn't necessarily to do with switching between persistent and private views; more likely it's to do with creating the very first tab in each case.

After reverting my changes to the filter proxy it's now clear that it's not the proxy that's causing the issue here at all: it is indeed the creation of the very first tab. With the profile restored there are a bunch of tabs pre-existing in the persistent tab list. But in the private tab list there are none. So switching to this empty list is what's causing the crash.

So it looks like I need to double back and fix this. Here's the backtrace I get from the crash:
Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
Tab::tabId (this=0x48) at ../storage/tab.cpp:38
38          return m_tabId;
(gdb) bt
#0  Tab::tabId (this=0x48) at ../storage/tab.cpp:38
#1  0x00000055555cb3d8 in DeclarativeWebPage::tabId (this=<optimized out>)
    at ../qtmozembed/declarativewebpage.cpp:127
#2  0x0000005555591fd0 in DeclarativeWebContainer::onNewTabRequested
    (this=0x555569b370, tab=...) at include/c++/8.3.0/bits/atomic_base.h:390
#3  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#4  0x00000055555f8bb0 in DeclarativeTabModel::newTabRequested
    (this=this@entry=0x55559c7e30, _t1=...) at moc_declarativetabmodel.cpp:366
#5  0x00000055555c5148 in DeclarativeTabModel::newTab
    (this=this@entry=0x55559c7e30, url=..., parentId=parentId@entry=0, 
    browsingContext=browsingContext@entry=0, hidden=hidden@entry=false)
    at ../history/declarativetabmodel.cpp:233
#6  0x00000055555c5318 in DeclarativeTabModel::newTab
    (this=this@entry=0x55559c7e30, url=...)
    at ../history/declarativetabmodel.cpp:199
#7  0x00000055555f8f3c in DeclarativeTabModel::qt_static_metacall
    (_o=_o@entry=0x55559c7e30, _c=_c@entry=QMetaObject::InvokeMetaMethod,
    _id=_id@entry=18, _a=_a@entry=0x7fffffbdc8)
    at moc_declarativetabmodel.cpp:182
[...]
#18 0x0000005555c61460 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) frame 1
#1  0x00000055555cb3d8 in DeclarativeWebPage::tabId (this=<optimized out>)
    at ../qtmozembed/declarativewebpage.cpp:127
127         return m_initialTab.tabId();
(gdb) p m_initialTab
value has been optimized out
(gdb) b DeclarativeWebPage::setInitialState
Breakpoint 1 at 0x55555cb3d8: file ../qtmozembed/declarativewebpage.cpp, line 134.
(gdb) r
[...]

Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
Tab::tabId (this=0x48) at ../storage/tab.cpp:38
38          return m_tabId;
(gdb) 
Notice that the issue here is that m_initialTab isn't set correctly. In fact, by putting additional breakpoints on the code I'm able to see that it's never actually getting set at all. It seems that the DeclarativeWebPage::setInitialState() method that sets it is never getting called.

This is supposed to be called in the WebPageFactory::createWebPage() method, which a breakpoint confirms is also not being called. From inspection of the code it seems that this is supposed to be called from the WebPages::page() method. But that's also not being called.

Finally, this page() call is supposed to be made from the DeclarativeWebContainer::activatePage() method. You might recall this is a method I messed around with earlier.

It looks like the problem line might actually be one of the debug lines I added:
    qDebug() << "PRINT: onNewTabRequested post activeTab: "
        << m_webPage->tabId();
This line works fine if m_webPage::m_initialTab has been set correctly, but if it's the very first tab it won't yet have been set at all. When this happens the code tries to dereference an uninitialised pointer and boom. If this really is the problem then that will be nice: understandable and easy to fix. I've rebuilt sailfish-browser without the debug output line to see what happens.
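
As an aside, the this=0x48 in the backtrace is consistent with the call going through a null page pointer, since a tiny address like that is what you'd get from a member sitting at a small offset from address zero. A minimal sketch of guarding against this, in plain C++ with hypothetical stand-ins for Tab and DeclarativeWebPage, might look like the following. The tabIdOrZero() helper is invented for illustration, and treating zero as "no tab" is an assumption based on the activeTabId <= 0 checks elsewhere in the code.

```cpp
#include <cassert>

// Hypothetical stand-in for the Tab class.
class Tab {
public:
    explicit Tab(int tabId = 0) : m_tabId(tabId) {}
    int tabId() const { return m_tabId; }

private:
    int m_tabId;
};

// Hypothetical stand-in for DeclarativeWebPage.
class WebPage {
public:
    void setInitialState(const Tab &tab)
    {
        // Assumption: valid tab IDs are always positive.
        assert(tab.tabId() > 0);
        m_initialTab = tab;
    }

    int tabId() const { return m_initialTab.tabId(); }

private:
    Tab m_initialTab;
};

// Returns the page's tab ID, or 0 when no page has been created yet, so
// that debug output can be written safely before the very first tab exists.
int tabIdOrZero(const WebPage *page)
{
    return page ? page->tabId() : 0;
}
```

A debug line that used a helper like this would print zero for the very first tab rather than calling a method through a pointer that isn't valid yet.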

This has changed things a bit: there's no longer a crash when creating the very first page. But switching to private browsing mode is still causing problems, even without the changes I made today to add the filter proxy model. That might suggest the issue has been there for much longer than I thought: I've just not switched to private browsing recently.

Running the app through the debugger it becomes clear that the app really is hanging rather than crashing. Or, even more specifically, the front-end is hanging. The JavaScript interpreter seems to happily continue running in the background. It's not at all clear why this might be happening and since it's already quite late now, finding the answer is going to have to wait until the morning.

27 Dec 2023 : Day 120 #
Over the last few days we've been looking at hiding the window that the page gets cloned into when saving a PDF of a page using the print routines. We got to the point where the print succeeds, the page appears for only the briefest of moments, but the tab for the page still appears in the tab model view.

Today we're going to start work on filtering out the page from the tab view.

After some discussion online with thigg, who rightly pointed out some potential dangers with doing this, I've decided to add the functionality but using a config setting that allows it to be enabled and disabled.

Before we implement the config setting I first want to put the filter together. To do this we're going to use a QSortFilterProxyModel. This will look exactly like the original model but with the ability to filter on specific values, in our case the "hidden" flag.

The Qt docs have a nice example of using this filtering approach which we can pretty much copy. All of this is happening in the sailfish-browser code as a wrapper for DeclarativeTabModel. There are also plenty of existing examples in the sailfish-browser code, including BookmarkFilterModel and LoginFilterModel, used to search on bookmarks and logins respectively.

Following these examples we're going to call ours the DeclarativeTabFilterModel.

I've put together the filter. It's generally very simple code, to the extent that I think I can get away with posting the entire header here.
#include <QSortFilterProxyModel>

class DeclarativeTabFilterModel : public QSortFilterProxyModel
{
    Q_OBJECT
    Q_PROPERTY(bool showHidden READ showHidden WRITE setShowHidden NOTIFY
        showHiddenChanged)
public:
    DeclarativeTabFilterModel(QObject *parent = nullptr);

    Q_INVOKABLE int getIndex(int currentIndex);

    bool filterAcceptsRow(int sourceRow, const QModelIndex &sourceParent)
        const override;
    void setSourceModel(QAbstractItemModel *sourceModel) override;

    bool showHidden() const;
    void setShowHidden(const bool showHidden);

signals:
    void showHiddenChanged(bool showHidden);

private:
    bool m_showHidden;
};
In essence all it's doing is accepting the model, then filtering the rows based on whether they're hidden or not. The key piece of code in the implementation is for the filterAcceptsRow() method which looks like this:
bool DeclarativeTabFilterModel::filterAcceptsRow(int sourceRow,
    const QModelIndex &sourceParent) const
{
    QModelIndex index = sourceModel()->index(sourceRow, 0, sourceParent);

    return (m_showHidden || !sourceModel()->data
        (index, DeclarativeTabModel::HiddenRole).toBool());
}
The underlying Qt implementation does all the rest. It's very nice stuff.
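
To show what that "rest" amounts to, here's a minimal sketch of the same filtering rule in plain C++ without any Qt. TabRow, acceptsRow(), visibleRows() and proxyToSource() are all hypothetical names invented for the illustration. The real QSortFilterProxyModel maintains its mapping internally and is far more capable than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the tab model.
struct TabRow {
    int tabId;
    std::string title;
    bool hidden;
};

// Same rule as filterAcceptsRow(): keep the row if hidden rows are being
// shown, or if the row isn't hidden.
bool acceptsRow(const TabRow &row, bool showHidden)
{
    return showHidden || !row.hidden;
}

// Collects the source row numbers that survive the filter. Entry N of the
// result is the source row that appears as row N of the filtered list.
std::vector<std::size_t> visibleRows(const std::vector<TabRow> &source,
                                     bool showHidden)
{
    std::vector<std::size_t> mapping;
    for (std::size_t row = 0; row < source.size(); ++row) {
        if (acceptsRow(source[row], showHidden)) {
            mapping.push_back(row);
        }
    }
    return mapping;
}

// Translates a row number in the filtered list back to the source row.
std::size_t proxyToSource(const std::vector<std::size_t> &mapping,
                          std::size_t proxyRow)
{
    assert(proxyRow < mapping.size());
    return mapping[proxyRow];
}
```

Given three tabs with the middle one hidden, the filtered list has two rows and the second of them maps back to source row two.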

I'm hoping I can drop this right in as a replacement for the model in the TabView.qml code. Or, to be more precise, as a drop in for the two models in the TabView.qml code, because there's a separate model for the persistent (standard) and private tab lists.

However, there may be a catch because the DeclarativeTabModel provides a bunch of other properties which potentially might get used for various things. I'll have to be careful not to accidentally switch out a model for the filtered model where any existing code relies on these additional properties. The additional properties won't automatically be provided by the filtered model.

Looking carefully through the code, I think it's safe though. I should be able to replace the model with the filtered proxy model close enough to the view controller that it will only use a minimal set of additional properties.

In order to actually make use of this new declarative component, we must register it with the QML typing system. Along with all of the other components built in this way we do this in the startup code of the application, most easily done in the main() function like this:
    qmlRegisterType<DeclarativeTabFilterModel>(uri, 1, 0, "TabFilterModel");
The only other thing I need to do is add the new file to the Qt project files; in this case the apps/history/history.pri file.

Having done that I'm pleased to see it compiles and builds the packages fine. But it's only a short post today and I'm not planning to test the new model now. Instead I'll pick this up and add it to the QML front-end tomorrow.

26 Dec 2023 : Day 119 #
We're getting perilously close to 27 days on this now. I admit this printing issue has turned out to be more gnarly than I'd hoped. But I don't feel ready to give up on it just yet. Yesterday we got the page hidden but couldn't figure out why that prevented the print from successfully completing. Today I plan to debug the print process again to try to find out why.

Let's get the error output that we're dealing with. There are some additional debug print outputs that I've added (all of the ones starting with PRINT:) to help clarify the flow. Whenever I add temporary debug prints in this way I like to give them all the same prefix. It makes them easier to spot, but also makes it easier to filter on them all using grep in case there's so much debug output that it makes them hard to spot.
$ EMBED_CONSOLE=1 MOZ_LOG="EmbedLite:5" sailfish-browser
[...]
PRINT: Window: [object Window]
PRINT: Window document: [object HTMLDocument]
PRINT: Window MozElement: undefined
[Parent 15251: Unnamed thread 7718002670]: E/EmbedLite FUNC::virtual nsresult mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
[D] unknown:0 - PRINT: onNewTabRequested pre activeTab:  10
[D] unknown:0 - PRINT: new tab is hidden; recording previous ID:  10
[D] unknown:0 - PRINT: onNewTabRequested post activeTab:  11
[W] unknown:0 - bool DBWorker::execute(QSqlQuery&) failed execute query
[W] unknown:0 - "INSERT INTO tab (tab_id, tab_history_id) VALUES (?,?);"
[W] unknown:0 - QSqlError("19", "Unable to fetch row", "UNIQUE constraint
    failed: tab.tab_id")
[D] unknown:0 - PRINT: onActiveTabChanged:  11
[D] unknown:0 - PRINT: onActiveTabChanged hidden:  true
[D] unknown:0 - PRINT: new tab is hidden, activating previous ID:  10
[D] unknown:0 - PRINT: activateTab: old:  11
[D] unknown:0 - PRINT: activateTab: new:  4
[D] unknown:0 - PRINT: activateTab: activate tab:  4 Tab(tabId = 10,
    parentId = 0, isValid = true, url = "https://jolla.com/",
    requested url = "", url resolved: true, title = "Jolla", thumbnailPath =
    "~/.cache/org.sailfishos/browser/tab-10-thumb.jpg", desktopMode = false)
[D] unknown:0 - PRINT: onActiveTabChanged:  10
[D] unknown:0 - PRINT: onActiveTabChanged hidden:  false
EmbedliteDownloadManager error: [Exception... "Abort"  nsresult: "0x80004004
    (NS_ERROR_ABORT)"  location: "JS frame ::
    resource://gre/modules/DownloadCore.jsm :: DownloadError :: line 1755"
    data: no]
[Parent 15251: Unnamed thread 7718002670]: E/EmbedLite FUNC::virtual nsresult 
    mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
[Parent 15251: Unnamed thread 7718002670]: I/EmbedLite WARN: EmbedLite::virtual
    void* mozilla::embedlite::EmbedLitePuppetWidget::GetNativeData(uint32_t):127
    EmbedLitePuppetWidget::GetNativeData not implemented for this type
JavaScript error: , line 0: uncaught exception: Object
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
CONSOLE message:
[JavaScript Error: "uncaught exception: Object"]
[...]
There are two things I need to look into from this debug output. First the database error:
[W] unknown:0 - QSqlError("19", "Unable to fetch row", "UNIQUE constraint failed:
    tab.tab_id")
It looks like something has messed up the tab database. That's not so surprising given that I made some (not very carefully thought-through) changes to the dbworker.cpp code. However my suspicion is that these database changes won't be affecting printing adversely. Second the download error:
EmbedliteDownloadManager error: [Exception... "Abort"  nsresult: "0x80004004
    (NS_ERROR_ABORT)"  location: "JS frame ::
    resource://gre/modules/DownloadCore.jsm :: DownloadError :: line 1755"
    data: no]
I'm going to start with the latter because it seems more likely to be the underlying issue. The exception is being thrown here, in the last of the three conditional blocks found in DownloadCore.jsm:
var DownloadError = function(aProperties) {
[...]
  if (aProperties.message) {
    this.message = aProperties.message;
  } else if (
    aProperties.becauseBlocked ||
    aProperties.becauseBlockedByParentalControls ||
    aProperties.becauseBlockedByReputationCheck ||
    aProperties.becauseBlockedByRuntimePermissions
  ) {
    this.message = "Download blocked.";
  } else {
    let exception = new Components.Exception("", this.result);
    this.message = exception.toString();
  }
That's not super helpful though because this is just the code that's called when there's an error of any kind. We need to find out where this is being called from.

There are 19 places in the DownloadCore.jsm file that might trigger an error using this DownloadError() method. Of these, nine include a message field and so will fall into the first block of the condition, while four have one of the becauseBlocked... flags set and so fall into the second block. Finally one of them is the method used for deserialisation of the message.

That leaves five which could potentially be the one triggering the error we're seeing. These five can be found on lines 518, 557, 2164, 2740 and 3035 of the DownloadCore.jsm file.

That's not too many; let's find out which one exactly by adding in different messages to all five of these entries and seeing which one pops up. Here's the result:
EmbedliteDownloadManager error: Line 3035
That means the error is happening inside this block:
    try {
      await new Promise((resolve, reject) => {
        this._browsingContext.print(printSettings)
        .then(() => {
          resolve();
        })
        .catch(exception => {
          reject(new DownloadError({ result: exception, inferCause: true }));
        });
      });
    }
That's not too surprising or revealing. That means there's an exception being thrown inside the C++ code from somewhere following this call:
already_AddRefed<Promise> CanonicalBrowsingContext::Print(
    nsIPrintSettings* aPrintSettings, ErrorResult& aRv)
Inside this method there are all sorts of calls to get the window details. So it might be worth checking whether we're switching tabs before or after all this is happening. Maybe it's just a timing issue?

A horrible thought suddenly occurred to me: what if it's the printing part that's broken and not the window changes causing it to break? I've been reinstalling various files, but maybe I didn't re-apply some manual change that had fixed the printing earlier?

Just to double check this I've removed the code that switches the tab back again to check whether the printing works correctly with this removed:
void DeclarativeWebContainer::onActiveTabChanged(int activeTabId)
{
    if (activeTabId <= 0) {
        return;
    }

    reload(false);

    if (m_model->activeTab().hidden()) {
        // Switch back to the old tab
        //m_model->activateTabById(mPreviousTabWhenHidden);
    }
}
Rebuild, reinstall, rerun. And now the printing does work. So it's definitely the introduction of this one line that's causing the issue. I'm going to put the line back in again, but this time with just a slight 100 millisecond delay to see whether that makes any difference. This will help confirm or deny my suspicion that this may be a timing issue.
void DeclarativeWebContainer::onActiveTabChanged(int activeTabId)
{
    if (activeTabId <= 0) {
        return;
    }

    reload(false);

    if (m_model->activeTab().hidden()) {
        // Switch back to the old tab
        hiddenTabTimer.setSingleShot(true);
        disconnect(&hiddenTabTimer, &QTimer::timeout, this, nullptr);
        connect(&hiddenTabTimer, &QTimer::timeout, this, [this]() {
            m_model->activateTabById(mPreviousTabWhenHidden);
        });
        hiddenTabTimer.start(100);
    }
}
Now with this slight delay the window appears visibly to the user for a fraction of a second, then disappears. The print then continues and the PDF is output successfully, no longer a zero-byte file:
$ ls -l
total 4596
-rw-rw-r-- 1 defaultuser defaultuser 4703901 Dec 24 18:39 Jolla.pdf
So that confirms it: it's a timing issue. Most likely the print code is expecting to get details from the current window. But if the window switches before it can do this, it'll end up getting the details from the original window, causing the print to fail.

If this is indeed what's going wrong, we should be able to pull that delay right down to zero. Adding a delay of zero time isn't the same as adding in no timer at all. The big difference is that by using the timer, execution will be forced once round the event loop before the switch back to the window occurs. In this case, it could be enough for the print code to pick up all of the info it needs in order to avoid the print failing.

So I've made this change:
        hiddenTabTimer.start(0);
With a timeout time set to zero like this the window barely appears: it's more like a flicker of the screen, no different to when we had no timer at all. What's more the file is still generated in the background and this time gets filled up with data; the error that we were seeing earlier doesn't occur. Printing multiple times also seems to work correctly.
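
The difference between switching back directly and deferring the switch through a zero-length timer can be sketched with a toy event loop. This is plain C++ with an invented EventLoop class and simulate() function, not Qt's real event loop. It also assumes that the print code reads its window details straight after the tab change handler returns, which is my guess at what's happening rather than something I've confirmed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal event loop: callbacks posted to it run later, in
// order, only once the code that posted them has returned.
class EventLoop {
public:
    void post(std::function<void()> callback)
    {
        assert(callback);
        m_pending.push(std::move(callback));
    }

    void run()
    {
        while (!m_pending.empty()) {
            std::function<void()> callback = std::move(m_pending.front());
            m_pending.pop();
            callback();
        }
    }

private:
    std::queue<std::function<void()>> m_pending;
};

// Records the order of events when the switch back to the previous tab is
// either made directly or deferred through the event loop.
std::vector<std::string> simulate(bool deferSwitch)
{
    std::vector<std::string> log;
    EventLoop loop;

    // Stand-in for the onActiveTabChanged() handler.
    auto onActiveTabChanged = [&]() {
        if (deferSwitch) {
            loop.post([&]() { log.push_back("switch back"); });
        } else {
            log.push_back("switch back");
        }
    };

    // Stand-in for the code that opened the hidden tab and reads details
    // from the current window once the handler has returned (an assumption).
    onActiveTabChanged();
    log.push_back("print reads window");

    loop.run();
    return log;
}
```

Calling simulate(false) records the switch before the read, whereas simulate(true) records the read first, even though no actual time delay is involved.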

So this is a good result. I don't think I'm going to be able to remove the initial flicker entirely, which is a shame, but maybe further down the line someone will think of a way to address that. I'm also not totally done because the "hidden" window is also currently still appearing in the tab view. I need to filter it out. The good news is that I'm much more comfortable with how to do that as there are standard approaches to filtering items from models using the QSortFilterProxyModel class.

I'm not going to do that right now though, I'll pick that up tomorrow.

Over the last few weeks there have been many times when I wasn't convinced I'd be able to get the "Save page to PDF" functionality back up and working satisfactorily again. So it's a huge relief to get to this stage where I know it will get to a place that I'm happy with. That seems like a great place to end for the day.

25 Dec 2023 : Day 118 #
It's Christmas Day. I've spent most of the day eating, opening presents and watching films. If you celebrate Christmas I hope you've had a wonderful day as well. Alongside general celebration and relaxation I've also had the chance to do a bit of gecko development. I'm very keen to get this printing user interface working correctly and it's definitely getting there.

As something a bit different, here's a photo of our Christmas tree before and after the decorations. But it comes with a warning: if you read on beyond this photo there will be backtraces. You have been warned!
On the left a Nordic Pine; on the right the same tree covered in Christmas decorations

The day before yesterday we got to the point where the hidden tabs were glowing red in the tab view, leaving the non-hidden tabs their usual colour. We don't actually want them to be red, but it does demonstrate that the hidden flag is being exposed correctly in the user interface.

Then yesterday I worked my way carefully through the code to come up with a plan for how to actually hide a tab based on the flag being set. My plan was to wait for the new tab to be activated, then immediately set the tab that was activated directly before as the active tab again.

If that happens quickly enough the user should hopefully not even be aware that it's happened.

So today I'm planning to harness that information in order to trigger the activation. With any luck we can get this done just by amending the front-end sailfish-browser code. No need to make changes to the gecko library itself.

To do this I'm going to keep track of the previous tab by recording the tab id in the DeclarativeWebContainer class. I've added the code to do this into the onNewTabRequested() method. As you can see, if the new tab has the hidden flag set we'll copy the tab ID into the mPreviousTabWhenHidden class member:
void DeclarativeWebContainer::onNewTabRequested(const Tab &tab)
{
    if (tab.hidden()) {
        mPreviousTabWhenHidden = m_webPage->tabId();
    }

    if (activatePage(tab, false)) {
        m_webPage->loadTab(tab.requestedUrl(), false);
    }
}
As we saw yesterday, once the tab has been created an activeTabChanged signal will be emitted which is connected to a DeclarativeWebContainer::onActiveTabChanged() slot. At that point I've added in some code to then switch back to the previous tab using the tab ID we recorded earlier:
void DeclarativeWebContainer::onActiveTabChanged(int activeTabId)
{
    if (activeTabId <= 0) {
        return;
    }

    reload(false);

    if (m_model->activeTab().hidden()) {
        // Switch back to the old tab
        m_model->activateTabById(mPreviousTabWhenHidden);
    }
}
When I test this out it actually works, which I'm a little surprised about. There's a slight flicker when the new tab opens but then it's immediately hidden again. It's perceptible, but really looks okay to me.

The only problem is that now the print doesn't complete. The printing starts, the tab opens, the tab closes, but then there's an error in the debug output and the data is never written to the file.

Here's the output from the debug console. Note that there are quite a few debug prints that I've added here to try to help me figure out what's going on, so they're non-standard and I'll remove them once I'm done.
[D] unknown:0 - PRINT: onNewTabRequested pre activeTab:  10
[D] unknown:0 - PRINT: new tab is hidden; recording previous ID:  10
[D] unknown:0 - PRINT: onNewTabRequested post activeTab:  11
[W] unknown:0 - bool DBWorker::execute(QSqlQuery&) failed execute query
[W] unknown:0 - "INSERT INTO tab (tab_id, tab_history_id) VALUES (?,?);"
[W] unknown:0 - QSqlError("19", "Unable to fetch row", "UNIQUE constraint failed: tab.tab_id")
[D] unknown:0 - PRINT: onActiveTabChanged:  11
[D] unknown:0 - PRINT: onActiveTabChanged hidden:  true
[D] unknown:0 - PRINT: new tab is hidden, activating previous ID:  10
[D] unknown:0 - PRINT: activateTab: old:  11
[D] unknown:0 - PRINT: activateTab: new:  4
[D] unknown:0 - PRINT: activateTab: activate tab:  4 Tab(tabId = 10,
    parentId = 0, isValid = true, url = "https://jolla.com/",
    requested url = "", url resolved: true, title = "Jolla", thumbnailPath =
    "/home/defaultuser/.cache/org.sailfishos/browser/tab-10-thumb.jpg",
    desktopMode = false)
[D] unknown:0 - PRINT: onActiveTabChanged:  10
[D] unknown:0 - PRINT: onActiveTabChanged hidden:  false
EmbedliteDownloadManager error: [Exception... "Abort"  nsresult:
    "0x80004004 (NS_ERROR_ABORT)"  location: "JS frame ::
    resource://gre/modules/DownloadCore.jsm :: DownloadError :: line 1755"
    data: no]
[Parent 24026: Unnamed thread 7e54002670]: E/EmbedLite FUNC::virtual nsresult 
    mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
[Parent 24026: Unnamed thread 7e54002670]: I/EmbedLite WARN: EmbedLite::virtual
    void* mozilla::embedlite::EmbedLitePuppetWidget::GetNativeData(uint32_t):127
    EmbedLitePuppetWidget::GetNativeData not implemented for this type
JavaScript error: , line 0: uncaught exception: Object
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
CONSOLE message:
[JavaScript Error: "uncaught exception: Object"]
JavaScript error: resource://gre/modules/SessionStoreFunctions.jsm, line 120:
    NS_ERROR_FILE_NOT_FOUND:
CONSOLE message:
It's not quite clear to me whether it's the error shown here causing the problem, or the fact that the page is being deactivated and suspended. The latter could be causing problems, even though it might not necessarily trigger any error output (it would just freeze the page in the background). To explore the possibility of it being the page being set to inactive I've put a breakpoint on the place where this happens: the WebPages::updateStates() method. With any luck this will tell me where the deactivation takes place so I can disable it.

When we execute the app the breakpoint gets hit three times. First on creation of the app when the page we're loading gets created. We can see this from the fact that onNewTabRequested() is in the backtrace. I then press the "Save page to PDF" option after which the blank print page is created, so that a call to onNewTabRequested() triggers the breakpoint again. Following that the tab will switch from the blank page back to the previous page. On this occasion there's no call to onNewTabRequested(). Instead there's a call to activateTabById() which will trigger the breakpoint a third and final time.

This last call to activateTabById() is the one we just added.

Here are the backtraces for all three of these calls to updateStates(). As I mentioned yesterday, I'm very well aware that these backtraces make for horrible reading. I'm honestly sorry about this. I keep the backtraces here for anyone who's really keen to see all of the details, or for a future me who might need this information for reference. If you're a normal human I strongly recommend just skipping past this bit. You'll not lose anything by doing so.
(gdb) b WebPages::updateStates
Breakpoint 2 at 0x55555ad150: WebPages::updateStates. (2 locations)
(gdb) c
Continuing.
Thread 1 "sailfish-browse" hit Breakpoint 2, WebPages::updateStates
    (this=0x55559bf880, oldActivePage=0x5555c4e0c0, newActivePage=0x7f8c063720)
    at ../core/webpages.cpp:203
203         if (oldActivePage) {
(gdb) bt
#0  WebPages::updateStates (this=0x55559bf880, oldActivePage=0x5555c4e0c0,
    newActivePage=0x7f8c063720) at ../core/webpages.cpp:203
#1  0x00000055555ad69c in WebPages::page (this=0x55559bf880, tab=...)
    at ../core/webpages.cpp:166
#2  0x000000555559148c in DeclarativeWebContainer::activatePage
    (this=this@entry=0x55559bf220, tab=..., force=force@entry=false)
    at include/c++/8.3.0/bits/atomic_base.h:390
#3  0x00000055555917a4 in DeclarativeWebContainer::onNewTabRequested
    (this=0x55559bf220, tab=...) at ../core/declarativewebcontainer.cpp:1062
#4  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#5  0x00000055555f7808 in DeclarativeTabModel::newTabRequested
    (this=this@entry=0x55559c69a0, _t1=...) at moc_declarativetabmodel.cpp:366
#6  0x00000055555c4428 in DeclarativeTabModel::newTab (this=0x55559c69a0,
    url=..., parentId=1, browsingContext=547754475280, hidden=<optimized out>)
    at ../history/declarativetabmodel.cpp:233
#7  0x00000055555d1900 in DeclarativeWebPageCreator::createView
    (this=0x55559c69f0, parentId=<optimized out>,
    parentBrowsingContext=<optimized out>, 
    hidden=<optimized out>) at /usr/include/qt5/QtCore/qarraydata.h:240
#8  0x0000007fbfb71ef0 in QMozContextPrivate::CreateNewWindowRequested
    (this=<optimized out>, chromeFlags=<optimized out>,
    hidden=@0x7fffffe922: true, aParentView=0x5555bfad90,
    parentBrowsingContext=@0x7fffffe938: 547754475280) at qmozcontext.cpp:218
#9  0x0000007fbcb10eec in mozilla::embedlite::EmbedLiteApp::
    CreateWindowRequested (this=0x555585a3a0, chromeFlags=@0x7fffffe928: 4094, 
    hidden=@0x7fffffe922: true, parentId=@0x7fffffe924: 1,
    parentBrowsingContext=@0x7fffffe938: 547754475280)
    at mobile/sailfishos/EmbedLiteApp.cpp:543
#10 0x0000007fbcb1ea68 in mozilla::embedlite::EmbedLiteAppThreadParent::
    RecvCreateWindow (this=<optimized out>, parentId=<optimized out>, 
    parentBrowsingContext=<optimized out>, chromeFlags=<optimized out>,
    hidden=<optimized out>, createdID=0x7fffffe92c, cancel=0x7fffffe923)
    at mobile/sailfishos/embedthread/EmbedLiteAppThreadParent.cpp:70
#11 0x0000007fba183aa0 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=0x7f88a0fe90, msg__=..., reply__=@0x7fffffea38: 0x0)
    at PEmbedLiteAppParent.cpp:924
#12 0x0000007fba06b618 in mozilla::ipc::MessageChannel::DispatchSyncMessage
    (this=this@entry=0x7f88a0ff58, aProxy=aProxy@entry=0x5555c69ea0, aMsg=..., 
    aReply=@0x7fffffea38: 0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
[...]
#32 0x000000555557b360 in main (argc=<optimized out>, argv=<optimized out>)
    at main.cpp:201
(gdb) c
Continuing.
[New LWP 21735]

Thread 1 "sailfish-browse" hit Breakpoint 2, WebPages::updateStates
    (this=<optimized out>, oldActivePage=<optimized out>,
    newActivePage=0x7f8c063720) at ../core/webpages.cpp:218
218             newActivePage->resumeView();
(gdb) bt
#0  WebPages::updateStates (this=<optimized out>, oldActivePage=<optimized out>,
    newActivePage=0x7f8c063720) at ../core/webpages.cpp:218
#1  WebPages::updateStates (this=<optimized out>, oldActivePage=<optimized out>,
    newActivePage=0x7f8c063720) at ../core/webpages.cpp:201
#2  0x00000055555ad69c in WebPages::page (this=0x55559bf880, tab=...)
    at ../core/webpages.cpp:166
#3  0x000000555559148c in DeclarativeWebContainer::activatePage
    (this=this@entry=0x55559bf220, tab=..., force=force@entry=false)
    at include/c++/8.3.0/bits/atomic_base.h:390
#4  0x00000055555917a4 in DeclarativeWebContainer::onNewTabRequested
    (this=0x55559bf220, tab=...) at ../core/declarativewebcontainer.cpp:1062
#5  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#6  0x00000055555f7808 in DeclarativeTabModel::newTabRequested
    (this=this@entry=0x55559c69a0, _t1=...) at moc_declarativetabmodel.cpp:366
#7  0x00000055555c4428 in DeclarativeTabModel::newTab (this=0x55559c69a0,
    url=..., parentId=1, browsingContext=547754475280, hidden=<optimized out>)
    at ../history/declarativetabmodel.cpp:233
#8  0x00000055555d1900 in DeclarativeWebPageCreator::createView
    (this=0x55559c69f0, parentId=<optimized out>,
    parentBrowsingContext=<optimized out>, 
    hidden=<optimized out>) at /usr/include/qt5/QtCore/qarraydata.h:240
#9  0x0000007fbfb71ef0 in QMozContextPrivate::CreateNewWindowRequested
    (this=<optimized out>, chromeFlags=<optimized out>,
    hidden=@0x7fffffe922: true, aParentView=0x5555bfad90,
    parentBrowsingContext=@0x7fffffe938: 547754475280) at qmozcontext.cpp:218
#10 0x0000007fbcb10eec in mozilla::embedlite::EmbedLiteApp::
    CreateWindowRequested (this=0x555585a3a0, chromeFlags=@0x7fffffe928: 4094, 
    hidden=@0x7fffffe922: true, parentId=@0x7fffffe924: 1,
    parentBrowsingContext=@0x7fffffe938: 547754475280)
    at mobile/sailfishos/EmbedLiteApp.cpp:543
#11 0x0000007fbcb1ea68 in mozilla::embedlite::EmbedLiteAppThreadParent::
    RecvCreateWindow (this=<optimized out>, parentId=<optimized out>, 
    parentBrowsingContext=<optimized out>, chromeFlags=<optimized out>,
    hidden=<optimized out>, createdID=0x7fffffe92c, cancel=0x7fffffe923)
    at mobile/sailfishos/embedthread/EmbedLiteAppThreadParent.cpp:70
#12 0x0000007fba183aa0 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=0x7f88a0fe90, msg__=..., reply__=@0x7fffffea38: 0x0)
    at PEmbedLiteAppParent.cpp:924
#13 0x0000007fba06b618 in mozilla::ipc::MessageChannel::DispatchSyncMessage
    (this=this@entry=0x7f88a0ff58, aProxy=aProxy@entry=0x5555c69ea0, aMsg=..., 
    aReply=@0x7fffffea38: 0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
[...]
#33 0x000000555557b360 in main (argc=<optimized out>, argv=<optimized out>)
    at main.cpp:201
(gdb) c
Continuing.

[D] unknown:0 - PRINT: onNewTabRequested post activeTab:  11
[W] unknown:0 - bool DBWorker::execute(QSqlQuery&) failed execute query
[W] unknown:0 - "INSERT INTO tab (tab_id, tab_history_id) VALUES (?,?);"
[W] unknown:0 - QSqlError("19", "Unable to fetch row", "UNIQUE constraint failed: tab.tab_id")
[D] unknown:0 - PRINT: onActiveTabChanged:  11
[D] unknown:0 - PRINT: onActiveTabChanged hidden:  true
[D] unknown:0 - PRINT: new tab is hidden, activating previous ID:  10
[D] unknown:0 - PRINT: activateTab: old:  11
[D] unknown:0 - PRINT: activateTab: new:  4
[D] unknown:0 - PRINT: activateTab: activate tab:  4 Tab(tabId = 10, parentId = 0, isValid = true, url = "https://jolla.com/", requested url = "", url resolved: true, title = "Jolla", thumbnailPath = "/home/defaultuser/.cache/org.sailfishos/browser/tab-10-thumb.jpg", desktopMode = false)
[D] unknown:0 - PRINT: onActiveTabChanged:  10

Thread 1 "sailfish-browse" hit Breakpoint 2, WebPages::updateStates
    (this=0x55559bf880, oldActivePage=0x7f8c063720, newActivePage=0x5555c4e0c0)
    at ../core/webpages.cpp:203
203         if (oldActivePage) {
(gdb) bt
#0  WebPages::updateStates (this=0x55559bf880, oldActivePage=0x7f8c063720,
    newActivePage=0x5555c4e0c0) at ../core/webpages.cpp:203
#1  0x00000055555ad69c in WebPages::page (this=0x55559bf880, tab=...)
    at ../core/webpages.cpp:166
#2  0x000000555559148c in DeclarativeWebContainer::activatePage
    (this=this@entry=0x55559bf220, tab=..., force=force@entry=true)
    at include/c++/8.3.0/bits/atomic_base.h:390
#3  0x00000055555928d4 in DeclarativeWebContainer::loadTab (this=0x55559bf220,
    tab=..., force=false) at ../core/declarativewebcontainer.cpp:1187
#4  0x0000005555592b78 in DeclarativeWebContainer::onActiveTabChanged
    (this=0x55559bf220, activeTabId=10)
    at ../core/declarativewebcontainer.cpp:960
#5  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#6  0x00000055555f7768 in DeclarativeTabModel::activeTabChanged
    (this=this@entry=0x55559c69a0, _t1=<optimized out>)
    at moc_declarativetabmodel.cpp:339
#7  0x00000055555c3eb4 in DeclarativeTabModel::updateActiveTab
    (this=this@entry=0x55559c69a0, activeTab=..., reload=reload@entry=false)
    at ../history/declarativetabmodel.cpp:429
#8  0x00000055555c4870 in DeclarativeTabModel::activateTab
    (this=this@entry=0x55559c69a0, index=4, reload=reload@entry=false)
    at ../history/declarativetabmodel.cpp:167
#9  0x00000055555c4d28 in DeclarativeTabModel::activateTabById
    (this=0x55559c69a0, tabId=<optimized out>)
    at ../history/declarativetabmodel.cpp:174
#10 0x0000005555592dbc in DeclarativeWebContainer::onActiveTabChanged
    (this=0x55559bf220, activeTabId=<optimized out>)
    at include/c++/8.3.0/bits/atomic_base.h:390
#11 0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#12 0x00000055555f7768 in DeclarativeTabModel::activeTabChanged
    (this=this@entry=0x55559c69a0, _t1=<optimized out>)
    at moc_declarativetabmodel.cpp:339
#13 0x00000055555c3eb4 in DeclarativeTabModel::updateActiveTab
    (this=this@entry=0x55559c69a0, activeTab=..., reload=reload@entry=false)
    at ../history/declarativetabmodel.cpp:429
#14 0x00000055555c4004 in DeclarativeTabModel::addTab
    (this=this@entry=0x55559c69a0, tab=..., index=index@entry=5)
    at ../history/declarativetabmodel.cpp:78
#15 0x00000055555c4438 in DeclarativeTabModel::newTab
    (this=0x55559c69a0, url=..., parentId=1, browsingContext=547754475280,
    hidden=<optimized out>) at ../history/declarativetabmodel.cpp:235
#16 0x00000055555d1900 in DeclarativeWebPageCreator::createView
    (this=0x55559c69f0, parentId=<optimized out>,
    parentBrowsingContext=<optimized out>, hidden=<optimized out>)
    at /usr/include/qt5/QtCore/qarraydata.h:240
#17 0x0000007fbfb71ef0 in QMozContextPrivate::CreateNewWindowRequested
    (this=<optimized out>, chromeFlags=<optimized out>,
    hidden=@0x7fffffe922: true, aParentView=0x5555bfad90,
    parentBrowsingContext=@0x7fffffe938: 547754475280) at qmozcontext.cpp:218
#18 0x0000007fbcb10eec in mozilla::embedlite::EmbedLiteApp::
    CreateWindowRequested (this=0x555585a3a0, chromeFlags=@0x7fffffe928: 4094, 
    hidden=@0x7fffffe922: true, parentId=@0x7fffffe924: 1,
    parentBrowsingContext=@0x7fffffe938: 547754475280)
    at mobile/sailfishos/EmbedLiteApp.cpp:543
#19 0x0000007fbcb1ea68 in mozilla::embedlite::EmbedLiteAppThreadParent::
    RecvCreateWindow (this=<optimized out>, parentId=<optimized out>, 
    parentBrowsingContext=<optimized out>, chromeFlags=<optimized out>,
    hidden=<optimized out>, createdID=0x7fffffe92c, cancel=0x7fffffe923)
    at mobile/sailfishos/embedthread/EmbedLiteAppThreadParent.cpp:70
#20 0x0000007fba183aa0 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=0x7f88a0fe90, msg__=..., reply__=@0x7fffffea38: 0x0)
    at PEmbedLiteAppParent.cpp:924
#21 0x0000007fba06b618 in mozilla::ipc::MessageChannel::DispatchSyncMessage
    (this=this@entry=0x7f88a0ff58, aProxy=aProxy@entry=0x5555c69ea0, aMsg=..., 
    aReply=@0x7fffffea38: 0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
[...]
#41 0x000000555557b360 in main (argc=<optimized out>, argv=<optimized out>)
    at main.cpp:201
(gdb) 
From all of these backtraces we can see that WebPages::updateStates() will be a good place to look at to avoid the deactivation and suspension of the old page.

So I've edited this method to remove the calls that perform the deactivation and suspension of the page. In theory this should leave the page not just running, but even rendering, even though it's no longer visible.
void WebPages::updateStates(DeclarativeWebPage *oldActivePage,
    DeclarativeWebPage *newActivePage)
{
    if (oldActivePage) {
        // Allow suspending only the current active page if it is not the
        // creator (parent).
        if (newActivePage->parentId() != (int)oldActivePage->uniqueId()) {
            if (oldActivePage->loading()) {
                //oldActivePage->stop();
            }
            //oldActivePage->suspendView();
        } else {
            // Sets parent to inactive and suspends rendering keeping
            // timeouts running.
            //oldActivePage->setActive(false);
        }
    }

    if (newActivePage) {
        newActivePage->resumeView();
        newActivePage->update();
    }
}
Unfortunately even after making this change the problem persists: the error still triggers and the PDF data still doesn't get stored in the file, which is created but left empty. Given this I'm going to have to look into the JavaScript error and try to figure out what's happening there instead. Maybe it's something that I've not yet considered.

That'll be a task for Boxing Day!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
24 Dec 2023 : Day 117 #
It's Christmas eve as I write this and there's a sense of excitement and anticipation in the air. I like Christmas for many reasons, but mostly because it's a time for relaxation and not worrying about things for a day or two. Since gecko development is one of my means of relaxation I still plan to write an update tomorrow. I don't expect anyone to be reading it though. Given this, let me take the chance now to wish everybody reading this a very Merry Christmas. It amazes me how many of you continue to read it and today I want to give a special shout-out to Sylvain (Sthocs) who said nice things today on the Sailfish Forum. I'm thoroughly uplifted by such kind posts.

And because it's Christmas eve, here's one of Thigg's amazing images to add some colour and energy to the post today.
 
A lizard running through an autumnal forest carrying a pig (with wings) on its back

Now on to today's development. Yesterday we were able to confirm that the hidden flag was making it all the way from the print request out to the QML user interface of sailfish-browser. Today we have to think of something useful to do with the information.

I spent some time earlier today sitting in a coffee shop in the centre of Cambridge looking over the code and trying to think of what might be the right way to approach this. I've concluded that it might be a little tricky. But there are some things to try.

Because most of the changes will either happen in the QML or in the sailfish-browser code it should be possible to get a pretty swift turnaround on the possibilities I plan to try, which will hopefully mean we'll arrive at a solution all the more quickly.

To get the clearest picture it'll help to understand what happens when a new window is created, so let's break that down a bit.

The request comes in to the DeclarativeWebPageCreator::createView() method which looks like this:
quint32 DeclarativeWebPageCreator::createView(const quint32 &parentId,
    const uintptr_t &parentBrowsingContext, bool hidden)
{
    QPointer<DeclarativeWebPage> oldPage = m_activeWebPage;
    m_model->newTab(QString(), parentId, parentBrowsingContext, hidden);

    if (m_activeWebPage && oldPage != m_activeWebPage) {
        return m_activeWebPage->uniqueId();
    }
    return 0;
}
Notice that there's an expectation that the value of m_activeWebPage will change between the points before and after the call to m_model->newTab(). That's because the call triggers a lot more activity in the background. The newTab() call goes through to DeclarativeTabModel::newTab(), which creates a new instance of the Tab class and adds it to the tab model. This will trigger a dataChanged signal through a call to addTab(), as well as a newTabRequested signal sent explicitly in the newTab code.
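The "snapshot, call, compare" shape of createView() can be sketched in isolation. This is only an illustration under stated assumptions: MiniModel, MiniCreator and Page are hypothetical stand-ins rather than the real sailfish-browser classes, and a std::function callback plays the part of the Qt signal connection.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for DeclarativeWebPage.
struct Page {
    std::uint32_t uniqueId;
};

// Hypothetical stand-in for the tab model. The callback plays the role of
// the newTabRequested signal.
class MiniModel {
public:
    std::function<void()> onNewTabRequested;

    void newTab() {
        // The real model also stores a Tab and emits dataChanged; only the
        // handler that changes the active page matters for this sketch.
        if (onNewTabRequested) {
            onNewTabRequested();
        }
    }
};

class MiniCreator {
public:
    MiniCreator() {
        // The side effect hidden behind the "signal": a new page is
        // created and becomes the active one.
        m_model.onNewTabRequested = [this]() {
            m_pages.push_back(std::make_unique<Page>(Page{m_nextId++}));
            m_activePage = m_pages.back().get();
            assert(m_activePage != nullptr);
        };
    }
    MiniCreator(const MiniCreator &) = delete;
    MiniCreator &operator=(const MiniCreator &) = delete;

    // Same shape as createView(): snapshot the active page, make the call,
    // then compare to find out whether a new page appeared.
    std::uint32_t createView() {
        const Page *oldPage = m_activePage;
        m_model.newTab();
        if (m_activePage && oldPage != m_activePage) {
            return m_activePage->uniqueId;
        }
        return 0; // the active page didn't change, so report no new view
    }

    // Simulates nothing being connected to the signal.
    void disconnect() { m_model.onNewTabRequested = nullptr; }

private:
    MiniModel m_model;
    std::vector<std::unique_ptr<Page>> m_pages;
    Page *m_activePage = nullptr;
    std::uint32_t m_nextId = 1;
};
```

With the handler connected each createView() call returns the ID of the freshly activated page; once it's disconnected the active page stays the same and the method returns 0.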

The newTabRequested signal is connected in declarativewebcontainer.cpp to the onNewTabRequested() method:
    connect(m_model.data(), &DeclarativeTabModel::newTabRequested,
            this, &DeclarativeWebContainer::onNewTabRequested);
This onNewTabRequested() method ends up calling two other methods: activatePage() and loadTab(), like this:
void DeclarativeWebContainer::onNewTabRequested(const Tab &tab)
{
    if (activatePage(tab, false)) {
        m_webPage->loadTab(tab.requestedUrl(), false);
    }
}
The activatePage() method does quite a lot of work here, but in particular activates the page. I'm thinking that we probably do want this to happen, but then might want to switch back immediately to the previous page. Otherwise the new tab is going to get shown to the user, which we want to avoid.

It might also be helpful to compare this flow against that which happens when a user selects a tab from the tab view. In that case the TabItem component sends out a TabView.activateTab() signal. This results in the following bit of code from the Overlay QML component getting called:
    onActivateTab: {
        webView.tabModel.activateTab(index)
        pageStack.pop()
    }
Although this bit of code is in the Overlay component, that component itself doesn't directly have a webView property. That's because both the Overlay and webView can be found in BrowserPage.qml. I must admit I'm not too keen on this type of approach, where a QML component relies on some implied features of its parent component. Nevertheless it's not uncommon in QML code and perfectly legal. Here's where the webView property is defined in the BrowserPage.qml file:
    Shared.WebView {
        id: webView
[...]
The Shared.WebView component is an instance of DeclarativeWebContainer, which we can tell from this call in the DeclarativeWebContainer C++ constructor code:
DeclarativeWebContainer::DeclarativeWebContainer(QWindow *parent)
[...]
{
[...]
    setTitle("BrowserContent");
    setObjectName("WebView");
[...]
The tabModel property is an instance of DeclarativeTabModel, which has a compatible activateTab() overload available:
    Q_INVOKABLE void activateTab(int index, bool reload = false);
With the method definition looking like this:
void DeclarativeTabModel::activateTab(int index, bool reload)
{
    if (m_tabs.isEmpty()) {
        return;
    }

    index = qBound<int>(0, index, m_tabs.count() - 1);
    const Tab &newActiveTab = m_tabs.at(index);
    updateActiveTab(newActiveTab, reload);
}
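The clamping step in activateTab() is worth seeing on its own, because it means the method takes a position in the tab list rather than a tab ID, and quietly corrects out-of-range positions. Here's a minimal sketch using std::clamp from C++17 in place of qBound; the boundedTabIndex() helper and the use of plain strings for tabs are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the list position that would be activated
// for a requested index, or -1 when there are no tabs at all.
int boundedTabIndex(int requested, const std::vector<std::string> &tabs)
{
    if (tabs.empty()) {
        return -1; // mirrors the early return in activateTab()
    }
    const int last = static_cast<int>(tabs.size()) - 1;
    assert(last >= 0);
    // Same effect as qBound<int>(0, index, m_tabs.count() - 1)
    return std::clamp(requested, 0, last);
}
```

So with three tabs a request for index 99 lands on the last tab (index 2) and a request for index -5 lands on the first, rather than either of them failing.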
Finally the call to updateActiveTab() sends out the dataChanged signal and then an activeTabChanged signal too. The latter is tied to onActiveTabChanged() due to this line in declarativewebcontainer.cpp:
    connect(m_model.data(), &DeclarativeTabModel::activeTabChanged,
            this, &DeclarativeWebContainer::onActiveTabChanged);
Which therefore causes this code to be run:
void DeclarativeWebContainer::onActiveTabChanged(int activeTabId)
{
    if (activeTabId <= 0) {
        return;
    }

    reload(false);
}
The reload() method contains a call to loadTab() which looks just like the onNewTabRequested() method that was called in the case of the new window creation route:
void DeclarativeWebContainer::loadTab(const Tab& tab, bool force)
{
    if (activatePage(tab, true) || force) {
        // Note: active pages containing a "link" between each other (parent-child relationship)
        // are not destroyed automatically e.g. in low memory notification.
        // Hence, parentId is not necessary over here.
        m_webPage->loadTab(tab.url(), force);
    }
}
Which means we're back to the same place, in particular the activatePage() method which actually does the work. I've gone through both the process for creating a new tab and the process for switching tabs to highlight the crucial bit that we're interested in. We want to switch tabs right after the new tab has been activated, so it's the point where these two paths converge that's likely to make the best place for us to amend the code.

As well as working through the code as we have done above, we can also find the place where the two paths converge using backtraces in the debugger. Now I know these backtraces can be a bit hard to parse on a website, especially on a small screen. If you're a human reading this then please do feel free to just skip these. They're more for my future reference, as it's really helpful for me to keep a record of them. In any case here's the "changing tab" case in backtrace form.
(gdb) bt
#0  DeclarativeWebContainer::setWebPage (this=0x55559bd820, webPage=0x5555ff3960,
    triggerSignals=false) at ../core/declarativewebcontainer.cpp:165
#1  0x00000055555914a0 in DeclarativeWebContainer::activatePage
    (this=0x55559bd820, tab=..., force=true)
    at ../core/declarativewebcontainer.cpp:589
#2  0x000000555559257c in DeclarativeWebContainer::loadTab (this=0x55559bd820,
    tab=..., force=false) at ../core/declarativewebcontainer.cpp:1171
#3  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#4  0x00000055555f6ce0 in DeclarativeTabModel::activeTabChanged
    (this=this@entry=0x55559d2b30, _t1=<optimized out>)
    at moc_declarativetabmodel.cpp:339
#5  0x00000055555c37b4 in DeclarativeTabModel::updateActiveTab
    (this=this@entry=0x55559d2b30, activeTab=..., reload=reload@entry=false)
    at ../history/declarativetabmodel.cpp:426
#6  0x00000055555c3f38 in DeclarativeTabModel::activateTab
    (this=this@entry=0x55559d2b30, index=<optimized out>,
    reload=reload@entry=false)
    at ../history/declarativetabmodel.cpp:164
#7  0x00000055555f70ec in DeclarativeTabModel::qt_static_metacall
    (_o=_o@entry=0x55559d2b30, _c=_c@entry=QMetaObject::InvokeMetaMethod,
    _id=_id@entry=16, 
    _a=_a@entry=0x7fffff64c8) at moc_declarativetabmodel.cpp:180
[...]
#17 0x0000007fb8643bf8 in QQmlJavaScriptExpression::evaluate(QV4::CallData*,
    bool*) () from /usr/lib64/libQt5Qml.so.5
#18 0x0000007fa83c2410 in ?? ()
#19 0x00000055556d2c90 in ?? ()
(gdb) 
So, in short, it looks like if we emit an activeTabChanged() signal straight after the new tab has been created, using the index of the previous tab (the one that was active before the window opened), then with a bit of luck the browser will immediately (and maybe imperceptibly? We'll have to see about that) switch back to the previous tab.
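To make the idea concrete, here's a rough standalone sketch of the "bounce back" behaviour. It's a sketch under assumptions rather than the actual change: MiniTabModel, its members and openHiddenTabAndBounceBack() are all hypothetical, and a std::function callback stands in for the activeTabChanged signal.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical cut-down tab record.
struct Tab {
    int tabId;
    bool hidden;
};

// Hypothetical cut-down tab model.
class MiniTabModel {
public:
    std::function<void(int)> activeTabChanged;

    // Adds a tab and makes it active, as happens for a new window.
    int newTab(bool hidden) {
        const int id = m_nextId++;
        m_tabs.push_back(Tab{id, hidden});
        setActive(id);
        return id;
    }

    bool activateTabById(int tabId) {
        for (const Tab &tab : m_tabs) {
            if (tab.tabId == tabId) {
                setActive(tabId);
                return true;
            }
        }
        return false;
    }

    bool isHidden(int tabId) const {
        for (const Tab &tab : m_tabs) {
            if (tab.tabId == tabId) {
                return tab.hidden;
            }
        }
        return false;
    }

private:
    void setActive(int tabId) {
        assert(tabId > 0);
        if (tabId == m_activeTabId) {
            return;
        }
        m_activeTabId = tabId;
        if (activeTabChanged) {
            activeTabChanged(tabId);
        }
    }

    std::vector<Tab> m_tabs;
    int m_activeTabId = 0;
    int m_nextId = 1;
};

// Opens a normal tab followed by a hidden one and returns the order in
// which tabs became active.
std::vector<int> openHiddenTabAndBounceBack()
{
    MiniTabModel model;
    std::vector<int> activations;
    int previousTabId = 0;

    model.activeTabChanged = [&](int tabId) {
        activations.push_back(tabId);
        if (model.isHidden(tabId) && previousTabId > 0) {
            // The new tab shouldn't be shown, so switch straight back.
            model.activateTabById(previousTabId);
        } else {
            previousTabId = tabId;
        }
    };

    model.newTab(false); // visible tab, gets ID 1
    model.newTab(true);  // hidden tab, gets ID 2
    return activations;
}
```

The hidden tab still becomes active for a moment, which is the part that may or may not turn out to be visible to the user, before the handler re-activates the tab that was active before it.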

I'm keen to try this out, but all of this digging through code has left me a bit exhausted, so testing it out will have to wait until tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
23 Dec 2023 : Day 116 #
Yesterday we were looking at the window creation code and I made some changes to "surface" (as they say) the hidden flag of a window. The idea is to set the flag on the window generated as a result of the page being cloned during printing so that the front-end can filter the window out from the user interface.

Here's the new "Hidden" role from the tab model that I've added and which I'm hoping to use for the filtering, taken from the declarativetabmodel.h file:
    enum TabRoles {
        ThumbPathRole = Qt::UserRole + 1,
        TitleRole,
        UrlRole,
        ActiveRole,
        TabIdRole,
        DesktopModeRole,
        HiddenRole,
    };
Although in theory the pieces are in place to implement the filtering I don't want to do that just yet. I want to do something easier to check whether it's working as expected. I've already made quite a number of changes and experience tells me that periodically checking my work is a sensible thing to do. I'm not always where I think I am.
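For anyone unfamiliar with model roles, a role is just an integer that the model maps to a name the QML side can use. The sketch below shows the idea in plain C++ without Qt. The kUserRole constant mirrors the value of Qt::UserRole (0x0100), the roleNames() and roleName() functions here are hypothetical stand-ins, and every role name other than "hidden" is a guess for illustration rather than something taken from the real model.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for Qt::UserRole, which has the value 0x0100 in Qt.
constexpr int kUserRole = 0x0100;

enum TabRoles {
    ThumbPathRole = kUserRole + 1,
    TitleRole,
    UrlRole,
    ActiveRole,
    TabIdRole,
    DesktopModeRole,
    HiddenRole,
};

// Hypothetical equivalent of a roleNames() mapping. Only "hidden" is known
// from the QML; the other names are illustrative guesses.
const std::map<int, std::string> &roleNames()
{
    static const std::map<int, std::string> names = {
        {ThumbPathRole, "thumbnailPath"},
        {TitleRole, "title"},
        {UrlRole, "url"},
        {ActiveRole, "activeTab"},
        {TabIdRole, "tabId"},
        {DesktopModeRole, "desktopMode"},
        {HiddenRole, "hidden"},
    };
    return names;
}

// Returns the QML-facing name for a role, or an empty string when the role
// has no mapping.
std::string roleName(int role)
{
    const auto it = roleNames().find(role);
    return it == roleNames().end() ? std::string() : it->second;
}
```

A role that's missing from the mapping has no name on the QML side, which is why a delegate that refers to it ends up with a ReferenceError.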

So rather than the filtering I want to add some kind of indicator to the tab view that shows whether a page is supposed to be hidden or not. I'm thinking maybe a colour change or something like that.

I've found a nice place to put this indicator in the TabItem.qml component of the sailfish-browser user interface code. There's some QML code there to set the background colour depending on the ambience:
    color: Theme.colorScheme === Theme.LightOnDark ? "black" : "white"
I'm going to embellish this a little — I can do it directly on the device — like this:
    color: hidden ? "red" : Theme.colorScheme === Theme.LightOnDark ? "black" : "white"
This is about as simple as a change gets. When I apply the change, run the browser and open the tab view I get the following error:
[W] unknown:60 - file:///usr/share/sailfish-browser/pages/components/TabItem.qml:60:
    ReferenceError: hidden is not defined
That's okay though: I've not yet installed my updated packages where the hidden role is defined, so the error is expected. Now to install the packages to see whether that changes. I have to update three sets of packages (xulrunner, qtmozembed and sailfish-browser) simultaneously to make it work.

All of the packages installed fine. The browser runs. Opening the tab window now no longer generates any errors. But things haven't quite gone to plan because now all of the tabs have turned red. Hrmf. I guess I need to check my logic.
 
Two screenshots of the tab view in the browser; on the left all of the titles of the tabs have dark backgrounds; on the right the backgrounds are all red

After checking with the debugger, all of the tabs have their hidden flag set to false. Odd. Annoyingly though, the debugger also tells me that when the print page is created its hidden state is also set to false. So that's now two errors to fix. Well, I've found the first error in my logic, caused by a hack added on top of a hack:
    color: hidden ? "red" : 
        Theme.colorScheme === Theme.LightOnDark ? "red" : "white"
It's annoyingly obvious now I see it. The second colour should be "black" not "red", like this:
    color: hidden ? "red" : 
        Theme.colorScheme === Theme.LightOnDark ? "black" : "white"
Having fixed this now all the tabs are the correct colour again. Apart from the printing page which should be red but now isn't. For some reason the information isn't being "surfaced" as it should be.
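The slip is easier to see when the nested conditional is written out as a function. Here's a small C++ rendering of both versions. The ColorScheme enum is a stand-in for the Silica theme values and the function names are made up for the purpose.

```cpp
#include <cassert>
#include <string>

// Stand-in for the two colour schemes an ambience can have.
enum class ColorScheme { LightOnDark, DarkOnLight };

// The hack-on-a-hack version: the LightOnDark branch also returns "red",
// so on a dark ambience every tab turns red whatever its hidden flag says.
std::string buggyTabBackground(bool hidden, ColorScheme scheme)
{
    return hidden ? "red"
                  : scheme == ColorScheme::LightOnDark ? "red" : "white";
}

// The corrected version: only hidden tabs are red.
std::string tabBackground(bool hidden, ColorScheme scheme)
{
    return hidden ? "red"
                  : scheme == ColorScheme::LightOnDark ? "black" : "white";
}
```

With the buggy version a tab that isn't hidden still comes out red under LightOnDark, which matches the all-red tab view even though the debugger showed every hidden flag as false.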

I've picked a point suitably high up the stack — on EmbedLiteViewChild::InitGeckoWindow() — to attach a breakpoint and set the print running. Hopefully this will give us an idea about where the chain is being broken.
(gdb) b EmbedLiteViewChild::InitGeckoWindow
Breakpoint 1 at 0x7fbcb195a4: file mobile/sailfishos/embedshared/
    EmbedLiteViewChild.cpp, line 179.
(gdb) c
Continuing.
[...]

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLiteViewChild::InitGeckoWindow (this=0x7e37699810, parentId=1, 
    parentBrowsingContext=0x7f88b7a680, isPrivateWindow=false,
    isDesktopMode=false, isHidden=false)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:179
179     {
(gdb) bt
#0  mozilla::embedlite::EmbedLiteViewChild::InitGeckoWindow (this=0x7e37699810,
    parentId=1, parentBrowsingContext=0x7f88b7a680, isPrivateWindow=false, 
    isDesktopMode=false, isHidden=false)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:179
#1  0x0000007fbcb0ae48 in mozilla::detail::RunnableMethodArguments<unsigned int
    const, mozilla::dom::BrowsingContext*, bool const, bool const, bool 
    const>::applyImpl<mozilla::embedlite::EmbedLiteViewChild, void 
    (mozilla::embedlite::EmbedLiteViewChild::*)(unsigned int,
    mozilla::dom::BrowsingContext*, bool, bool, bool),
    StoreCopyPassByConstLRef<unsigned int const>,
    StoreRefPtrPassByPtr<mozilla::dom::BrowsingContext>,
    StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>>,
    StoreCopyPassByConstLRef<bool const>, 0ul, 1ul, 2ul, 3ul, 4ul>
    (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280
#2  mozilla::detail::RunnableMethodArguments<unsigned int const,
    mozilla::dom::BrowsingContext*, bool const, bool const,
    bool const>::apply<mozilla::embedlite::EmbedLiteViewChild, void
    (mozilla::embedlite::EmbedLiteViewChild::*)(unsigned int,
    mozilla::dom::BrowsingContext*, bool, bool, bool)> (
    m=<optimized out>, o=<optimized out>, this=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:1154
#3  mozilla::detail::RunnableMethodImpl<mozilla::embedlite::EmbedLiteViewChild*,
    void (mozilla::embedlite::EmbedLiteViewChild::*)(unsigned int,
    mozilla::dom::BrowsingContext*, bool, bool, bool), true,
    (mozilla::RunnableKind)0, unsigned int const, mozilla::dom::BrowsingContext*,
    bool const, bool const, bool const>::Run (this=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:1201
#4  0x0000007fb9c8b99c in mozilla::RunnableTask::Run (this=0x7ed8004590)
[...]
Oh, that backtrace quickly degenerates because EmbedLiteViewChild::InitGeckoWindow() is being called as a posted runnable task. I'll need to pick somewhere lower down. It's being run from the EmbedLiteViewChild::EmbedLiteViewChild() constructor, so let's try that instead.

The debugger is running incredibly slowly today for some reason. But eventually it gets there.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) b EmbedLiteViewChild::EmbedLiteViewChild
Breakpoint 2 at 0x7fbcb143f0: file mobile/sailfishos/embedshared/
    EmbedLiteViewChild.cpp, line 71.
(gdb) c
Continuing.

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLiteViewChild::EmbedLiteViewChild (this=0x7e30cc9000,
    aWindowId=@0x7f9f3cfc3c: 1, aId=@0x7f9f3cfc40: 2,
    aParentId=@0x7f9f3cfc44: 1, parentBrowsingContext=0x7f88bd2f40,
    isPrivateWindow=@0x7f9f3cfc2e: false,
    isDesktopMode=@0x7f9f3cfc2f: false, isHidden=@0x7f9f3cfc30: false)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:71
71      EmbedLiteViewChild::EmbedLiteViewChild(const uint32_t &aWindowId,
(gdb) bt
#0  mozilla::embedlite::EmbedLiteViewChild::EmbedLiteViewChild
    (this=0x7e30cc9000, aWindowId=@0x7f9f3cfc3c: 1, aId=@0x7f9f3cfc40: 2, 
    aParentId=@0x7f9f3cfc44: 1, parentBrowsingContext=0x7f88bd2f40,
    isPrivateWindow=@0x7f9f3cfc2e: false, isDesktopMode=@0x7f9f3cfc2f: false, 
    isHidden=@0x7f9f3cfc30: false)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:71
#1  0x0000007fbcb23aa4 in mozilla::embedlite::EmbedLiteViewThreadChild::
    EmbedLiteViewThreadChild (this=0x7e30cc9000, windowId=<optimized out>, 
    id=<optimized out>, parentId=<optimized out>,
    parentBrowsingContext=<optimized out>, isPrivateWindow=<optimized out>,
    isDesktopMode=<optimized out>, isHidden=<optimized out>)
    at mobile/sailfishos/embedthread/EmbedLiteViewThreadChild.cpp:15
#2  0x0000007fbcb2a69c in mozilla::embedlite::EmbedLiteAppThreadChild::
    AllocPEmbedLiteViewChild (this=0x7f889f48e0, windowId=@0x7f9f3cfc3c: 1, 
    id=@0x7f9f3cfc40: 2, parentId=@0x7f9f3cfc44: 1,
    parentBrowsingContext=<optimized out>, isPrivateWindow=@0x7f9f3cfc2e: false, 
    isDesktopMode=@0x7f9f3cfc2f: false, isHidden=@0x7f9f3cfc30: false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#3  0x0000007fba17f48c in mozilla::embedlite::PEmbedLiteAppChild::
    OnMessageReceived (this=0x7f889f48e0, msg__=...) at PEmbedLiteAppChild.cpp:529
#4  0x0000007fba06b85c in mozilla::ipc::MessageChannel::DispatchAsyncMessage
    (this=this@entry=0x7f889f49a8, aProxy=aProxy@entry=0x7ebc001cb0, aMsg=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
[...]
#49 0x0000007fbd165538 in js::jit::MaybeEnterJit (cx=0x7f88234ba0, state=...)
    at js/src/jit/Jit.cpp:207
#50 0x0000007f8830d951 in ?? ()
Backtrace stopped: Cannot access memory at address 0xfa512247ea
(gdb) 
That gets us a little further, but not massively so. Let's go back a bit further again.
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) b PEmbedLiteAppParent::SendPEmbedLiteViewConstructor
Breakpoint 2 at 0x7fba19099c: PEmbedLiteAppParent::SendPEmbedLiteViewConstructor.
    (2 locations)
(gdb) c
Continuing.

Thread 1 "sailfish-browse" hit Breakpoint 2, mozilla::embedlite::
    PEmbedLiteAppParent::SendPEmbedLiteViewConstructor
    (this=this@entry=0x7f88a64fc0, windowId=@0x7fffffe414: 1,
    id=@0x7fbfb29440: 3, parentId=@0x7fffffe40c: 1,
    parentBrowsingContext=@0x7fffffe400: 547754946368, 
    isPrivateWindow=@0x7fffffe40b: false, isDesktopMode=@0x7fffffe40a: false,
    isHidden=@0x7fffffe409: false) at PEmbedLiteAppParent.cpp:168
168     PEmbedLiteAppParent.cpp: No such file or directory.
(gdb) bt
#0  mozilla::embedlite::PEmbedLiteAppParent::SendPEmbedLiteViewConstructor
    (this=this@entry=0x7f88a64fc0, windowId=@0x7fffffe414: 1,
    id=@0x7fbfb29440: 3, parentId=@0x7fffffe40c: 1,
    parentBrowsingContext=@0x7fffffe400: 547754946368,
    isPrivateWindow=@0x7fffffe40b: false, isDesktopMode=@0x7fffffe40a: false,
    isHidden=@0x7fffffe409: false) at PEmbedLiteAppParent.cpp:168
#1  0x0000007fbcb18850 in mozilla::embedlite::EmbedLiteApp::CreateView
    (this=0x5555859800, aWindow=0x5555c67e80, aParent=<optimized out>, 
    aParentBrowsingContext=<optimized out>, aIsPrivateWindow=<optimized out>,
    isDesktopMode=<optimized out>, isHidden=<optimized out>)
    at mobile/sailfishos/EmbedLiteApp.cpp:478
#2  0x0000007fbfb8693c in QMozViewPrivate::createView (this=0x555569b070)
    at qmozview_p.cpp:862
#3  QMozViewPrivate::createView (this=0x555569b070) at qmozview_p.cpp:848
#4  0x00000055555d10ec in WebPageFactory::createWebPage (this=0x55559bc500,
    webContainer=0x55559bc110, initialTab=...)
    at ../factories/webpagefactory.cpp:44
#5  0x00000055555acf68 in WebPages::page (this=0x55559bc650, tab=...)
    at include/c++/8.3.0/bits/atomic_base.h:390
#6  0x000000555559148c in DeclarativeWebContainer::activatePage
    (this=this@entry=0x55559bc110, tab=..., force=force@entry=false)
    at include/c++/8.3.0/bits/atomic_base.h:390
#7  0x000000555559160c in DeclarativeWebContainer::onNewTabRequested
    (this=0x55559bc110, tab=...) at ../core/declarativewebcontainer.cpp:1047
#8  0x0000007fb7ec4204 in QMetaObject::activate(QObject*, int, int, void**) ()
    from /usr/lib64/libQt5Core.so.5
#9  0x00000055555f6d80 in DeclarativeTabModel::newTabRequested
    (this=this@entry=0x55559d6130, _t1=...) at moc_declarativetabmodel.cpp:366
#10 0x00000055555c3d28 in DeclarativeTabModel::newTab (this=0x55559d6130,
    url=..., parentId=1, browsingContext=547754946368, hidden=<optimized out>)
    at ../history/declarativetabmodel.cpp:230
#11 0x00000055555d0e78 in DeclarativeWebPageCreator::createView
    (this=0x55559c50f0, parentId=<optimized out>,
    parentBrowsingContext=<optimized out>, hidden=<optimized out>)
    at /usr/include/qt5/QtCore/qarraydata.h:240
#12 0x0000007fbfb71ef0 in QMozContextPrivate::CreateNewWindowRequested
    (this=<optimized out>, chromeFlags=<optimized out>, hidden=<optimized out>, 
    aParentView=0x5555bf74e0, parentBrowsingContext=@0x7fffffe938: 547754946368)
    at qmozcontext.cpp:218
#13 0x0000007fbcb10eec in mozilla::embedlite::EmbedLiteApp::CreateWindowRequested
    (this=0x5555859800, chromeFlags=@0x7fffffe928: 4094,
    hidden=@0x7fffffe922: true, parentId=@0x7fffffe924: 1,
    parentBrowsingContext=@0x7fffffe938: 547754946368)
    at mobile/sailfishos/EmbedLiteApp.cpp:543
#14 0x0000007fbcb1ea68 in mozilla::embedlite::EmbedLiteAppThreadParent::
    RecvCreateWindow (this=<optimized out>, parentId=<optimized out>, 
    parentBrowsingContext=<optimized out>, chromeFlags=<optimized out>,
    hidden=<optimized out>, createdID=0x7fffffe92c, cancel=0x7fffffe923)
    at mobile/sailfishos/embedthread/EmbedLiteAppThreadParent.cpp:70
#15 0x0000007fba183aa0 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=0x7f88a64fc0, msg__=..., reply__=@0x7fffffea38: 0x0)
    at PEmbedLiteAppParent.cpp:924
#16 0x0000007fba06b618 in mozilla::ipc::MessageChannel::DispatchSyncMessage
    (this=this@entry=0x7f88a65088, aProxy=aProxy@entry=0x7fa0010590, aMsg=..., 
    aReply=@0x7fffffea38: 0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
[...]
#36 0x000000555557b360 in main (argc=<optimized out>, argv=<optimized out>)
    at main.cpp:201
(gdb) 
Frustratingly a lot of the values we'd like to investigate have been optimised out, making them hard to access. However, what we can see is that for the call to EmbedLiteApp::CreateWindowRequested() the value of hidden is set to true:
#13 0x0000007fbcb10eec in mozilla::embedlite::EmbedLiteApp::CreateWindowRequested
    (this=0x5555859800, chromeFlags=@0x7fffffe928: 4094,
    hidden=@0x7fffffe922: true, parentId=@0x7fffffe924: 1,
    parentBrowsingContext=@0x7fffffe938: 547754946368)
That's at stack frame 13. The next time we see it in non-optimised form is at stack frame 0 where it's set to false (albeit with a slightly different name):
#0  mozilla::embedlite::PEmbedLiteAppParent::SendPEmbedLiteViewConstructor
    (this=this@entry=0x7f88a64fc0, windowId=@0x7fffffe414: 1,
    id=@0x7fbfb29440: 3, parentId=@0x7fffffe40c: 1,
    parentBrowsingContext=@0x7fffffe400: 547754946368,
    isPrivateWindow=@0x7fffffe40b: false, isDesktopMode=@0x7fffffe40a: false,
    isHidden=@0x7fffffe409: false) at PEmbedLiteAppParent.cpp:168
Somewhere between these two frames the value is getting lost. The debugger isn't being very helpful in identifying exactly where, so I'll need to read through the code manually.

Luckily it doesn't take long to find out... it's in stack frame 12 where we have this:
uint32_t QMozContextPrivate::CreateNewWindowRequested(const uint32_t &chromeFlags,
    const bool &hidden, EmbedLiteView *aParentView,
    const uintptr_t &parentBrowsingContext)
{
    Q_UNUSED(chromeFlags)

    uint32_t parentId = aParentView ? aParentView->GetUniqueID() : 0;
    qCDebug(lcEmbedLiteExt) << "QtMozEmbedContext new Window requested: parent:"
        << (void *)aParentView << parentId;
    uint32_t viewId = QMozContext::instance()->createView(parentId,
        parentBrowsingContext);
    return viewId;
}
As we can see, the value of hidden just isn't being used at all in this method. I must have missed it when I made all my changes earlier. It slipped past because I gave it a default value in the header, so it compiled even though I forgot to pass it on:
    quint32 createView(const quint32 &parentId = 0,
        const uintptr_t &parentBrowsingContext = 0, const bool hidden = false);
Luckily qtmozembed is one of the quicker browser packages to compile, so it should be possible to fix, build and test the changes pretty swiftly.
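To illustrate how a default argument can hide a dropped parameter like this, here's a minimal sketch. The names (Context, ViewRecord, requestWindowBuggy, requestWindowFixed) are made up for the example and aren't the real qtmozembed classes; the point is only that the buggy version compiles cleanly because createView() quietly falls back to its default.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for view bookkeeping; not the real qtmozembed types.
struct ViewRecord {
    std::uint32_t parentId;
    bool hidden;
};

class Context {
public:
    // The default value for `hidden` means callers that omit it still compile.
    std::uint32_t createView(std::uint32_t parentId = 0, bool hidden = false) {
        views.push_back({parentId, hidden});
        return static_cast<std::uint32_t>(views.size());
    }

    std::vector<ViewRecord> views;
};

// Buggy forwarding: `hidden` is accepted but never passed on.
std::uint32_t requestWindowBuggy(Context &ctx, std::uint32_t parentId, bool hidden) {
    (void)hidden; // silently dropped; createView() uses its default of false
    return ctx.createView(parentId);
}

// Fixed forwarding: the flag is passed through explicitly.
std::uint32_t requestWindowFixed(Context &ctx, std::uint32_t parentId, bool hidden) {
    return ctx.createView(parentId, hidden);
}
```

Dropping the default value from the declaration would turn the buggy call into a compile error, at the cost of having to update every other caller.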
 
A screenshot of the tab view in the browser; most of the tabs have a dark background but the one at the bottom is blank and coloured red

Rather excitingly all of the windows now appear with the standard colour, except for the window created during printing which is glaringly red. Just what we need!

Okay, that's it tested and working. The next step is to add the view filtering so that the red windows aren't shown in the user interface. I have a suspicion that this step is going to be harder than I think it is. But that's something to think about tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
22 Dec 2023 : Day 115 #
This morning the build had not completed successfully. There were a few instances of the hidden and isHidden parameters that I'd failed to add. It's quite a web inside the EmbedLite structures with what seems to be every combination of {App, View}, {Thread, Process, Nothing} and {Parent, Child, Interface}. It was always going to be easy to miss something.

Thankfully that's not the disaster it might have been because the idl interface definitions went through their regeneration cycle to create new source and header files. That means I can now do partial builds to check that things are working.

As I add the final arguments to the code I notice that the final resting place of the flag seems to be here in EmbedLiteViewChild:
void
EmbedLiteViewChild::InitGeckoWindow(const uint32_t parentId,
                                    mozilla::dom::BrowsingContext
                                    *parentBrowsingContext,
                                    const bool isPrivateWindow,
                                    const bool isDesktopMode,
                                    const bool &isHidden)
At this point the flag is discarded and it should be used for something. This is a note to myself to figure out what.

Another note to myself is to decide whether we need to store isHidden in EmbedLiteViewParent or not. Here's the relevant method signature:
EmbedLiteViewParent::EmbedLiteViewParent(const uint32_t &windowId,
                                         const uint32_t &id,
                                         const uint32_t &parentId,
                                         const uintptr_t &parentBrowsingContext,
                                         const bool &isPrivateWindow,
                                         const bool &isDesktopMode,
                                         const bool &isHidden)
Currently this gets stored in a class member, but we don't seem to use it anywhere. It can probably be removed, although then that begs the question of why pass it in at all? I should come back and check this later.

Now that the partial build completed successfully I'm going to run it through the full build again so that I have a package to install. That's necessary for me to be able to build qtmozembed and sailfish-browser against the new header files from these changes.

[...]

It's towards the end of the day now and the build completed successfully. At least, I'm pretty sure it did. I stupidly closed the build output window by accident during the day. But the packages have a modified time of 15:30 today, which sounds about right.

The next step is to find out how to push the hidden flag through to the sailfish-browser front-end. In theory, with the interface changed, if I now try to build qtmozembed against the latest packages I've just built, it should fail in some way. If I find the place it's failing, that should give me a good place to start.

Let's build them and see what happens.
$ cd ../qtmozembed/
$ sfdk -c no-fix-version build -d
[...]
The following 5 packages are going to be reinstalled:
  xulrunner-qt5              91.9.1-1
  xulrunner-qt5-debuginfo    91.9.1-1
  xulrunner-qt5-debugsource  91.9.1-1
  xulrunner-qt5-devel        91.9.1-1
  xulrunner-qt5-misc         91.9.1-1

5 packages to reinstall.
[...]
Wrote: RPMS/SailfishOS-devel-aarch64/qtmozembed-qt5-tests-1.53.9-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/qtmozembed-qt5-devel-1.53.9-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/qtmozembed-qt5-debugsource-1.53.9-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/qtmozembed-qt5-1.53.9-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/qtmozembed-qt5-debuginfo-1.53.9-1.aarch64.rpm
[...]
Surprisingly the qtmozembed build all completed without error. What about sailfish-browser?
$ cd ../sailfish-browser/
$ sfdk -c no-fix-version build -d
[...]
The following 2 packages are going to be downgraded:
  qtmozembed-qt5        1.53.25+sailfishos.esr91.20231003080118.8b9a009-1 -> 1.53.9-1
  qtmozembed-qt5-devel  1.53.25+sailfishos.esr91.20231003080118.8b9a009-1 -> 1.53.9-1

2 packages to downgrade.
[...]
Wrote: RPMS/SailfishOS-devel-aarch64/sailfish-browser-debugsource-2.2.45-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/sailfish-browser-ts-devel-2.2.45-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/sailfish-browser-settings-2.2.45-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/sailfish-browser-2.2.45-1.aarch64.rpm
Wrote: RPMS/SailfishOS-devel-aarch64/sailfish-browser-tests-2.2.45-1.aarch64.rpm
[...]
Well that's all a bit strange. I thought there would be at least some exposure to some of the elements I changed. I guess I got that wrong.

The connection point is supposed to happen in qmozcontext.cpp where the QMozContextPrivate class implements EmbedLiteAppListener:
class QMozContextPrivate : public QObject, public EmbedLiteAppListener
{
[...]
    virtual uint32_t CreateNewWindowRequested(const uint32_t &chromeFlags,
                                              EmbedLiteView *aParentView,
                                              const uintptr_t
                                              &parentBrowsingContext) override;
Compare this with the method it's supposed to be overriding:
class EmbedLiteAppListener
{
[...]
  // New Window request which is usually coming from WebPage new window request
  virtual uint32_t CreateNewWindowRequested(const uint32_t &chromeFlags,
                                            const bool &hidden,
                                            EmbedLiteView *aParentView,
                                            const uintptr_t
                                            &parentBrowsingContext) { return 0; }
The override should be causing an error.

Since I didn't do a clean rebuild of qtmozembed I'm thinking maybe the reason is that it just didn't know to rebuild some of the files. Let's try it again, but this time with passion.
$ cd ../qtmozembed/
$ git clean -xdf
$ sfdk -c no-fix-version build -d
[...]
In file included from qmozcontext.cpp:22:
qmozcontext_p.h:53:22: error: ‘virtual uint32_t QMozContextPrivate::
    CreateNewWindowRequested(const uint32_t&, mozilla::embedlite::EmbedLiteView*,
    const uintptr_t&)’ marked ‘override’, but does not override
     virtual uint32_t CreateNewWindowRequested(const uint32_t &chromeFlags,
                      ^~~~~~~~~~~~~~~~~~~~~~~~
qmozcontext_p.h:53:22: warning:   by ‘virtual uint32_t QMozContextPrivate::
    CreateNewWindowRequested(const uint32_t&, mozilla::embedlite::EmbedLiteView*,
    const uintptr_t&)’ [-Woverloaded-virtual]
[...]
make[1]: *** [Makefile:701: ../src/moc_qmozcontext_p.o] Error 1
Okay, that's more like it. So now I know for sure that this is the place to start.
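As a reminder of why the override keyword is so useful here, this is a cut-down sketch of the same situation. The class names (AppListener, ContextPrivate) and the reduced parameter list are simplifications for the example, not the real EmbedLite or qtmozembed declarations.

```cpp
#include <cstdint>

// Simplified stand-in for the listener interface, after `hidden` was added.
class AppListener {
public:
    virtual ~AppListener() = default;
    virtual std::uint32_t CreateNewWindowRequested(const std::uint32_t & /*chromeFlags*/,
                                                   const bool & /*hidden*/) {
        return 0;
    }
};

class ContextPrivate : public AppListener {
public:
    // Matches the updated base signature, so `override` is accepted.
    std::uint32_t CreateNewWindowRequested(const std::uint32_t & /*chromeFlags*/,
                                           const bool &hidden) override {
        return hidden ? 2 : 1;
    }

    // The stale signature below would be rejected by the compiler if
    // uncommented, because nothing in the base class matches it any more:
    // std::uint32_t CreateNewWindowRequested(const std::uint32_t &chromeFlags) override;
};

// Calls through the base class, as the embedding layer would.
std::uint32_t dispatch(AppListener &listener, bool hidden) {
    return listener.CreateNewWindowRequested(0, hidden);
}
```

Without override the stale method would simply become an unrelated overload, and calls through the base class would land in the default implementation that returns 0.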

I've added the hidden parameter to QMozContextPrivate::CreateNewWindowRequested() and let the changes cascade through the rest of the code. There's quite a lot of abstraction in qtmozembed, but despite that the changes all seem quite reasonable and, crucially, result in a new Q_PROPERTY being created to expose the hidden flag to the front end.

After a few build-fail-fix cycles the packages now build fully without compiler or linker errors. The sailfish-browser code also links against them without issue... but it shouldn't. I probably need to clean out all the existing build output first and give it another go. I've cleaned it out, but sailfish-browser takes a surprising amount of time to build (nothing compared to gecko, about 20 mins or so, but that's still a lot of code to build).

I've set it going, let's see if there are errors now (there will be!).

[...]

And indeed there are! Thank goodness for that.
$ cd ../sailfish-browser/
$ git clean -xdf
$ sfdk -c no-fix-version build -d -p
[...]
../qtmozembed/declarativewebpagecreator.h:37:21: error: ‘virtual quint32
    DeclarativeWebPageCreator::createView(const quint32&, const uintptr_t&)’
    marked ‘override’, but does not override
     virtual quint32 createView(const quint32 &parentId, const uintptr_t &parentBrowsingContext) override;
                     ^~~~~~~~~~
compilation terminated due to -Wfatal-errors.
make[2]: *** [Makefile:988: declarativewebpagecreator.o] Error 1
I've been through and made the changes needed. Ultimately they all flowed through the code rather neatly, touching 12 files in qtmozembed and 10 files in sailfish-browser. Crucially the tab model now has a new HiddenRole role that can be used to hide certain tabs.

The next step will be to create a QSortFilterProxyModel on it so that the pages can be hidden. But it's already late now so this will be a task for tomorrow.
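The filtering itself should be simple in principle. Here's a rough sketch of the logic in plain C++ rather than Qt, so it's a stand-in for what a QSortFilterProxyModel::filterAcceptsRow() override might do, not the actual implementation. The Tab struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tab entry; in the real tab model the flag would be
// exposed through the new HiddenRole instead.
struct Tab {
    int id;
    std::string url;
    bool hidden;
};

// Plays the part of filterAcceptsRow(): accept a row only if it isn't hidden.
bool acceptsTab(const Tab &tab) {
    return !tab.hidden;
}

// Returns the indices of the source rows that remain visible. The source
// list is left untouched, as it would be behind a proxy model.
std::vector<std::size_t> visibleRows(const std::vector<Tab> &tabs) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < tabs.size(); ++i) {
        if (acceptsTab(tabs[i])) {
            rows.push_back(i);
        }
    }
    return rows;
}
```

The key design point is that the hidden tab still exists in the underlying model, so the print code can keep using it; only the view presented to the user leaves it out.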

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
21 Dec 2023 : Day 114 #
It has to be said, I'm pretty frustrated with myself. I've looked over the code in printUtils.js over and over again and I just can't seem to figure out what I'm doing wrong. I've been poring over this code for days now.

In theory it might be possible to create a browser element and add it to the current page, and use that as the print source. This seems to be roughly what the print preview is doing in the code that emilio highlighted:
  startPrintWindow(aBrowsingContext, aOptions) {
[...]
    if (openWindowInfo) {
      let printPreview = new PrintPreview({
        sourceBrowsingContext: aBrowsingContext,
        openWindowInfo,
      });
      let browser = printPreview.createPreviewBrowser("source");
      document.documentElement.append(browser);
      // Legacy print dialog or silent printing, the content process will print
      // in this <browser>.
      return browser;
    }

    let settings = this.getPrintSettings();
    settings.printSelectionOnly = printSelectionOnly;
    this.printWindow(aBrowsingContext, settings);
    return null;
  },
That call to createPreviewBrowser() goes on to do something like this:
  createPreviewBrowser(sourceVersion) {
    let browser = document.createXULElement("browser");
[...]
      browser.openWindowInfo = this.openWindowInfo;
[...]
    return browser;
  }
So it looks like the code is creating a browser element, appending it to the document and then... well, then what? It doesn't call printWindow() on the result, it just returns it. That could be because it's opening the print preview and waiting for the user to hit the "Print" button, but if that's the case it's no good for what I need, because I want it to go ahead and print straight away.

To assuage my frustration I'm going to leave this and — even if there is a solution that avoids it — just go ahead and implement the page hiding functionality that I've been talking about for the last few days. It feels like a defeat though, because there should be some other way, I just can't quite put my finger on it. And that's because I can't properly follow the flow of the code when it's all so abstract.

Alright, it's time to cut my losses and move on.

At least finally this means I've been able to actually do some coding. I've added a hidden (in some cases isHidden to match the style of a particular interface) parameter to various window creation methods scattered around the EmbedLite code. Changes like this:
--- a/embedding/embedlite/PEmbedLiteApp.ipdl
+++ b/embedding/embedlite/PEmbedLiteApp.ipdl
@@ -18,12 +18,12 @@ nested(upto inside_cpow) sync protocol PEmbedLiteApp {
 parent:
   async Initialized();
   async ReadyToShutdown();
-  sync CreateWindow(uint32_t parentId, uintptr_t parentBrowsingContext, uint32_t chromeFlags)
+  sync CreateWindow(uint32_t parentId, uintptr_t parentBrowsingContext, uint32_t chromeFlags, bool hidden)
     returns (uint32_t createdID, bool cancel);
   async PrefsArrayInitialized(Pref[] prefs);
This has knock-on effects to a lot of files, but all of them follow similar lines: a parameter is added to the method signature which is then passed on to some other method that gets called inside the method definition. I'm not going to list all of the changes here, but just note that it cascades throughout the EmbedLite portion of the code, across 22 files in total:
$ git status
On branch sailfishos-esr91
Your branch is up-to-date with 'origin/sailfishos-esr91'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
        modified:   embedding/embedlite/EmbedLiteApp.cpp
        modified:   embedding/embedlite/EmbedLiteApp.h
        modified:   embedding/embedlite/PEmbedLiteApp.ipdl
        modified:   embedding/embedlite/embedprocess/EmbedLiteAppProcessParent.cpp
        modified:   embedding/embedlite/embedprocess/EmbedLiteAppProcessParent.h
        modified:   embedding/embedlite/embedprocess/EmbedLiteViewProcessParent.cpp
        modified:   embedding/embedlite/embedprocess/EmbedLiteViewProcessParent.h
        modified:   embedding/embedlite/embedshared/EmbedLiteAppChild.cpp
        modified:   embedding/embedlite/embedshared/EmbedLiteAppChild.h
        modified:   embedding/embedlite/embedshared/EmbedLiteAppChildIface.h
        modified:   embedding/embedlite/embedshared/EmbedLiteAppParent.h
        modified:   embedding/embedlite/embedshared/EmbedLiteViewChild.cpp
        modified:   embedding/embedlite/embedshared/EmbedLiteViewChild.h
        modified:   embedding/embedlite/embedshared/EmbedLiteViewParent.cpp
        modified:   embedding/embedlite/embedshared/EmbedLiteViewParent.h
        modified:   embedding/embedlite/embedthread/EmbedLiteAppThreadParent.cpp
        modified:   embedding/embedlite/embedthread/EmbedLiteAppThreadParent.h
        modified:   embedding/embedlite/embedthread/EmbedLiteViewThreadChild.cpp
        modified:   embedding/embedlite/embedthread/EmbedLiteViewThreadChild.h
        modified:   embedding/embedlite/embedthread/EmbedLiteViewThreadParent.cpp
        modified:   embedding/embedlite/embedthread/EmbedLiteViewThreadParent.h
        modified:   embedding/embedlite/utils/WindowCreator.cpp
        modified:   gecko-dev (new commits, untracked content)

no changes added to commit (use "git add" and/or "git commit -a")
These changes will also have impacts elsewhere; in particular I expect them to bubble up into the qtmozembed and sailfish-browser code as well. In fact, I'm absolutely hoping they will, because the entire point is to get the details about the need to hide the window into the sailfish-browser code where it can actually be made to do something useful.

However, having made these changes, a partial build doesn't work because the PEmbedLiteApp.ipdl code needs to be regenerated, so I'm going to have to set this going on a full build overnight. I've tested the changes as far as possible by running partial builds but it won't fully go through because of this. I'll just have to hope that I covered all of the changes needed.

While it builds I could start looking at the code in the other packages, but without the gecko headers to build against I won't be able to compile any of those other changes either, so I'm going to leave those until tomorrow as well.

Here's hoping the build goes through okay!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
20 Dec 2023 : Day 113 #
It's a new dawn, it's a new day. As I write this the sun is peeking up over the horizon casting all the clouds orange against the pastel blue sky, trees silhouetted against the skyline. It feels like a good day to start moving forwards with this and implementing "hidden windows" in the sailfish-browser code.
 
Trees silhouetted against an orange-blue sunrise

Thankfully I've nearly returned to full health again and as I write this am feeling a lot better. Everyone has been so generous with their kind wishes, it's made a real difference. Wherever you are in the world, I hope you're in good health, and if not I hope your recovery is swift and full!

I'm also very excited about the fact my talk on all this gecko development has been accepted for presentation at FOSDEM in the FOSS on Mobile devroom! If you're planning to be in Brussels yourself on the 3-4 February I really hope to see you there.

I'm just about to get on to starting coding when I notice a notification icon on my Matrix client. It seems emilio has got back to me. Yesterday I asked on the Mozilla Matrix "Printing" channel whether I'd need to implement some "window hiding" feature in the front-end to handle these empty print windows. Here's the reply.
 
Well instead of a window you can create a browser like this: https://searchfox.org/mozilla-central/search?q=symbol:%23handleStaticCloneCreatedForPrint

It's a brief but pithy answer. Digging through the links provided, a few things spring to mind:
  1. The handleStaticCloneCreatedForPrint() method in printUtils.js doesn't exist in ESR 91. Maybe I'll have to create it?
  2. Nor does the createParentBrowserForStaticClone() method in the same file; I'll probably have to back port that too.
  3. The OPEN_PRINT_BROWSER case in browser.js exists in ESR 91, but I'm not sure whether it's getting executed. I should check.
  4. I'm wondering if, in order to go down this route, I'd need to call startPrintWindow() in printUtils.js to initiate a print, rather than the CanonicalBrowsingContext::print() method I'm currently using.
Lots of questions. The first thing to check is whether the OPEN_PRINT_BROWSER execution path is being taken at all in my current build. As this is JavaScript code I can't use the debugger to find out; I'll need to put some debug print statements into the code instead.

As I dig around in the code I quickly come to realise that browser.js isn't a file that's used by sailfish-browser. I guess this functionality is largely handled by the front-end QML code instead. However the printUtils.js file is there in omni.ja so I can still add some debug output to that.
  /**  
   * Initialize a print, this will open the tab modal UI if it is enabled or
   * defer to the native dialog/silent print.
   *
   * @param aBrowsingContext
   *        The BrowsingContext of the window to print.
   *        Note that the browsing context could belong to a subframe of the
   *        tab that called window.print, or similar shenanigans.
   * @param aOptions
   *        {openWindowInfo}      Non-null if this call comes from window.print().
   *                              This is the nsIOpenWindowInfo object that has to
   *                              be passed down to createBrowser in order for the
   *                              child process to clone into it.
   *        {printSelectionOnly}  Whether to print only the active selection of
   *                              the given browsing context.
   *        {printFrameOnly}      Whether to print the selected frame.
   */
  startPrintWindow(aBrowsingContext, aOptions) {
    dump("PRINT: startPrintWindow\n");
[...]

  /**  
   * Starts the process of printing the contents of a window.
   *
   * @param aBrowsingContext
   *        The BrowsingContext of the window to print.
   * @param {Object?} aPrintSettings
   *        Optional print settings for the print operation
   */
  printWindow(aBrowsingContext, aPrintSettings) {
    dump("PRINT: printWindow\n");
[...]
I pack up these changes into omni.ja, run the browser and set the print running. There's plenty of debug output, but neither of these new entries show up.

So the next step is to switch out the call to CanonicalBrowsingContext::print() for a call to startPrintWindow() if that's possible.

Doing this directly doesn't give good results, for multiple reasons. First we get an error stating that MozElements isn't defined. That's coming from the definition of PrintPreview in the printUtils.js file:
class PrintPreview extends MozElements.BaseControl {
[...]
The definition of MozElements happens in customElements.js and should be a global within the context of a chrome window. As the text at the top of the file explains: "This is loaded into chrome windows with the subscript loader." For the time being I can work around this by calling PrintUtils.printWindow() instead of PrintUtils.startPrintWindow() because the former doesn't make use of the preview window.

However, when I do this the script complains that document is undefined, caused by this condition in printWindow():
    const printPreviewIsOpen = !!document.getElementById(
      "print-preview-toolbar"
    );
Again, I can work around this by setting printPreviewIsOpen to false and commenting out these lines. Having made these two changes, the print works but opens a window, just as before.

Looking more carefully through the ESR 91 code and the code that emilio mentioned, it looks to me like this is the critical part (although I'm not in the least bit certain about this):
    if (openWindowInfo) {
      let printPreview = new PrintPreview({
        sourceBrowsingContext: aBrowsingContext,
        openWindowInfo,
      });
      let browser = printPreview.createPreviewBrowser("source");
      document.documentElement.append(browser);
      // Legacy print dialog or silent printing, the content process will print
      // in this <browser>.
      return browser;
    }
This will create the preview window which will be used for the silent printing. But even this doesn't seem to be doing what we need: it's still creating the browser window.

The fact that neither document nor MozElements is defined makes me think that this needs to be called from inside a chrome window for this to work. And I don't know how to do that.

I've come full circle again. I've sent a message to emilio for clarification, but I'm yet again brought back to the thought that I'm going to need to hide this window manually, whether it's the print preview browser, or a tab that's opened to hold the cloned document in.

I feel like I've been chasing my tail yet again. I need some positive thoughts: tomorrow I plan to be back to full health and doing some actual coding.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
19 Dec 2023 : Day 112 #
I'm still feeling poorly today, which is very frustrating. When I've got a temperature I just can't focus well on the code; everything swims around and refuses to settle into a comprehensible form. I find myself drifting from one file to another and losing the thread of where I've been and why I'm here. So you'll have to excuse me if some things I write today look a little nonsensical.

There's a silver lining though, in the form of the kind words I received from Thigg, Valorsoguerriero97, poetaster and (privately) throwaway69 via the Sailfish Forum. Thank you! It really helps motivate me to continue onwards. And it amazes me that you have the stamina to keep up with these posts!

As we discussed yesterday, looking at this bit of code inside nsGlobalWindowOuter::Print() we can see that if the condition holds and the block is entered, the browsing context is just copied over directly from the source to the new context:
  nsAutoSyncOperation sync(docToPrint, SyncOperationBehavior::eAllowInput);
  AutoModalState modalState(*this);

  nsCOMPtr<nsIContentViewer> cv;
  RefPtr<BrowsingContext> bc;
  bool hasPrintCallbacks = false;
  if (docToPrint->IsStaticDocument() &&
      (aIsPreview == IsPreview::Yes ||
       StaticPrefs::print_tab_modal_enabled())) {
    if (aForWindowDotPrint == IsForWindowDotPrint::Yes) {
      aError.ThrowNotSupportedError(
          "Calling print() from a print preview is unsupported, did you intend "
          "to call printPreview() instead?");
      return nullptr;
    }
    // We're already a print preview window, just reuse our browsing context /
    // content viewer.
    bc = sourceBC;
I'm left wondering whether, if we went through that branch, we might end up just using the existing context. It's clear from the code later that if this were to happen, the code that opens the new window would be skipped.

It's also clearly not meant to be doing this: forcing execution through this branch would be a dubious long shot at best. This bit of code is only supposed to be used if the print preview window context is to be re-used. We're not creating a print preview window, so if we hack this, it'll be attempting to use the actual context for the page instead.

So while I don't expect it to produce good results, I'm interested to see what will happen.

Looking at the condition that's gating the code block, the following part of the condition will be false:
  aIsPreview == IsPreview::Yes
Therefore what we'll need is for both docToPrint->IsStaticDocument() and StaticPrefs::print_tab_modal_enabled() to be true. Checking the about:config page I note that the print.tab_modal.enabled value is already set to true, so we just need the docToPrint->IsStaticDocument() call to return true for the condition to hold.
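To make the reasoning concrete, the gate boils down to the boolean expression below. The function name and parameters are made up for illustration; it only mirrors the shape of the condition in nsGlobalWindowOuter::Print() and isn't code from gecko.

```cpp
// Mirrors the shape of the gating condition:
//   isStaticDocument && (isPreview || tabModalEnabled)
bool reusesSourceContext(bool isStaticDocument, bool isPreview, bool tabModalEnabled) {
    return isStaticDocument && (isPreview || tabModalEnabled);
}
```

With isPreview false and the tab modal preference true, the result depends entirely on isStaticDocument, which is why that's the value to flip in the debugger.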

To test all this out I put a breakpoint just before this block of code using the debugger and examine the state of the system when it hits. It turns out that docToPrint->IsStaticDocument() is indeed false, but we can switch its value using the debugger to force our way into the conditional code block.
$ EMBED_CONSOLE=1 MOZ_LOG="EmbedLite:5" gdb sailfish-browser
[...]
(gdb) b nsGlobalWindowOuter.cpp:5287
Breakpoint 1 at 0x7fba96f364: file dom/base/nsGlobalWindowOuter.cpp, line 5287.
(gdb) c
Continuing.
[...]
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, nsGlobalWindowOuter::Print
    (nsIPrintSettings*, nsIWebProgressListener*, nsIDocShell*,
    nsGlobalWindowOuter::IsPreview, nsGlobalWindowOuter::IsForWindowDotPrint,
    std::function<void (mozilla::dom::PrintPreviewResultInfo const&)>&&,
    mozilla::ErrorResult&) (this=this@entry=0x7f88ab8100,
    aPrintSettings=aPrintSettings@entry=0x7e3553a790,
    aListener=aListener@entry=0x7f8a6730d0,
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0,
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No,
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at dom/base/nsGlobalWindowOuter.cpp:5287
5287      nsAutoSyncOperation sync(docToPrint, SyncOperationBehavior::eAllowInput);
(gdb) n
5288      AutoModalState modalState(*this);
(gdb) p docToPrint->IsStaticDocument()
Attempt to take address of value not located in memory.
(gdb) p docToPrint
$1 = {mRawPtr = 0x7f89077650}
(gdb) p aIsPreview
$2 = nsGlobalWindowOuter::IsPreview::No
(gdb) p docToPrint.mRawPtr.mIsStaticDocument
$3 = false
(gdb) set variable docToPrint.mRawPtr.mIsStaticDocument = true
(gdb) p docToPrint.mRawPtr.mIsStaticDocument
$4 = true
(gdb) c
Continuing.
[New LWP 19986]

Thread 13 "Socket Thread" received signal SIGPIPE, Broken pipe.

Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
nsPrintJob::FindFocusedDocument (this=this@entry=0x7f885e3020,
    aDoc=aDoc@entry=0x7f89077650)
    at layout/printing/nsPrintJob.cpp:2411
2411      nsPIDOMWindowOuter* window = aDoc->GetOriginalDocument()->GetWindow();
(gdb) bt
#0  nsPrintJob::FindFocusedDocument (this=this@entry=0x7f885e3020,
    aDoc=aDoc@entry=0x7f89077650) at layout/printing/nsPrintJob.cpp:2411
#1  0x0000007fbc3bac6c in nsPrintJob::DoCommonPrint
    (this=this@entry=0x7f885e3020, aIsPrintPreview=aIsPrintPreview@entry=false,
    aPrintSettings=aPrintSettings@entry=0x7e3553a790,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a6730d0,
    aDoc=aDoc@entry=0x7f89077650) at layout/printing/nsPrintJob.cpp:548
#2  0x0000007fbc3bb718 in nsPrintJob::CommonPrint
    (this=this@entry=0x7f885e3020, aIsPrintPreview=aIsPrintPreview@entry=false,
    aPrintSettings=aPrintSettings@entry=0x7e3553a790,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a6730d0,
    aSourceDoc=aSourceDoc@entry=0x7f89077650)
    at layout/printing/nsPrintJob.cpp:488
#3  0x0000007fbc3bb840 in nsPrintJob::Print (this=this@entry=0x7f885e3020,
    aSourceDoc=<optimized out>, aPrintSettings=aPrintSettings@entry=0x7e3553a790,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a6730d0)
    at layout/printing/nsPrintJob.cpp:824
#4  0x0000007fbc108fe4 in nsDocumentViewer::Print (this=0x7f8905e190,
    aPrintSettings=0x7e3553a790, aWebProgressListener=0x7f8a6730d0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#5  0x0000007fba96f49c in nsGlobalWindowOuter::Print(nsIPrintSettings*,
    nsIWebProgressListener*, nsIDocShell*, nsGlobalWindowOuter::IsPreview,
    nsGlobalWindowOuter::IsForWindowDotPrint, std::function<void
    (mozilla::dom::PrintPreviewResultInfo const&)>&&, mozilla::ErrorResult&)
    (this=this@entry=0x7f88ab8100,
    aPrintSettings=aPrintSettings@entry=0x7e3553a790,
    aListener=aListener@entry=0x7f8a6730d0,
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0,
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No,
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#6  0x0000007fbc7eb714 in mozilla::dom::CanonicalBrowsingContext::Print
    (this=this@entry=0x7f88cb8b70, aPrintSettings=0x7e3553a790, aRv=...)
    at include/c++/8.3.0/bits/std_function.h:402
#7  0x0000007fbab82f08 in mozilla::dom::CanonicalBrowsingContext_Binding::print
    (args=..., void_self=0x7f88cb8b70, obj=..., cx_=0x7f881df400)
    at BrowsingContextBinding.cpp:4674
#8  mozilla::dom::CanonicalBrowsingContext_Binding::print_promiseWrapper
    (cx=0x7f881df400, obj=..., void_self=0x7f88cb8b70, args=...)
    at BrowsingContextBinding.cpp:4688
[...]
#34 0x0000007fbd16635c in js::jit::MaybeEnterJit (cx=0x7f881df400, state=...)
    at js/src/jit/Jit.cpp:207
#35 0x0000007f8824be41 in ?? ()
Backtrace stopped: Cannot access memory at address 0x56e206215288
(gdb)
Okay, so this plan clearly isn't going to work.

Putting this dead-end to one side, we still have two options on the table:
  1. Figure out what's happening in ESR 78 where the window isn't being created.
  2. Re-code the front-end to hide the additional window.
The more I look at the code the more I think I'm going to have to go down the second route. Nevertheless I'd like to try to compare execution with ESR 78 one more time to try to figure out how it can get away without creating a clone. In particular, there is this BuildNestedPrintObjects() method in ESR 78 which I still suspect is performing the role of the cloning. This code still exists in ESR 91 and the debugger tells me it's not being called. But for comparison I'd really like to know whether it's being called in ESR 78.

Unfortunately the debugger just refuses to work properly on the ESR 78 code as installed from the repository:
$ EMBED_CONSOLE=1 MOZ_LOG="EmbedLite:5" gdb sailfish-browser
(gdb) r
(gdb) b BuildNestedPrintObjects
Breakpoint 1 at 0x7fbc20b7a8: file layout/printing/nsPrintJob.cpp, line 403.
(gdb) c
Continuing.

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, BuildNestedPrintObjects (aDocument=
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.

dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.
(gdb) 
This is basically scuppering any attempt I make to breakpoint on these functions, or indeed anything nearby. I've tried this on two separate devices now and get the same results, so I'm pretty sure this is due to the debug symbols in the repository or a bug in gdb rather than some specific misconfiguration of gdb on my phone.

To try to address this, I'm going to rebuild ESR 78 and install completely new debug packages on my device. That will at least discount the possibility of it being something corrupt about the debug symbols coming from the official repositories.

I've set the build going. While it builds I can't do much else except read through the code some more. But I also decide to try to ask around on the Mozilla Matrix "Printing" channel to see if anyone has any advice about how to tackle this. Here's the message I posted:
 
Hi. I have a query about cloning the document for printing PDF to file. I'm upgrading gecko from ESR78 to ESR91 for Sailfish OS (a mobile Linux variant) where we have a "Save page to PDF" option in the Qt UI. In ESR78 the nsPrintJob::DoCommonPrint() method is called but in ESR91 this no longer seems to work so I call CanonicalBrowsingContext::Print() instead (seems to relate to D87063). The former clones the doc using (I think) BuildNestedPrintObjects(), but the latter seems to call OpenInternal() inside nsGlobalWindowOuter::Print() to open a new window for it instead. Is this correct, or am I misunderstanding the changes? The reason I ask is that the latter opens a new blank window for the clone to go into, which I'm trying to avoid.

I post this at 15:47 and wait. By the evening I've received a reply from emilio:
 
We still clone the old doc. In order to avoid the new window you need to handle OPEN_PRINT_BROWSER flag

That's really helpful; it suggests that hiding the extra window in the user interface really is the right way to address this. I post a follow-up to get clarification on this.
 
Thanks for your reply emilio. So I have to allow the window to be created, but hide it in the front-end based on the fact the OPEN_PRINT_BROWSER flag (from nsOpenWindowInfo::GetIsForPrinting()) is set?

As I write this I'm still awaiting a reply, but I've already come to a conclusion on this: I'll need to add code so that some windows can be created but hidden from the front-end. I already have some ideas for how this can work. It will require some bigger changes than I was hoping for, but on the plus side, most if not all of the changes will happen in Sailfish code, rather than gecko code.

And you never know, there may be some other use for the functionality in the future as well.

The ESR 78 build is still chugging away. It's quite late here and I'm feeling rotten. I've completed barely a fraction of the things I was hoping to get done today, but I have at least reached a conclusion for how to proceed with this printing situation. So the day hasn't been a complete wipe-out.

I really hope I'm feeling better in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
18 Dec 2023 : Day 111 #
I'm still feeling unwell. Plus I had a long couple of days at work yesterday and the day before. All in all this has left me in a bit of a mess and I spent last night trying to figure out how the parent browsing context and the new browsing context relate to one another. Although I did get to a piece of code that looked relevant, I wasn't able to piece together the processes either side that led up to it and that led away from it, all of which should have re-converged inside nsWindowWatcher::OpenWindowInternal().

What I did do was add some code to this method, mimicking the code further up the stack that usually performs the task elsewhere, to create the new browser context from the parent browser context, in an attempt to short circuit the process that would usually require a new window to be created.

It built overnight and now I'm testing it out.

Unfortunately it leaves us with a segfault occurring inside nsGlobalWindowOuter::Print(). Let's take a record of the backtrace.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 6812]
nsGlobalWindowOuter::Print(nsIPrintSettings*, nsIWebProgressListener*,
    nsIDocShell*, nsGlobalWindowOuter::IsPreview,
    nsGlobalWindowOuter::IsForWindowDotPrint, std::function<void
    (mozilla::dom::PrintPreviewResultInfo const&)>&&, mozilla::ErrorResult&)
    (this=this@entry=0x7f8843e530, 
    aPrintSettings=aPrintSettings@entry=0x7e33d83c80,
    aListener=aListener@entry=0x7f88f44c00,
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0, 
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No, 
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at dom/base/nsGlobalWindowOuter.cpp:5351
5351        cloneDocShell->GetContentViewer(getter_AddRefs(cv));
(gdb) bt
#0  nsGlobalWindowOuter::Print(nsIPrintSettings*, nsIWebProgressListener*,
    nsIDocShell*, nsGlobalWindowOuter::IsPreview,
    nsGlobalWindowOuter::IsForWindowDotPrint, std::function<void
    (mozilla::dom::PrintPreviewResultInfo const&)>&&, mozilla::ErrorResult&)
    (this=this@entry=0x7f8843e530, 
    aPrintSettings=aPrintSettings@entry=0x7e33d83c80,
    aListener=aListener@entry=0x7f88f44c00,
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0, 
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No, 
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at dom/base/nsGlobalWindowOuter.cpp:5351
#1  0x0000007fbc7eb714 in mozilla::dom::CanonicalBrowsingContext::Print
    (this=this@entry=0x7f88c83400, aPrintSettings=0x7e33d83c80, aRv=...)
    at include/c++/8.3.0/bits/std_function.h:402
#2  0x0000007fbab82f08 in mozilla::dom::CanonicalBrowsingContext_Binding::print
    (args=..., void_self=0x7f88c83400, obj=..., cx_=0x7f881df400)
    at BrowsingContextBinding.cpp:4674
#3  mozilla::dom::CanonicalBrowsingContext_Binding::print_promiseWrapper
    (cx=0x7f881df400, obj=..., void_self=0x7f88c83400, args=...)
    at BrowsingContextBinding.cpp:4688
#4  0x0000007fbb2e2960 in mozilla::dom::binding_detail::GenericMethod
    <mozilla::dom::binding_detail::NormalThisPolicy,
    mozilla::dom::binding_detail::ConvertExceptionsToPromises>
    (cx=0x7f881df400, argc=<optimized out>, vp=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/CallArgs.h:207
[...]
#29 0x0000007fbd1662cc in js::jit::MaybeEnterJit (cx=0x7f881df400, state=...)
    at js/src/jit/Jit.cpp:207
#30 0x0000007f8824be41 in ?? ()
Backtrace stopped: Cannot access memory at address 0x4d673b58921e
(gdb) 
Here's the code (it's the last line that's causing the segfault):
    nsCOMPtr<nsIDocShell> cloneDocShell = bc->GetDocShell();
    MOZ_DIAGNOSTIC_ASSERT(cloneDocShell);
    cloneDocShell->GetContentViewer(getter_AddRefs(cv));
And here's the reason for the segfault:
(gdb) p cloneDocShell
$1 = {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}
(gdb) 
So, from this, it looks very much like the browser context simply doesn't have a docShell. Maybe there's a whole bunch of other stuff it doesn't — but should — have as well?

Looking back at the original code that creates the new browser context inside EmbedLiteViewChild::InitGeckoWindow(), there's this snippet of code following the creation that looks particularly relevant:
  // nsWebBrowser::Create creates nsDocShell, calls InitWindow for nsIBaseWindow,
  // and finally creates nsIBaseWindow. When browsingContext is passed to
  // nsWebBrowser::Create, typeContentWrapper type is passed to the nsWebBrowser
  // upon instantiation.
  mWebBrowser = nsWebBrowser::Create(mChrome, mWidget, browsingContext,
                                     nullptr);
With the short circuit we introduced yesterday this is no longer being executed. So no docShell for the browser context. So segfaults. So no good.
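As a rough illustration of why skipping that step matters, here's a small self-contained sketch. The types are simplified stand-ins I've made up for the purpose (they are not the gecko classes), and AttachDocShell() is a hypothetical placeholder for the work that nsWebBrowser::Create() does.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-ins; not the real gecko types.
struct DocShell {
  std::string contentViewer = "viewer";
};

struct BrowsingContext {
  std::shared_ptr<DocShell> docShell;  // empty until something attaches one

  static std::shared_ptr<BrowsingContext> CreateDetached() {
    return std::make_shared<BrowsingContext>();
  }
};

// Hypothetical placeholder for the setup that normally follows creation.
void AttachDocShell(BrowsingContext& aBC) {
  aBC.docShell = std::make_shared<DocShell>();
}

// Reports failure rather than dereferencing a missing docShell.
bool GetContentViewer(const BrowsingContext& aBC, std::string& aOut) {
  if (!aBC.docShell) {
    return false;
  }
  aOut = aBC.docShell->contentViewer;
  return true;
}

void ExampleDocShell() {
  auto bc = BrowsingContext::CreateDetached();
  std::string cv;
  assert(!GetContentViewer(*bc, cv));  // the short-circuited case
  AttachDocShell(*bc);
  assert(GetContentViewer(*bc, cv));   // the normal path
}
```

A detached context on its own is just an empty shell. In this sketch the missing docShell is reported as a failure; in the real code above, the null pointer is dereferenced instead.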

Back to the drawing board and, in particular, to figuring out the path to how this EmbedLiteViewChild::InitGeckoWindow() method gets called.

It would be so nice to cheat on this. I could execute the code in the debugger, stick a breakpoint on the method and observe the path in the backtrace, were it not for the fact I introduced that short circuit. Since I did, this method no longer gets called, so the breakpoint will no longer fire.

The short circuit is broken anyway so I've removed it and set the build running. But it will be hours before it completes so I may as well peruse the code in the meantime in case I can figure it out manually.

I've also just noticed that just after creation of the mWebBrowser that we saw above, there's this call here:
  rv = mWebBrowser->SetVisibility(true);
I'm also interested to know whether this could be of use to us, so that's also something to follow in the code (although I don't think it'll make any difference on the tab display side).

So from manual inspection, this InitGeckoWindow() gets triggered by the EmbedLiteViewChild constructor. The constructor isn't called directly, but is inherited by EmbedLiteViewThreadChild and EmbedLiteViewProcessChild. I think the browser uses the thread version so I'm going to follow that route.

An instance of EmbedLiteViewThreadChild is created in the EmbedLiteAppThreadChild::AllocPEmbedLiteViewChild() method, which isn't called from anywhere in the main codebase. However, just to highlight how frustratingly convoluted this is, there is some generated code which is presumably the place responsible for calling it:
auto PEmbedLiteAppChild::OnMessageReceived(const Message& msg__) ->
    PEmbedLiteAppChild::Result
{
[...]
    switch (msg__.type()) {
    case PEmbedLiteApp::Msg_PEmbedLiteViewConstructor__ID:
        {
[...]
            msg__.EndRead(iter__, msg__.type());
            PEmbedLiteViewChild* actor =
                (static_cast<EmbedLiteAppChild*>(this))->
                AllocPEmbedLiteViewChild(windowId, id, parentId,
                parentBrowsingContext, isPrivateWindow, isDesktopMode);
[...]
The sending of this message (through a bit of generated indirection) happens as a result of a call to PEmbedLiteAppParent::SendPEmbedLiteViewConstructor(), which (finally we got there) is called here:
EmbedLiteView*
EmbedLiteApp::CreateView(EmbedLiteWindow* aWindow, uint32_t aParent,
  uintptr_t aParentBrowsingContext, bool aIsPrivateWindow, bool isDesktopMode)
{
  LOGT();
  NS_ASSERTION(mState == INITIALIZED, "The app must be up and runnning by now");
  static uint32_t sViewCreateID = 0;
  sViewCreateID++;

  PEmbedLiteViewParent* viewParent = static_cast<PEmbedLiteViewParent*>(
      mAppParent->SendPEmbedLiteViewConstructor(aWindow->GetUniqueID(),
          sViewCreateID, aParent, aParentBrowsingContext, aIsPrivateWindow,
          isDesktopMode));
  EmbedLiteView* view = new EmbedLiteView(this, aWindow, viewParent, sViewCreateID);
  mViews[sViewCreateID] = view;
  return view;
}
This is part of the main codebase, not auto-generated.

There's always a balance between minimising changes to the entire codebase and minimising changes to the gecko codebase. My preference is for the latter wherever possible since it will help minimise maintenance in the long run. In this case, working on this principle, it looks like the best thing to do is to allow the print status to be passed down through the EmbedLite code and into the front end. This will allow the front end to choose how to deal with the new window, which after some additional front end changes, could be to hide the window.
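Here's a minimal sketch of what passing the print status down might look like. Everything in it is hypothetical: the isForPrinting parameter, the hidden field and the TabModel class are illustrative names of my own, not the existing EmbedLite, qtmozembed or sailfish-browser interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical front-end tab record with an extra "hidden" flag.
struct Tab {
  uint32_t id;
  uint32_t parentId;
  bool hidden;
};

// Hypothetical front-end model: it stores every tab but only exposes the
// visible ones to the user interface.
class TabModel {
 public:
  uint32_t NewTab(uint32_t aParentId, bool aHidden) {
    Tab tab{mNextId++, aParentId, aHidden};
    mTabs.push_back(tab);
    return tab.id;
  }

  std::vector<uint32_t> VisibleTabIds() const {
    std::vector<uint32_t> ids;
    for (const Tab& tab : mTabs) {
      if (!tab.hidden) {
        ids.push_back(tab.id);
      }
    }
    return ids;
  }

  std::size_t Count() const { return mTabs.size(); }

 private:
  uint32_t mNextId = 1;
  std::vector<Tab> mTabs;
};

// Engine-facing entry point: the view is always created, so a valid ID is
// returned, but a view created for printing is flagged as hidden.
uint32_t CreateNewWindowRequested(TabModel& aModel, uint32_t aParentId,
                                  bool aIsForPrinting) {
  uint32_t viewId = aModel.NewTab(aParentId, aIsForPrinting);
  assert(viewId != 0);
  return viewId;
}
```

The point of this shape is that the engine still gets back the view ID it needs, while the decision about what to display stays entirely in the front end.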

Before I do that, I'm going to double check the reason why all of this isn't necessary in ESR 78. I think I've reached the stage where I understand the process flow in the ESR 91 print, browser context and window creation code sufficiently to make it worthwhile attempting a comparison.

So what I want to do is to check whether nsGlobalWindowOuter::Print() is called as part of the ESR 78 process and, if so, figure out what happens at this point:
    // We're already a print preview window, just reuse our browsing context /
    // content viewer.
    bc = sourceBC;
    nsCOMPtr<nsIDocShell> docShell = bc->GetDocShell();
    if (!docShell) {
      aError.ThrowNotSupportedError("No docshell");
      return nullptr;
    }
If that code is being executed it might imply a route that could work on ESR 91 that also avoids having to open the new window.

That's it for today though. I'll have to pick this question up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
17 Dec 2023 : Day 110 #
Unfortunately I woke up this morning with some kind of Winter ailment: headache, sore throat, cough and all the rest. This has knocked me out for pretty much the entire day, including all of my planned gecko work.

The thing I hate most about being unwell is that it totally throws off all of my plans. I really hate that. It's not just my gecko work, but I had big plans to work on a Rust project today as well.

As a consequence it'll be a short one today, and potentially also for the next couple of days. Hopefully I'll be able to gain momentum just as soon as I'm back to my normal self.

Nevertheless, as yesterday we're continuing to work through the printer code to try to remove the window that's appearing when the print starts. Now that our examination has reached the sailfish-browser code I think we can start calling it a tab now. This is the code we ended up at last night:
quint32 DeclarativeWebPageCreator::createView(const quint32 &parentId,
    const uintptr_t &parentBrowsingContext)
{
    QPointer<DeclarativeWebPage> oldPage = m_activeWebPage;
    m_model->newTab(QString(), parentId, parentBrowsingContext);

    if (m_activeWebPage && oldPage != m_activeWebPage) {
        return m_activeWebPage->uniqueId();
    }
    return 0;
}
Now it's time to check out the m_model which we can see declared in the class header:
    QPointer<DeclarativeTabModel> m_model;
I've looked through the code a few times now and exactly what's happening is confusing me greatly. It looks very much like the parentBrowsingContext is passed in to createView() which passes it along to newTab(). This transition causes a name change to browsingContext and this value gets stored with the new tab:
    Tab tab;
    tab.setTabId(nextTabId());
    tab.setRequestedUrl(url);
    tab.setBrowsingContext(browsingContext);
    tab.setParentId(parentId);
There's no particular magic happening here:
void Tab::setBrowsingContext(uintptr_t browsingContext)
{
    Q_ASSERT_X(m_browsingContext == 0, Q_FUNC_INFO,
        "Browsing context can be set only once.");
    m_browsingContext = browsingContext;
}
My concern is that this is the same browsing context that's eventually going to be returned by nsWindowWatcher::OpenWindowInternal(). After we've gone all the way to sailfish-browser and back again the new browser context gets extracted like this:
    nsCOMPtr<nsIWebBrowserChrome> newChrome;
    rv = CreateChromeWindow(parentChrome, chromeFlags, openWindowInfo,
                            getter_AddRefs(newChrome));

    // Inside CreateChromeWindow
    RefPtr<BrowsingContext> parentBrowsingContext = aOpenWindowInfo->GetParent();
    Tab tab;
    tab.setBrowsingContext(parentBrowsingContext);

    // After CreateChromeWindow
    nsCOMPtr<nsIDocShellTreeItem> newDocShellItem = do_GetInterface(newChrome);
    RefPtr<BrowsingContext> newBC = newDocShellItem->GetBrowsingContext();
Note that the above isn't real code, it's just a summary of the steps that happen at different stages to give an idea of how the variables are moving around.

This is such a web. Somewhere the parentBrowsingContext going in has to touch the newBC coming out. It looks like the place may be in the EmbedLiteViewChild::InitGeckoWindow() method where we have this:
  RefPtr<BrowsingContext> browsingContext = BrowsingContext::CreateDetached
    (nullptr, parentBrowsingContext, nullptr, EmptyString(),
    BrowsingContext::Type::Content);
This seems to happen when an instance of EmbedLiteViewChild is created.

I'm going to try to short-circuit this by adding the following code to nsWindowWatcher::OpenWindowInternal() before all this happens.
  if (!newBC) {
    bool isForPrinting = openWindowInfo->GetIsForPrinting();
    if (isForPrinting) {
      RefPtr<BrowsingContext> parentBrowsingContext = openWindowInfo->GetParent();
      newBC = BrowsingContext::CreateDetached(nullptr, parentBrowsingContext,
        nullptr, EmptyString(), BrowsingContext::Type::Content);
    }
  }
Taking a look at this with fresh eyes in the morning will, I'm sure, help. Maybe I'll feel a little more cogent tomorrow. This is as far as I can take it today.

It's building now so it's a good time for me to pause. I'll aim to pick this up again in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
16 Dec 2023 : Day 109 #
It's back to trying to hide the extraneous print window today. If you've been following over the last few days, you'll know that the "Save page to PDF" functionality is now working, but plagued by an errant window that insists on opening during the print.

The reason for the window needing to exist is completely clear: the page needs to be cloned into a browser context and the process of creating a browser context involves creating a window for it to live in. This wasn't a problem for ESR 78... I'm not exactly sure why and that's something I should look into.

But first I'm going to look at the code in WindowCreator.cpp that lives in the EmbedLite portion of the gecko project. Recall that we were working through this yesterday and it looked like this:
  if (isForPrinting) {
    return NS_OK;
  }

  mChild->CreateWindow(parentID,
    reinterpret_cast<uintptr_t>(parentBrowsingContext.get()), aChromeFlags,
    &createdID, aCancel);

  if (*aCancel) {
    return NS_OK;
  }

  nsresult rv(NS_OK);
  nsCOMPtr<nsIWebBrowserChrome> browser;
  nsCOMPtr<nsIThread> thread;
  NS_GetCurrentThread(getter_AddRefs(thread));
  while (!browser && NS_SUCCEEDED(rv)) {
    bool processedEvent;
    rv = thread->ProcessNextEvent(true, &processedEvent);
    if (NS_SUCCEEDED(rv) && !processedEvent) {
      rv = NS_ERROR_UNEXPECTED;
    }
    EmbedLiteViewChildIface* view = mChild->GetViewByID(createdID);
    if (view) {
      view->GetBrowserChrome(getter_AddRefs(browser));
    }
  }

  // check to make sure that we made a new window
  if (_retval) {
      NS_ADDREF(*_retval = browser);
      return NS_OK;
  }
I've included slightly more of the code today because the chunk in the middle is important. Just to dissect this a little, the first conditional return is code I added in the hope that it would avoid creation of the window. The problem with this is that it means _retval never gets set and this is the return value needed in order for the browser context to be created.

We can tell more than this though. In order for _retval to be set we need browser to be set and that will only happen if:
  1. Execution goes inside the while loop;
  2. and createdID has been set;
  3. which requires that CreateWindow() is called.
In summary, we can't avoid creating the window at this point. The fact that the while loop is waiting for the browser chrome to exist makes it look like we can't avoid creating the window at all.
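The dependency chain can be sketched in a few lines. This is a toy model rather than the EmbedLite code: the App struct, its event queue and the "chrome" strings are all invented for illustration, and unlike the real ProcessNextEvent() call the toy version never blocks waiting for new events.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy application state; every name here is hypothetical.
struct App {
  std::deque<std::function<void()>> events;  // pending work for the thread
  std::map<uint32_t, std::string> views;     // view ID -> browser chrome
  uint32_t nextId = 1;

  // Requests a window. The view only exists once the queued event has run.
  uint32_t CreateWindow() {
    uint32_t id = nextId++;
    events.push_back(
        [this, id] { views[id] = "chrome-" + std::to_string(id); });
    return id;
  }

  // Runs one pending event; returns false if there was nothing to run.
  bool ProcessNextEvent() {
    if (events.empty()) {
      return false;
    }
    auto event = std::move(events.front());
    events.pop_front();
    event();
    return true;
  }
};

// Same shape as the loop above: keep processing events until the view for
// aCreatedID has chrome, giving up if there's nothing left to process.
bool WaitForChrome(App& aApp, uint32_t aCreatedID, std::string& aBrowser) {
  bool ok = true;
  while (aBrowser.empty() && ok) {
    ok = aApp.ProcessNextEvent();
    auto it = aApp.views.find(aCreatedID);
    if (it != aApp.views.end()) {
      aBrowser = it->second;
    }
  }
  return !aBrowser.empty();
}

void ExampleWait() {
  App app;
  std::string browser;
  uint32_t id = app.CreateWindow();
  assert(WaitForChrome(app, id, browser));  // window requested: chrome found

  App skipped;
  std::string none;
  assert(!WaitForChrome(skipped, 0, none));  // no request: nothing to return
}
```

Without the creation request there's no ID to look up and no event that will ever produce the chrome, so the loop has nothing to hand back.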

But that doesn't mean we can't hide it, so let's pursue that goal for now.

Checking the WindowCreator.h header we can see that the mChild that handles the call to CreateWindow() is a class that implements the EmbedLiteAppChildIface interface. There's actually only one concrete class that does this which is EmbedLiteAppChild. All this does is send a CreateWindow message.

There are a few candidates for classes that might receive this. It could be EmbedLiteAppThreadParent, EmbedLiteAppProcessParent or ContentParent. Only the first two are part of the EmbedLite code and they both end up doing the same thing, which is making the following call:
  *createdID = mApp->CreateWindowRequested(chromeFlags, parentId, parentBrowsingContext);
In both cases mApp is an instance of EmbedLiteApp. I'm wondering why we have both of these. I'm wondering if one is used for the browser and the other is used for the WebView. Or maybe Sailfish OS only uses one of them. I must remember to investigate this further at some point.

Also worth noting is that the createdID returned by this call is exactly the value we're interested in. We need this to be set.

Let's continue on into EmbedLiteApp.

In this method there's a snippet of code that searches for the parent based on the parent ID and then calls this:
  uint32_t viewId = mListener ? mListener->CreateNewWindowRequested(
    chromeFlags, view, parentBrowsingContext) : 0;
Once again it's the return value that we need. The mListener is an instance of EmbedLiteAppListener, that's actually defined in the same header file as EmbedLiteApp. This is an interface and we need something to implement it. But there's nothing in gecko that does.

After some scrabbling around and scratching of my head the reason becomes clear: there's nothing that implements it in gecko because the class that implements it is in qtmozembed. Which means we've finally broken through to the sailfish-browser frontend. The implementation is in qmozcontext.cpp:
uint32_t QMozContextPrivate::CreateNewWindowRequested(const uint32_t &chromeFlags,
  EmbedLiteView *aParentView, const uintptr_t &parentBrowsingContext)
{
    Q_UNUSED(chromeFlags)

    uint32_t parentId = aParentView ? aParentView->GetUniqueID() : 0;
    qCDebug(lcEmbedLiteExt) << "QtMozEmbedContext new Window requested: parent:"
      << (void *)aParentView << parentId;
    uint32_t viewId = QMozContext::instance()->createView(parentId,
      parentBrowsingContext);
    return viewId;
}
Once again, it's the return value, viewId, that we're particularly concerned about. The call to createView() is bounced by the QMozContext instance (it's presumably a singleton) to mViewCreator. This is an instance of QMozViewCreator, an abstract class that has no implementation in qtmozembed.

And that's because the implementation comes from the sailfish-browser repository in the form of the DeclarativeWebPageCreator class.

Here's the implementation:
quint32 DeclarativeWebPageCreator::createView(const quint32 &parentId,
  const uintptr_t &parentBrowsingContext)
{
    QPointer<DeclarativeWebPage> oldPage = m_activeWebPage;
    m_model->newTab(QString(), parentId, parentBrowsingContext);

    if (m_activeWebPage && oldPage != m_activeWebPage) {
        return m_activeWebPage->uniqueId();
    }
    return 0;
}
Now it really feels like we're getting close! We're switching terminology from windows to tabs. Plus we have no more repositories to go in to. Somewhere around here we're going to have to start thinking about adding in some way to hide the window, if we're going to go down that route.

Alright, my train is coming in to King's Cross and I have a busy evening tonight with my work Christmas Party, so this is likely to be all I have time for today. It feels like we made good progress though and are reaching a point where I might be able to start making changes to the code to actually fix the issue. That's always the goal!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
15 Dec 2023 : Day 108 #
This morning: a successful build. That's not really so surprising given the minimal changes I made yesterday, but I've messed up smaller pieces of code before, so you can never be sure.

So, packages built, installed and run, and what do we have? In order to get the debug output I have to set the MOZ_LOG environment variable to include "EmbedLite:5" so that the LOGE() messages will show. Here's what happens when I press the "Save web page as PDF" option:
$ EMBED_CONSOLE=1 MOZ_LOG="EmbedLite:5" sailfish-browser
[...]
[Parent 15259: Unnamed thread 7060002670]: E/EmbedLite FUNC::virtual nsresult 
    mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
[Parent 15259: Unnamed thread 7060002670]: W/EmbedLite ERROR:
    EmbedLite::virtual nsresult WindowCreator::CreateChromeWindow
    (nsIWebBrowserChrome*, uint32_t, nsIOpenWindowInfo*, bool*,
    nsIWebBrowserChrome**):61 PRINT: isForPrinting: 1
EmbedliteDownloadManager error: [Exception... "The request is not allowed."
    nsresult: "0x80530021 (NS_ERROR_DOM_NOT_ALLOWED_ERR)"  location:
    "JS frame :: resource://gre/modules/DownloadCore.jsm ::
    DownloadError :: line 1755"  data: no]
[Parent 15259: Unnamed thread 7060002670]: E/EmbedLite FUNC::virtual nsresult
    mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
JavaScript error: , line 0: uncaught exception: Object
CONSOLE message:
[JavaScript Error: "uncaught exception: Object"]
That's a bit messy, but if you look through carefully it's possible to see the output PRINT: isForPrinting: 1. That's a good sign: it shows that in WindowCreator::CreateChromeWindow() we can find out whether this is a window that needs to be hidden or not.

But the rest is less encouraging. Apart from the logging I also added an early return to the method to prevent the window from actually being created. I got the method to return NS_OK in the hope that whatever happens further down the stack might not notice. But from this debug output we can see that it did notice, throwing an exception "The request is not allowed."

From DOMException.h we can see that this NS_ERROR_DOM_NOT_ALLOWED_ERR that we're getting is equivalent to NotAllowedError, which appears once in DownloadCore.jsm. However in this particular instance it's just some code conditioned on the error. What we need is some code that's actually generating the error. Looking through the rest of the code, it all looks a bit peculiar: this error is usually triggered by an authentication failure, which doesn't fit with what we're doing here at all.

There are only a few places where it seems to be used for other purposes. One of them is the StyleSheet::InsertRuleIntoGroup() method where it seems to be caused by a failed attempt to modify a group.
nsresult StyleSheet::InsertRuleIntoGroup(const nsACString& aRule,
                                         css::GroupRule* aGroup,
                                         uint32_t aIndex) {
  NS_ASSERTION(IsComplete(), "No inserting into an incomplete sheet!");
  // check that the group actually belongs to this sheet!
  if (this != aGroup->GetStyleSheet()) {
    return NS_ERROR_INVALID_ARG;
  }

  if (IsReadOnly()) {
    return NS_OK;
  }

  if (ModificationDisallowed()) {
    return NS_ERROR_DOM_NOT_ALLOWED_ERR;
  }
[...]
I'm running sailfish-browser through the debugger with a breakpoint on this method to see whether this is where it's coming from. If it is, I'm not sure what that will tell us.

But the breakpoint isn't triggered, so it's not this bit of code anyway. After grepping the code a bit more and sifting carefully through various files, I eventually realise that there's an instance of this error that could potentially be generated directly after our call to OpenInternal() in nsGlobalWindowOuter.cpp:
[...]
      aError = OpenInternal(u""_ns, u""_ns, u""_ns,
                            false,             // aDialog
                            false,             // aContentModal
                            true,              // aCalledNoScript
                            false,             // aDoJSFixups
                            true,              // aNavigate
                            nullptr, nullptr,  // No args
                            nullptr,           // aLoadState
                            false,             // aForceNoOpener
                            printKind, getter_AddRefs(bc));
      if (NS_WARN_IF(aError.Failed())) {
        return nullptr;
      }
    }
    if (!bc) {
      aError.ThrowNotAllowedError("No browsing context");
      return nullptr;
    }
That looks like a far more promising case. The debugger won't let me put a breakpoint directly on the line that's throwing the exception here, but it will let me put one on the OpenInternal() call, so I can set that and step through to check whether this error is the one causing the output.
(gdb) break nsGlobalWindowOuter.cpp:5329
Breakpoint 4 at 0x7fba96fad0: file dom/base/nsGlobalWindowOuter.cpp, line 5329.
(gdb) c
Continuing.
[LWP 17314 exited]
[Parent 16702: Unnamed thread 7f88002670]: E/EmbedLite FUNC::virtual nsresult mozilla::embedlite::EmbedLiteAppChild::Observe(nsISupports*, const char*,
    const char16_t*):68 topic:embed:download
[Switching to LWP 16938]

Thread 8 "GeckoWorkerThre" hit Breakpoint 4, nsGlobalWindowOuter::Print
    (nsIPrintSettings*, nsIWebProgressListener*, nsIDocShell*,
    nsGlobalWindowOuter::IsPreview, nsGlobalWindowOuter::IsForWindowDotPrint,
    std::function<void (mozilla::dom::PrintPreviewResultInfo const&)>&&,
    mozilla::ErrorResult&) (this=this@entry=0x7f88564870,
    aPrintSettings=aPrintSettings@entry=0x7e352ed060,
    aListener=aListener@entry=0x7f89bfa6b0, 
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0,
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No, 
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at dom/base/nsGlobalWindowOuter.cpp:5329
5329          aError = OpenInternal(u""_ns, u""_ns, u""_ns,
(gdb) n
[Parent 16702: Unnamed thread 7f88002670]: W/EmbedLite ERROR:
    EmbedLite::virtual nsresult WindowCreator::CreateChromeWindow
    (nsIWebBrowserChrome*, uint32_t, nsIOpenWindowInfo*, bool*,
    nsIWebBrowserChrome**):61 PRINT: isForPrinting: 1
5329          aError = OpenInternal(u""_ns, u""_ns, u""_ns,
(gdb) n
30      ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsError.h:
    No such file or directory.
(gdb) n
5343        if (!bc) {
(gdb) p bc
$1 = {mRawPtr = 0x0}
(gdb) n
5344          aError.ThrowNotAllowedError("No browsing context");
(gdb) n
5345          return nullptr;
(gdb) 
So that's the one. It's also clear from this why the error is happening: by not creating the window we're obviously also causing the creation of the browsing context bc to fail.

The obvious follow-up question is why the lack of a window is preventing the browsing context from getting created. We'll have to follow the metaphorical rabbit down the rabbit hole to find out. So the context is returned as the last parameter of the OpenInternal() call:
nsresult nsGlobalWindowOuter::OpenInternal(
    const nsAString& aUrl, const nsAString& aName, const nsAString& aOptions,
    bool aDialog, bool aContentModal, bool aCalledNoScript, bool aDoJSFixups,
    bool aNavigate, nsIArray* argv, nsISupports* aExtraArgument,
    nsDocShellLoadState* aLoadState, bool aForceNoOpener, PrintKind aPrintKind,
    BrowsingContext** aReturn)
In practice it's the domReturn variable inside this method that interests us. This is set as the last parameter of OpenWindow2() called inside this method:
      rv = pwwatch->OpenWindow2(this, url, name, options,
                                /* aCalledFromScript = */ true, aDialog,
                                aNavigate, argv, isPopupSpamWindow,
                                forceNoOpener, forceNoReferrer, wwPrintKind,
                                aLoadState, getter_AddRefs(domReturn));
And then this comes back from the call to OpenWindowInternal() that's being called from inside this method, again as the last parameter:
  return OpenWindowInternal(aParent, aUrl, aName, aFeatures, aCalledFromScript,
                            dialog, aNavigate, argv, aIsPopupSpam,
                            aForceNoOpener, aForceNoReferrer, aPrintKind,
                            aLoadState, aResult);
In this method the variable we're interested in is newBC, which ends up turning into the returned browsing context value. Now this doesn't get directly returned by the next level. Instead there's some code that looks like this:
      /* We can give the window creator some hints. The only hint at this time
         is whether the opening window is in a situation that's likely to mean
         this is an unrequested popup window we're creating. However we're not
         completely honest: we clear that indicator if the opener is chrome, so
         that the downstream consumer can treat the indicator to mean simply
         that the new window is subject to popup control. */
      rv = CreateChromeWindow(parentChrome, chromeFlags, openWindowInfo,
                              getter_AddRefs(newChrome));
      if (parentTopInnerWindow) {
        parentTopInnerWindow->Resume();
      }

      if (newChrome) {
        /* It might be a chrome AppWindow, in which case it won't have
            an nsIDOMWindow (primary content shell). But in that case, it'll
            be able to hand over an nsIDocShellTreeItem directly. */
        nsCOMPtr<nsPIDOMWindowOuter> newWindow(do_GetInterface(newChrome));
        nsCOMPtr<nsIDocShellTreeItem> newDocShellItem;
        if (newWindow) {
          newDocShellItem = newWindow->GetDocShell();
        }
        if (!newDocShellItem) {
          newDocShellItem = do_GetInterface(newChrome);
        }
        if (!newDocShellItem) {
          rv = NS_ERROR_FAILURE;
        }
        newBC = newDocShellItem->GetBrowsingContext();
      }
Looking at this code, the most likely explanation for newBC being null is that newChrome is being returned as null from the CreateChromeWindow() call. So let's follow this lead into CreateChromeWindow(). Now we're interested in the newWindowChrome variable and we have some code that looks like this:
  bool cancel = false;
  nsCOMPtr<nsIWebBrowserChrome> newWindowChrome;
  nsresult rv = mWindowCreator->CreateChromeWindow(
      aParentChrome, aChromeFlags, aOpenWindowInfo, &cancel,
      getter_AddRefs(newWindowChrome));

  if (NS_SUCCEEDED(rv) && cancel) {
    newWindowChrome = nullptr;
    return NS_ERROR_ABORT;
  }

  newWindowChrome.forget(aResult);
The mWindowCreator->CreateChromeWindow() call there is important, because that's the line calling the method which we've hacked around with. I carefully arranged things so that the method would leave NS_SUCCEEDED(rv) as true, so it must be the last parameter which is returning null.

So finally we reach high enough up the stack that we're in the EmbedLite code, and the reason for the null return is immediately clear from looking at the WindowCreator::CreateChromeWindow() implementation. In this method it's the _retval variable that's of interest and the code I added causes the method to return before it gets set.
  if (isForPrinting) {
    return NS_OK;
  }

  mChild->CreateWindow(parentID, reinterpret_cast<uintptr_t>
    (parentBrowsingContext.get()), aChromeFlags, &createdID, aCancel);
[...]
  // check to make sure that we made a new window
  if (_retval) {
      NS_ADDREF(*_retval = browser);
      return NS_OK;
  }
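The chain above can be boiled down to a small, self-contained model. This is only a sketch: ChromeModel, Status, CreateChromeWindowModel() and OpenForPrintModel() are made-up stand-ins rather than gecko or EmbedLite code. They show how an early return that reports success without writing to the out-parameter leaves the caller holding a null pointer, which the caller then turns into an error.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; none of this is gecko or EmbedLite API.
struct ChromeModel {
  int id;
};

enum class Status { Ok, NotAllowed };

// Models the hacked WindowCreator::CreateChromeWindow(): the early return
// reports success but never writes to the out-parameter.
Status CreateChromeWindowModel(bool isForPrinting, ChromeModel** retval) {
  assert(retval);  // Plays the role of NS_ENSURE_ARG_POINTER(_retval)
  if (isForPrinting) {
    return Status::Ok;
  }
  static ChromeModel chrome{1};
  *retval = &chrome;
  return Status::Ok;
}

// Models the callers further down the stack: a success code on its own isn't
// enough, because a null result is still turned into an error.
Status OpenForPrintModel(bool isForPrinting, ChromeModel** result) {
  ChromeModel* chrome = nullptr;  // Starts out null, like an nsCOMPtr
  Status status = CreateChromeWindowModel(isForPrinting, &chrome);
  if (status != Status::Ok) {
    return status;
  }
  if (!chrome) {
    return Status::NotAllowed;  // Stands in for "No browsing context"
  }
  *result = chrome;
  return Status::Ok;
}
```

Calling OpenForPrintModel(true, &result) in this model returns Status::NotAllowed and leaves result null, while passing false succeeds and fills it in.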
We're going to need a better way to solve this. Unfortunately that won't happen this evening as I have a very early start tomorrow, so I'm going to have to leave it there for today. Still, this will be a good place — with something tangible — to pick up from in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
14 Dec 2023 : Day 107 #
We're still splashing around in the docks looking at printing today, hoping to set sail. Yesterday it actually felt like we made pretty good progress getting the print promise to act as expected, so that the user interface works as it should. The remaining issue is that there's a blank window opening every time we print. The window closes once the print is complete, but it's a messy experience for the end user.

Yesterday it was possible to narrow down the code that's triggering the window to open. There's a call to nsGlobalWindowOuter::OpenInternal() in nsGlobalWindowOuter::Print() that eventually leads to a call to nsWindowWatcher::OpenWindowInternal() which seems to be doing most of the work.

What I'm interested in today is the link between this and the Qt code in sailfish-browser that handles the windows (or "tabs" in sailfish-browser parlance) and actually creates the window on screen.

There's a parameter passed in to nsWindowWatcher::OpenWindowInternal() which indicates that the window is being created for the purposes of printing: the PrintKind aPrintKind parameter. If this parameter can be accessed from the Sailfish-specific part of the code then it may just be possible to persuade sailfish-browser to open the window "in the background" so that the user doesn't know it's there.

While I'm looking into this I'll also be trying to figure out whether we can avoid the call to open the window altogether. Everything up to nsWindowWatcher::OpenWindowInternal() in the call stack is essential, because it's there that the browsing context is created and that's a bit we definitely need. We need a browsing context to clone the document into. But the actual window chrome being displayed on screen? Hopefully that part can be skipped.
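One way to picture the split being hoped for here is the sketch below. It's purely illustrative: WindowModel, BrowsingContextModel and CreateWindowModel() are invented names, not gecko or sailfish-browser code, and whether the real window creation can be divided up like this is exactly what still needs investigating.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins, not gecko or sailfish-browser types.
struct BrowsingContextModel {};

struct WindowModel {
  // The part that's needed to clone the document into
  std::unique_ptr<BrowsingContextModel> context;
  // Whether the window chrome is shown on screen
  bool visible = false;
};

// Always builds the context; only the on-screen part depends on the flag.
WindowModel CreateWindowModel(bool isForPrinting) {
  WindowModel window;
  window.context = std::make_unique<BrowsingContextModel>();
  window.visible = !isForPrinting;
  assert(window.context);  // Needed whether or not the window is displayed
  return window;
}
```

In this model a print window still ends up with a usable context, it just never becomes visible.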

I've placed a breakpoint on nsWindowWatcher::OpenWindowInternal() and plan to see where that takes us.

Once the breakpoint hits and after stepping through most of the method I eventually get to this:
939             parentTopInnerWindow->Suspend();
(gdb) 
[LWP 30328 exited]
948           rv = CreateChromeWindow(parentChrome, chromeFlags, openWindowInfo,
(gdb) n
359     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:
        No such file or directory.
(gdb) 
[W] unknown:0 - bool DBWorker::execute(QSqlQuery&) failed execute query
[W] unknown:0 - "INSERT INTO tab (tab_id, tab_history_id) VALUES (?,?);"
[W] unknown:0 - QSqlError("19", "Unable to fetch row",
                          "UNIQUE constraint failed: tab.tab_id")
[LWP 30138 exited]
[LWP 30313 exited]
[LWP 30314 exited]
[LWP 30123 exited]
950           if (parentTopInnerWindow) {
(gdb) 
That group of LWP messages ("lightweight processes", or threads as they're otherwise known) appears as a result of the window opening. So it's clearly the CreateChromeWindow() call that's triggering the window to open. The errors that follow it could well be coming from sailfish-browser rather than the gecko library, but I'm not sure whether they're errors to worry about, or just artefacts of everything being slowed down due to debugging. I don't recall having seen them when running the code normally.

Let's follow this code a bit more. The nsWindowWatcher::CreateChromeWindow() method is mercifully short. The active ingredient of the method is this bit here:
  nsCOMPtr<nsIWebBrowserChrome> newWindowChrome;
  nsresult rv = mWindowCreator->CreateChromeWindow(
      aParentChrome, aChromeFlags, aOpenWindowInfo, &cancel,
      getter_AddRefs(newWindowChrome));
The mWindowCreator variable is an instance of nsIWindowCreator so the next step is to find out what that is. Stepping through gives us a clue.
419       nsCOMPtr<nsIWebBrowserChrome> newWindowChrome;
(gdb) 
1363    ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:
        No such file or directory.
(gdb) s
WindowCreator::CreateChromeWindow (this=0x7f889ce190, aParent=0x7f88ba5450,
    aChromeFlags=4094, aOpenWindowInfo=0x7f8854a7b0, aCancel=0x7f9f3d019f, 
    _retval=0x7f9f3d01a0) at mobile/sailfishos/utils/WindowCreator.cpp:44
44        NS_ENSURE_ARG_POINTER(_retval);
(gdb) 
So we've finally reached some Sailfish-specific code. If there's some way to check whether this is a print window or not, it may be possible to stop the window being shown at this point. There is this aOpenWindowInfo object being passed in which is of type nsIOpenWindowInfo. Checking the nsIOpenWindowInfo.idl file we can see that there is a relevant attribute that's part of the object:
  /** Whether this is a window opened for printing */
  [infallible]
  readonly attribute boolean isForPrinting;
Disappointingly the object refuses to yield its contents using the debugger.
(gdb) p aOpenWindowInfo
$3 = (nsIOpenWindowInfo *) 0x7f8854a7b0
(gdb) p *aOpenWindowInfo
$4 = {<nsISupports> = {_vptr.nsISupports = 0x7fbf7dfc00
      <vtable for nsOpenWindowInfo+16>}, <No data fields>}
(gdb) p aOpenWindowInfo->GetIsForPrinting()
Cannot evaluate function -- may be inlined
(gdb) 
Never mind, let's continue digging down into the code. So from here the method calls mChild->CreateWindow() which sends the stack down a rabbit hole of different calls which I've not yet followed to the end. However I do notice that the aOpenWindowInfo object doesn't go any further. So if the info about this being a print window needs extracting, it has to be done here.

I'm going to put some debug prints in here, but I'll also amend the code to cancel the window opening at this point on condition that the window is a print window. Then I'll have to build the library to see how that's worked out.

Here's the small piece of code I've added, just before the window is created (which you can see on the last line):
  bool isForPrinting = aOpenWindowInfo->GetIsForPrinting();
  LOGE("PRINT: isForPrinting: %d", isForPrinting);

  if (isForPrinting) {
    return NS_OK;
  }

  mChild->CreateWindow(parentID, reinterpret_cast<uintptr_t>
    (parentBrowsingContext.get()), aChromeFlags, &createdID, aCancel);
I've set it building, which may take a little time, so I'm going to take a break from this while it does. I'll return with some results, hopefully, tomorrow.

13 Dec 2023 : Day 106 #
We've not quite left the printing harbour, but we're getting closer to sailing. Yesterday we got to the point where it was possible to print a copy of the page, but there were some glitches: a new blank tab is opened when the print starts, and the user interface doesn't recognise when the print finishes.

I'm hoping these might be easy to fix, but at this point, without further investigation, that's impossible to tell.

So, let's get stuck in to getting things seaworthy.

First up, I've amended the print() method to assume it returns a promise rather than the progress and status change messages that the previous code assumed. I've also added some debug prints so that it now looks like this:
    this._browsingContext = BrowsingContext.getFromWindow(win)

    try {
      dump("PRINT: printing\n");
      this._browsingContext.print(printSettings)
        .then(() => dump("PRINT: Printing finished\n"))
        .catch(exception => dump("PRINT: Printing exception: " + exception + "\n"));
    } finally {
      // Remove the print object to avoid leaks
      this._browsingContext = null;
      dump("PRINT: finally\n");
    }

    dump("PRINT: returning\n");
    let fileInfo = await OS.File.stat(targetPath);
    aSetProgressBytesFn(fileInfo.size, fileInfo.size, false);
When I execute this the following output is generated:
PRINT: printing
PRINT: finally
PRINT: returning
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
PRINT: Printing finished
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
onWindowClosed@file:///usr/lib64/mozembedlite/components/EmbedLiteChromeManager.js:170:7
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteChromeManager.js:201:12
That all looks pretty healthy. The promise is clearly being returned and eventually resolved. There are also positive results in the user interface: now that this is working correctly, the window that opens at the start of the printing process also closes at the end of it. The menu item is no longer greyed out.

So probably the code that was there before was failing because of the incorrect semantics, which also caused the print process not to complete cleanly. One small issue is that the menu item doesn't get greyed out at all now. It should be greyed out while the printing is taking place.

My suspicion is that this is because the function is returning immediately, leaving the promise to run asynchronously, rather than waiting for the promise to resolve before returning. To test the theory out I've updated the code inside the try clause to look like this:
      dump("PRINT: printing\n");
      await new Promise((resolve, reject) => {
        this._browsingContext.print(printSettings)
        .then(() => {
          dump("PRINT: Printing finished\n")
          resolve();
        })
        .catch(exception => {
          dump("PRINT: Printing exception: " + exception + "\n");
          reject(new DownloadError({ result: exception, inferCause: true }));
        });
      });
Now when it's run we see the following output:
PRINT: printing
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
PRINT: Printing finished
PRINT: finally
PRINT: returning
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
onWindowClosed@file:///usr/lib64/mozembedlite/components/EmbedLiteChromeManager.js:170:7
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteChromeManager.js:201:12
That's pretty similar to the output from last time, but notice how the "finally" and "returning" output now waits for the printing to finish before appearing. That's because of the await we've added to the promise. But the good news is that this also produces better results with the user interface too: the menu item is greyed out until the printing finishes, at which point it's restored. The remorse timer and notifications work better too.

So it now appears to be completing cleanly with good results. That means the only thing now to try to fix is the new window that's opening.

One of the nice things about the fact the printing is working is that I can now debug it properly and exclusively on the ESR 91 side to see what's happening (in the C++ portions at least).

My first breakpoints go on the various ShowPrintDialog() methods. This could potentially be the source of the extra window. But when I run the executable and trigger a PDF save the breakpoint isn't hit. So I guess it's not. Instead the SetupSilentPrinting() method is being called, following the explanation in nsIPrintSettings.idl:
  /**
   * We call this function so that anything that requires a run of the event loop
   * can do so safely. The print dialog runs the event loop but in silent printing
   * that doesn't happen.
   *
   * Either this or ShowPrintDialog (but not both) MUST be called by the print
   * engine before printing, otherwise printing can fail on some platforms.
   */
  [noscript] void SetupSilentPrinting();
The backtrace when this hits looks like this:
Thread 8 "GeckoWorkerThre" hit Breakpoint 2,
    nsPrintSettingsQt::SetupSilentPrinting (this=0x7e2c255760)
    at widget/qt/nsPrintSettingsQt.cpp:383
383         return NS_OK;
(gdb) bt
#0  nsPrintSettingsQt::SetupSilentPrinting (this=0x7e2c255760)
    at widget/qt/nsPrintSettingsQt.cpp:383
#1  0x0000007fbc3bb4e4 in nsPrintJob::DoCommonPrint (this=this@entry=0x7e2c14ba30, aIsPrintPreview=aIsPrintPreview@entry=false, 
    aPrintSettings=aPrintSettings@entry=0x7e2c255760,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a0902b0,
    aDoc=aDoc@entry=0x7f8abf4040)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869
#2  0x0000007fbc3bb718 in nsPrintJob::CommonPrint (this=this@entry=0x7e2c14ba30,
    aIsPrintPreview=aIsPrintPreview@entry=false, 
    aPrintSettings=aPrintSettings@entry=0x7e2c255760,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a0902b0, 
    aSourceDoc=aSourceDoc@entry=0x7f8abf4040)
    at layout/printing/nsPrintJob.cpp:488
#3  0x0000007fbc3bb840 in nsPrintJob::Print (this=this@entry=0x7e2c14ba30,
    aSourceDoc=<optimized out>, aPrintSettings=aPrintSettings@entry=0x7e2c255760, 
    aWebProgressListener=aWebProgressListener@entry=0x7f8a0902b0)
    at layout/printing/nsPrintJob.cpp:824
#4  0x0000007fbc108fe4 in nsDocumentViewer::Print (this=0x7e2ffabd80,
    aPrintSettings=0x7e2c255760, aWebProgressListener=0x7f8a0902b0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#5  0x0000007fba96f49c in nsGlobalWindowOuter::Print(nsIPrintSettings*,
    nsIWebProgressListener*, nsIDocShell*, nsGlobalWindowOuter::IsPreview,
    nsGlobalWindowOuter::IsForWindowDotPrint, std::function<void
    (mozilla::dom::PrintPreviewResultInfo const&)>&&, mozilla::ErrorResult&)
    (this=this@entry=0x7f88b85de0,
    aPrintSettings=aPrintSettings@entry=0x7e2c255760,
    aListener=aListener@entry=0x7f8a0902b0,
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0, 
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No, 
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#6  0x0000007fbc7eb714 in mozilla::dom::CanonicalBrowsingContext::Print
    (this=this@entry=0x7f88b85870, aPrintSettings=0x7e2c255760, aRv=...)
    at cross/aarch64-meego-linux-gnu/include/c++/8.3.0/bits/std_function.h:402
#7  0x0000007fbab82f08 in mozilla::dom::CanonicalBrowsingContext_Binding::print
    (args=..., void_self=0x7f88b85870, obj=..., cx_=0x7f881df460)
    at BrowsingContextBinding.cpp:4674
#8  mozilla::dom::CanonicalBrowsingContext_Binding::print_promiseWrapper
    (cx=0x7f881df460, obj=..., void_self=0x7f88b85870, args=...)
    at BrowsingContextBinding.cpp:4688
[...]
#34 0x0000007fbd16635c in js::jit::MaybeEnterJit (cx=0x7f881df460, state=...)
    at js/src/jit/Jit.cpp:207
#35 0x0000007f882b8211 in ?? ()
Backtrace stopped: Cannot access memory at address 0x228ea24c1e0c
(gdb) 
The thing I find interesting about this is that it's entering the nsPrintSettingsQt version of the SetupSilentPrinting() method. That's Sailfish-specific code coming from patch 002 "Bring back Qt Layer" and so worth taking a look at.

Unfortunately after looking through it carefully the nsPrintSettingsQt implementation doesn't yield any secrets. The SetupSilentPrinting() method is essentially empty, which isn't out of line with the GTK or default implementations. I don't see anything else in there that shouts "open new window"; it just looks like a largely passive class for capturing settings.

Nevertheless this callstack can still be useful for us. I notice that, even though the app is still stuck on this SetupSilentPrinting() breakpoint, the new blank window has already opened — hanging in a state of suspended animation — on my phone. We can also see the call to CanonicalBrowsingContext::Print() is frame #6 in the stack.

That means that the trigger for opening the window must be somewhere between these two points in the callstack. My next task will be to work my way through all of them to see if one of them could be the culprit. Six methods to work through isn't too many. Here's a slightly cleaner version of the stack to work with:
  1. nsPrintSettingsQt::SetupSilentPrinting() file nsPrintSettingsQt.cpp line 383
  2. nsPrintJob::DoCommonPrint() file nsPrintJob.cpp line 768
  3. nsPrintJob::CommonPrint() file nsPrintJob.cpp line 488
  4. nsPrintJob::Print() file nsPrintJob.cpp line 824
  5. nsDocumentViewer::Print() file nsDocumentViewer.cpp line 2930
  6. nsGlobalWindowOuter::Print() file nsGlobalWindowOuter.cpp line 5412
  7. CanonicalBrowsingContext::Print() file CanonicalBrowsingContext.cpp line 682
Inside the nsGlobalWindowOuter::Print() method, between the start of the method and the call to nsDocumentViewer::Print() on line 5412, I see the following bit of code:
      aError = OpenInternal(u""_ns, u""_ns, u""_ns,
                            false,             // aDialog
                            false,             // aContentModal
                            true,              // aCalledNoScript
                            false,             // aDoJSFixups
                            true,              // aNavigate
                            nullptr, nullptr,  // No args
                            nullptr,           // aLoadState
                            false,             // aForceNoOpener
                            printKind, getter_AddRefs(bc));
I'm wondering whether that might be the opening of a window; all the signs are that it is. I've placed a breakpoint on nsGlobalWindowOuter::Print() and plan to step through the method to this point to try to find out.

As I step through, the moment I step over this OpenInternal() call, the window opens in the browser. The printKind parameter is set to PrintKind::InternalPrint which makes me think that the window should be hidden, or something to that effect. Here's the debugging step-through for anyone interested:
Thread 8 "GeckoWorkerThre" hit Breakpoint 2,
    nsGlobalWindowOuter::Print(nsIPrintSettings*, nsIWebProgressListener*,
    nsIDocShell*, nsGlobalWindowOuter::IsPreview,
    nsGlobalWindowOuter::IsForWindowDotPrint, std::function<void
    (mozilla::dom::PrintPreviewResultInfo const&)>&&, mozilla::ErrorResult&) (
    this=this@entry=0x7f888d23e0,
    aPrintSettings=aPrintSettings@entry=0x7e2f2f0ce0,
    aListener=aListener@entry=0x7ea013c4a0, 
    aDocShellToCloneInto=aDocShellToCloneInto@entry=0x0,
    aIsPreview=aIsPreview@entry=nsGlobalWindowOuter::IsPreview::No, 
    aForWindowDotPrint=aForWindowDotPrint@entry=nsGlobalWindowOuter::
    IsForWindowDotPrint::No, aPrintPreviewCallback=..., aError=...)
    at dom/base/nsGlobalWindowOuter.cpp:5258
5258        PrintPreviewResolver&& aPrintPreviewCallback, ErrorResult& aError) {
(gdb) n
[LWP 8816 exited]
5261          do_GetService("@mozilla.org/gfx/printsettings-service;1");
(gdb) n
867     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h: No such file or directory.
(gdb) n
5268      nsCOMPtr<nsIPrintSettings> ps = aPrintSettings;
(gdb) n
867     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h: No such file or directory.
(gdb) n
5274      RefPtr<Document> docToPrint = mDoc;
(gdb) n
5280      RefPtr<BrowsingContext> sourceBC = docToPrint->GetBrowsingContext();
(gdb) n
5287      nsAutoSyncOperation sync(docToPrint, SyncOperationBehavior::eAllowInput);
(gdb) n
5288      AutoModalState modalState(*this);
(gdb) n
5290      nsCOMPtr<nsIContentViewer> cv;
(gdb) p modalState
$1 = {mModalStateWin = {mRawPtr = 0x7f888d23e0}}
(gdb) n
5291      RefPtr<BrowsingContext> bc;
(gdb) n
5292      bool hasPrintCallbacks = false;
(gdb) n
5293      if (docToPrint->IsStaticDocument() &&
(gdb) n
5320        if (aDocShellToCloneInto) {
(gdb) p aDocShellToCloneInto
$2 = (nsIDocShell *) 0x0
(gdb) n
5325          AutoNoJSAPI nojsapi;
(gdb) n
5326          auto printKind = aForWindowDotPrint == IsForWindowDotPrint::Yes
(gdb) n
5329          aError = OpenInternal(u""_ns, u""_ns, u""_ns,
(gdb) p printKind
$3 = nsGlobalWindowOuter::PrintKind::InternalPrint
(gdb) n
[LWP 8736 exited]
[LWP 8737 exited]
[LWP 8711 exited]
[New LWP 9166]
[New LWP 9167]
5329          aError = OpenInternal(u""_ns, u""_ns, u""_ns,
(gdb) p aError
$4 = (mozilla::ErrorResult &) @0x7f9f3d8dc0: {<mozilla::binding_danger::
    TErrorResult<mozilla::binding_danger::AssertAndSuppressCleanupPolicy>> =
    {mResult = nsresult::NS_OK, mExtra = {mMessage = 0x41e,
    mJSException = {asBits_ = 1054}, mDOMExceptionInfo = 0x41e}},
    <No data fields>}
(gdb) 
Looking through the nsGlobalWindowOuter code eventually leads me to the nsWindowWatcher::OpenWindowInternal() method. There's a lot happening in this method. About halfway through there are some comments about visibility, which pique my interest because I'm wondering whether the window that's opening has a visibility flag which is either not being set properly, or being ignored.

But I've reached the end of my mental capacity today, it's time for bed. So I'll have to come back to this tomorrow.

12 Dec 2023 : Day 105 #
We're back looking at printing today after about a week of work now. Yesterday we looked at the parent-child structure of the PBrowser interface and came to the conclusion that we should be calling CanonicalBrowsingContext::Print() in the DownloadPDFSaver code rather than the nsIWebBrowserPrint::Print() call that's there now. It'd be a good thing to try at least. Our route in to the call is through windowRef which is a Ci.nsIDOMWindow. So the question I want to answer today is: "given a Ci.nsIDOMWindow, how do I find my way to calling something in a CanonicalBrowsingContext object?"

It's a pretty simple question, but as is often the case with object-oriented code, finding a route from one to the other is not always obvious. It's obfuscated by the class hierarchy and child-parent message-passing, and made even more complex by gecko's approach to reflection using nsISupports runtime type discovery.

I'll need to look through this code carefully again.

[...]

I've been poring over the code for some time now and reading around the BrowsingContext documentation, but I've still not made any real headway. The one thing I did find was that it's possible to collect the browsing context from the DOM window:
      browsingContext = BrowsingContext.getFromWindow(domWin);
This was taken from Prompt.jsm which executes some code similar to the above. That's making use of the following static call in the BrowsingContext.h header:
  static already_AddRefed<BrowsingContext> GetFromWindow(
      WindowProxyHolder& aProxy);
As far as I can tell BrowsingContext isn't pulled into the JavaScript as any sort of prototype or object. It's just there already.

There's also this potentially useful static function for getting the CanonicalBrowsingContext from an ID:
  static already_AddRefed<CanonicalBrowsingContext> Get(uint64_t aId);
Unfortunately I'm not really sure where I'm supposed to get the ID from. It's added into a static hash table and if I had a BrowsingContext already I could get the ID, but without that first, it's not clear where I might extract it from.

So I'm going to try going down the BrowsingContext.getFromWindow() route. If I'm going to use this I've already got a good idea about where it should go, which is in the DownloadPDFSaver code.
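The lookup can be sketched as a small helper. This is only a minimal sketch: the helper name browsingContextFromWindowRef and the stub objects are hypothetical, and the BrowsingContext object is passed in as a parameter so that the code can be run outside Gecko. The only call assumed to exist in the browser is BrowsingContext.getFromWindow(), as seen in Prompt.jsm.

```javascript
// Hypothetical helper: turn a weak window reference into a browsing context.
// `browsingContextApi` stands in for Gecko's global BrowsingContext object;
// it is passed in so the sketch can be exercised outside the browser.
function browsingContextFromWindowRef(windowRef, browsingContextApi) {
  // A weak reference may already have been cleared, so get() can return null.
  const win = windowRef ? windowRef.get() : null;
  if (!win) {
    return null;
  }
  return browsingContextApi.getFromWindow(win);
}

// Stand-ins for the real objects, for illustration only.
const fakeWindow = { name: "content window" };
const fakeApi = {
  getFromWindow(win) {
    return { id: 7, window: win };
  },
};
const liveRef = { get: () => fakeWindow };
const deadRef = { get: () => null };

console.log(browsingContextFromWindowRef(liveRef, fakeApi).id); // prints 7
console.log(browsingContextFromWindowRef(deadRef, fakeApi)); // prints null
```

Checking for a cleared reference first matters because windowRef is a weak reference, so the window may have gone away by the time the download code runs.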

So I've added some debug prints to the DownloadPDFSaver in DownloadCore.jsm to try to figure out if we can extract the BrowsingContext from the windowRef using this method. Here's what I added:
    dump("PRINT: win: " + win + "\n");
    this._webBrowserPrint = win.getInterface(Ci.nsIWebBrowserPrint);
    dump("PRINT: webBrowserPrint: " + this._webBrowserPrint + "\n");
    this._browsingContext = BrowsingContext.getFromWindow(win);
    dump("PRINT: BrowsingContext: " + this._browsingContext + "\n");
I've not changed any of the functional code though, so I'm not expecting this to fix the segfault; this is just to extract some hints. Here's what it outputs to the console when I try running this and selecting the "Save web page as PDF" option:
PRINT: win: [object Window]
PRINT: webBrowserPrint: [xpconnect wrapped nsIWebBrowserPrint]
PRINT: BrowsingContext: [object CanonicalBrowsingContext]
Segmentation fault (core dumped)
This is... well it's pretty exciting for me if I'm honest. That last print output suggests that it's successfully extracted some kind of CanonicalBrowsingContext object, which is exactly what we're after. So the next step is to call the print() method on it to see what happens.

Having added that code and selected the option to save as PDF, there's now some rather strange and dubious-looking output sent to the console. It's the same output that we see when the browser starts:
PRINT: win: [object Window]
PRINT: webBrowserPrint: [xpconnect wrapped nsIWebBrowserPrint]
PRINT: BrowsingContext: [object CanonicalBrowsingContext]
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
[...]
On the other hand, looking into the downloads folder, there's a new PDF output that looks encouragingly non-empty. I wonder what it will contain?
$ cd Downloads/
$ ls -l
total 4624
-rw------- 1 defaultuser defaultuser       0 Dec  7 21:38 'Jolla(10).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  8 23:36 'Jolla(11).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  8 23:47 'Jolla(12).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec 10 21:50 'Jolla(13).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec 10 21:53 'Jolla(14).pdf'
-rw-rw-r-- 1 defaultuser defaultuser 4673253 Dec 10 21:57 'Jolla(15).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  5 22:17 'Jolla(2).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  5 22:17 'Jolla(3).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  6 08:23 'Jolla(4).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  6 08:27 'Jolla(5).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  6 22:32 'Jolla(6).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  6 23:05 'Jolla(7).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  7 19:23 'Jolla(8).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  7 21:24 'Jolla(9).pdf'
-rw------- 1 defaultuser defaultuser       0 Dec  5 22:16  Jolla.pdf
That 4673253 byte file that's been output must have something interesting inside it, surely?
 
Four screenshots showing the printing process: first the Jolla webpage in the browser; second a blank window during printing; third the output PDF with similar graphics on; fourth the browser menu with saving to PDF disabled

Well as you can see the actual PDF print out is a bit rubbish, but I don't think that's anything I've done: that's just the inherent difficulty of providing decent PDF printouts of dynamic webpages. This is actually exactly the PDF output we need.

I admit I'm pretty happy about this. All that reading of documentation and scattered code seems to have paid off. There is a slight problem though, in that the process seems to open a new window when the printing takes place. That's not ideal and will have to be fixed.

Also, having printed out a page, the "Save web page as PDF" option is now completely greyed out in the user interface. It's not possible to print another page. That feels more like the consequence of a promise not resolving, or some completion message not being received, rather than anything more intrinsic though.
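If the cause really is a promise that never settles, the effect can be reproduced in isolation. The sketch below is not how the browser's user interface is actually wired up: the ui.printEnabled flag, the function name and the timeout are all hypothetical. It just shows how a flag that's only restored on completion stays stuck if the promise never resolves, and how a guard could restore it regardless.

```javascript
// Hypothetical wrapper: disable a UI flag while printing and restore it
// whether the print promise resolves, rejects, or never settles at all.
function printWithCompletionGuard(printFn, ui, timeoutMs) {
  ui.printEnabled = false;
  let timer;
  // This promise rejects if printing takes longer than timeoutMs.
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("print timed out")), timeoutMs);
  });
  // Whichever promise settles first decides the outcome.
  return Promise.race([printFn(), timeout]).finally(() => {
    clearTimeout(timer);
    ui.printEnabled = true;
  });
}
```

Passing in a printFn that returns a promise which never settles leads to a rejection after timeoutMs, with the flag restored. Without the race against the timeout the finally handler would never run and the option would remain greyed out.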

I've done some brief testing of other functionality in the browser. Nothing else seems to be broken and the browser didn't crash. So that's also rather encouraging.

I'm going to call it a night: finish on a high. There's still plenty of work to be done with the PDF printing: prevent the extra window from opening; ensure the option to print is restored once the printing is complete. But those will have to be for tomorrow. The fact saving to PDF is working is a win already.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
11 Dec 2023 : Day 104 #
We're back on printing again today. Yesterday we tracked down the nsGlobalWindowOuter::Print() method which appears to be responsible for cloning the document ready for printing. That, in turn, appears to be called by BrowserChild::RecvPrint(). And this method deserves some explanation.

We've discussed the gecko IPC mechanism in previous posts, in fact way back on Day 7. If you read those then... well, first, kudos for keeping up! But second you'll recall there are ipdl files which define an interface and that the build process generates parent and child interfaces from these that allow message passing from one to the other.

Whenever we see a SendName() or RecvName() method like this RecvPrint() method we have here, it's a good sign that this is related to this message passing. The Send method is called on one thread and the message, which is a bit like a remote method call, triggers the Recv method to be called on a (potentially different) thread. That's my rather basic understanding of it, at any rate.

The message, along with the sending and receiving mechanisms, are generated by the build process in the form of a file that has a P at the start of the name. I'm not exactly sure what the P stands for. The sender tends to be called the parent actor, and the receiver the child actor. That's why we're seeing RecvPrint() in the BrowserChild class.
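As a rough illustration of the Send and Recv pairing, here's a toy model. It's deliberately simplified and isn't the generated gecko code: the Channel, ToyParent and ToyChild classes are all made up, and a plain queue stands in for the real cross-thread transport.

```javascript
// Toy model of an IPDL-style protocol: a Send call on one side queues a
// message, and the matching Recv method runs later on the other side.
class Channel {
  constructor() {
    this.queue = [];
    this.receiver = null;
  }
  send(name, payload) {
    this.queue.push({ name, payload });
  }
  // Deliver queued messages by calling Recv<Name>() on the receiver.
  flush() {
    while (this.queue.length > 0) {
      const { name, payload } = this.queue.shift();
      this.receiver["Recv" + name](payload);
    }
  }
}

class ToyParent {
  constructor(channel) {
    this.channel = channel;
  }
  SendPrint(printData) {
    this.channel.send("Print", printData);
    return true;
  }
}

class ToyChild {
  constructor() {
    this.printed = [];
  }
  RecvPrint(printData) {
    this.printed.push(printData);
  }
}

const channel = new Channel();
const child = new ToyChild();
channel.receiver = child;
const parent = new ToyParent(channel);

parent.SendPrint({ title: "Jolla" });
console.log(child.printed.length); // prints 0: nothing delivered yet
channel.flush();
console.log(child.printed.length); // prints 1: RecvPrint() has now run
```

The point to notice is that SendPrint() returns before RecvPrint() has run, which is what makes it feel like a remote method call rather than a direct one.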

If we look at the class definition for BrowserChild in the header file it looks like this:
/**
 * BrowserChild implements the child actor part of the PBrowser protocol. See
 * PBrowser for more information.
 */
class BrowserChild final : public nsMessageManagerScriptExecutor,
                           public ipc::MessageManagerCallback,
                           public PBrowserChild,
                           public nsIWebBrowserChrome,
                           public nsIEmbeddingSiteWindow,
                           public nsIWebBrowserChromeFocus,
                           public nsIInterfaceRequestor,
                           public nsIWindowProvider,
                           public nsSupportsWeakReference,
                           public nsIBrowserChild,
                           public nsIObserver,
                           public nsIWebProgressListener2,
                           public TabContext,
                           public nsITooltipListener,
                           public mozilla::ipc::IShmemAllocator {
[...]
Notice how it inherits from the PBrowserChild class. That's the child actor class interface that's autogenerated from the PBrowser.ipdl file. If we look in the PBrowser.ipdl file we can see the definition of the Print() method we're interested in:
    /**
     * Tell the child to print the current page with the given settings.
     *
     * @param aBrowsingContext the browsing context to print.
     * @param aPrintData the serialized settings to print with
     */
    async Print(MaybeDiscardedBrowsingContext aBC, PrintData aPrintData);
That's not quite the end of it though, because — to round things off — there's also a PBrowserParent class that's been generated from the IPDL file as well. Like the child class it has both a header and a source file. In the source file we can find the definition for the SendPrint() method like this:
auto PBrowserParent::SendPrint(
        const MaybeDiscardedBrowsingContext& aBC,
        const PrintData& aPrintData) -> bool
{
All of this is inherited by the BrowserParent class in the BrowserParent.h file like this:
/**
 * BrowserParent implements the parent actor part of the PBrowser protocol. See
 * PBrowser for more information.
 */
class BrowserParent final : public PBrowserParent,
                            public nsIDOMEventListener,
                            public nsIAuthPromptProvider,
                            public nsSupportsWeakReference,
                            public TabContext,
                            public LiveResizeListener {
[...]
This class doesn't override the SendPrint() method but it does inherit it. So there's quite a structure and class hierarchy that's built up from these IPDL files.
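The relationship between the generated base class and the hand-written class can be mimicked in a few lines. This is a toy: generateParentBase() is a made-up stand-in for the build-time code generator and none of these names exist in gecko. It only shows a subclass picking up a generated SendPrint() without redefining it.

```javascript
// Toy stand-in for the code generator: build a parent base class with one
// Send<Message>() method per message named in the protocol definition.
function generateParentBase(messages) {
  class GeneratedParent {
    constructor() {
      this.sent = [];
    }
  }
  for (const message of messages) {
    GeneratedParent.prototype["Send" + message] = function (...args) {
      this.sent.push({ message, args });
      return true;
    };
  }
  return GeneratedParent;
}

const PToyBrowserParent = generateParentBase(["Print"]);

// The hand-written class adds behaviour but does not redefine SendPrint().
class ToyBrowserParent extends PToyBrowserParent {
  describe() {
    return "hand-written parent actor";
  }
}

const actor = new ToyBrowserParent();
actor.SendPrint("browsing context", "print data");

const overrides = Object.prototype.hasOwnProperty.call(
  ToyBrowserParent.prototype,
  "SendPrint"
);
console.log(overrides); // prints false: the method is inherited
console.log(actor.sent.length); // prints 1
```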

The key takeaway for what we're trying to achieve is that if we want to trigger the nsGlobalWindowOuter::Print() method, we're going to need to call the BrowserParent::SendPrint() method from somewhere. Checking through the code it's clear that nothing is inheriting from BrowserParent but there are plenty of places which give access to the BrowserParent interface.

For example the BrowserBridgeParent class has this method:
  BrowserParent* GetBrowserParent() { return mBrowserParent; }
There are quite a few other similar methods scattered around the place and I honestly don't know which I'm supposed to end up using.
$ grep -rIn "BrowserParent\* Get" * --include="*.h"
layout/base/PresShell.h:181:
    static dom::BrowserParent* GetCapturingRemoteTarget() {
docshell/base/CanonicalBrowsingContext.h:222:
    BrowserParent* GetBrowserParent() const;
dom/base/nsFrameLoader.h:338:
    BrowserParent* GetBrowserParent() const;
dom/base/PointerLockManager.h:38:
    static dom::BrowserParent* GetLockedRemoteTarget();
dom/base/nsContentUtils.h:501:
    static mozilla::dom::BrowserParent* GetCommonBrowserParentAncestor(
dom/ipc/BrowserHost.h:54:
    BrowserParent* GetActor() { return mRoot; }
dom/ipc/WindowGlobalParent.h:105:
    BrowserParent* GetBrowserParent();
dom/ipc/BrowserParent.h:114:
    static BrowserParent* GetFocused();
dom/ipc/BrowserParent.h:116:
    static BrowserParent* GetLastMouseRemoteTarget();
dom/ipc/BrowserParent.h:118:
    static BrowserParent* GetFrom(nsFrameLoader* aFrameLoader);
dom/ipc/BrowserParent.h:120:
    static BrowserParent* GetFrom(PBrowserParent* aBrowserParent);
dom/ipc/BrowserParent.h:122:
    static BrowserParent* GetFrom(nsIContent* aContent);
dom/ipc/BrowserParent.h:124:
    static BrowserParent* GetBrowserParentFromLayersId(
dom/ipc/BrowserBridgeParent.h:43:
    BrowserParent* GetBrowserParent() { return mBrowserParent; }
dom/events/TextComposition.h:82:
    BrowserParent* GetBrowserParent() const { return mBrowserParent; }
dom/events/IMEStateManager.h:55:
    static BrowserParent* GetActiveBrowserParent() {
dom/events/PointerEventHandler.h:95:
    static dom::BrowserParent* GetPointerCapturingRemoteTarget(
dom/events/EventStateManager.h:1050:
    dom::BrowserParent* GetCrossProcessTarget();
Hopefully this will all become clear in due course.

I'm also interested to discover that the CanonicalBrowsingContext class has a Print() method that calls SendPrint():
already_AddRefed<Promise> CanonicalBrowsingContext::Print(
    nsIPrintSettings* aPrintSettings, ErrorResult& aRv) {
  RefPtr<Promise> promise = Promise::Create(GetIncumbentGlobal(), aRv);
  if (NS_WARN_IF(aRv.Failed())) {
    return promise.forget();
  }
[...]

  auto* browserParent = GetBrowserParent();
  if (NS_WARN_IF(!browserParent)) {
    promise->MaybeReject(ErrorResult(NS_ERROR_FAILURE));
    return promise.forget();
  }

  RefPtr<embedding::PrintingParent> printingParent =
      browserParent->Manager()->GetPrintingParent();

  embedding::PrintData printData;
  nsresult rv = printingParent->SerializeAndEnsureRemotePrintJob(
      aPrintSettings, listener, nullptr, &printData);
  if (NS_WARN_IF(NS_FAILED(rv))) {
    promise->MaybeReject(ErrorResult(rv));
    return promise.forget();
  }

  if (NS_WARN_IF(!browserParent->SendPrint(this, printData))) {
    promise->MaybeReject(ErrorResult(NS_ERROR_FAILURE));
  }
  return promise.forget();
#endif
}
Maybe we're going to either have to call this or do something similar. Let's head back to the code we're already using to do the printing. Recall that this lives in DownloadCore.jsm in the form of our newly added DownloadPDFSaver class.

Now that I'm comparing the two there are some aspects that I think are worth taking note of. The nsIWebBrowserPrint::Print() method that we're currently calling in DownloadPDFSaver takes in an object that implements the nsIPrintSettings interface and returns a promise. From the code I listed above for CanonicalBrowsingContext::Print() you'll notice that this also takes in an object that implements the nsIPrintSettings interface and returns a promise.

So switching to call the latter may require only minimal changes. The question then is where to get the CanonicalBrowsingContext from. The method we're currently using is hanging off of a windowRef:
    let win = this.download.source.windowRef.get();
[...]
    this._webBrowserPrint = win.getInterface(Ci.nsIWebBrowserPrint);
Because it's JavaScript, there's absolutely no indication from this of what type win is of course. I'm going to need to know in order to make progress.

Digging back through the source in DownloadCore.jsm I can see that windowRef gets set to something that implements Ci.nsIDOMWindow:
DownloadSource.fromSerializable = function(aSerializable) {
[...]
  } else if (aSerializable instanceof Ci.nsIDOMWindow) {
    source.url = aSerializable.location.href;
    source.isPrivate = PrivateBrowsingUtils.isContentWindowPrivate(
      aSerializable
    );
    source.windowRef = Cu.getWeakReference(aSerializable);
[...]
I've spent a lot of time analysing the code today without actually making any changes at all to the code itself. I'm keen to actually try some things out in the JavaScript to see whether I can extract something useful from the windowRef that will allow us to call the SendPrint() method that we're so interested in. But that will have to wait for tomorrow now.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
10 Dec 2023 : Day 103 #
Yesterday and over the last few days we've been looking into printing. There were some errors that needed fixing in the JavaScript and now we're digging down into the C++ code. Something has changed underneath and the print code is expecting the page to have been "cloned" when it hasn't been. That's where we've got up to. But figuring out where the cloning is supposed to happen is proving to be difficult. It's been made harder by the fact that placing breakpoints on the ESR 78 print code causes the debugger to crash (apparently a bug in our version of gdb) and so we don't have anything to compare against.

So my plan is to read through the print code once again. The answer must be in there somewhere.

One thing I did manage to establish using the debugger is that the nsPrintObject::InitAsRootObject() method is being called in the ESR 91 code. That could turn out to be useful because although not in the ESR 91 version, in the ESR 78 version this is where the clone appears to take place:
nsresult nsPrintObject::InitAsRootObject(nsIDocShell* aDocShell, Document* aDoc,
                                         bool aForPrintPreview) {
[...]
  mDocument = aDoc->CreateStaticClone(mDocShell);
[...]
  return NS_OK;
}

If we look at the history of the file it may give some hints about where this clone was moved to. I need to do a git blame on a line that I can see has changed between the two.

It turns out that git blame isn't too helpful because the change appears to have mostly deleted lines rather than added or changed them. Unfortunately git blame simply doesn't work very well for deleted lines. I want to use what I think of as reverse git blame, which is using git log with the -S parameter. This searches the diff of every commit for a particular string, which will include deleted items.
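The idea behind the -S search can be modelled in a few lines. This is a toy, not how git is implemented: the history below is made up and each commit simply carries the file contents before and after. It relies on the documented behaviour that -S reports commits where the number of occurrences of the string changes, which is why a pure deletion still shows up.

```javascript
// Count non-overlapping occurrences of `needle` in `text`.
function countOccurrences(text, needle) {
  return text.split(needle).length - 1;
}

// Toy pickaxe: return the ids of commits where the number of occurrences of
// `needle` differs between the file contents before and after the commit.
function pickaxe(commits, needle) {
  return commits
    .filter(
      (c) =>
        countOccurrences(c.before, needle) !== countOccurrences(c.after, needle)
    )
    .map((c) => c.id);
}

// Made-up history, newest first, for illustration only.
const history = [
  { id: "c3", before: "init();\n", after: "init();\nreturn ok;\n" },
  { id: "c2", before: "doc = CreateStaticClone();\ninit();\n", after: "init();\n" },
  { id: "c1", before: "", after: "doc = CreateStaticClone();\ninit();\n" },
];

console.log(pickaxe(history, "CreateStaticClone")); // prints [ 'c2', 'c1' ]
```

Commit c2 only removes a line, which is exactly the kind of change that git blame can't point at but that this style of search still finds.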

Here's what comes up as the first hit:
$ git log -1 -S "CreateStaticClone" layout/printing/nsPrintObject.cpp
commit 044b3c4332134ac0c94d4916458f9930d5091c6a
Author: Emilio Cobos Álvarez <emilio@crisal.io>
Date:   Tue Aug 25 17:45:12 2020 +0000

    Bug 1636728 - Centralize printing entry points in nsGlobalWindowOuter, and
    move cloning out of nsPrintJob. r=jwatt,geckoview-reviewers,smaug,agi
    
    This centralizes our print and preview setup in nsGlobalWindowOuter so
    that we never re-clone a clone, and so that we reuse the window.open()
    codepath to create the browsing context to clone into.
    
    For window.print, for both old print dialog / silent printing and new
    print preview UI, we now create a hidden browser (as in with visibility:
    collapse, which takes no space but still gets a layout box).
    
     * In the modern UI case, this browser is swapped with the actual print
       preview clone, and the UI takes care of removing the browser.
    
     * In the print dialog / silent printing case, the printing code calls
       window.close() from nsDocumentViewer::OnDonePrinting().
    
     * We don't need to care about the old print preview UI for this case
       because it can't be open from window.print().
    
    We need to fall back to an actual window when there's no
    nsIBrowserDOMWindow around for WPT print tests and the like, which don't
    have one. That seems fine, we could special-case this code path more if
    needed but it doesn't seem worth it.
    
    Differential Revision: https://phabricator.services.mozilla.com/D87063
"Move cloning out of nsPrintJob". That sounds relevant. A lot of the changes are happening inside nsGlobalWindowOuter.cpp as the commit message suggests. And sure enough, right in there is a brand new nsGlobalWindowOuter::Print() method which now performs the cloning. It's a very long method which I'll have to read through in full, but here's the active ingredient in relation to cloning:
Nullable<WindowProxyHolder> nsGlobalWindowOuter::Print(
    nsIPrintSettings* aPrintSettings, nsIWebProgressListener* aListener,
    nsIDocShell* aDocShellToCloneInto, IsPreview aIsPreview,
    IsForWindowDotPrint aForWindowDotPrint,
    PrintPreviewResolver&& aPrintPreviewCallback, ErrorResult& aError) {
#ifdef NS_PRINTING
[...]
  if (docToPrint->IsStaticDocument() &&
      (aIsPreview == IsPreview::Yes ||
       StaticPrefs::print_tab_modal_enabled())) {
[...]
  } else {
[...]
    nsAutoScriptBlocker blockScripts;
    RefPtr<Document> clone = docToPrint->CreateStaticClone(
        cloneDocShell, cv, ps, &hasPrintCallbacks);
    if (!clone) {
      aError.ThrowNotSupportedError("Clone operation for printing failed");
      return nullptr;
    }
  }
[...]
  return WindowProxyHolder(std::move(bc));
#else
  return nullptr;
#endif  // NS_PRINTING
}
The next obvious thing we should check is whether this method is getting called at all. My guess is not, in which case it would be nice to know how we're supposed to be calling it, and maybe we can figure that out by comparing the diff in the above commit with a callstack we get from the debugger.
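The condition guarding the clone can be restated as a small predicate. This mirrors only the if statement visible in the excerpt; what the first branch does is elided above and isn't modelled here. The function and parameter names are hypothetical.

```javascript
// Returns true when the code path reaches the else branch, which is where
// CreateStaticClone() is called in the excerpt. The first branch is elided
// in the source, so nothing about its behaviour is assumed here.
function takesCloneBranch({ isStaticDocument, isPreview, tabModalEnabled }) {
  const takesFirstBranch = isStaticDocument && (isPreview || tabModalEnabled);
  return !takesFirstBranch;
}

// An ordinary page is not a static document, so it is always cloned.
console.log(
  takesCloneBranch({ isStaticDocument: false, isPreview: false, tabModalEnabled: false })
); // prints true

// A document that is already static, being previewed, skips the clone.
console.log(
  takesCloneBranch({ isStaticDocument: true, isPreview: true, tabModalEnabled: false })
); // prints false
```

This fits with the commit message's aim of never re-cloning a clone: a page that hasn't been cloned yet always ends up in the branch that creates one.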

First things first: does it get called?
$ EMBED_CONSOLE=1 gdb sailfish-browser
(gdb) b nsGlobalWindowOuter::Print
(gdb) info break
Num Type       Disp Enb Address            What
1   breakpoint keep y   0x0000007fba96f28c in nsGlobalWindowOuter::Print
                                           (nsIPrintSettings*,
                                           nsIWebProgressListener*, nsIDocShell*,
                                           nsGlobalWindowOuter::IsPreview,
                                           nsGlobalWindowOuter::IsForWindowDotPrint,
                                           std::function<void (mozilla::dom::
                                           PrintPreviewResultInfo const&)>&&,
                                           mozilla::ErrorResult&) 
                                           at dom/base/nsGlobalWindowOuter.cpp:5258
(gdb) r
[...]
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1018]
nsPrintJob::FindFocusedDocument (this=this@entry=0x7e31d9f2e0,
    aDoc=aDoc@entry=0x7f89079430)
    at layout/printing/nsPrintJob.cpp:2411
2411      nsPIDOMWindowOuter* window = aDoc->GetOriginalDocument()->GetWindow();
(gdb) 
So that's a segfault before a breakpoint: the code isn't being executed. The next step then is to try to find out where it gets executed in the changes of the upstream commit.

It looks like it might happen in the BrowserChild::RecvPrint() method:
mozilla::ipc::IPCResult BrowserChild::RecvPrint(const uint64_t& aOuterWindowID,
                                                const PrintData& aPrintData);
This goes on to call nsGlobalWindowOuter::Print() like this:
    outerWindow->Print(printSettings,
                       /* aListener = */ nullptr,
                       /* aWindowToCloneInto = */ nullptr,
                       nsGlobalWindowOuter::IsPreview::No,
                       nsGlobalWindowOuter::IsForWindowDotPrint::No,
                       /* aPrintPreviewCallback = */ nullptr, rv);
This is also not getting executed and I'm wondering whether maybe it should be.

I have to be honest, I'm not really sure which direction this is going to take tomorrow. What I am fairly sure about is that spending a night asleep won't make things any less clear! So until tomorrow it is.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
9 Dec 2023 : Day 102 #
Over the last couple of days I've been looking into the printing stack, so that the new browser version will support the "Print to PDF" feature of sailfish-browser. There have been some changes to the print setup pipeline and the latest situation is that the page appears to need to be cloned before it can be printed, but currently that's not happening.

Yesterday I came to the conclusion however that the code is similar enough to that in ESR 78 that I might be able to figure out the parts that are missing by stepping through both side-by-side in the debugger. So that's my task for today.

I've set up an ssh connection into both phones from my laptop. I'm debugging ESR 78 on one and ESR 91 on the other using gdb. I'm particularly interested in four points in the code which I've attached breakpoints to:
(gdb) info break
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   <PENDING>  Document::CreateStaticClone
2       breakpoint     keep y   <PENDING>  Document::CloneDocHelper
3       breakpoint     keep y   <PENDING>  nsPrintObject::InitAsRootObject
4       breakpoint     keep y   <PENDING>  nsPrintJob::FindFocusedDOMWindow
Unfortunately debugging the regular ESR 78 install proves to be harder than I'd expected — which is odd because it's not the first time I've done it — as it just crashes out when attaching the breakpoint.
(gdb) b Document::CreateStaticClone
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
I've tried reinstalling the browser and debug packages but that didn't help. I've rebooted my phone and that didn't help. I've even tried debugging on a different phone. None of these provides a solution. This feels like it's been a bit of a wild goose chase.

It does at least only seem to apply to certain breakpoints:
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fef978718 in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) b nsPrintJob::FindFocusedDOMWindow
Breakpoint 1 at 0x7ff4501dd0: file layout/printing/nsPrintJob.cpp, line 2606.
(gdb) b nsPrintObject::InitAsRootObject
Breakpoint 2 at 0x7ff4508ab0: file layout/printing/nsPrintObject.cpp, line 165.
(gdb) Document::CloneDocHelper
Undefined command: "Document".  Try "help".
(gdb) b Document::CloneDocHelper
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y
Okay, so I'll just go with the ones that don't destroy the debugger.

I've studiously added them, carefully avoiding the destructive ones. Let's see if either of these gets hit.
(gdb) info break
Num Type       Disp Enb Address            What
1   breakpoint keep y   0x0000007ff4501dd0 in nsPrintJob::FindFocusedDOMWindow() const 
                                           at layout/printing/nsPrintJob.cpp:2606
2   breakpoint keep y   0x0000007ff4508ab0 in nsPrintObject::InitAsRootObject
                                           (nsIDocShell*, mozilla::dom::Document*, bool) 
                                           at layout/printing/nsPrintObject.cpp:165
One of them does hit, but it's the uninteresting one; the one I knew was going to hit.
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, nsPrintJob::FindFocusedDOMWindow
    (this=this@entry=0x7fb9491bc0)
    at layout/printing/nsPrintJob.cpp:2606
2606	already_AddRefed<nsPIDOMWindowOuter>
        nsPrintJob::FindFocusedDOMWindow() const {
When I re-run with just a single breakpoint on nsPrintObject::InitAsRootObject() I find it does hit, but frustratingly it then triggers the same problem as before, preventing me from getting a backtrace:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, nsPrintObject::InitAsRootObject
    (this=0x7fb872cee0, aDocShell=0x7fb8ceb560, aDoc=aDoc@entry=
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
So I've gone back to debugging ESR 91 without reference to ESR 78. There the breakpoints are at least working without causing the debugger to crash.

I notice that nsPrintJob::Initialize() is being hit. This is important because in ESR 78 I think this is where the clone is happening (although due to my failed debugging experience I don't know for sure). I have an ESR 91 backtrace:
Thread 8 "GeckoWorkerThre" hit Breakpoint 7, nsPrintJob::Initialize
    (this=this@entry=0x7f89255bb0, aDocViewerPrint=0x7f891b05e0,
    aDocShell=0x7f88630188, aOriginalDoc=0x7f891a24d0, aScreenDPI=288) at
    layout/printing/nsPrintJob.cpp:383
383                                     float aScreenDPI) {
(gdb) bt
#0  nsPrintJob::Initialize (this=this@entry=0x7f89255bb0,
    aDocViewerPrint=0x7f891b05e0, aDocShell=0x7f88630188,
    aOriginalDoc=0x7f891a24d0, aScreenDPI=288)
    at layout/printing/nsPrintJob.cpp:383
#1  0x0000007fbc108fb0 in nsDocumentViewer::Print (this=0x7f891b05d0,
    aPrintSettings=0x7e2cf1be70, aWebProgressListener=0x7e24b5e420)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsDeviceContext.h:88
#2  0x0000007fb9cb2e9c in _NS_InvokeByIndex () at xpcom/reflect/xptcall/md/unix/
    xptcinvoke_asm_aarch64.S:74
[...]
#32 0x0000007fbd16635c in js::jit::MaybeEnterJit (cx=0x7f88234da0, state=...)
    at js/src/jit/Jit.cpp:207
#33 0x0000007f8830db51 in ?? ()
Backtrace stopped: Cannot access memory at address 0x88d65bb3f780
(gdb) 
I might come back to this debugging, but right now I need a break from it. So I'll spend a bit of time following up the differences between ESR 78 and ESR 91 through changesets instead.

This has been a frustrating experience today.

But despite all this failure I do feel like I'm gradually building up a clearer picture of how things are supposed to work. With any luck that means I'm getting closer to figuring out what the underlying issue is.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
8 Dec 2023 : Day 101 #
It's bright and early, the build completed successfully, it's time to continue with the PDF printing work I started yesterday. First order of the day is to check whether the changes I made yesterday have had the desired effect.

You'll recall I'd restored the DownloadPDFSaver execution path, code which I eventually plan to move into EmbedliteDownloadManager; and that I'd then removed the [noscript] annotations from the print() method of the nsIWebBrowserPrint interface.

Those annotations were preventing the print() method from being called. With any luck making that small change will restore the functionality (at least, up to the next point at which it's broken). Let's see.
$ EMBED_CONSOLE=1 sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
[...]
Segmentation fault (core dumped)
The error has gone, but it's been replaced by something more dramatic. The PDF print option now generates no output except the consequences of a segfault. That's okay though, because we can use the debugger to find out where the segfault is occurring. And this isn't totally unexpected: as I mentioned previously, the changes I made to get the print code to build were pretty barbaric and I always expected to have to revisit them.
$ EMBED_CONSOLE=1 gdb sailfish-browser
[...]
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 23729]
nsPrintJob::FindFocusedDocument (this=this@entry=0x7f8a637e20,
    aDoc=aDoc@entry=0x7f890638e0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/printing/
    nsPrintJob.cpp:2411
2411      nsPIDOMWindowOuter* window = aDoc->GetOriginalDocument()->GetWindow();
(gdb) bt
#0  nsPrintJob::FindFocusedDocument (this=this@entry=0x7f8a637e20,
    aDoc=aDoc@entry=0x7f890638e0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/printing/
    nsPrintJob.cpp:2411
#1  0x0000007fbc3bac6c in nsPrintJob::DoCommonPrint
    (this=this@entry=0x7f8a637e20, aIsPrintPreview=aIsPrintPreview@entry=false, 
    aPrintSettings=aPrintSettings@entry=0x7f89bc2ff0,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a9f68a0,
    aDoc=aDoc@entry=0x7f890638e0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/printing/
    nsPrintJob.cpp:548
#2  0x0000007fbc3bb718 in nsPrintJob::CommonPrint (this=this@entry=0x7f8a637e20,
    aIsPrintPreview=aIsPrintPreview@entry=false, 
    aPrintSettings=aPrintSettings@entry=0x7f89bc2ff0,
    aWebProgressListener=aWebProgressListener@entry=0x7f8a9f68a0, 
    aSourceDoc=aSourceDoc@entry=0x7f890638e0) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/layout/printing/nsPrintJob.cpp:488
#3  0x0000007fbc3bb840 in nsPrintJob::Print (this=this@entry=0x7f8a637e20,
    aSourceDoc=<optimized out>, aPrintSettings=aPrintSettings@entry=0x7f89bc2ff0, 
    aWebProgressListener=aWebProgressListener@entry=0x7f8a9f68a0) at /usr/src/
    debug/xulrunner-qt5-91.9.1-1.aarch64/layout/printing/nsPrintJob.cpp:824
#4  0x0000007fbc108fe4 in nsDocumentViewer::Print (this=0x7f89066fe0,
    aPrintSettings=0x7f89bc2ff0, aWebProgressListener=0x7f8a9f68a0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
[...]
#35 0x0000007fbd16635c in js::jit::MaybeEnterJit (cx=0x7f881df990, state=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/js/src/jit/Jit.cpp:207
#36 0x0000007f8824c3d1 in ?? ()
Backtrace stopped: Cannot access memory at address 0x764fdb5afe82
(gdb) 
That's quite a backtrace. The error is certainly happening inside the print stack with the top of the callstack in nsPrintJob::FindFocusedDocument(). Maybe there's some difficulty figuring out which document is to be printed?

While doing this I notice that the remorse timer now opens and a file is output to disk. When I saw this I admit it got me pretty excited; it hinted at the slight possibility the error was happening after the print completed. That would have made things easier. But in fact it's just outputting empty files right now:
$ ls -l ~/Downloads/
total 60
-rw-------    1 defaultu defaultu         0 Dec  5 22:17 Jolla(2).pdf
-rw-------    1 defaultu defaultu         0 Dec  5 22:17 Jolla(3).pdf
-rw-------    1 defaultu defaultu         0 Dec  6 08:23 Jolla(4).pdf
-rw-------    1 defaultu defaultu         0 Dec  6 08:27 Jolla(5).pdf
-rw-------    1 defaultu defaultu         0 Dec  5 22:16 Jolla.pdf
The generation of these files actually happens in the JavaScript code, before any printing has been attempted, as you can see here from our DownloadPDFSaver class:
    // An empty target file must exist for the PDF printer to work correctly.
    let file = await OS.File.open(targetPath, { truncate: true });
    await file.close();
So, unfortunately, we can't really divine much from the existence of these files after all. It demonstrates the print code is getting called, but we knew that already from the debugger output.
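
To illustrate why a zero-byte file appears whenever the later step never fills it, here's a minimal Node.js sketch of the same "create an empty target first" pattern. It is not the DownloadPDFSaver code: it uses Node's fs module rather than OS.File, and the helper name createEmptyTarget and the temporary file name are made up for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: make sure the target exists and is empty.
function createEmptyTarget(targetPath) {
  // The "w" flag creates the file if needed and truncates it otherwise.
  const fd = fs.openSync(targetPath, "w");
  fs.closeSync(fd);
  return fs.statSync(targetPath).size;
}

// Illustrative temporary path, not a real download location.
const target = path.join(os.tmpdir(), `sketch-${process.pid}.pdf`);
const sizeAfterCreate = createEmptyTarget(target);
console.log(`${path.basename(target)}: ${sizeAfterCreate} bytes`);

// Tidy up so the sketch leaves nothing behind.
fs.unlinkSync(target);
```

If whatever is supposed to write the PDF crashes after this point, the empty file is all that's left, which matches the listing above.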

So looking at the code in nsPrintJob.cpp the line that's causing the segfault is the following:
  nsPIDOMWindowOuter* window = aDoc->GetOriginalDocument()->GetWindow();
It looks like the problem here isn't really to do with printing, but rather to do with getting the Window info. We can see from the backtrace that aDoc isn't null, so presumably aDoc->GetOriginalDocument() is returning null. Here's what that method, defined in Document.h, does:
  /**
   * If this document is a static clone, this returns the original
   * document.
   */
  Document* GetOriginalDocument() const {
    MOZ_ASSERT(!mOriginalDocument || !mOriginalDocument->GetOriginalDocument());
    return mOriginalDocument;
  }
Let's check this with the debugger then.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 16121]
nsPrintJob::FindFocusedDocument (this=this@entry=0x7f88ec2740, aDoc=aDoc@entry=0x7f89154290)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/printing/nsPrintJob.cpp:2411
2411      nsPIDOMWindowOuter* window = aDoc->GetOriginalDocument()->GetWindow();
(gdb) p aDoc
$1 = (nsPrintJob::Document *) 0x7f89154290
(gdb) p aDoc->mIsStaticDocument
$3 = false
(gdb) p aDoc->mOriginalDocument
$2 = {mRawPtr = 0x0}
Definitely null. In theory this will only be non-null if the document is a static clone, flagged by mIsStaticDocument as you can see in the comment here:
  // If mIsStaticDocument is true, mOriginalDocument points to the original
  // document.
  RefPtr<Document> mOriginalDocument;
There's also some useful info in the comments for the method that returns mIsStaticDocument. It looks like a static clone of the document should be being made as part of the print process:
  /**
   * Returns true if this document is a static clone of a normal document.
   *
   * We create static clones for print preview and printing (possibly other
   * things in future).
   *
   * Note that static documents are also "loaded as data" (if this method
   * returns true, IsLoadedAsData() will also return true).
   */
  bool IsStaticDocument() const { return mIsStaticDocument; }
Either this isn't happening, or the process is somehow broken.

Static clones are created through a call to Document::CreateStaticClone(), which sets the mCreatingStaticClone flag to true. This flag then gets transferred over to mIsStaticDocument in a call to Document::CloneDocHelper().
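
The relationship between these flags can be sketched with a toy model. This is JavaScript rather than C++ and it isn't Gecko code: the ToyDocument class and findWindowForPrinting() function are invented for illustration, and the properties only stand in for the members named above. It shows why the original document is only reachable from a clone, and what a guarded lookup would look like. Whether a guard is the right fix is a separate question; the more interesting problem is why no clone is being made.

```javascript
// Toy model only: mimics the flags described above, not Gecko's real code.
class ToyDocument {
  constructor() {
    this.isStaticDocument = false;    // stands in for mIsStaticDocument
    this.originalDocument = null;     // stands in for mOriginalDocument
    this.creatingStaticClone = false; // stands in for mCreatingStaticClone
    this.window = { name: "toy-window" };
  }

  createStaticClone() {
    // The flag is only raised for the duration of the clone operation.
    this.creatingStaticClone = true;
    const clone = this.cloneDocHelper();
    this.creatingStaticClone = false;
    return clone;
  }

  cloneDocHelper() {
    const clone = new ToyDocument();
    if (this.creatingStaticClone) {
      // The "creating" flag on the source becomes the "is static" flag on the clone.
      clone.isStaticDocument = true;
      clone.originalDocument = this;
    }
    return clone;
  }
}

// Guarded version of the lookup that crashed in the backtrace.
function findWindowForPrinting(doc) {
  const original = doc.originalDocument;
  if (original === null) {
    return null; // not a static clone, so there is nothing to dereference
  }
  return original.window;
}

const source = new ToyDocument();
const clone = source.createStaticClone();
console.log(findWindowForPrinting(clone) === source.window);
console.log(findWindowForPrinting(source));
```

Passing the clone finds the source document's window; passing the source itself, which is the situation the debugger shows, has no original document to follow.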

One place where I notice this CreateStaticClone() method is called is in BrowserChild.cpp where it does seem to be in relation to printing:
mozilla::ipc::IPCResult BrowserChild::RecvCloneDocumentTreeIntoSelf(
    const MaybeDiscarded<BrowsingContext>& aSourceBC,
    const embedding::PrintData& aPrintData) {
#ifdef NS_PRINTING
[...]
  printSettingsSvc->DeserializeToPrintSettings(aPrintData, printSettings);

  RefPtr<Document> clone;
  {
    AutoPrintEventDispatcher dispatcher(*sourceDocument);
    nsAutoScriptBlocker scriptBlocker;
    bool hasInProcessCallbacks = false;
    clone = sourceDocument->CreateStaticClone(ourDocShell, cv, printSettings,
                                              &hasInProcessCallbacks);
    if (NS_WARN_IF(!clone)) {
      return IPC_OK();
    }
  }

  return RecvUpdateRemotePrintSettings(aPrintData);
#endif
  return IPC_OK();
}
It might be useful to know if this is being called at any point during our print process. I've attached a breakpoint to it so we can see. Just in case, I've also added a breakpoint to RecvCloneDocumentTreeIntoSelf().

When printing, neither breakpoint is hit.

So I'm now visually comparing the execution flow in nsPrintJob::DoCommonPrint() (this is the second method in the call stack) between ESR 78 and ESR 91. The code has changed quite a bit. The main changes seem to have happened in this commit:
$ git log -1 ada89eea81989
commit ada89eea819891081b040ab527fdea5752e77e89
Author: Bob Owen <bobowencode@gmail.com>
Date:   Mon Aug 3 14:23:56 2020 +0000

    Bug 1653334 part 2: Cache the selection ranges on subdocuments as we build
    the nsPrintObject tree. r=jwatt
    
    This also refactors the selection printing code, so that as we build the tree we
    record which nsPrintObject should be used if printing a Selection is chosen.
    
    Differential Revision: https://phabricator.services.mozilla.com/D85600
Rather unexpectedly though, the code that's causing the segfault is also clearly present in the ESR 78 code, albeit in a different method. That's really helpful since it means we can compare the execution flow between the two, find the parts that are being missed in ESR 91 and (hopefully) figure out why. This makes me feel a whole lot more confident about finding a solution for this.

But that's it for today. Tomorrow I'll start this comparison between the ESR 78 and ESR 91 print execution flow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
7 Dec 2023 : Day 100 #
Good morning! It's Day 100 of my gecko dev diary. As I mentioned yesterday, that feels like it might be a milestone, although it's Day 128 that I'm holding out for (which, if things continue as they are, will be on 2nd January).

To celebrate this centenary of sorts, sailfish-gecko artist-in-residence Thigg has generated another wonderful image for us! This is gecko accepting pull requests: look at the gecko absorbing all those wholesome changes! Thank you Thigg for another wonderful and colourful way to brighten up the blog!
 
A gecko lizard made up of light floating in the air. Below a crowd of people throw papers into the air which are sucked up into the gecko. It's as strange and wonderful as it sounds.

Yesterday we already started looking at the printing pipeline and discovered that the trigger we've been using up to now, which involves calling Downloads.createDownload() with a saver: "pdf" parameter is no longer going to work in ESR 91. That entire execution path has been removed.

We could patch it back in again, but I'd expect there to be a better way available now which we should be using instead.

In ESR 78 the code to create a savable PDF file of a page eventually ends up creating a DownloadPDFSaver which can be found in DownloadCore.jsm. This class has been removed, but that doesn't mean we can't use it for inspiration.

A key part of this class, the bit that triggers the printing process, appears to be this:
    this._webBrowserPrint = win.getInterface(Ci.nsIWebBrowserPrint);

    try {
      await new Promise((resolve, reject) => {
        this._webBrowserPrint.print(printSettings, {
          onStateChange(webProgress, request, stateFlags, status) {
            if (stateFlags & Ci.nsIWebProgressListener.STATE_STOP) {
              if (!Components.isSuccessCode(status)) {
                reject(new DownloadError({ result: status, inferCause: true }));
              } else {
                resolve();
              }
            }
          },
[...]
I'm encouraged by the fact that, although the DownloadPDFSaver class has been removed, the nsIWebBrowserPrint interface is alive and well and living in nsIWebBrowserPrint.idl. There's also a C++ implementation of it.

But rather than try to reverse engineer this interface, I'm going to try to figure out how this gets called in existing Firefox implementations. That might provide us some inspiration for how we're supposed to do it here.

Digging through all of the files, the most encouraging seems to be gecko-dev/toolkit/components/printing/content/print.js which contains plenty of references to PDF generation. So my task for today will be to read through this file and try to digest it.

[...]

Having read through print.js in some detail I'm not sure this is where I need to be. It provides the print dialogue box and while it does allow the page to be printed out as a PDF, this is all a bit heavyweight for what we need.

I've also been carefully reading through the DownloadCore code and in particular the DownloadPDFSaver class. In practice there's actually not as much going on there as I'd originally thought. There's quite a bit of indirection as a result of the serializable constructors, but in practice our sailfish-browser path is just creating an instance of DownloadPDFSaver and using it as the "saver" entry in a Download object.

But what does the DownloadPDFSaver do? It essentially has execute and cancel methods. The former just sets up the print settings for a standard PDF printout, after which it extracts the nsIWebBrowserPrint interface from the current window and executes the print() method on it. The latter is even simpler: it just calls cancel on the same interface.

Consequently if we were to implement a process for printing out the page it would have to go through exactly the same steps. So we may as well use what's there.
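
The execute and cancel pair described above can be sketched roughly as follows. This is a simplified stand-in that runs outside Gecko, not the real DownloadPDFSaver: ToyPdfSaver and FakeWebBrowserPrint are invented names, the print settings fields are placeholders, and treating a status of 0 as success is a simplification of the Components.isSuccessCode() check in the original.

```javascript
// Sketch only: ToyPdfSaver and FakeWebBrowserPrint are hypothetical stand-ins.
const STATE_STOP = 0x10; // flag value used by this sketch for "finished"

class FakeWebBrowserPrint {
  constructor(status) {
    this.status = status;
    this.cancelled = false;
  }
  print(settings, listener) {
    // Report completion asynchronously, as a print job would.
    setTimeout(() => listener.onStateChange(null, null, STATE_STOP, this.status), 0);
  }
  cancel() {
    this.cancelled = true;
  }
}

class ToyPdfSaver {
  constructor(webBrowserPrint) {
    this._webBrowserPrint = webBrowserPrint;
  }

  // Set up the settings, call print() and turn the listener callback into a promise.
  execute(targetPath) {
    const printSettings = { toFileName: targetPath, silent: true }; // placeholder fields
    return new Promise((resolve, reject) => {
      this._webBrowserPrint.print(printSettings, {
        onStateChange(webProgress, request, stateFlags, status) {
          if (stateFlags & STATE_STOP) {
            if (status === 0) {
              resolve(targetPath);
            } else {
              reject(new Error(`print failed with status ${status}`));
            }
          }
        },
      });
    });
  }

  // Cancelling just forwards to the same interface.
  cancel() {
    this._webBrowserPrint.cancel();
  }
}

const okSaver = new ToyPdfSaver(new FakeWebBrowserPrint(0));
const okResult = okSaver.execute("/tmp/page.pdf");
const failResult = new ToyPdfSaver(new FakeWebBrowserPrint(1)).execute("/tmp/page.pdf");

okResult.then((path) => console.log("saved", path));
failResult.catch((error) => console.log(error.message));
```

The point of the sketch is how little there is: everything of substance happens behind print(), so any replacement would need the same interface to be reachable.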

However, what I'm thinking is that rather than revert the revert that removed the changes, essentially forward-porting them to the newer code with a patch, it would be better to just drag this functionality into the EmbedLite code, inside the EmbedliteDownloadManager in particular.

We'll still be using the standard print interface, we just won't have to rely on the expectation of patches applying cleanly to future versions of what — it turns out — is not always the most stable codebase.

The only downside to this approach is that I don't yet know whether the underlying printing mechanisms are going to work. That means I'll be making a lot of changes without being able to adequately test them.

Given that, as a first step I plan to reapply the reverted changes, get the underlying print mechanism working and only then move the changes into EmbedliteDownloadManager.

This seems like a good plan to me.

So, for now I've reapplied the changes to reintroduce DownloadPDFSaver. The changes really aren't so big and, in fact, only apply to the single file DownloadCore.jsm. I could rebuild and reinstall the package to test it now, but given that there are only a few changes to a single file, all of which are on the JavaScript side, I think I'll have a go at applying them directly on my development phone.

The file I'm changing exists in omni.ja so I can use my script to unpack and repack the archive.
$ ./omni.sh unpack
Omni action: unpack
Unpacking from: /usr/lib64/xulrunner-qt5-91.9.1
Unpacking to:   ./omni

$ find . -iname "DownloadCore.jsm"
./omni/modules/DownloadCore.jsm
$ vim ./omni/modules/DownloadCore.jsm
$ ./omni.sh pack
Omni action: pack
Packing from: ./omni
Packing to:   /usr/lib64/xulrunner-qt5-91.9.1

$ EMBED_CONSOLE=1 sailfish-browser
Having applied these changes, now when I select the "Save web page as PDF" option I get a new error:
EmbedliteDownloadManager error: this._webBrowserPrint.print is not a function
JavaScript error: , line 0: uncaught exception: Object
CONSOLE message:
[JavaScript Error: "uncaught exception: Object"]
That's progress of sorts! The this._webBrowserPrint object that's causing this new error is accessed like this:
    this._webBrowserPrint = win.getInterface(Ci.nsIWebBrowserPrint);
So the object should be an instance of nsIWebBrowserPrint. Checking the interface definition file I can see there is a print() method defined like this:
  /**
   * Print the specified DOM window
   *
   * @param aThePrintSettings - Printer Settings for the print job, if
   *                            aThePrintSettings is null then the global PS
   *                            will be used.
   * @param aWPListener - is updated during the print
   * @return void
   */
  [noscript] void print(in nsIPrintSettings aThePrintSettings,
                        in nsIWebProgressListener aWPListener);
This is a slight change from the ESR 78 signature which looked like this:
  void print(in nsIPrintSettings aThePrintSettings,
             in nsIWebProgressListener aWPListener);
This change could be what's causing the problems. A quick run of git blame shows that the [noscript] tags were added in order to block access to these calls from JavaScript, so that would definitely explain what's going wrong.
$ git log -1 fbca1f9c2385b
commit fbca1f9c2385be0178d4dff3debef2728b049d74
Author: Jonathan Watt <jwatt@jwatt.org>
Date:   Sun Jul 12 16:39:30 2020 +0000

    Bug 1652337. Prevent script from calling nsIWebBrowserPrint.print().
    r=bobowen
    
    Differential Revision: https://phabricator.services.mozilla.com/D83264
Here's what the bug description explains about this:
 
There is only one "caller" left in the dead code that should be removed by bug 1641805. I don't have the time to figure out a proper patch to that bug right now, but that shouldn't stop us marking this API [noscript] to make reviews of changes to its implementation easier to review.

Well, we've just added that caller back in. And even if we make the cleaner changes that I proposed in the text earlier, we're still going to need access to this print() method. So I'm going to revert these [noscript] annotations as well. Unfortunately that means a full rebuild is in order before I'll be able to test them, which will take until the morning.
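
As an aside, the failure mode here is simply that the method isn't visible from script, so the call fails with "is not a function". A small defensive check makes that kind of failure easier to read. This is a generic JavaScript sketch: requireMethod() is a hypothetical helper, and the two objects are stand-ins rather than real nsIWebBrowserPrint instances.

```javascript
// Hypothetical helper: fail early with a clearer message when a method is not exposed.
function requireMethod(obj, name) {
  if (typeof obj[name] !== "function") {
    throw new TypeError(`${name}() is not callable from script on this object`);
  }
  return obj[name].bind(obj);
}

// Stand-ins: one object exposes print(), the other behaves as if it were hidden.
const scriptable = { print: (settings, listener) => "printing" };
const hidden = {};

console.log(requireMethod(scriptable, "print")(null, null));
try {
  requireMethod(hidden, "print");
} catch (error) {
  console.log(error.message);
}
```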

So I've set the build going. Let's see what this looks like tomorrow.

If you'd like to catch up on any of the other 99 diary entries, they're all available on my Gecko-dev Diary page.
6 Dec 2023 : Day 99 #
It's the morning, the world is waking up after a night of torrential rain here in Cambridgeshire. And my build has completed.

I've copied the packages over to my development phone, installed them and have them ready to run. The thing I'm testing today is the change I made over the last couple of days to fix the preferences that were using the VarCache, but which are now using static preferences.

There are a bunch of user-defined preferences kept in the ~/.local/share/org.sailfishos/browser/.mozilla/prefs.js file which I want to take a look at first.
$ cat prefs.js 
// Mozilla User Preferences

// DO NOT EDIT THIS FILE.
//
// If you make changes to this file while the application is running,
// the changes will be overwritten when the application exits.
//
// To change a preference value, you can either:
// - modify it via the UI (e.g. via about:config in the browser); or
// - set it within a user.js file in your profile.

user_pref("app.update.lastUpdateTime.addon-background-update-timer", 1701505984);
user_pref("app.update.lastUpdateTime.region-update-timer", 1701624416);
user_pref("app.update.lastUpdateTime.search-engine-update-timer", 1701624536);
user_pref("app.update.lastUpdateTime.services-settings-poll-changes", 1701505864);
user_pref("app.update.lastUpdateTime.user-agent-updates-timer", 1701419251);
user_pref("app.update.lastUpdateTime.xpi-signature-verification", 1701505588);
[...]
I want to remove all the preferences in the embedlite group, as well as the keyword.enabled preference, since these are the ones I made changes to:
user_pref("embedlite.compositor.external_gl_context", true);
user_pref("embedlite.inputItemSize", "50");
user_pref("embedlite.zoomMargin", "20");
[...]
user_pref("keyword.enabled", true);
All four are now purged from the file. Time to fire up the browser to see what happens. My plan here is to test a couple of things:
  1. Whether changing the values sticks across execution runs.
  2. Whether changing the values affects behaviour.
Both of these should be the case. Let's see.
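
I removed the entries by hand, but the purge step described above could be scripted along these lines. This is only a sketch: purgePrefs() is a made-up helper working on the file's text, and the sample content is a shortened version of the listing above. As the header of prefs.js warns, the browser needs to be closed first or the changes will be overwritten on exit.

```javascript
// Sketch: drop selected user_pref() lines from the text of a prefs.js file.
// The function name and the sample text are illustrative only.
function purgePrefs(prefsText, shouldRemove) {
  const prefLine = /^user_pref\("([^"]+)",/;
  return prefsText
    .split("\n")
    .filter((line) => {
      const match = prefLine.exec(line);
      // Keep comments, blank lines and any preference we weren't asked to remove.
      return !(match && shouldRemove(match[1]));
    })
    .join("\n");
}

const sample = [
  'user_pref("app.update.lastUpdateTime.region-update-timer", 1701624416);',
  'user_pref("embedlite.inputItemSize", "50");',
  'user_pref("embedlite.zoomMargin", "20");',
  'user_pref("keyword.enabled", true);',
].join("\n");

const purged = purgePrefs(
  sample,
  (name) => name.startsWith("embedlite.") || name === "keyword.enabled"
);
console.log(purged);
```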

The results, I would say, are interesting. First of all the embedlite static preferences. I'm testing with the embedlite.azpc.handle.longtap preference because this has a very clear and testable impact on the user experience. When active, pressing and holding on an image will bring up a menu that allows you to save the image to disk. When disabled, pressing and holding on an image should have no effect.
 
Three phone screenshots: about:config page showing embedlite.apz.json.longtap checked; a page showing it unchecked; and a page showing the result of long-tapping on an image when the option is enabled, with the option to save the image out

From testing, the preference is working as expected. Long tap on an image works when the preference is set to true, but does nothing when it's set to false. That's the most important part of the test and gets a confirmatory tick: test passed.

Checking between execution runs I can see the preference is also sticky. Closing and reopening the browser doesn't affect the setting; on reopening it's set to false if it was disabled before closing and set to true otherwise: test passed.

I also checked that the preference value gets written to the prefs.js file and it does when it's set to false. When it's set to true it's removed because that's the default value. Great!
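
The "only written when it differs from the default" behaviour can be modelled with a few lines of JavaScript. This is a toy that matches what I observed, not the libpref implementation: ToyPrefStore and its methods are invented for the example.

```javascript
// Toy model, not libpref: only values that differ from the default are serialised.
class ToyPrefStore {
  constructor(defaults) {
    this.defaults = defaults;
    this.userValues = new Map();
  }

  set(name, value) {
    if (value === this.defaults[name]) {
      // Setting a pref back to its default clears the user value.
      this.userValues.delete(name);
    } else {
      this.userValues.set(name, value);
    }
  }

  get(name) {
    return this.userValues.has(name) ? this.userValues.get(name) : this.defaults[name];
  }

  serialise() {
    return [...this.userValues]
      .map(([name, value]) => `user_pref(${JSON.stringify(name)}, ${JSON.stringify(value)});`)
      .join("\n");
  }
}

const store = new ToyPrefStore({ "embedlite.azpc.handle.longtap": true });

store.set("embedlite.azpc.handle.longtap", false);
const whenFalse = store.serialise();

store.set("embedlite.azpc.handle.longtap", true);
const whenTrue = store.serialise();

console.log(JSON.stringify(whenFalse), JSON.stringify(whenTrue));
```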

The static preferences are looking good. But one of the changes we made used a normal preference rather than a static preference and that was the keyword.enabled flag. This controls whether search is allowed in the address bar.

For this test I set the preference to true, typed some text in the address bar and checked that search was performed using my chosen search provider. I then set the preference to false and checked that the error page is displayed. All of this worked as expected: test passed.
 
Three phone screenshots: about:config page showing keyword.enabled checked; a page showing it unchecked; and a page showing an error stating that the address isn't valid (this is the result when the option is disabled)

Finally, does the keyword.enabled value stick across runs? I'm surprised to discover that it doesn't; every time the browser is restarted it's set back to true. I can also see that it's always set to true in the prefs.js file: test failed.

This last failure surprises me and it makes me wonder whether there's some code somewhere that's forcing it to be true? A quick grep of the code throws the following up:
$ grep -rIn "keyword.enabled" * -B1
apps/core/declarativewebutils.cpp-138-    // Enable internet search
apps/core/declarativewebutils.cpp:139:
    webEngineSettings->setPreference(QString("keyword.enabled"), QVariant(true));
I guess that explains it then: every time the browser is started it forcefully sets this value to true. I don't really know why this is done. It could be that the original value was false and it had to be enabled somewhere, or that it was to protect users from accidentally messing up their own settings. The commit message for the change from over a decade ago doesn't really help either:
$ git log -1 eeea59ae2
commit eeea59ae23466204efbbafca417f3f7542deff66
Author: Dmitry Rozhkov <dmitry.rozhkov@jollamobile.com>
Date:   Thu Aug 22 11:13:41 2013 +0300

    [sailfish-browser] Enable internet search. Fixes JB#5503, JB#5504, JB#5505,
        JB#1200
Nevertheless it does explain the behaviour we're seeing now, so that means everything is working as it should after all.
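
The effect of that startup call can be shown with a toy model. The names here are illustrative rather than sailfish-browser code, and the ordering (saved values loaded first, forced values applied afterwards) is my assumption about what's happening.

```javascript
// Toy model of a browser launch; names are illustrative, not sailfish-browser code.
function startSession(savedPrefs, forcedAtStartup) {
  // Assumed ordering: saved values are loaded first, then forced values overwrite them.
  return { ...savedPrefs, ...forcedAtStartup };
}

const forced = { "keyword.enabled": true };

let session = startSession({}, forced);
session["keyword.enabled"] = false; // the user disables search in about:config

const savedOnExit = { ...session }; // whatever was set gets saved on exit
session = startSession(savedOnExit, forced); // next launch

console.log(session["keyword.enabled"]);
```

However the user leaves the setting, the forced value wins on the next launch, which is the behaviour seen in the test.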

I've pushed my changes to the remote repository and so have also closed the issue.

This evening my plan is to move on to issue #1030 to try to fix printing. You might think that printing on a mobile browser is a bit redundant, but in sailfish-browser we use it to support export to PDF. I broke the printing functionality quite badly by essentially removing chunks of it to ensure the code would build. It'll be good to go back and set that straight.

[...]

So it's now the evening and time to start looking into the printing situation. The easiest way to check this is to try out the "Save web page as PDF" feature of the browser. But when I press the menu option to try this out nothing happens and I see the following output at the console:
CONSOLE message:
[JavaScript Error: "aSerializable.url is undefined"
    {file: "resource://gre/modules/DownloadCore.jsm" line: 1496}]
DownloadSource.fromSerializable@resource://gre/modules/DownloadCore.jsm:1496:5
Download.fromSerializable@resource://gre/modules/DownloadCore.jsm:1282:38
D_createDownload@resource://gre/modules/Downloads.jsm:108:39
observe/<@file:///usr/lib64/mozembedlite/components/EmbedliteDownloadManager.js:257:48
That immediately gives us a few leads to follow up. Looking inside the EmbedliteDownloadManager.js file, which is part of the embedlite-components repository, we can see the following code which relates to this:
          case "saveAsPdf":
            if (Services.ww.activeWindow) {
              (async function() {
                let list = await Downloads.getList(Downloads.ALL);
                let download = await Downloads.createDownload({
                  source: Services.ww.activeWindow,
                  target: data.to,
                  saver: "pdf",
                  contentType: "application/pdf"
                });
                download["saveAsPdf"] = true;
                download.start();
                list.add(download);
              })().then(null, Cu.reportError);
            } else {
              Logger.warn("No active window to print to pdf")
            }
            break;
The call that's causing the error is this one:
                let download = await Downloads.createDownload({
                  source: Services.ww.activeWindow,
                  target: data.to,
                  saver: "pdf",
                  contentType: "application/pdf"
                });
That takes us here:
  createDownload: function D_createDownload(aProperties) {
    try {
      return Promise.resolve(Download.fromSerializable(aProperties));
    } catch (ex) {
      return Promise.reject(ex);
    }
  },
Which takes us here:
Download.fromSerializable = function(aSerializable) {
  let download = new Download();
  if (aSerializable.source instanceof DownloadSource) {
    download.source = aSerializable.source;
  } else {
    download.source = DownloadSource.fromSerializable(aSerializable.source);
  }
Which ends up here:
DownloadSource.fromSerializable = function(aSerializable) {
  let source = new DownloadSource();
  if (isString(aSerializable)) {
    // Convert String objects to primitive strings at this point.
    source.url = aSerializable.toString();
  } else if (aSerializable instanceof Ci.nsIURI) {
    source.url = aSerializable.spec;
  } else {
    // Convert String objects to primitive strings at this point.
    source.url = aSerializable.url.toString();
Leaving us with the error "aSerializable.url is undefined". Unravelling all of this we can see that what went in as aProperties ended up as aSerializable. The properties set in aProperties are source, target, saver and contentType. Definitely no url.
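
The failure is easy to reproduce in miniature. The following is a simplified model of the branch logic quoted above, runnable outside Gecko: sourceUrlFromSerializable() is a made-up name, the check for a spec property stands in for the nsIURI test, and fakeWindow is a stand-in for whatever Services.ww.activeWindow returns.

```javascript
// Simplified model of the branch logic quoted above; not the real DownloadCore.jsm.
function sourceUrlFromSerializable(serializable) {
  if (typeof serializable === "string" || serializable instanceof String) {
    // Convert String objects to primitive strings.
    return serializable.toString();
  }
  if (serializable && typeof serializable.spec === "string") {
    return serializable.spec; // stands in for the nsIURI branch
  }
  // Everything else is assumed to be an object carrying a url property.
  return serializable.url.toString();
}

// Stand-in for the window object: it has no url property.
const fakeWindow = { document: {} };

console.log(sourceUrlFromSerializable("https://example.com/"));
try {
  sourceUrlFromSerializable(fakeWindow);
} catch (error) {
  console.log(error instanceof TypeError);
}
```

A window object falls through to the final branch, where reading url gives undefined and the call on it throws.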

In practice though, what it really wants is the source parameter sent in at the start to be either an instance of DownloadSource or a serialised version of it. We're setting it to Services.ww.activeWindow so it's probably also worth working out what Services.ww.activeWindow is returning.

The Services.ww reference appears to be to an implementation of nsIWindowWatcher. Here's the property in question:
  /**
      Retrieves the active window from the focus manager.
  */
  readonly attribute mozIDOMWindowProxy activeWindow;
I did have to make changes to some of the window referencing code, so it's possible I broke something. But it's also possible that the DownloadSource interface has changed. To check the latter I'm going to compare it against the equivalent ESR 78 code.

And there is a difference. It's right down at the DownloadSource.fromSerializable() level where there used to be a check for whether source was an instance of Ci.nsIDOMWindow and which now does something slightly different.

Here's the commit that made the change:
$ git log -1 -S "Ci.nsIDOMWindow" -- toolkit/components/downloads/DownloadCore.jsm
commit 258369999a13027375f4fa496a7d4b23fb1eddfa
Author: Jonathan Kew <jkew@mozilla.com>
Date:   Mon Jul 20 16:04:35 2020 +0000

    Bug 1641805 - Remove support for creating a DownloadSource from an
    nsIDOMWindow. r=mak
    
    Differential Revision: https://phabricator.services.mozilla.com/D84137
This is actually quite a significant change and the issue that describes it is even more alarming.
 
For Fission, all printing will need to be initiated from the parent process. All code that calls nsIWebBrowserPrint.print in the child process needs to be rewritten to invoke printing via the parent process, or otherwise removed.

As noted further down in the issue description, the changes that introduced this code may be a useful template.

It's been a long day for me today already, so I'm going to call it a night here. This clearly needs some more investigation which I'll need to pick up tomorrow.

I note with some trepidation that it's Day 99 today. That means tomorrow is the next order of magnitude of time spent working on this, from a base 10 perspective at least (my preference would be to work to base 2, which would make 128 the next big event). That may feel like an awfully long time and I know there are many Sailfish OS users who would just like this to be done and ready. All I can say is that every step brings us closer to a proper release and the quality really does improve with each one. I'm confident we'll get there.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.
5 Dec 2023 : Day 98 #
When I woke up this morning my build had failed. The changes I made to the preferences yesterday introduced an error into the C++ code:
224:55.80 mobile/sailfishos
224:57.07 ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
          EmbedLiteCompositorBridgeParent.cpp:19:10: fatal error:
          mozilla/StaticPrefs_embedlite.h: No such file or directory
224:57.07  #include "mozilla/StaticPrefs_embedlite.h" // for StaticPrefs::embedlite_*()
224:57.07           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
224:57.07 compilation terminated.
So the StaticPrefs_embedlite.h header file, which I'd assumed would be generated by the build process, can't be found. Maybe it wasn't generated after all?

Although my assumptions were wrong, they were only partially wrong, because there is a file called StaticPrefList_embedlite.h that's been generated and it does contain the preferences I added to StaticPrefList.yaml. Here's a snippet:
// This file was generated by generate_static_pref_list.py 
// from modules/libpref/init/StaticPrefList.yaml. DO NOT EDIT.

ALWAYS_PREF(
  "embedlite.azpc.handle.viewport",
   embedlite_azpc_handle_viewport,
   embedlite_azpc_handle_viewport,
  bool, true
)
[...]
And there is a file called StaticPrefs_embedlite.h too, which contains what we'd expect based on what's there for the other groups:
// This file was generated by generate_static_pref_list.py from
// modules/libpref/init/StaticPrefList.yaml. DO NOT EDIT.
// Include it to gain access to StaticPrefs::embedlite_*.

#ifndef mozilla_StaticPrefs_embedlite_h
#define mozilla_StaticPrefs_embedlite_h

#include "mozilla/StaticPrefListBegin.h"
#include "mozilla/StaticPrefList_embedlite.h"
#include "mozilla/StaticPrefListEnd.h"

#endif  // mozilla_StaticPrefs_embedlite_h
So we're not missing the StaticPrefs_embedlite.h file. It's being correctly generated by generate_static_pref_list.py; it's just not being picked up by our #include directive.

I guess my task for today will be to figure out why not.

Doing a quick grep on the source tree shows that the headers for the other groups appear in more of the generated output than the embedlite version does, for example in obj-build-mer-qt-xr/modules/libpref/backend.mk and in obj-build-mer-qt-xr/generated-sources.json. That must be a warning sign.

A bit more searching turns up the fact that the various groups are all listed in the pref_groups variable of the gecko-dev/modules/libpref/moz.build file. This turns out to be confirmed by the documentation at the head of the StaticPrefList.yaml file:
# Please follow the existing prefs naming convention when considering adding a
# new pref, and don't create a new pref group unless it's appropriate and there
# are likely to be multiple prefs within that group. (If you do, you'll need to
# update the `pref_groups` variable in modules/libpref/moz.build.)
It feels like there's a moral to be had about reading documentation here, but I can't quite put my finger on it. The good news is that there's nothing else in the documentation about other places the group needs to be added, so with any luck making this one change should be enough to do the trick.
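
A check like the following would have caught the omission before the build did. It's only a sketch: missingPrefGroups() is a made-up helper, the assumption that a preference's group is the part of its name before the first dot is mine, and "editor.example" is an invented preference name used purely as filler.

```javascript
// Sketch: report pref groups that are used by preferences but not registered.
// Assumption: a pref's group is the part of its name before the first dot.
function missingPrefGroups(prefNames, registeredGroups) {
  const registered = new Set(registeredGroups);
  const missing = new Set();
  for (const name of prefNames) {
    const group = name.split(".")[0];
    if (!registered.has(group)) {
      missing.add(group);
    }
  }
  return [...missing].sort();
}

// "editor.example" is an invented name; the embedlite one is from the generated header.
const prefNames = ["editor.example", "embedlite.azpc.handle.viewport"];
const groups = ["accessibility", "alerts", "editor", "extensions", "findbar"];

const before = missingPrefGroups(prefNames, groups);
const after = missingPrefGroups(prefNames, [...groups, "embedlite"]);
console.log(before, after);
```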

I've added embedlite to the list of groups in the moz.build file.
pref_groups = [
    "accessibility",
    "alerts",
[...]
    "editor",
    "embedlite",
    "extensions",
    "findbar",
[...]
I'll need to do another full rebuild to see whether this has fixed anything.

Before I do that and call it a day, I want to try to pick off another of the JavaScript errors that are being output on startup. We have this issue #1041 about the EmbedLiteChromeManager component not being able to find a file:
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 213: NS_ERROR_FILE_NOT_FOUND: 
Maybe, just maybe, this will be an easy one to track down and fix.

This file is part of the embedlite-components repository. Line 213, the place that's generating the error, doesn't look too auspicious:
      AboutCertViewerHandler.init();
Here's that line with a bit more context:
  observe(aSubject, aTopic, aData) {
    let self = this;
    switch (aTopic) {
[...]
    case "browser-delayed-startup-finished":
      AboutCertViewerHandler.init();
      Services.obs.removeObserver(this, "browser-delayed-startup-finished");
      break;
[...]
This AboutCertViewerHandler is defined lazily, so it could well be that the problem is a missing AboutCertViewerHandler.jsm source file:
XPCOMUtils.defineLazyModuleGetters(this, {
  AboutCertViewerHandler: "resource://gre/modules/AboutCertViewerHandler.jsm",
  ContentLinkHandler: "chrome://embedlite/content/ContentLinkHandler.jsm",
  Feeds: "chrome://embedlite/content/Feeds.jsm"
});
In that list, both ContentLinkHandler.jsm and Feeds.jsm can be found in the embedlite-components code.
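
The lazy definition also explains why a missing module would only show up at line 213 rather than where the getter is declared. Here's a minimal sketch of the idea in plain JavaScript. It isn't the XPCOMUtils implementation: defineLazyGetter(), loadModule() and the availableModules table are hypothetical stand-ins, and the error string is just copied from the log for illustration.

```javascript
// Hypothetical stand-in for a module loader: only Feeds "exists" here.
const availableModules = {
  "chrome://embedlite/content/Feeds.jsm": { Feeds: { name: "Feeds" } },
};

function loadModule(url) {
  if (!(url in availableModules)) {
    throw new Error("NS_ERROR_FILE_NOT_FOUND: " + url);
  }
  return availableModules[url];
}

// Define a property whose module is only loaded on first access.
function defineLazyGetter(target, name, url) {
  Object.defineProperty(target, name, {
    configurable: true,
    get() {
      const value = loadModule(url)[name];
      // Cache the result so later accesses skip the loader.
      Object.defineProperty(target, name, { value });
      return value;
    },
  });
}

const scope = {};
// Neither definition touches the loader, so neither can fail here.
defineLazyGetter(scope, "Feeds", "chrome://embedlite/content/Feeds.jsm");
defineLazyGetter(scope, "AboutCertViewerHandler",
                 "resource://gre/modules/AboutCertViewerHandler.jsm");

// The failure only happens when the property is first read.
function firstUse() {
  try {
    scope.AboutCertViewerHandler.init();
    return "ok";
  } catch (e) {
    return e.message;
  }
}
```

In this sketch both definitions succeed, and it's only the call to firstUse() that reports the missing file, which matches the error appearing at the init() call rather than at the getter list.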

On the other hand, there is a file called AboutCertViewerHandler.jsm in the ESR 78 gecko source that's been removed from the ESR 91 source. Well, not removed, but renamed:
$ git diff 92c10e5f503ad8e~ 92c10e5f503ad8e
[...]
diff --git a/toolkit/components/certviewer/AboutCertViewerHandler.jsm
           b/toolkit/components/certviewer/AboutCertViewerParent.jsm
similarity index 50%
rename from toolkit/components/certviewer/AboutCertViewerHandler.jsm
rename to toolkit/components/certviewer/AboutCertViewerParent.jsm
index 390fa3113836..16cedeafee9d 100644
--- a/toolkit/components/certviewer/AboutCertViewerHandler.jsm
+++ b/toolkit/components/certviewer/AboutCertViewerParent.jsm
What was AboutCertViewerHandler.jsm is apparently now AboutCertViewerParent.jsm. For reference, here's the commit message, although I'm not sure this on its own is super-enlightening.
$ git log -1 92c10e5f503ad8e16424e554f8fb6393ff77152f
commit 92c10e5f503ad8e16424e554f8fb6393ff77152f
Author: Neil Deakin <neil@mozilla.com>
Date:   Thu Jun 25 01:13:05 2020 +0000

    Bug 1646197, convert about:certificate to JSWindowActor instead of old RPM,
    r=johannh
    
    Differential Revision: https://phabricator.services.mozilla.com/D80017
The actual changeset D80017 associated with this is more helpful though. It makes clear that init() and uninit() have both been removed and don't need to be called any more. So we can remove them from our embedlite-components code too. We do need to check whether we're using this certificate code anywhere though.

A swift grep of the code shows it is being used, but only in the existing gecko code (none of the EmbedLite additions) and in EmbedLiteChromeManager.js. In the latter it's only being used for init() and uninit(), so hopefully that means there are no other changes to make apart from removing these references.

The truth of whether it's needed in some other way may only become apparent later if there's some other broken functionality, but my gut tells me this won't happen.

These two calls that we've removed seem to be the only reason why we're interested in the "browser-delayed-startup-finished" and "xpcom-shutdown" event messages as well, so I've also removed the observers related to these:
$ git diff
diff --git a/jscomps/EmbedLiteChromeManager.js b/jscomps/EmbedLiteChromeManager.js
index da6151b..60e2aef 100644
--- a/jscomps/EmbedLiteChromeManager.js
+++ b/jscomps/EmbedLiteChromeManager.js
@@ -17,7 +17,6 @@ const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
 const { NetErrorHelper } = ChromeUtils.import("chrome://embedlite/content/NetErrorHelper.jsm")
 
 XPCOMUtils.defineLazyModuleGetters(this, {
-  AboutCertViewerHandler: "resource://gre/modules/AboutCertViewerHandler.jsm",
   ContentLinkHandler: "chrome://embedlite/content/ContentLinkHandler.jsm",
   Feeds: "chrome://embedlite/content/Feeds.jsm"
 });
@@ -137,8 +136,6 @@ EmbedLiteChromeManager.prototype = {
     Services.obs.addObserver(this, "embed-network-link-status", true)
     Services.obs.addObserver(this, "domwindowclosed", true);
     Services.obs.addObserver(this, "keyword-uri-fixup", true);
-    Services.obs.addObserver(this, "browser-delayed-startup-finished");
-    Services.obs.addObserver(this, "xpcom-shutdown");
   },
 
   onWindowOpen(aWindow) {
@@ -209,13 +206,6 @@ EmbedLiteChromeManager.prototype = {
       Services.io.offline = network.offline;
       Services.obs.notifyObservers(null, "network:link-status-changed",
                                    network.offline ? "down" : "up");
-    case "browser-delayed-startup-finished":
-      AboutCertViewerHandler.init();
-      Services.obs.removeObserver(this, "browser-delayed-startup-finished");
-      break;
-    case "xpcom-shutdown":
-      AboutCertViewerHandler.uninit();
-      break;
     default:
       Logger.debug("EmbedLiteChromeManager subject", aSubject, "topic:", aTopic);
     }
Removing code from our downstream changes always fills me with a warm and fuzzy feeling. Less code to maintain in future.

It's worth noting that this change is essentially reversing commit b1510a7a, which added the initialisation and deinitialisation calls into the EmbedLite code.

The commit message doesn't provide a huge amount of context, but it doesn't appear to be a smaller part of a larger changeset, so hopefully this is an atomic change without other consequences elsewhere.
$ git log -1 b1510a7a
commit b1510a7a5771906e44b4247bd75271c1bb5c54f6 (upstream/jb56094, andrew-korolev-omp/jb56094)
Author: Andrew den Exter <andrew.den.exter@jolla.com>
Date:   Mon Nov 15 06:46:16 2021 +0000

    [embedlite-components] Initialize the certificate viewer handler.
    JB#56094 OMP#JOLLA-492
Okay, I've made and committed these changes; I've built and installed the resulting embedlite-components packages. The errors seen in the console output are now gone and I don't see any new issues arising.

I think that's enough for today. I'm going to kick the main gecko-dev build off and see what happens with it tomorrow.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.
4 Dec 2023 : Day 97 #
It's another freezing cold day today. It certainly feels like winter now. All the better to be snug inside and doing some coding. As I write this it's already the evening; over the last few days I've been working on fixing address bar search. That's now working (up to a point, because not all of the available search providers' sites quite work correctly yet), but while investigating it I hit the issue of the VarCache having been removed. These need to be converted into static preferences, as detailed in task #1027.

I've been psyching myself up for tackling these preferences. I'm not sure, but I suspect this task is going to require a bit of investigation and thought. Here's a bit of code that needs fixing. I've helpfully left some comments there for myself to pick up, which offers me some solid ground to start from.
static void ReadAZPCPrefs()
{
  // TODO: Switch these to use static prefs
  // See https://firefox-source-docs.mozilla.org/modules/libpref/index.html#static-prefs
  // Example: https://phabricator.services.mozilla.com/D40340

  // Init default azpc notifications behavior
  //Preferences::AddBoolVarCache(&sHandleDefaultAZPC.viewport,
  //  "embedlite.azpc.handle.viewport", true);
  //Preferences::AddBoolVarCache(&sHandleDefaultAZPC.singleTap,
  //  "embedlite.azpc.handle.singletap", false);
  //Preferences::AddBoolVarCache(&sHandleDefaultAZPC.longTap,
  //  "embedlite.azpc.handle.longtap", false);
  //Preferences::AddBoolVarCache(&sHandleDefaultAZPC.scroll,
  //  "embedlite.azpc.handle.scroll", true);

  //Preferences::AddBoolVarCache(&sPostAZPCAsJson.viewport,
  //  "embedlite.azpc.json.viewport", true);
  //Preferences::AddBoolVarCache(&sPostAZPCAsJson.singleTap,
  //  "embedlite.azpc.json.singletap", true);
  //Preferences::AddBoolVarCache(&sPostAZPCAsJson.doubleTap,
  //  "embedlite.azpc.json.doubletap", false);
  //Preferences::AddBoolVarCache(&sPostAZPCAsJson.longTap,
  //  "embedlite.azpc.json.longtap", true);
  //Preferences::AddBoolVarCache(&sPostAZPCAsJson.scroll,
  //  "embedlite.azpc.json.scroll", false);

  //Preferences::AddBoolVarCache(&sAllowKeyWordURL, "keyword.enabled",
  //  sAllowKeyWordURL);

  sHandleDefaultAZPC.viewport = true; // "embedlite.azpc.handle.viewport"
  sHandleDefaultAZPC.singleTap = false; // "embedlite.azpc.handle.singletap"
  sHandleDefaultAZPC.longTap = false; // "embedlite.azpc.handle.longtap"
  sHandleDefaultAZPC.scroll = true; // "embedlite.azpc.handle.scroll"

  sPostAZPCAsJson.viewport = true; // "embedlite.azpc.json.viewport"
  sPostAZPCAsJson.singleTap = true; // "embedlite.azpc.json.singletap"
  sPostAZPCAsJson.doubleTap = false; // "embedlite.azpc.json.doubletap"
  sPostAZPCAsJson.longTap = true; // "embedlite.azpc.json.longtap"
  sPostAZPCAsJson.scroll = false; // "embedlite.azpc.json.scroll"

  sAllowKeyWordURL = sAllowKeyWordURL;
    // "keyword.enabled" (intentionally retained for clarity)
}
In those comments I've suggested converting these into static prefs. Like normal prefs, static prefs are handled via libpref, but as I understand it they're more efficient because they're defined as variables which can be requested to mirror the preference value, but which can't be set within the code. Normal prefs use a hashtable lookup, which makes them somewhat slower to access.
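
To make the distinction concrete, here's a toy model in JavaScript of a preference table with mirrored variables. It's only a sketch of the idea as I understand it, not how libpref is implemented: the PrefTable class and its addMirror() method are hypothetical names invented for this example.

```javascript
// Toy preference table: a hash table plus optional "mirror" callbacks.
class PrefTable {
  constructor() {
    this.values = new Map();   // name -> value (the hash table)
    this.mirrors = new Map();  // name -> function that updates a variable
  }

  // Normal access: a hash table lookup on every read.
  get(name, fallback) {
    return this.values.has(name) ? this.values.get(name) : fallback;
  }

  set(name, value) {
    this.values.set(name, value);
    const update = this.mirrors.get(name);
    if (update) {
      update(value); // keep the mirrored variable in step
    }
  }

  // Register a variable to mirror the pref, seeding it with the default.
  addMirror(name, defaultValue, update) {
    this.mirrors.set(name, update);
    this.set(name, this.get(name, defaultValue));
  }
}

const prefs = new PrefTable();

// Plain variable read by hot code paths; no lookup is needed to read it.
let handleViewport = false;
prefs.addMirror("embedlite.azpc.handle.viewport", true, v => {
  handleViewport = v;
});
```

Reading handleViewport is then just a variable access, while a call such as prefs.get("keyword.enabled", true) pays for a lookup each time. That trade-off only matters for values that are read often.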

I think static prefs are going to work better in the majority of cases for what we need because many of them will actually be referenced quite regularly. The one case that doesn't fit this pattern, somewhat ironically given it's the one that sparked this journey, is keyword.enabled. That's only read on the few occasions when something is entered into the address bar.

But for the APZ preferences, I've created static preferences in StaticPrefList.yaml like this:
-   name: embedlite.azpc.handle.viewport
    type: bool
    value: true
    mirror: always

-   name: embedlite.azpc.handle.singletap
    type: bool
    value: false
    mirror: always
[...]
These will get pulled into the build process, which will generate a header file "StaticPrefs_embedlite.h". I've added this to the top of EmbedLiteViewChild.cpp, which should then allow me to write code like this:
  if (StaticPrefs::embedlite_azpc_handle_viewport()) {
    mHelper->UpdateFrame(aRequest);
  }
These will be super-efficient. For the keyword search preference I've done something slightly different, like this:
  if (Preferences::GetBool("keyword.enabled", true)) {
    flags |= nsIWebNavigation::LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP;
    flags |= nsIWebNavigation::LOAD_FLAGS_FIXUP_SCHEME_TYPOS;
  }
One final thing to do with these preferences is note that they also appear in embedding.js. The libpref documentation has this to say about this:
 
If a static pref is defined in both StaticPrefList.yaml and a pref data file, the latter definition will take precedence. A pref shouldn’t appear in both StaticPrefList.yaml and all.js, but it may make sense for a pref to appear in both StaticPrefList.yaml and an app-specific pref data file such as firefox.js.

This makes me wonder whether embedding.js is more like all.js, or more like firefox.js. My instinct says the latter, but even then this explanation implies that there are only some situations in which it might make sense to include them in an app-specific file. An obvious case might be when different apps need different default values.

I think there's no need to store them in embedding.js. Removing them from this won't prevent the user changing them for storage within their profile, so I don't see any downsides. It just removes one more place to have to maintain the details. So I've also deleted the preferences from embedding.js.
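
The reasoning can be sketched as a layered lookup. This is a deliberately simplified model written for this post, not libpref's real data structures: resolvePref() and the three layer names are hypothetical, and the only rules it encodes are the ones quoted above (a pref data file beats StaticPrefList.yaml) plus the assumption that a value stored in the user's profile beats both.

```javascript
// Resolve a pref by checking layers from highest to lowest priority.
// The layer names are illustrative, not libpref terminology.
function resolvePref(name, { staticDefaults, dataFile, userProfile }) {
  for (const layer of [userProfile, dataFile, staticDefaults]) {
    if (Object.prototype.hasOwnProperty.call(layer, name)) {
      return layer[name];
    }
  }
  return undefined;
}

// Default as declared in StaticPrefList.yaml.
const staticDefaults = { "embedlite.azpc.handle.viewport": true };
// embedding.js after the entry has been deleted from it.
const emptyDataFile = {};
// A value the user has changed and stored in their profile.
const changedProfile = { "embedlite.azpc.handle.viewport": false };
```

With the entry removed from the data file the static default is what gets used, and a value set in the profile still wins over it, so under these assumptions nothing is lost by deleting the duplicate.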

Apart from these preferences in EmbedLiteViewChild.cpp there are a few other preferences that need dealing with as a consequence of my earlier destruction too:
$ grep -rIn "AddBoolVarCache" *
embedding/embedlite/utils/BrowserChildHelper.cpp:82:
  //Preferences::AddBoolVarCache(&sPostAZPCAsJsonViewport, "embedlite.azpc.json.viewport", false);
embedding/embedlite/embedthread/EmbedLiteCompositorBridgeParent.cpp:68:
  //Preferences::AddBoolVarCache(&mUseExternalGLContext,
embedding/embedlite/embedshared/nsWindow.cpp:63:
  //Preferences::AddBoolVarCache(&sUseExternalGLContext,
embedding/embedlite/embedshared/nsWindow.cpp:65:
  //Preferences::AddBoolVarCache(&sRequestGLContextEarly,
I've gone through each of these and also turned them into static prefs. With this process complete, the next step is to build the code and see whether this produces good results. The build process will need to generate the header files from StaticPrefList.yaml for this to work, otherwise the C++ code won't compile. So doing the build is going to be essential and likely it'll need a full rebuild as well.

[...]

Except the build has failed:
 7:15.53 ${PROJECT}/gecko-dev/modules/libpref/init/StaticPrefList.yaml: error:
 7:15.53   `embedlite.azpc.handle.viewport` pref must come before
     `zoom.minPercent` pref
It turns out the preferences have to be in alphabetical order based on the first word in the name. Interesting! So I've reordered the preferences and set the build going again.
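
For illustration, here's a small JavaScript sketch of the ordering rule as I've described it. The real check is part of the gecko build scripts and I haven't reproduced it here; checkPrefOrder() is a hypothetical helper, the error text is modelled on the build output above, and "editor.example" is a made-up pref name.

```javascript
// Return an error string for the first out-of-order pref, or null if the
// list is acceptable. The group is taken to be the text before the first dot.
function checkPrefOrder(names) {
  const groupOf = name => name.split(".")[0];
  for (let i = 1; i < names.length; i++) {
    if (groupOf(names[i]) < groupOf(names[i - 1])) {
      return "`" + names[i] + "` pref must come before `" +
             names[i - 1] + "` pref";
    }
  }
  return null;
}

// New prefs appended to the end of the file, as in my first attempt.
const appended = ["zoom.maxPercent", "zoom.minPercent",
                  "embedlite.azpc.handle.viewport"];
// The same prefs moved into their alphabetical group position.
const reordered = ["editor.example", "embedlite.azpc.handle.viewport",
                   "embedlite.azpc.handle.singletap", "zoom.minPercent"];
```

Only the first word is compared in this sketch, so within the embedlite group the viewport entry can happily sit before the singletap one.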

So that has to be it for today. Tomorrow morning, once the build has hopefully completed, we can see whether it works or not.

Before signing off I want to note down one other point that I think is interesting. It's something that I've wondered for a long time but never got around to checking. If we look in the embedding.js file we see lots of lines like this:
pref("dom.w3c_touch_events.enabled", 1);
pref("dom.w3c_touch_events.legacy_apis.enabled", true);
[...]
I've always been curious to know what these lines actually do. In my exploration around the code today I noticed that this function is defined in prefcalls.js. So mystery solved, here's what this is doing:
function pref(prefName, value) {
  try {
    var prefBranch = getPrefBranch();

    if (typeof value == "string") {
      if (gIsUTF8) {
        prefBranch.setStringPref(prefName, value);
        return;
      }
      prefBranch.setCharPref(prefName, value);
    } else if (typeof value == "number") {
      prefBranch.setIntPref(prefName, value);
    } else if (typeof value == "boolean") {
      prefBranch.setBoolPref(prefName, value);
    }
  } catch (e) {
    displayError("pref", e);
  }
}
That all rather makes sense. Okay, that really is it for today.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.
3 Dec 2023 : Day 96 #
This morning I'm on the train headed for work. It's a very cold and frosty morning — it could almost be Finland — and my development environment isn't ideal for SSH-ing into my phone, but it's not busy so I've experienced worse!
 
SSH-ing into my phone from my laptop on the train

We finished up yesterday with the realisation that search provider installation was being blocked by changes that prevented the OpenSearch configuration being loaded from disk. The download scheme was being checked by a regex that only allowed "http" or "https" to be used.

This morning I overrode the regex to allow the "file" scheme to be used. Here's the resulting output.
Frame script: embedhelper.js loaded
SEARCH: getFixupURIInfo() fixupFlags: 8
SEARCH: engine num: 0
SEARCH: addOpenSearchEngine():
        file:///usr/lib64/mozembedlite/chrome/embedlite/content/bing.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri:
        file:///usr/lib64/mozembedlite/chrome/embedlite/content/bing.xml
SEARCH: _install: Downloading engine from:
        file:///usr/lib64/mozembedlite/chrome/embedlite/content/bing.xml
This shows that the code is getting past the check, but not much else. And crucially, the search providers still aren't being installed correctly. We're also getting the following errors a little later:
[JavaScript Error: "TypeError: this._engineToUpdate is null"
    {file: "resource://gre/modules/OpenSearchEngine.jsm" line: 138}]
_onLoad@resource://gre/modules/OpenSearchEngine.jsm:138:5
onStopRequest@resource://gre/modules/SearchUtils.jsm:92:10
This error is actually coming from one of the debug output lines I added myself:
    dump("SEARCH: _onLoad: Downloaded engine from:" + this._engineToUpdate.name + "\n");
I've removed the reference to this._engineToUpdate from this line and now it passes correctly.

But not only that, now the search functionality is back too!

This is great, but I'm a little nervous about the changes: there were some big changes made to remove the "file" scheme and I'll need to look carefully into why this was and whether it's safe to work around it. The changes I've made to allow it are minimal and I'd feared I'd have to restore far more of the changes in changeset D104010.

The good news is I can stop debugging my phone while I travel. Looking like someone from Mr Robot on the train isn't something I'm totally comfortable with! I'll return to these changes and put together a proper fix this evening.
 
Two phone screenshots; on the left the search shows an error; on the right a bing search for the phrase "test search"

[...]

It's evening now and time to turn the hacky changes I made earlier into a proper fix. Before I do that I want to reflect on a discussion with Ville Nummela on Mastodon. Ville — vige — is one of my ex-colleagues from Jolla and, you won't be surprised to hear, someone I have a lot of time and respect for. Ville pointed out that OpenSearch is still supported on Firefox, so if it's causing trouble like this for our version for the sailfish-browser, shouldn't it cause similar issues on Firefox?
 
My understanding is that Firefox still allows adding OpenSearch providers from websites - that's why https is still there. Do they reload the providers via https every time?

This is such a good question and something I should have thought about and investigated over the last few days. It turns out there is a good answer and that, as you might expect, Firefox isn't downloading the OpenSearch provider XML files every time it starts up.

It took me a while to dig through the code, but the answer is in the SearchSettings.jsm file from upstream gecko. This provides two crucial methods: SearchSettings._write() and SearchSettings.get(). The first of these writes the OpenSearch providers out to disk, not in their original XML format, but rather in a compressed JSON format. The second of these reads them back in again from disk.

The reason why it's _write() and not write() is that it's not supposed to be called directly. Instead it's called automatically via _delayedWrite() when something changes:
  // nsIObserver
  observe(engine, topic, verb) {
    switch (topic) {
      case SearchUtils.TOPIC_ENGINE_MODIFIED:
        switch (verb) {
          case SearchUtils.MODIFIED_TYPE.ADDED:
          case SearchUtils.MODIFIED_TYPE.CHANGED:
          case SearchUtils.MODIFIED_TYPE.REMOVED:
            this._delayedWrite();
            break;
        }
        break;
[...]
Here are the first few lines including docstrings of each of the methods for reference. First for writing out to disk:
  /**
   * Writes the settings to disk (no delay).
   */
  async _write() {
    if (this._batchTask) {
      this._batchTask.disarm();
    }

    let settings = {};

    // Allows us to force a settings refresh should the settings format change.
    settings.version = SearchUtils.SETTINGS_VERSION;
    settings.engines = [...this._searchService._engines.values()];
    settings.metaData = this._metaData;
[...]
Second for reading in from disk. This is the part that replaces the code that we had to patch to allow loading of the XML file:
  /**
   * Reads the settings file.
   *
   * @param {string} origin
   *   If this parameter is "test", then the settings will not be written. As
   *   some tests manipulate the settings directly, we allow turning off writing to
   *   avoid writing stale settings data.
   * @returns {object}
   *   Returns the settings file data.
   */
  async get(origin = "") {
    let json;
    await this._ensurePendingWritesCompleted(origin);
    try {
      let settingsFilePath = PathUtils.join(
        await PathUtils.getProfileDir(),
        SETTINGS_FILENAME
      );
[...]
In both of these snippets of code we can see this important constant, which is the name of the file to save out or load in and which we can see initialised at the top of the same file:
const SETTINGS_FILENAME = "search.json.mozlz4";
If we look inside the ESR 91 settings folder we can see that this file is indeed being saved out:
$ ls -l ~/.local/share/org.sailfishos/browser/.mozilla/search.json.mozlz4
-rw-rw-r--    1 defaultu defaultu     17715 Dec  2 20:03 search.json.mozlz4
We can compare this to the XML files that sailfish-browser is saving out to be loaded at start up to reinitialise the OpenSearch providers:
$ ls -l ~/.local/share/org.sailfishos/browser/searchEngines/
total 16
-rw-rw-r--    1 defaultu defaultu      1949 Dec  2 19:08 duckduckgo.com.xml
-rw-rw-r--    1 defaultu defaultu       688 Dec  2 19:05 forum.sailfishos.org.xml
-rw-rw-r--    1 defaultu defaultu       549 Nov 26 14:48 github.com.xml
-rw-rw-r--    1 defaultu defaultu      3493 Dec  2 19:06 www.openstreetmap.org.xml
In conclusion, Firefox is using a different mechanism to save out and load in the search providers. The file is being saved out to our profile, but at no point are we loading it back in. Updating the code to load it in using the Firefox approach would be a nice fix: it would mean we wouldn't have to patch the XML loading code, it would mean less to maintain in future and it would also mean we're not saving out the same data to two different places.

This also helps answer my earlier question about why loading of these XML files was restricted to the https scheme. The good news is that the change I've made is probably fine, it just isn't being used by Firefox any more. But I've also created ticket #1048 suggesting that we update our code to move to the Firefox approach.
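
As a rough picture of what moving to the Firefox approach involves, here's a sketch of the save and load round trip using plain JSON. It's an approximation under several assumptions: the version, engines and metaData fields come from the _write() snippet above, but SKETCH_SETTINGS_VERSION, serialiseSettings(), parseSettings() and the "current" metadata key are hypothetical, the compression step is skipped entirely, and discarding settings on a version mismatch is my guess at what "force a settings refresh" means.

```javascript
const SKETCH_SETTINGS_VERSION = 1; // arbitrary value for this sketch

// Serialise engines in the same overall shape that _write() builds.
function serialiseSettings(engines, metaData) {
  return JSON.stringify({
    version: SKETCH_SETTINGS_VERSION,
    engines: [...engines.values()],
    metaData,
  });
}

// Read settings back, discarding them if the format version differs.
function parseSettings(text) {
  const settings = JSON.parse(text);
  if (settings.version !== SKETCH_SETTINGS_VERSION) {
    return { version: SKETCH_SETTINGS_VERSION, engines: [], metaData: {} };
  }
  return settings;
}

// Engines keyed by name, loosely mirroring _searchService._engines.
const engines = new Map([
  ["DuckDuckGo", { name: "DuckDuckGo" }],
  ["GitHub", { name: "GitHub" }],
]);
const restored = parseSettings(
  serialiseSettings(engines, { current: "DuckDuckGo" })
);
```

The point of the sketch is that the engine list survives a restart from a single file, with no need to reload each provider's XML.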

Now back to the changes we've already made. The fixes I added to allow loading using the file scheme as well as https are still just hacked onto my phone. With these added the output from the search installation at start up looks much cleaner:
SEARCH: engine num: 6
SEARCH: engine name: Bing
SEARCH: engine name: DuckDuckGo
SEARCH: engine name: GitHub
SEARCH: engine name: Yandex
SEARCH: engine name: Google
SEARCH: engine name: Yahoo
SEARCH: get defaultEngine: 
SEARCH: get defaultEngine: 
SEARCH: EmbedLiteSearchEngine setdefault received
SEARCH: EmbedLiteSearchEngine engine: DuckDuckGo
After removing all the debug output I'm left with just four simple changes. First in EmbedLiteViewChild.cpp we have this:
-static bool sAllowKeyWordURL = false;
+static bool sAllowKeyWordURL = true;
In OpenSearchEngine.jsm, mimicking the approach originally used to allow both "http" and "ftp" schemes, I've adjusted it to be both "http" and "file" schemes:
-    if (!/^https?$/i.test(loadURI.scheme)) {
+    if (!/^(?:https?|file)$/i.test(loadURI.scheme)) {
Both of these changes are in the gecko-dev repository, but there are also changes needed in the embedlite-components repository. In EmbedLiteSearchEngine.js:
-            Services.search.addEngine(data.uri, null, data.confirm).then(
+            Services.search.addOpenSearchEngine(data.uri, null).then(
And finally in PromptService.js:
+const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");
[...]
-this.NSGetFactory = XPCOMUtils.generateNSGetFactory([PromptService, AuthPromptAdapterFactory]);
+this.NSGetFactory = ComponentUtils.generateNSGetFactory([PromptService, AuthPromptAdapterFactory]);
Apart from the first instance all of these changes are to the JavaScript. Since I've already compiled the single small C++ change I can, in effect, make all of these changes now without having to rebuild gecko.

Nevertheless to ensure everything has been changed correctly I'll leave the build running overnight and install it tomorrow morning.

One thing you may notice is that the sAllowKeyWordURL isn't actually how it's supposed to be for a proper fix. Rather than setting this statically to be true we should really be reading the value of the keyword.enabled preference. In fact there are a whole bunch of preferences like this that we need to fix.

Because of this it makes more sense to do them all together rather than finding a solution for just this particular case. So maybe fixing all of these preferences would make a sensible task to start tomorrow.

Step by step, slowly but surely, the browser functionality is improving, and we're getting to the point where the browser is quite usable. But clearly there is still more to be done.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.
2 Dec 2023 : Day 95 #
Unfortunately I don't have as much time today as I've had the last few days to work on Gecko, but hopefully there's still scope to make some meaningful progress.

I'm still trying to track down the address bar issue that's preventing Web search from working. In theory, when you enter a phrase that's clearly not a URL (or begins with a question mark) the browser should delegate the action to your chosen search provider (as configured in the Settings).

Over the last couple of days we got to the point where a number of issues have been fixed (so no overt error messages) but during initialisation the list of search engines is empty. It should be full of stuff, so today I want to try to find out why.

So, I've added a load of debug dumps to various important files related to the OpenSearch capability: SearchService.jsm, SearchUtils.jsm and OpenSearchEngine.jsm. These will generate debug output in the log (output to the console) so that we can see what's happening when and in what sequence, as well as what's not happening.

From this we can clearly see that addOpenSearchEngine() is being called for each of the stored search providers:
SEARCH: getFixupURIInfo() fixupFlags: 8
CONSOLE message:
[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: "{file: "file:///usr/lib64/
    mozembedlite/components/EmbedLiteChromeManager.js" line: 213}]
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteChromeManager.js:213:7
SEARCH: engine num: 0
SEARCH: addOpenSearchEngine(): file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/bing.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/bing.xml
SEARCH: addOpenSearchEngine(): file:///home/defaultuser/.local/share/
        org.sailfishos/browser/searchEngines/duckduckgo.com.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///home/defaultuser/.local/share/
        org.sailfishos/browser/searchEngines/duckduckgo.com.xml
SEARCH: addOpenSearchEngine(): file:///home/defaultuser/.local/share/
        org.sailfishos/browser/searchEngines/github.com.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///home/defaultuser/.local/share/
        org.sailfishos/browser/searchEngines/github.com.xml
SEARCH: addOpenSearchEngine(): file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/google.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/google.xml
SEARCH: addOpenSearchEngine(): file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/yahoo.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/yahoo.xml
SEARCH: addOpenSearchEngine(): file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/yandex.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/yandex.xml
JavaScript error: chrome://embedlite/content/embedhelper.js, line 259:
    TypeError: sessionHistory is null
CONSOLE message:
[JavaScript Error: "TypeError: sessionHistory is null"
    {file: "chrome://embedlite/content/embedhelper.js" line: 259}]
receiveMessage@chrome://embedlite/content/embedhelper.js:259:29
We can also see from the stack where it's being called from, but that's not really the interesting part. The interesting part is what's not being output. Calls to addOpenSearchEngine() should almost immediately call the following:
      errCode = await new Promise(resolve => {
        engine._install(engineURL, errorCode => {
          resolve(errorCode);
        });
      });
I've added a bunch of dump outputs to the _install() method like this:
  _install(uri, callback) {
    dump("SEARCH: _install: uri: " + uri + "\n");
    let loadURI = uri instanceof Ci.nsIURI ? uri : SearchUtils.makeURI(uri);
    if (!loadURI) {
      throw Components.Exception(
        loadURI,
        "Must have URI when calling _install!",
        Cr.NS_ERROR_UNEXPECTED
      );  
    }   
    if (!/^https?$/i.test(loadURI.scheme)) {
      throw Components.Exception(
        "Invalid URI passed to SearchEngine constructor",
        Cr.NS_ERROR_INVALID_ARG
      );  
    }

    logConsole.debug("_install: Downloading engine from: ", loadURI.spec);
    dump("SEARCH: _install: Downloading engine from:" + loadURI.spec + "\n");
So the first dump() inside the _install() method is generating output, but the second isn't. Why not? I'll put some extra dump output inside those conditionals to see if the exceptions are firing.
SEARCH: addOpenSearchEngine(): file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/bing.xml
SEARCH: stack start
addOpenSearchEngine@resource://gre/modules/SearchService.jsm:1869:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:56:29
SEARCH: stack end
SEARCH: _install: uri: file:///usr/lib64/mozembedlite/chrome/
        embedlite/content/bing.xml
SEARCH: _install: Invalid URI passed to SearchEngine constructor
And now things are beginning to take shape.

The file itself exists:
$ file /usr/lib64/mozembedlite/chrome/embedlite/content/bing.xml
/usr/lib64/mozembedlite/chrome/embedlite/content/bing.xml:
    exported SGML document, ASCII text, with very long lines (1815)
But this bit of code is causing the failure here:
if (!/^https?$/i.test(loadURI.scheme)) {
  dump("SEARCH: _install: Invalid URI passed to SearchEngine constructor\n");
  throw Components.Exception(
    "Invalid URI passed to SearchEngine constructor",
    Cr.NS_ERROR_INVALID_ARG
  );
}
The sailfish-browser code is passing in a URL that's a local file here, using the file scheme, but the gecko code is performing a regex test that only accepts the http and https schemes for downloading the data. The obvious question that springs to mind is what changed between ESR 78 and ESR 91 to cause this?

In ESR 78 there is no OpenSearchEngine.jsm file because the ESR 91 file was renamed from SearchEngine.jsm. The two are close enough to be able to compare though. Here's the equivalent piece of code from ESR 78:
    switch (optionsURI.scheme) {
      case "https":
      case "http":
      case "ftp":
      case "data":
      case "file":
      case "resource":
      case "chrome":
        uri = optionsURI;
        break;
      default:
        throw Components.Exception(
          "Invalid URI passed to SearchEngine constructor",
          Cr.NS_ERROR_INVALID_ARG
        );
    }
That's quite a difference. Let's find out why this was changed.

Checking using git blame shows this was the most recent change to this line:
$ git log -1 3760df94f0b94
commit 3760df94f0b94d8aab5ed10c6177e8fad6345cc3
Author: Valentin Gosu <valentin.gosu@gmail.com>
Date:   Wed Apr 7 10:20:58 2021 +0000

    Bug 1692018 - Remove support for installing OpenSearch engines via ftp
    r=Standard8
    
    Differential Revision: https://phabricator.services.mozilla.com/D107790
With the actual change looking like this:
--- a/toolkit/components/search/OpenSearchEngine.jsm
+++ b/toolkit/components/search/OpenSearchEngine.jsm
@@ -91,7 +91,7 @@ class OpenSearchEngine extends SearchEngine {
         Cr.NS_ERROR_UNEXPECTED
       );
     }
-    if (!/^(?:https?|ftp)$/i.test(loadURI.scheme)) {
+    if (!/^https?$/i.test(loadURI.scheme)) {
       throw Components.Exception(
         "Invalid URI passed to SearchEngine constructor",
         Cr.NS_ERROR_INVALID_ARG
We'll need to go back a bit further than this. Here's the change that happened before:
$ git log -1 269a6fbae9928
commit 269a6fbae9928d4fef5ca4cc9ee2b112b2772191
Author: Mark Banner <standard8@mozilla.com>
Date:   Wed Feb 10 18:12:08 2021 +0000

    Bug 1690750 - Simplify OpenSearchEngine to only allow loading engines from
                  protocols where users can load them from. r=mak
    
    The urls where an OpenSearch engine can be loaded from are already limited
    in LinkHandlerChild. This is cleaning up and simplifying what the
    OpenSearchEngine allows, and as a result allows the load path handling to be
    greatly simplified.
    
    The test changes are due to no longer allowing chrome or file protocols. For
    future, we probably want to move away from OpenSearch for most of these, but
    the changes will make it easier to find the places to update.
    
    Differential Revision: https://phabricator.services.mozilla.com/D104010
This is a much more significant change, which includes the removal of the file scheme.

This is all useful stuff, even if it's not yet a solution to the problem. I'm going to have to leave it here for today, but will return to this tomorrow. The next thing will be to try letting the file protocol through the regex to see what happens. I'm not expecting that to be enough, and if not, we'll have to dig into the whole of the D104010 diff to find out what's changed and whether we'll need to revert it, or find some other solution.
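As a sketch of that first experiment, the check could be relaxed along these lines. The widened pattern is an assumption about what such a patch might look like, not a tested fix, and it may well not be sufficient on its own.

```javascript
// Current ESR 91 check: only http and https get through.
const CURRENT_CHECK = /^https?$/i;

// Hypothetical relaxed check that also lets the file scheme through.
const RELAXED_CHECK = /^(?:https?|file)$/i;

for (const scheme of ["https", "http", "file", "ftp", "chrome"]) {
  console.log(
    scheme.padEnd(6) +
      " current: " + CURRENT_CHECK.test(scheme) +
      " relaxed: " + RELAXED_CHECK.test(scheme)
  );
}
```

Keeping the pattern anchored at both ends matters here: without the ^ and $ a made-up scheme such as "files" would slip through as well.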

The underlying issue here seems to be the phasing out of OpenSearch as a way of managing search providers. So we should probably look seriously at the Web Extension alternative as well. But that's for tomorrow.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.
1 Dec 2023 : Day 94 #
Good morning! It's a bright, crisp, cold morning in the UK today. Bright sun but also frost on the ground. And I'm feeling quite energised about getting to the bottom of this address bar search issue that I've been working on over the last couple of days.

If you've been following along you'll recall that we got to the point where the URIFixup class is requesting the search engine to use from the SearchService service. It goes like this:
    // Try falling back to the search service's default search engine
    // We must use an appropriate search engine depending on the private
    // context.
    let engine = isPrivateContext
      ? Services.search.defaultPrivateEngine
      : Services.search.defaultEngine;
To understand what's going on there I'm going to need a better way to debug the code. Gecko and EmbedLite both have their own logging mechanisms. EmbedLite has the EMBED_CONSOLE environment variable while most of gecko uses MOZ_LOG. Ultimately the EmbedLite logging is implemented using the dump() function. We can see this in the jsscripts/Logger.js file in embedlite-components.

Not all of these logging approaches work well in every situation, but I did some testing and the dump() function works nicely in all the places I'm interested in. Now I wouldn't use it for real logging, but I'm just doing debugging so I need an approach that will always work. There's no need for me to be able to activate and deactivate it later, because I'm going to remove all usage of it before I commit anything. So dump() is the perfect solution for my needs. I just have to remember to put newlines at the end of everything.
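One way to avoid forgetting those newlines is a tiny wrapper. This is only a sketch: searchLog() and searchLogStack() are hypothetical names rather than anything in gecko or embedlite-components, and the fallback to stdout is there purely so the helper can be tried outside the browser.

```javascript
// Hypothetical debugging helper; not part of gecko or embedlite-components.
// Inside gecko dump() is a global. Elsewhere, fall back to stdout so the
// helper can be tried out under Node.js.
const defaultSink =
  typeof dump === "function" ? dump : (text) => process.stdout.write(text);

function searchLog(message, sink = defaultSink) {
  // dump() doesn't add a newline itself, so always append one here.
  sink("SEARCH: " + message + "\n");
}

function searchLogStack(sink = defaultSink) {
  searchLog("stack start", sink);
  const stack = String(new Error().stack);
  // Make sure the stack output is newline-terminated too.
  sink(stack.endsWith("\n") ? stack : stack + "\n");
  searchLog("stack end", sink);
}

searchLog("engine num: " + 0);
```

The common prefix also makes it easy to filter the debug lines out of the rest of the console output.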

To complicate matters I want to compare the JavaScript execution between ESR 78 and ESR 91. That will require me to annotate on two devices. I'm working with omni.ja which we discussed on day 86. It's a bit of a pain to work with because the contents have to be unpacked, so that they can then be edited, repacked and reinstalled for every change. It's a messy process compared to being able to edit something in-place.

So for my first task today I'm going to create a helper shell script to make this process a little easier. The idea is to allow the omni.ja archive in the gecko install directory to be unpacked to a working folder where I can edit it. The script will then package it back up and reinstall it so that the changes take effect. Doing this cleanly with a single command to pack and unpack will make my life that little bit easier.

Here's what I've come up with.
#!/bin/bash

set -e
OPTION=$1
INSTALL="/usr/lib64/xulrunner-qt5-91.9.1"
WORKSPACE="./omni"
STARTUPCACHE="${HOME}/.local/share/org.sailfishos/browser/.mozilla/startupCache/"

if [ "${OPTION}" == "" ]; then
  OPTION="no action provided"
fi

echo "Omni action: ${OPTION}"

function unpack() {
  echo "Unpacking from: ${INSTALL}"
  echo "Unpacking to:   ${WORKSPACE}"
  echo ""
  if [ -d ${WORKSPACE} ]; then
    rm -r ${WORKSPACE}
  fi
  mkdir ${WORKSPACE}
  unzip -q ${INSTALL}/omni.ja -d ${WORKSPACE}
}

function pack() {
  echo "Packing from: ${WORKSPACE}"
  echo "Packing to:   ${INSTALL}"
  echo ""
  # Use the configured workspace rather than a hard-coded directory name
  cd "${WORKSPACE}"
  zip -qr9XD ../omni.ja *
  cd ..
  devel-su mv omni.ja "${INSTALL}/"
  rm -rf "${STARTUPCACHE}"
}

function help() {
  echo "Please pass in a parameter of either \"pack\" or \"unpack\"."
  echo ""
}

case ${OPTION} in
  "unpack") unpack;;
  "pack") pack;;
  *) help;;
esac
Notice that after packing up and installing the omni.ja back into the gecko directory the script also deletes the startupCache. This is a common gotcha that I've been caught by several times myself. When gecko initialises it performs some serialisation of its own default JavaScript scripts. This takes some time, so the library dumps the result out into the startupCache to avoid having to do this every time the browser is started. However, this can mean that even after a file in omni.ja has been changed, the edits won't affect execution, which is using the files in the startupCache instead. Deleting the startupCache forces the library to recreate it from the newly installed files.

I've copied this script over to both my phones and carefully set the INSTALL path appropriately on each. This should make this whole debugging process easier. If you see me using either of these commands in future examples, it'll be making use of this script:
$ ./omni.sh unpack
$ ./omni.sh pack
Now it's time to pepper the code with some dump() output. I've started with some output at the start of the _setEngineDefault() function in SearchService.jsm like this:
  /**
   * Helper function to set the current default engine.
   *
   * @param {boolean} privateMode
   *   If true, sets the default engine for private browsing mode, otherwise
   *   sets the default engine for the normal mode. Note, this function does not
   *   check the "separatePrivateDefault" preference - that is up to the caller.
   * @param {nsISearchEngine} newEngine
   *   The search engine to select
   */
  _setEngineDefault(privateMode, newEngine) {
    dump("SEARCH: _setEngineDefault(): " + newEngine.name + "\n");
    dump("SEARCH: stack start\n");
    dump(Error().stack);
    dump("SEARCH: stack end\n");
    this._ensureInitialized();
[...]
I've also added some debug output in the URIFixup.jsm file as well:
  getFixupURIInfo(uriString, fixupFlags = FIXUP_FLAG_NONE) {
    let isPrivateContext = fixupFlags & FIXUP_FLAG_PRIVATE_CONTEXT;

    dump("SEARCH: getFixupURIInfo() fixupFlags: " + fixupFlags.toString(16) + "\n");
And finally also into the EmbedLiteSearchEngine.js file:
  case "setdefault": {
    dump("SEARCH: EmbedLiteSearchEngine setdefault received\n");
    var engine = Services.search.getEngineByName(data.name);
    if (engine) {
      dump("SEARCH: EmbedLiteSearchEngine engine: " + engine.name + "\n");
      Services.search.defaultEngine = engine;
      dump("SEARCH: EmbedLiteSearchEngine set\n");
[...]
Now when I run the ESR 78 code I see this in the output:
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
SEARCH: getFixupURIInfo() fixupFlags: 8
SEARCH: EmbedLiteSearchEngine setdefault received
SEARCH: EmbedLiteSearchEngine engine: DuckDuckGo
SEARCH: _setEngineDefault(): DuckDuckGo
SEARCH: stack start
_setEngineDefault@resource://gre/modules/SearchService.jsm:2992:10
set defaultEngine@resource://gre/modules/SearchService.jsm:3076:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:78:15
SEARCH: stack end
SEARCH: EmbedLiteSearchEngine set
From this we can tell that the search engine is being set to DuckDuckGo. From the call stack we can see this is triggered from line 78 of EmbedLiteSearchEngine.js.

But there's nothing similar appearing in the output from ESR 91:
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
SEARCH: getFixupURIInfo() fixupFlags: 8
JavaScript error: chrome://embedlite/content/embedhelper.js, line 259:
    TypeError: sessionHistory is null
CONSOLE message:
[JavaScript Error: "TypeError: sessionHistory is null"
    {file: "chrome://embedlite/content/embedhelper.js" line: 259}]
receiveMessage@chrome://embedlite/content/embedhelper.js:259:29
In particular, we can see here that there's no "EmbedLiteSearchEngine setdefault received" message appearing. That means there's no embedui:search message being received by EmbedLiteSearchEngine with the topic set to setdefault.

When I change the search provider in the settings on the ESR 78 version I also see this output:
SEARCH: EmbedLiteSearchEngine setdefault received
SEARCH: EmbedLiteSearchEngine engine: Google
SEARCH: _setEngineDefault(): Google
SEARCH: stack start
_setEngineDefault@resource://gre/modules/SearchService.jsm:2992:10
set defaultEngine@resource://gre/modules/SearchService.jsm:3076:10
observe@file:///usr/lib64/mozembedlite/components/EmbedLiteSearchEngine.js:78:15
SEARCH: stack end
SEARCH: EmbedLiteSearchEngine set
Whereas on ESR 91 I see no output at all when changing search providers. There's clearly something going wrong here and certainly the search isn't going to work until this is working, so fixing this is next on my list of tasks.

So the question now is, what should be sending this embedui:search message with setdefault topic and why is it either not being sent, or not being received?

To figure this out we're going over to the sailfish-browser repository and the settingmanager.cpp file that's kept there. In that we can see a SettingManager::setSearchEngine() method which is one of the places where this embedui:search message is sent from.
(gdb) break SettingManager::setSearchEngine
Breakpoint 2 at 0x55555a7e30: file ../core/settingmanager.cpp, line 116.
(gdb) r
[...]
Thread 1 "sailfish-browse" hit Breakpoint 2, SettingManager::setSearchEngine
    (this=0x5555894f60) at ../core/settingmanager.cpp:116
116         if (m_searchEnginesInitialized) {
(gdb) p m_searchEnginesInitialized
$1 = false
Here we can see that the reason the event isn't being sent is because m_searchEnginesInitialized is set to false. Here's the bit of code that should be setting it to true just below in the same file (see the last line of this extract):
void SettingManager::handleObserve(const QString &message, const QVariant &data)
{
    const QVariantMap dataMap = data.toMap();
    if (message == QLatin1String("embed:search")) {
        QString msg = dataMap.value("msg").toString();
        if (msg == QLatin1String("init")) {
            const StringMap configs(
              OpenSearchConfigs::getAvailableOpenSearchConfigs());
            const QStringList configuredEngines = configs.keys();
            QStringList registeredSearches(dataMap.value(QLatin1String
              ("engines")).toStringList());
            QString defaultSearchEngine = dataMap.value(QLatin1String
              ("defaultEngine")).toString();
            m_searchEnginesInitialized = !registeredSearches.isEmpty();
As we can see from this, there may be two reasons why it's never being set to true. Either the embed:search message with msg parameter set to init isn't being received, or it is being received and the registeredSearches list is empty. Let's find out which.
(gdb) break SettingManager::handleObserve
Breakpoint 3 at 0x55555a8188: file ../core/settingmanager.cpp, line 132.
(gdb) r
Thread 1 "sailfish-browse" hit Breakpoint 3, SettingManager::handleObserve
    (this=0x55558acfe0, message=..., data=...) at ../core/settingmanager.cpp:132
132         const QVariantMap dataMap = data.toMap();
(gdb) printqs5static message
(Qt5 QString)0xffffe6d8 length=19: "embed:nsPrefChanged"
(gdb) c
Thread 1 "sailfish-browse" hit Breakpoint 3, SettingManager::handleObserve
    (this=0x55558acfe0, message=..., data=...) at ../core/settingmanager.cpp:132
132         const QVariantMap dataMap = data.toMap();
(gdb) printqs5static message
(Qt5 QString)0xffffe6d8 length=19: "embed:nsPrefChanged"
(gdb) c
You'll notice that I'm using a non-standard printqs5static command here. This is a very handy gdb macro from Konrad Rosenbaum for printing out Qt QString structures so that they show the string inside. Using this is far easier than working through all of the substructures manually.

We can see that the messages coming through are of type embed:nsPrefChanged, not the embed:search messages I'm interested in. Stepping through the code I continued to get these embed:nsPrefChanged messages. Eventually I got bored of stepping through them and moved my breakpoint inside the condition so it would only be hit if the message was the one I'm interested in:
(gdb) delete break 3
(gdb) b settingmanager.cpp:134
Breakpoint 4 at 0x55555a8288: file ../core/settingmanager.cpp, line 136.
(gdb) c
Thread 1 "sailfish-browse" hit Breakpoint 4, SettingManager::handleObserve
    (this=0x55558acfe0, message=..., data=...) at ../core/settingmanager.cpp:136
136                 const StringMap configs(OpenSearchConfigs::
                                            getAvailableOpenSearchConfigs());
(gdb) printqs5static message
(Qt5 QString)0xffffe6d8 length=12: "embed:search"
(gdb) printqs5static msg
(Qt5 QString)0xffffe4f8 length=4: "init"
(gdb) 
That's more like it. Now we can see that the init message is received. So what about that list of search engines? Well, as I continue to step through we can see something interesting.
(gdb) n
137                 const QStringList configuredEngines = configs.keys();
(gdb) n
JavaScript error: resource://gre/modules/SessionStoreFunctions.jsm, line 120:
    NS_ERROR_FILE_NOT_FOUND: 
CONSOLE message:
[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: "
    {file: "resource://gre/modules/SessionStoreFunctions.jsm" line: 120}]
SSF_updateSessionStoreForWindow@resource://gre/modules/
    SessionStoreFunctions.jsm:120:5
UpdateSessionStoreForStorage@resource://gre/modules/
    SessionStoreFunctions.jsm:54:35

138                 QStringList registeredSearches(dataMap.value(QLatin1String
                        ("engines")).toStringList());
(gdb) n
139                 QString defaultSearchEngine = dataMap.value(QLatin1String
                        ("defaultEngine")).toString();
(gdb) n
1066    /usr/include/qt5/QtCore/qstring.h: No such file or directory.
(gdb) n
140                 m_searchEnginesInitialized = !registeredSearches.isEmpty();
(gdb) n
108     /usr/include/qt5/QtCore/qlist.h: No such file or directory.
(gdb) p m_searchEnginesInitialized
$6 = false
(gdb) p registeredSearches
(gdb) p registeredSearches.p.d->end - registeredSearches.p.d->begin
$12 = 0
(gdb) n
144                 if (!m_searchEnginesInitialized) {
Note here that the registeredSearches.p.d->end - registeredSearches.p.d->begin calculation is returning the size of the registeredSearches list, as found in the Qt source code.

So, the message is being received, but the value that's supposed to contain the list of search engines is empty and moreover we're also getting the NS_ERROR_FILE_NOT_FOUND error midway through the process. That error may be a red herring (something random happening on a different thread), but we should keep it in mind just in case.
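To make the two failure cases concrete, here's a simplified JavaScript model of the receiving logic. The real handler is the C++ shown earlier and does more than this; the function signature and the state object are assumptions made purely for illustration.

```javascript
// Simplified JavaScript model of SettingManager::handleObserve(). The real
// code is C++; the state object and signature here are illustrative only.
function handleObserve(state, message, data) {
  if (message !== "embed:search" || data.msg !== "init") {
    // Case 1: the init message never arrives, so nothing changes.
    return state;
  }
  const registeredSearches = Array.isArray(data.engines) ? data.engines : [];
  // Case 2: the message arrives, but an empty list leaves the flag false.
  return { searchEnginesInitialized: registeredSearches.length > 0 };
}

const initial = { searchEnginesInitialized: false };

console.log(handleObserve(initial, "embed:nsPrefChanged", {}));
console.log(handleObserve(initial, "embed:search", { msg: "init", engines: [] }));
console.log(
  handleObserve(initial, "embed:search", { msg: "init", engines: ["DuckDuckGo"] })
);
```

Only the last call flips the flag to true. What gdb shows on ESR 91 corresponds to the second call: the message arrives, but with nothing in the list.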

So, onward, let's find out why the list being sent is empty.

The place where this gets sent from is the EmbedLiteSearchEngine.js file which, on receiving the infamous "profile-after-change" event sends out the list of search engines like this:
  case "profile-after-change": {
    Services.obs.removeObserver(this, "profile-after-change");
    Services.search.getEngines().then((engines) => {
      let engineNames = engines.map(function (element) {
        return element.name;
      });
      let enginesAvailable = (engines && engines.length > 0);
      var messg = {
        msg: "init",
        engines: engineNames,
        defaultEngine: enginesAvailable && Services.search.defaultEngine
                         ? Services.search.defaultEngine.name : null
      }
      Services.obs.notifyObservers(null, "embed:search", JSON.stringify(messg));
    });
    break;
  }
We should get some debug output printed from this file to find out what's going on. Here's the newly annotated version:
  case "profile-after-change": {
    dump("SEARCH: profile-after-change\n");
    Services.obs.removeObserver(this, "profile-after-change");
    Services.search.getEngines().then((engines) => {
      dump("SEARCH: engine num: " + engines.length + "\n");
      let engineNames = engines.map(function (element) {
        dump("SEARCH: engine name: " + element.name + "\n");
        return element.name;
      }); 
[...]
And here's the output we get from this:
UserAgentOverrideHelper app-startup
SEARCH: profile-after-change
Created LOG for EmbedPrefs
[...]
Frame script: embedhelper.js loaded
SEARCH: getFixupURIInfo() fixupFlags: 8
SEARCH: engine num: 0
So no search engines. There should be search engines.
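To see what that means for the message being sent, here's a sketch that pulls the payload construction out into a pure function. buildInitMessage() is a hypothetical name and the engine objects are stand-ins with just a name property; the logic follows the profile-after-change handler shown above.

```javascript
// Pure function mirroring how the "init" payload is assembled in the
// profile-after-change handler. The engine objects passed in are stand-ins
// carrying only a name property.
function buildInitMessage(engines, defaultEngine) {
  const engineNames = engines.map((engine) => engine.name);
  const enginesAvailable = engines.length > 0;
  return {
    msg: "init",
    engines: engineNames,
    defaultEngine: enginesAvailable && defaultEngine ? defaultEngine.name : null,
  };
}

// With no engines registered, as on ESR 91:
console.log(JSON.stringify(buildInitMessage([], null)));
// prints {"msg":"init","engines":[],"defaultEngine":null}

// With an engine registered, as on ESR 78:
const ddg = { name: "DuckDuckGo" };
console.log(JSON.stringify(buildInitMessage([ddg], ddg)));
// prints {"msg":"init","engines":["DuckDuckGo"],"defaultEngine":"DuckDuckGo"}
```

An empty engines array in that first payload is exactly what leaves m_searchEnginesInitialized set to false on the C++ side.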

We need to follow up on this line here:
Services.search.getEngines()
This takes us all the way back to SearchService.jsm. Sometimes there are these strange circles in the code, but we're not quite back to where we started: we're definitely making progress.

It's getting late here. I was really hoping to get this all sorted today, but it's turned out to be a deeper and more thorny problem than I'd anticipated. Not to worry, idle hands and all that; we're continuing to make important fixes as we work through the code. So we'll get there.

If you'd like to catch up on all the diary entries, they're all available on my Gecko-dev Diary page.