Personal Blog



5 most recent items

21 Jul 2024 : Day 295
Yesterday, you may recall, I didn't do any coding but rather started looking through some changes from Raine that I needed to incorporate into my local codebase. I'm continuing that today, in the hope of wrapping that up and then moving on to some of the other issues in the issue tracker on GitHub.

So I've reviewed and approved both of Raine's PRs. The first, which I already looked at yesterday, updates the sailfish-components-webview code to support the fromExternal flag. This change is needed to get the code to compile against the changes already made to gecko.

The second switches instances of XPCOMUtils for ComponentUtils since upstream has moved the generateNSGetFactory() method from the former to the latter. I'd already made this change in a few places, but Raine has propagated it across the entire codebase, which is great. After looking through and checking it, I've now approved it.

Those are the two PRs I can see, so I've moved on to the issues. I've marked Issue 1024 ("Restore WebRTC code to ESR 91") as completed. I addressed this issue in some of the most recent changes I made (between days 278 and 286). I've also closed Issue 1020 ("Fix ESR 91 rendering pipeline"). If you've been following along you'll know that this was one of the largest pieces of work I had to do, which I spent a total of 152 days on. First for the main browser rendering between days 51 and 83. Then for the WebView rendering between days 159 and 245. Then following on from this up to Day 277 for the WebGL. So I'm frankly pretty pleased to be able to finally close it.

To wrap up I've decided to start looking at Issue 1053, which is an epic test issue comprised of 22 separate things to check. It looks long, but at this stage I'm hoping I can go through and tick them all off pretty swiftly (we'll have to see whether this actually happens or not!).

The very first thing I try is private browsing mode as it looks like one of the easier things to check.

Immediately I hit a problem. You'll recall that yesterday I closed Issue 1051 ("Fix hang when calling window.setBrowserCover()") thinking it was fixed. Well, I was wrong: it's still there after all. So now I'll need to look into this further.

The problem manifests itself when you switch between normal and private browsing. As I discovered first time around, the bug causes a hang rather than a crash, which means I have to interrupt the execution in order to get a backtrace:
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fef866718 in pthread_cond_wait () from /lib64/
(gdb) bt
#0  0x0000007fef866718 in pthread_cond_wait () from /lib64/
#1  0x0000007fef979924 in QWaitCondition::wait(QMutex*, unsigned long) () from /
#2  0x0000007ff0afde08 in ?? () from /usr/lib64/
#3  0x0000007ff0b00aa8 in ?? () from /usr/lib64/
#4  0x0000007ff0b01270 in ?? () from /usr/lib64/
#5  0x0000007ff05503dc in QWindow::event(QEvent*) () from /usr/lib64/
#6  0x0000007ff0b307b8 in QQuickWindow::event(QEvent*) () from /usr/lib64/
#7  0x0000007fefb31144 in QCoreApplication::notify(QObject*, QEvent*) () from /
#8  0x0000007fefb312e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/
#9  0x0000007ff0546488 in QGuiApplicationPrivate::processExposeEvent(
    QWindowSystemInterfacePrivate::ExposeEvent*) () from /usr/lib64/
#10 0x0000007ff05470b4 in QGuiApplicationPrivate::processWindowSystemEvent(
    QWindowSystemInterfacePrivate::WindowSystemEvent*) ()
   from /usr/lib64/
#11 0x0000007ff05256e4 in QWindowSystemInterface::sendWindowSystemEvents(
    QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/
#12 0x0000007fe7882c4c in ?? () from /usr/lib64/
#13 0x0000007fef2cfd34 in g_main_context_dispatch () from /usr/lib64/
#14 0x0000007fef2cffa0 in ?? () from /usr/lib64/
#15 0x0000007fef2d0034 in g_main_context_iteration () from /usr/lib64/
#16 0x0000007fefb83a90 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop:
    :ProcessEventsFlag>) () from /usr/lib64/
#17 0x0000007fefb2f608 in QEventLoop::exec(QFlags<QEventLoop::
    ProcessEventsFlag>) () from /usr/lib64/
#18 0x0000007fefb371d4 in QCoreApplication::exec() () from /usr/lib64/
#19 0x000000555557bf88 in main (argc=<optimized out>, argv=<optimized out>) at 
Getting a backtrace this way is always unsatisfactory because you don't know whether you stopped on the correct thread, or somewhere totally unrelated in one of the other threads that's whirring away at the same time. Gecko has tens of threads running simultaneously (last time this happened it was 70 in total), so that adds a pretty big dose of uncertainty.

I looked at this originally back on Day 122 and kept a record of all of the backtraces for all of the threads back then. They weren't, I have to say, super enlightening and I don't plan to repeat the process again.

The way I fixed it at the time was by amending BrowserPage.qml to remove the following line:
    window.setBrowserCover(webView.tabModel)
Here's that line in context:
    // Use Connections so that target updates when model changes.
    Connections {
        target: AccessPolicy.browserEnabled && webView && webView.tabModel || 
        ignoreUnknownSignals: true
        // Animate overlay to top if needed.
        onCountChanged: {
            if (webView.tabModel.count === 0) {
Today I'm going to comment that line out again, just to see whether this is actually an error with private browsing or not. When I do this there are immediately a couple of further errors coming from the QML:
[W] unknown:130 - file:///usr/share/sailfish-browser/pages/components/
    Overlay.qml:130: Error: Insufficient arguments
Thankfully these QML errors are easier to fix. You may recall that some time back I was having trouble with DuckDuckGo rendering. The issue turned out to be related to the Sec-Fetch-* headers being set incorrectly. You can refer back to what I wrote on Day 140 for some of the background if you're interested. But the crucial point is that I had to add a fromExternal flag to various methods, so that it could be passed on for use with the Sec-Fetch-* headers.

It turns out I'd not added a parameter for this flag in all the places where it's called. To fix this I've added the fromExternal parameter to all of the methods that need it.

Following this change the private browsing now seems to work as expected. I still need to fix the hang, but that'll be a task for tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
20 Jul 2024 : Day 294
No coding today. Almost every day the first thing I do is dive into the code: reading it, writing it, debugging it. After 294 days (with a few gaps, admittedly) I've grown accustomed to this way of working. Diving into the code each morning has been comforting: a natural context switch between the harsh vagaries of the world and the reassuring consistency of the source.

(I'm overdoing it of course. Many times the code has been frustrating, incomprehensible and capricious. But I'm going to forget that today for the sake of a more convenient narrative).

The point I'm trying to get to is that, like a furry friend in a nineties Grolsch advert, the browser is not ready yet. But despite that, it feels like it's time for a step change. From focusing entirely on development to figuring out where things go from here and how the browser actually gets into users' hands.

So no coding today. Instead I'm going to be looking through the issues on GitHub, replying to comments and trying to figure out the best way to make packages available.

Things could fail disastrously at this point. If the browser goes out too early and doesn't have the functionality people are expecting, then all this effort will be wasted. On the other hand, I know the Sailfish community is driven by optimism and a sense of adventure: willing to try experimental software and also understanding that there are some sacrifices needed for the sake of user freedoms.

But the entire motivation for this work comes from an understanding that a phone needs a browser. It's critical infrastructure, not nice-to-have. So it really does have to work. In fact, ESR 91 has to work better than ESR 78 if the effort is going to have been worth it.

So let me talk briefly about what I've actually done today. Raine (rainemak) has gone to the trouble of testing out the code using the latest internal Jolla development build of Sailfish OS and the Sailfish SDK. When the next version of Sailfish OS is released, it'll be a version of this build and built using this SDK, so the browser has to work with it.

Raine had to make some changes to get it to work and I've had a look through them. The changes Raine made to embedlite-components aligned with changes that I'd also had to make, so that's an easy case. But for the gecko-dev repository things are a bit more unexpected. To get things to run Raine has also had to make various changes to the privileged JavaScript.

After looking through Raine's PRs I need to incorporate his changes into mine, update my SDK to the latest Early Access release and rebuild everything using that from scratch. Unfortunately I didn't get a chance to do that today, but I'll pick it up tomorrow and spend a bit more time on it.

Doing this non-coding work is important, but I find I'm not very efficient at it and it's harder to gauge progress compared to making commits. But I'm hoping it's also a sign of where things are at with the browser. It feels like this is the start of the descent towards our destination.

19 Jul 2024 : Day 293
Yesterday I was able to complete a couple of commits to fix some issues. I really want to get on to doing some non-coding work, but before I do, there's one last issue that I really need to address. Back on Day 126 I was having trouble with the cover of the browser app. I created a bug to remind myself to come back to it. Despite that I still managed to forget about it until I suddenly realised this morning.

I do need to fix it though. The bug prevented the cover app from showing a preview of the current page and it'd be rubbish if we had to release the browser without this working properly. On the other hand I'm really hoping that the issue is related to the offscreen rendering and that all of the changes made to fix that will magically allow the cover rendering to work again.

As the issue on GitHub explains:
The call to window.setBrowserCover(webView.tabModel) in BrowserPage.qml causes the user interface to hang. To reproduce:
  1. Run the browser.
  2. Select the "Tab" icon on the far left hand side of the toolbar.
  3. Select the "Private" browser tab icon (Batman).
  4. Notice that the app hangs (but doesn't crash) as the page switches from persistent to private mode.

The change made back then to work around the issue was the following:
git diff 51a72ef86825dcf0deca5ab3adc493247768eaee 
diff --git a/apps/browser/qml/pages/BrowserPage.qml b/apps/browser/qml/pages/
index 4373eeef..e0fb48c5 100644
--- a/apps/browser/qml/pages/BrowserPage.qml
+++ b/apps/browser/qml/pages/BrowserPage.qml
@@ -225,7 +225,7 @@ Page {
             if (webView.tabModel.count === 0) {
-            window.setBrowserCover(webView.tabModel)
+            //window.setBrowserCover(webView.tabModel)
This is a great place to start from because the change is in QML code. Not only is reversing the change just a matter of uncommenting a single line, it can also be done without any recompilation.
devel-su vim /usr/share/sailfish-browser/pages/BrowserPage.qml
Having made the change, the good news is that it does now work, immediately, without any further changes. This isn't entirely a shock. As I mentioned earlier, there was always a good chance that the existing changes would fix this. The cover rendering makes use of the offscreen render pipeline, which was broken before but is now fixed.

I also checked that WebGL still works when rendered on the cover as well. So with this done I just have to now commit my minimal changes and that should be enough to justify closing the issue on GitHub. For reference, here's the commit I'll be reversing:
$ git log -1 8a8c474abed7d95ac9b32cbfa6fe90275cf97631
commit 8a8c474abed7d95ac9b32cbfa6fe90275cf97631
Author: David Llewellyn-Jones <>
Date:   Fri Dec 29 11:10:03 2023 +0000

    [browser] Disable window.setBrowserCover
    The window.setBrowserCover() call causes the browser to hang when
    rendering with ESR 91. This needs to be fixed, but in the meantime
    disabling the call prevents the hang from occurring.
    This change prevents the browser page from being rendered on the app
With this change committed my plan is to take a break from coding for a bit now. I'm going to continue writing these diary entries and working on the browser, but I've built up a backlog of administration that I really need to deal with. This means reading through (and responding to) comments on the issue tracker, getting some test packages built for others to use, performing some tests, that kind of thing.

I enjoy working on the code and use it as a way to recover from the real world. Maybe even to hide from it sometimes. It allows me to focus on something controllable while ignoring all of the messy uncontrollable stuff going on elsewhere. I don't think I'm unusual in this. I'm sure other people find their own ways to escape from reality periodically.

But with a project like this it's also important to occasionally step back, take stock, make course corrections and deal with the points of intersection with reality. And that's what I plan to do tomorrow.

18 Jul 2024 : Day 292
The build I started yesterday had completed by the time I got up this morning. I'm still working on avoiding a crash in the hyphenation code with the latest build by adding an early return to the find_hyphen_values() method of the Hyphenator class in mapped_hyph/src/

This time our test page loads and renders without crashing. This tells us two things. First that the partial build wasn't covering the Rust code, as I suspected. Second that the problem is happening in the Rust code itself. This doesn't preclude the possibility that the error is originating outside the Rust code, but it does at least give us something to focus on.

The header of the find_hyphen_values() method mentions a couple of situations in which a panic can be triggered. It's worth taking a look at these since a panic could be responsible for bringing down the browser. One obvious situation in which a panic will be triggered from the Rust code is as a result of the following line, the first line of the method:
        assert!(values.len() >= word.len());
That's not a debug assert, so it'll apply in the release build as well. I can't just remove it, since although that might stop the crash happening at that point in the code, it'll leave the browser in an unsafe state. The assert is there for a reason after all. So instead I've turned it into a condition with an early return to avoid the possibility of a panic:
$ git diff -U1
diff --git a/third_party/rust/mapped_hyph/src/ b/third_party/rust/
index 848c93d25790..c205bb09359c 100644
--- a/third_party/rust/mapped_hyph/src/
+++ b/third_party/rust/mapped_hyph/src/
@@ -477,3 +477,6 @@ impl Hyphenator<'_> {
     pub fn find_hyphen_values(&self, word: &str, values: &mut [u8]) -> isize {
-        assert!(values.len() >= word.len());
+        if (values.len() < word.len()) {
+            return 0;
+        }
+        //assert!(values.len() >= word.len());
         values.iter_mut().for_each(|x| *x = 0);
However, as I read through the code and look again at the backtrace, this looks highly unlikely to be the problem. Although we don't know the exact line, it's the level() method that's at the top of the stack when the crash occurs. So actually we know the crash is happening there. Here's what the code there looks like:
    fn level(&self, i: usize) -> Level {
        let offset = u32::from_le_bytes(*array_ref!(self.0, FILE_HEADER_SIZE + 
    4 * i, 4)) as usize;
        let limit = if i == self.num_levels() - 1 {
        } else {
            u32::from_le_bytes(*array_ref!(self.0, FILE_HEADER_SIZE + 4 * i + 
    4, 4)) as usize
        debug_assert!(offset + LEVEL_HEADER_SIZE <= limit && limit <= 
        debug_assert_eq!(offset & 3, 0);
        debug_assert_eq!(limit & 3, 0);
        Level::new(&self.0[offset .. limit])
It's challenging to understand what might be going wrong here. It looks to me like this is where the requirement for self.0 to be valid (mentioned as the other reason for a potential panic) comes into play. I'm now wondering whether the problem might be an invalid hyphenator file being loaded in, or it being invalid for some other reason.
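
To illustrate why an invalid buffer matters here, below is a minimal standalone sketch of the same kind of lookup, written so that a bad offset produces None rather than a panic. It isn't the mapped_hyph code: HEADER_SIZE, read_u32_le() and level_bytes() are hypothetical names, the header size is made up, and I've assumed the last level runs to the end of the buffer. The underlying point is that debug_assert checks are compiled out of release builds by default, whereas indexing a slice with an out-of-range offset still panics.

```rust
use std::convert::TryInto;

// Hypothetical header size for this sketch, not the real FILE_HEADER_SIZE.
const HEADER_SIZE: usize = 8;

/// Reads a little-endian u32 at `pos`, or returns None if the buffer is too
/// short to contain it.
fn read_u32_le(data: &[u8], pos: usize) -> Option<usize> {
    let end = pos.checked_add(4)?;
    let bytes: [u8; 4] = data.get(pos..end)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes) as usize)
}

/// Returns the bytes belonging to level `i`, or None if the stored offsets
/// don't describe a valid range inside the buffer.
fn level_bytes(data: &[u8], i: usize, num_levels: usize) -> Option<&[u8]> {
    let offset = read_u32_le(data, HEADER_SIZE + 4 * i)?;
    // Assumption: the final level extends to the end of the buffer.
    let limit = if i + 1 == num_levels {
        data.len()
    } else {
        read_u32_le(data, HEADER_SIZE + 4 * (i + 1))?
    };
    // get() returns None for an out-of-range or reversed range,
    // where plain indexing would panic.
    data.get(offset..limit)
}
```

With a well-formed buffer level_bytes() returns the slice for the requested level; if an offset points past the end of the data it returns None, leaving the caller to decide what to do.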

So I've made an additional change to check that the hyphenator is valid when the call to hyphenate a string is made, like this:
$ git diff -U1
diff --git a/intl/hyphenation/glue/nsHyphenator.cpp b/intl/hyphenation/glue/
index c3b377767e3f..974596b17115 100644
--- a/intl/hyphenation/glue/nsHyphenator.cpp
+++ b/intl/hyphenation/glue/nsHyphenator.cpp
@@ -348,2 +348,6 @@ nsresult nsHyphenator::Hyphenate(const nsAString& aString,
+  if (!IsValid()) {
+    return NS_ERROR_FAILURE;
+  }
   bool inWord = false;
I've set it rebuilding with these changes, which will take at least a few hours to complete. In the meantime, I noticed the following error when loading the test page:
JavaScript error: resource://gre/modules/amInstallTrigger.jsm, line 43: 
    TypeError: is null
That error seems to be coming from the gecko code itself, rather than some JavaScript loaded by the page, so this is also something it would be worth fixing. I'm sure it's unrelated to the crash, but I may as well look into it while the build completes.

Here's the method inside amInstallTrigger.jsm where this error is being generated:
function RemoteMediator(window) {
  this._windowID = window.windowGlobalChild.innerWindowId; = window.docShell.messageManager;, this);

  this._lastCallbackID = 0;
  this._callbacks = new Map();
It looks likely that the call to get window.docShell.messageManager is returning null. Here's the code from nsDocShell which is supposed to return it:
nsDocShell::GetMessageManager(ContentFrameMessageManager** aMessageManager) {
  RefPtr<ContentFrameMessageManager> mm;
  if (RefPtr<BrowserChild> browserChild = BrowserChild::GetFrom(this)) {
    mm = browserChild->GetMessageManager();
  } else if (nsPIDOMWindowOuter* win = GetWindow()) {
    mm = win->GetMessageManager();
  return NS_OK;
So the error goes up, potentially, to EmbedLite code. It looks like this might be related to patch 0043, which patches this method and which hasn't yet been applied. The patch has the title "Get ContentFrameMessageManager via nsIDocShellTreeOwner" with the following description:
nsDocShellTreeOwner has a reference to WebBrowserChrome which in turn can return ContentFrameMessageManager from BrowserChildHelper.

So I'm going to try applying the patch; maybe that will fix this error?
$ git am --3way ../rpm/
Applying: Get ContentFrameMessageManager via nsIDocShellTreeOwner. JB#55336 
Nice: the patch applies cleanly! Now the GetMessageManager() method looks like this:
nsDocShell::GetMessageManager(ContentFrameMessageManager** aMessageManager) {
  RefPtr<ContentFrameMessageManager> mm;

  nsCOMPtr<nsIBrowserChild> bc;
  if (mTreeOwner) {
    bc = do_GetInterface(mTreeOwner);

  if (bc) {
  } else if (RefPtr<BrowserChild> browserChild = BrowserChild::GetFrom(this)) {
    mm = browserChild->GetMessageManager();
  } else if (nsPIDOMWindowOuter* win = GetWindow()) {
    mm = win->GetMessageManager();
  return NS_OK;
As we can see, an extra clause has been added to the condition, to catch the case where the mTreeOwner variable has an nsIBrowserChild XPCOM interface. If it does then the message manager will be extracted from there.

The hyphenation build has gone through and I've copied the packages over to my phone. That means I can now set this change building instead. In the meantime let's return to the hyphenation crash. I've tried a few different arrangements of things now, rebuilding the code and transferring the result over to my phone. What I've discovered is that adding a condition to check the validity of the hyphenator made no difference to the crash; so I've removed that.

However, rather unexpectedly, after converting the assert that tests whether the values return buffer is at least as large as the word string into a condition with an early return, the site now no longer crashes the browser.

Half of me is surprised, the other half isn't. I'm not surprised because this was clearly signposted code designed to cause a panic. On the other hand, I'm surprised because this shouldn't be happening; and also increasing the output buffer size didn't appear to help. So there may be something deeper going on here, but for now I'm going to take the win.
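
The difference between the two forms can be sketched in isolation. This is a minimal illustration rather than the real method: find_values_asserting() and find_values_guarded() are hypothetical stand-ins that only zero the output buffer and return 0.

```rust
/// Hypothetical stand-in using the original form: a plain assert! is kept in
/// release builds and panics if the output buffer is too small.
fn find_values_asserting(word: &str, values: &mut [u8]) -> isize {
    assert!(values.len() >= word.len());
    values.iter_mut().for_each(|x| *x = 0);
    0
}

/// Hypothetical stand-in using the amended form: the same check, but a
/// too-small buffer results in an early return instead of a panic.
fn find_values_guarded(word: &str, values: &mut [u8]) -> isize {
    if values.len() < word.len() {
        // Nothing is written to the buffer on this path.
        return 0;
    }
    values.iter_mut().for_each(|x| *x = 0);
    0
}
```

For a six-byte word and a two-byte buffer, the guarded version returns 0 and leaves the buffer untouched, while the asserting version panics.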

I feel satisfied with these changes today. I've been able to fix a JavaScript error and a crash bug. While I can't claim any credit for finding a solution (Raine, who authored the patch, deserves all of that), it's nonetheless good to be making progress.

Tomorrow I'll need to pick another task from the issue tracker to take a look at. Although, in fact, I think I'll do some housekeeping and try to catch up on some of the work others have been doing. Now that I look at the issue tracker I see some exciting developments there that I really need to examine.

17 Jul 2024 : Day 291
Yesterday we were looking at a crash in the hyphenator, triggered by visiting a particular web page. It makes for an interesting bug because it's the first time (that I can recall at least) that we've been hit with an issue related to the Rust code, other than as part of the build pipeline. I'm not really certain what the problem is, but given the comments we saw yesterday, it could be either an overflow in the output buffer, or the block of memory represented by self.0 being invalid. According to the header, either of these could cause a panic.

It looks like debugging the Rust code may turn out to be tricky, so in the first instance I've gone about disabling one of the calls further down the call stack so that the crashing code is never called.

This is a precursor to trying to establish the real solution. At this stage I just want to find out whether this fixes the issue or not. My fake fix is to early return from the nsHyphenator::HyphenateWord() method like this:
   AutoTArray<uint8_t, 200> hyphenValues;
+  return;
   int32_t result = mDict.match(
       [&](const void*& ptr) {
         return mapped_hyph_find_hyphen_values_raw(
             static_cast<const uint8_t*>(ptr), mDictSize, utf8.BeginReading(),
             utf8.Length(), hyphenValues.Elements(), hyphenValues.Length());
You may notice that I've made the change to the C++ code rather than the Rust code. That's mostly for convenience because I don't actually know whether the partial build process will notice and rebuild changes in the Rust code. I'm sure I'll find out, but right now I'm just trying to find an answer as quickly as possible.

Other than the panic theory, there are other reasons why the code may be crashing. It could simply be that the stack is being exhausted. I must admit that I'm not convinced by this theory: the backtrace doesn't look corrupted and although the stack is very large (177 frames) it's not uncommon for the gecko stack to get quite large as the DOM is traversed. The change I've made now should at least help shed some light on this: if the crash persists, then the exhausted-stack theory becomes more persuasive.

The partial build completed with this small change and I've copied the library over to my phone. Now will it crash?

No; there's no crash. Presumably hyphenation is now no longer working correctly, but it's hard to tell that just from the way the page is laid out. Nevertheless, it looks like we've pinned down the location of the problem, even if it's not clear why it's happening. I thought I'd also have a go at increasing the size of the buffer for the return values. I've extended it by an additional 1024 bytes:
   AutoTArray<uint8_t, 200> hyphenValues;
-  hyphenValues.SetLength(utf8.Length());
+  hyphenValues.SetLength(utf8.Length() + 1024);
   int32_t result = mDict.match(
       [&](const void*& ptr) {
         return mapped_hyph_find_hyphen_values_raw(
             static_cast<const uint8_t*>(ptr), mDictSize, utf8.BeginReading(),
             utf8.Length(), hyphenValues.Elements(), hyphenValues.Length());
But with this change the crash still occurs. It's possible I didn't choose a large enough amount to add to the buffer, but this seems unlikely to me.

As the next test I've amended the Rust code so that the find_hyphen_values() method in mapped_hyph/src/ returns early.
     pub fn find_hyphen_values(&self, word: &str, values: &mut [u8]) -> isize {
         assert!(values.len() >= word.len());
         values.iter_mut().for_each(|x| *x = 0);
         let top_level = self.level(0);
         let (lh_min, rh_min, clh_min, crh_min) = top_level.word_boundary_mins(
         if word.len() < lh_min + rh_min {
             return 0;
+        if word.len() >= lh_min + rh_min {
+            return 0;
+        }
This may look a little contrived: the two conditions together are equivalent to a guaranteed return. But I didn't want to just add a return here because the Rust compiler is quite thorough. If it had noticed the guaranteed early return it might have also complained about unused functionality later in the method.
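
Here's a minimal sketch of that trick in isolation, using a hypothetical find_values_disabled() function in which min_len stands in for lh_min + rh_min. As far as I'm aware, rustc's unreachable-code warning only fires for code that follows an unconditional return; it doesn't reason about two conditions being complementary, so the rest of the body still compiles without complaint.

```rust
/// Hypothetical simplified stand-in for the amended method.
/// `min_len` plays the role of lh_min + rh_min.
fn find_values_disabled(word: &str, min_len: usize) -> isize {
    // Original check: words that are too short have no hyphenation points.
    if word.len() < min_len {
        return 0;
    }
    // Added check: the complement of the one above, so between them
    // every possible input returns before reaching the code below.
    if word.len() >= min_len {
        return 0;
    }
    // Still compiled and type-checked, but never executed.
    word.len() as isize
}
```

Both short and long words return 0, so the final expression never runs.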

I'm interested to know whether this will be enough to fix the crash. If it is, I can play around with the code a bit more to try to isolate the error. However I'm not even certain whether the partial build is actually rebuilding the Rust code or not. I think not, but this test should help determine this with more certainty.

Having run the partial build and copied over the binary, I find that the crash still happens on the page. That makes me think that the partial build wasn't enough to capture the changes to the Rust code. So my next step will be to do a full rebuild and have a go with the output from that once it's ready in the morning.
