flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.


5 most recent items

26 Jul 2024 : Day 300 #
Over the last few days I've been working through the testing tasks in Issue #1053 on the sailfish-browser issue tracker on GitHub. Yesterday I got to the end of the list, with the result that 17 out of 22 tests passed (feels pretty good), but with five tests failing. In addition quite a few of the tests generated error output.

Here are the five failing cases that need looking into:
  1. Video rendering and controls: total fail.
  2. Audio output and controls: partial fail.
  3. Password storage: total fail.
  4. Automatic dark/light theme switching: partial fail.
  5. Everything on the browser test page: fails: single select widget; external links; full screen; double-tap.
In addition to the above there were several cases where error output was shown on the screen:
// Print to PDF
JavaScript error: resource://gre/actors/BrowserElementParent.jsm, line 24: 
    TypeError: browser is null
// Exiting the browser
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
// Saving a downloaded file
JavaScript error: resource://gre/modules/pdfjs.js, line 29: 
    NS_ERROR_NOT_AVAILABLE: 
// Login manager
JavaScript error: resource://gre/modules/LoginHelper.jsm, line 1734: TypeError: 
    browser is null
So that's a quick summary of the things that need to be fixed. Today I'm going to start looking at the password storage and login manager, which seems to be the biggest failure right now (some may argue video is more important, but that also falls further outside my area of expertise).

As with much of the other functionality, Jolla has a handy login test page. There's no backend functionality to the page, but it allows you to enter fake credentials which, if things are working, the browser should pick up and offer to store for future reuse.

On ESR 78 this works correctly, but on ESR 91 it fails with the following error message:
JavaScript error: resource://gre/modules/LoginHelper.jsm, line 1734: TypeError: 
    browser is null
An error message is an error message, but what's odd about this one is that the LoginHelper.jsm functionality probably shouldn't be queried at all. To corroborate this I put some debug output inside the getBrowserForPrompt() method on ESR 78 and checked whether it got printed when saving a password. It didn't.
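The technique used here, adding output to a method to find out whether it's ever reached, can be sketched generically as follows. This is a minimal illustration rather than the actual change made on ESR 78: traceCalls() and the helper object are hypothetical stand-ins, and in privileged gecko JavaScript the output would more likely go through dump() than console.log().

```javascript
// Wrap a method so every call is recorded before the original runs.
// `traceCalls` is a hypothetical helper, not part of gecko.
function traceCalls(target, methodName, log = console.log) {
  const original = target[methodName];
  let calls = 0;
  target[methodName] = function (...args) {
    calls += 1;
    log(`${methodName}() called (${calls})`);
    return original.apply(this, args);
  };
  // Report how many times the wrapped method has been hit so far.
  return () => calls;
}

// Stand-in object; the real LoginHelper lives in LoginHelper.jsm.
const helper = {
  getBrowserForPrompt(browser) {
    return browser;
  },
};

// Pass a silent logger here so the example produces no output of its own.
const callCount = traceCalls(helper, "getBrowserForPrompt", () => {});

// If the code path under test never reaches the method, the count stays at zero.
const countBefore = callCount();
helper.getBrowserForPrompt("fake-browser");
const countAfter = callCount();
```

A count that stays at zero after exercising the feature is the equivalent of the debug line never being printed.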

The reason for this quickly becomes apparent when I check the ESR 78 patches. Patch 0082 has the title "Allow LoginManagerPrompter to find its window" and comes with the following description:
 
This patch blocks loading of gecko's LoginManagerAuthPrompter.jsm so that the embedlite-components version can be used instead.
It also patches the nsILoginManagerPrompter interface to allow a reference to the window to be passed through, to allow the embedlite component to understand its context.

Finally it patches ParentChannelListener to pass the correct window object through to the nsILoginManagerAuthPrompter component.


That very first line "so that the embedlite-components version can be used instead" is crucial. Without this patch the upstream login manager will be used. We want our Sailfish-specific login manager to be used instead, which means tweaking the guts of gecko to allow it.

Checking the ESR 91 source clearly shows that this patch hasn't yet been applied, so now would be the time to do this.

Attempting to apply the patch directly fails, but brilliantly, attempting a 3-way merge succeeds:
$ git am ../rpm/0082-sailfishos-gecko-Allow-LoginManagerPrompter-to-find-.patch
Applying: Allow LoginManagerPrompter to find its window. JB#55760, OMP#JOLLA-418
error: patch failed: toolkit/components/passwordmgr/nsILoginManagerPrompter.idl:
    29
error: toolkit/components/passwordmgr/nsILoginManagerPrompter.idl: patch does 
    not apply
Patch failed at 0001 Allow LoginManagerPrompter to find its window. JB#55760, 
    OMP#JOLLA-418
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am 
    --abort".
$ git am --abort
$ git am --3way ../rpm/0082-sailfishos-gecko-Allow-LoginManagerPrompter-to-find-.patch
Applying: Allow LoginManagerPrompter to find its window. JB#55760, OMP#JOLLA-418
Using index info to reconstruct a base tree...
M       netwerk/protocol/http/ParentChannelListener.cpp
M       toolkit/components/passwordmgr/LoginManagerParent.jsm
M       toolkit/components/passwordmgr/components.conf
M       toolkit/components/passwordmgr/nsILoginManagerPrompter.idl
Falling back to patching base and 3-way merge...
Auto-merging toolkit/components/passwordmgr/nsILoginManagerPrompter.idl
Auto-merging toolkit/components/passwordmgr/components.conf
Auto-merging toolkit/components/passwordmgr/LoginManagerParent.jsm
Auto-merging netwerk/protocol/http/ParentChannelListener.cpp
Most of the changes here are to the JavaScript, so could potentially be applied dynamically for testing. However there's also a change to an interface IDL file, plus it's the start of my working day so I won't be able to return to this for the next eight hours anyway, so I may as well kick off a build. When I return, if things have gone well, I'll be able to test out this change.

[...]

The build went through successfully, but when I try to use the login manager I now get the following error:
JavaScript error: file:///usr/lib64/mozembedlite/components/
    LoginManagerPrompter.js, line 1530: ReferenceError: ComponentUtils is not 
    defined
JavaScript error: resource://gre/modules/XPCOMUtils.jsm, line 161: 
    NS_ERROR_XPC_GS_RETURNED_FAILURE: ServiceManager::GetService returned 
    failure code:    
For the first of these errors it looks like I'll just need to add the following line to the top of the LoginManagerPrompter.js file (see for example the changes made by Raine in embedlite-components Issue #99):
const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");
The good news is that in addition to making this change in the package source, I can also make it directly on my device for immediate testing.
devel-su vim /usr/lib64/mozembedlite/components/LoginManagerPrompter.js
[...]
With this line added, both errors are now fixed and the login manager prompter is working correctly! That means I've also now finally been able to test clearing of the password data as well. All is working correctly and without error messages.
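Why would a single missing import clear up two errors? One plausible reading is that the second error was a knock-on effect of the first: if a component script throws while loading, nothing gets registered, so a later service lookup fails as well. The sketch below models that idea in plain JavaScript. Everything in it is a hypothetical stand-in (the registry, loadComponent(), getService(), the contract ID and the ComponentUtils stub); none of it is gecko's actual loading code.

```javascript
// Hypothetical miniature of a component loader; none of this is gecko code.
const registry = new Map();

// Evaluate a component script with only the names in `scope` visible to it,
// then register whatever the script returns under `contractId`.
function loadComponent(contractId, source, scope) {
  const names = Object.keys(scope);
  const run = new Function(...names, source);
  registry.set(contractId, run(...names.map((name) => scope[name])));
}

// Look up a previously registered component, failing if it never loaded.
function getService(contractId) {
  if (!registry.has(contractId)) {
    throw new Error(`getService failed for ${contractId}`);
  }
  return registry.get(contractId);
}

// The last line of the script relies on ComponentUtils being in scope.
const source = `
  function Prompter() {}
  return ComponentUtils.generateNSGetFactory([Prompter]);
`;

// Stub standing in for the real ComponentUtils module.
const componentUtilsStub = { generateNSGetFactory: (ctors) => ctors[0] };

const errors = [];
try {
  // Import missing: the script throws before anything is registered.
  loadComponent("@example/prompter;1", source, {});
} catch (e) {
  errors.push(e.name);
}
try {
  // Nothing was registered, so the lookup fails too.
  getService("@example/prompter;1");
} catch (e) {
  errors.push(e.name);
}

// With the name supplied, loading and lookup both succeed.
loadComponent("@example/prompter;1", source, { ComponentUtils: componentUtilsStub });
const service = getService("@example/prompter;1");
```

In this toy version the first failure is a ReferenceError and the second is the lookup failure, mirroring the order of the two messages in the log above.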

I'm going to call it a day. Tomorrow I'll look at the failing "Automatic dark/light theme switching".

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
25 Jul 2024 : Day 299 #
I'm at 15 out of 22 tests this morning with 14 passes and one fail. Next on the list is permissions: "location, pop-ups, cookies, camera, microphone."

First up is location. The permission part of this seems to be working correctly. From Jolla's positioning test page, when I select "Get position" the permissions dialog appears. If I deny access the page announces correctly "User denied geolocation prompt". I can then reset the permission via the little browser lock icon in the address bar.

Now if I allow the geolocation permission rather than denying it, a little "toast" notification appears on the screen to say "Positioning is disabled". On the console I get the following output:
ContentPermissionPrompt.js on message received: top:embedui:permissions, msg:
    {"allow":true,"checkedDontAsk":false,"id":
    "browser.sailfishos.org geolocation"}
[D] unknown:0 - Geoclue client path: "/org/freedesktop/Geoclue/Master/
    client1"
My conclusion is that the positioning permissions dialog is working just fine, even if positioning itself isn't. And in fact, when I try to do the same thing on ESR 78, not only do I get the same "Positioning is disabled" toast, I also get identical output at the console:
ContentPermissionPrompt.js on message received: top:embedui:permissions, msg:
    {"allow":true,"checkedDontAsk":false,"id":
    "browser.sailfishos.org geolocation"}
[D] unknown:0 - Geoclue client path: "/org/freedesktop/Geoclue/Master/
    client1"
This might need a little more investigation, but if the functionality is identical across the two versions, then I'll take that as a good sign.

For the popup permission the behaviour is a little odd. If I visit Jolla's pop-up test page and wait a couple of seconds the pop-up permission dialog appears. I can select Deny or Allow but the result appears to be the same either way: the pop-up never opens. If I select to remember my choice, no change is made to the underlying permissions, which I can check using the little padlock in the address bar.

However, if I change the permission from "Deny" to "Allow" via the padlock and select the link to open another pop-up? Well, then the pop-up opens correctly.

I've always been confused by this functionality: if I select to permanently set the state to "Allow", shouldn't all future pop-ups be allowed, at least until I remove the setting? It doesn't feel quite right to me, but it turns out it does at least match the approach from ESR 78.

So while I'm not entirely comfortable with how it's working, given the behaviour matches ESR 78, I'm considering this a pass.

Next up cookies. I'm using another of Jolla's test pages for this, which allows you to set a cookie, then check its value after restarting the browser.

I've checked that the cookie gets successfully set when Allowed and blocked when Denied. I've also checked that if cookies are blocked in general but with an exception for a particular site, then the cookie is nevertheless stored correctly when visiting a page from the site. Everything on ESR 91 works the same as on ESR 78 and it all feels intuitive and correct.

For the camera and microphone I'm using the same page as I did for the WebRTC tests, Mozilla's getUserMedia Test Page. Although the camera still has the same colouring issue from earlier, everything is otherwise good. And specifically the permission dialogs do their job as expected.

So location, pop-ups, cookies, camera and microphone permission dialogs are all working correctly. I've updated the issue on GitHub to reflect this.

Next I'm going to find out what happens when I clear the browser data. There's an option in the settings for this, with various different categories available for clearing: history, cookies, passwords, cache, bookmarks and site permissions.

History, cookies, cache, bookmarks and site permissions all appear to work as expected. Unfortunately I'm not able to test password clearing because the functionality to add passwords is currently broken. But I'll come back to that when it's fixed.

For the dark/light theming, switching between one of the fixed values (light or dark) from the settings page works as expected: the site updates its ambience to match (I'm testing using DuckDuckGo). But switching between light and dark phone ambiences doesn't update the site, even though it successfully updates the user interface elements. So that's going to have to be a fail for now.

For audio testing the results are also unfortunately a fail. I'm testing using BBC Sounds which works fine on ESR 78. But on my ESR 91 build we don't get any audio, just an error message that states "This content doesn't seem to be working". Disappointing.

I get the same with the BBC iPlayer for video: it works on ESR 78 but not ESR 91. When using Jolla's video test page I get the same experience. On YouTube as well.

This isn't, to be honest, much of a surprise. I've not applied the changes needed to get audio and video working yet. It's not all bad news though. For example the audio and audio controls on Jolla's audio test page are working correctly. So it looks like the problems are down to the available codecs, rather than something more fundamentally broken with the way audio or video works (or doesn't, in this case).

The final test is for "Everything on the browser test page". Which is a bit nebulous if I'm honest, but I'll still give it my best shot.

All of the prompt dialogs work fine. The multi-select groups work, but the single select widget actually managed to crash the browser. So that's something to look into.

Text input, radio buttons and checkboxes are all working fine. History (back and forward) works as expected.

Mouse click positioning is looking good.

Interestingly, external links (for example to email or the phone app) are not working. There's no error in the output console either, which won't make fixing the issue any easier. But for now, I just need to identify the fact that this is broken.

The user agent string is good. Window opening and file pickers all work as expected. Localisation, anchors, CSS, Storage, Service Workers are all working.

Full screen doesn't appear to be working. There's also a difference in behaviour when double-tapping. On ESR 78 the double tap goes through, but on ESR 91 it zooms to the enclosing box item instead, as it does with non-selectable items. This will need fixing.

Everything else on the test pages works fine. So while it's not an unambiguous pass, it's not far off.

So that's everything in the list of tests. Seventeen out of twenty-two tests passed. Three were partial failures and two were total failures. Here's the full list:
  1. Video rendering and controls: total fail.
  2. Audio output and controls: partial fail.
  3. Private browsing: pass.
  4. Search on page: pass.
  5. Share link: pass.
  6. Save web page as PDF: pass.
  7. Desktop/mobile view switching: pass.
  8. Bookmarks: pass.
  9. History: pass.
  10. Downloads (including setting save destination): pass.
  11. Configuration using about:config: pass.
  12. Home page functionality: pass.
  13. Search providers: pass.
  14. Close tabs on exit: pass.
  15. Do not track: pass.
  16. JavaScript enable/disable toggle: pass.
  17. Password storage: total fail.
  18. Permissions: location, pop-ups, cookies, camera, microphone: pass.
  19. Clearing the browser data: pass.
  20. Automatic dark/light theme switching: partial fail.
  21. Everything on the browser test page: fails: single select widget; external links; full screen; double-tap.
  22. WebRTC audio and video: pass.
Honestly, I don't think that's looking too bad. Tomorrow I'll start working on the failing cases, the first of which will be the password storage.
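As a quick sanity check on those totals, the list can be tallied mechanically. This is just an illustrative sketch: the outcome labels are my own shorthand, and item 21 is recorded as a partial failure on the assumption that this is how the summary arrives at three partial failures.

```javascript
// Outcomes from the list above. Item 21 is recorded as "partial" on the
// assumption that this is how the summary reaches three partial failures.
const results = [
  ["Video rendering and controls", "total"],
  ["Audio output and controls", "partial"],
  ["Private browsing", "pass"],
  ["Search on page", "pass"],
  ["Share link", "pass"],
  ["Save web page as PDF", "pass"],
  ["Desktop/mobile view switching", "pass"],
  ["Bookmarks", "pass"],
  ["History", "pass"],
  ["Downloads (including setting save destination)", "pass"],
  ["Configuration using about:config", "pass"],
  ["Home page functionality", "pass"],
  ["Search providers", "pass"],
  ["Close tabs on exit", "pass"],
  ["Do not track", "pass"],
  ["JavaScript enable/disable toggle", "pass"],
  ["Password storage", "total"],
  ["Permissions", "pass"],
  ["Clearing the browser data", "pass"],
  ["Automatic dark/light theme switching", "partial"],
  ["Everything on the browser test page", "partial"],
  ["WebRTC audio and video", "pass"],
];

// Count how many entries fall into each outcome category.
function tally(entries) {
  const counts = { pass: 0, partial: 0, total: 0 };
  for (const [, outcome] of entries) {
    counts[outcome] += 1;
  }
  return counts;
}

const counts = tally(results);
console.log(counts);
```

The counts come out as 17 passes, 3 partial failures and 2 total failures across 22 entries, matching the summary.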

24 Jul 2024 : Day 298 #
I'm back to testing again today. Yesterday I worked through eight tests, all of which passed, but with three errors generated from the privileged JavaScript. The errors don't appear to be affecting the functionality adversely, but I'll have to go back and fix these errors at some point.

Today I'm forging on with the tests; fourteen remaining. But before I do that, check out this phenomenal image from artist-in-residence thigg, shared yesterday on Mastodon.
 
A pig with wings, dressed like a janitor, little smartphones are running around on their legs on the floor like little bugs. The pig sweeps a few of them into a basket with a broom.

Thigg has been responsible for all of the amazing artistic creations on these pages (check out Day 174 and Day 248 for just a couple of the many others) and I'm always bowled over by the creativity.

This one has a slightly different style compared to previous images. Here's what Thigg has to say about it:
 
Software development is an emotionally challenging task. You need to deal with lots of frustration, your self-confidence and you always need to fight the urge of being sucked in by something that shouldn't be on your priority list... Today I tried to show that you pick up some bugs, but others run around being ignored.

This captures how I've been feeling over the last few days perfectly! Thigg has it spot on. Since I left the cover image bug yesterday, I've been doing my best to move on and find more bugs. Today it's less about sweeping them up and more about searching with a magnifying glass to find them!

Let's continue with this task by working through the remaining tests. For the about:config configuration test I found a webgl.disabled flag. Activating this flag had the desired effect of disabling WebGL rendering, and rendering was restored when I deactivated the flag again. So I've satisfied myself that the configuration is working correctly.

I then tested the homepage functionality, search providers and automatic tab closing.

For the "Do Not Track" test I used the very convenient requestheaders.dev site. This mirrors the request headers sent to the site back at you. When the Do Not Track option is set in the browser I see the following line appearing:
  "dnt": "1",
This indicates Do Not Track is enabled. After disabling it in the browser settings this header line disappears, just as it should. I also found another convenient site — jsstatus.com — the main purpose of which is to tell you whether JavaScript is enabled or not on your browser. Flipping the switch in the settings gives the appropriate response on this page as well.
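The Do Not Track check boils down to looking for a "dnt" entry with the value "1" in the mirrored headers. Here's a minimal sketch of that check, assuming the mirrored headers are available as a plain JavaScript object; doNotTrackEnabled() and the sample header objects are hypothetical and the user-agent value is made up.

```javascript
// Return true when a mirrored header object carries "dnt": "1".
// Header names are compared case-insensitively, since HTTP header
// names are not case-sensitive.
function doNotTrackEnabled(headers) {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "dnt") {
      return String(value).trim() === "1";
    }
  }
  return false;
}

// Sample data shaped like the line shown above; the user-agent is made up.
const withDnt = { "user-agent": "ExampleBrowser/1.0", dnt: "1" };
const withoutDnt = { "user-agent": "ExampleBrowser/1.0" };

const enabledResult = doNotTrackEnabled(withDnt);
const disabledResult = doNotTrackEnabled(withoutDnt);
```

With the setting switched on the first call reports true; with the header absent the second reports false, which corresponds to the line disappearing from the mirrored output.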

While searching for this page I also hit the Basemark browser test page and couldn't resist giving it a go. The result showed that ESR 91 on Sailfish OS isn't particularly fast (compared to desktop Firefox), but its functionality is pretty much up there with the latest.

Here are the results.
 
Functionality                               Desktop Firefox   Sailfish ESR 91
Performance                                 657.00            168.72
CSS Capabilities                            59.66%            58.23%
HTML5 Capabilities                          91.71%            92.43%
Page Load and Responsiveness Capabilities   96.94%            98.50%
Resize Capabilities                         76.12%            75.86%


To be honest, I wouldn't read too much into these numbers, but they make for an interesting curio.
 
Four screenshots: Do Not Track headers, JavaScript enabled, JavaScript disabled and the Basemark browser test results

Finally I tested the login manager. Unfortunately this didn't do so well with the following error appearing:
JavaScript error: resource://gre/modules/LoginHelper.jsm, line 1734: TypeError: 
    browser is null
While we've seen similar errors before, this time it seems to be preventing the login functionality from working at all, so it looks more serious.

This null browser error is appearing all over the place, so I'd like to get it fixed. But I also suspect there may be something else going on with the login manager as well. My plan is to look into this further over the coming days.

For today, I've ended up with the following tests passing:
  1. Configuration using about:config.
  2. Home page functionality.
  3. Search providers.
  4. Close tabs on exit.
  5. Do not track.
  6. JavaScript enable/disable toggle.
But also with the first failure as well:
  1. Password storage.
Hopefully the password storage won't remain broken for long.

23 Jul 2024 : Day 297 #
Yesterday I said I was going to persevere with the hang caused by switching to a non-null cover. But I had a change of heart overnight. I have a tendency to get obsessed with small bugs like this because I'm desperate to know the reason for them. I know there is an answer to this, so the only thing stopping me from finding out is my own inadequacy. When you frame things like that it's easy to overstate the importance of something and end up prioritising the wrong thing.

But while I'd love to know the reason, I do appreciate there are more important things to be working on. I could lose days of work chasing an answer only to discover that someone smarter and better informed knows how to fix it already. During that time I could have fixed several other easier but more impactful bugs.

So I'm going to pause work on the hanging bug and move on to something else. If no solution appears naturally I'll return to it later.

Consequently I started off this morning by fixing all of the remaining cases where the fromExternal flag was needed in the front-end code. This was complicated somewhat by the fact that while it's needed for some calls to newTab(), there are also others where it's not. Some care was therefore needed.

But I think I got all of the relevant cases and none of the extraneous ones. I've committed and pushed my changes and since then I've had some time to spare today to look at other things.

I've gone on to working on Issue 1053 ("Test browser functionality with ESR 91"). This one issue comprises 22 subtasks, each of which involves testing some simple functionality of the browser.

So far I've tested the following functionalities, all of which are working as expected, at least to the extent I've been able to test:
  1. Private browsing.
  2. Search on page.
  3. Share link.
  4. Save web page as PDF.
  5. Desktop/mobile view switching.
  6. Bookmarks.
  7. History.
  8. Downloads (including setting save destination).
Since the functionality works I've ticked all of these off on the issue, which feels like a good start. However in some cases alongside the working functionality there were also some errors showing in the debug output.

Given that the errors aren't blocking the functionality, they can't be too serious, but I'm still keen to both document them here and also fix them if they're as straightforward as I hope they are.

The following error appeared while performing a print to PDF:
JavaScript error: resource://gre/actors/BrowserElementParent.jsm, line 24: 
    TypeError: browser is null
When exiting the browser the following error appears:
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
Finally, when downloading a file to save it out, the following error appears:
JavaScript error: resource://gre/modules/pdfjs.js, line 29: 
    NS_ERROR_NOT_AVAILABLE: 
I'm not going to have time to look into these today, but my plan is to continue testing tomorrow, followed by trying to find simple solutions for each of the errors I encounter as I go through. But that's it for today; there'll be more testing tomorrow.

22 Jul 2024 : Day 296 #
Today I've been trying, really quite hard actually, to fix the hang that happens when switching to private browsing mode. Having looked at this before, I refreshed my memory yesterday to the point of understanding that it's due to the call to setBrowserCover() that happens when the mode switch is made on the tab screen.

I've got a bit further today, the results of which only make things even more confusing. To explain all this, it'll help to take a look at the setBrowserCover() method, which looks like this:
    function setBrowserCover(model) {
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
        } else {
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
    }
Let's break this down a bit. The browser manages two tab models, one for normal browsing and the other for private browsing. When switching between the two modes one model is switched out for the other. This setBrowserCover() method is called just following the change in model. So by the time we find ourselves in this method we've already switched the model to the one for private browsing.

Whenever the browser is closed any private browsing state, including the associated tab model, is either destroyed or forgotten. That means that having just opened the browser we know the private browsing tab model will have no tabs and so model.count in the above code will be zero.

That means we're going through the first half of the if statement above. There's only one line of functionality that therefore gets called as a result and that's the following:
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
Typically the cover model for the browser will be set to null so that it shows the contents of the current page. If there are no pages open the cover is replaced, as we can see with this line of code, with the cover layout defined in the NoTabsCover.qml file.
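To make the branch concrete, here's a minimal sketch in plain JavaScript that can be run outside QML. It's an illustration under assumptions rather than the browser's code: chooseCover(), NO_TABS_COVER and the model objects are hypothetical stand-ins, firstUseDone is passed in as a parameter instead of being read from WebUtils, the relative path is returned without going through Qt.resolvedUrl(), and the clearSurface() side effect is left out.

```javascript
const NO_TABS_COVER = "cover/NoTabsCover.qml";

// Plain JavaScript restatement of the branch in setBrowserCover(): returns
// the value that would be assigned to the cover. A null result means the
// cover shows the contents of the current page.
function chooseCover(model, firstUseDone) {
  if (!model || model.count === 0 || !firstUseDone) {
    return NO_TABS_COVER;
  }
  return null;
}

// A private tab model straight after launch has no tabs.
const freshPrivateModel = { count: 0 };
// A normal model with a few open tabs.
const normalModel = { count: 3 };

const privateCover = chooseCover(freshPrivateModel, true);
const normalCover = chooseCover(normalModel, true);
```

Switching from the normal model to the freshly created private one therefore takes the cover from null to the NoTabsCover.qml layout, which is exactly the transition under suspicion.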

So far so good. This is exactly what happens when we move to private browsing mode for the very first time. If I comment out the above line there are two consequences:
 
  1. When there are no active web pages the cover just shows a blank background.
  2. There's no hang.


We can conclude that it seems to be the act of setting the cover that's triggering the hang. This feels very strange because there's nothing special or magical about the cover or the way it gets switched in and out. I've tried a whole host of things in an attempt to get a clearer picture.

For example I wondered whether this was related to private browser mode or not, so I added a timer that switched out the cover after a delay of five seconds, irrespective of what's happening at the time. What I found was that this also hangs the browser, even if you just have a static web page open and there's nothing exceptional happening. This suggests that it's not private browsing per se that's causing the problem, but rather the switching of the cover.

Intriguingly, if you do the switch while performing a pan and zoom, there's a crash instead of a hang. This has allowed me to collect the following backtrace:
[D] onTriggered:45 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[New LWP 2607]
sailfish-browser: ../../../platforms/wayland/wayland_window_common.cpp:256: 
    void WaylandNativeWindow::releaseBuffer(wl_buffer*): Assertion `it != 
    fronted.end()' failed.

Thread 38 "Compositor" received signal SIGABRT, Aborted.
[Switching to LWP 2574]
0x0000007fef49a344 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000007fef49a344 in raise () from /lib64/libc.so.6
#1  0x0000007fef47fce8 in abort () from /lib64/libc.so.6
#2  0x0000007fef48ebd8 in ?? () from /lib64/libc.so.6
#3  0x0000007fef48ec40 in __assert_fail () from /lib64/libc.so.6
#4  0x0000007fe74e3044 in WaylandNativeWindow::releaseBuffer(wl_buffer*) () 
    from /usr/lib64/libhybris//eglplatform_wayland.so
#5  0x0000007fee8fa050 in ?? () from /usr/lib64/libffi.so.8
#6  0x0000007fee8f65f8 in ?? () from /usr/lib64/libffi.so.8
#7  0x0000007fe7795f98 in ?? () from /usr/lib64/libwayland-client.so.0
#8  0x0000007fe7792d80 in ?? () from /usr/lib64/libwayland-client.so.0
#9  0x0000007fe7794038 in wl_display_dispatch_queue_pending () from /usr/lib64/
    libwayland-client.so.0
#10 0x0000007fe74e3204 in WaylandNativeWindow::readQueue(bool) () from /usr/
    lib64/libhybris//eglplatform_wayland.so
#11 0x0000007fe74e23ec in WaylandNativeWindow::finishSwap() () from /usr/lib64/
    libhybris//eglplatform_wayland.so
#12 0x0000007fef090210 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib64/
    libEGL.so.1
#13 0x0000007ff2397110 in mozilla::gl::GLLibraryEGL::fSwapBuffers (
    surface=0x5555991a60, dpy=<optimized out>, this=<optimized out>)
    at gfx/gl/GLLibraryEGL.h:303
#14 mozilla::gl::EglDisplay::fSwapBuffers (surface=0x5555991a60, 
    this=<optimized out>)
    at gfx/gl/GLLibraryEGL.h:694
#15 mozilla::gl::GLContextEGL::SwapBuffers (this=0x7ed41a6e30)
    at gfx/gl/GLContextProviderEGL.cpp:558
#16 0x0000007ff2440e00 in mozilla::layers::CompositorOGL::EndFrame (
    this=0x7ed41a1d70)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#17 0x0000007ff25174dc in mozilla::layers::LayerManagerComposite::Render (
    this=this@entry=0x7ed41a8a70, aInvalidRegion=..., aOpaqueRegion=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#18 0x0000007ff2517728 in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7ed41a8a70)
    at gfx/layers/composite/LayerManagerComposite.cpp:657
#19 0x0000007ff2517ad8 in mozilla::layers::LayerManagerComposite::
    EndTransaction (this=this@entry=0x7ed41a8a70, aTimeStamp=..., 
    aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
    at gfx/layers/composite/LayerManagerComposite.cpp:572
#20 0x0000007ff2559274 in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fb89aba80, aId=..., aTarget=0x0, 
    aRect=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#21 0x0000007ff253e9bc in mozilla::layers::CompositorVsyncScheduler::Composite (
    this=0x7fb8b500e0, aVsyncEvent=...)
    at gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
#22 0x0000007ff2536e34 in mozilla::detail::RunnableMethodArguments<mozilla::
    VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void (
    mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&), 
    StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=..., m=<optimized 
    out>, 
    o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    nsThreadUtils.h:887
#23 mozilla::detail::RunnableMethodArguments<mozilla::VsyncEvent>::
    apply<mozilla::layers::CompositorVsyncScheduler, void (mozilla::layers::
    CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&)> (m=<optimized 
    out>, o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#24 mozilla::detail::RunnableMethodImpl<mozilla::layers::
    CompositorVsyncScheduler*, void (mozilla::layers::CompositorVsyncScheduler::
    *)(mozilla::VsyncEvent const&), true, (mozilla::RunnableKind)1, mozilla::
    VsyncEvent>::Run (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
[...]
#34 0x0000007fef54989c in ?? () from /lib64/libc.so.6
(gdb) 
This hints at the possibility that the render buffers may be being swapped in the wrong thread. But my attempts to dig deeper into this haven't as yet thrown up anything that could give more of a hint about what's going on.

I've also added some debug output to the setBrowserCover() method so it now looks like this:
    function setBrowserCover(model) {
        console.log("model: " + model);
        console.log("model.count: " + model.count);
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            console.log("Setting cover");
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
            console.log("Set cover: " + cover);
        } else {
            console.log("Not setting cover");
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
        console.log("Exiting");
    }
When switching to private browsing mode, whether it's done via the menu or the tab list, the following output is triggered:
[D] setBrowserCover:20 - model: PrivateTabModel(0x62bf23d0e0)
[D] setBrowserCover:21 - model.count: 0
[D] setBrowserCover:23 - Setting cover
[D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[D] setBrowserCover:33 - Exiting
Immediately after the last debug print here the browser hangs. I've been trying hard to find some method inside the browser that's executed between the last line of this debug output and the actual hang, but without success. I've been doing this by adding breakpoints to various methods, switching to private browsing and watching to see if any of the breakpoints are hit.

So far without luck. Here are just a few of the methods I've attached breakpoints to and tested this way:
GLContextEGL::SwapBuffers()
GLContextEGL::SetDamage()
GLContextEGL::RenewSurface()
GLScreenBuffer::Swap()
ReadBuffer::Attach()
BeginTransaction()
EndEmptyTransaction()
NeedsPaint()
QOpenGLWebPage::onDrawOverlay()
Many of the breakpoints on these methods are triggered at other points in the browsing process, but if this happens I've just been continuing execution until the point at which I manually switch to private browsing. I get the same output and the same hang as when there's no breakpoint, like this:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::layers::
    LayerManagerComposite::BeginTransaction (this=0x7ed41a8c20, aURL=...)
    at gfx/layers/composite/LayerManagerComposite.cpp:232
232     bool LayerManagerComposite::BeginTransaction(const nsCString& aURL) {
(gdb) c
Continuing.
[D] setBrowserCover:20 - model: PrivateTabModel(0x7fd800da50)
[D] setBrowserCover:21 - model.count: 0
[D] setBrowserCover:23 - Setting cover
[D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[D] setBrowserCover:33 - Exiting
There are a few other things I think it's worth mentioning. The hang happens when the cover is set, but not when it's cleared. If the cover is set right at the start and left as it is (so it's never set to null), everything runs fine. So it very much seems to be the act of switching from null to non-null that causes the problem.
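These observations can be summarised as a small truth table over the previous and next cover values. The sketch below is purely illustrative: classifyCoverChange() is a hypothetical helper, and the "replaced" case (one non-null cover swapped for another) isn't something covered by the tests described here.

```javascript
// Classify a cover change. Only "set" (null to non-null) matches the case
// reported above as leading to the hang.
function classifyCoverChange(previous, next) {
  if (previous === next) return "unchanged";
  if (previous === null) return "set";
  if (next === null) return "cleared";
  return "replaced"; // not exercised in the tests described above
}

const cover = "cover/NoTabsCover.qml";

const observations = {
  set: classifyCoverChange(null, cover),        // hang observed
  cleared: classifyCoverChange(cover, null),    // no hang observed
  unchanged: classifyCoverChange(cover, cover), // set at start and left alone: runs fine
};
```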

Having not managed to find any methods that are fired between the cover being set and the hang occurring, I got frustrated and went for a walk outside. We have a lake nearby that's beautifully calm at this time of year. The air is warm and calm without being oppressive, which makes going for a walk a great way for me to clear my thoughts and come back feeling calmer.

I didn't have any revelations while walking, but I did think about whether I can approach this from a different angle. Rather than trying to find the gecko methods that are causing the problem by seeing if they're being used, what if I were to try to disable gecko functionality in the hope that the hang might suddenly vanish?

If the hang goes away with a particular piece of functionality disabled, then it may indicate some kind of clash between the cover change and the disabled functionality.

So I've tried a whole bunch of things, for example, setting it so that the page is always inactive by forcing the state to be always set to false:
 void QOpenGLWebPage::setActive(bool active)
 {
+    active = false;
     // WebPage is in inactive state until the view is initialized.
     // ::processViewInitialization always forces active state so we
     // can just ignore early activation calls.
     if (!d || !d->mViewInitialized)
         return;
 
     if (d->mActive != active) {
         d->mActive = active;
         d->mView->SetIsActive(d->mActive);
         Q_EMIT activeChanged();
     }
 }
I also tried disabling the initialisation code:
 void QOpenGLWebPage::initialize()
 {
-    d->createView();
 }
Plus a whole bunch of other similar things, from disconnecting various signals to preventing the EGL Display from being initialised. Many of these changes prevented rendering, but none of them prevented the hang.

I don't have an answer for why this is happening, but I'll persevere with it. As with everything computer-related, there is definitely an answer, it's just a case of finding it.
