flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.

Gecko RSS feed Click the icon for the Gecko-dev Diary RSS feed.

Gecko

5 most recent items

7 Mar 2024 : Day 178 #
Good morning! Well, it is for me at least. By the time this gets posted it'll be the evening. But for me right now it's early and when I check in on my laptop I can see that the build has completed.

We'll take a look at the build presently, but before we do, thank you to Adam Pigg (piggz) for commenting on my meander into virtual methods yesterday. As Adam rightly pointed out, my C++ code was missing some accoutrements that he'd have preferred to see included.
 
Sorry, but i have to deduct points for using raw pointers and not smart pointers, and missing the override keyword ;)

Adam is right of course, not only are these sensible ways to improve the code, but the use of override would also have improved clarity. Adam has kindly provide his own, improved, version of the code which I'll share below. He then went on to make a few, perhaps more controversial, suggestions"
 
I'd also drop using namespace, and maybe mix in some C++23, and swap cout for std::print 🙃

I'll leave it to the reader to judge whether these would actually be improvements or not. Here's Adam's updated version of the code. One thing to note is that support for C++23's std::print requires at least GCC 14 or Clang 18 if you want to compile this at home, but Adam has confirmed it all works as expected using the online Godbolt Compiler Explorer.
#include 
#include 
#include 

class Parent {
public:
    virtual ~Parent() {};
    std::string hello() { return std::string("Hello son"); }    
    std::string wave() { return std::string("Waves"); }
    virtual std::string goodbye() { return std::string("Goodbye son"); }
};

class Child : public Parent {
public:
    std::string hello() { return std::string("Hello Mum"); }
    std::string goodbye() override { return std::string("Goodbye Mum"); }
};

int main() {
    std::unique_ptr parent(new Parent);
    std::shared_ptr child = std::make_shared();
    std::shared_ptr vparent = std::dynamic_pointer_cast(child);


    std::println("1. {} ", parent->hello());
    std::println("2. {} ", child->hello());
    std::println("3. {} ", vparent->hello());

    std::println("4. {} ", parent->wave());
    std::println("5. {} ", child->wave());
    std::println("6. {} ", vparent->wave());
    
    std::println("7. {} ", parent->goodbye());
    std::println("8. {} ", child->goodbye());
    std::println("9. {} ", vparent->goodbye());
    
    std::println("10. {} ", reinterpret_cast(parent.get()));
    std::println("11. {} ", reinterpret_cast(child.get()));
    std::println("12. {} ", reinterpret_cast(vparent.get()));
  
    return 0;
}
Thanks for that Adam! I'm always up for a bit of blog-based code review. Okay, now back to the Gecko changes from yesterday.

The latest Gecko build incorporates all of the GLScreenBuffer code that I've been adding in and following changes made yesterday should also now no longer make use of the SwapChain class. I've copied the packages over to my development phone, installed them and now it's time to test them.
$ gdb harbour-webview 
GNU gdb (GDB) Mer (8.2.1+git9)
[...]
(gdb) r
[...]
=============== Preparing offscreen rendering context ===============
[New LWP 20044]

Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 20040]
0x0000007ff110a0a8 in mozilla::gl::GLScreenBuffer::Size (this=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) bt
#0  0x0000007ff110a0a8 in mozilla::gl::GLScreenBuffer::Size (this=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#1  mozilla::gl::GLContext::OffscreenSize (this=this@entry=0x7ed419ee40)
    at gfx/gl/GLContext.cpp:2141
#2  0x0000007ff3666804 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4ba91a0, aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:156
#3  0x0000007ff12b6f50 in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7fc4cabe80, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#4  0x0000007ff12b6fac in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7fc4ba91a0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#5  0x0000007ff12b7038 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7fc4ba91a0, x=<optimized out>,
    y=<optimized out>, width=<optimized out>, height=<optimized out>)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:794
#6  0x0000007ff12afbd4 in mozilla::detail::RunnableMethodArguments<int, int,
    int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void
    (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int),
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>,
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 2ul,
    3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
#18 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) frame 1
#1  mozilla::gl::GLContext::OffscreenSize (this=this@entry=0x7ed419ee40)
    at gfx/gl/GLContext.cpp:2141
2141      return mScreen->Size();
(gdb) p mScreen
$1 = {
  mTuple = {<mozilla::detail::CompactPairHelper<mozilla::gl::GLScreenBuffer*,
    mozilla::DefaultDelete<mozilla::gl::GLScreenBuffer>,
    (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> =
    {<mozilla::DefaultDelete<mozilla::gl::GLScreenBuffer>> = {<No data fields>}, 
      mFirstA = 0x0}, <No data fields>}}
(gdb) p mScreen.mTuple.mFirstA
$2 = (mozilla::gl::GLScreenBuffer *) 0x0
(gdb) 
So that's an immediate crash, the reason being that the code is calling the OffScreenSize() method of the mScreen member variable of type GLScreenBuffer, but mScreen is null. That is, it's not been initialised yet.

Looking through the ESR 78 code there's only one place I can discern that sets the mScreen variable and that's this one:
bool GLContext::CreateScreenBufferImpl(const IntSize& size,
                                       const SurfaceCaps& caps) {
  UniquePtr<GLScreenBuffer> newScreen =
      GLScreenBuffer::Create(this, size, caps);
[...]
  mScreen = std::move(newScreen);

  return true;
}
The only place this is called is here:
  bool CreateScreenBuffer(const gfx::IntSize& size, const SurfaceCaps& caps) {
    if (!IsOffscreenSizeAllowed(size)) return false;

    return CreateScreenBufferImpl(size, caps);
  }
But this is just some indirection. Clearly the method we're really interested in is CreateScreenBuffer(). This is also only called in one place:
bool GLContext::InitOffscreen(const gfx::IntSize& size,
                              const SurfaceCaps& caps) {
  if (!CreateScreenBuffer(size, caps)) return false;
[...]
  mCaps = mScreen->mCaps;
  MOZ_ASSERT(!mCaps.any);

  return true;
}
It's worth noticing here that soon after calling the CreateScreenBuffer() method and in the same call, the mCaps member of mScreen is being accessed. If mScreen were null at the time of this access, this would immediately trigger a segmentation fault. So clearly, by the end of the InitOffscreen() method, it's expected that mScreen should be a valid instance of GLScreenBuffer.

Let's continue digging backwards and find out where InitOffscreen() gets called. It turns out it's called in quite a few places:
$ grep -rIn "InitOffscreen(" * --include="*.cpp"
gecko-dev/gfx/gl/GLContextProviderGLX.cpp:1035:
    if (!gl->InitOffscreen(size, minCaps)) {
gecko-dev/gfx/gl/GLContextProviderWGL.cpp:532:
    if (!gl->InitOffscreen(size, minCaps)) return nullptr;
gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1443:
    if (!gl->InitOffscreen(size, minOffscreenCaps)) {
gecko-dev/gfx/gl/GLContext.cpp:2576:
    bool GLContext::InitOffscreen(const gfx::IntSize& size,
The only one of these context providers we care about, given it's the only one used by sailfish-browser, is GLContextProviderEGL. The instance in GLContext.cpp is the method definition so we can ignore that. So there's only one case to concern ourselves with. The call in GLContextProviderEGL occurs in the GLContextProviderEGL::CreateOffscreen() method, which is rather long so I won't list the entire contents here. Suffice it to say that this is the call we need to be finding some equivalent of in ESR 91. This itself is only called from CompositorOGL::CreateContext().

We're building up a pretty clear path for how mScreen ends up initialised in ESR 78. The CreateContext() method is the first time we've reached something on this path which also exists in ESR 91. So this would seem to be a good place to switch back to ESR 91 and try to reconstruct the path in the opposite direction.

Just to make sure we're not being led in the wrong direction, it's also worth checking that CreateContext() is actually being called in ESR 91 and that this is happening before the segmentation fault causes the application to crash.
(gdb) b CompositorOGL::CreateContext
Breakpoint 1 at 0x7ff119a348: file gfx/layers/opengl/CompositorOGL.cpp, line 227.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]

Thread 38 "Compositor" hit Breakpoint 1, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7ed8002f10)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() {
(gdb)
Yes, it looks to be the case. So now back to building the path. The call to CreateOffscreen() in ESR 78 has been replaced by a call to CreateHeadless() in ESR 91. This is code we've looked at before, but finally we're getting to unravel it a bit. Here's the ESR 78 version:
    SurfaceCaps caps = SurfaceCaps::ForRGB();
    caps.preserve = false;
    caps.bpp16 = gfxVars::OffscreenFormat() == SurfaceFormat::R5G6B5_UINT16;

    nsCString discardFailureId;
    context = GLContextProvider::CreateOffscreen(
        mSurfaceSize, caps, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
        &discardFailureId);
And here's the equivalent code, also executed from within CompositorOGL::CreateContext(), from ESR 91.
    nsCString discardFailureId;
    context = GLContextProvider::CreateHeadless(
        {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId);
    if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
      context = nullptr;
    }
This gives us all the pieces we need — both mSurfaceSize and caps are available in ESR 91 — so we just need to reconstruct the call to reflect what's happening in ESR 78.

So the next step is to add the CreateOffscreen() code to GLContextProviderEGL. There's code in ESR 78 to copy over, so that part's straightforward. However various interfaces it makes use of have been changed or are missing. I've had to make some quite heavy, but nevertheless justifiable, changes to the code. For example the GLLibraryEGL::EnsureInitialized() method has been replaced by GLLibraryEGL::Init(). That's more than just a name change: the former can be safely called multiple times, whereas the latter can only be called once (or so it appears). Consequently I've removed the call entirely, on the assumption that the Init() method is being called somewhere else. We'll find out whether that's true or not when we come to execute the program.

Nevertheless the partial builds now compile, so it's time to run the full build again, which will take more hours than there are left in the day. So we'll return to this again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
6 Mar 2024 : Day 177 #
Yesterday I spent the day fixing linker errors. By the evening there was one last error that popped up at the end of the full build that looked like this:
TextureClientSharedSurface.cpp:108: undefined reference to `vtable for mozilla::layers::SharedSurfaceTextureClient'
I promised to spend a bit of time today explaining what was going on and how I fixed it.

One of the reasons I want to explain is that these vtable errors are a little cryptic. They also related to an interesting implementation detail of C++. They're telling us that there's something wrong with the SharedSurfaceTextureClient implementation, but not telling us exactly what. The line number the error refers to is in the SharedSurfaceTextureClient constructor, which is also not particularly helpful:
SharedSurfaceTextureClient::SharedSurfaceTextureClient(
    SharedSurfaceTextureData* aData, TextureFlags aFlags,
    LayersIPCChannel* aAllocator)
    : TextureClient(aData, aFlags, aAllocator) {
}
The vtable the error is referring to is the "virtual table" of methods that can be dynamically overridden. When you subclass a class in C++ you can override certain methods from the parent class. In other words, when the code calls the method in the subclass, you can make sure it uses a method written for the subclass, rather than using the inherited method from the parent class. Both overriding and not overriding are useful. The whole point of creating a subclass is that you re-use some functionality from the parent class, so if you don't override a method it's great that the implementation from the parent class can be used.

However, sometimes you want to change the behaviour — or at least some of it — from the parent class. That might be where you'd override some of that functionality.

Where am I going with this? Bear with me! The point is that there are also two ways that a method can be overridden. It can be statically overridden or dynamically overridden. The former case is useful when the compiler can always tell that the method to be used is the one from the child class. In particular, if you never cast a class to make it look like its parent. If you cast it to the parent, any statically overridden methods will use the parent class's implementation.

However, if you dynamically override a method, the child's implementation will be used even if you cast the class upwards. Let's take some simple example (this doesn't appear in the Gecko codebase!).
#include <iostream>

using namespace std;

class Parent {
public:
    string hello() { return string("Hello son"); }    
    string wave() { return string("Waves"); }
    virtual string goodbye() { return string("Goodbye son"); }
};

class Child : public Parent {
public:
    string hello() { return string("Hello Mum"); }
    string goodbye() { return string("Goodbye Mum"); }
};

int main() {
    Parent* parent = new Parent();
    Child* child = new Child();
    Parent* vparent = dynamic_cast<Parent*>(child);

    cout << "1.  " << parent->hello() << "\n";
    cout << "2.  " << child->hello() << "\n";
    cout << "3.  " << vparent->hello() << "\n";

    cout << "4.  " << parent->wave() << "\n";
    cout << "5.  " << child->wave() << "\n";
    cout << "6.  " << vparent->wave() << "\n";
    
    cout << "7.  " << parent->goodbye() << "\n";
    cout << "8.  " << child->goodbye() << "\n";
    cout << "9.  " << vparent->goodbye() << "\n";
    
    cout << "10. " << parent << "\n";
    cout << "11. " << child << "\n";
    cout << "12. " << vparent << "\n";

    delete child;
    delete parent;

    return 0;
}
Here we can see from the class definition that the Child class is inheriting from the Parent class. The Child class doesn't have any implementation of the wave() method because we're expecting it to be inherited from the Parent. The Child class statically overrides its parent's hello() implementation but dynamically overrides its parent's goodbye() method. We can see this because of the virtual keyword that's added before the name of the method. Note that it's actually specified on the parent's goodbye() method. It's parents that decide whether their children can inherit methods virtually or not.

Now let's see what happens when we build and run this code.
$ g++ main.cpp
$ ./a.out 
1.  Hello son
2.  Hello Mum
3.  Hello son
4.  Waves
5.  Waves
6.  Waves
7.  Goodbye son
8.  Goodbye Mum
9.  Goodbye Mum
10. 0x563a9f694eb0
11. 0x563a9f694ed0
12. 0x563a9f694ed0
Notice how the Child class is still able to successfully call the wave() method (line 5). When the child class calls hello() and goodbye() it calls its own implementations of these methods.

However, there's also this peculiar vparent variable. This is the instance of Child that's been dynamically cast to look like a Parent. We can see that child and parent are actually the same object because they point to the same place (lines 11 and 12) compared to the parent object which has a different pointer (line 10).

Casing a Child to a Parent is perfectly safe because the former is inheriting from the latter. In other words, the code knows that everything a Parent has a Child will have as well. Safe.

But in the case of the dynamic cast, the calls to any overridden virtual methods are resolved at runtime rather than compile time. In particular, even though vparent is of type Parent, calling the goodbye() method on it calls the child's implementation of goodbye() (line 9).

The way this is achieved is with a vtable (this is where we're going!). This is a list of pointers to methods stored at runtime. Because goodbye() is marked as virtual in the Parent class, a pointer to this method is stored in the vtable. When the compiler calls the method it gets the address to call from the vtable, rather than from some fixed value stored at compile time.

Now when the Child dynamically overrides goodbye() it overwrites the value in the vtable. The result is that when the goodbye() class is called from vparent, the Child instance is referenced even though the class looks otherwise just like a Parent.

Finally we get to our error. The error is suggesting that there's some method that should be virtual, but no vtable has been created. This is most likely because there's a signature for a virtual method, but no implementation for any virtual method.

It's all a bit obscure, made more so because there are no virtual methods in the definition. However we do have ~SharedSurfaceTextureClient() in the class definition that doesn't have an implementation. Plus the class is inheriting from TextureData and this has a virtual destructor. As a result the ~SharedSurfaceTextureClient() destructor will also need to be added to the vtable.

But while ~SharedSurfaceTextureClient() appears in the class definition, there is no implementation of it. This is therefore likely what's causing our error.

The solution, after this very long explanation, is that we need to add in the implementation of the class destructor. Thankfully there's a destructor implementation in the ESR 78 code we can use:
SharedSurfaceTextureClient::~SharedSurfaceTextureClient() {
  // XXX - Things break when using the proper destruction handshake with
  // SharedSurfaceTextureData because the TextureData outlives its gl
  // context. Having a strong reference to the gl context creates a cycle.
  // This needs to be fixed in a better way, though, because deleting
  // the TextureData here can race with the compositor and cause flashing.
  TextureData* data = mData;
  mData = nullptr;

  Destroy();

  if (data) {
    // Destroy mData right away without doing the proper deallocation handshake,
    // because SharedSurface depends on things that may not outlive the
    // texture's destructor so we can't wait until we know the compositor isn't
    // using the texture anymore. It goes without saying that this is really bad
    // and we should fix the bugs that block doing the right thing such as bug
    // 1224199 sooner rather than later.
    delete data;
  }
}
As I mentioned yesterday, the partial builds all passed. But last night I also set off the full build again. So what was the outcome of this? The good news is that the full build went through as well, as a consequence of which I now have five shiny new xulrunner-qt5 packages to test on my phone. I'm not expecting them to work first time, but testing them is an essential step in finding out how to progress. Let's give them a go...
$ gdb harbour-webview 
GNU gdb (GDB) Mer (8.2.1+git9)
[...]
(gdb) r
Starting program: /usr/bin/harbour-webview 
[...]
=============== Preparing offscreen rendering context ===============
[New LWP 29323]

Thread 36 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 29318]
mozilla::gl::SwapChain::OffscreenSize (this=0x0)
    at gfx/gl/GLScreenBuffer.cpp:141
141       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0)
    at gfx/gl/GLScreenBuffer.cpp:141
#1  0x0000007ff3666884 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4b7c500, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007ff12b6fbc in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7fc4d23b20, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    LayersTypes.h:82
#3  0x0000007ff12b7018 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7fc4b7c500)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007ff12b70a4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7fc4b7c500, x=<optimized out>, y=<optimized out>, 
    width=<optimized out>, height=<optimized out>)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:794
#5  0x0000007ff12afc40 in mozilla::detail::RunnableMethodArguments<int, int,
    int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void
    (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int),
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>,
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul,
    2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
#17 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) p this
$1 = (const mozilla::gl::SwapChain * const) 0x0
(gdb) 
Immediately there's a segmentation fault and it feels like we've been here before. Here's where the crash is triggered:
const gfx::IntSize& SwapChain::OffscreenSize() const {
  return mPresenter->mBackBuffer->mFb->mSize;
}
It's being triggered because the instance of SwapChain this is being called from is null. But hang on; didn't we just spend days getting rid of the SwapChain code and swapping in GLScreenBuffer precisely to avoid this error? Well, yes, we did. Clearly after changing all that code I missed a case of SwapChain being used.

The fix is to switch back from SwapChain to the previous GLScreenBuffer interface, like to:
@@ -151,8 +153,7 @@ EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget(VsyncId aId)
 
   if (context->IsOffscreen()) {
     MutexAutoLock lock(mRenderMutex);
-    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
-      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
+    if (context->OffscreenSize() != mEGLSurfaceSize &&
+        !context->ResizeOffscreen(mEGLSurfaceSize)) {
       return;
     }
   }
I've also checked through the rest of the EmbedliteCompositionBridgeParent code to ensure there are no other SwapChain references in there. And, with that, we're done for the day. I've spent a long time explaining vtables today and very little time actually coding, but the change is an essential one and I can't test it without another build. So I've set it building again and with any luck I'll be able to test it again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
5 Mar 2024 : Day 176 #
Just as I was explaining in my diary entry yesterday, the first thing I like to do after I wake up following a night of hard work — all performed by my laptop building the latest Gecko changes of course — is to scan the output for red. Red indicates errors.

When I peeked in this morning there was no red showing on the console. But my excitement was short-lived. The exception to the "errors are red" rule is when they come from the linker rather than the compiler and that's what seems to have happened here.

I'm not sure why the linker doesn't bother highlighting errors in red, but on close inspection they definitely are errors.

The fact it compiled without error is nevertheless exciting in itself, just not as exciting as actually having a binary to test. So what are these errors? As is the way with the linker, they're all "undefined reference" or symbol-related errors. This happens when the code calls something that may, for example, have a method signature but no method definition. That can be common with pure virtual functions that aren't subsequently overridden, say, but there are other reasons too. For example I might have added a method signature simply without adding in the method body. Here's a sample of the errors that came out:
403:37.88 toolkit/library/build/libxul.so
410:16.07 aarch64-meego-linux-gnu-ld: ../../../gfx/gl/GLScreenBuffer.o:
    in function `mozilla::gl::GLScreenBuffer::~GLScreenBuffer()':
410:16.07 ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:296:
    undefined reference to `mozilla::gl::SurfaceCaps::~SurfaceCaps()'
410:16.07 aarch64-meego-linux-gnu-ld: /jol403:37.88
    toolkit/library/build/libxul.so
410:16.07 aarch64-meego-linux-gnu-ld: ../../../gfx/gl/GLScreenBuffer.o:
    in function `mozilla::gl::GLScreenBuffer::~GLScreenBuffer()':
410:16.07 ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:296:
    undefined reference to `mozilla::gl::SurfaceCaps::~SurfaceCaps()'
410:16.07 aarch64-meego-linux-gnu-ld: ${PROJECT}/gecko-dev/gfx/gl/
    GLScreenBuffer.cpp:296: undefined reference to
    `mozilla::gl::SurfaceCaps::~SurfaceCaps()'
410:16.07 aarch64-meego-linux-gnu-ld: ../../../gfx/gl/Unified_cpp_gfx_gl0.o:
    in function `mozilla::gl::GLContext::GLContext(mozilla::gl::GLContextDesc
    const&, mozilla::gl::GLContext*, bool)':
410:16.07 ${PROJECT}/gecko-dev/gfx/gl/GLContext.cpp:290: undefined reference to
    `mozilla::gl::SurfaceCaps::SurfaceCaps()'
[...]
410:16.10 aarch64-meego-linux-gnu-ld: libxul.so: hidden symbol
    `_ZNK7mozilla2gl9GLContext15ChooseGLFormatsERKNS0_11SurfaceCapsE'
    isn't defined
410:16.10 aarch64-meego-linux-gnu-ld: final link failed: bad value
410:16.10 collect2: error: ld returned 1 exit status
Let me summarise all of the errors we have here and make things a bit clearer. The following are all the error locations followed by the names of the missing implementations.
GLScreenBuffer.cpp:296: SurfaceCaps::~SurfaceCaps()
GLContext.cpp:290: SurfaceCaps::SurfaceCaps()
SharedSurface.cpp:362: SurfaceCaps::SurfaceCaps(mozilla::gl::SurfaceCaps const&)
SharedSurface.cpp:361: GLContext::ChooseGLFormats(
    mozilla::gl::SurfaceCaps const&) const
SharedSurface.cpp:362: SurfaceCaps::SurfaceCaps()
SharedSurface.cpp:338: SurfaceCaps::SurfaceCaps()
SurfaceTypes.h:38: SurfaceCaps::SurfaceCaps()
SurfaceTypes.h:38: SurfaceCaps::operator=(mozilla::gl::SurfaceCaps const&)
SurfaceTypes.h:38: SurfaceCaps::~SurfaceCaps()
SharedSurface.cpp:350: SurfaceCaps::operator=(mozilla::gl::SurfaceCaps const&)
SharedSurface.cpp:338: SurfaceCaps::~SurfaceCaps()
SharedSurface.cpp:367: SurfaceCaps::~SurfaceCaps()
GLContext.cpp:296: SurfaceCaps::~SurfaceCaps()
SharedSurfaceGL.cpp:39: SurfaceCaps::SurfaceCaps()
SharedSurfaceGL.cpp:39: SurfaceCaps::~SurfaceCaps()
RefPtr.h:590: SharedSurfaceTextureClient::SharedSurfaceTextureClient(
    mozilla::layers::SharedSurfaceTextureData*, mozilla::layers::TextureFlags,
    mozilla::layers::LayersIPCChannel*)
So, what's going on with these? Let's take a look. First up we have line 296 of GLScreenBuffer.cpp. This line, it turns out, is where the GLScreenBuffer destructor implementation starts in the code:
GLScreenBuffer::~GLScreenBuffer() {
  mFactory = nullptr;
  mRead = nullptr;

  if (!mBack) return;

  // Detach mBack cleanly.
  mBack->Surf()->ProducerRelease();
}
Any instance of GLScreenBuffer will also hold an instance of SurfaceCaps as we can see in the header:
 public:
  const SurfaceCaps mCaps;
This is an actual instance of SurfaceCaps not just a pointer to it, so when the GLScreenBuffer instance is destroyed mCaps will be too, triggering a call to the SurfaceCaps destructor. If we look in the SurfaceTypes.h header file we can see that there is a destructor specified:
struct SurfaceCaps final {
[...]
  SurfaceCaps();
  SurfaceCaps(const SurfaceCaps& other);
  ~SurfaceCaps();
[...]
But nowhere is the body of the method defined. Compare that to the ESR 78 code where the SurfaceCaps class looks the same. The difference is that checking in the GLContect.cpp source we find these:
// These are defined out of line so that we don't need to include
// ISurfaceAllocator.h in SurfaceTypes.h.
SurfaceCaps::SurfaceCaps() = default;
SurfaceCaps::SurfaceCaps(const SurfaceCaps& other) = default;
SurfaceCaps& SurfaceCaps::operator=(const SurfaceCaps& other) = default;
SurfaceCaps::~SurfaceCaps() = default;
They're pretty much the simplest implementations you can get. But they are at least implementations. So with any luck adding these in to the ESR 91 code as well will solve quite a few of the linker errors. That's not everything though. After we take away the instances that these four methods deal with we're left with these two:
SharedSurface.cpp:361: GLContext::ChooseGLFormats(
    mozilla::gl::SurfaceCaps const&) const
RefPtr.h:590: SharedSurfaceTextureClient::SharedSurfaceTextureClient(
    mozilla::layers::SharedSurfaceTextureData*, mozilla::layers::TextureFlags,
    mozilla::layers::LayersIPCChannel*)
Let's check SharedSurface.cpp line 361 next. The line looks like this:
      mFormats(partialDesc.gl->ChooseGLFormats(caps)),
If we look in the GLContext class, sure enough we can see the method signature:
  // Only varies based on bpp16 and alpha.
  GLFormats ChooseGLFormats(const SurfaceCaps& caps) const;
We can see the same signature in the ESR 78 code. Once again though, the body of the method is defined in GLContext.cpp of ESR 78, whereas it's nowhere to be found in ESR 91. So I've copied the missing code over: 
GLFormats GLContext::ChooseGLFormats(const SurfaceCaps& caps) const { GLFormats formats; // If we're on ES2 hardware and we have an explicit request for 16 bits of // color or less OR we don't support full 8-bit color, return a 4444 or 565 // format. bool bpp16 = caps.bpp16; [...] 
Finally we have a reference to SharedSurfaceTextureClient. The reference to line 590 of RefPtr.h is unhelpful; what we really want to know is where the RefPtr is being used. A quick search suggests it's inside GLScreenBuffer where there are a couple of cases that look like this:
bool GLScreenBuffer::Swap(const gfx::IntSize& size) {
  RefPtr<layers::SharedSurfaceTextureClient> newBack =
      mFactory->NewTexClient(size);
[...]
There's no definition of SharedSurfaceTextureClient::SharedSurfaceTextureClient() in the ESR 91 code, but there is in the ESR 78 code, in the TextureClientSharedSurface.cpp source:
SharedSurfaceTextureClient::SharedSurfaceTextureClient(
    SharedSurfaceTextureData* aData, TextureFlags aFlags,
    LayersIPCChannel* aAllocator)
    : TextureClient(aData, aFlags, aAllocator) {
  mWorkaroundAnnoyingSharedSurfaceLifetimeIssues = true;
}
I've copied that code over as well. But my reading that's all of them. So this may mean we're ready to kick off a full rebuild. Before doing so, it's worth spending a bit of extra time to check that the changes pass our three partial builds already.
$ make -j1 -C obj-build-mer-qt-xr/gfx/gl/
[...]
$ make -j1 -C obj-build-mer-qt-xr/dom/canvas
[...]
$ make -j1 -C obj-build-mer-qt-xr/gfx/layers
[...]
The first two work fine, but the last generates a new error:
${PROJECT}/gecko-dev/gfx/layers/client/TextureClientSharedSurface.cpp:109:3:
    error: ‘mWorkaroundAnnoyingSharedSurfaceLifetimeIssues’
    was not declared in this scope
   mWorkaroundAnnoyingSharedSurfaceLifetimeIssues = true;
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This mWorkaroundAnnoyingSharedSurfaceLifetimeIssues variable is present in ESR 78, used to decide whether or not to deallocate the TextureData when the TextureClient is destroyed. Here's the logic it's part of:
void TextureClient::Destroy() {
[...]
  TextureData* data = mData;
  if (!mWorkaroundAnnoyingSharedSurfaceLifetimeIssues) {
    mData = nullptr;
  }

  if (data || actor) {
[...]
    params.workAroundSharedSurfaceOwnershipIssue =
        mWorkaroundAnnoyingSharedSurfaceOwnershipIssues;
    if (mWorkaroundAnnoyingSharedSurfaceLifetimeIssues) {
      params.data = nullptr;
    } else {
      params.data = data;
    }
[...]
    DeallocateTextureClient(params);
This logic has been removed in ESR 91 and, if I'm honest, I'm comfortable leaving it this way. Here's what the equivalent ERS 91 code looks like:
void TextureClient::Destroy() {
[...]
  TextureData* data = mData;
  mData = nullptr;

  if (data || actor) {
[...]
    DeallocateTextureClient(params);
  }
}
As you can see, there are no references to mWorkaroundAnnoyingSharedSurfaceLifetimeIssues, or anything like it, there at all. Consequently I'm just going to remove the references to it from our new SharedSurfaceTextureClient constructor as well:
SharedSurfaceTextureClient::SharedSurfaceTextureClient(
    SharedSurfaceTextureData* aData, TextureFlags aFlags,
    LayersIPCChannel* aAllocator)
    : TextureClient(aData, aFlags, aAllocator) {
}
Maybe this will cause problems later, but my guess is that if there are going to be problems, they'll be pretty straightforward to track down with the debugger, at which point we can refer back to the ESR 78 code to restore these checks.

The partial builds all now pass, so it's time to do a full rebuild. Here's the updated tally of changes once again to highlight today's progress:
$ git diff --numstat
145     16      gfx/gl/GLContext.cpp
73      8       gfx/gl/GLContext.h
11      0       gfx/gl/GLContextTypes.h
590     0       gfx/gl/GLScreenBuffer.cpp
171     1       gfx/gl/GLScreenBuffer.h
305     5       gfx/gl/SharedSurface.cpp
132     4       gfx/gl/SharedSurface.h
16      10      gfx/gl/SharedSurfaceEGL.cpp
13      3       gfx/gl/SharedSurfaceEGL.h
81      2       gfx/gl/SharedSurfaceGL.cpp
61      0       gfx/gl/SharedSurfaceGL.h
60      0       gfx/gl/SurfaceTypes.h
27      0       gfx/layers/client/TextureClientSharedSurface.cpp
25      0       gfx/layers/client/TextureClientSharedSurface.h
$ git diff --shortstat
 14 files changed, 1710 insertions(+), 49 deletions(-)
So off the build goes again:
$ sfdk build -d --with git_workaround
NOTICE: Appending changelog entries to the RPM SPEC file…
Setting version: 91.9.1+git1+sailfishos.esr91.20240302180401.
    2f1f19ac7d73+gecko.dev.5292b747b036
Directory walk started
[...]
This will likely take until the morning, at which point I'll be eagerly checking for errors once again.

[...]

An early and unusually short build run (just 5 hours 12 minutes) meant that I've got a second bit of the cherry! The build finished before the end of the day. It's not a successful build unfortunately, but by discovering this now rather than in the morning I'm able to claim an entire day back. So I'm very happy about that.

Here are the linker errors. Well, actually, I think it's just one error that's spread over many lines:
308:11.87 toolkit/library/build/libxul.so
311:51.74 aarch64-meego-linux-gnu-ld: ../../../gfx/layers/
    Unified_cpp_gfx_layers6.o: in function `mozilla::layers::
    SharedSurfaceTextureClient::SharedSurfaceTextureClient
    (mozilla::layers::SharedSurfaceTextureData*, mozilla::layers::TextureFlags,
    mozilla::layers::LayersIPCChannel*)':
311:51.74 ${PROJECT}/gecko-dev/gfx/layers/client/
    TextureClientSharedSurface.cpp:108: undefined reference to
    `vtable for mozilla::layers::SharedSurfaceTextureClient'
311:51.74 aarch64-meego-linux-gnu-ld: ${PROJECT}/gecko-dev/gfx/layers/client/
    TextureClientSharedSurface.cpp:108: undefined reference to
    `vtable for mozilla::layers::SharedSurfaceTextureClient'
311:51.74 aarch64-meego-linux-gnu-ld: libxul.so: hidden symbol
    `_ZTVN7mozilla6layers26SharedSurfaceTextureClientE' isn't defined
311:51.74 aarch64-meego-linux-gnu-ld: final link failed: bad value
311:51.75 collect2: error: ld returned 1 exit status
The essence of error is the following:
TextureClientSharedSurface.cpp:108: undefined reference to
    `vtable for mozilla::layers::SharedSurfaceTextureClient'
I'm going to hold of explaining what's going on here until tomorrow. This post is already rather long and I don't have the energy to go in to the details tonight. But I have added in a fix.

With these latest changes the partial builds still all pass. So it's time to set the full build off again. But this also means it's the end of my gecko development for the day. More tomorrow when we'll see how things have gone.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
4 Mar 2024 : Day 175 #
I've been really struggling to get the GLScreenBuffer code back compiling again over the last few days. This is partly because a whole raft of interfaces have changed and these changes have seeped through into many different places in the code. Often the changes are quite small, such as using a reference rather than a pointer. But other times they're more significant, such as the way member variables in SharedSurface have changed. To look into the latter in a little more detail, the previous version looked like this:
class SharedSurface {
 public:
[...]
  const SharedSurfaceType mType;Th
  const AttachmentType mAttachType;
  const WeakPtr<GLContext> mGL;
  const gfx::IntSize mSize;
  const bool mHasAlpha;
  const bool mCanRecycle;
But the new code bundles all this up into a structure so that it now looks more like this:
struct PartialSharedSurfaceDesc {
  const WeakPtr<GLContext> gl;
  const SharedSurfaceType type;
  const layers::TextureType consumerType;
  const bool canRecycle;
};
struct SharedSurfaceDesc : public PartialSharedSurfaceDesc {
  gfx::IntSize size = {};
};

class SharedSurface {
 public:
[...]
  const SharedSurfaceDesc mDesc;
Even this might seem like a small change, but it can make it really hard to match up method signatures given the ESR 78 version takes a collection of individual parameters, whilst the ESR 91 version requires just a single SharedSurfaceDesc instance.

Besides all this I've also had what feels like several long days at work. The result is that when I come to work on Gecko of an evening the code swims around in front of my eyes and refuses to stay still as my mind drifts hither and thither.

Nevertheless I've been persevering with my "fix, compile, examine" routine and have continued to make some slight progress. Once again, I don't want to go into the full detail of the changes I've had to make, because it really is just reintroducing code that was removed upstream. So there's a lot of it, and I'm not even spending any time attempting to understand it.

But it has now got to the point where all three directories I've been making changes to are compiling when I do the partial builds:
$ make -j1 -C obj-build-mer-qt-xr/gfx/gl/
[...]
$ make -j1 -C obj-build-mer-qt-xr/dom/canvas
[...]
$ make -j1 -C obj-build-mer-qt-xr/gfx/layers
[...]
I've been here before of course. Just because the partial builds compile without errors doesn't mean the full build will pass. So the next step for me tonight is to set the full build running so we can come back to it and check its status in the morning.

Before I do that, here are the latest stats showing the changes I've made in relation to the offscreen rendering pipeline:
$ git diff --numstat
73      16      gfx/gl/GLContext.cpp
73      8       gfx/gl/GLContext.h
11      0       gfx/gl/GLContextTypes.h
590     0       gfx/gl/GLScreenBuffer.cpp
171     1       gfx/gl/GLScreenBuffer.h
305     5       gfx/gl/SharedSurface.cpp
132     4       gfx/gl/SharedSurface.h
16      10      gfx/gl/SharedSurfaceEGL.cpp
13      3       gfx/gl/SharedSurfaceEGL.h
81      2       gfx/gl/SharedSurfaceGL.cpp
61      0       gfx/gl/SharedSurfaceGL.h
60      0       gfx/gl/SurfaceTypes.h
21      0       gfx/layers/client/TextureClientSharedSurface.cpp
25      0       gfx/layers/client/TextureClientSharedSurface.h
$ git diff --shortstat
 14 files changed, 1632 insertions(+), 49 deletions(-)
That's not a huge increase on yesterday, but is at least non-zero. This reflects the fact that as things have progressed the fixes have become smaller, but thornier. All of the main changes this time seem to have been to the GLContext code.

Alright, time to set the build off. Hopefully this will complete without errors, but I'm not betting on it.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
3 Mar 2024 : Day 174 #
As I walk, half asleep, from bedroom to kitchen for breakfast each morning I have to go past the office space where I work on my laptop. Whenever I leave a build running overnight I can't resist the urge to peak in and see how things have gone as I go past.

This morning I peaked in and saw a barrage of errors, highlighted in red, spanning back into the history of the terminal buffer. That's not quite the morning wake-up I was hoping for!

Nevertheless, as I like to tell myself, it still represents progress. And so today I have the task of fixing these fresh errors. Compile-time errors are generally easier to fix than runtime errors, so I don't mind at all really.

That was earlier today though and as I write this I'm already on the train. I've made the smart move of connecting all my devices to the same Wifi hotspot on my phone, which means no wires today. It's all surprisingly well arranged, although that hasn't prevented me from knocking my phone under the seat in front of me already once this journey.

This calls for another one of thigg's amazing images. This one feels a lot less chaotic than the last, albeit with a stalker fox in the background adding an ominous air to proceedings!
 
A pig with wings wearing a suit sits at in a train at a table using a laptop; resting on the table is a phone and in the background a fox enters the carriage wheeling a case

So that's me today. Before I start, let's have some stats from git so we can compare what happens now with what we get at the end of the day.
$ git diff --numstat
38      4       gfx/gl/GLContext.cpp
59      7       gfx/gl/GLContext.h
11      0       gfx/gl/GLContextTypes.h
590     0       gfx/gl/GLScreenBuffer.cpp
171     1       gfx/gl/GLScreenBuffer.h
305     5       gfx/gl/SharedSurface.cpp
130     3       gfx/gl/SharedSurface.h
16      10      gfx/gl/SharedSurfaceEGL.cpp
13      3       gfx/gl/SharedSurfaceEGL.h
8       2       gfx/gl/SharedSurfaceGL.cpp
3       0       gfx/gl/SharedSurfaceGL.h
60      0       gfx/gl/SurfaceTypes.h
21      0       gfx/layers/client/TextureClientSharedSurface.cpp
25      0       gfx/layers/client/TextureClientSharedSurface.h
$ git diff --shortstat
 14 files changed, 1450 insertions(+), 35 deletions(-)
Now on to the bug fixing!

[...]

I've managed to plough through quite a few issues, almost all related to the SharedSurfaceGL and EmbedliteCompositorBridgeParent classes. Unfortunately I've not managed to get to the point where the partial builds are going through without error. That means there's no point in setting the full build to run over night. Instead, I'll have to pick up the errors where I left off in the morning.

Just to demonstrate that some progress has been made, here are the new stats for this evening.
$ git diff --numstat
58      4       gfx/gl/GLContext.cpp
70      7       gfx/gl/GLContext.h
11      0       gfx/gl/GLContextTypes.h
590     0       gfx/gl/GLScreenBuffer.cpp
171     1       gfx/gl/GLScreenBuffer.h
305     5       gfx/gl/SharedSurface.cpp
131     4       gfx/gl/SharedSurface.h
16      10      gfx/gl/SharedSurfaceEGL.cpp
13      3       gfx/gl/SharedSurfaceEGL.h
72      2       gfx/gl/SharedSurfaceGL.cpp
60      0       gfx/gl/SharedSurfaceGL.h
60      0       gfx/gl/SurfaceTypes.h
21      0       gfx/layers/client/TextureClientSharedSurface.cpp
25      0       gfx/layers/client/TextureClientSharedSurface.h
$ git diff --shortstat
 14 files changed, 1603 insertions(+), 36 deletions(-)
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment