flypig.co.uk

Personal Blog


Blog

5 most recent items

28 Mar 2024 : Day 199 #
Adam Pigg (piggz), who you'll know from his daily Ofono blog amongst other Sailfish-related things, asked a good question on Mastodon yesterday. "How," Adam asks, "was the texture deletion missed? Was it present in 71?" This is in relation to how I ended up solving the app seizure problem. With apologies to those who have been following along and already know, here's a recap: the problem turned out to be that an EGL texture was being created for each SharedSurface_EGLImage, but then in the destructor there was no code to delete the texture.

It's a classic resource leakage problem and one you might rightly ask (and that's exactly what Adam did) how such an obvious failure could have snuck through.

The answer is that the code to delete the texture was in ESR 78 but was removed from ESR 91. That sounds strange until you also realise that the code to create the texture was removed as well. A whole raft of changes were made as a result of changeset D75055 and the changes specifically to the SharedSurfaceEGL.cpp file involved stripping out the EGL code that the WebView relies on. When attempting to reintroduce this code I restored the texture creation code but somehow missed the texture deletion code.

And that's the challenge of where I'm at with the WebView rendering changes now. It's all down to whether I've successfully reversed these changes or not. I know it can work because it works with ESR 78. But getting all of the pieces to balance together is turning out to be a bit of a challenge. It's just a matter of time before everything fits in the right way, but it is, I'm afraid, taking a lot of time.

So thanks Adam for asking the question. It's good to reflect on these things and hopefully in my case learn how to avoid similar problems happening in future. Now on to the work of finding and fixing the other issues.

So today I've been looking through the diffs I mentioned yesterday, but admittedly without making a huge amount of progress. The one thing I've discovered, that I think may be important, is that there's a difference in the way the display is being configured between ESR 78 and ESR 91.

On ESR 78 the display is collected via a call to GetAppDisplay(). This function can be found in GLContextProviderEGL.cpp and looks like this:
// Use the main app's EGLDisplay to avoid creating multiple Wayland connections
// See JB#56215
static EGLDisplay GetAppDisplay() {
#ifdef MOZ_WIDGET_QT
  QPlatformNativeInterface* interface = QGuiApplication::
    platformNativeInterface();
  MOZ_ASSERT(interface);
  return (EGLDisplay)(interface->nativeResourceForIntegration(QByteArrayLiteral(
    "egldisplay")));
#else
  return EGL_NO_DISPLAY;
#endif
}
In our case we have MOZ_WIDGET_QT defined, so it's the first half of the ifdef that's getting compiled. This gets passed in to the GLLibraryEGL::EnsureInitialized() method when the library is initialised.

The initialisation process has been changed in ESR 91. But there's still a similar process that happens when the library is initialised, the difference being that currently EGL_NO_DISPLAY is passed into the method instead.

Eventually this gets passed on to the CreateDisplay() method, which is where we need the correct value to be. Using the debugger I've checked exactly how this gets called on ESR 91. It's clear from this that the display isn't being set up as it should be.
(gdb) b GLLibraryEGL::CreateDisplay
Function "GLLibraryEGL::CreateDisplay" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (GLLibraryEGL::CreateDisplay) pending.
(gdb) r
[...]
Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GLLibraryEGL::
    CreateDisplay (this=this@entry=0x7ed8003200, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f176081c8, aDisplay=aDisplay@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:747
747     ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp: No such file or directory.
(gdb) bt
#0  mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7ed8003200, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f176081c8, aDisplay=aDisplay@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:747
#1  0x0000007ff111d850 in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=0x7ed8003200, out_failureId=out_failureId@entry=0x7f176081c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:740
#2  0x0000007ff112ef28 in mozilla::gl::DefaultEglDisplay (
    out_failureId=0x7f176081c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextEGL.h:33
#3  mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., 
    out_failureId=out_failureId@entry=0x7f176081c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1246
#4  0x0000007ff112f804 in mozilla::gl::GLContextProviderEGL::CreateOffscreen (
    size=..., minCaps=..., 
    flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, 
    out_failureId=out_failureId@entry=0x7f176081c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1288
#5  0x0000007ff11982f8 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ed8002f10)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:254
#6  0x0000007ff11ad8e8 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed8002f10, out_failureReason=0x7f17608520)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:391
#7  0x0000007ff12c3584 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc46c2070, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff12ce600 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc46c2070, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
[...]
#26 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
Besides this investigation I've also started making changes to try to fix the code, but this is still very much a work-in-progress. Hopefully tomorrow I'll have something more concrete to show for my efforts.

So it's just a short one today, but rest assured I'll be writing more about all this tomorrow.

Before finishing up, I also just want to reiterate my commitment to base-2 milestones. The fact I'll be hitting a day that just happens to be represented neatly in base 10 is of no interest to me and I won't be making a big deal out of it.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
27 Mar 2024 : Day 198 #
I'm looking forward to getting back to a more balanced cadence with gecko development. It's been frustrating to be stuck on app seizing for the last couple of weeks, so now that it's out of the way it'll be nice to focus on other parts of the code. But I'm not going to be wandering too far afield as I continue to try to get the WebView render pipeline working.

What's become clear is that the front and back buffer are being successfully created (and now destroyed!). So now there are two other potential places where the rendering could be failing. It could be that the paint from the Web pages is failing to get on the texture. Or it could be that the paint from the texture is failing to get on the screen.

I'd like to devise ways to test both of those things, but before I do that I want to first check another area that's ripe for failure in my opinion, and that's the setting of the display value. The EGL library uses an EGLDisplay object to control where rendering happens. Although it's part of the Khronos EGL specification, the official documentation is frustratingly vague about what an EGLDisplay actually is. Thankfully the PowerVR documentation has a note that summarises it quite clearly.
 
EGL uses the concept of a "display". This is an abstract object which will show the rendered graphical output. In most environments it corresponds to a single physical screen. After creating a native display for a given windowing system, EGL can use this handle to get a corresponding EGLDisplay handle for use in rendering.

The shift from ESR 78 to ESR 91 brought with it a more flexible handling of displays. In particular, while ESR 78 had just a single instance of a display, ESR 91 allows multiple displays to be configured. I'm not entirely certain what the practical benefit of this is, but handling of EGLDisplay storage has become more complex as a result.

So whereas previously gecko had a single mDisplay value that got used everywhere, the EGLDisplay is now wrapped in a gecko-specific EglDisplay class, defined in GLLibraryEGL.h. This class captures a collection of functionalities, one of which is to store an EGLDisplay value. There can be multiple instances of EglDisplay live at any one time.

The subtle distinction in the capitalisation — EGLDisplay vs. EglDisplay — is critical. The former belongs to EGL whereas the latter belongs to gecko. The fact they're so similar and that the shift from ESR 78 to ESR 91 has resulted in a switch from one to the other in many parts of the code, makes things all the more confusing.

There's plenty of opportunity for errors here. So I'm thinking: this is something to check.

An obvious place to start these checks is with display initialisation. A quick grep of the code for eglInitialize doesn't give any useful results. However as we saw at some length on Monday, all of these EGL library calls have been abstracted away. And eglInitialize() is no different. The gecko code uses a method called GLLibraryEGL::fInitialize() instead.

Grepping for that throws up some more useful references. The most promising one being this:
static EGLDisplay GetAndInitDisplay(GLLibraryEGL& egl, void* displayType, 
    EGLDisplay display = EGL_NO_DISPLAY) {
  if (display == EGL_NO_DISPLAY) {
      display = egl.fGetDisplay(displayType);
      if (display == EGL_NO_DISPLAY) return EGL_NO_DISPLAY;
      if (!egl.fInitialize(display, nullptr, nullptr)) return EGL_NO_DISPLAY;
  }
  return display;
}
That's on ESR 78. On ESR 91 things are different and for good reason. The GetAndInitDisplay() method assumes a single instance of EGLDisplay as discussed earlier. On ESR 91 the display is initialised when its EglDisplay wrapper is created:
// static
std::shared_ptr<EglDisplay> EglDisplay::Create(GLLibraryEGL& lib,
                                               const EGLDisplay display,
                                               const bool isWarp) {
  // Retrieve the EglDisplay if it already exists
  {
    const auto itr = lib.mActiveDisplays.find(display);
    if (itr != lib.mActiveDisplays.end()) {
      const auto ret = itr->second.lock();
      if (ret) {
        return ret;
      }
    }
  }

  if (!lib.fInitialize(display, nullptr, nullptr)) {
    return nullptr;
  }
[...]
}
I've chopped off the end of the method there, but the section shown highlights the important part. It's also worth mentioning that in ESR 78 this and the surrounding functionality were all amended by Patch 0038 "Fix mesa egl display and buffer initialisation". I attempted to apply this patch all the way back on Day 55 and it does contain plenty of relevant changes. Here's the way the patch describes itself:
 
Ensure the same display is used for all initialisations to avoid creating multiple wayland connections. Fallback to a wayland window surface in case pixel buffers aren't supported. This is needed on the emulator.

Unfortunately applying the patch, especially due to the differences in the way EGLDisplay is handled, turned out to be a challenge.
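
To make the lookup-or-create step at the top of EglDisplay::Create() a bit more concrete, here's a minimal sketch of the same pattern using a map of weak pointers keyed by the raw handle. All of the names here (RawDisplay, DisplayWrapper, Library, CreateWrapper) are hypothetical stand-ins rather than gecko's actual types, and since the tail of the real method is elided above, the step that registers the new wrapper is my assumption about how such a cache would usually be completed.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for the raw EGL display handle.
using RawDisplay = void*;

// Hypothetical stand-in for gecko's EglDisplay wrapper class.
struct DisplayWrapper {
  RawDisplay raw;
  explicit DisplayWrapper(RawDisplay d) : raw(d) {}
};

// Hypothetical stand-in for GLLibraryEGL: tracks live wrappers by raw handle.
struct Library {
  std::map<RawDisplay, std::weak_ptr<DisplayWrapper>> activeDisplays;
  int initCalls = 0;  // counts how often the fake driver was initialised

  bool initialize(RawDisplay) {
    ++initCalls;
    return true;
  }
};

// Return the existing wrapper if one is still alive; otherwise initialise
// the display and register a fresh wrapper (the registration is an assumption).
std::shared_ptr<DisplayWrapper> CreateWrapper(Library& lib,
                                              RawDisplay display) {
  const auto itr = lib.activeDisplays.find(display);
  if (itr != lib.activeDisplays.end()) {
    if (auto existing = itr->second.lock()) {
      return existing;
    }
  }

  if (!lib.initialize(display)) {
    return nullptr;
  }

  auto created = std::make_shared<DisplayWrapper>(display);
  lib.activeDisplays[display] = created;
  return created;
}
```

Because the map only holds weak pointers, a wrapper disappears as soon as its last user lets go of it, and the next request for the same raw handle triggers a fresh initialisation. That's the behaviour that makes it matter which raw EGLDisplay gets passed in at the start.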

Consequently I'm now working my way through this patch again. It'll take me longer than just today, so I'll continue with it until it's all applied properly and report back if I find anything important tomorrow. Raine (rainemak) also flagged up patches 0045 and 0065. The former claims to "Prioritize GMP plugins over all others, and support decoding video for h264, vp8 & vp9"; whereas the latter will:
 
Hardcode loopback address for profile lock filename. When engine started without network PR_GetHostByName takes 20 seconds when connman tries to resolve host name. As this is only used as part of the profile lock filename it can as well be like "127.0.0.1:+<pid>".

It'll take me a while to work through these as well, which means that's it for today. I'll write more about all this tomorrow.

26 Mar 2024 : Day 197 #
If you've been following any of these diary entries over the last couple of weeks you'll know I've been struggling to diagnose a problem related to graphics surfaces. A serious bug prevented the graphics surface from being properly created, but as soon as that was fixed another serious issue appeared: after a short period of time using the WebView the app started to seize up, rapidly progressing to the entire phone. After a while the watchdog kicked in causing the phone to reboot itself.

This is, as a general rule, not considered ideal behaviour for an application.

Since then I've been generally debugging, monitoring and annotating the code to try to figure out what was causing the problem. As of yesterday I'd narrowed the issue down to the creation of the EGL image associated with an EGL texture. Each frame the app would create the texture, then create the image from the texture and then create a surface from that.

Returning early at any point before the image creation avoided the seize-up, whereas allowing execution to reach the image creation and beyond would result in the seizing up happening. This led me to the EGL instructions: creating and destroying the image.

I've been looking at this code in SharedSurfaceEGL.cpp quite deeply for a couple of weeks now. And narrowing down the area of consideration has finally thrown up something useful.

It turns out that while the surface destructor is called correctly and that this calls fDestroyImage() correctly, that's not all it's supposed to be doing.

All of this was stuff we checked yesterday: fDestroyImage() was being called for every call to fCreateImage() except two, allowing for the front and back buffer to exist at all times.

But looking at the code today I realised there was something missing. When the image is created in SharedSurface_EGLImage::Create() it needs a texture to work with. And so we have this code:
  GLuint prodTex = CreateTextureForOffscreen(prodGL, formats, size);
  if (!prodTex) {
    return ret;
  }

  EGLClientBuffer buffer =
      reinterpret_cast<EGLClientBuffer>(uintptr_t(prodTex));
  EGLImage image = egl->fCreateImage(context,
                                     LOCAL_EGL_GL_TEXTURE_2D, buffer, nullptr);
First create the texture then pass this in to the image creation routine. But while the image is deleted in the destructor, the texture is not!

Here is our destructor code in ESR 91:
SharedSurface_EGLImage::~SharedSurface_EGLImage() {
  const auto& gle = GLContextEGL::Cast(mDesc.gl);
  const auto& egl = gle->mEgl;
  egl->fDestroyImage(mImage);

  if (mSync) {
    // We can't call this unless we have the ext, but we will always have
    // the ext if we have something to destroy.
    egl->fDestroySync(mSync);
    mSync = 0;
  }
}
The image and sync are both destroyed, but the texture never is. So what happens if we add in the texture deletion? To test this I've added it in and the code now looks like this:
SharedSurface_EGLImage::~SharedSurface_EGLImage() {
  const auto& gle = GLContextEGL::Cast(mDesc.gl);
  const auto& egl = gle->mEgl;
  egl->fDestroyImage(mImage);

  if (mSync) {
    // We can't call this unless we have the ext, but we will always have
    // the ext if we have something to destroy.
    egl->fDestroySync(mSync);
    mSync = 0;
  }

  if (!mDesc.gl || !mDesc.gl->MakeCurrent()) return;

  mDesc.gl->fDeleteTextures(1, &mProdTex);
  mProdTex = 0;
}
And now, after building and running this new version, the app no longer seizes up!
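
To illustrate why the missing deletion mattered, here's a small simulation of the buffer swapping with a fake driver that tracks live handles. It's only a sketch: FakeDriver, Surface and RunFrames are hypothetical names I've made up for the purpose, and the real code talks to EGL and GL rather than to a std::set.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <utility>

// Hypothetical fake driver that records which handles are currently alive.
struct FakeDriver {
  std::set<unsigned> textures;
  std::set<unsigned> images;
  unsigned next = 1;

  unsigned createTexture() { textures.insert(next); return next++; }
  unsigned createImage() { images.insert(next); return next++; }
  void deleteTexture(unsigned id) { textures.erase(id); }
  void destroyImage(unsigned id) { images.erase(id); }
};

// Simplified surface owning one texture and one image created from it.
class Surface {
 public:
  Surface(FakeDriver& driver, bool deleteTexture)
      : mDriver(driver),
        mDeleteTexture(deleteTexture),
        mTex(driver.createTexture()),
        mImage(driver.createImage()) {}

  ~Surface() {
    mDriver.destroyImage(mImage);
    // The step that was missing: without it every surface leaks a texture.
    if (mDeleteTexture) mDriver.deleteTexture(mTex);
  }

  Surface(const Surface&) = delete;
  Surface& operator=(const Surface&) = delete;

 private:
  FakeDriver& mDriver;
  bool mDeleteTexture;
  unsigned mTex;
  unsigned mImage;
};

// Swap buffers for the given number of frames, keeping two surfaces alive,
// and return how many textures exist while both buffers are still live.
std::size_t RunFrames(int frames, bool deleteTexture) {
  FakeDriver driver;
  auto front = std::make_unique<Surface>(driver, deleteTexture);
  auto back = std::make_unique<Surface>(driver, deleteTexture);
  for (int i = 0; i < frames; ++i) {
    front = std::move(back);  // the old front surface is destroyed here
    back = std::make_unique<Surface>(driver, deleteTexture);
  }
  return driver.textures.size();
}
```

In this model RunFrames(100, false) ends with 102 live textures, one leaked per frame on top of the two in use, while RunFrames(100, true) stays at two. The image count stays at two either way, which is why counting only the image calls didn't reveal the problem.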

To be clear, there's still no rendering happening to the screen, but this is nevertheless an important step forwards and I'm pretty chuffed to have noticed the missing code. In retrospect, it's something I should have noticed a lot earlier, but this goes to show both how intricate these things are, and where my limitations are as a developer. It's hard to keep all of the execution paths in my head all at the same time. As a result I'm left using these often trial-and-error based approaches to finding fixes.

It's a small victory. But it means that tomorrow I can continue on with the proper job of finding out why the render never makes it to the screen. With this resolved I'm feeling more confident again that it will be possible to get to the bottom of it.

25 Mar 2024 : Day 196 #
Yesterday I finally narrowed down the error causing the WebView app to seize up during execution to a call to EglDisplay::fCreateImage(). Now it may not be this call that's the problem, it might be the way the result is used or the fact that it's not being freed properly, or maybe the parameters that are being passed in to it. But the fact that we've narrowed it down is likely to be a big help in figuring things out.

The call itself goes through to a method that looks like this:
  EGLImage fCreateImage(EGLContext ctx, EGLenum target, EGLClientBuffer buffer,
                        const EGLint* attribList) const {
    MOZ_ASSERT(HasKHRImageBase());
    return mLib->fCreateImage(mDisplay, ctx, target, buffer, attribList);
  }
Here mLib is an instance of GLLibraryEGL. It looks like we have several layers of wrappers here so let's continue digging. This goes through to the following method that's part of GLLibraryEGL:
  EGLImage fCreateImage(EGLDisplay dpy, EGLContext ctx, EGLenum target,
                        EGLClientBuffer buffer,
                        const EGLint* attrib_list) const {
    WRAP(fCreateImageKHR(dpy, ctx, target, buffer, attrib_list));
  }
That looks similar but it's not quite the same. It is just another wrapper though, this time going through to a dynamically created method. The WRAP() macro looks like this:
#define WRAP(X)                \
  PROFILE_CALL                 \
  BEFORE_CALL                  \
  const auto ret = mSymbols.X; \
  AFTER_CALL                   \
  return ret
The PROFILE_CALL, BEFORE_CALL and AFTER_CALL lines are all macros which turn into something functional in the Android build, but in our build are just empty. That means that the WRAP(fCreateImageKHR(dpy, ctx, target, buffer, attrib_list)) statement actually reduces down to just the following:
  const auto ret = mSymbols.fCreateImageKHR(dpy, ctx, target, buffer, 
    attrib_list);
  return ret;
The mSymbols object has the following defined on it:
    EGLImage(GLAPIENTRY* fCreateImageKHR)(EGLDisplay dpy, EGLContext ctx,
                                          EGLenum target,
                                          EGLClientBuffer buffer,
                                          const EGLint* attrib_list);
Here EGLImage is a typedef of void* and GLAPIENTRY is an empty define, giving us a final result that looks like this:
    void* (*fCreateImageKHR)(EGLDisplay dpy, EGLContext ctx,
                             EGLenum target,
                             EGLClientBuffer buffer,
                             const EGLint* attrib_list);
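
The chain of wrappers so far can be sketched in a few lines of standalone code. This is a much-reduced model, not the gecko source: MiniLibrary, MiniDisplay and FakeCreateImage are hypothetical names, the parameters are plain integers rather than EGL types, and the three hook macros are defined as empty to match what I described for our build.

```cpp
#include <cassert>

// Empty hooks, mirroring what happens outside the Android build.
#define PROFILE_CALL
#define BEFORE_CALL
#define AFTER_CALL

// Same shape as the macro shown earlier.
#define WRAP(X)                \
  PROFILE_CALL                 \
  BEFORE_CALL                  \
  const auto ret = mSymbols.X; \
  AFTER_CALL                   \
  return ret

// Hypothetical stand-in for the function loaded from the driver.
static int FakeCreateImage(int dpy, int buffer) { return dpy * 100 + buffer; }

// Hypothetical table of function pointers, like mSymbols.
struct Symbols {
  int (*fCreateImageKHR)(int dpy, int buffer) = FakeCreateImage;
};

// Hypothetical reduced library class: forwards to the function pointer.
struct MiniLibrary {
  Symbols mSymbols;

  int fCreateImage(int dpy, int buffer) const {
    WRAP(fCreateImageKHR(dpy, buffer));
  }
};

// Hypothetical reduced display wrapper: supplies its stored display value.
struct MiniDisplay {
  MiniLibrary* mLib;
  int mDisplay;

  int fCreateImage(int buffer) const {
    return mLib->fCreateImage(mDisplay, buffer);
  }
};
```

With a MiniDisplay holding display value 7, calling fCreateImage(3) passes through both wrappers and reaches FakeCreateImage(7, 3). The caller never supplies the display itself, which is the key difference between the two layers.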
We're still not quite there though. Inside GLLibraryEGL.cpp we find this:
    const SymLoadStruct symbols[] = {SYMBOL(CreateImageKHR),
                                     SYMBOL(DestroyImageKHR), END_OF_SYMBOLS};
    (void)fnLoadSymbols(symbols);
This is packing symbols with some data which is then passed in to fnLoadSymbols(), a method for loading symbols from a dynamically loaded library. The define that's used here is the following:
#define SYMBOL(X)                 \
  {                               \
    (PRFuncPtr*)&mSymbols.f##X, { \
      { "egl" #X }                \
    }                             \
  }
Notice how here it's playing around with the input argument so that, with a little judicious simplification for clarity, SYMBOL(CreateImageKHR) becomes:
  mSymbols.fCreateImageKHR, {{ "eglCreateImageKHR" }}
In other words (big reveal, but no big surprise) a call to mSymbols.fCreateImageKHR() will get converted into a call to the EGL function eglCreateImageKHR, loaded in from the EGL driver.
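
The two preprocessor tricks at work in SYMBOL() are token pasting (f##X) and stringification ("egl" #X). Here's a self-contained sketch of the same idea, with the caveat that SymEntry, gSymbols, FakeLookup and LoadSymbols are all hypothetical simplifications: the real SymLoadStruct and fnLoadSymbols() are more involved and load from the actual EGL driver.

```cpp
#include <cassert>
#include <cstring>

using FuncPtr = void (*)();

// Hypothetical simplified table entry: where to store the pointer, and
// the name to look up.
struct SymEntry {
  FuncPtr* slot;
  const char* name;
};

// Hypothetical simplified symbol table.
struct Symbols {
  FuncPtr fCreateImageKHR = nullptr;
  FuncPtr fDestroyImageKHR = nullptr;
};

static Symbols gSymbols;

// f##X pastes tokens to form the member name; "egl" #X turns X into a
// string literal which is then concatenated with "egl".
#define SYMBOL(X) { &gSymbols.f##X, "egl" #X }

static void FakeCreate() {}
static void FakeDestroy() {}

// Hypothetical lookup standing in for the dynamic loader.
static FuncPtr FakeLookup(const char* name) {
  if (std::strcmp(name, "eglCreateImageKHR") == 0) return FakeCreate;
  if (std::strcmp(name, "eglDestroyImageKHR") == 0) return FakeDestroy;
  return nullptr;
}

// Fill every slot from the lookup; true only if all names were resolved.
bool LoadSymbols() {
  const SymEntry entries[] = {SYMBOL(CreateImageKHR),
                              SYMBOL(DestroyImageKHR)};
  bool ok = true;
  for (const SymEntry& entry : entries) {
    *entry.slot = FakeLookup(entry.name);
    ok = ok && (*entry.slot != nullptr);
  }
  return ok;
}
```

After LoadSymbols() returns, gSymbols.fCreateImageKHR points at whatever the lookup returned for the string "eglCreateImageKHR", so the member name and the driver symbol name are both generated from the single token handed to the macro.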

What does this do? According to the documentation:
 
eglCreateImage is used to create an EGLImage object from an existing image resource buffer. display specifies the EGL display used for this operation. context specifies the EGL client API context used for this operation, or EGL_NO_CONTEXT if a client API context is not required. target specifies the type of resource being used as the EGLImage source (examples include two-dimensional textures in OpenGL ES contexts and VGImage objects in OpenVG contexts). buffer is the name (or handle) of a resource to be used as the EGLImage source, cast into the type EGLClientBuffer. attrib_list is a list of attribute-value pairs which is used to select sub-sections of buffer for use as the EGLImage source, such as mipmap levels for OpenGL ES texture map resources, as well as behavioral options, such as whether to preserve pixel data during creation. If attrib_list is non-NULL, the last attribute specified in the list must be EGL_NONE.

Super. Where does that leave us? Well, it tells us that the call to fCreateImage() in our SharedSurface_EGLImage::Create() is really just a bunch of simple wrapper calls that ends up calling an EGL function. What could be going wrong? One obvious potential problem is that the input parameters may be messed up. Another one is that each call to eglCreateImageKHR() creating an EGLImage object should be balanced out with a call to eglDestroyImageKHR() to destroy it.

We do have a call to eglDestroyImageKHR() happening in our SharedSurface_EGLImage destructor. It looks like this:
SharedSurface_EGLImage::~SharedSurface_EGLImage() {
  const auto& gle = GLContextEGL::Cast(mDesc.gl);
  const auto& egl = gle->mEgl;
  egl->fDestroyImage(mImage);
[...]
There's an unexpected difference with the way it's called in ESR 78, where the code looks like this:
SharedSurface_EGLImage::~SharedSurface_EGLImage() {
  const auto& gle = GLContextEGL::Cast(mGL);
  const auto& egl = gle->mEgl;
  egl->fDestroyImage(egl->Display(), mImage);
[...]
Notice the extra egl->Display() value being passed in as a parameter in the ESR 78 version. It's not needed in ESR 91 because there the EglDisplay class is storing its own copy of the EGLDisplay:
  EGLBoolean fDestroyImage(EGLImage image) const {
    MOZ_ASSERT(HasKHRImageBase());
    return mLib->fDestroyImage(mDisplay, image);
  }
That gives us a couple of things to look into: first, is the correct value being passed in for image? Second, is the value stored for mDisplay valid? The underlying call to eglDestroyImage also has a Boolean return value which will return EGL_FALSE in case something goes wrong. A nice first step would be to check this return value in case it's indicating a problem. To do this I've added some additional debug output to the code:
  EGLBoolean result = egl->fDestroyImage(mImage);
  printf_stderr("RENDER: fDestroyImage() return value: %d\n", result);
The result of running it shows a large number of successful calls to fDestroyImage():
[...]
[JavaScript Warning: "Layout was forced before the page was fully loaded. 
    If stylesheets are not yet loaded this may cause a flash of unstyled 
    content." {file: "https://jolla.com/themes/unlike/js/
    modernizr.js?x98582&ver=2.6.2" line: 4}]
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
RENDER: fDestroyImage() return value: 1
[...]
Since this output looks okay I've taken it a step further and added a count to the creation and deletion calls in case it shows any imbalance between the two.
[...]
Frame script: embedhelper.js loaded
RENDER: fCreateImage() return value: 1, 0
RENDER: fCreateImage() return value: 1, 1
CONSOLE message:
[JavaScript Warning: "This page uses the non standard property “zoom”. 
    Consider using calc() in the relevant property values, or using “transform” 
    along with “transform-origin: 0 0”." {file: "https://jolla.com/
    " line: 0}]
CONSOLE message:
[JavaScript Warning: "Layout was forced before the page was fully loaded. 
    If stylesheets are not yet loaded this may cause a flash of unstyled 
    content." {file: "https://jolla.com/themes/unlike/js/
    modernizr.js?x98582&ver=2.6.2" line: 4}]
RENDER: fCreateImage() return value: 1, 2
RENDER: fDestroyImage() return value: 1, 0
RENDER: fCreateImage() return value: 1, 3
RENDER: fDestroyImage() return value: 1, 1
[...]

RENDER: fCreateImage() return value: 1, 316
RENDER: fDestroyImage() return value: 1, 314
RENDER: fCreateImage() return value: 1, 317
RENDER: fDestroyImage() return value: 1, 315
[...]
The increasing numbers (going up to 317 and 315 here) tell us that the balance between creates and destroys is pretty clean. There are two creates at the start which don't have matching destroys, after which everything is balanced. It seems unlikely therefore that this is the cause of the seize-ups. What's more, it all makes sense too: at any point in time there should be a front and a back buffer, so there should always be exactly two images in existence at any one time. That's a situation that's confirmed by the numbers.
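
The arithmetic behind that claim is simple enough to replay in a few lines. This is just a sketch with made-up names (ImageCounter, Replay) rather than the actual debug code I added, and it assumes the printed counters are zero-based, as the first lines of the log suggest.

```cpp
#include <cassert>

// Hypothetical counter tracking create and destroy calls.
struct ImageCounter {
  int created = 0;
  int destroyed = 0;

  void onCreate() { ++created; }
  void onDestroy() { ++destroyed; }

  // Number of images that exist right now.
  int live() const { return created - destroyed; }
};

// Replay the pattern seen in the log: two initial creates for the front
// and back buffers, then one create and one destroy per frame.
ImageCounter Replay(int frames) {
  ImageCounter counter;
  counter.onCreate();  // front buffer
  counter.onCreate();  // back buffer
  for (int i = 0; i < frames; ++i) {
    counter.onCreate();
    counter.onDestroy();
  }
  return counter;
}
```

Replaying 316 frames gives 318 creates and 316 destroys, which corresponds to the zero-based 317 and 315 in the log, with exactly two images live at the end.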

Just to ensure this matches the behaviour of the previous version I've also tested the same using the debugger on ESR 78. I got the same sequence of calls. First two creates, followed by balanced create and destroy calls so that there are exactly two images in existence at any one time:
fCreateImage
fCreateImage
fCreateImage
fDestroyImage
fCreateImage
fDestroyImage
fCreateImage
[...]
In conclusion everything here looks in order on ESR 91. So tomorrow I'll move on to checking that the display value is set correctly.

24 Mar 2024 : Day 195 #
I'm working my way through the SharedSurface_EGLImage::Create() method and gradually increasing the steps that are executed. Over the last few days I first established that preventing the SurfaceFactory_EGLImage from being created was enough to prevent the app from seizing up. Without the factory the surfaces themselves weren't going to get created. Next I enabled the factory but disabled the image creation.

Today I'm allowing the offscreen texture to be created by allowing this call to take place:
  GLuint prodTex = CreateTextureForOffscreen(prodGL, formats, size);
But I've placed a return immediately afterwards so that neither the image nor the surface that builds on this are created. Once again the objective is to find out whether the app seizes up or not. If it does then that would point to the texture being the culprit. If not, it's likely something that follows it.

Change made, code built, binary transferred and library installed. Now running the app, there's no seizing up. So that takes us one more step closer to finding the culprit. Next I've moved the early return one step later, to just after the EGLImage has been created using the texture, following these lines:
  EGLClientBuffer buffer =
      reinterpret_cast<EGLClientBuffer>(uintptr_t(prodTex));
  EGLImage image = egl->fCreateImage(context,
                                     LOCAL_EGL_GL_TEXTURE_2D, buffer, nullptr);
  if (!image) {
    prodGL->fDeleteTextures(1, &prodTex);
    return ret;
  }
Once again, I've built, transferred and installed the updated library. And now when I run it... the app seizes up! So we have our culprit. The problem seems to be the creation of the image from the texture, which is either causing the problem in itself or triggering something else to cause the problem. The most likely offender in the latter case would be if the created image weren't being freed:
  EGLImage image = egl->fCreateImage(context,
                                     LOCAL_EGL_GL_TEXTURE_2D, buffer, nullptr);
This is reminiscent of a problem I experienced earlier which resulted in me having to disable the texture capture for the cover image. Now that it's narrowed down I can look into the underlying reason. That will be my task for tomorrow morning.
