flypig.co.uk

Gecko

13 Mar 2024 : Day 184
Yesterday we determined that a problem in CreateShared() meant that the method was returning a null SharedSurface_EGLImage on ESR 91 when it should have been returning a valid pointer. The question I want to answer today is: "why"?

Stepping through the code, the program counter jumps all over the place, making it hard to follow. But eventually it becomes clear that it's the HasEglImageExtensions() method that's returning false, causing CreateShared() to return early with a null return value. Although the method is called HasEglImageExtensions() in ESR 91, in ESR 78 it's called something else: just HasExtensions(). But they're otherwise largely the same. Let's take a look at the two versions. First the ESR 78 version:
bool SharedSurface_EGLImage::HasExtensions(GLLibraryEGL* egl, GLContext* gl) {
  return egl->HasKHRImageBase() &&
         egl->IsExtensionSupported(GLLibraryEGL::KHR_gl_texture_2D_image) &&
         (gl->IsExtensionSupported(GLContext::OES_EGL_image_external) ||
          gl->IsExtensionSupported(GLContext::OES_EGL_image));
}
Followed by the ESR 91 version:
static bool HasEglImageExtensions(const GLContextEGL& gl) {
  const auto& egl = *(gl.mEgl);
  return egl.HasKHRImageBase() &&
         egl.IsExtensionSupported(EGLExtension::KHR_gl_texture_2D_image) &&
         (gl.IsExtensionSupported(GLContext::OES_EGL_image_external) ||
          gl.IsExtensionSupported(GLContext::OES_EGL_image));
}
As you can see, they're similar but not quite identical. Unfortunately the debugger claims the IsExtensionSupported() methods have been optimised out. But it's a pretty simple method: on ESR 78 it just returns the value in the mAvailableExtensions array at the index given by aKnownExtension.
  bool IsExtensionSupported(EGLExtensions aKnownExtension) const {
    return mAvailableExtensions[aKnownExtension];
  }
There's a change on ESR 91 where the aKnownExtension is first redirected via the UnderlyingValue() method. Here's the ESR 91 version:
  bool IsExtensionSupported(EGLExtension aKnownExtension) const {
    return mAvailableExtensions[UnderlyingValue(aKnownExtension)];
  }
We'll come back to UnderlyingValue() in a bit. Now that we know the implementations, we can use this while debugging to circumvent the fact that the methods have been optimised out: we can just access the mAvailableExtensions array used by each directly instead. Let's take a look at that. First let's look at the values in ESR 78:
(gdb) b HasExtensions
Breakpoint 2 at 0x7fb8e84d70: HasExtensions. (2 locations)
(gdb) c
Continuing.

Thread 36 "Compositor" hit Breakpoint 2, mozilla::gl::SharedSurface_EGLImage::
    HasExtensions (egl=0x7eac0036a0, gl=0x7eac109140)
    at gfx/gl/SharedSurfaceEGL.cpp:59
59        return egl->HasKHRImageBase() &&
(gdb) p egl.mAvailableExtensions
$1 = std::bitset = {  [0] = 1,   [2] = 1,   [3] = 1,   [5] = 1,   [6] = 1,
                      [7] = 1,  [13] = 1,  [21] = 1,  [22] = 1}
(gdb) p gl.mAvailableExtensions
$2 = std::bitset = {  [1] = 1,  [57] = 1,  [58] = 1,  [60] = 1,  [72] = 1,
                     [75] = 1,  [77] = 1,  [78] = 1,  [86] = 1,  [87] = 1,
                     [96] = 1,  [97] = 1, [100] = 1, [111] = 1, [112] = 1,
                    [113] = 1, [114] = 1, [115] = 1, [117] = 1, [118] = 1,
                    [120] = 1, [121] = 1, [122] = 1, [123] = 1, [125] = 1,
                    [126] = 1, [127] = 1, [128] = 1, [129] = 1, [130] = 1,
                    [131] = 1, [132] = 1}
(gdb) 
And for contrast, let's see what happens on ESR 91 using the same process:
(gdb) b HasEglImageExtensions
Breakpoint 1 at 0x7ff11322a0: file include/c++/8.3.0/bitset, line 1163.
(gdb) c
Continuing.
[LWP 26957 exited]
[LWP 26952 exited]
[New LWP 27078]
[LWP 27037 exited]
[Switching to LWP 26961]

Thread 38 "Compositor" hit Breakpoint 1, mozilla::gl::HasEglImageExtensions
    (gl=...)
    at ${PROJECT}/gfx/gl/SharedSurfaceEGL.cpp:28
28      ${PROJECT}/gfx/gl/SharedSurfaceEGL.cpp: No such file or directory.
(gdb) p egl.mAvailableExtensions
$1 = std::bitset = {  [0] = 1,   [2] = 1,   [4] = 1,   [5] = 1,   [6] = 1,
                      [7] = 1,   [8] = 1,  [11] = 1,  [16] = 1,  [17] = 1,
                     [22] = 1}
(gdb) p gl.mAvailableExtensions
$2 = std::bitset = {  [1] = 1,  [57] = 1,  [58] = 1,  [60] = 1,  [72] = 1,
                     [75] = 1,  [77] = 1,  [78] = 1,  [86] = 1,  [87] = 1,
                     [88] = 1,  [97] = 1,  [99] = 1, [101] = 1, [102] = 1,
                    [113] = 1, [114] = 1, [115] = 1, [116] = 1, [117] = 1,
                    [119] = 1, [120] = 1, [122] = 1, [123] = 1, [124] = 1,
                    [125] = 1, [127] = 1, [128] = 1, [129] = 1, [130] = 1,
                    [131] = 1, [132] = 1, [133] = 1, [134] = 1}
(gdb) 
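Rather than comparing the two egl dumps by eye, the differing indices can be listed mechanically. Here's a minimal sketch of that, with the indices transcribed from the gdb output above. The names kEglEsr78, kEglEsr91 and OnlyIn() are hypothetical helpers of my own, not anything from the Gecko code.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Set bit indices transcribed from the two "p egl.mAvailableExtensions"
// dumps above (already in ascending order).
const std::vector<int> kEglEsr78 = {0, 2, 3, 5, 6, 7, 13, 21, 22};
const std::vector<int> kEglEsr91 = {0, 2, 4, 5, 6, 7, 8, 11, 16, 17, 22};

// Hypothetical helper: returns the indices present in `a` but missing
// from `b`. Both inputs must be sorted ascending, as std::set_difference
// requires.
std::vector<int> OnlyIn(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(out));
  return out;
}
```

Calling OnlyIn(kEglEsr78, kEglEsr91) gives the indices set only on ESR 78 (3, 13 and 21), while swapping the arguments gives those set only on ESR 91 (4, 8, 11, 16 and 17). The same approach would work for the longer gl dumps.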
It's noticeable that neither the egl nor the gl values are identical across the two versions. The obvious question is whether this is a real difference, or whether the UnderlyingValue() method is obscuring the fact that they're the same. Here's what the code has to say about UnderlyingValue():
/**
 * Get the underlying value of an enum, but typesafe.
 *
 * example:
 *
 *   enum class Pet : int16_t {
 *     Cat,
 *     Dog,
 *     Fish
 *   };
 *   enum class Plant {
 *     Flower,
 *     Tree,
 *     Vine
 *   };
 *   UnderlyingValue(Pet::Fish) -> int16_t(2)
 *   UnderlyingValue(Plant::Tree) -> int(1)
 */
template <typename T>
inline constexpr auto UnderlyingValue(const T v) {
  static_assert(std::is_enum_v<T>);
  return static_cast<typename std::underlying_type<T>::type>(v);
}
So this isn't actually changing the value; it's checking that the argument is an enum and casting it to the enum's underlying integer type. So we can ignore this when we're comparing values and conclude that the mAvailableExtensions array definitely has different indices set to true between ESR 78 and ESR 91. But we still need to check the enums that these represent in order to be sure that these are real differences.
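To convince myself the cast really does leave the index alone, here's a minimal sketch that needs C++17. Only UnderlyingValue() is copied from the snippet above. DemoExtension, DemoBits and IsSupported() are hypothetical stand-ins of my own, not Gecko's real types, and the Placeholder enumerator is there purely to pad the numbering.

```cpp
#include <bitset>
#include <cstddef>
#include <type_traits>

// Copied from the Gecko snippet above.
template <typename T>
inline constexpr auto UnderlyingValue(const T v) {
  static_assert(std::is_enum_v<T>);
  return static_cast<typename std::underlying_type<T>::type>(v);
}

// Hypothetical scoped enum standing in for EGLExtension. Enumerators
// count up from zero, so each name maps to a fixed bit index.
enum class DemoExtension : std::size_t {
  KHR_image_base,           // 0
  Placeholder,              // 1 (made up, just fills the gap)
  KHR_gl_texture_2D_image,  // 2
  Max                       // 3, used only to size the bitset
};

// The cast yields the enumerator's position, unchanged.
static_assert(UnderlyingValue(DemoExtension::KHR_gl_texture_2D_image) == 2);

using DemoBits = std::bitset<UnderlyingValue(DemoExtension::Max)>;

// Hypothetical lookup mirroring the body of the ESR 91
// IsExtensionSupported(): a scoped enum won't convert to an integer
// implicitly, so it has to be cast before it can index the bitset.
inline bool IsSupported(const DemoBits& available, DemoExtension ext) {
  return available[UnderlyingValue(ext)];
}
```

So bit 2 in a dump corresponds to whichever enumerator sits at position 2, on either version. The indirection exists because a scoped enum can't be used as an index directly, not because the value is being remapped.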

Both egl and gl use different enums, so we'll need to consider them separately.

Here's the enum associated with egl in ESR 78, found in GLLibraryEGL.h:
 0: KHR_image_base
 2: KHR_gl_texture_2D_image
 3: KHR_lock_surface
 5: EXT_create_context_robustness
 6: KHR_image
 7: KHR_fence_sync
13: KHR_create_context
21: KHR_surfaceless_context
22: KHR_create_context_no_error
Based on the HasExtensions() implementation we're interested in KHR_gl_texture_2D_image, KHR_image and KHR_image_base; all of which are present in the list above (indices 2, 6 and 0).

On ESR 91, the related enum, also found in GLLibraryEGL.h, looks like this:
 0: KHR_image_base
 2: KHR_gl_texture_2D_image
 4: ANGLE_surface_d3d_texture_2d_share_handle
 5: EXT_create_context_robustness
 6: KHR_image
 7: KHR_fence_sync
 8: ANDROID_native_fence_sync
11: ANGLE_platform_angle_d3d
16: EXT_device_query
17: NV_stream_consumer_gltexture_yuv
22: KHR_create_context_no_error
Again, looking at the code and based on HasEglImageExtensions() we're interested in the same flags: KHR_gl_texture_2D_image, KHR_image and KHR_image_base. All of these are also present in the ESR 91 list (indices 2, 6 and 0).

So, no obvious problems on the egl side. Let's now check the longer enum for gl. Here are the active values based on the ESR 78 list available in GLContext.h:
  1: AMD_compressed_ATC_texture
 57: EXT_color_buffer_float
 58: EXT_color_buffer_half_float
 60: EXT_disjoint_timer_query
 72: EXT_multisampled_render_to_texture
 75: EXT_read_format_bgra
 77: EXT_sRGB
 78: EXT_sRGB_write_control
 86: EXT_texture_filter_anisotropic
 87: EXT_texture_format_BGRA8888
 96: IMG_texture_npot
 97: KHR_debug
100: KHR_robustness
111: NV_transform_feedback
112: NV_transform_feedback2
113: OES_EGL_image
114: OES_EGL_image_external
115: OES_EGL_sync
117: OES_depth24
118: OES_depth32
120: OES_element_index_uint
121: OES_fbo_render_mipmap
122: OES_framebuffer_object
123: OES_packed_depth_stencil
125: OES_standard_derivatives
126: OES_stencil8
127: OES_texture_3D
128: OES_texture_float
129: OES_texture_float_linear
130: OES_texture_half_float
131: OES_texture_half_float_linear
132: OES_texture_npot
From the ESR 78 code the ones we're interested in are just OES_EGL_image_external and OES_EGL_image. These are both in the list (indices 114 and 113). What about ESR 91? Here's the enum list in this case:
  1: AMD_compressed_ATC_texture
 57: EXT_color_buffer_float
 58: EXT_color_buffer_half_float
 60: EXT_disjoint_timer_query
 72: EXT_multisampled_render_to_texture
 75: EXT_read_format_bgra
 77: EXT_sRGB
 78: EXT_sRGB_write_control
 86: EXT_texture_filter_anisotropic
 87: EXT_texture_format_BGRA8888
 88: EXT_texture_norm16
 97: KHR_debug
 99: KHR_robust_buffer_access_behavior
101: KHR_texture_compression_astc_hdr
102: KHR_texture_compression_astc_ldr
113: OES_EGL_image
114: OES_EGL_image_external
115: OES_EGL_sync
116: OES_compressed_ETC1_RGB8_texture
117: OES_depth24
119: OES_depth_texture
120: OES_element_index_uint
122: OES_framebuffer_object
123: OES_packed_depth_stencil
124: OES_rgb8_rgba8
125: OES_standard_derivatives
127: OES_texture_3D
128: OES_texture_float
129: OES_texture_float_linear
130: OES_texture_half_float
131: OES_texture_half_float_linear
132: OES_texture_npot
133: OES_vertex_array_object
134: OVR_multiview2
Once again from the ESR 91 code we can see the ones we're interested in are the same: OES_EGL_image_external and OES_EGL_image. These are both in the list (indices 114 and 113). So what gives? Both methods have the appropriate flags set, so why is one succeeding and the other failing?
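As a sanity check on that reading, here's a minimal sketch that replays the check against the indices from the gdb dumps. All the names here are hypothetical helpers of my own. It also assumes that HasKHRImageBase() is satisfied by either KHR_image or KHR_image_base, which I haven't shown above, so treat that part as an assumption.

```cpp
#include <set>

// Bit indices for the extensions of interest, taken from the enum
// listings above.
constexpr int kEglKhrImageBase = 0;
constexpr int kEglKhrGlTexture2DImage = 2;
constexpr int kEglKhrImage = 6;
constexpr int kGlOesEglImage = 113;
constexpr int kGlOesEglImageExternal = 114;

// Set bit indices transcribed from the gdb dumps above.
const std::set<int> kEglBits78 = {0, 2, 3, 5, 6, 7, 13, 21, 22};
const std::set<int> kEglBits91 = {0, 2, 4, 5, 6, 7, 8, 11, 16, 17, 22};
const std::set<int> kGlBits78 = {
    1,   57,  58,  60,  72,  75,  77,  78,  86,  87,  96,
    97,  100, 111, 112, 113, 114, 115, 117, 118, 120, 121,
    122, 123, 125, 126, 127, 128, 129, 130, 131, 132};
const std::set<int> kGlBits91 = {
    1,   57,  58,  60,  72,  75,  77,  78,  86,  87,  88,  97,
    99,  101, 102, 113, 114, 115, 116, 117, 119, 120, 122, 123,
    124, 125, 127, 128, 129, 130, 131, 132, 133, 134};

// Assumption: the real HasKHRImageBase() accepts either extension.
bool SketchHasKhrImageBase(const std::set<int>& egl) {
  return egl.count(kEglKhrImage) > 0 || egl.count(kEglKhrImageBase) > 0;
}

// Same boolean shape as HasExtensions() and HasEglImageExtensions(),
// but evaluated against plain sets of bit indices.
bool SketchHasImageExtensions(const std::set<int>& egl,
                              const std::set<int>& gl) {
  return SketchHasKhrImageBase(egl) &&
         egl.count(kEglKhrGlTexture2DImage) > 0 &&
         (gl.count(kGlOesEglImageExternal) > 0 ||
          gl.count(kGlOesEglImage) > 0);
}
```

Both SketchHasImageExtensions(kEglBits78, kGlBits78) and SketchHasImageExtensions(kEglBits91, kGlBits91) come out true. This only models the flag logic, not what the real code does at runtime, but it suggests the flags visible at the breakpoint aren't enough on their own to explain the false result on ESR 91.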

It's not clear to me right now. Something is wrong, but I can't see where. I'd love to dig deeper into this today but my mind has reached its limit. I'll have to pick this up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.