flypig.co.uk

Gecko

28 Aug 2023 : Day 12
Yesterday we hit an error with the Rust build process which I tried to fix by reworking an existing patch. If you've been following along you'll know I left the build running overnight.

I'm up bright and early to check the build (after one quick check in the middle of the night when I couldn't sleep, at which point it was still crunching away). The results this morning are a little hard to interpret.

Somewhere in the middle of the build I spot this.
47:04.13   cargo:warning=during RTL pass: expand
47:04.13   cargo:warning=src/glsl.h: In function ‘glsl::vec2_scalar
           glsl::sign(glsl::vec2_scalar)’:
47:04.13   cargo:warning=src/glsl.h:662:39: internal compiler error:
           Segmentation fault
47:04.13   cargo:warning= float sign(float a) { return copysignf(1.0f, a); }
47:04.13   cargo:warning=                              ~~~~~~~~~^~~~~~~~~
47:04.13   cargo:warning=Please submit a full bug report,
47:04.13   cargo:warning=with preprocessed source if appropriate.
47:04.14   cargo:warning=See  for instructions.
47:04.14   exit status: 1
47:04.14   --- stderr
47:04.14   error occurred: Command "/usr/bin/g++" "-O2" "-ffunction-sections"
           "-fdata-sections" "-fPIC" "-std=gnu++17"
           "-Iobj-build-mer-qt-xr/dist/stl_wrappers"
           "-Iobj-build-mer-qt-xr/dist/system_wrappers" "-include"
           "gecko-dev/config/gcc_hidden.h" "-U_FORTIFY_SOURCE"
           "-D_FORTIFY_SOURCE=2" "-fstack-protector-strong" "-DNDEBUG=1"
           "-DTRIMMED=1" "-Igecko-dev/toolkit/library/rust"
           "-Iobj-build-mer-qt-xr/toolkit/library/rust"
           "-Iobj-build-mer-qt-xr/dist/include" "-I/usr/include/nspr4"
           "-I/usr/include/nss3" "-I/usr/include/nspr4"
           "-Iobj-build-mer-qt-xr/dist/include/nss" "-I/usr/include/pixman-1"
           "-DMOZILLA_CLIENT" "-include" "obj-build-mer-qt-xr/mozilla-config.h"
           "-Wall" "-Wempty-body" "-Wignored-qualifiers" "-Wpointer-arith"
           "-Wsign-compare" "-Wtype-limits" "-Wunreachable-code"
           "-Wno-invalid-offsetof" "-Wduplicated-cond" "-Wimplicit-fallthrough"
           "-Wno-error=maybe-uninitialized"
           "-Wno-error=deprecated-declarations" "-Wno-error=array-bounds"
           "-Wno-error=coverage-mismatch" "-Wno-error=free-nonheap-object"
           "-Wno-multistatement-macros" "-Wno-error=class-memaccess"
           "-Wno-error=unused-but-set-variable" "-Wformat"
           "-Wformat-overflow=2" "-Wno-psabi" "-fno-sized-deallocation"
           "-fno-aligned-new" "-O3" "-I/usr/include/freetype2"
           "-DUSE_ANDROID_OMTC_HACKS=1" "-DUSE_OZONE=1"
           "-DMOZ_UA_OS_AGNOSTIC=1" "-Wno-psabi" "-Wno-attributes" "-Wno-psabi"
           "-Wno-attributes" "-fno-exceptions" "-fno-strict-aliasing" "-fPIC"
           "-ffunction-sections" "-fdata-sections" "-fno-exceptions"
           "-fno-math-errno" "-pthread" "-pipe" "-gdwarf-4" "-freorder-blocks"
           "-O2" "-fno-omit-frame-pointer" "-funwind-tables"
           "-DMOZILLA_CONFIG_H" "-I" "gecko-dev/gfx/wr/webrender/res" "-I"
           "src" "-I"
           "obj-build-mer-qt-xr/aarch64-unknown-linux-gnu/release/build
           /swgl-c7fddee6f1578b80/out"
           "-std=c++17" "-fno-exceptions" "-fno-rtti" "-fno-math-errno"
           "-UMOZILLA_CONFIG_H" "-D_GLIBCXX_USE_CXX11_ABI=0" "-o"
           "obj-build-mer-qt-xr/aarch64-unknown-linux-gnu/release/build
           /swgl-c7fddee6f1578b80/out/src/gl.o"
           "-c" "src/gl.cc" with args "g++" did not execute successfully (status
           code exit status: 1).
47:04.14 warning: build failed, waiting for other jobs to finish...
Unravelling this error is a challenge, but buried in the middle of it there's an "internal compiler error: Segmentation fault". That means the compiler itself has crashed, which is not something it ought to do whatever code it's given. This sort of thing can happen if, for example, the compiler runs out of memory.

Further along I see this:
50:42.73 gecko-dev/gfx/thebes/gfxFcPlatformFontList.cpp:1725:72:
         error: could not convert ‘false’ from ‘bool’ to ‘mozilla::WeightRange’
50:42.74                                               weight,     stretch, style};
50:42.74                                                                         ^
This suggests that not all of the errors from earlier have been fixed. Then finally there is also this:
47:02.31 warning: during RTL pass: expand
47:02.32 warning: src/glsl.h: In function ‘glsl::vec2_scalar
         glsl::sign(glsl::vec2_scalar)’:
47:02.32 warning: src/glsl.h:662:39: internal compiler error: Segmentation fault
47:02.33 warning:  float sign(float a) { return copysignf(1.0f, a); }
47:02.33 warning:                               ~~~~~~~~~^~~~~~~~~
47:02.33 warning: Please submit a full bug report,
47:02.33 warning: with preprocessed source if appropriate.
47:02.33 warning: See  for instructions.
47:02.33 error: failed to run custom build command for `swgl v0.1.0
         (gecko-dev/gfx/wr/swgl)`
This is also a segmentation fault. Not encouraging.

Part of the challenge here is disentangling the separate jobs. With up to sixteen running simultaneously, the original source of an error can be tricky to determine. Probably these are three separate errors, but on a fresh run we might hit one and not another.
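One way to narrow things down, assuming the build output has been captured to a file, is to pull out just the lines that look like real errors, along with their line numbers. The file name build.log and the helper function below are both hypothetical, and this is only a rough sketch. The pattern deliberately skips flags such as -Wno-error=array-bounds, which would otherwise swamp the results.

```shell
# List lines in a saved build log that look like genuine errors, prefixed
# with their line numbers, so the earliest one can be inspected first.
# Usage: find_errors build.log   (build.log is a hypothetical file name)
find_errors() {
    # "error:" catches compiler and cargo errors (including "internal
    # compiler error:"); "error occurred:" catches the failed command report.
    # Warning flags like -Wno-error=... contain neither, so they are skipped.
    grep -nE 'error:|error occurred:' "$1"
}
```

Even so, this only lists candidates. It doesn't tell us which job produced which line.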

To circumvent this I've dropped the build down to using just a single job by editing the spec file. Maybe this will make things clearer.
#./mach build -j$RPM_BUILD_NCPUS
./mach build -j1
The build very quickly stumbles on the swgl segmentation fault error. Running the build a third time I get the same result. So it looks like this is where we're at, and the next thing I need to fix.

The error occurs building swgl, but I am a little concerned the underlying issue is still webrender related, since this is also mentioned in the debug output.

The error itself is happening while building gl.o on line 662 of src/glsl.h. This line looks like this:
float sign(float a) { return copysignf(1.0f, a); }
The line is unremarkable, and the "Segmentation fault" error has nothing to do with the code we see there. It could be that the compiler is running out of resources, it could be a compiler versioning issue, or it could even be something more esoteric like a compiler bug.

The debug output does at least include the full command being executed when the error occurred, so I can rerun and tweak this manually to see the results. This is always a helpful and time-saving technique.
$ sfdk engine exec
$ sb2 -R -m sdk-install -t SailfishOS-devel-aarch64.default
$ /usr/bin/g++ -O2 -ffunction-sections -fdata-sections -fPIC -std=gnu++17 \
  -I${PROJECT}/obj-build-mer-qt-xr/dist/stl_wrappers \
  -I${PROJECT}/obj-build-mer-qt-xr/dist/system_wrappers \
  -include ${PROJECT}/gecko-dev/config/gcc_hidden.h -U_FORTIFY_SOURCE \
  -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 \
  -I${PROJECT}/gecko-dev/toolkit/library/rust \
  -I${PROJECT}/obj-build-mer-qt-xr/toolkit/library/rust \
  -I${PROJECT}/obj-build-mer-qt-xr/dist/include -I/usr/include/nspr4 \
  -I/usr/include/nss3 -I/usr/include/nspr4 \
  -I${PROJECT}/obj-build-mer-qt-xr/dist/include/nss -I/usr/include/pixman-1 \
  -DMOZILLA_CLIENT -include ${PROJECT}/obj-build-mer-qt-xr/mozilla-config.h \
  -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare \
  -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond \
  -Wimplicit-fallthrough -Wno-error=maybe-uninitialized \
  -Wno-error=deprecated-declarations -Wno-error=array-bounds \
  -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object \
  -Wno-multistatement-macros -Wno-error=class-memaccess \
  -Wno-error=unused-but-set-variable -Wformat -Wformat-overflow=2 -Wno-psabi \
  -fno-sized-deallocation -fno-aligned-new -O3 -I/usr/include/freetype2 \
  -DUSE_ANDROID_OMTC_HACKS=1 -DUSE_OZONE=1 -DMOZ_UA_OS_AGNOSTIC=1 -Wno-psabi \
  -Wno-attributes -Wno-psabi -Wno-attributes -Wno-psabi -Wno-attributes \
  -Wno-psabi -Wno-attributes -Wno-psabi -Wno-attributes -Wno-psabi \
  -Wno-attributes -fno-exceptions -fno-strict-aliasing -fPIC \
  -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread \
  -pipe -gdwarf-4 -freorder-blocks -O2 -fno-omit-frame-pointer -funwind-tables \
  -DMOZILLA_CONFIG_H -I ${PROJECT}/gecko-dev/gfx/wr/webrender/res -I src \
  -I ${PROJECT}/obj-build-mer-qt-xr/aarch64-unknown-linux-gnu/release/build/swgl-c7fddee6f1578b80/out \
  -std=c++17 -fno-exceptions -fno-rtti -fno-math-errno -UMOZILLA_CONFIG_H \
  -D_GLIBCXX_USE_CXX11_ABI=0 \
  -o ${PROJECT}/obj-build-mer-qt-xr/aarch64-unknown-linux-gnu/release/build/swgl-c7fddee6f1578b80/out/src/gl.o \
  -c src/gl.cc
Soon after I've issued this command the fans on my laptop whir up. It's taking a while and stressing the processor. Something is happening.

Using this abbreviated build process the error is triggered at exactly the same spot:
during RTL pass: expand
src/glsl.h: In function ‘glsl::vec2_scalar glsl::sign(glsl::vec2_scalar)’:
src/glsl.h:662:39: internal compiler error: Segmentation fault
 float sign(float a) { return copysignf(1.0f, a); }
                              ~~~~~~~~~^~~~~~~~~
Here "RTL" means "Register Transfer Language". I'm no compiler expert and the internals of gcc are mysterious to me, but a quick skim across the Internet suggests that "expand" is the pass that converts the compiler's earlier intermediate representation into RTL, which the RTL optimisation passes then work on. As we can see in the command above the compiler is currently using O2 level optimisation. If this really is an optimisation problem, or even just a compiler bug, then changing the optimisation level might help.
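The command actually contains three optimisation flags (-O2, then -O3, then -O2 again). GCC documents that when several -O options are given, the last one is the one that takes effect, which is why O2 wins here. As a small sketch, assuming the command has been pasted unquoted into a file or piped in on stdin, a helper like the following (the name effective_opt is made up for illustration) reports which flag that is.

```shell
# Print the last -O<level> argument found in a command read from stdin.
# With GCC the last -O option on the command line is the effective one.
# The lowercase -o (output file) flag is not matched, as awk is case sensitive.
effective_opt() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^-O/) last = $i }
         END { print last }'
}

# Example: effective_opt < cmd.txt   (cmd.txt is a hypothetical file name)
```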

And indeed, running the abbreviated build again using O1 (noting that the flag has to be replaced twice in the command) yields good results. Now the command goes through without the segmentation fault error. Ordinarily I'd look in the mozconfig.merqtxulrunner file to adjust the optimisation level, but there it's stated as O3, so the O2 must be coming from somewhere else.
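For anyone wanting to repeat the experiment without editing the long command by hand, the substitution can be sketched as a small filter. This assumes the command is available as plain text on stdin with the arguments separated by whitespace. The function name and the file names in the usage comment are hypothetical.

```shell
# Rewrite every standalone -O2 argument to -O1, reading the command on stdin.
# Other optimisation flags such as -O3 are left alone. Note that awk rejoins
# any line it changes using single spaces, so indentation is not preserved.
lower_opt() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "-O2") $i = "-O1"; print }'
}

# Example: lower_opt < cmd.txt > cmd-O1.txt   (hypothetical file names)
```

Comparing whole arguments rather than doing a blind text replace avoids accidentally touching anything that merely contains the characters "-O2".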

Inside the gfx/wr/swgl/build.rs I find the following:
        // SWGL relies heavily on inlining for performance so override -Oz with -O2
        if tool.args().contains(&"-Oz".into()) {
            build.flag("-O2");
        }
So switching this for O1 might be a good thing to try.
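As a rough sketch of that change, assuming the override line reads exactly as in the excerpt above, the swap could be made from the command line like this. The function name is invented for illustration, and in practice the change would more likely end up as a patch applied by the spec file.

```shell
# Replace the -O2 override with -O1 in the build.rs file given as $1,
# keeping the original alongside it with a .bak suffix.
# Assumes the line reads exactly: build.flag("-O2");
swgl_use_o1() {
    sed -i.bak 's/build\.flag("-O2")/build.flag("-O1")/' "$1"
}

# Example: swgl_use_o1 gecko-dev/gfx/wr/swgl/build.rs
```

The comment above the line in build.rs still mentions -O2 after this, so it would need updating by hand to stay accurate.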

I do that and build again. Now a different error is coming up, so maybe that did the trick? We won't know for sure until further down the line. Now we have this error:
 6:29.21 fatal runtime error: Rust cannot catch foreign exceptions
 6:29.35 warning: `style` (lib) generated 5 warnings
 6:29.35 error: could not compile `style`; 5 warnings emitted
 6:29.35 Caused by:
 6:29.35   process didn't exit successfully: `/usr/bin/rustc --crate-name [...]`
           (signal: 6, SIGABRT: process abort signal)
Again, we have a full command to test, so I'll run it manually. But as I've reached the end of my available coding time today, the results of that will have to wait until tomorrow.

As always, you can find all the other posts on my Gecko Dev Diary.
