[Buildroot] Some analysis of the major build failure reasons

Giulio Benetti giulio.benetti at benettiengineering.com
Wed Aug 4 10:04:50 UTC 2021


Hello Thomas,

On 8/3/21 2:48 PM, Giulio Benetti wrote:
> Hello Thomas,
> 
> On 8/3/21 2:24 PM, Giulio Benetti wrote:
>> On 8/3/21 9:28 AM, Thomas Petazzoni wrote:
>>> Hello Giulio,
>>>
>>> On Tue, 3 Aug 2021 00:56:24 +0200
>>> Giulio Benetti <giulio.benetti at benettiengineering.com> wrote:
>>>
>>>>> I have investigated this. It fails only on sh4, due to an internal
>>>>> compiler error. It only occurs at -Os; at -O0 and -O2 it builds fine. I
>>>>> have reported gcc bug
>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101737 for this. Since I
>>>>> tested only with gcc 9.3.0 for now, I've started a build with gcc 11.x,
>>>>> to see how it goes.
>>>>>
>>>>> Based on the result, I'll send a patch adding a new
>>>>> BR2_TOOLCHAIN_GCC_HAS_BUG_101737 and disable -Os on pixman on SuperH
>>>>> based on this.
>>>>
>>>> I can do that, since I've dealt with a lot of gcc bugs. Is that OK?
>>>
>>> Oh yes, sure! In the mean time, I have confirmed that gcc 11.1.0 is
>>> also affected by the same issue, and I have updated the gcc bug with
>>> that information. So to me, it seems like all gcc versions are affected.
>>
>> Great, I've sent this patchset for it:
>> https://patchwork.ozlabs.org/project/buildroot/list/?series=256392
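>>
>> For reference, this is roughly the shape such a workaround takes in
>> Buildroot (a sketch only: the symbol name is the one proposed above, while
>> the exact Config.in location, architecture condition and pixman.mk
>> plumbing are assumptions, not necessarily the final patchset content):
>> ```
>> # toolchain/Config.in (sketch): flag toolchains hitting the sh4 ICE
>> config BR2_TOOLCHAIN_GCC_HAS_BUG_101737
>> 	bool
>> 	default y if BR2_sh4
>>
>> # package/pixman/pixman.mk (sketch): override -Os when the bug is
>> # present, since the last -O flag on the command line wins
>> ifeq ($(BR2_TOOLCHAIN_GCC_HAS_BUG_101737),y)
>> PIXMAN_CFLAGS += -O0
>> endif
>> ```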
>>
>>>>>>                            unknown | 31
>>>>>
>>>>> I did not look into these for now.
>>>>
>>>> I've taken a look at the latest ones from today (the 31 "unknown" ones).
>>>> @James: a lot of your builds simply get stuck after make has finished
>>>> (not during linking, but exactly at 'make: Leaving directory').
>>>>
>>>> I noticed this some time ago, and it has now become much more frequent.
>>>> In 9 cases out of 10 it's your autobuilder showing this problem, and it
>>>> happens at varying build durations:
>>>> http://autobuild.buildroot.net/?reason=unknown
>>>> Also, I see you use the /tmp/ folder, but I don't see anyone else doing
>>>> that. Could it be that your distro automatically cleans up /tmp? Or that
>>>> it is mapped somewhere where the disk randomly fills up?
>>>> I would move it to a dedicated user folder (e.g. a buildroot user) and
>>>> that should fix the problem. If you're doing this to save disk space,
>>>> note that I already did something similar a while ago with this patch,
>>>> which now needs to be rebased:
>>>> https://patchwork.ozlabs.org/project/buildroot/patch/20180919114541.17670-1-giulio.benetti@micronovasrl.com/
>>>> But in my case the NOK was clear. I'm pretty sure /tmp/ is the problem.
>>>
>>> No, I don't think there is any problem with the use of /tmp. The
>>> "unknown" build failures are typically build failures with top-level
>>> parallel build enabled.
>>>
>>> If you take
>>> http://autobuild.buildroot.net/results/4ee/4eead81391d76edcdd2823e439f9b8d165b9b7ef/
>>> which is the latest "unknown" build issue at the time of writing this
>>> e-mail, it has BR2_PER_PACKAGE_DIRECTORIES=y. We enable
>>> BR2_PER_PACKAGE_DIRECTORIES for a subset of the builds, and then for
>>> the builds that have BR2_PER_PACKAGE_DIRECTORIES enabled, for a subset
>>> of them, we use top-level parallel build.
>>>
>>> However, when there is top-level parallel build enabled, the output is
>>> quite messy (due to multiple packages being built in parallel). And due
>>> to that, the build-end.log may not actually contain the actual build
>>> failure as the failure may be visible much earlier in the build output.
>>>
>>> If you check these "unknown" build failures, they all have
>>> BR2_PER_PACKAGE_DIRECTORIES enabled, which really hints at a top-level
>>> parallel build issue.
>>
>> +1
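>>
>> (For context, the combination described above boils down to a defconfig
>> fragment plus a parallel top-level make; illustration only:)
>> ```
>> # fragment enabled for a subset of the autobuilder configurations
>> BR2_PER_PACKAGE_DIRECTORIES=y
>> # the build is then launched as a top-level parallel build, e.g.:
>> #   make -j$(nproc)
>> ```
>> With several packages building at once, their output ends up interleaved
>> in the log, hence the hard-to-attribute failures.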
>>
>>> Perhaps what I should do is cook a patch that keeps the full build log
>>> file for builds that use top-level parallel build, so that we have a
>>> chance to debug these. The problem is going to be the disk-space
>>> consumption on my server, but I guess I could do something that
>>> compresses the build log after X days or something like that.
>>>
>>>> On @Thomas' autobuilders I see failures for:
>>>> - optee-client (an already known NOK):
>>>> http://autobuild.buildroot.net/results/5e9/5e91bc53c3fbcd2ed232db79dc5c947394d66a1e/
>>>> - fetch failures, as mentioned above
>>>>
>>>>>>                       zeromq-4.3.4 | 30
>>>>>
>>>>> Giulio: this is happening only on or1k, with a binutils assert. Do you
>>>>> think this is solved by your or1k fixes?
>>>>
>>>> It is. Those build failures are due to the use of binutils-2.33.1, which
>>>> has no or1k patches, while all the other binutils versions have local or
>>>> upstreamed or1k patches.
>>>>
>>>> Of course this can still happen with the current external Bootlin or1k
>>>> toolchain, which uses exactly that unpatched binutils-2.33.1. So the
>>>> problem will be solved once you rebuild and bump the toolchains with the
>>>> provided patches. I see that we're still on 2020.08-1:
>>>> https://git.buildroot.net/buildroot/tree/toolchain/toolchain-external/toolchain-external-bootlin/toolchain-external-bootlin.mk#n586
>>>
>>> Yes, I have worked on rebuilding the toolchains with 2021.05 + a few
>>> patches, but I have a runtime issue with Microblaze + glibc, which
>>> doesn't boot (well the kernel boots, but not user-space). Microblaze +
>>> uclibc or musl works fine.
>>>
>>> I guess I should probably not delay further the release of 2021.05
>>> toolchains, and leave just the Microblaze/glibc toolchains to 2020.08.
>>
>> As you wish; now we have the OpenRISC Buildroot toolchain working, so
>> maybe there is no need to hurry. It stayed broken for a long time anyway.
>> It's only annoying to see those autobuilder NOKs, but we know why they
>> are there.
>>
>>>> NOTE:
>>>> I'm also adding a new libnss-3.68 bug on AArch64 BE to the list of things
>>>> to be fixed. I'm already working on it in my spare time.
>>>
>>> Great!
>>>
>>>> PS: I've found my autobuilders stopped; I think I forgot to restart the
>>>> daemon after updating the distro. Now they're up and running.
>>>
>>> OK, thanks. That being said, additional autobuilders at the moment are
>>> probably not that important: compared to the CPU power made available
>>> by James through his 63 super high-performance machines, additional
>>> "regular" machines added to the autobuilder pool are not going to help
>>> much.
>>
>> Wow, now I understand why James has so many builds!
>>
>>> However, they will definitely help when James no longer has access to
>>> those machines.
>>
>> +1
>>
>> Fabrice beat me to gpsd, as you've noticed:
>> https://patchwork.ozlabs.org/project/buildroot/patch/20210803120714.2797624-1-fontaine.fabrice@gmail.com/
>>
>>
>> And I'm working on libfuse3 right now.
> 
> This ^^^ "error: symver is only supported on ELF platforms" seems gcc
> related. The meson check for symver support relies entirely on the
> __has_attribute(symver) macro working as expected, while in the code we
> have something like the following, which fails with that error:
> ```
> __attribute__ ((symver ("fuse_new@FUSE_3.0")))
> void func1(void)
> {
> 
> }
> ```
> 
> So I'll try to patch the microblaze gcc.

This is fixed by this patch:
https://patchwork.ozlabs.org/project/buildroot/patch/20210803224106.1619032-1-giulio.benetti@benettiengineering.com/

I've opened a gcc bug, but they state that __has_attribute() gives false
positives:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766
So basically I've only changed how the libfuse3 meson build checks whether
symver is available. The patch has been merged:
https://github.com/libfuse/libfuse/pull/620
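
To illustrate the false positive (a stand-alone sketch, not the code from the
PR above; the file and function names are made up): the preprocessor reports
the attribute as available, but only actually compiling a symver-decorated
definition fails on the affected toolchain, so that is what a reliable check
has to do.
```
/* symver-test.c -- illustrative only; build with "gcc -c symver-test.c".
 * On the affected microblaze toolchain __has_attribute(symver) evaluates
 * to 1, yet compiling the decorated definition below still fails with
 * "error: symver is only supported on ELF platforms", so a feature check
 * has to attempt the actual compilation instead of only asking the
 * preprocessor. */
#ifdef __has_attribute
# if __has_attribute(symver)
#  define HAVE_SYMVER_ATTR 1
# endif
#endif

#ifdef HAVE_SYMVER_ATTR
/* Export this definition as the versioned symbol fuse_new@FUSE_3.0. */
__attribute__ ((symver ("fuse_new@FUSE_3.0")))
#endif
void fuse_new_compat(void)
{
}
```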

I need some more time to fix libnss on AArch64 BE, and then I'll try to
re-enable its parallel build, which seems to have been fixed in version 3.61:
https://bugzilla.mozilla.org/show_bug.cgi?id=1688374
and retry sending a patch for that (a rough sketch of the Buildroot side of
that change is below). I'll loop a build on one of my autobuilders before
sending it. Also, I can't access Mozilla Bugzilla at the moment and I'm
waiting for them to re-enable it; until then I can't file the bug, get a bug
number, and consequently upstream the fix.
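
For reference, re-enabling the parallel build on the Buildroot side mostly
means dropping the forced single-job make for the package. A minimal sketch,
assuming the usual package infrastructure (the real libnss.mk has more
variables around this, and LIBNSS_BUILD_OPTS below is just a placeholder):
```
# Sketch only: Buildroot uses $(MAKE1) to force -j1 for packages whose
# build system is not parallel-safe. With nss >= 3.61 the upstream race
# is reported as fixed, so the build command can go back to $(MAKE).
define LIBNSS_BUILD_CMDS
	$(TARGET_MAKE_ENV) $(MAKE) -C $(@D)/nss $(LIBNSS_BUILD_OPTS)
endef
```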

Best regards
-- 
Giulio Benetti
Benetti Engineering sas

