[Buildroot] Some analysis of the major build failure reasons

Thomas Petazzoni thomas.petazzoni at bootlin.com
Tue Aug 3 07:28:35 UTC 2021


Hello Giulio,

On Tue, 3 Aug 2021 00:56:24 +0200
Giulio Benetti <giulio.benetti at benettiengineering.com> wrote:

> > I have investigated this. It fails only on sh4, due to an internal
> > compiler error. It only occurs at -Os; at -O0 and -O2 it builds fine. I
> > have reported gcc bug
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101737 for this. Since I
> > tested only with gcc 9.3.0 for now, I've started a build with gcc 11.x,
> > to see how it goes.
> > 
> > Based on the result, I'll send a patch adding a new
> > BR2_TOOLCHAIN_GCC_HAS_BUG_101737 option and disabling -Os for pixman
> > on SuperH based on it.  
> 
> I can do that, since I've handled a lot of gcc bugs. Is that OK?

Oh yes, sure! In the meantime, I have confirmed that gcc 11.1.0 is
also affected by the same issue, and I have updated the gcc bug with
that information. So to me, it seems like all gcc versions are affected.
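
For reference, here is a rough sketch of what such a workaround could
look like, following the usual BR2_TOOLCHAIN_GCC_HAS_BUG_* pattern. The
Config.in condition and the pixman.mk mechanism below are assumptions
on my side, to be refined in the actual patch:

  # toolchain/Config.in (sketch): gcc ICE at -Os on SuperH,
  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101737
  # Assumption: all gcc versions are affected, so no version condition.
  config BR2_TOOLCHAIN_GCC_HAS_BUG_101737
          bool
          default y if BR2_sh

  # package/pixman/pixman.mk (sketch): build at -O0 instead of -Os on
  # affected toolchains, so the compiler bug is not triggered
  ifeq ($(BR2_TOOLCHAIN_GCC_HAS_BUG_101737),y)
  PIXMAN_CONF_OPTS += CFLAGS="$(TARGET_CFLAGS) -O0"
  endif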

> >>                         unknown | 31  
> > 
> > I have not looked into these yet.  
> 
> I've taken a look at the latest ones from today (the 31 "unknown" ones).
> @James: a lot of your builds simply get stuck after Make has finished
> (not at linking, but exactly at 'make: Leaving directory').
> 
> I noticed this some time ago, and now it has become much more
> frequent. In 9 out of 10 cases it's your autobuilder showing this problem.
> And this happens at varying build durations:
> http://autobuild.buildroot.net/?reason=unknown
> Also, I see you use the /tmp/ folder, but I don't see anyone else doing
> that. Could it be that your distro automatically cleans up the /tmp
> folder? Or is it mapped somewhere where the disk randomly fills up?
> I would move it to a specific user folder (i.e. the buildroot user's),
> and that should fix the problem. If you're doing this to save disk
> space, note that I already did something similar some time ago with
> this patch, which now needs to be rebased:
> https://patchwork.ozlabs.org/project/buildroot/patch/20180919114541.17670-1-giulio.benetti@micronovasrl.com/
> But in my case the failure (NOK) was clear. I'm pretty sure /tmp/ is
> the problem.

No, I don't think there is any problem with the use of /tmp. The
"unknown" build failures are typically build failures with top-level
parallel build enabled.

If you take
http://autobuild.buildroot.net/results/4ee/4eead81391d76edcdd2823e439f9b8d165b9b7ef/
which is the latest "unknown" build issue at the time of writing this
e-mail, it has BR2_PER_PACKAGE_DIRECTORIES=y. We enable
BR2_PER_PACKAGE_DIRECTORIES for a subset of the builds, and then, for a
subset of those, we additionally use top-level parallel build.
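
Concretely, that corresponds to something like this (a simplified
sketch; the real autobuilder scripts pick these options randomly, and
the job count below is just an example):

  # fragment enabled in a subset of the generated configurations
  BR2_PER_PACKAGE_DIRECTORIES=y

  # and, for a subset of those, the build is started as a top-level
  # parallel build instead of a plain 'make'
  make -j8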

However, when top-level parallel build is enabled, the output is quite
messy (due to multiple packages being built in parallel). Because of
that, build-end.log may not contain the actual build failure, which may
be visible much earlier in the build output.

If you check these "unknown" build failures, they all have
BR2_PER_PACKAGE_DIRECTORIES enabled, which really hints at a top-level
parallel build issue.
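
That's easy to verify from the downloaded result directories (the
"config" file name below is an assumption about the result layout):

  # list which failed builds had per-package directories enabled
  grep -l 'BR2_PER_PACKAGE_DIRECTORIES=y' */config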

Perhaps what I should do is cook a patch that keeps the full build log
file for builds that use top-level parallel build, so that we have a
chance to debug these. The problem is going to be the disk-space
consumption on my server, but I guess I could compress the build logs
after X days or so.
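
For example (a hypothetical cron job; the path, log name and retention
period are made up for illustration):

  # compress full build logs older than 30 days to bound disk usage
  find /srv/autobuild/results -name 'build.log' -mtime +30 -exec gzip {} +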

> On @Thomas' autobuilder I see failures for:
> - optee-client (NOK already found):
> http://autobuild.buildroot.net/results/5e9/5e91bc53c3fbcd2ed232db79dc5c947394d66a1e/
> - fetch failures, as mentioned above
> 
> >>                    zeromq-4.3.4 | 30  
> > 
> > Giulio: this is happening only on or1k, with a binutils assert. Do you
> > think this is solved by your or1k fixes?  
> 
> It is. Those build failures are due to the use of binutils-2.33.1,
> which doesn't carry the or1k patches, while all the other binutils
> versions have local or upstreamed or1k patches.
> 
> Of course, this can still happen with the current external Bootlin or1k
> toolchains, which use exactly that unpatched binutils-2.33.1. So the
> problem will be solved once you rebuild and bump them with the patches
> provided. I see that we're still on 2020.08-1:
> https://git.buildroot.net/buildroot/tree/toolchain/toolchain-external/toolchain-external-bootlin/toolchain-external-bootlin.mk#n586

Yes, I have worked on rebuilding the toolchains with 2021.05 + a few
patches, but I have a runtime issue with Microblaze + glibc, which
doesn't boot (well, the kernel boots, but user-space doesn't).
Microblaze + uclibc or musl works fine.

I guess I should probably not delay the release of the 2021.05
toolchains any further, and leave just the Microblaze/glibc toolchains
at 2020.08.

> NOTE:
> I'm also adding a new libnss-3.68 bug on AArch64_BE to be fixed. I'm
> already working on it in my spare time.

Great!

> PS: I found my autobuilders stopped; I think I forgot to restart the
> daemon after updating the distro. Now they're up and running.

OK, thanks. That being said, additional autobuilders are probably not
that important at the moment: compared to the CPU power made available
by James through his 63 super high-performance machines, additional
"regular" machines added to the autobuilder pool are not going to help
much.

However, they will definitely help once James no longer has access to
those machines.

Best regards,

Thomas
-- 
Thomas Petazzoni, co-owner and CEO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

