FAST_FUNC not working well with GCC's LTO

Kang-Che Sung explorer09 at gmail.com
Thu Jan 5 01:25:03 UTC 2017


(This mail and patch were sent to the busybox mailing list on Dec 25, 2016;
I'm re-sending them so that people notice.)

Busybox uses the FAST_FUNC macro to tweak IA-32 calling conventions in
order to make function calls slightly smaller or slightly faster.
However, when I experimented with GCC's LTO (Link Time Optimization), I
discovered that FAST_FUNC could hinder LTO's optimization, so that the
resulting executable becomes a few bytes larger than one compiled
without FAST_FUNC.

Although I can comment out the FAST_FUNC lines in include/platform.h to
achieve the level of optimization I want, may I suggest a way for users
to disable FAST_FUNC conveniently?

For example, I would like to specify CONFIG_EXTRA_CFLAGS="-DFAST_FUNC= -flto"
and compile with LTO without hacking the source code. GCC does not seem
to provide a macro or any other way to detect LTO in code yet, so this
is the best suggestion I have.

The changes would look something like the patch below. I would appreciate
comments on this problem and on my suggestion.

Kang-Che Sung ("Explorer")

--------

diff --git a/include/platform.h b/include/platform.h
index c987d418c..7e537b950 100644
--- a/include/platform.h
+++ b/include/platform.h
@@ -108,13 +108,19 @@
  * and/or smaller by using modified ABI. It is usually only needed
  * on non-static, busybox internal functions. Recent versions of gcc
  * optimize statics automatically. FAST_FUNC on static is required
- * only if you need to match a function pointer's type */
-#if __GNUC_PREREQ(3,0) && defined(i386) /* || defined(__x86_64__)? */
+ * only if you need to match a function pointer's type.
+ * FAST_FUNC may not work well with -flto so allow user to disable this.
+ * (-DFAST_FUNC= ) */
+#ifndef FAST_FUNC
+# if __GNUC_PREREQ(3,0) && defined(i386)
 /* stdcall makes callee to pop arguments from stack, not caller */
-# define FAST_FUNC __attribute__((regparm(3),stdcall))
+#  define FAST_FUNC __attribute__((regparm(3),stdcall))
 /* #elif ... - add your favorite arch today! */
-#else
-# define FAST_FUNC
+/* x86_64 doesn't need this - its ABI can't be tweaked like IA-32 (can't use
+ * stdcall; the ABI uses 6 regparms already). */
+# else
+#  define FAST_FUNC
+# endif
 #endif

 /* Make all declarations hidden (-fvisibility flag only affects definitions) */

