[Buildroot] [PATCH/next v2 1/1] package/icu: Add support to generate a subset of ICU data
Thomas Petazzoni
thomas.petazzoni at bootlin.com
Sun Jul 18 21:11:20 UTC 2021
Hello Bernd,
On Tue, 1 Jun 2021 08:06:08 +0200
Bernd Kuhls <bernd.kuhls at t-online.de> wrote:
> This would reduce the size of libicudata.so to 12M.
>
> [1] https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md
This is really some good and useful contribution, but (as usual), I
have a couple of questions/requests.
> +config BR2_PACKAGE_ICU_DATA_FILTER_FILE
> + string "Path to custom data configuration file"
> + help
> + The ICU Data Build Tool enables you to write a configuration
> + file that specifies what features and locales to include in a
> + custom data bundle:
> + https://github.com/unicode-org/icu/blob/main/docs/userguide/icu_data/buildtool.md
> + Leave empty to not use this functionality.
We already have BR2_PACKAGE_ICU_CUSTOM_DATA_PATH which was also meant
to allow building ICU with a smaller dataset, based on pre-compiled
.dat file. It points to using http://apps.icu-project.org/datacustom/,
which in fact seems to no longer exists, as it redirects to
https://unicode-org.atlassian.net/browse/ICU-12977/, which says that
the feature is no longer available since ICU 58.x.
So I guess we should first drop BR2_PACKAGE_ICU_CUSTOM_DATA_PATH which
is already unusable today.
Then the second question is whether we can provide/generate some kind
of default file in Buildroot that would in a default Buildroot
configuration generate a more minimal ICU dataset. For example by
building only the support for English ? Or based on the value of
BR2_GENERATE_LOCALE ?
I know this would break backward compatibility, but it would really
make ICU a lot more reasonable in size for most of our users. Indeed,
they are unlikely to know that they can reduce the size of ICU by using
this new feature.
> +ICU_DATA_FILTER_FILE = $(call qstrip,$(BR2_PACKAGE_ICU_DATA_FILTER_FILE))
> +
> +ifneq ($(ICU_DATA_FILTER_FILE),)
> +HOST_ICU_DATA_SOURCE = $(subst src.tgz,data.zip,$(ICU_SOURCE))
> +HOST_ICU_EXTRA_DOWNLOADS += $(HOST_ICU_SITE)/$(HOST_ICU_DATA_SOURCE)
> +
> +define HOST_ICU_EXTRACT_DATA
> + rm -rf $(@D)/$(HOST_ICU_SUBDIR)/data
> + $(UNZIP) $(ICU_DL_DIR)/$(HOST_ICU_DATA_SOURCE) -d $(@D)/$(HOST_ICU_SUBDIR)
> +endef
> +HOST_ICU_POST_EXTRACT_HOOKS += HOST_ICU_EXTRACT_DATA
> +
> +HOST_ICU_CONF_ENV = ICU_DATA_FILTER_FILE=$(ICU_DATA_FILTER_FILE)
> +HOST_ICU_CONF_OPTS += --with-data-packaging=archive
> +
> +define ICU_COPY_CUSTOM_DATA
> + $(INSTALL) -D -m 644 $(HOST_ICU_DIR)/$(HOST_ICU_SUBDIR)/data/out/icudt$(ICU_VERSION_MAJOR)l.dat $(@D)/$(ICU_SUBDIR)/data/in/
> +endef
> +ICU_PRE_CONFIGURE_HOOKS += ICU_COPY_CUSTOM_DATA
> +endif
It took me quite some time to understand what was going on here. My
understanding is as follows:
* In a normal build, the pre-compiled source/data/in/icudt69l.dat
provided in the ICU tarball is used.
* When a custom dataset needs to be generated thanks to your new
option, we need to download the source of this dataset as an extra
download for the host-icu package, replace the source/data/ subdir
with this source data set, and ask the host-icu package to generate
icudt69l.dat. Then the target icu package grabs this pre-compiled
icudt69l.dat.
Is that correct ? If so, then I'd say it would be useful to add a
comment above this code, as I find the logic to not be that trivial.
Thanks!
Thomas
--
Thomas Petazzoni, co-owner and CEO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
More information about the buildroot
mailing list