How do I (unconditionally) enable unicode support in busybox?

Rich Felker dalias at libc.org
Tue Aug 12 02:59:57 UTC 2014


On Mon, Aug 11, 2014 at 05:15:21PM +0200, Harald Becker wrote:
> >IMO there is still something very strange with sed and unicode
> 
> YES! I did not stop looking for this. Looks like this is a problem
> in the regular expression parser.
> 
> s/./x/g
> 
> should match every character and replace it with a single x, but in fact
> it matches every byte of a UTF-8 character (which is wrong). This
> doesn't seem to depend on the setting of LANG (which confused me).
> Is it possible that it only worked when BB was linked with glibc in a
> fully functional environment? Maybe then a UTF-8-aware regex
> scanner is used. We need to look further into this!

I think this is the result of using uClibc with a broken regex
implementation, either because of a build-time option (omitting
locale? omitting full regex?) or just a deficiency in uClibc. Using
glibc or musl would solve it.
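
A rough way to check whether a particular sed/libc combination is affected is to run the same substitution on one two-byte UTF-8 character under a byte-oriented locale and under a UTF-8 locale, then compare how many x's come back. This is only a sketch: dot_to_x is a hypothetical helper name, and the locale name C.UTF-8 is an assumption that may not exist on a given system (check `locale -a` and substitute e.g. en_US.UTF-8).

```shell
# Replace every match of the regex "." with "x" under a chosen locale.
# dot_to_x is a hypothetical helper for this test, not part of busybox.
dot_to_x() {
    # $1 = locale name, $2 = input text
    printf '%s\n' "$2" | LC_ALL="$1" sed 's/./x/g'
}

# U+00E4 as its two UTF-8 bytes 0xC3 0xA4, written with octal escapes
# so the script itself stays pure ASCII.
a_umlaut=$(printf '\303\244')

# Byte-oriented locale: each byte is expected to be matched on its own.
dot_to_x C "$a_umlaut"

# UTF-8 locale (name is an assumption): a locale-aware regex
# implementation should treat the two bytes as one character.
dot_to_x C.UTF-8 "$a_umlaut"
```

If both calls print the same number of x's, the regex code in use is probably not honouring the multibyte locale (or the requested locale is not installed), which would be consistent with the behaviour described above.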

Rich

