[PATCH 1/2] udhcpd: sanitize invalid hostnames to match rfcs

Isaac Dunham ibid.ag at gmail.com
Tue Oct 20 03:47:07 UTC 2015


On Mon, Oct 19, 2015 at 05:41:16PM -0400, Rich Felker wrote:
> On Mon, Oct 19, 2015 at 10:52:27AM +0200, walter harms wrote:
> > 
> > 
> > Am 18.10.2015 23:26, schrieb Isaac Dunham:
> > > On Sun, Oct 18, 2015 at 07:55:38PM +0200, walter harms wrote:
> > >>
> > >>
> > >> Am 18.10.2015 07:54, schrieb Isaac Dunham:
> > >>> RFC952/RFC1123 limit the characters in a hostname for a node to
> > >>> [-a-zA-Z0-9], with '-' being legal only in the middle; we were
> > >>> accepting everything from ' ' to '~'.
[snip code]
> > >> since several tools check for hostnames,
> > >> maybe it is useful to make this a function ?
> > > 
> > > What this does is not simply 'check for validity'; it *makes* a hostname
> > > valid, which is not what most tools need.
> > > It also is exclusively for leaf node names, rather than an FQDN (ie,
> > > '.' is not valid here).
> > > 
> > > It would be possible to design a function that can check or fix the
> > > hostname depending how it's called, though I wonder if that's
> > > doing too much in a single call.
> > > 
> > > It would probably have to be something like this:
> > > 
> > > #define HOSTCHECK_LEAF	0x1 //leaf hostname-no '.' allowed
> > > #define HOSTCHECK_FIX	0x2 //fix-replace invalid chars with '-'/'X'
> > > 
> > > //return NULL if valid, pointer to first invalid char otherwise
> > > char * validate_hostname(char *p, int flags);
> > > 
> > > This does not handle transforming a URL via punycode, of course.
(This comment was intended as a parenthetical, though I forgot the
parentheses.)
> > > Would such an interface be desirable?

('such an interface' was intended to refer to validate_hostname(), as it
seems Walter took it.)
> > note: i did not make an inventory if this is needed by other
> >       programs but i can imagine that with 'hostname' it would be useful.
> 
> I see no reason hostnames should be represented as punycode anywhere
> except DNS query packets, or in other protocols that require encoding
> as such. Everywhere else they should just be normal printable text.

As far as I'm aware, the restriction is 'alphanumeric ASCII, plus '-' in
some positions', and the standards apply only to network protocols
(basically DHCP and DNS, including all the networking programs that
accept hostnames). The discussion is about where we need to force
hostnames to comply with that.
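
For illustration only, a check-only version of the validate_hostname()
interface quoted above might look roughly like this. It is a sketch, not
the code from the patch: the HOSTCHECK_FIX flag is left out, label and
name length limits are not checked, and treating an empty label (so also
a trailing '.') as invalid is my assumption. is_alnum_ascii() is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define HOSTCHECK_LEAF 0x1 /* leaf hostname: no '.' allowed */

/* Hypothetical helper: ASCII-only test, independent of the locale. */
static int is_alnum_ascii(unsigned char c)
{
	return (c >= '0' && c <= '9')
	    || (c >= 'a' && c <= 'z')
	    || (c >= 'A' && c <= 'Z');
}

/* Return NULL if valid, pointer to first invalid char otherwise.
 * An empty label is reported at the '.' or NUL that ends it. */
char *validate_hostname(char *p, int flags)
{
	char *label_start;

	assert(p != NULL); /* caller must pass a string */
	label_start = p;
	for (;; p++) {
		unsigned char c = *p;

		if (c == '\0' || c == '.') {
			if (c == '.' && (flags & HOSTCHECK_LEAF))
				return p;     /* '.' in a leaf name */
			if (p == label_start)
				return p;     /* empty label */
			if (p[-1] == '-')
				return p - 1; /* label ends in '-' */
			if (c == '\0')
				return NULL;  /* every label was fine */
			label_start = p + 1;
			continue;
		}
		if (is_alnum_ascii(c))
			continue;
		if (c == '-' && p != label_start)
			continue;             /* '-' is legal in the middle */
		return p;
	}
}
```

With this sketch, "host_1" is reported at the '_' and "a.b" is accepted
unless HOSTCHECK_LEAF is set, in which case the '.' is reported.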

punycode is one common convention for mapping non-ASCII printable
text to permitted chars, though we don't currently support it and I'm
eager to not implement it myself.

I suspect Walter assumed that the standards applied to all hostnames,
and was suggesting that some form of sanitization is needed there.
I'm inclined to think that:
-hostname should fail rather than silently mutilating the name, and it
 should impose no restrictions above the kernel's restrictions.
-DNS queries could be rejected, passed on as invalid hostnames, or fixed
 to punycode; the notes in the RFCs referred to seem to imply that
 there *is* a possibility of technically invalid hostnames existing,
 but configurations should avoid it because it renders a host
 inaccessible to some clients.
 I would think that we should try to avoid publishing invalid
 hostnames, but recognize that someone might have managed to ignore
 the RFCs - after all, we might end up on a network that used
 busybox 1.23 dnsd, with one host named 'host_1' or similar...
 (I haven't checked dnsd, but it may have a similar bug.)
-udhcpc should probably sanitize invalid hostnames or reject them; it
 runs on the user end, so a bad config is fixable there.
 dnsd should emit/log an error on encountering them; I don't know
 whether sanitizing or ignoring bad hostnames is better, but aborting
 would be a potential DoS issue.
 udhcpd only deals with hostnames once a lease is granted, so we can only
 delete invalid hostnames or sanitize them.
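
To make the "sanitize" option concrete, here is a minimal in-place sketch
for a leaf hostname, such as the one udhcpd stores with a lease. It is
not the code from the patch. sanitize_leaf_hostname() and
is_alnum_ascii() are hypothetical names, and replacing with 'X' at either
end of the name and with '-' elsewhere is only one reading of the
"'-'/'X'" comment on HOSTCHECK_FIX quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ASCII-only test, independent of the locale. */
static int is_alnum_ascii(unsigned char c)
{
	return (c >= '0' && c <= '9')
	    || (c >= 'a' && c <= 'z')
	    || (c >= 'A' && c <= 'Z');
}

/* Rewrite a leaf hostname in place so that it contains only
 * [-a-zA-Z0-9], with '-' never first or last.
 * Returns the number of bytes changed. An empty name is left
 * untouched, so the caller still has to reject or delete it. */
size_t sanitize_leaf_hostname(char *name)
{
	size_t i, changed = 0;

	assert(name != NULL); /* caller must pass a string */
	for (i = 0; name[i] != '\0'; i++) {
		unsigned char c = name[i];
		int at_edge = (i == 0 || name[i + 1] == '\0');

		if (is_alnum_ascii(c))
			continue;
		if (c == '-' && !at_edge)
			continue;     /* '-' is legal in the middle */
		/* assumption: 'X' where '-' would be illegal */
		name[i] = at_edge ? 'X' : '-';
		changed++;
	}
	return changed;
}
```

A nonzero return value tells the caller that the name was altered, which
is the point where a daemon could log the problem instead of fixing it
silently.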

Thanks,
Isaac Dunham

