[PATCH 1/2] udhcpd: sanitize invalid hostnames to match rfcs

Rich Felker dalias at libc.org
Mon Oct 19 21:41:16 UTC 2015


On Mon, Oct 19, 2015 at 10:52:27AM +0200, walter harms wrote:
> 
> 
> On 18.10.2015 23:26, Isaac Dunham wrote:
> > On Sun, Oct 18, 2015 at 07:55:38PM +0200, walter harms wrote:
> >>
> >>
> >> On 18.10.2015 07:54, Isaac Dunham wrote:
> >>> RFC952/RFC1123 limit the characters in a hostname for a node to
> >>> [-a-zA-Z0-9], with '-' being legal only in the middle; we were
> >>> accepting everything from ' ' to '~'.
> >>> (As a byproduct of this, the hostname in dumpleases can now be safely
> >>> used from scripts without sanitization.)
> >>>
> >>> function                                             old     new   delta
> >>> add_lease                                            326     363     +37
> >>> ------------------------------------------------------------------------------
> >>> (add/remove: 0/0 grow/shrink: 1/0 up/down: 37/0)               Total: 37 bytes
> >>>    text	   data	    bss	    dec	    hex	filename
> >>>  892983	   6844	   7288	 907115	  dd76b	busybox_old
> >>>  893020	   6844	   7288	 907152	  dd790	busybox_unstripped
> >>> ---
> >>>  networking/udhcp/leases.c | 13 ++++++++++---
> >>>  1 file changed, 10 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/networking/udhcp/leases.c b/networking/udhcp/leases.c
> >>> index 745340a..1f7af87 100644
> >>> --- a/networking/udhcp/leases.c
> >>> +++ b/networking/udhcp/leases.c
> >>> @@ -65,12 +65,19 @@ struct dyn_lease* FAST_FUNC add_lease(
> >>>  			if (hostname_len > sizeof(oldest->hostname))
> >>>  				hostname_len = sizeof(oldest->hostname);
> >>>  			p = safe_strncpy(oldest->hostname, hostname, hostname_len);
> >>> -			/* sanitization (s/non-ASCII/^/g) */
> >>> +			/* sanitization - per rfcs 952 & 1123 only [-a-zA-Z0-9] are legal
> >>> +			 * with '-' being allowed only in the middle
> >>> +			 */
> >>>  			while (*p) {
> >>> -				if (*p < ' ' || *p > 126)
> >>> -					*p = '^';
> >>> +				if (! (isupper((char)*p) || islower((char)*p) ||
> >>> +						isdigit((char)*p) || (char)*p == '-') )
> >>> +					*p = '-';
> >>>  				p++;
> >>>  			}
> >>> +			if (p--, *p == '-')
> >>> +				*p = 'X';
> >>> +			if (p = oldest->hostname, *p == '-')
> >>> +				*p = 'X';
> >>>  		}
> >>>  		if (chaddr)
> >>>  			memcpy(oldest->lease_mac, chaddr, 6);
> >>
> >> Since several tools check hostnames,
> >> maybe it would be useful to make this a function?
> > 
> > What this does is not simply 'check for validity'; it *makes* a hostname
> > valid, which is not what most tools need.
> > It is also exclusively for leaf node names, rather than FQDNs (i.e.,
> > '.' is not valid here).
> > 
> > It would be possible to design a function that can check or fix the
> > hostname depending on how it's called, though I wonder if that's
> > doing too much in a single call.
> > 
> > It would probably have to be something like this:
> > 
> > #define HOSTCHECK_LEAF	0x1 //leaf hostname-no '.' allowed
> > #define HOSTCHECK_FIX	0x2 //fix-replace invalid chars with '-'/'X'
> > 
> > //return NULL if valid, pointer to first invalid char otherwise
> > char * validate_hostname(char *p, int flags);
> > 
> > This does not handle transforming a URL via punycode, of course.
> > 
> > Would such an interface be desirable?
> 
> Note: I did not take an inventory of whether other programs need
>       this, but I can imagine that it would be useful for 'hostname'.

I see no reason hostnames should be represented as punycode anywhere
except DNS query packets, or in other protocols that require encoding
as such. Everywhere else they should just be normal printable text.

Rich
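
For reference, the check-or-fix interface proposed in the quoted text could
look roughly like the following. This is only a minimal sketch, not code from
the patch or from busybox: the two flag names and the validate_hostname()
prototype come from the proposal, while the body, the earlier() helper, the
treatment of empty labels (including a trailing '.') as invalid, and the
choice to return the first offending position even in fix mode are all
assumptions made for illustration. It uses explicit ASCII range tests and does
not enforce any length limits.

```c
#include <stddef.h>

#define HOSTCHECK_LEAF 0x1 /* leaf hostname: no '.' allowed */
#define HOSTCHECK_FIX  0x2 /* fix: replace invalid chars with '-'/'X' */

/* Hypothetical helper: keep whichever non-NULL pointer comes first. */
static char *earlier(char *a, char *b)
{
	return (!a || b < a) ? b : a;
}

/*
 * Return NULL if valid, pointer to the first invalid char otherwise.
 * With HOSTCHECK_FIX the string is also repaired in place: invalid
 * chars become '-', and a '-' at either end of a label becomes 'X'.
 * Empty labels cannot be repaired in place and are only reported.
 */
char *validate_hostname(char *p, int flags)
{
	char *first_bad = NULL;
	char *label = p; /* start of the label being scanned */

	for (;; p++) {
		unsigned char c = (unsigned char)*p;
		int dot_ends_label = (c == '.') && !(flags & HOSTCHECK_LEAF);

		if (c == '\0' || dot_ends_label) {
			if (p == label) {
				/* empty label (assumed invalid) */
				first_bad = earlier(first_bad, p);
			} else if (p[-1] == '-') {
				/* '-' is not legal at the end of a label */
				first_bad = earlier(first_bad, p - 1);
				if (flags & HOSTCHECK_FIX)
					p[-1] = 'X';
			}
			if (c == '\0')
				break;
			label = p + 1;
			continue;
		}
		if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
		 || (c >= 'A' && c <= 'Z'))
			continue;
		if (c == '-' && p != label)
			continue; /* a trailing '-' is caught at label end */

		/* invalid char, or '-' at the start of a label */
		first_bad = earlier(first_bad, p);
		if (flags & HOSTCHECK_FIX)
			*p = (p == label) ? 'X' : '-';
	}
	return first_bad;
}
```

Called with HOSTCHECK_LEAF | HOSTCHECK_FIX on a writable buffer, this applies
the same substitutions as the loop in the patch; called with HOSTCHECK_LEAF
alone, it only reports the first offending character and leaves the string
untouched.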

