[PATCH] pselect: Use linux pselect syscall when available

Rich Felker dalias at libc.org
Sat Dec 19 04:44:13 UTC 2015


On Fri, Dec 18, 2015 at 07:39:19PM -0800, Nicolas S. Dade wrote:
> This supersedes the previous pselect patch from 16 Dec 2015.
> 
> Linux has had a pselect syscall since 2.6.something. Using it
> rather than emulating it with sigprocmask+select+sigprocmask
> is smaller code, and works properly. (The emulation has
> race conditions when unblocked signals arrive before or
> after the select)
> 
> The tv.nsec >= 1E9 handling comes from uclibc's linux select()
> implementation, which itself uses pselect() internally if the
> pselect syscall exists. I thought it would be good to do the
> same here.
> 
> Note that although the libc pselect() API has 6 arguments,
> the linux kernel syscall has 7 arguments. There is an extra,
> somewhat vestigial, argument giving the size of the signal mask.
> 
> Signed-off-by: Nicolas S. Dade <nic.dade at gmail.com>
> ---
>  libc/sysdeps/linux/common/pselect.c | 52 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/libc/sysdeps/linux/common/pselect.c b/libc/sysdeps/linux/common/pselect.c
> index bf19ce3..3f1dd28 100644
> --- a/libc/sysdeps/linux/common/pselect.c
> +++ b/libc/sysdeps/linux/common/pselect.c
> @@ -30,6 +30,57 @@ static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
>  			 fd_set *exceptfds, const struct timespec *timeout,
>  			 const sigset_t *sigmask)
>  {
> +#ifdef __NR_pselect6
> +#define NSEC_PER_SEC 1000000000L
> +	struct timespec _ts, *ts = 0;
> +	if (timeout) {
> +		/* The Linux kernel can in some situations update the timeout value.
> +		 * We do not want that so use a local variable.
> +		 */
> +		_ts = *timeout;
> +
> +		/* GNU extension: allow for timespec values where the sub-sec
> +		* field is equal to or more than 1 second.  The kernel will
> +		* reject this on us, so take care of the time shift ourself.
> +		* Some applications (like readline and linphone) do this.
> +		* See 'clarification on select() type calls and invalid timeouts'
> +		* on the POSIX general list for more information.
> +		*/
> +		if (_ts.tv_nsec >= NSEC_PER_SEC) {
> +			_ts.tv_sec += _ts.tv_nsec / NSEC_PER_SEC;
> +			_ts.tv_nsec %= NSEC_PER_SEC;
> +		}
> +
> +		ts = &_ts;
> +	}
> +
> +	/* The pselect6 syscall API is strange. It wants a 7th arg to be
> +	 * the sizeof(*sigmask). However syscalls with > 6 arguments aren't
> +	 * supported on linux. So arguments 6 and 7 are stuffed in a struct
> +	 * and a pointer to that struct is passed as the 6th argument to
> +	 * the syscall.
> +	 * Glibc stuffs arguments 6 and 7 in a ulong[2]. Linux reads
> +	 * them as if there were a struct { sigset_t*; size_t } in
> 	 * userspace. There would be trouble if userspace and the kernel are
> +	 * compiled differently enough that size_t isn't the same as ulong,
> +	 * but not enough to trigger the compat layer in linux. I can't
> +	 * think of such a case, so I'm using linux's struct.
> +	 * Furthermore Glibc sets the sigsetsize to _NSIG/8. However linux
> +	 * checks for sizeof(sigset_t), which internally is a ulong array.
> +	 * This means that if _NSIG isn't a multiple of BITS_PER_LONG then
> +	 * linux will refuse glibc's value. So I prefer sizeof(sigset_t) for
> +	 * the value of sigsetsize.
> +	 */
> +	struct {
> +		const sigset_t *sigmask;
> +		size_t sigsetsize;
> +	} args67 = {
> +		sigmask,
> +		sizeof(sigset_t),
> +	};

This might work if uclibc defines sigset_t not to have the ridiculous
expansion space for 1024 signals like glibc does, but it still seems
fragile. I'd use _NSIG/8. That's what we do in musl. (If the size you
pass to the kernel is wrong you get EINVAL or something like that.)
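
Roughly, as a sketch of the idea (this is not musl's or uClibc's actual
source; raw_pselect6 and its sigsetsize parameter are made-up names for
illustration, and it assumes a 64-bit Linux target where the libc's
struct timespec matches the kernel's):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: invoke the pselect6 syscall directly, with the
 * caller choosing the sigsetsize value that is handed to the kernel. */
static int raw_pselect6(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds,
                        const struct timespec *timeout,
                        const sigset_t *sigmask, size_t sigsetsize)
{
	/* The kernel may update the timeout, so pass it a private copy. */
	struct timespec ts, *tsp = NULL;
	if (timeout) {
		ts = *timeout;
		tsp = &ts;
	}

	/* Arguments 6 and 7 travel together behind one pointer. */
	struct {
		const sigset_t *ss;
		size_t ss_len;
	} arg6 = { sigmask, sigsetsize };

	return (int)syscall(SYS_pselect6, nfds, rfds, wfds, efds, tsp, &arg6);
}
```

Calling it as raw_pselect6(nfds, r, w, e, &ts, &mask, _NSIG / 8) passes
the size the kernel expects. Passing sizeof(sigset_t) instead only works
if the libc's sigset_t happens to be the same size as the kernel's.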

Rich

