[Bug 9971] New: shuf implementation does not correctly implement algorithm chosen

bugzilla at busybox.net bugzilla at busybox.net
Tue Jun 20 11:45:05 UTC 2017


https://bugs.busybox.net/show_bug.cgi?id=9971

            Bug ID: 9971
           Summary: shuf implementation does not correctly implement
                    algorithm chosen
           Product: Busybox
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Other
          Assignee: unassigned at busybox.net
          Reporter: rob.bast at gmail.com
                CC: busybox-cvs at busybox.net
  Target Milestone: ---

The shuf implementation refers to the
https://en.wikipedia.org/wiki/Fisher–Yates_shuffle algorithm, however it does
not seem to actually follow it. This leads to peculiar behaviour. For example,
given input file:

    foo
    bar
    baz

after shuffling the input file, foo will never end up back on the first line.
This came to light when I ran into a use-case where someone was selecting a
random line from a file using shuf | head -n 1, and the results on busybox were
showing a statistical anomaly (as in, the first line would never ever be
picked) vs the same process running on environments that had gnu coreutils
installed.

On line https://git.busybox.net/busybox/tree/coreutils/shuf.c#n56 it uses r %=
i, which will result in 0 <= r < i, while the algorithm specifies 0 <= r <= i.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the busybox-cvs mailing list