[Bug 789] New: gunzip/bunzip2/zcat output corrupted

bugzilla at busybox.net bugzilla at busybox.net
Thu Dec 10 15:44:05 UTC 2009


https://bugs.busybox.net/show_bug.cgi?id=789

              Host: ARM Cortex A8
            Target: arm-none-linux-gnueabi
             Build: 1.15.2
           Summary: gunzip/bunzip2/zcat output corrupted
           Product: Busybox
           Version: 1.15.x
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Other
        AssignedTo: unassigned at busybox.net
        ReportedBy: mmcternan at airvana.com
                CC: busybox-cvs at busybox.net
   Estimated Hours: 0.0


Hi,

I'm seeing that the data returned from gunzip or zcat is sometimes corrupted.
Here is my test script:

  #!/bin/sh

  dd if="$1" of=data count="$2"
  md5sum < data
  rm -f data.gz
  ls -l data
  gzip data
  ls -lh data.gz

  zcat data.gz | md5sum
  zcat data.gz | md5sum
  zcat data.gz | md5sum
  zcat data.gz | md5sum
  ...
  zcat data.gz | md5sum

The expected result is that the output shows the same MD5 sum repeated:

  # ./gziptest /dev/urandom  1024
  1024+0 records in
  1024+0 records out
  6f662731c4e80f5d825131fa0cc894ff  -
  -rw-r--r--    1 0        0          524288 data
  -rw-r--r--    1 0        0          512.3K data.gz
  6f662731c4e80f5d825131fa0cc894ff  -
  6f662731c4e80f5d825131fa0cc894ff  -
  6f662731c4e80f5d825131fa0cc894ff  -
  6f662731c4e80f5d825131fa0cc894ff  -
  ...
  6f662731c4e80f5d825131fa0cc894ff  -

In this case all the output hashes match, so all is well.  Taking input from
/dev/zero also produces correct results.

However, when I try one of my own files, which contains sparse data, the output
*sometimes* does not match:

  # ./gziptest mybits   1024
  1024+0 records in
  1024+0 records out
  b542db59f26251cf7899484bc384dd3d  -
  -rw-r--r--    1 0        0          524288 data
  -rw-r--r--    1 0        0           43.7K data.gz
  b542db59f26251cf7899484bc384dd3d  -
  b542db59f26251cf7899484bc384dd3d  -
  b542db59f26251cf7899484bc384dd3d  -
  b542db59f26251cf7899484bc384dd3d  -
  b542db59f26251cf7899484bc384dd3d  -
  ....
  b542db59f26251cf7899484bc384dd3d  -
  58feaeb3349aeb1b97cef52ed454bc2c  -
  b542db59f26251cf7899484bc384dd3d  -

The failure rate is roughly 1 or 2 bad hashes in every 20 attempts.  If I
md5sum the mybits file directly, without any compression, I get the same hash
every time.  Since the same data.gz is decompressed on every run, I think the
problem is in gunzip/zcat.  Inspecting a badly expanded file shows that only 1
or 2 bits somewhere in the file are wrong, although their location sometimes
changes.

I wondered whether the compiler might be doing something bad, so I set the
following options:

  #
  # Debugging Options
  #
  CONFIG_DEBUG=y
  CONFIG_DEBUG_PESSIMIZE=y
  # CONFIG_WERROR is not set
  CONFIG_NO_DEBUG_LIB=y
  # CONFIG_DMALLOC is not set
  # CONFIG_EFENCE is not set
  # CONFIG_INCLUDE_SUSv2 is not set

I verified with "make V=1" that this disabled optimisation, but it still fails.
This was with gcc version 4.2.3 (Sourcery G++ Lite 2008q1-126), and the same
problem also shows up with gcc version 4.3.2 (Sourcery G++ Lite 2008q3-41).

I also tried the same tests on busybox 1.13.3 and got the same problem.

I decided that maybe I could live with bzip2 instead, so I modified the test
script to use it.  Sadly, this shows the same failures.
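The gzip and bzip2 variants of the test can share one script, since both
tools accept `-c` (write to stdout) and `-d` (decompress), and both can read
from stdin. The sketch below compresses a file once and then decompresses it
repeatedly, counting runs whose MD5 sum differs from the original.
`check_roundtrip` is a hypothetical name, not an existing BusyBox command, and
it assumes `md5sum` is available.

```sh
# Hypothetical helper: compress SRC once with TOOL (e.g. gzip or bzip2),
# then decompress it RUNS times and count mismatching MD5 sums.
# Usage: check_roundtrip TOOL SRC RUNS
check_roundtrip() {
    tool=$1
    src=$2
    runs=$3
    # Reference hash of the uncompressed input ("<hash>  -").
    want=$(md5sum < "$src")
    # Reading from stdin avoids any suffix checks on the packed file.
    "$tool" -c < "$src" > "$src.packed"
    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        got=$("$tool" -dc < "$src.packed" | md5sum)
        [ "$got" = "$want" ] || bad=$(( bad + 1 ))
        i=$(( i + 1 ))
    done
    rm -f "$src.packed"
    echo "$tool: $bad of $runs runs mismatched"
}
```

For example, `check_roundtrip bzip2 data 20` would report how many of 20
bzip2 round trips produced a different hash; on a healthy system the count
should be 0.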

Finally, I wondered whether the corruption is consistent from run to run.
Given a file with a known sum, the following test can be run:

  while true; do
    gunzip -c data.gz | md5sum | grep -v b542db59f26251cf7899484bc384dd3d
  done

The output shows the following:

  58feaeb3349aeb1b97cef52ed454bc2c  -
  58feaeb3349aeb1b97cef52ed454bc2c  -
  58feaeb3349aeb1b97cef52ed454bc2c  -
  58feaeb3349aeb1b97cef52ed454bc2c  -
  58feaeb3349aeb1b97cef52ed454bc2c  -
  ...
  58feaeb3349aeb1b97cef52ed454bc2c  -

i.e. whenever the output is corrupted, the corruption is always the same.
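A bounded variant of the loop above can show how often each result occurs by
tallying distinct hashes with `sort | uniq -c`. This is a sketch; `hash_tally`
is a hypothetical helper. A correct decompressor should produce exactly one
line, whose count equals the number of runs.

```sh
# Hypothetical helper: decompress FILE RUNS times and count how many
# times each distinct MD5 sum appears.
# Usage: hash_tally FILE.gz RUNS
hash_tally() {
    file=$1
    runs=$2
    i=0
    while [ "$i" -lt "$runs" ]; do
        gunzip -c "$file" | md5sum
        i=$(( i + 1 ))
    done | sort | uniq -c
}
```

For example, `hash_tally data.gz 100` would print one line per distinct hash,
each prefixed by the number of runs that produced it.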

I think this could be a hard bug, so let me know if you need more info.

Regards,

Mike

