[BusyBox 0004944]: patch to support lzop compression/decompression

bugs at busybox.net bugs at busybox.net
Thu Oct 9 00:23:21 UTC 2008


A NOTE has been added to this issue. 
====================================================================== 
http://busybox.net/bugs/view.php?id=4944 
====================================================================== 
Reported By:                Alain137
Assigned To:                BusyBox
====================================================================== 
Project:                    BusyBox
Issue ID:                   4944
Category:                   New Features
Reproducibility:            N/A
Severity:                   feature
Priority:                   normal
Status:                     assigned
====================================================================== 
Date Submitted:             09-13-2008 00:54 PDT
Last Modified:              10-08-2008 17:23 PDT
====================================================================== 
Summary:                    patch to support lzop compression/decompression
Description: 
This is a patch to add the lzop compressor/decompressor to busybox.
Lzop is a compressor that is more than twice as fast as gzip at its
fastest setting, and is thus useful in any application where
compression/decompression speed is more important than ratio. For
instance, I use lzop with udcpast to speed up network transfers (gzip
would actually make the transfer slower, as any time gain obtained due to
having less data to transfer would be lost due to the time it takes to
compress)

Compression:

> time lzop -c /usr/bin/mplayer >/dev/null
lzop -c /usr/bin/mplayer > /dev/null  0,33s user 0,01s system 100% cpu
0,337 total
> time gzip -1 -c /usr/bin/mplayer >/dev/null
gzip -1 -c /usr/bin/mplayer > /dev/null  0,80s user 0,03s system 94% cpu
0,882 total


Decompression:
> time gzip -dc  mplayer.gz >/dev/null
gzip -dc mplayer.gz > /dev/null  0,27s user 0,01s system 97% cpu 0,288
total
> time lzop -dc mplayer.lzo >/dev/null
lzop -dc mplayer.lzo > /dev/null  0,10s user 0,01s system 100% cpu 0,112
total


For building it, a liblzo2 static library must be present on the system
(for example, liblzo2-dev on a Kubuntu system)

====================================================================== 

---------------------------------------------------------------------- 
 vda - 09-20-08 07:56  
---------------------------------------------------------------------- 
The patch seems to contain only the glue needed to interface with LZO
library. It does not contain LZO compression code itself. The "innermost"
routines are lzo_compress() and lzo_decompress() but from there they just
call into liblzo2 (lzo1x_1_15_compress() etc).

And this glue code is big. I mean, it's VERY BIG.

For comparison: look into bbunzip.c, unlzma glue code is less than one
screensful:

static char* make_new_name_unlzma(char *filename)
{
        return make_new_name_generic(filename, "lzma");
}
static USE_DESKTOP(long long) int unpack_unlzma(void)
{
        return unpack_lzma_stream(STDIN_FILENO, STDOUT_FILENO);
}
int unlzma_main(int argc UNUSED_PARAM, char **argv)
{
        getopt32(argv, "cf");
        argv += optind;
        /* lzmacat? */
        if (applet_name[4] == 'c')
                option_mask32 |= OPT_STDOUT;
        return bbunpack(argv, make_new_name_unlzma, unpack_unlzma);
}

lzop glue code should be just added to this file in a similar way.

 

---------------------------------------------------------------------- 
 Alain137 - 09-23-08 22:30  
---------------------------------------------------------------------- 
Well, part of the problem is that the LZO library (apparently?) only
handles the actual compressed data, but not the header. So, header
handling is left to the application (glue) 

---------------------------------------------------------------------- 
 bernhardf - 09-24-08 00:21  
---------------------------------------------------------------------- 
I would strongly suggest to make sure that we can use minilzo instead of
the rather big "normal" lzo.

For reference:
http://repo.or.cz/w/buildroot.git?a=blob_plain;f=package/lzo_mini/lzo_mini.mk;hb=HEAD


---------------------------------------------------------------------- 
 vda - 09-24-08 03:54  
---------------------------------------------------------------------- 
I'm not against glue code itself. I am just saying it is way, way too big.

I understand that you will need the code like lzo_compress() and
lzo_decompress() functions to handle .lzo data format and interface it
with LZO library, but they are not big. The rest of 100k+ code in this
patch is not needed.

Start by adding .lzo handling to bbunzip.c which calls not-yet-existing
unpack_lzo_stream(), then implement that one by taking only those parts of
lzo code that you need. I bet this way it will be much smaller. 

---------------------------------------------------------------------- 
 Alain137 - 09-27-08 12:17  
---------------------------------------------------------------------- 
I've made a new version (see attachment), which is much smaller: 36899
bytes of source, rather than 178272.

I admit, 36899 is still big (for just glue code), but unfortunately, the
lzop library is rather barebones, and _only_ supplies (de)compression of
one block, and not much else. Interpretation of headers, chaining of
blocks, calculation of checksums need all to be managed by the
application. 

---------------------------------------------------------------------- 
 vda - 09-27-08 17:37  
---------------------------------------------------------------------- 
Looked at the code.

e_method() is dead code, it can never be called:

+static void e_method(int m)
+{
+       fprintf(stderr,"%s: illegal method option -- %c\n",
+               applet_name, m & 255);
+       e_usage();
+}
...
+int lzo_set_method(int level)
+{
+       int l = level & 0xff;
+       int m = 0;
+
+       if (l < 1 || l > 9)
+               return -1;          /* not a LZO method */
...
+}
...
+       case '1':
+       case '2':
+       case '3':
+       case '4':
+       case '5':
+       case '6':
+       case '7':
+       case '8':
+       case '9':
+               if (lzo_set_method(optc - '0'))
+                       e_method(optc);



+extern const char *opt_complementary;

#include <libbb.h> does this already, no?



+int lzop_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
+int lzop_main(int argc UNUSED_PARAM, char **argv)
+{
+       int i;
+       char *options="cfvdt123456789CFVh";
+       opt_complementary="h-";
+        int r=getopt32(argv, options);
+        argv += optind;
+        /* lzopcat? */
+        if (applet_name[4] == 'c')
+                option_mask32 |= (OPT_STDOUT | OPT_DECOMPRESS);
+        /* unlzop? */
+        if (applet_name[0] == 'u')
+                option_mask32 |= OPT_DECOMPRESS;
+
+       /* By default, use method 3 */
+       lzo_set_method(3);
...

Use tabs for indentation. Above is a mix of tabs/spaces, which is visible
in the diff. 

---------------------------------------------------------------------- 
 Alain137 - 09-28-08 02:12  
---------------------------------------------------------------------- 
Uploaded a new version with the issues fixed, and a couple of other
simplifications 

---------------------------------------------------------------------- 
 vda - 09-28-08 05:09  
---------------------------------------------------------------------- 
busybox already have a functions analogous to:
read_buf() - full_read(0,...)
write_buf() - xwrite(1,...)
error() - bb_error_msg_and_die()
io_error() - bb_perror_msg_and_die()

Why lzo_set_method() is returning a value? It's always 0, thus carries no
useful information.


+               if (dst_len > MAX_BLOCK_SIZE) {
+                       error("lzop file corrupted");
+                       ok = 0;
+                       break;
+               }

error() exits, why here you have code after call to error()?


+       case 'V':
+               bb_error_msg("version: v%s, %s",
+                            LZOP_VERSION_STRING, LZOP_VERSION_DATE);
+               exit(0);
+               break;

We do not need to support ALL options. Options like -V, --version are
usually not implemented in busybox applets.


+               if (read_buf(b1, src_len) != (lzo_int) src_len)
+                       read_error();

We have xread() for this in busybox.


+static int exit_code = EXIT_OK;

Unused variable.


-LDLIBS         :=
+LDLIBS         := lzo2

You need to add this _conditionally_. Example:
ifeq ($(CONFIG_SELINUX),y)
LDLIBS += selinux sepol
endif


+    lzo_uint32 f_adler32;
+    lzo_uint32 f_crc32;

still using spaces for indent here 

---------------------------------------------------------------------- 
 Alain137 - 09-29-08 11:00  
---------------------------------------------------------------------- 
Just a quick note to notify you of some new versions that I attached
yesterday.
Patch5 is the most recent and smallest. I took account your recent
suggestions (LIBS, busybox methods, indentation ...). Moreover, I merged
everything into a single C file, bringing down the size to 26790.

Thx,

Alain 

---------------------------------------------------------------------- 
 bernhardf - 09-29-08 11:04  
---------------------------------------------------------------------- 
Note that LOC of source is not really relevant as opposed to the output of
$ ./scripts/bloat-o-meter busybox_unstripped.old busybox_unstripped
where the .old is without your code.

can you please paste this?
TIA, 

---------------------------------------------------------------------- 
 bernhardf - 09-29-08 12:11  
---------------------------------------------------------------------- 
A few comments on the patch itself:
- the code should live in the archival/ directory
- preferably it would be under GPLv2 "or later"
- includes
  The first include in lzop.c has to be libbb.h, then your <lzo/.*.h>
stanza
  and then you can remove all includes from stdio.h down to sys/stat.h
- Since we are concerned about size, you should check building against
lzo_mini 
  (see previous reply; did you verify that?)
- everything in "/* options*/" should live in globals, but we can fix that
later
- calloc() should be xzalloc()
- lzo_decompress() if (b2) {b2=NULL;free(b2)} This can't be right.
- magic numbers: 0 == STDIN_FILENO, 1 == STDOUT_FILENO etc. Please use
these for
  clarity.
- check_magic() is only used once. Inline it into the single caller.
- init_compress_header: h->flags = (F_OS & F_OS_MASK) | (F_CS &
F_CS_MASK);
- do_compress odd prototype, you forgot "void"
... 

---------------------------------------------------------------------- 
 vda - 09-29-08 15:42  
---------------------------------------------------------------------- 
Having entire separate lzop/* directory for just one *.c file doesn't make
sense.

"config LZOP" should default to "n" as every other applet does.

All other applets share usage.h, lzop is not that special to have separate
one.

Other compressors have compressor/decompressor selectable *separately* -
it's possible to select bunzip2 but not bzip2. lzop should be similar.

#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>

This all is already included by libbb.h

        lzo_byte * const b1 = calloc(1, block_size);
        lzo_byte * const b2 = calloc(1, MAX_COMPRESSED_SIZE(block_size));

Since you do not check for allocation errors anyway,
at least use xmalloc() - it will die with an error message
intead of a segfault.

        if (wrk_mem)
                free(wrk_mem);

free(NULL) is safe, you don't need if() here.



                        if (b2) {
                                b2 = NULL;
                                free(b2);
                        }

Whats going on here??


Okay, I tried to build it, and due to -Wundef being used by busubox
I got gobs of errors from lzo2 headers. Okay, disabled that.
Then I've got these:

archival/lzop.c:237: error: function declaration isn't a prototype
archival/lzop.c: In function 'f_read8':
archival/lzop.c:237: error: old-style function definition
archival/lzop.c: At top level:
archival/lzop.c:247: error: function declaration isn't a prototype
archival/lzop.c: In function 'f_read16':
archival/lzop.c:247: error: old-style function definition
archival/lzop.c: At top level:
archival/lzop.c:257: error: function declaration isn't a prototype
archival/lzop.c: In function 'f_read32':
archival/lzop.c:257: error: old-style function definition
archival/lzop.c: At top level:
archival/lzop.c:706: error: function declaration isn't a prototype
archival/lzop.c: In function 'do_compress':
archival/lzop.c:706: error: old-style function definition
archival/lzop.c:707: error: unused variable 'ok'
archival/lzop.c: In function 'lzop_main':
archival/lzop.c:783: error: ISO C90 forbids mixed declarations and code
archival/lzop.c:783: error: unused variable 'r'


Ooookay, fixing it all up... see attached 5.patch 

---------------------------------------------------------------------- 
 vda - 09-29-08 15:45  
---------------------------------------------------------------------- 
bloatcheck:

function                                             old     new   delta
do_compress                                            -    2756   +2756
lzo1x_999_compress_internal                            -    2099   +2099
find_match                                             -    1962   +1962
lzo1x_optimize                                         -    1487   +1487
lzo1x_decompress_safe                                  -    1246   +1246
do_decompress                                          -    1184   +1184
lzo_crc32_table                                        -    1024   +1024
lzo1x_decompress                                       -     980    +980
code_match                                             -     539    +539
lzo_adler32                                            -     389    +389
lzo_crc32                                              -     385    +385
lzo1x_1_compress                                       -     243    +243
lzo1x_1_15_compress                                    -     243    +243
lzop_main                                              -     228    +228
len_of_coded_match                                     -     222    +222
static.c                                               -     216    +216
STORE_RUN                                              -     203    +203
better_match                                           -     174    +174
lzo1x_999_compress_level                               -      99     +99
packed_usage                                       24949   25044     +95
make_new_name_lzop                                     -      84     +84
code_run                                               -      65     +65
copy3                                                  -      62     +62
lzo_set_method                                         -      55     +55
f_write                                                -      43     +43
f_read_num                                             -      43     +43
lzo_check                                              -      42     +42
add_bytes_to_chksum                                    -      40     +40
...
------------------------------------------------------------------------------
(add/remove: 61/0 grow/shrink: 6/0 up/down: 16807/0)        Total: 16807
bytes

I think lzo2 itself should be busyboxed. 16kb+ is twice the size of
busybox's bzip2 implementation. 

---------------------------------------------------------------------- 
 Alain137 - 10-01-08 12:54  
---------------------------------------------------------------------- 
Sorry for the slow answer, I'm on the road right now :)

>  if (b2) {
>    b2 = NULL;
>    free(b2);
>  }
>
> Whats going on here??

A simple typo... should be the other way round: first the free, then the
NULL assignment.

> if (wrk_mem)
>   free(wrk_mem);
>
> free(NULL) is safe, you don't need if() here.

... and that is probably the reason why the error above went unnoticed:
rather than segfaulting, free just did a no-op, and all that happened was
a memory leak.

... more when I'm back next week 

---------------------------------------------------------------------- 
 vda - 10-04-08 10:17  
---------------------------------------------------------------------- 
a.patch is an attempt to pull in parts of lzo *library* so that we don't
need to link against it.

Boy, the author (Markus F.X.J. Oberhumer) wants to be so compatible with
gazillions of compilers and platforms, he sacrificed code readability for
that.
Tons of macros + #ifdef forest. I'm not sure I untangled it all without
breaking something. But this is a start.

I gave up on extracting lzo1x_999_compress_level() out of lzo library.
It's hopelessly intertwined with the rest. (Grep for
"lzo1x_999_compress_level" to see where I disabled it in this patch). As a
result, levels 7...9 won't use it, producing "less compressed" archives
(level 6).

I gave it a very limited testing (seems to compress and uncompress at
level 1, resulting archives uncompress ok with stock lzop. And vice
versa).

Couple of notes:
* stock lzop doesn't seem to have "unlzop" and "lzopcat", do we want to
invent those?
* stock lzop does not delete source file, unlike g[un]zip and b[un]zip2.
* crc table can (should) be generated (busybox already has this code
elsewhere) - save 1k of text segment.

Alain137, can you test this and reinstate lzo1x_999_compress_level()
usage? 

---------------------------------------------------------------------- 
 vda - 10-04-08 10:19  
---------------------------------------------------------------------- 
make bloatcheck:

function                                             old     new   delta
do_compress                                            -    4120   +4120
do_decompress                                          -    1167   +1167
lzo_crc32_table                                        -    1024   +1024
lzo1x_decompress_safe                                  -    1010   +1010
...
------------------------------------------------------------------------------
(add/remove: 30/0 grow/shrink: 5/0 up/down: 8772/0)          Total: 8772
bytes 

---------------------------------------------------------------------- 
 Alain137 - 10-04-08 13:07  
---------------------------------------------------------------------- 
> a.patch is an attempt to pull in parts of lzo *library* so that we don't
need to link against it.

Ok, thanks for that.

> Boy, the author (Markus F.X.J. Oberhumer) wants to be so compatible with

> gazillions of compilers and platforms, he sacrificed code readability
for 
> that.

I already noticed that in the glue code... and I've got the impression
that most of this is for supporting legacy 80's area 16 bit (or even 8
bit) homecomputer processors. I think we can do away with that all... At
least that's what I did for the glue code...

> Tons of macros + #ifdef forest. I'm not sure I untangled it all without

> breaking something. But this is a start.

Indeed :(

> I gave up on extracting lzo1x_999_compress_level() out of lzo library.
It's 
> hopelessly intertwined with the rest. (Grep for
"lzo1x_999_compress_level" to > see where I disabled it in this patch). As
a result, levels 7...9 won't use 
> it, producing "less compressed" archives (level 6).

If needed, we can do without it. However, I'll have a closer look at
lzo1x_999_compress_level when I'm back home, maybe I'll be able to do
something there. Or if not, we can still keep the original for 999, but
make it optional.

> I gave it a very limited testing (seems to compress and uncompress at
level 
> 1, resulting archives uncompress ok with stock lzop. And vice versa).

The tests that I did was attempting to compress mplayer using busybox lzop
and decompress the result using stock lzop, and then vice-versa. I used
mplayer as a test object, as smaller files (such as /bin/ls) didn't catch
all bugs.
An even better test would be to piece together a file from one large
executable, and bits of another file that's already compressed using gzip.
This would be to test correct handling of "stored" blocks.

I'll do some tests when I'll be back home.

> Couple of notes:
> * stock lzop doesn't seem to have "unlzop" and "lzopcat", do we want to

> invent those?

I introduced those to mirror gzip (gunzip, zcat) and lzma...

> * stock lzop does not delete source file, unlike g[un]zip and
b[un]zip2.

Indeed, that was part of the original byzantine glue code. You could pass
it a command-line argument to delete the original file. All this went out
of the window when I trimmed down the glue code: now lzop behaves like all
the other compressors, and unconditionnally removes original file after
compressing it.
There are other parts which went away too, such as optionally storing
original file name in the compressed file, storing the mode and the
modification time. Keeping such functionality would have made it
impossible to use the generic bbunzip function.
 
> * crc table can (should) be generated (busybox already has this code 
> elsewhere) - save 1k of text segment.

Fully agreed.

> Alain137, can you test this and reinstate lzo1x_999_compress_level()
usage?

I'll do as soon as I'm back home. 

---------------------------------------------------------------------- 
 Alain137 - 10-08-08 17:23  
---------------------------------------------------------------------- 
I've just attached patch 7.

Changes:

- crc32 now uses crc32_filltable rather than wasting 1k of statically
initialized variable
- lzo1x_999 compression implemented
- fixed problem with level (levels 789 were not distinguished amongst each
other)

Total "bloat" count is now 11740... of which 2935 at least are directly
accountable to lzo1x_999 compression. So, this algorithm seems to be
indeed larger than the other 2, this depsite some optimizations (removal
of useless statistics in lzo1x_999_t structure, elimination of code
duplication, usage of constants where appropriate) 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
09-13-08 00:54  Alain137       New Issue                                    
09-13-08 00:54  Alain137       Status                   new => assigned     
09-13-08 00:54  Alain137       Assigned To               => BusyBox         
09-13-08 00:54  Alain137       File Added: lzop.diff                        
09-20-08 07:54  vda            Note Added: 0011714                          
09-20-08 07:56  vda            Note Edited: 0011714                         
09-23-08 22:30  Alain137       Note Added: 0011824                          
09-24-08 00:21  bernhardf      Note Added: 0011834                          
09-24-08 03:54  vda            Note Added: 0011844                          
09-27-08 12:11  Alain137       File Added: busybox-1.12.0-lzop.patch            
       
09-27-08 12:17  Alain137       Note Added: 0012284                          
09-27-08 17:37  vda            Note Added: 0012294                          
09-28-08 02:10  Alain137       File Added: busybox-1.12.0-lzop-3.patch          
         
09-28-08 02:12  Alain137       Note Added: 0012304                          
09-28-08 05:09  vda            Note Added: 0012314                          
09-28-08 15:35  Alain137       File Added: busybox-1.12.0-lzop-4.patch          
         
09-28-08 16:49  Alain137       File Added: busybox-1.12.0-lzop-5.patch          
         
09-29-08 11:00  Alain137       Note Added: 0012464                          
09-29-08 11:04  bernhardf      Note Added: 0012474                          
09-29-08 12:11  bernhardf      Note Added: 0012484                          
09-29-08 15:42  vda            Note Added: 0012494                          
09-29-08 15:42  vda            File Added: 5.patch                          
09-29-08 15:45  vda            Note Added: 0012504                          
10-01-08 12:54  Alain137       Note Added: 0012554                          
10-04-08 10:07  vda            File Added: a.patch                          
10-04-08 10:17  vda            Note Added: 0012754                          
10-04-08 10:19  vda            Note Added: 0012764                          
10-04-08 13:07  Alain137       Note Added: 0012784                          
10-08-08 17:15  Alain137       File Added: busybox-1.12.0-lzop-7.patch          
         
10-08-08 17:23  Alain137       Note Added: 0013144                          
======================================================================




More information about the busybox-cvs mailing list