[BusyBox 0000984]: Split missing from busybox

bugs at busybox.net bugs at busybox.net
Mon Mar 26 20:19:18 UTC 2007


A NOTE has been added to this issue. 
====================================================================== 
http://busybox.net/bugs/view.php?id=984 
====================================================================== 
Reported By:                Sadara
Assigned To:                BusyBox
====================================================================== 
Project:                    BusyBox
Issue ID:                   984
Category:                   New Features
Reproducibility:            N/A
Severity:                   feature
Priority:                   normal
Status:                     assigned
====================================================================== 
Date Submitted:             08-05-2006 22:53 PDT
Last Modified:              03-26-2007 13:19 PDT
====================================================================== 
Summary:                    Split missing from busybox
Description: 
Coreutils split is currently not included in busybox.
Would it be possible to provide a cutdown applet with the most commonly
used features?
====================================================================== 

---------------------------------------------------------------------- 
 bernhardf - 08-17-06 02:28  
---------------------------------------------------------------------- 
It certainly can be added.
What are the most commonly used features of split? 

---------------------------------------------------------------------- 
 Sadara - 08-28-06 22:21  
---------------------------------------------------------------------- 
IMO:
 -b/--bytes=SIZE
 -l/--lines=LINES
Would also be nice if SIZE could have a multiplier suffix(b,k,m,g) as
several scripts floating around use it, but needed scripts could be
modified. 

---------------------------------------------------------------------- 
 sega32x - 02-07-07 14:26  
---------------------------------------------------------------------- 
Just curious, has there been any progress with adding this in the past few
months? Havent seen it on SVN, and this would make busybox complete
(atleast for me!) 

---------------------------------------------------------------------- 
 jkolb - 03-23-07 11:06  
---------------------------------------------------------------------- 
Patch against 1.4.1 implementing -a, -b, -d, and -l options.  Fancy
configuration supports k, b, m suffixes on sizes.  It's not too pretty,
but it Works For Me(tm). 

---------------------------------------------------------------------- 
 vda - 03-24-07 08:17  
---------------------------------------------------------------------- 
Code review:

FILE *open_next_file(char **filename)
{
...
    return fopen(*filename, "wb");
}
void bytes_split(FILE *infile)
{
...
               do {
                       if (bytes_remaining <= 0) {
                               if (outfile) fclose(outfile);
                               outfile = open_next_file(&filename);
                               bytes_remaining = split_size;
                       }
                       to_write = ...
                       bytes_written = fwrite(src, 1, to_write, outfile);

What will happen if fopen() failed (returned NULL)?

       opt_complementary = "b--l:l--b";
       opt = getopt32(argc, argv, "a:b:dl:", &new_suffix_size,
                       &new_split_size, &new_split_size);
       if(opt & BB_GETOPT_ERROR)
               bb_show_usage();

Just add "?:" to opt_complementary and you can get rid of BB_GETOPT_ERROR
check then.

Also:

* outfile is only used for fwrite -> maybe use 'simple' I/O
(non-stdio-based), open(), not fopen()?
* use xatoi instead of atoi
* many variables and functions can be made static
* space/tab indentation is mixed. Use tabs only 
* bb_error_msg("should start with small letter") 

---------------------------------------------------------------------- 
 jkolb - 03-24-07 16:32  
---------------------------------------------------------------------- 
I think I've addressed your issues, and also added 'g' suffix support, as
well as making the applet return an error when a write error occurs. 

---------------------------------------------------------------------- 
 bernhardf - 03-25-07 14:48  
---------------------------------------------------------------------- 
Please post your
$ size coreutils/split.o*

output.
Mine (on trunk now) is:
   text	   data	    bss	    dec	    hex	filename
    602	      4	      0	    606	    25e	coreutils/split.o

and is still bloated as hell.. 

---------------------------------------------------------------------- 
 vda - 03-25-07 14:55  
---------------------------------------------------------------------- 
2.patch is mine.

w/fancy prefixes
# size split.o
   text    data     bss     dec     hex filename
    820      20       4     844     34c split.o
w/o prefixes
# size split.o
   text    data     bss     dec     hex filename
    765      20       4     789     315 split.o

bernhard's smaller :( 

---------------------------------------------------------------------- 
 vda - 03-25-07 14:59  
---------------------------------------------------------------------- 
Bernhard, on most CPUs

char *count_p = NULL;
getopt32(argc, argv, "l:b:a:", &count_p, &count_p, &sfx_len);
if (count_p) ...

is bigger than 

char *count_p;
getopt32(argc, argv, "l:b:a:", &count_p, &count_p, &sfx_len);
if (opt & (1|2)) ...

because (one store + one check to NULL) > (one '&') 

---------------------------------------------------------------------- 
 rockeychu - 03-25-07 18:48  
---------------------------------------------------------------------- 
"split" could create a 0 byte file when INPUT_FILE_SIZE = N *
OUTPUT_FILE_SIZE.

Patch as following:

Index: coreutils/split.c
===================================================================
--- coreutils/split.c   (revision 18237)
+++ coreutils/split.c   (working copy)
@@ -77,16 +77,17 @@
                int inp = xopen(input_file, O_RDONLY);
                int flags = O_WRONLY | O_CREAT | O_TRUNC;
                do {
-                       int out = xopen(pfx, flags);
                        buf = xzalloc(cnt);
                        lseek(inp, bytes, SEEK_SET);
                        bytes += i = full_read(inp, buf, cnt);
+                       if (i <= 0 ) break;
+                       int out = xopen(pfx, flags);
                        xwrite(out, buf, i);
                        close(out);
                        free(buf);
                        if (next_file(&pfx))
                                flags = O_WRONLY | O_APPEND;
-               } while(i > 0);
+               } while(1);
        } else { /* -l */
                FILE *fp = fopen_or_warn_stdin(input_file);
                char *buf; 

---------------------------------------------------------------------- 
 bernhardf - 03-26-07 07:29  
---------------------------------------------------------------------- 
rockeychu, please update your checkout of trunk and retest (and descibe a
brief testcase if it still fails).
TIA 

---------------------------------------------------------------------- 
 jkolb - 03-26-07 11:15  
---------------------------------------------------------------------- 
The way the byte split is now implemented will not work if the chunk sizes
are large.  One of the main reasons I want split is to break large files
into 1-2G chunks for storage on fat32 drives. 

---------------------------------------------------------------------- 
 bernhardf - 03-26-07 11:33  
---------------------------------------------------------------------- 
I see. We should unify -l and -b anyway. Could use memchr with
common_bufiz1 for the l case and iterate over common_bufsiz1 for b. That
would avoid allocating the read buffer. Sounds like using both the
toplevel Makefile and TODO for testing was not a good idea of mine ;)

Feel like implementing that? 

---------------------------------------------------------------------- 
 vda - 03-26-07 13:19  
---------------------------------------------------------------------- 
I modified svn:
   text    data     bss     dec     hex filename
    664       4       0     668     29c busybox.t0/coreutils/split.o
    628       0       0     628     274 busybox.t1/coreutils/split.o
/me is happy - data section 0 bytes.
also will handle arbitrarily large files now, but not arbitrarily large
fragments . Should use off_t instead of ulong for sizes to address that. 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
08-05-06 22:53  Sadara         New Issue                                    
08-05-06 22:53  Sadara         Status                   new => assigned     
08-05-06 22:53  Sadara         Assigned To               => BusyBox         
08-17-06 02:28  bernhardf      Note Added: 0001568                          
08-28-06 22:21  Sadara         Note Added: 0001598                          
02-07-07 14:26  sega32x        Note Added: 0002137                          
03-23-07 11:05  jkolb          File Added: busybox-1.4.1-split.patch            
       
03-23-07 11:06  jkolb          Note Added: 0002273                          
03-24-07 08:17  vda            Note Added: 0002274                          
03-24-07 16:29  jkolb          File Added: busybox-1.4.1-split.2.patch          
         
03-24-07 16:32  jkolb          Note Added: 0002278                          
03-25-07 14:48  bernhardf      Note Added: 0002280                          
03-25-07 14:48  vda            File Added: 2.patch                          
03-25-07 14:55  vda            Note Added: 0002281                          
03-25-07 14:59  vda            Note Added: 0002282                          
03-25-07 18:48  rockeychu      Note Added: 0002283                          
03-26-07 07:29  bernhardf      Note Added: 0002284                          
03-26-07 11:15  jkolb          Note Added: 0002285                          
03-26-07 11:25  jkolb          File Added: fancy_d.patch                    
03-26-07 11:33  bernhardf      Note Added: 0002286                          
03-26-07 13:19  vda            Note Added: 0002287                          
======================================================================




More information about the busybox-cvs mailing list