Friday, November 9, 2012

dd and the power of programming

I recently needed to extract part of a large (~8GB) file. The canonical method is to use dd. On a Linux system, the performance was abysmal. Here is the straight dd result.
# dd if=one.img of=two.img bs=1 skip=32768 count=8587192319
8587192319 bytes (8.6 GB) copied, 12442.3 s, 690 kB/s
That's 3.45 hours for those keeping track at home. The next attempt was to put both files in /dev/shm, which is backed by memory.
# dd if=/dev/shm/one.img of=/dev/shm/two.img bs=1 skip=32768 count=8587192319
8587192319 bytes (8.6 GB) copied, 6232.88 s, 1.4 MB/s
That helped, but it's still 1.73 hours. Okay, so we all know the 1 byte block size is killing the performance. We also all know 8587192319 = 41809 × 205391, so let's bump up the block size.
# dd if=one.img of=two.img bs=205391 skip=32768 count=41809
9054+1 records in
9054+1 records out
1859682304 bytes (1.9 GB) copied, 1.59914 s, 1.2 GB/s
Clearly, much faster but also wrong: it should copy 8.6 GB. Let's reverse the numbers and try again.
# dd if=one.img of=two.img bs=41809 skip=32768 count=205391
172688+1 records in
172688+1 records out
7219937280 bytes (7.2 GB) copied, 29.4616 s, 245 MB/s
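Closer, but still wrong. The problem is not dd's math but its units: skip is counted in blocks of bs bytes, not in bytes, so both runs skipped far more than 32768 bytes and reached the end of the file before count ran out. So, I wrote a Java program that takes an input file name, an output file name, and pairs of skip/write values. Here is the result.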
# time java Chopper one.img two.img 32768 8587192319
Skipped 32768 bytes.
Writing 8587192319 bytes.

real    0m10.656s
user    0m1.744s
sys     0m8.933s
So, 10.7 seconds versus 3.45 hours. Here's the program.
/*
** Arguments: input file, output file, pairs of skip/write bytes
*/

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Chopper
{
    private static void readAndWrite (FileInputStream in, FileOutputStream out,
        long length)
    {
        /*
        ** Keep this sized to fit in an int for the read below.  Besides, 8K
        ** is a typical disk block size.
        */
        final long BUFFSIZE = 8192;
        byte buffer[] = new byte[(int) BUFFSIZE];

        for (;;)
        {
            int read = 0;
            int count = (int) (BUFFSIZE > length ? length : BUFFSIZE);

            try
            {
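                read = in.read (buffer, 0, count);

                /*
                ** read returns -1 at end of file.  Stop here rather than
                ** hand a negative length to write.
                */
                if (read < 0)
                    return;

                out.write (buffer, 0, read);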
            }
            catch (IOException IOE)
            {
                System.out.println (IOE);
                return;
            }

            length -= read;

            if (length <= 0)
                break;
        }
    }

    public static void main (String args[])
    {
        FileInputStream in = null;
        FileOutputStream out = null;

        try
        {
            in = new FileInputStream (args[0]);
            out = new FileOutputStream (args[1]);
        }
        catch (IOException IOE)
        {
            System.out.println (IOE);
            return;
        }

        int i = 2;

        while (i < args.length)
        {
            try
            {
                long skipped = in.skip (Long.parseLong (args[i++]));
                System.out.println ("Skipped " + skipped + " bytes.");
            }
            catch (IOException IOE)
            {
                System.out.println (IOE);
                return;
            }

            long write = Long.parseLong (args[i++]);

            System.out.println ("Writing " + write + " bytes.");
            readAndWrite (in, out, write);
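        }

        /*
        ** Close both streams explicitly instead of relying on process exit.
        */
        try
        {
            in.close ();
            out.close ();
        }
        catch (IOException IOE)
        {
            System.out.println (IOE);
        }
    }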
}
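
As a footnote to the two short copies above: the byte counts dd reported are what you get if skip and count are both measured in blocks of bs bytes. The sketch below is only a sanity check of that arithmetic, not anything dd itself runs. DdMath and bytesCopied are made-up names, and the 8 GiB file size is an assumption inferred from the reported totals, since the size of one.img is never stated.

```java
public class DdMath
{
    /*
    ** Bytes copied if skip and count are both counted in blocks of bs
    ** bytes and the copy stops at end of file.
    */
    static long bytesCopied (long fileSize, long bs, long skip, long count)
    {
        long start = Math.min (fileSize, bs * skip);   /* offset in bytes */
        long wanted = bs * count;                      /* requested bytes */

        return Math.min (wanted, fileSize - start);
    }

    public static void main (String args[])
    {
        /* Assumption: one.img is exactly 8 GiB (inferred, not stated). */
        long fileSize = 8L * 1024 * 1024 * 1024;

        System.out.println (bytesCopied (fileSize, 1, 32768, 8587192319L));
        System.out.println (bytesCopied (fileSize, 205391, 32768, 41809));
        System.out.println (bytesCopied (fileSize, 41809, 32768, 205391));
    }
}
```

Under those assumptions it prints 8587192319, 1859682304 and 7219937280, the same totals as the three dd runs. If that reading is right, a block size that divides the offset (bs=32768 skip=1, say) would start in the right place, though 8587192319 is not a multiple of 32768, so the last partial block would still need separate handling. Recent GNU coreutils releases also accept iflag=skip_bytes,count_bytes, which lets skip and count be given in bytes alongside a large bs.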
