I recently needed to extract part of a large (~8GB) file. The canonical method is to use dd. On a Linux system, the performance was abysmal. Here is the straight dd result.
# dd if=one.img of=two.img bs=1 skip=32768 count=8587192319
8587192319 bytes (8.6 GB) copied, 12442.3 s, 690 kB/s
That's 3.45 hours for those keeping track at home. Moving both files to /dev/shm, which is normally a RAM-backed tmpfs on Linux, was the next thing to try.
# dd if=/dev/shm/one.img of=/dev/shm/two.img bs=1 skip=32768 count=8587192319
8587192319 bytes (8.6 GB) copied, 6232.88 s, 1.4 MB/s
That helped, but it's still 1.73 hours. Okay, so we all know the 1 byte block size is killing the performance. We also all know 8587192319 = 41809 × 205391, so let's bump up the block size.
# dd if=one.img of=two.img bs=205391 skip=32768 count=41809
9054+1 records in
9054+1 records out
1859682304 bytes (1.9 GB) copied, 1.59914 s, 1.2 GB/s
Clearly, much faster but also wrong: it should copy 8.6 GB. Let's reverse the numbers and try again.
# dd if=one.img of=two.img bs=41809 skip=32768 count=205391
172688+1 records in
172688+1 records out
7219937280 bytes (7.2 GB) copied, 29.4616 s, 245 MB/s
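Closer, but still wrong. The problem is not dd's math but its units: skip is counted in blocks of bs, not in bytes. These two runs therefore skipped 32768 × 205391 and 32768 × 41809 bytes instead of 32768, and ran into the end of the file early. Rather than juggle block sizes, I wrote a Java program that takes an input file name, an output file name, and pairs of skip/write values. Here is the result.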
# time java Chopper one.img two.img 32768 8587192319
Skipped 32768 bytes.
Writing 8587192319 bytes.
real 0m10.656s
user 0m1.744s
sys 0m8.933s
So, 10.7 seconds versus 3.45 hours. Here's the program.
/*
** Arguments: input file, output file, pairs of skip/write bytes
*/
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Chopper
{
    private static void readAndWrite (FileInputStream in, FileOutputStream out,
                                      long length) throws IOException
    {
        /*
        ** Keep this sized to fit in an int for the read below. Besides, 8K
        ** is a typical disk block size.
        */
        final int BUFFSIZE = 8192;
        byte buffer[] = new byte[BUFFSIZE];

        while (length > 0)
        {
            int count = (int) Math.min (BUFFSIZE, length);
            int read = in.read (buffer, 0, count);

            /* read() returns -1 at end of file; stop instead of writing. */
            if (read < 0)
            {
                System.out.println ("Input ended with " + length + " bytes left to write.");
                break;
            }
            out.write (buffer, 0, read);
            length -= read;
        }
    }

    public static void main (String args[])
    {
        /* Need both file names plus complete skip/write pairs. */
        if (args.length < 4 || args.length % 2 != 0)
        {
            System.out.println ("Usage: java Chopper input output skip write [skip write ...]");
            return;
        }

        /* try-with-resources closes both streams, even on error. */
        try (FileInputStream in = new FileInputStream (args[0]);
             FileOutputStream out = new FileOutputStream (args[1]))
        {
            int i = 2;
            while (i < args.length)
            {
                long skipped = in.skip (Long.parseLong (args[i++]));
                System.out.println ("Skipped " + skipped + " bytes.");

                long write = Long.parseLong (args[i++]);
                System.out.println ("Writing " + write + " bytes.");
                readAndWrite (in, out, write);
            }
        }
        catch (IOException IOE)
        {
            System.out.println (IOE);
        }
    }
}
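A footnote on the two dd runs that copied too little: their byte counts are what you would expect if dd multiplies both skip and count by bs and then stops at the end of the input. Here is a rough sketch of that arithmetic. It assumes one.img is exactly 8 GiB (8589934592 bytes), which I never measured above but which reproduces all three byte counts. DdMath and bytesCopied are names made up for this illustration, not part of dd or Chopper.

```java
public class DdMath
{
    /*
    ** Bytes dd would copy from a regular file, assuming skip and count
    ** are both counted in blocks of bs and the copy stops at end of file.
    */
    static long bytesCopied (long fileSize, long bs, long skip, long count)
    {
        long available = Math.max (0L, fileSize - skip * bs);
        return Math.min (count * bs, available);
    }

    public static void main (String args[])
    {
        final long FILESIZE = 8589934592L; /* assumed: exactly 8 GiB */

        /* bs=1: blocks and bytes coincide, so the full amount is copied. */
        System.out.println (bytesCopied (FILESIZE, 1, 32768, 8587192319L));
        /* prints 8587192319 */

        /* bs=205391: skips about 6.7 GB, leaving 1.9 GB before end of file. */
        System.out.println (bytesCopied (FILESIZE, 205391, 32768, 41809));
        /* prints 1859682304 */

        /* bs=41809: skips about 1.4 GB, leaving 7.2 GB before end of file. */
        System.out.println (bytesCopied (FILESIZE, 41809, 32768, 205391));
        /* prints 7219937280 */
    }
}
```

Under that assumption the three results match the dd output byte for byte, so the short copies come from the skip value growing with the block size rather than from a miscalculation inside dd.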