2009-01-27

Parallel bzip2 in a CMT (multicore) environment

We have a lot of archive logs from our Oracle databases, and we compress them to save storage space. Single-threaded compression takes quite a long time on CMT-based CPUs. To use the power of these CPUs we should parallelize every task we can. Pbzip2 helps here: it is a parallel implementation of bzip2. Here is an example:

rudy:/tmp/l# ls -alh
total 4194336
drwxr-xr-x 2 root root 233 Jan 27 16:35 .
drwxrwxrwt 3 root sys 286 Jan 27 16:30 ..
-rw-r----- 1 root root 1.0G Jan 27 16:35 1
-rw-r----- 1 root root 1.0G Jan 27 15:41 2
rudy:/tmp/l# time bzip2 1

real 13m4.397s
user 12m59.987s
sys 0m4.310s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 2

real 0m34.842s
user 34m46.095s
sys 0m7.972s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 -l 2

real 0m41.907s
user 28m52.137s
sys 0m6.302s


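The difference is tremendous: bzip2 needed about 13 minutes of wall-clock time for a 1 GB file, while pbzip2 needed about 35 seconds, roughly 22 times faster.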
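
To make the comparison concrete, here is a small sketch that works out the numbers from the timings in the transcript. The function name speedup_report is made up for this example, and reading user/real as the average number of busy hardware threads is my interpretation, not something pbzip2 reports.

```shell
#!/bin/sh
# Timings copied from the transcript, converted to seconds.
bzip2_real=784.397      # 13m4.397s  (bzip2, real)
pbzip2_real=34.842      # 0m34.842s  (pbzip2, real)
pbzip2_user=2086.095    # 34m46.095s (pbzip2, user)

# speedup_report is a hypothetical helper name used only in this sketch.
speedup_report() {
    awk -v b="$bzip2_real" -v r="$pbzip2_real" -v u="$pbzip2_user" 'BEGIN {
        # Wall-clock speedup of pbzip2 over bzip2.
        printf "speedup: %.1fx\n", b / r
        # CPU time divided by elapsed time: average threads kept busy (assumed reading).
        printf "average busy threads: %.1f\n", u / r
    }'
}

speedup_report
```

This prints a speedup of 22.5x and about 59.9 busy threads on average, which suggests pbzip2 kept most of the machine's hardware threads working for the whole run.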

The '-l' parameter makes pbzip2 determine the maximum number of processors to use from the load average. In the run above it was slightly slower (about 42 seconds instead of 35). This server is fairly idle, so a heavier workload would probably lengthen the time.
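
For a whole directory of archive logs, a loop like the one below might be a starting point. It is only a sketch: the pbzip2 path comes from the transcript, but the '*.arc' name pattern, the function name compress_logs and the directory in the usage comment are assumptions, so adjust them to your own layout.

```shell
#!/bin/sh
# Path taken from the transcript; override PBZIP2 if yours lives elsewhere.
PBZIP2="${PBZIP2:-/opt/csw/bin/pbzip2}"
# Hypothetical file name pattern for archive logs (an assumption).
LOG_PATTERN="${LOG_PATTERN:-*.arc}"

# compress_logs DIR: compress each matching file in DIR, one at a time.
# pbzip2 already spreads a single file over many threads, so the loop
# itself stays sequential.
compress_logs() {
    dir="$1"
    for f in "$dir"/$LOG_PATTERN; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # -l limits the number of processors based on the load average.
        "$PBZIP2" -l "$f" || echo "failed: $f" >&2
    done
}

# Usage (hypothetical directory):
#   compress_logs /u01/arch
```

The resulting .bz2 files can be unpacked with 'pbzip2 -d' or with plain bzip2.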