Data Compression for DBAs and SysAdmins

Data compression trades off CPU time for disk space.  This page looks at how data compression may be used in database and system administration and points out some of the difficulties you may encounter.

The first difficulty you may hit is CPU utilisation.  Data compression is greedy for CPU cycles, and two compression processes can max out a CPU, so using data compression as a routine part of database and system administration is best done on multi-CPU machines.  If you decide to use it, you should keep a watchful eye on CPU usage.

This section is based on experience using compression for files outside a database.  Some databases also allow various types and degrees of compression inside the database itself; this is quite well documented by the vendors and is not covered here.

The simplest way to reduce disk occupancy for operating system level backup archive files, Informix archive files, Oracle export dump files and so on, is to compress the files after they have been produced.  This has the advantage that the compression job can be run outside the regular backup and export job stream, so its success or failure only matters in terms of the disk space being occupied.
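As a rough sketch of such a separate compression job, the function below walks a directory and compresses every finished dump file in it at reduced priority.  The function name, the *.dmp pattern and the use of gzip in place of compress are all assumptions made for illustration, and it takes for granted that nothing is still writing to the files when it runs.

```shell
#!/bin/sh
# Sketch only: compress_finished_dumps and the *.dmp suffix are hypothetical.
# gzip is used as a stand-in for compress; substitute whichever you have.
compress_finished_dumps() {
    dumpdir=$1
    failures=0
    for f in "$dumpdir"/*.dmp; do
        [ -f "$f" ] || continue            # the pattern matched nothing
        if ! nice -n 10 gzip "$f"; then    # low priority, one file at a time
            echo "compression failed: $f" >&2
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]                  # non-zero return if anything failed
}
```

Run from cron after the backup window, a failure here costs you disk space but leaves the uncompressed dump files untouched.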

Here's a simplified example of a home directory backup in an ordinary unix environment:

TARFILE=/tmp/`basename $HOME`.tar

cd $HOME

tar cf $TARFILE .

# error checking removed

compress $TARFILE 2>tarcompression.log &

This is enough to get you going, but for long term use there are other details to look after.  Error checking is a must, and the impact of multiple jobs on your system must be considered.  For a dump file, for instance, the compression step might look like this:

nohup nice -n 10 compress $DUMPFILE 2>dumpcompression.log &

The "nohup" is a precaution for situations when your backup is run interactively instead of from cron: it lets the compression carry on after you log out.  The "nice" lowers the priority of the job to reduce the impact on the rest of the system.
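The error checking mentioned above could look something like the sketch below for the two-step version.  The function name and its two arguments are made up for the example, gzip stands in for compress, and the job runs in the foreground so that the exit statuses can be tested directly.  The tar file path needs to be absolute and outside the source directory.

```shell
#!/bin/sh
# Sketch only: backup_then_compress is a hypothetical helper.
# Returns 1 if tar fails, 2 if the compression fails, 0 otherwise.
backup_then_compress() {
    srcdir=$1
    tarfile=$2     # absolute path, outside srcdir
    ( cd "$srcdir" && tar cf "$tarfile" . ) || {
        echo "tar of $srcdir failed" >&2
        return 1
    }
    # gzip used as a stand-in for compress; -f overwrites an old archive
    nice -n 10 gzip -f "$tarfile" || {
        echo "compression of $tarfile failed" >&2
        return 2
    }
}
```

A call such as backup_then_compress "$HOME" /tmp/home.tar leaves /tmp/home.tar.gz behind on success, and the return code tells the calling script which step went wrong.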

The steps can be combined to avoid producing an intermediate uncompressed file, like this:

COMPRESSEDTARFILE=/tmp/`basename $HOME`.tar.Z

cd $HOME

tar cf - . | compress -c > $COMPRESSEDTARFILE
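With no intermediate tar file to inspect, it can be worth checking that the compressed archive reads back cleanly.  The sketch below is one way to do that, under the assumption that gzip is installed (it can also read files written by compress); the function name is invented for the example.

```shell
#!/bin/sh
# Sketch only: verify_compressed_tar is a hypothetical helper.
# Succeeds if the file decompresses cleanly and tar can list its contents.
verify_compressed_tar() {
    archive=$1
    # first check the compressed stream itself
    gzip -t "$archive" 2>/dev/null || return 1
    # then check that what comes out is a readable tar archive
    gzip -dc "$archive" | tar tf - >/dev/null 2>&1
}
```

This only shows that the archive is readable, not that it holds everything you meant to back up.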

This sidesteps the problem of error checking.  With some checks added, the code begins to look like:

COMPRESSEDTARFILE=/tmp/`basename $HOME`.tar.Z

cd $HOME

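# the logs go to /tmp so that they are not inside the directory being archived
tar cf - . 2>/tmp/tar_error.log |
compress -c 2>/tmp/compresserror.log > $COMPRESSEDTARFILE

# a non-empty error log means that step complained about something
if    [ -s /tmp/tar_error.log ] || [ -s /tmp/compresserror.log ]
then  echo "It all went wrong"
fi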
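A non-empty error log is not quite the same thing as a failed command, and in a plain POSIX shell the exit status of a pipeline is only that of its last command, so a tar failure can go unnoticed.  One possible workaround, sketched below, is to have the left-hand side of the pipe write tar's exit status to a temporary file.  The function name is hypothetical and gzip again stands in for compress.

```shell
#!/bin/sh
# Sketch only: tar_pipe_with_status is a hypothetical helper.
# Returns 0 only if both tar and the compressor exited successfully.
tar_pipe_with_status() {
    srcdir=$1
    archive=$2
    statusfile=$(mktemp) || return 1
    {
        tar_rc=0
        ( cd "$srcdir" && tar cf - . ) || tar_rc=$?
        echo "$tar_rc" > "$statusfile"     # record tar's status for later
    } | gzip -c > "$archive"
    gzip_status=$?                         # status of the last command in the pipe
    tar_status=$(cat "$statusfile")
    rm -f "$statusfile"
    if [ "$tar_status" -ne 0 ] || [ "$gzip_status" -ne 0 ]; then
        echo "backup failed: tar=$tar_status gzip=$gzip_status" >&2
        return 1
    fi
}
```

If your scripts run under bash rather than plain sh, "set -o pipefail" or the PIPESTATUS array gives the same information with less plumbing.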

I hope this will give you enough of the flavour to get started, but if you get stuck, email me.



  © Copyright 2006 Colin MacKellar. All rights reserved.