"Linux Gazette...making Linux just a little more fun!"

A Convenient and Practical Approach to Backing Up Your Data

By Vincent Stemen

July 19,1998

Every tool I have found for Linux and other UNIX environments seems to be designed primarily to backup files to tape or any device that can be used for streaming backups. Often this method of backing up is infeasible, especially on small budgets. This led to the development of bu, a tool for backing up by mirroring the files on another file system. bu is not necessarily meant as a replacement for the other tools (although I have set up our entire disaster recovery system based on it for our development servers), but more commonly as a supplement to a tape backup system. The approach I discuss below is a way to manage your backups much more efficiently and stay better backed up without spending so much money.

* Some problems I have found with streaming backups

1. The prices and storage capacities often make it infeasible.

The sizes of hard drives and the amount of data stored on an average server or even workstation is growing faster than the capacity of the lower end tape drives that are affordable to the individual or small business. 5 and 8 gig hard drives are cheap and common place now and the latest drives go up to at least 11 gig. However, the most common tape drives are only a few gig. Higher capacity/performance tape drives are available but the costs are out of the range of all but the larger companies.

For example:
Staying properly backing up with 30GB of data (which can be just 3 or 4 hard drives) to a midrange tape drive, can cost $15,000 to $25,000 or more inside of just 2 to 4 years. There is a typical cost scenario on http://www.exabyte.com/home/press.html.

This is just the cost for the drive and tapes. It does not include the cost of time and labor to manage the backup system. I discuss that more below. With that in mind, the comments I make on reliability, etc, in the rest of this article are based on my experience with lower end drives. I haven't had thousands of extra dollars to throw around to try the higher end drives.

2. The cost of squandered sys admin time and the lost productivity of users or developers waiting for lost files to be restored, can get much more expensive than buying extra hard drives.

To backup or restore several gig of data to/from a tape can take up to several hours. The same goes for trying to restore a single file that is near the end of the tape. I can't tell you how frustrating it is to wait a couple of hours to restore a lost file only to discover you made some minor typo in the filename or the path to the file so it didn't find it and you have to start all over. Also, if you are backing up many gig of data, and you want to be fully backed up every day, you either have to keep a close eye on it and change tapes several times throughout the day, every day, or do that periodically and do incremental backups onto a single tape the rest of the days. With tapes, the incremental approach has other problems, which leads me to number 3.

3. Incremental backups to tape can be expensive, undependable and time consuming to restore.

First, this kind of backup system can consume a lot of time labeling, and tracking tapes to keep track of the dates and which ones are incremental and which ones are full backups, etc. Also, if you do incremental backups throughout a week, for example, and then have to restore a crashed machine, you can easily consume up to an entire day restoring from all the tapes in sequence in order to restore all the data back the way it was. Then you have Murphy to deal with. I'm sure everybody is familiar with Murphy's laws. When you need it most, it will fail. My experience with tapes has revealed a very high failure rate. Probably 20 or 30% of the tapes I have tried to restore on various types of tape drives have failed because of one problem or another. This includes our current 2GB DAT drive. Bad tape, dirty heads when it was recored, who knows. To restore from a sequence of tapes of an incremental backup, you are dependent on all the tapes in the sequence being good. Your chances of a failure are very high. You can decrease your chance of failure, of course, by verifying the tape after each backup but then you double your backup time which is already to long in many cases.

* A solution (The history of the bu utility)

With all the problems I described above, I found that, like most other people I know, it was so inconvenient to back up that I never stayed adequately backed up, and have payed the price a time or two. So I set up file system space on one of our servers and periodically backed up my file systems over nfs just using cp. This way I would always be backed up to another machine if mine went down and I could quickly backup just one or a few files without having to mess with the time and cost of tapes. This still wasn't enough. There were still times I was in a hurry and didn't want to spend the time making sure my backup file system was NFS mounted, verifying the pathname to it, etc, before doing the copy. Manually dealing with symbolic links also was cumbersome. If I specified a file to copy that was a symbolic link, I didn't want it to follow the link and copy it to the same location on the backup file system as the link. I wanted it to copy the real file it points to with it's path so that the backup file system was just like the original. I also wanted other sophisticated features of an incremental backup system without having to use tapes. So, I wrote bu. bu intelligently handles symbolic links, can do incremental backups on a per directory basis with the ability to configure what files or directories should be included and excluded, has a verbose mode, and keeps log files. Pretty much everything you would expect from a fairly sophisticated tape backup tool (except a GUI interface :-) but is a fairly small and straight forward shell script.

* Backup strategy

Using bu to backup to another machine may or may not be a good replacement for a tape backup system for others as it has for us, but it is an excellent supplement. When you have done a lot of work and have to wait hours or even days until the next scheduled tape backup, you are at the mercy of Murphy until that time, then you cross your fingers and hope the tape is good. To me, it is a great convenience and a big relief to just say "bu src" to do an incremental backup of my whole src directory and know I immediately have an extra copy of my work if something goes wrong.

It is much easier and faster to restore a whole file system over NFS than it is from a tape. This includes root (at least with Linux). And, it is vastly faster and easier to restore just one file or directory just using the cp command.

So far as cost: You can get extra 6GB hard drives now for less than $200 dollars. In fact I can buy a whole new computer with extra hard drives to use as a backup server for $1000 or less now. Much less than the cost of buying just a mid to high end tape drive, not counting the cost of all the tapes and extra time spent managing them. In fact, one of the beauties of Linux is, even your old 386 or 486 boat anchors make nice file servers for such things as backups.

For those individuals and small businesses who use zip drives and jaz drives for backing up so they can have multiple copies or take them off site, bu is also perfect, since incremental backups can be done to any file system. I often use it to back up to floppies to take my most critical data and recent work off site.

Here is an interesting strategy we have come up with using bu that is the least expensive way to stay backed up we could come up with for our environment. It is the backup strategy we are setting up for our development machines which house several GB of data. Use bu to backup daily and right after doing work, to file systems that are no more than 650 mb. Then, once or twice a month, cut worm CD's from those file systems to take off site. WORM CD's are only about a dollar each in quantities of 100, and CD WORM writers have gotten cheap. This way your backups are on media that doesn't decay like tapes and floppies tend to do. Re-writable CD's are also an option if you don't mind spending a bit more money. If you have just too much data for that to be practical, hard drives are cheap enough now that it is feasible to have extra hard drives and rotate them off site. It is nice to have one of those drive bays that allow you to un-plug the drive from the front of the machine if you take this approach. Where bu will really shine with large amounts of data, is when we finally can get re-writable DVD drives with cheap media. I think, in the future, with re-writable DVD or other similar media on the horizon, doing backups to non-random access devices such as tape will become obsolete and other backup tools will likely follow the bu approach anyway.

* Getting bu

bu is freely re-distributable under the GNU copyright.

Copyright © 1998, Vincent Stemen
Published in Issue 32 of Linux Gazette, September 1998