
Why backup?

You're joking, right?

Why do I need backup software?

The answer is simple: unless you have superhuman mental abilities, you'll forget to back up, and you'll screw up. Properly configured backup software will automatically back up every file that's changed or added on your computer, every day, to a safe location, preferably a remote one.

Even the most disciplined people (those who don't use backup software, that is) back up once a week, to an external drive, on the weekend. What happens when your computer's hard disk fails on Thursday? You lose the whole week's work. What happens if your backup is a couple of months old, or if you don't have a backup at all? Well, welcome to the stone age.

In short: you won't do it if you have to think about it. A backup has to happen daily, automatically and consistently. Only backup software and a backup strategy can do that for you.

Backup Methods

There are many different methods of data backup, ranging from manually copying files to an external drive to continuous online offsite backup with versioning, snapshots and open-file backup capabilities. If you don't know what the latter means, don't worry, neither do I :). However, the majority of backup operations in this world are done - or should be done - using one of three simple methods:

Full backup method

That's making a complete copy of all your files every time you back up. This is what you do when you copy your home directory to that external drive, for example.

Backup software set for daily full backups will create a backup archive of your home directory on the backup device every day. Say your home directory is 20 GB in size; the backup archive might be 10 GB, since it's compressed as well. If you want to keep a week's worth of backups, you'll need a backup device that can hold 70 GB.
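
To make this concrete, here's a minimal Python sketch of a daily full backup, one compressed archive per day. The paths and the use of Python's tarfile module are my own assumptions for illustration; a real backup tool adds error handling, exclusions and rotation.

```python
#!/usr/bin/env python3
"""Minimal sketch of a daily full backup: one compressed archive per day.
SOURCE and DEST are example paths; adjust them for your setup."""
import tarfile
from datetime import date
from pathlib import Path

SOURCE = Path.home()            # what to back up
DEST = Path("/mnt/backup")      # the backup device (assumed mount point)

def full_backup() -> Path:
    archive = DEST / f"home-full-{date.today():%Y-%m-%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store the whole home directory, gzip-compressed.
        tar.add(SOURCE, arcname=SOURCE.name)
    return archive

if __name__ == "__main__":
    print("Wrote", full_backup())
```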

Restoration: you can restore the complete backup by decompressing the archive, or you can pick individual files to restore using a GUI or command-line archiving tool.

Full + incremental backup method

That's copying all your files to the backup drive the first time; from then on, you copy only the files that were changed or added.

You set the backup software to make a full backup once a week, on Saturday. The resulting archive is called the master archive. From Sunday till Friday the backup software looks for changed or added files and backs up only those, in what are called incremental archives.

The master archive will be huge since it contains everything; following our example above, it will be 10 GB. Say we change about 1 GB of uncompressed files a day; each daily incremental archive will then be about 0.5 GB. So a week's worth of backups needs just 13 GB on the backup device. As you can see, this method is much more space-efficient. It's also much faster on the six days of the week when only incremental backups are made.

Restoration: for a complete restoration, you decompress the master archive, then decompress the daily incremental archives on top of it, oldest first. For a selective restoration, you extract the file from the incremental archive of the day the file was last modified. If you're not sure of the date, check the latest incremental archives first. Sounds complicated, but it's not.
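
Here's a rough Python sketch of the incremental part, under the same assumed paths as the full-backup sketch above. It decides what changed by comparing each file's modification time against a timestamp file, which is a simplification; real tools track more metadata than this.

```python
#!/usr/bin/env python3
"""Sketch of the incremental part of full + incremental: archive only the
files changed or added since the previous run. Same example paths as the
full-backup sketch; the timestamp-file approach is a simplification."""
import os
import tarfile
from datetime import date
from pathlib import Path

SOURCE = Path.home()
DEST = Path("/mnt/backup")
STAMP = DEST / "last-backup.stamp"   # records when the last backup ran

def incremental_backup() -> Path:
    since = STAMP.stat().st_mtime if STAMP.exists() else 0.0
    archive = DEST / f"home-incr-{date.today():%Y-%m-%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for root, _dirs, files in os.walk(SOURCE):
            for name in files:
                path = Path(root) / name
                # Pick up anything modified or created since the last run.
                if path.is_file() and path.stat().st_mtime > since:
                    tar.add(path, arcname=path.relative_to(SOURCE.parent))
    STAMP.touch()                    # mark this run as the new baseline
    return archive
```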

Synchronization

Synchronization means that the backup software makes sure that two directories are in sync. Whatever is added, deleted or modified in the first directory is also added, deleted or modified in the second.

As you might have guessed, this method doesn't protect you against things like accidental deletion of files, but it comes in very handy for things like syncing my home directory between my laptop and my desktop. Whenever I switch to the laptop, I fire up the syncing software and the two home directories are brought back in sync.

I still back up my desktop's home directory daily using the full + incremental method. I don't back up my laptop; I just sync it with the desktop whenever I can, to make sure that any changes made on it are replicated to the desktop, which is backed up.
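
For the curious, here's the core idea of one-way synchronization as a Python sketch, with example paths. A real sync tool such as rsync does this far more carefully (permissions, partial transfers, conflict detection); this only mirrors additions, changes and deletions in one direction.

```python
#!/usr/bin/env python3
"""Sketch of one-way synchronization: make MIRROR match SOURCE, including
deletions. Paths are examples only."""
import shutil
from pathlib import Path

SOURCE = Path("/home/me/docs")       # example: the "live" copy
MIRROR = Path("/mnt/other/docs")     # example: the copy to keep in sync

def sync() -> None:
    MIRROR.mkdir(parents=True, exist_ok=True)
    # Copy anything new, or more recently modified than the mirror's copy.
    for src in SOURCE.rglob("*"):
        dst = MIRROR / src.relative_to(SOURCE)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)   # copy2 preserves timestamps
    # Delete whatever no longer exists on the source side, deepest first.
    for dst in sorted(MIRROR.rglob("*"), reverse=True):
        if not (SOURCE / dst.relative_to(MIRROR)).exists():
            if dst.is_dir():
                shutil.rmtree(dst)
            else:
                dst.unlink()
```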

Versioning side effect

Versioning means being able to revert to a specific version of a file from its history, and it's a nice side effect of using the full + incremental backup method.

For example, say you've been working on a presentation all week. The presentation is stored in the master archive made on Saturday, and since you changed it a bit almost every day this week, each day's version is captured in that day's incremental archive. Say you want to retrieve a slide you deleted on Wednesday: you just pull the presentation file out of Wednesday's incremental archive, copy the deleted slide and paste it into today's file. Pretty cool, don't you think?
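
In practice, pulling Wednesday's version out is a one-liner once you know which archive to look in. A sketch, reusing the hypothetical archive naming from the incremental example above (the date, user name and file name are made up):

```python
#!/usr/bin/env python3
"""Sketch: recover one file from a specific day's incremental archive.
Archive and member names follow the hypothetical naming scheme used in
the incremental-backup sketch above."""
import tarfile

# Wednesday's incremental archive (example date and user name).
with tarfile.open("/mnt/backup/home-incr-2017-10-04.tar.gz", "r:gz") as tar:
    # Extract the presentation as it was on Wednesday, into /tmp/restore.
    tar.extract("me/Documents/presentation.odp", path="/tmp/restore")
```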

Most people don't think they'll need this feature, but once you get used to it you'll find it invaluable. I've personally used it with graphics, code and office documents, and each time having this ability saved me hours.

If you use the full-only backup method you can still get versioning if you keep your daily full backups for a long enough period, say two weeks. This is, however, rather inefficient, so the full + incremental method is recommended.

Which method to use?

In most cases full + incremental is the preferred method. These cases include home directories, shared directories, web roots and so on. However, there are situations where other methods are more suitable. Here are a couple of examples:

Cases where full backup is better

It's always advisable to install the OS in a separate partition; that goes for any OS. These partitions only change when you install or remove applications, which doesn't happen every day. It's also a low-priority backup, since I can always reinstall the OS and applications. So I have my backup software back up the root directory only once a week, and keep only one archive, deleting the old one. I do, however, back up /etc with my regular daily backup, to avoid having to reconfigure installed apps. I also mount important /var directories like /var/www elsewhere, where there is more space and where they get backed up daily.
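
A sketch of that weekly "keep only one archive" policy in Python; the directory choices and paths are examples, and a real job would need to exclude pseudo-filesystems such as /proc and /sys and run with root privileges:

```python
#!/usr/bin/env python3
"""Sketch of a weekly, keep-only-one system backup. Directory choices and
paths are examples only."""
import tarfile

ARCHIVE = "/mnt/backup/system-full.tar.gz"

# Reusing the same archive name means the previous week's copy is simply
# overwritten, so only the latest one is kept.
with tarfile.open(ARCHIVE, "w:gz") as tar:
    for top in ("/etc", "/usr", "/var"):   # example subset of the system
        tar.add(top, arcname=top.lstrip("/"))
```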

Some data is difficult to back up incrementally. For example, a database is usually stored in a single file, so it's easier for the backup software to back up the whole database file than to dig into the file itself trying to figure out whether there are modified or new records in the tables. The easiest and most reliable way to back up a database is simply to dump it to a file and back that file up.
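
For example, with PostgreSQL you could dump the database just before the nightly backup runs. The database name and paths below are made up, but pg_dump's -f option really does write the dump to the named file:

```python
#!/usr/bin/env python3
"""Sketch: dump a PostgreSQL database to a file so the regular backup can
pick the dump up. The database name 'mydb' and the paths are examples."""
import subprocess
from datetime import date

DUMP = f"/var/backups/mydb-{date.today():%Y-%m-%d}.sql"

# pg_dump -f FILE DBNAME writes a plain SQL dump of the database to FILE.
subprocess.run(["pg_dump", "-f", DUMP, "mydb"], check=True)
```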

Cases where synchronization is better

I have a huge, multi-gigabyte directory of images and clip art collected over the years from different sources. Sometimes it won't change for weeks, and sometimes data is added daily. Backing it up with the full + incremental method would be resource-intensive, especially on master archive day. And since I like to keep backups for three or more weeks, I would end up with at least three full duplicates of this multi-gigabyte directory. Not an attractive option. What I do is simply sync this directory to another computer: I run the sync script daily to see if anything changed that day, and replicate it.

The only problem with this setup is that if I mistakenly delete a file and forget to retrieve it from the synced directory before the overnight synchronization, I lose it forever. That isn't a problem with this kind of data: if I use an image in a project, I copy it into the project folder, which is properly backed up. I do the same with my MP3 files, even though I have the CDs, because I don't want to spend time ripping them again.

Where do I backup to?

It's important to understand what you're backing up to protect against:

  • Human errors: accidental deletion
  • Application errors or malicious applications
  • Hard disk failures
  • Computer theft
  • Natural disasters

Advice

Copied from wikipedia.org.

  • The more important the data that is stored on the computer the greater the need is for backing up this data.
  • A backup is only as useful as its associated restore strategy.
  • Storing the copy near the original is unwise, since many disasters such as fire, flood and electrical surges are likely to cause damage to the backup at the same time.
  • Automated backup and scheduling should be considered, as manual backups can be affected by human error.
  • Backups will fail for a wide variety of reasons. A verification or monitoring strategy is an important part of a successful backup plan.
  • It is good to store backed up archives in open/standard formats. This helps with recovery in the future when the software used to make the backup is obsolete. It also allows different software to be used.