This is the fourth and final article in a series about a new backup process I have implemented for my home network. In the previous article I covered the tools I used to create this new backup process and the issues I faced along the way. In this article I'll give an overview of the backup process itself and highlight some of the implementation details.
The view from 50,000 feet
At the core of the new backup process, things are pretty straightforward. The 50,000-foot view is something like this, with just 5 main steps (a bare-bones sketch of the loop follows the list):
- Create a VSS snapshot of the drive to be backed up.
- Compare the previously backed-up files to the current files and get the list of files that are new, newer, the same, and missing (i.e. deleted).
- Build the directory structure for the new backup set.
- For each file that is the same, create hard links in the new directory structure that link to the previous backup's files.
- Copy over all the new and newer files.
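To make that flow concrete, here is a bare-bones PowerShell sketch of the per-machine loop. Every function name is a made-up placeholder for my real scripts and components, and I've used try/finally for brevity even though the actual scripts rely on a trap handler (more on that below).

```powershell
# Skeleton only: every function name here is a hypothetical stand-in for the real scripts.
$snapshot = New-VssSnapshot -Volume 'C:'                  # step 1: freeze the file system
try {
    foreach ($vault in $vaults) {
        # step 2: classify files against the most recent backup set in this vault
        $files = Get-FileClassification -Source $snapshot.Path -LastSet $vault.LastSet

        if (-not ($files.New -or $files.Newer)) { continue }   # nothing changed, skip this vault

        # step 3: create the date-named set and build its directory structure
        $newSet = New-BackupSetStructure -Vault $vault -Source $snapshot.Path

        # step 4: hard link the unchanged files to the previous set's copies
        foreach ($file in $files.Same) {
            New-HardLink -Link (Join-Path $newSet $file) -Target (Join-Path $vault.LastSet $file)
        }

        # step 5: copy the new and changed files into the new set
        Copy-ChangedFiles -Source $snapshot.Path -Destination $newSet
    }
}
finally {
    Remove-VssSnapshot $snapshot                          # always release the snapshot
}
```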
As they say though, the devil's in the details. But first, let me get this out of the way.
In the following discussion, I will be leaving a lot of little details out to keep things simpler. I'm not going to cover things like creating log files, creating system event log entries, email notifications, etc... These are all important things to have when creating a set-and-forget backup system but suffice it to say, the scripts that I have created produce detailed logs and have rich error handling along with notifications for when things go wrong.
Flying lower - servers, vaults, and sets
On my server, I have a dedicated network share for backup files where each machine that participates in the backup scheme has a directory reserved for it. Within each machine directory there is a machine configuration XML file and one or more backup vaults. Each backup vault is simply a directory and a backup configuration file that specifies the set of files to back up into that vault.
Within each vault are one or more backup sets, where each set is a complete snapshot of the vault's data. These backup sets are again just directories, named using the date that the set was created. Since each backup set in a vault is a complete snapshot of the data being backed up, there will be many duplicate and unchanged files between sets. To minimize the storage necessary to preserve these complete snapshots, files that are unchanged between backup sets are hard linked together. This results in exactly one copy of the unchanged file's data being shared between all backup sets in a vault. It also means that each new backup set only takes as much physical storage space as is necessary to hold the files that have changed since the last backup, yet it still preserves the complete snapshot view for each backup set.
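To picture the layout, a machine's directory on the backup share ends up looking something like this (all of the names and dates here are made up for illustration):

```
\\server\Backups\
    MACHINE1\
        machine-config.xml        <- which volumes to snapshot and the drive letters to use
        UserData\                 <- a backup vault
            vault-config.rcj      <- robocopy job file describing what to back up
            2007-01-08\           <- one complete backup set per date
            2007-01-15\
            2007-01-22\
        WebSite\                  <- another vault with its own history of sets
            vault-config.rcj
            2007-01-22\
```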
This multi-level approach to defining backups allows me to specify multiple backup vaults for each machine, tuning each for the type of data and/or the frequency of change. In most cases, each machine has one backup vault that contains the Windows "Documents and Settings" folders (i.e. all users' data). On my server, however, I also have other vaults for things like the IIS web server's files and for other shared network folders. Having multiple vaults also allows me to separate out backups based on the frequency of change, since a new backup set is only created in a vault if the backup process determines that files have actually changed and need to be backed up.
Having all of the configuration on the backup server also allows me to centrally manage each of the client machines and their backups. To set up a new client, I just create the directories on the server, set up a few configuration files, and then install and schedule the backup scripts on the client. Once that is done I can adjust the backup process for any client just by updating its configuration files on the server. I can even pull down new versions of the backup scripts and components to the client as needed.
Step #1 - Freezing the file system
Since there are several processing steps that access the files to be backed up, the first thing I needed to do was to take a snapshot of the file system so that I had a frozen-in-time view of the current files. To do this I use my custom VSS COM component along with the machine's XML configuration file. This simple XML configuration file, which is trivial to read in PowerShell, specifies which disk volumes to snapshot and what DOS device names (i.e. drive letters) to expose the snapshots as. Special care had to be taken though to ensure that any errors in the backup process would properly release the VSS snapshot. I accomplished this by setting up a global PowerShell trap handler that cleans up the VSS COM object as necessary.
Note: For more details on working with PowerShell trap handlers and exceptions, here is an excellent PDF article on the subject: PowerShell Debugging and Error Handling
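As an illustration of the cleanup pattern, the script is shaped roughly like the snippet below. The COM ProgID, the method names, and the XML layout are placeholders standing in for my custom component and configuration file, not something you can install or copy verbatim.

```powershell
# Illustrative only: the ProgID, methods, and XML shape stand in for my custom pieces.
$vssSnapshot = $null

# Global trap handler: make sure the snapshot is released even if a later step throws.
trap {
    if ($vssSnapshot -ne $null) {
        $vssSnapshot.ReleaseSnapshots()       # hypothetical cleanup method on the COM object
    }
    break                                     # re-throw so the failure is still reported and logged
}

# The machine configuration file lists the volumes to snapshot and the DOS device
# names (drive letters) to expose the snapshots as.
$config = [xml](Get-Content '\\server\Backups\MACHINE1\machine-config.xml')

$vssSnapshot = New-Object -ComObject 'HomeBackup.VssSnapshot'      # hypothetical ProgID
foreach ($volume in $config.machine.volumes.volume) {
    $vssSnapshot.CreateSnapshot($volume.letter, $volume.exposeAs)  # hypothetical method
}
```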
Once I have a frozen view of the file system, I can then process each defined backup vault to determine what, if anything, needs to be backed up. The process that follows is repeated for each vault that is defined for the machine. It is only after all vaults have been processed that the VSS snapshots are released.
Note: I have recently read another article about using VSHADOW and accessing the shadow copy on Windows XP by using a utility named DOSDEV. This technique effectively replaces the need for my custom VSS COM object. Here are the details: How to assign drive letters to VSS shadow copies... on Windows XP!
Step #2 - Processing a backup vault and classifying the files
Each vault has a configuration file that specifies the directories to back up along with the files and directories to ignore (e.g. "Temp" directories, temporary files, etc...). Since robocopy has sophisticated options for file selection, I use robocopy job files for this.
Robocopy job files are just a way to save robocopy command line arguments into a file for later use. The robocopy documentation has more details on how to create these but they are basically just plain text files that contain command line arguments. Robocopy is also flexible in that you can mix job files with commands specified on the command line. This ability to mix and match options allows me to separate out selection information from other command line options used during various parts of the backup process.
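For illustration, a vault's job file might look something like the following. The layout mirrors what robocopy's /SAVE switch produces, but the names, paths, and exclusions are made up; source and destination are deliberately left out so they can be supplied on the command line for each pass.

```
:: UserData.rcj - file selection options for this vault (illustrative only)

/S		:: copy Subdirectories, but not empty ones

/XD		:: eXclude Directories matching these names
	Temp
	Temporary Internet Files

/XF		:: eXclude Files matching these names
	*.tmp
	~*.*
```

Each pass of the backup then reuses this job file via /JOB:UserData and simply adds its own switches, as shown in the snippets that follow.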
Now that I have a way to save the selections of the files to be backed up, the next step is to determine the lists of files that are new, newer, the same, or have been deleted. Luckily robocopy will do this by using the listing-only command line switch along with the verbose switch. Using these two switches together causes robocopy to produce a very complete log file of what would happen if the listing-only switch had not been specified. For each file and directory it outputs the classification of the object (i.e. New, Newer, Changed, Extra, Same) along with the full path. So to get the list of files and their classification I simply invoke robocopy using the source data (as exposed via the VSS snapshot) while specifying the last backup set as the target. This gives me one giant log file that lists every file that is new, newer, the same, or has been deleted since the last backup. This potentially makes for one very large log file depending on the number of files being backed up, but it also gives me all the information I need to both determine whether a backup is necessary and, if so, which files have not changed and can therefore be hard linked in the new backup set.
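The classification pass, then, boils down to a single robocopy call along these lines. /L (list only) and /V (verbose) are the two switches described above; the remaining switches, paths, and the job file name are illustrative choices, not a prescription.

```powershell
# List-only, verbose run: compare the snapshot to the last backup set and log every
# file as New File, Newer, Same, EXTRA, etc. without actually copying anything.
$source  = 'B:\Documents and Settings'     # B: = drive letter the VSS snapshot is exposed as (placeholder)
$lastSet = '\\server\Backups\MACHINE1\UserData\2007-01-22'
$logFile = Join-Path $env:TEMP 'classify.log'

robocopy $source $lastSet /JOB:UserData /L /V /FP /NJH /NJS "/LOG:$logFile"
```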
To make this log easier to parse, robocopy places the classification information along with the path of the object at a fixed column in the log file, which is documented in the robocopy documentation. This makes it pretty easy to use a tool like PowerShell to parse the robocopy log file to determine which files and directories are the same, new, changed, or have been deleted. There is only one problem that I have encountered with using robocopy to produce the file lists: it seems that robocopy's log files are always written using standard ASCII characters.
Note: Windows Vista ships with a new version of robocopy that fixes this limitation but that new version will only work on Windows Vista unfortunately.
So if you have files that have UNICODE characters in their names (i.e. ©, ®, ½, etc...) then robocopy will substitute plain ASCII characters for those symbols in the output. The side effect of this is that if you have any files with UNICODE characters, the path returned from robocopy cannot be used to directly access the file. This does cause a few hiccups when creating hard links, as the files will appear to be missing. The worst case however is that hard link creation fails and the file gets re-copied to the new backup set. Not an ideal situation, but one which does leave the new backup set's integrity intact, albeit at the expense of backup storage space. A simple way to reduce or eliminate these soft errors is to rename the files to contain only ASCII characters. I've done this for the handful of files where this was an issue when it was clear that changing the file name would have no ill effects.
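With that caveat out of the way, the parsing itself can be sketched as follows (reusing $logFile from the previous snippet). The classification keywords and the fixed column where the path starts differ between robocopy versions, so the values below are assumptions to verify against your own log files rather than facts lifted from my scripts.

```powershell
# Rough sketch: pull (classification, path) pairs out of the robocopy /L /V log.
$pathColumn = 40     # ASSUMPTION: column where the full path begins in this robocopy version's log

$entries = Get-Content $logFile | ForEach-Object {
    if ($_ -match 'New File|Newer|Same|EXTRA File') {
        $entry = New-Object PSObject
        $entry | Add-Member NoteProperty Class $matches[0]
        $entry | Add-Member NoteProperty Path  ($_.Substring($pathColumn).Trim())
        $entry
    }
}

$sameFiles    = @($entries | Where-Object { $_.Class -eq 'Same' })      # candidates for hard linking
$backupNeeded = @($entries | Where-Object { $_.Class -eq 'New File' -or $_.Class -eq 'Newer' }).Count -gt 0
```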
Once I have the list of files and their classification I can then determine whether a backup is necessary (i.e. are there any New or Newer files). If no files have changed then I simply end the processing for this vault. No new backup set is created in this case since nothing has changed. If I'm backing up something like the Windows "Documents and Settings" folder then this is highly unlikely, but for some sets of infrequently changed data it might be more common.
If a backup is necessary, I now also have my list of unchanged files that are to be hard linked between the last backup set and the new one. Step #2 of the backup process is now complete.
Step #3 - Building a new set
The next step is to build the directory structure for the new backup set. Again robocopy will do this for me but it's not completely obvious how to do this from reading the documentation.
One of robocopy's options tells it which attributes of each object to copy. By default it will copy the data, attributes, and timestamp. You can also specify to copy security descriptors, owner info, and auditing info. The trick is that you can also tell it to copy everything except the data, in which case it will just build the directory structure along with all the NTFS security descriptors, owner info, and auditing info.
Again I use my robocopy job file, this time with a different set of command line options, to copy just the directory structure for the new backup set. First, however, I create a new backup set directory, using the current date as its name (i.e. yyyy-mm-dd), and use that as the target directory for robocopy.
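Putting that together, the structure-building pass looks something like this. The /COPY flag letters are documented robocopy options (D=Data, A=Attributes, T=Timestamps, S=Security, O=Owner, U=aUditing info), so /COPY:ATSOU is "everything except the data" as described above; treat the rest of the command line, which reuses variables from the earlier snippets, as an illustrative guess rather than my exact invocation.

```powershell
# Create the date-named set directory, then let robocopy build the directory tree
# inside it, carrying over attributes, timestamps, security, owner, and auditing
# info but no file data.
$vaultPath = '\\server\Backups\MACHINE1\UserData'
$newSet    = Join-Path $vaultPath (Get-Date -Format 'yyyy-MM-dd')
New-Item -Path $newSet -ItemType Directory | Out-Null

robocopy $source $newSet /JOB:UserData /COPY:ATSOU
```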
Once robocopy has built the new directory structure, step #3 of the backup process is now complete.
Step #4 - Hard linking the unchanged files
Next comes the creation of the hard links for all of the files which are unchanged since the last backup set. From my parsed robocopy log file, I iterate over all of the files that were classified as the "same" and I create a hard link in the new backup set, at the proper location, to the last backup set's copy of the file. For each hard link I need to create in my PowerShell script, I call the hard link helper function that I wrote in C#. This is typically a very fast procedure, but the sheer number of unchanged files can in some cases make this one of the bigger steps in the backup process.
As I mentioned earlier, there is the case of failure when creating hard links due to the limitation of robocopy's log file format and file names with UNICODE characters. There is also another case where hard links may fail: when the account that is running the backup process does not have enough permission to access the underlying file. In my experience this is pretty rare but can happen for certain system files that only the system account has access to. The worst case scenario, however, is that the hard link creation will fail and the copy stage of the process will re-copy the file. Because of the possibility of these hard link failures I allow for a certain number of hard links to fail before I abort the entire backup process.
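My actual helper is a small compiled C# component, but the idea can be sketched directly in script with a P/Invoke declaration for the same Win32 API. Note that Add-Type needs PowerShell 2.0 (on 1.0 you compile the helper separately, which is what I did), and the relative-path handling and failure threshold below, which reuse variables from the earlier snippets, are illustrative assumptions.

```powershell
# CreateHardLink is the Win32 API my C# helper wraps; Add-Type is used here only
# so the sketch is self-contained (it requires PowerShell 2.0 or later).
Add-Type -Namespace Win32 -Name HardLink -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
public static extern bool CreateHardLink(string newFileName, string existingFileName, IntPtr securityAttributes);
'@

$maxFailures = 25      # arbitrary tolerance before giving up on the whole backup
$failures    = 0

foreach ($file in $sameFiles) {
    # ASSUMPTION: the logged path starts with the snapshot source path, so stripping
    # it yields the path relative to the vault's root.
    $relativePath = $file.Path.Substring($source.Length).TrimStart('\')
    $linkPath     = Join-Path $newSet  $relativePath     # where the hard link goes in the new set
    $targetPath   = Join-Path $lastSet $relativePath     # the previous set's copy of the file

    if (-not [Win32.HardLink]::CreateHardLink($linkPath, $targetPath, [IntPtr]::Zero)) {
        $failures++                                      # tolerated: the copy pass will re-copy this file
        if ($failures -gt $maxFailures) {
            throw "Too many hard link failures; aborting this backup."
        }
    }
}
```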
After this stage in the process the new backup set will contain all of the directories and all of the files that are the same since the last backup. This also means that any files that have been deleted since the last backup have been effectively dropped from the new backup set as well. They haven't been physically deleted though; they just have not been carried forward to the new backup set via hard linking.
Once these hard links are created, step #4 of the backup process is now complete.
Step #5 - Backing up the new data
The next and final step is to simply allow robocopy to copy over any new and changed files into the new backup set. Since we are working with a VSS snapshot of the data, there is no possibility that files have changed since we classified them, and thus no danger that any hard link to a previously backed-up file will get overwritten by the copy process. Since the only files that currently exist in the new backup set are the files that are unchanged since the last backup, by default robocopy will only copy over files that are either new or newer.
During the copy process, however, there can be issues with file access permissions if the account that the backup process is running under does not have sufficient privileges to copy the files. This will sometimes be the case for certain system files and other users' private files. To get around these issues I use the robocopy option to copy files in a special backup mode. This special backup mode allows robocopy to copy files that it might not otherwise have access to for the purposes of backing them up. When using this option though you have to ensure that the account that the backup process is running under is a member of the Backup Operators group on all of the machines that are participating in the backup.
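The copy pass is then one last robocopy call, again reusing the variables from the earlier snippets. /B (backup mode) is the switch being described here; the retry and logging switches are just reasonable defaults shown for completeness, not necessarily what my scripts use.

```powershell
# Copy the new and changed files into the new set. /B runs robocopy in backup mode,
# which only works if the account is a member of the Backup Operators group.
$copyLog = Join-Path $env:TEMP 'copy.log'

robocopy $source $newSet /JOB:UserData /B /R:2 /W:5 /NP "/LOG:$copyLog"
```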
Once this file copy step is completed, the backup process is complete for the vault being processed and the next vault can then be processed. This cycle continues until all of the vaults for the machine have been processed.
Step #6 - Tidying up
After all of the vaults have been processed for a machine it is then safe to release the VSS snapshots. After that has been completed, the backup process is complete for that machine and my script exits.
There is always room for improvements
Rsync, which was the main inspiration for my new backup process, was designed as a client-server tool that can greatly reduce the time required to copy data between machines by using a differential file copy process. This special copy process examines each file so that only the parts that have changed get sent over the wire. When I started out it was my hope to patch up Rsync to be more Windows-aware, but in the end it was far less work to just recreate the parts that I needed the most, especially given that wire transfer speed within my network was not an issue. So the differential file copy process got dropped in favor of leveraging robocopy. Someday, however, I would like to add that back, but that would require moving to a client-server architecture.
Another thing I would like to add someday is the detection of moved files. This is a typical scenario for me, especially when working with digital camera photos. I copy new photos to an import folder on my main computer where I spend some time working and sorting through them. They are then usually moved to their own folder somewhere in my photo collection. If a backup were to happen in the middle of this workflow, the files would appear to have been added, deleted, and added again (in the new location). From robocopy's viewpoint, these moved files are treated as groups of deleted and new files. One could imagine an extension of the backup process where these deleted and new files were matched together and recognized as a moved file. Then a hard link from the old location to the new location could be created and the file would not have to be re-copied. To do this with absolute certainty though, the files should be binary compared first to ensure that they really are the same files. Again, without a client-server process the comparison would move the same amount of data over the wire as the copy does, so for now this feature will have to wait.
Wrapping up
I've been using this system for about two months now on a variety of Windows machines from servers to laptops. So far it is working very well and has even prevented disaster at least once. Shortly after getting this system up and running I had a hard drive fail. Luckily I had robust backups of all of the important files.
It has also given me the security to try new things like moving all of my photos and video to a RAID-0 striped drive, which offers no redundancy if a drive fails but is blindingly fast. Knowing that I have reliable and automatic backups lets me sleep at night, secure in the knowledge that if one of these RAID-0 drives fails the most I will lose is that day's work.
I should also note that this new system is not the end point for my backup strategy. While my backups are stored on my server in a RAID-5 disk array, I also try to regularly back up the backup data to tape which I move off site. I am just now experimenting with this system, but what I do is create NTFS junction points to all of the latest backup sets in a folder which I then point NTBACKUP at. Each time I want to refresh the tape backup, I remove the previous junction points and set them up again using the latest backup data. By doing this I can even create incremental backups to tape, since to anything looking at the junctioned folders it just appears that the files have been updated (i.e. the ready-for-archive bit is set).
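For anyone curious, the junction shuffle can be scripted with the Sysinternals junction.exe utility (NTFS junctions must point at local paths, so this runs on the server where the backup array lives). The paths and layout below are placeholders, not my actual configuration.

```powershell
# Rebuild the staging folder that NTBACKUP is pointed at: remove the old junctions,
# then re-create one junction per vault aimed at that vault's newest backup set.
$staging    = 'D:\TapeStaging'                 # placeholder: local folder NTBACKUP backs up
$machineDir = 'D:\Backups\MACHINE1'            # placeholder: local path behind the backup share

Get-ChildItem $staging | ForEach-Object { junction.exe -d $_.FullName }     # drop old junctions

foreach ($vault in Get-ChildItem $machineDir | Where-Object { $_.PsIsContainer }) {
    $latestSet = Get-ChildItem $vault.FullName |
        Where-Object { $_.PsIsContainer } |
        Sort-Object Name |                     # yyyy-mm-dd names sort chronologically
        Select-Object -Last 1

    junction.exe (Join-Path $staging $vault.Name) $latestSet.FullName
}
```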
Down the road I hope to someday utilize an Internet backup service instead of tape, perhaps built upon Amazon's Simple Storage Service (Amazon S3) but for right now DSL upload speeds are too slow for the volume of data I have.
That about wraps up what I had planned to discuss in this series of articles. I know that this final article is lacking in concrete examples of how this was actually done, but it is a rather large amount of PowerShell code spread out over seven files and totaling nearly 1200 lines of script code, together with several thousand lines of C++ and C# source code. It is my hope that this series of articles will inspire others to check out the technologies I used to create this backup process and perhaps take the ideas further.
In the meantime, you can always leave me a comment if there's some part of the process where you would like to see more details and I will see if I can write up new articles on those aspects of the process.
Thanks for reading.