After hemming and hawing for quite a while I decided to dive into the deep end and make the transition to Windows Vista. The tipping point came with a new gift, a tiny mobile computer called an OQO model 02. It came preloaded with Vista Ultimate and it worked so well even on the lower-powered OQO that I decided that for the new development machine I was going to build, I would try Vista as my primary OS.

So far the transition, while not completely painless, has been remarkably smooth. Sure, I've had a few unproductive moments (aka system crashes) but these have all been related to drivers, mainly the display drivers for the new NVidia GeForce 8800 GTS card I put in the machine. I should mention up front that I don't play graphics-intensive games, so I'm not really pushing the display card to its limits and thus probably not stressing the driver or card that much. The 8800 is complete overkill for what I need, but I also wanted a DX10 card so that I could get the full Vista/WPF experience. Plus I wanted room to grow, as I will probably never trade up the card for the life of this machine. This whole experience reminds me of the transition from Windows 2000 to Windows XP, though it's actually a bit less painful than that was.

Along the way I've learned some tips, which I thought I would share here:

Monitor color calibration in Vista

The color management system has changed in Vista; however, you can still use the older .ICM format color profiles. Unfortunately, it seems that Vista still cannot properly load custom LUT tables into video cards from the ICM profiles; you still need to use a profile loader to set the custom profile for your video card. Furthermore, there still seems to be the restriction that you cannot load separate color profiles in a multi-monitor setup unless you are using separate video cards as well (i.e. no color profiling for dual-output video cards). This really amazes me considering that Macs have had this for something like a decade, but what's worse is that there seems to be a bug in Vista that will cause it to reset the gamma table for the video card shortly after the profile loader sets it at startup. I'm sure this depends on loading order but that's not something that can be easily worked around. My solution was to use the new Task Scheduler to set up a task that launches my profile loader about one minute after the login event. This way, shortly after logging in and after Vista has reset my video card, the profile loader can load the profile again properly. Until Microsoft fixes Vista, this works pretty well.
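If you want to set up the same kind of delayed task from the command line, something like the following should work (the task name and loader path are just placeholders for whatever profile loader you use, and it's worth double-checking the switch syntax with schtasks /?):

schtasks /Create /TN "Load monitor profile" /TR "C:\Tools\ProfileLoader.exe" /SC ONLOGON /DELAY 0001:00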

Virtual CDROM/DVD drive

Every now and then I need to mount an .ISO or .IMG image of a CD or DVD, usually for software installation. Vista, of course, knows nothing about how to do this. There are, however, several free ISO/IMG loaders that allow you to create virtual CDROM drives. I had used one for the last few years, but when I transitioned to Vista I had to search out a new one that would work properly. Virtual CloneDrive is the one I settled on. It's easy to use, works very well, and requires a minimum of extra software to be installed. It supports multiple virtual drives and, best of all, it's free.

Start++

Another thing that I installed, which has become indispensable, is a small utility called Start++. If you've ever used the older version of Windows Desktop Search on Windows XP, you may remember that you could easily create shortcuts or macro commands that could be launched from the search toolbar. Start++ brings that capability to the search feature in Vista, and it goes further: it has even richer macro scripting, and these macros work at the command prompt as well. Down the road the author is promising an API that will let you create plug-ins for even richer commands.

I use Start++ to set up commands to quickly search Google or EventId.net. I've also used it to set up shortcuts for quickly launching the remote desktop client and connecting to a specific machine, or for opening network folders that I use frequently. It has several built-in commands, but by far the most useful one is sudo, for launching a program with elevated privileges (UAC will still prompt you though). I can't recommend Start++ enough. It's so useful that I never open the Start menu's Run command anymore; I just hit the Windows key and type in a command to run the program I need.

Indexing network folders with Desktop Search

I love Windows Desktop Search. I've been using it on Windows for years (even in its earlier incarnation as the older Indexing Service, which Desktop Search evolved from and which, despite Google's complaints, has been a part of Windows since the early days of Windows NT). Searching is something that I firmly believe should be a core part of the OS, not an add-on. There are many things that can be accomplished once indexing and searching are services of the OS, and Vista is a great example of this. Searching feels natural, not like something tacked on, and all applications can share a common and universal API. Microsoft has given Vista a very smart architecture for indexing and searching with its iFilters, property handlers, protocol handlers, and store providers. But one of the things that is missing out of the box is the ability to index remote file locations. This is especially important if you share documents or media files from a network location. Luckily, Microsoft has released an add-on for Vista's indexing service that allows you to specify network folders to index. Simply install it and you will have the option to index network locations. There is also an add-on to index your Internet Explorer browser history as well, but I have not tried that one yet.

 

That's it for now. I'll blog more about my experiences with Vista and any tips/workarounds that I stumble across.


Update: Since installing the newer RE2 drive firmware, my RAID array has been working flawlessly. I have not had one single timeout or error since. It appears that this firmware completely solved my issues.

Here are the links on the Western Digital site:
WDxxxxYS firmware update information
WDxxxxYS firmware download

A hard drive in my server crashed in early December last year. It was only about three years old, but it was also out of warranty. Instead of just replacing it, I decided to buy a new set of drives and build a RAID 5 array so that if one drive crashes in the future I will have some level of redundancy. After doing some research I chose to build a software RAID 5 array (yes, I know) because I wanted to be able to guarantee that I could move my RAID 5 array to any other Windows machine in case of hardware failure. I didn't want to worry about becoming dependent on a certain RAID controller with a certain revision, certain driver, etc... For the most part this has been a good decision.

In order to do this I also decided to switch to SATA drives, which meant I would need to get a PCI SATA controller since my server is a bit older and doesn't support SATA natively. I chose a basic Promise controller that had four SATA 3.0Gb/s ports. I then installed the controller and driver and easily built a 1.5TB RAID 5 array. All was well.

Then on December 18th my server mysteriously dropped one of the drives from the RAID array. In my event logs I saw a whole slew of device timeout messages for the failed drive. When I looked in the disk manager, sure enough, one of the disks was missing, but because it was a RAID 5 array no data was lost (yet). I suspected that the drive was toast, so I shut down the machine and was going to reboot to run diagnostics in preparation for sending the drive back for replacement. However, once I rebooted, the drive came back online without any errors. I ran the diagnostics and they said the drive was fine. Windows happily rebuilt the RAID array and all was fine, until January 18th.

On January 18th the same thing happened again: a drive was dropped from the RAID array after a whole slew of device timeout messages. I figured that it was the same drive getting more flaky, but then I noticed that it was a different drive this time. My next thought was that it must be a controller error. Perhaps the cheap Promise controller I bought was not the best decision. I ordered an Adaptec SATA PCI controller as a replacement and kept my fingers crossed that the array would not crash again before it arrived.

Once the new controller arrived I felt a little vindicated in my decision to go with software RAID. I simply swapped out the controllers and rebooted, and the RAID array came online without a hitch. Now, I felt, everything was going to be OK. That was, until February 18th.

On February 18th the system dropped yet another drive. The fact that it was happening almost exactly four weeks after the last two incidents was not lost on me. Could it have just been a strange coincidence? Whatever it was, it was clear to me that it was not just a controller issue. But neither was it a single drive, as each time it was a different drive that failed. Perhaps it was some weird configuration error. I rebuilt the array (which takes 14 hours) and started poking around the system for things that could cause this.

I found all sorts of suspicious things, which would all eventually turn out to be red herrings: the disks set for auto spin-down, my UPS mysteriously disconnecting for a few seconds which led to the server thinking it was running on batteries for a few moments, old bits of the Promise filter drivers still installed, etc... Each time I thought I had found the cause, until the array crashed again. By now, however, the array was crashing much more unpredictably and frequently (did I mention that it also almost always crashed when I was out of town?). I also started experiencing other strange issues on the server, such as the system clock jumping into the future whenever the RAID array crashed. At this point I resigned myself to believing that the old server hardware must be going south, so I set out to build a new server.

Transferring everything to a new server (domain, configuration, services, Exchange, SQL, IIS, data, etc...) turned out to be a LOT of work, more so because I also decided to build a new primary domain controller, with all the important services, in a virtual machine running on the new hardware (which is itself also a DC with little else running on it). It took me well over a week to plan things out and to transfer and set up all the domain services. The only worrisome part was when I attempted to transfer over my RAID array. The new server recognized it as an array, but it kept telling me that not all of the drives were present and that I would lose data if I imported it. After much research (and backing things up) I determined that this was probably not going to be the case, so I let it import the array, which it did instantly and perfectly. The RAID array was now transferred and functioning in the new server. Surely everything must be right now. By this time my RAID array had survived no less than six crashes without losing data, and each time the failing drive appeared to be fine after a reboot.

Then on July 3rd, while I was out of town, the new server dropped a drive from the RAID array again after a whole slew of device timeouts. At this point I was just going to send the drives back to Western Digital for replacement. There must be something wrong with them, I figured. As I prepared to request an RMA, I decided to download and run the diagnostic tools one more time. That is when I noticed that there was a firmware update for the Western Digital RE2 drives. When I read the description from their knowledge base I almost fell out of my seat (emphasis mine):

WD hard drives have an internal routine that is periodically executed as part of the internal “Data Lifeguard” process that enhances the operational life expectancy. While the drive is running this routine, if the drive encounters an error, the drive’s internal host/device timer for this routine is NOT canceled causing the drive to be locked in this routine, never becoming accessible to the host computer/controller. This condition can only be reset by a Power Cycle. WD has resolved this issue by making a change to the firmware so when a disk error is encountered, the host/device timer is checked first and then the routine is canceled allowing the drive to be accessible to the host computer/controller. The interval rate for the error condition to occur is 1-4 weeks, and will only occur if the drive encounters a disk error when running this routine.

Could it be that I was suffering from this? It seems to be a description of EXACTLY what I was experiencing every 1-4 weeks. I shut down my server and flashed all the drives with the newer firmware. Again, since I was running software RAID, I could ignore the warnings about not updating drives that are part of a RAID array since, to everything concerned, they are just a bunch of single drives. Note that this KB article seems to imply that my drives are in fact experiencing disk errors that are triggering this condition, and perhaps they are and will still need replacing. So far, though, no diagnostic tool shows that they are. Unfortunately for me, only time will tell. Hopefully it will only be 1-4 weeks before I know.


This is the fourth and final article in a series about a new backup process I have implemented for my home network. In the previous article I covered the tools I used to create this new backup process and the issues I faced along the way. In this article I'll give an overview of the backup process and highlight some of the implementation details.

The view from 50,000 feet

At the core of the new backup process, things are pretty straightforward. The 50,000 foot view is something like this, with just 5 main steps:

  1. Create a VSS snapshot of the drive to be backed up.
  2. Compare the previously backed-up files to the current files and get the lists of files that are new, newer, the same, and missing (i.e. deleted).
  3. Build the directory structure for the new backup set.
  4. For each file that is the same, create hard links in the new directory structure that link to the previous backup's files.
  5. Copy over all the new and newer files.

As they say though, the devil's in the details. But first, let me get this out of the way.

In the following discussion, I will be leaving a lot of little details out to keep things simpler. I'm not going to cover things like creating log files, creating system event log entries, email notifications, etc... These are all important things to have when creating a set-and-forget backup system but suffice it to say, the scripts that I have created produce detailed logs and have rich error handling along with notifications for when things go wrong.

Flying lower - servers, vaults, and sets

On my server, I have a dedicated backup file network share where each machine that participates in the backup scheme has a directory reserved for it. Within each machine directory there is a machine configuration XML file and one or more backup vaults. Each backup vault is simply a directory and a backup configuration file that specifies the set of files to back up into that vault.

Within each vault are one or more backup sets, where each set is a complete snapshot of the vault's data. These backup sets are again just directories, which are named using the date that the set was created. Since each backup set in a vault is a complete snapshot of the data being backed up, there will be many duplicate and unchanged files between sets. To minimize the storage necessary to preserve these complete snapshots, files that are unchanged between backup sets are hard linked together. This results in exactly one copy of the unchanged file's data being shared between all backup sets in a vault. This also means that each new backup set only takes as much physical storage space as is necessary to hold the files that have changed since the last backup, yet it still preserves the complete snapshot view for each backup set.
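To make this a little more concrete, a machine's directory on the backup share might look something like this (the machine, vault, and file names here are just made-up examples):

\\server\Backups\
    WORKSTATION1\
        machine-config.xml              (which volumes to snapshot and how to expose them)
        UserData\                       (a backup vault)
            UserData.rcj                (robocopy job file: what to back up, what to ignore)
            2007-06-01\                 (a complete backup set)
            2007-06-15\                 (unchanged files hard linked back to 2007-06-01)
            2007-07-01\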

This multi-level approach to defining backups allows me to specify multiple backup vaults for each machine, tuning each for the type of data and/or the frequency of change. In most cases, each machine has one backup vault that contains Windows' "Documents and Settings" folder (i.e. all users' data). On my server, however, I also have other vaults for things like the IIS web server's files and other shared network folders. Having multiple vaults also allows me to separate out backups based on the frequency of change, since a new backup set is only created in a vault if the backup process determines that files have actually changed and need to be backed up.

Having all of the configuration on the backup server also allows me to centrally manage each of the client machines and their backups. To set up a new client, I just create the directories on the server, set up a few configuration files, and then install and schedule the backup scripts on the client. Once that is done I can adjust the backup process for any client just by updating its configuration files on the server. I can even pull down new versions of the backup scripts and components to the client as needed.

Step #1 - Freezing the file system

Since there are several processing steps that access the files to be backed up, the first thing I needed to do was to take a snapshot of the file system so that I had a frozen-in-time view of the current files. To do this I use my custom VSS COM component along with the machine's XML configuration file. This simple XML configuration file, which is trivial to read in PowerShell, specifies which disk volumes to snapshot and what DOS device names (i.e. drive letters) to expose the snapshots as. Special care had to be taken, though, to ensure that the VSS snapshot is properly released even if the backup process runs into an error. I accomplished this by setting up a global PowerShell trap handler that cleans up the VSS COM object as necessary.
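In rough outline, the start of the script looks something like the sketch below. The XML element names, the COM ProgID, and the method names on the VSS component are placeholders here, since they are specific to my custom component:

# read this machine's configuration from the backup server
[xml]$config = get-Content '\\server\Backups\WORKSTATION1\machine-config.xml'

# create the custom VSS COM object (hypothetical ProgID)
$vss = new-Object -comObject 'HomeBackup.VssSnapshot'

# global trap handler: release the snapshots if anything goes wrong, then stop
trap
{
    if($vss -ne $null) { $vss.ReleaseSnapshots() }    # hypothetical clean-up method
    break
}

# snapshot each configured volume and expose it under the requested drive letter
foreach($volume in $config.machine.volumes.volume)
{
    $vss.CreateSnapshot($volume.drive, $volume.exposeAs)    # e.g. snapshot C: and expose it as B:
}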

Note: For more details on working with PowerShell trap handlers and exceptions, here is an excellent PDF article on the subject: PowerShell Debugging and Error Handling

Once I have a frozen view of the file system, I can then process each defined backup vault to determine what, if anything, needs to be backed up. The process that follows is repeated for each vault that is defined for the machine. Only after all vaults have been processed are the VSS snapshots released.

Note: I have recently read another article about using VSHADOW and accessing the shadow copy on Windows XP by using a utility named DOSDEV. This technique effectively replaces the need for my custom VSS COM object. Here are the details: How to assign drive letters to VSS shadow copies... on Windows XP !

Step #2 - Processing a backup vault and classifying the files

Each vault has a configuration file that specifies the directories to back up along with the files and directories to ignore (e.g. "Temp" directories, temporary files, etc...). Since robocopy has sophisticated options for file selection, I use robocopy job files for this.

Robocopy job files are just a way to save robocopy command line arguments into a file for later use. The robocopy documentation has more details on how to create these, but they are basically just plain text files that contain command line arguments. Robocopy is also flexible in that you can mix job files with options specified on the command line. This ability to mix and match allows me to separate the file selection information from the other command line options used during various parts of the backup process.
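As a sketch, creating a job file that captures only the selection options might look like this (the vault name and exclusions are just examples, and the /NOSD and /NODD switches are my reading of the documentation for saving a job without baked-in source and destination paths):

# save the file selection options into a reusable job file (creates UserData.rcj)
robocopy /XD Temp 'Temporary Internet Files' /XF *.tmp ~*.* /NOSD /NODD /SAVE:UserData /QUIT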

Now that I have a way to save the selection of the files to be backed up, the next step is to determine the lists of files that are new, newer, the same, or have been deleted. Luckily, robocopy will do this by using the listing-only command line switch along with the verbose switch. Using these two switches together causes robocopy to produce a very complete log file of what would happen if the listing-only switch had not been specified. For each file and directory it outputs the classification of the file (i.e. New, Newer, Changed, Extra, Same) along with the full path. So to get the list of files and their classifications, I simply invoke robocopy using the source data (as exposed via the VSS snapshot) while specifying the last backup set as the target. This gives me one giant log file that lists every file that is new, newer, the same, or has been deleted since the last backup. It potentially makes for a very large log file depending on the number of files being backed up, but it also gives me all the information I need to determine both whether a backup is necessary and, if so, which files have not changed and can therefore be hard-linked in the new backup set.

To make this log easier to parse, robocopy places the classification information, along with the path of the object, at a fixed column in the log file, which is documented in the robocopy documentation. This makes it pretty easy to use a tool like PowerShell to parse the robocopy log file and determine which files and directories are the same, new, changed, or have been deleted. There is only one problem that I have encountered with using robocopy to produce the file lists: it seems that robocopy's log files are always written using standard ASCII characters.

Note: Windows Vista ships with a new version of robocopy that fixes this limitation, but unfortunately that new version only works on Windows Vista.

So if you have files with UNICODE characters in their names (i.e. ©, ®, ½, etc...) then robocopy will substitute plain ASCII characters for those symbols in the output. The side effect of this is that if you have any files with UNICODE characters, the path returned from robocopy cannot be used to directly access the file. This does cause a few hiccups when creating hard links, as those files will appear to be missing. The worst case, however, is that hard link creation fails and the file gets re-copied to the new backup set. Not an ideal situation, but one which leaves the new backup set's integrity intact, albeit at the expense of backup storage space. A simple way to reduce or eliminate these soft errors is to simply rename the files to contain only ASCII characters. I've done this for the handful of files where this was an issue, when it was clear that changing the file name would have no ill effects.
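Putting those pieces together, the classification pass looks roughly like the sketch below. The destination path and log name are examples, and the column offset in the parsing loop is an assumption for illustration; the real offset is spelled out in the robocopy documentation and is worth checking against your version:

# list-only (/L), verbose (/V) run comparing the VSS snapshot against the last backup set
robocopy B:\Users \\server\Backups\WORKSTATION1\UserData\2007-06-15 /JOB:UserData /E /L /V /FP /NJH /NJS /LOG:classify.log

# parse the log using the fixed column where the path begins ($pathColumn is a placeholder value)
$pathColumn = 25
$same = @()
$toCopy = @()
foreach($line in (get-Content 'classify.log'))
{
    if($line.Length -le $pathColumn) { continue }       # skip blank, header, and summary lines
    $class = $line.Substring(0, $pathColumn)
    $path  = $line.Substring($pathColumn).Trim()

    if($class -match 'Same')                        { $same   += $path }
    elseif($class -match 'New File|Newer|Changed')  { $toCopy += $path }
    # 'EXTRA' entries are files deleted since the last backup; they simply aren't carried forward
}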

Once I have the list of files and their classifications I can then determine whether a backup is necessary (i.e. are there any New or Newer files). If no files have changed then I simply end the processing for this vault; no new backup set is created since nothing has changed. If I'm backing up something like Windows' "Documents and Settings" folder then this is highly unlikely, but for some sets of infrequently changed data it is more common.

If a backup is necessary, I now also have my list of unchanged files that are to be hard linked between the last backup set and the new one. Step #2 of the backup process is now complete.

Step #3 - Building a new set

The next step is to build the directory structure for the new backup set. Again robocopy will do this for me but it's not completely obvious how to do this from reading the documentation.

One of robocopy's options tells it which parts of each file to copy. By default it will copy the data, attributes, and timestamp. You can also specify to copy security descriptors, owner info, and auditing info. The trick is that you can also tell it to copy everything except the data, in which case it will just build the directory structure along with all the NTFS security descriptors, owner info, and auditing info.

Again I use my robocopy job file, this time with a different set of command line options, to copy just the directory structure for the new backup set. First, however, I create a new backup set directory, using the current date as its name (i.e. yyyy-mm-dd), and use that as the target directory for robocopy.
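Something like the following is the general idea; the copy-flag letters are my reading of the documentation (D=Data, A=Attributes, T=Timestamps, S=Security, O=Owner, U=aUditing, so ATSOU means everything except the data) and the paths are examples:

# create the new backup set directory, named for today's date
$newSet = join-Path '\\server\Backups\WORKSTATION1\UserData' (get-Date -format 'yyyy-MM-dd')
new-Item -type directory $newSet > $null

# build just the directory structure (no file data) in the new set
robocopy B:\Users $newSet /JOB:UserData /E /COPY:ATSOU /LOG:structure.log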

Once robocopy has built the new directory structure, step #3 of the backup process is complete.

Step #4 - Hard linking the unchanged files

Next comes the creation of the hard links for all of the files which are unchanged since the last backup set. From my parsed robocopy log file, I iterate over all of the files that were classified as "Same" and create a hard link in the new backup set, at the proper location, to the last backup set's copy of the file. For each hard link I need to create in my PowerShell script, I call the hard link helper function that I wrote in C#. This is typically a very fast procedure, but the sheer number of unchanged files can make this one of the bigger steps in the backup process.

As I mentioned earlier, there is the case of failure when creating hard links due to the limitation of robocopy's log file format and file names with UNICODE characters. There is also another case where hard links may fail: when the account that is running the backup process does not have sufficient permission to access the underlying file. In my experience this is pretty rare but it can happen for certain system files that only the system account has access to. The worst case scenario, however, is that the hard link creation will fail and the copy stage of the process will re-copy the file. Because of the possibility of these hard link failures I allow for a certain number of hard links to fail before I abort the entire backup process, as sketched below.
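The core of that loop looks roughly like this; $same comes from the parsed robocopy log, $lastSet and $newSet are the previous and new backup set paths, and the class name on the hard link helper is a placeholder:

$failures = 0
foreach($relativePath in $same)
{
    $linkName = join-Path $newSet  $relativePath    # the hard link to create in the new set
    $existing = join-Path $lastSet $relativePath    # the previous set's copy of the file

    & {
        # a failed link (UNICODE name mismatch, access denied) is just noted and skipped;
        # the copy step will re-copy the file instead
        trap { $script:failures++; continue }
        [YourLibraryName.YourClass]::CreateHardlink($linkName, $existing) > $null
    }
}

# give up if an unreasonable number of links failed (the threshold here is arbitrary)
if($failures -gt 25) { throw 'too many hard link failures - aborting this backup set' }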

After this stage in the process, the new backup set will contain all of the directories and all of the files that are unchanged since the last backup. This also means that any files that have been deleted since the last backup have effectively been dropped from the new backup set as well. They haven't been physically deleted though; they just have not been carried forward to the new backup set via hard linking.

Once these hard links are created, step #4 of the backup process is now complete.

Step #5 - Backing up the new data

The next and final step is to simply let robocopy copy any new and changed files into the new backup set. Since we are working with a VSS snapshot of the data, there is no possibility that files have changed since we classified them, and thus no danger that any hard link to a previously backed up file will get overwritten by the copy process. Since the only files that currently exist in the new backup set are the files that are unchanged since the last backup, by default robocopy will only copy over files that are either new or newer.

During the copy process, however, there can be issues with file access permissions if the account that the backup process is running under does not have sufficient privileges to copy the files. This will sometimes be the case for certain system files and other users' private files. To get around these issues I use the robocopy option to copy files in a special backup mode. This special backup mode allows robocopy to copy files that it might not otherwise have access to, for the purposes of backing them up. When using this option, though, you have to ensure that the account that the backup process is running under is a member of the Backup Operators group on all of the machines that are participating in the backup.
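The final copy is then just another robocopy run against the same job file, this time with the backup-mode switch added (the paths, log name, and exact option set here are a sketch):

# copy the new and changed files into the new set; /B switches robocopy into backup mode so it
# can read files the account couldn't otherwise open (requires Backup Operators membership)
robocopy B:\Users $newSet /JOB:UserData /E /B /COPYALL /LOG:copy.log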

Once this file copy step is completed, the backup process is complete for the vault being processed and the next vault can then be processed. This cycle continues until all of the vaults for the machine have been processed.

Tidying up

After all of the vaults have been processed for a machine it is then safe to release the VSS snapshots. After that has been completed, the backup process is complete for that machine and my script exits.

There is always room for improvement

Rsync, which was the main inspiration for my new backup process, was designed as a client-server tool that can greatly reduce the time required to copy data between machines by using a differential file copy process. This special copy process examines each file so that only the parts that have changed get sent over the wire. When I started out it was my hope to patch up Rsync to be more Windows-aware, but in the end it was far less work to just recreate the parts that I needed the most, especially given that wire transfer speed within my network was not an issue. So the differential file copy process got dropped in favor of leveraging robocopy. Someday I would like to add that back, but it would require moving to a client-server architecture.

Another thing I would like to add someday is the detection of moved files. This is a typical scenario for me, especially when working with digital camera photos. I copy new photos to an import folder on my main computer where I spend some time working on and sorting through them. They are then usually moved to their own folder somewhere in my photo collection. If a backup were to happen in the middle of this workflow, the files would appear to have been added, deleted, and added again (in the new location). From robocopy's viewpoint, these moved files are treated as groups of deleted and new files. One could imagine an extension of the backup process where these deleted and new files were matched together and recognized as moved files. Then a hard link from the old location to the new location could be created and the file would not have to be re-copied. To do this with absolute certainty, though, the files should be binary compared first to ensure that they really are the same. Without a client-server process, that comparison would move the same amount of data over the wire as the copy does, so for now this feature will have to wait.

Wrapping up

I've been using this system for about two months now on a variety of Windows machines, from servers to laptops. So far it is working very well and has even prevented disaster at least once. Shortly after getting this system up and running I had a hard drive fail. Luckily I had robust backups of all of the important files.

It has also given me the confidence to try new things, like moving all of my photos and video to a RAID-0 striped drive, which offers no redundancy if a drive fails but is blindingly fast. Knowing that I have reliable and automatic backups lets me sleep at night, secure in the knowledge that if one of these RAID-0 drives fails the most I will lose is that day's work.

I should also note that this new system is not the end point for my backup strategy. While my backups are stored on my server in a RAID-5 disk array, I also try to regularly back up the backup data to tape, which I move off-site. I am just now experimenting with this part, but what I do is create NTFS junction points to all of the latest backup sets in a folder which I then point NTBACKUP at. Each time I want to refresh the tape backup, I remove the previous junction points and set them up again using the latest backup data. By doing this I can even create incremental backups to tape since, to anything looking at the junctioned folders, it just appears that the files have been updated (i.e. the ready-for-archive bit is set).
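Junction points can be created with the Sysinternals junction tool (or the Resource Kit's linkd); refreshing the staging folder then looks roughly like this, with example paths:

# drop last time's junction and point a new one at the latest backup set,
# then aim NTBACKUP at D:\TapeStage for the incremental tape run
junction -d D:\TapeStage\WORKSTATION1-UserData
junction D:\TapeStage\WORKSTATION1-UserData D:\Backups\WORKSTATION1\UserData\2007-07-01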

Down the road I hope to someday utilize an Internet backup service instead of tape, perhaps built upon Amazon's Simple Storage Service (Amazon S3) but for right now DSL upload speeds are too slow for the volume of data I have.

That about wraps up what I had planned to discuss in this series of articles. I know that this final article is lacking in concrete examples of how this was actually done, but it is a rather large amount of PowerShell code spread out over seven files and totaling nearly 1200 lines of script, together with several thousand lines of C++ and C# source code. It is my hope that this series of articles will inspire others to check out the technologies I used to create this backup process and perhaps take the ideas further.

In the meantime, you can always leave me a comment if there's some part of the process where you would like to see more details and I will see if I can write up new articles on those aspects of the process.

Thanks for reading.


I use iTunes on Windows only because I have to. I don't really care for it as a music organizer and besides, I rarely listen to music on my computer. I have an iPod though and what I do like about iTunes is that I can just plug in my iPod and it will automatically sync everything without any intervention from me. I also like that iTunes allows me to subscribe to podcasts. It annoys me though that unless I have iTunes running it will not update my podcast subscriptions. Since I only use iTunes for syncing my iPod, iTunes is never running.

Luckily, iTunes has a COM automation interface that you can use to force it to update your podcasts. With a bit of PowerShell scripting I am now able to automatically update all of my podcasts in the middle of the night, when I don't have to be bothered by the fact that iTunes is running. I used PowerShell for this scripting task because, again, it seems perfect for a task like this.

My script is a little more complicated than just starting up iTunes and then starting the podcast downloads. Since I am scheduling this process to run nightly, I also wanted to be able to close iTunes once it was finished. Unfortunately, the iTunes COM automation interface provides no way to tell whether or not iTunes is busy downloading new podcasts. However, it seems that iTunes creates a pretty predictable temporary download folder structure, which it then removes when it has finished downloading all of the podcasts that it updates. I used this bit of knowledge in my script to detect when iTunes is busy downloading new podcasts: I just watch for those directories, waiting for iTunes to finish, and then I shut down iTunes. So far this has worked pretty well, although I'm sure it is not completely robust under all circumstances.

Here is the PowerShell script I developed. If you want to use it you'll have to tweak the directory path that is checked in the script to match your computer. I should also mention that on at least one of my machines iTunes creates the temporary 'downloads\podcast' directory in a different place under the iTunes music folder. I have not found a setting in iTunes that determines where the temporary folder gets created so you may have to poke around a little while iTunes is downloading podcast updates to figure out where it is on your machine if this does not work.

 

# update iTunes podcasts

function test-Downloading
{
    if(test-path 'C:\Documents and Settings\MusicBox\My Documents\My Music\iTunes\iTunes Music\Downloads\Podcasts')
    {
        return $True
    }
    return $False
}


$iTunes = new-Object -comobject iTunes.Application
if($iTunes -ne $null)
{
    'iTunes started' | out-Host

    # start iTunes podcasts update
    'updating podcasts' | out-Host
    $iTunes.UpdatePodcastFeeds()

    # set a time out time
    $TimeOut = (get-Date).AddMinutes(30)

    # wait a little while for any downloads to start
    'waiting for downloads to start...' | out-Host
    start-sleep (30)

    # loop while iTunes seems to be downloading (presence of temporary download folder)
    'checking for download activity' | out-Host
    while((test-Downloading) -and ((get-Date) -lt $TimeOut))
    {
        # give it some more time
        'download activity detected, waiting...' | out-Host
        start-sleep (60)
    }

    if((get-Date) -ge $TimeOut)
    {
        'downloading timed-out' | out-Host
    }

    # now quit iTunes after a little settling time
    start-Sleep (30)
    'quitting iTunes' | out-Host
    $iTunes.Quit()
    $iTunes = $null
}

This is the third in a series of articles about a new backup process I have implemented for my home network. In the previous article I covered a mirror backup process that maintains a storage-efficient backup history. In this article I'll cover the tools I used and the issues I had to overcome while using them.

Common tools and a not so common use of them

Once I had decided to create a backup system that creates space-conserving mirror backups by leveraging NTFS hard links, I set out to make a simple prototype. It occurred to me that I already had a very good tool for copying data around: a free tool called robocopy from the Windows Resource Kit. Robocopy is a very powerful file copying tool that can be configured in a multitude of ways, including the ability to copy files in backup mode, a special mode of file access that can be used to bypass file security for the purposes of backing up files. It is also faster and more reliable than the file copy tools that come with Windows, and it has a very good set of options to control which files to copy. However, robocopy knows nothing about creating hard links to previous versions of files. That step I would have to do myself.

In searching for information on how to create hard links, it wasn't long before I ran across references to the fsutil tool that is included in Windows XP and Windows Server 2003. Using this tool you can create NTFS hard links from the command line.
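For reference, fsutil takes the new link name first and the existing file second, and both must be on the same local NTFS volume (the paths here are just examples):

fsutil hardlink create D:\Backups\2007-06-15\report.doc D:\Backups\2007-06-01\report.doc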

Together with robocopy and a bit of creative CMD scripting, I was able to throw together a prototype that could create mirror backups while hard linking to the files that had not changed since the previous backup, just like rsync does. I started by duplicating the directory structure of the old backup by using robocopy to copy just the directories. Next I used fsutil to hard-link copies of the previous backup's files into the new directories, by traversing the old backup directories and creating a hard link to each of the older files. Then I used robocopy to generate a list of the files that had changed since the last backup, including files that were no longer present, and from that listing I deleted those files from the newly created mirror backup. Finally, I used robocopy to copy just the newer files into the new mirror backup. While it wasn't the most efficient method, it worked pretty well, but it had one important limitation: fsutil only works on local disks. It was also a pretty hacky bit of CMD script, since I had to do string manipulation to create the hard links. I had considered re-writing the whole process in C#, but then something else popped up on my radar.

PowerShell, isn't that some sort of new gasoline?

It was about this time that Microsoft released RC2 of PowerShell (which has just recently gone RTM). PowerShell is Microsoft's new administrative scripting language for the future. Besides being a very good replacement for command shell scripting and VBScript, it is also the new foundation of the management tools for the next version of Microsoft Exchange. It is an amazingly powerful scripting language, easily learned, easily extended, and easily the most important tool I have learned in a long time.

PowerShell is different from other scripting languages because it is based on the concept of pipelining objects. Many scripting languages, including the native Windows shell, support pipelining text data from command to command. PowerShell is different in that it pipelines complete .NET objects instead of just textual data. As full .NET objects, each object in the pipeline has state, properties, and methods. They can be passed as parameters to functions, extended dynamically, coerced into other types, and placed back into the pipeline. Functions in PowerShell can also be treated as objects, allowing you to do some types of functional programming tasks that are not easily done in other .NET languages. It is a very powerful idea and my brief description doesn't even scratch the surface of the power that lies within PowerShell. It is all still very new to me, but already I am finding many uses for it.

Tip: Here's a PowerShell gotcha to keep in mind. Every expression in PowerShell that produces output places that output in the pipeline. This can lead to pretty weird debugging issues if you aren't careful. I had more than one case where a function was returning more than I wanted because I was calling a command that placed things in the pipeline without me realizing it. There are two ways to avoid this: one is to assign the output of the command to a variable, and the other is to redirect the output to $null (i.e. do-something > $null).
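A classic example of this gotcha is the Add() method on an ArrayList, which returns the index of the newly added item:

function get-Answer
{
    $list = new-Object System.Collections.ArrayList
    $list.Add('something')    # Add() returns the new index (0), which quietly enters the pipeline
    return 42                 # so the function actually returns @(0, 42), not just 42
}

# either capture the unwanted output or redirect it:  $list.Add('something') > $null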

PowerShell's object pipeline nature, along with its rich set of built-in commands known as cmdlets, makes for a perfect system for doing administrative computer tasks. There are cmdlets for accessing PowerShell providers such as the file system and the registry, and for accessing WMI objects, COM objects, and the full .NET 2.0 Framework. I've seen examples of everything from simple file parsing scripts to a simple but complete HTTP server written in PowerShell in just a few lines of code. To me it appeared to be the perfect language for scripting a new backup process. However, PowerShell does not offer support for creating NTFS hard links either. For this I would need to extend PowerShell.

Extending PowerShell through custom C# objects and P/Invoke

Starting with Windows XP there is a new API for creating hard links, CreateHardLink. In previous versions of Windows, creating hard links was somewhat of a black art: you had to use the complex and sparsely documented Win32 Backup APIs. It could be done, and there are examples of how to do it out there, but it was not for the faint of heart. The CreateHardLink API solves that, making it almost trivial to create hard links on NTFS. Furthermore, unlike fsutil, the CreateHardLink API fully supports creating hard links on remote network NTFS drives. PowerShell cannot easily call native APIs on its own though. To do that, you need to extend PowerShell with a bit of .NET code.

PowerShell is very easy to extend. You can write complete cmdlets, objects that fully plug into the PowerShell pipeline framework, or you can just create simple .NET objects that can be created and invoked thanks to PowerShell's ability to access the .NET framework.

Using C# and a bit of P/Invoke it was almost trivial to solve the problem of not being able to create hard links in PowerShell (and .NET) by writing a simple object that called the Win32 CreateHardlink API. Once that was done, I could easily create my new .NET object in PowerShell and use it to create all the hard links that I wanted. Now I could create a more complete backup script from the ground up using PowerShell.

If you'd like to access the CreateHardlink API in PowerShell or .NET, here is a C# code snippet to help you. Simply create a new class in a .DLL and add this method. I added this method as a static member since it does not require any state from the class. This also makes it very easy to call from PowerShell.

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
internal static extern int CreateHardLink(string lpFileName, 
    string lpExistingFileName, IntPtr lpSecurityAttributes);

static public void CreateHardlink(string strTarget, string strSource)
{
    if(CreateHardLink(strTarget, strSource, IntPtr.Zero) == 0)
    {
        throw new System.ComponentModel.Win32Exception(Marshal.GetLastWin32Error());
    }
}

To call this code from PowerShell, you simply load the .NET assembly and then call that static method on your class. Note that this will throw an exception if it fails so make sure you have a PowerShell trap handler somewhere in your script.

# load the custom .NET assembly
[System.Reflection.Assembly]::LoadFrom('YourLibrary.dll')

# create a hard link
[YourLibraryName.YourClass]::CreateHardlink($Target, $Source) > $null

Whoops, that file is in use

There was still one more issue to tackle before I could write a robust backup system, accessing files that are in use. Starting with Windows XP Microsoft introduced a new system for accessing files that are currently in use on Windows systems, the Volume Shadow Copy Service (VSS for short, but not to be confused with Microsoft's VSS source control system).

One of the ideas behind VSS is that, when requested, the OS will make a read-only copy of the drive, a snapshot frozen in time, available to a backup program. Other programs can continue to change the original disk files, but this shadow copy, or snapshot, will remain frozen and completely accessible to the program that created it. Furthermore, when a backup program requests that a shadow copy be created, the OS can coordinate with shadow copy providers to ensure that the data on the disk is in a consistent state before the shadow copy is created. This further ensures that the files the backup program has access to are in a consistent enough state on the disk to be backed up. This is especially useful for files that are either always open or always changing, like the system registry, user profiles, Exchange, or SQL databases. Once the backup program is finished with this temporary read-only shadow copy, it releases it and it disappears from the system. By using the VSS system, backup programs can gain access to every file on the drive even if the files are exclusively in use by other programs. For me it was essential to use VSS in any backup process I implemented.

There were a few tough problems though. On Windows XP these VSS snapshots are very temporary in that they only exist for as long as you hold a reference to them via COM. Once released, they auto-delete themselves. And unlike VSS on Windows Server 2003, they cannot be exposed as a drive letter for easy access. You have to access them via the native NT kernel's method of addressing NT namespace objects, the GLOBALROOT namespace. On XP, when you ask the VSS service to create a snapshot, what you get is an NT GLOBALROOT path that looks like this:
\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1. Unfortunately this is something that not even the native Windows command shell fully understands, and if you try to access it from PowerShell or .NET you'll get an exception telling you that you really shouldn't be accessing internal NT paths in .NET. To solve this I would need another bit of custom code to extend PowerShell.

VSHADOW.EXE and exposing a snapshot as a drive letter

VSHADOW is a sample tool that is part of the VSS SDK. It is a command line interface to the VSS API. By using this tool you can create and release VSS snapshots at will. It even has a way around the COM auto-destruction of snapshots on Windows XP: it can call an external program once the snapshot has been created, so that you can access the snapshot while VSHADOW is still keeping it alive. It will even create a set of environment variables for you that give the names of the GLOBALROOT shadow copies it has created. This still didn't solve my problem of not being able to access them via PowerShell (or robocopy, for that matter), but having this source code was a good start.
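For reference, the sample's command line lets you do something like this (the switches shown are from memory, so check vshadow's usage output for your SDK version):

# snapshot C:, write the environment variables describing it to setvars.cmd,
# and run backup.cmd while vshadow keeps the snapshot alive
vshadow.exe -script=setvars.cmd -exec=backup.cmd C: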

All physical devices in Windows, like hard drives, exist in the GLOBALROOT namespace. It is only through device name mapping that we can access them via their friendly DOS names like C:, D:, etc. Normally the OS creates these device mappings automatically at start up or whenever a new device is connected. VSS snapshots, however, don't automatically get recognized and mapped. Mapping a friendly name to a VSS snapshot has to be done by directly calling the Win32 DefineDosDevice API. Using this API you can create and remove DOS device mappings to VSS snapshots on the fly, even on Windows XP. But since VSS snapshots are temporary, you have to manage them carefully or the system can become unstable.

Creating a VSS snapshot and mapping it to DOS device names is well beyond what I wanted to try to do in C#. Luckily for me, the VSHADOW C++ source code was written in a very reusable manner and I could easily reuse it by wrapping a COM object around it.

The not so nice COM interop experience with .NET 2.0

Creating a snapshot is not the simplest of procedures. You have to query for the list of VSS writers, map them against the target volume, determine which ones to include in the process, and finally request that the snapshot be created. You have to hold on to the VSS COM interface to keep the snapshot alive on XP for the duration of its use. When you are done, you have to release it in a controlled manner or the VSS system can completely degrade, usually requiring a system restart to recover. It is also not the fastest process, something that would come back to bite me later. However, the VSHADOW source, which is written in C++, was written in such a way that it was very easy to turn into a COM object using ATL. It was as simple as creating a new ATL COM object project in Visual Studio and including the core VSHADOW source files in the project. Once I had it building as a COM object, it didn't take me long to put a .NET-friendly interface on this new COM object that exposed methods to create and destroy VSS snapshots as well as map them to DOS device names.

PowerShell has native support for creating and calling COM objects that is even easier than in other .NET languages. There is no need to create .NET interop classes; you just dynamically create the COM object and use it much like you would in VBScript. Once I created my new VSS COM object, it was trivial to create VSS snapshots on the fly and map them to DOS device names using PowerShell. With my new VSS COM object I now had complete access to VSS snapshots from any tool that could access a standard drive. It has some limitations, but for this backup process it works very well.

Releasing the VSS snapshot in PowerShell, however, was another story. There is no clean way that I could find to force a created COM object to be released in PowerShell. You have to wait for the .NET garbage collector to do its thing, which is usually not until the PowerShell process is exiting. My COM object had its clean-up code in the COM object's Release method, so that when it was released it would clean up the VSS state in the proper way, ensuring that the system remained stable. Unfortunately for me, relying on a COM object's Release method to work during the .NET shutdown process proved to be one huge headache.

After many, many hours of debugging and not really believing what I was seeing I finally had to accept what was going on. From what I was seeing and from the research I have done it is my understanding that Finalizers in .NET, which are called when an object is being destroyed and which are also responsible for calling a COM object's Release method in PowerShell, are not guaranteed to complete when a process shuts down. Usually this is not a problem as the process is going away anyway. It is a problem however when you have native resources to release.

What I was seeing, and not believing for literally hours and hours, was that in the middle of my COM object's Release method the PowerShell process would just exit normally. No exceptions, no faults, nothing - just poof, it's gone. And every time it did this it would leave the VSS system in such a state that the machine had to be restarted, because the VSS clean-up code, which can be a lengthy process, was never given the chance to execute properly. It seems that the PowerShell shutdown process was timing out my clean-up code. It was a complete mess and still one that I cannot believe is acceptable, but apparently to the folks who created .NET it is (you can read about it here in way more detail than anyone should have to know; just search for "timeout" and "watchdog" on that page). The thought that external native code can have the plug pulled on it just blows me away.

The fix was rather simple once I realized that I cannot count on my COM object's Release method always completing. I had to move all critical clean-up code into a public method that my PowerShell script always calls. Luckily, PowerShell has pretty decent error handling and it wasn't too hard to ensure that I always call the clean-up method on my COM object before PowerShell terminates normally. I'm still not thrilled about this though. I would have preferred that my COM object be allowed to clean up after itself as necessary.

The moral of this story is that you are responsible for all complex clean-up even when calling native code. Don't depend on the .NET framework to always play nice.

Now that I had this behind me I had all the pieces that I needed: a robust file copy tool, a powerful scripting language, the ability to create hard links, and full access to Volume Shadow Copy snapshots.

 

In part four I'll cover the process overview and implementation details of the intelligent mirror backup process that I chose to be the foundation of my new backup strategy.

