This is the third in a series of articles about a new backup process I have implemented for my home network. In the previous article I covered a mirror backup process that maintains a storage-efficient backup history. In this article I'll cover the tools I used and the issues I had to overcome while using them.

Common tools and a not-so-common use of them

Once I had decided to create a backup system that creates space-conserving mirror backups by leveraging NTFS hard links, I set out to build a simple prototype. It occurred to me that I already had a very good tool for copying data around: robocopy, a free tool from the Windows Resource Kit. Robocopy is a very powerful file copying tool that can be configured in a multitude of ways, including the ability to copy files in backup mode, a special mode of file access that bypasses file security for the purposes of backing up files. It is also faster and more reliable than the file copy tools that come with Windows and has a very good set of options to control which files to copy. However, robocopy knows nothing about creating hard links to previous versions of files. That step I would have to handle myself.
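For example, a single robocopy invocation can mirror a share into a backup folder in backup mode. This is just an illustrative command with placeholder paths, not the exact one my scripts use:

# /MIR mirrors the source (including deletions), /B copies files in backup mode,
# and /R and /W control retries and waits for locked or failing files
robocopy \\Server\Data D:\Backups\Current /MIR /B /R:2 /W:5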

In searching for information on how to create hard links, it wasn't long before I ran across references to the fsutil tool that is included in Windows XP and Windows Server 2003. Using this tool you can create NTFS hard links from the command line.
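The syntax is simple; the paths below are placeholders, and both the new link and the existing file must live on the same local NTFS volume:

fsutil hardlink create D:\Backups\Current\report.doc D:\Backups\Previous\report.doc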

Together with robocopy and a bit of creative CMD scripting, I was able to throw together a prototype that could create mirror backups while hard linking to the files that had not changed since the previous backup, just like rsync does. I started by duplicating the directory structure of the old backup, using robocopy to copy just the directories. Next I traversed the old backup directories and used fsutil to create a hard link to each of the older files in the corresponding new directory. Then I used robocopy to generate a list of the files that had changed since the last backup, including files that were no longer present. From that listing I then deleted those files from the newly created mirror backup. Finally, I used robocopy to copy just the newer files into the new mirror backup. While it wasn't the most efficient method, it worked pretty well, but it had one important limitation: fsutil only works on local disks. It was also a pretty hacky bit of CMD script, since I had to do string manipulation to create the hard links. I had considered re-writing the whole process in C# but then something else popped up on my radar.
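For the curious, here is a rough sketch of those four steps, rendered in the PowerShell style the rest of this series ends up using (the paths are placeholders and the listing-parsing in step 3 is omitted; the actual prototype did all of this with CMD string manipulation):

$old = 'D:\Backups\Previous'
$new = 'D:\Backups\Current'
$source = 'C:\Data'

# 1. duplicate the previous backup's directory tree, directories only
robocopy $old $new /E /XF * > $null

# 2. hard link every file from the previous backup into the new tree (fsutil: local disks only)
Get-ChildItem $old -Recurse | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
    $target = $_.FullName.Replace($old, $new)
    fsutil hardlink create $target $_.FullName > $null
}

# 3. list the files that changed or were deleted since the last backup (/L lists without copying),
#    then remove those entries from $new so step 4 replaces them instead of writing through the links
robocopy $source $new /MIR /B /L /NJH /NJS /FP

# 4. copy just the newer files into the new mirror, in backup mode
robocopy $source $new /MIR /B > $null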

PowerShell, isn't that some sort of new gasoline?

It was about this time that Microsoft released RC2 of PowerShell (which has just recently gone RTM). PowerShell is Microsoft's new administrative scripting language for the future. Besides being a very good replacement for command shell scripting and VBScript, it is also the new foundation of the management tools for the next version of Microsoft Exchange. It is an amazingly powerful scripting language, easily learned, easily extended, and easily the most important tool I have learned in a long time.

PowerShell is different from other scripting languages because it is based on the concept of pipelining objects. Many scripting languages, including the native Windows shell, support pipelining text data from command to command. PowerShell, however, pipelines complete .NET objects instead of just textual data. As full .NET objects, each object in the pipeline has state, properties, and methods. Objects can be passed as parameters to functions, extended dynamically, coerced into other types, and placed back into the pipeline. Functions in PowerShell can also be treated as objects, allowing you to do some kinds of functional programming that are not easily done in other .NET languages. It is a very powerful idea, and my brief description doesn't even scratch the surface of the power that lies within PowerShell. It is all still very new to me but already I am finding many uses for it.
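A quick illustration: each stage in the pipeline below receives full FileInfo objects rather than lines of text, so properties like Length and LastWriteTime are directly usable (the path is a placeholder):

Get-ChildItem C:\Data -Recurse |
    Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -gt (Get-Date).AddDays(-1) } |
    Sort-Object Length -Descending |
    Select-Object FullName, Length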

Tip: Here's a PowerShell gotcha to keep in mind. Every expression in PowerShell that produces output places that output in the pipeline. This can lead to pretty weird debugging issues if you aren't careful. I had more than one case where a function was returning more than I wanted because I was calling a command that placed things in the pipeline without realizing it. There are two ways to avoid this, however: assign the output of the command to a variable, or redirect the output to $null (i.e. do-something > $null).
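Here is a contrived example of the gotcha; the function names are made up for illustration:

function New-BackupFolder($path) {
    mkdir $path        # mkdir emits a DirectoryInfo object into the pipeline...
    return $true       # ...so the caller actually receives an array, not just $true
}

function New-BackupFolderFixed($path) {
    mkdir $path > $null   # suppress the unwanted output (or assign it to a variable)
    return $true
}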

PowerShell's object pipeline nature, along with its rich set of built-in commands known as cmdlets, makes for a perfect system for administrative computer tasks. There are cmdlets for accessing PowerShell providers such as the file system and the registry, as well as access to WMI objects, COM objects, and the full .NET 2.0 Framework. I've seen examples of everything from simple file-parsing scripts to a small but complete HTTP server written in PowerShell in just a few lines of code. To me it appeared to be the perfect language for scripting a new backup process. However, PowerShell does not offer support for creating NTFS hard links either. For this I would need to extend PowerShell.
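A few one-liners hint at that breadth; these use only stock cmdlets and standard COM and .NET types:

Get-ChildItem HKLM:\Software                                          # the registry exposed as a provider
Get-WmiObject Win32_LogicalDisk | Select-Object DeviceID, FreeSpace   # WMI
$shell = New-Object -ComObject WScript.Shell                          # COM
[System.IO.Path]::GetTempPath()                                       # the full .NET 2.0 Framework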

Extending PowerShell through custom C# objects and P/Invoke

Starting with Windows XP there is a new API for creating hard links, CreateHardLink. In previous versions of Windows, creating hard links was somewhat of a black art: you had to use the complex and sparsely documented Win32 Backup APIs. It could be done, and there are examples of how to do it out there, but it was not for the faint of heart. The CreateHardLink API solves that, making it almost trivial to create hard links on NTFS. Furthermore, unlike fsutil, the CreateHardLink API fully supports creating hard links on remote network NTFS drives. PowerShell cannot easily call native APIs on its own, though. To do that, you need to extend PowerShell with a bit of .NET code.

PowerShell is very easy to extend. You can write complete cmdlets, objects that fully plug into the PowerShell pipeline framework, or you can just create simple .NET objects that PowerShell can load and invoke thanks to its ability to access the .NET Framework.

Using C# and a bit of P/Invoke, it was almost trivial to solve the problem of not being able to create hard links in PowerShell (and .NET) by writing a simple object that called the Win32 CreateHardLink API. Once that was done, I could easily create my new .NET object in PowerShell and use it to create all the hard links I wanted. Now I could build a more complete backup script from the ground up using PowerShell.

If you'd like to access the CreateHardLink API in PowerShell or .NET, here is a C# code snippet to help you. Simply create a new class in a .DLL and add these methods. I added the public method as a static member since it does not require any state from the class. This also makes it very easy to call from PowerShell.

// Requires "using System.Runtime.InteropServices;" at the top of the file.
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
internal static extern int CreateHardLink(string lpFileName, 
    string lpExistingFileName, IntPtr lpSecurityAttributes);

// Creates an NTFS hard link at strTarget pointing to the same file data as strSource.
static public void CreateHardlink(string strTarget, string strSource)
{
    // The Win32 API returns zero on failure; surface the error as a .NET exception.
    if(CreateHardLink(strTarget, strSource, IntPtr.Zero) == 0)
    {
        throw new System.ComponentModel.Win32Exception(Marshal.GetLastWin32Error());
    }
}

To call this code from PowerShell, you simply load the .NET assembly and then call the static method on your class. Note that this will throw an exception if it fails, so make sure you have a PowerShell trap handler somewhere in your script.

# load the custom .NET assembly (redirect to $null to keep the returned Assembly object out of the pipeline)
[System.Reflection.Assembly]::LoadFrom('YourLibrary.dll') > $null

# create a hard link
[YourLibraryName.YourClass]::CreateHardlink($Target, $Source) > $null
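Since a failed call throws a Win32Exception, a minimal trap handler (PowerShell v1 style) might look something like this:

trap [System.ComponentModel.Win32Exception] {
    Write-Host "Creating the hard link failed: $($_.Exception.Message)"
    continue    # or use 'break' to stop the script instead
}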

Whoops, that file is in use

There was still one more issue to tackle before I could write a robust backup system: accessing files that are in use. Starting with Windows XP, Microsoft introduced a new system for accessing files that are currently in use: the Volume Shadow Copy Service (VSS for short, but not to be confused with Microsoft's VSS source control system).

One of the ideas behind VSS is that, when requested, the OS will make a read-only copy of the drive, a snapshot frozen in time, available to a backup program. Other programs can continue to change the original disk files, but this shadow copy, or snapshot, will remain frozen and completely accessible to the program that created it. Furthermore, when a backup program requests that a shadow copy be created, the OS can coordinate with shadow copy providers to ensure that the data on the disk is in a consistent state before the shadow copy is created. This ensures that the files the backup program has access to are in a consistent enough state on disk to be backed up. This is especially useful for files that are either always open or always changing, like the system registry, user profiles, Exchange, or SQL databases. Once the backup program is finished with this temporary read-only shadow copy, it releases it and it disappears from the system. By using the VSS system, backup programs can gain access to every file on the drive even if they are exclusively in use by other programs. For me it was essential to use VSS with any backup process I implemented.

There were a few tough problems though. On Windows XP these VSS snapshots are very temporary, in that they only exist for as long as you hold a reference to them via COM. Once released, they auto-delete themselves. And unlike VSS on Windows Server 2003, they cannot be exposed as a drive letter for easy access. You have to access them via the native NT kernel's method of addressing NT namespace objects, the GLOBALROOT namespace. On XP, when you ask the VSS service to create a snapshot, what you get is an NT GLOBALROOT path that looks like this: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1. Unfortunately, this is something that not even the native Windows command shell fully understands, and if you try to access it from PowerShell or .NET you'll get an exception telling you that you really shouldn't be accessing internal NT paths. To solve this I would need another bit of custom code to extend PowerShell.

VSHADOW.EXE and exposing a snapshot as a drive letter

VSHADOW is a sample tool that is part of the VSS SDK. It is a command line interface to the VSS API. By using this tool you can create and release VSS snapshots at will. It even has a way around the COM auto-destruction of snapshots on Windows XP: it can call an external program once the snapshot has been created, so that you can access the snapshot while VSHADOW is still keeping it alive. It will even create a set of environment variables for you with the names of the GLOBALROOT shadow copies that it has created. This still didn't solve my problem of not being able to access them via PowerShell (or robocopy, for that matter), but having this source code was a good start.
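A typical invocation looks something like this (the flags come from the VSS SDK sample; setvars.cmd and backup.cmd are placeholders for whatever you want generated and run):

# snapshot C:, write the shadow copy names to setvars.cmd, and run backup.cmd
# while VSHADOW keeps the snapshot alive; it is released when vshadow exits
vshadow.exe -script=setvars.cmd -exec=backup.cmd C: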

All physical devices in Windows, like hard drives, exist in the GLOBALROOT namespace. It is only through device name mapping that we can access them via their friendly DOS names like C:, D:, etc. Normally the OS creates these device mappings automatically at start up or whenever a new device has been connected. VSS snapshots, however, don't automatically get recognized and mapped. Mapping a friendly name to a VSS snapshot has to be done directly with the Win32 DefineDosDevice API. By using this API you can create and remove DOS device mappings to VSS snapshots on the fly, even on Windows XP. But since VSS snapshots are temporary, you have to manage them carefully or the system can become unstable.

Creating a VSS snapshot and mapping it to DOS device names is well beyond what I wanted to try to do in C#. Luckily for me, the VSHADOW C++ source code was written in a very reusable manner and I could easily reuse it by wrapping a COM object around it.

The not so nice COM interop experience with .NET 2.0

Creating a snapshot is not the simplest of procedures. You have to query for the list of VSS writers, map them against the target volume, determine which ones to include in the process, and finally request that the snapshot be created. You have to hold on to the VSS COM interface to keep the snapshot alive on XP for the duration of its use. When you are done, you have to release it in a controlled manner, or the VSS system can completely degrade and, in most cases, require a system restart to recover. It is also not the fastest process, something that would come back to bite me later. However, the VSHADOW C++ source was written in such a way that it was very easy to turn into a COM object using ATL. It was as simple as creating a new ATL COM object project in Visual Studio and including the core VSHADOW source files in the project. Once I had it building as a COM object, it didn't take me long to put a .NET-friendly interface on this new COM object that exposed methods to create and destroy VSS snapshots as well as map them to DOS device names.

PowerShell has native support for creating and calling COM objects that is even easier than in other .NET languages. There is no need to create .NET interop classes; you just dynamically create the COM object and use it much like you would in VBScript. Once I created my new VSS COM object, it was trivial to create VSS snapshots on the fly and map them to DOS device names from PowerShell. With my new VSS COM object I now had complete access to VSS snapshots from any tool that could access a standard drive. It has some limitations, but for this backup process it works very well.
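Creating and using such a wrapper from PowerShell looks roughly like this; the ProgID and method names below are hypothetical stand-ins for whatever your own COM object exposes:

$vss = New-Object -ComObject 'MyVssHelper.SnapshotManager'   # hypothetical ProgID
$vss.CreateSnapshot('C:\')                                   # hypothetical method names
$vss.MapToDriveLetter('B:')
robocopy B:\Data D:\Backups\Current /MIR /B                  # back up from the frozen snapshot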

Releasing the VSS snapshot in PowerShell however was another story. There is no clean way that I can find to force a created COM object to be released in PowerShell. You have to wait for the .NET garbage collector to do its thing which is usually not until the PowerShell process is exiting. My new COM object had its clean-up code in the COM object's Release method so that when it was released it would clean up the VSS state in the proper way, ensuring that the system remained stable. Unfortunately for me relying on a COM object's Release method to work during the .NET shutdown process proved to be one huge headache.

After many, many hours of debugging and not really believing what I was seeing I finally had to accept what was going on. From what I was seeing and from the research I have done it is my understanding that Finalizers in .NET, which are called when an object is being destroyed and which are also responsible for calling a COM object's Release method in PowerShell, are not guaranteed to complete when a process shuts down. Usually this is not a problem as the process is going away anyway. It is a problem however when you have native resources to release.

What I was seeing, and not believing for literally hours and hours, was that in the middle of my COM object's Release method the PowerShell process would just exit normally. No exceptions, no faults, nothing - just poof, it's gone. And every time it did this it would leave the VSS system in such a state that the machine had to be restarted, because I was never given the chance to properly execute the VSS clean-up code, which can be a lengthy process. It seems that the PowerShell shutdown process was timing out my clean-up code. It was a complete mess and one that I still cannot believe is acceptable, but apparently to the folks who created .NET it is (you can read about it here in way more detail than anyone should have to know; just search for "timeout" and "watchdog" on that page). The thought that external native code can have the plug pulled just blows me away.

The fix was rather simple once I realized that I cannot count on my COM object's Release method to always complete. I had to move all critical clean-up code and put it in a public method that my PowerShell script would always call. Luckily PowerShell has pretty decent error handling and it wasn't too hard to ensure that I always called the clean-up method on my COM object before PowerShell normally terminates. I'm still not thrilled about this though. I would have preferred that my COM object be allowed to clean up after itself as necessary.
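The resulting pattern looks roughly like this, again with hypothetical names; the key point is that the clean-up method is called explicitly on both the error path and the normal path:

# $vss is the snapshot wrapper object from the earlier sketch
trap {
    $vss.Cleanup()    # explicit clean-up method on the wrapper; don't rely on COM Release
    break
}
# ... create the snapshot, map it, run robocopy ...
$vss.Cleanup()        # and call it on the normal path as well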

The moral of this story is that you are responsible for all complex clean-up even when calling native code. Don't depend on the .NET framework to always play nice.

Now that I had this behind me I had all the pieces that I needed: a robust file copy tool, a powerful scripting language, the ability to create hard links, and full access to Volume Shadow Copy snapshots.

 

In part four I'll cover the process overview and implementation details of creating the intelligent mirror backup process that I chose to be the foundation of my new backup strategy.


Comments (14) -

Anonymous
12/10/2006 1:58:01 PM #

I was looking into backing up my home system using a Powershell script and I stumbled upon your site here.  Great ideas throughout.  Your idea with links works great - I tried the C# program above on my machine.  I am looking forward to part 4!

Farhan Ahmad
12/10/2006 5:59:47 PM #

Hi,

I have been looking for a backup solution for my home network for the last few weeks.  Unfortunately I haven't been able to find anything good yet :-/  So far what you have described looks promising, I can't wait for more implementation details. Keep up the good work, and thanks for sharing your experience!

Farhan

Lawrence Mok
3/22/2007 4:45:58 AM #

Before reaching your site, I was spending nights these past 2 weeks trying to figure out incremental backup using robocopy and hardlinks exactly like what you have described in the paragraph beginning with "Together with robocopy and a bit of creative CMD scripting....."

It was really a nightmare to make those strings work. I'm near success now, just figuring out how to "use robocopy to generate a list of the files that had changed since the last backup, including files that were no longer present. From that listing deleted those files from the newly created mirror backup", as obviously robocopy is not breaking the hardlinks in the backup copy when running.

Could you share your batch scripts with us too? It would be very helpful if I could see your scripts before writing my own.

And here's part of the script I've got so far..

REM requires SETLOCAL ENABLEDELAYEDEXPANSION earlier in the script for the !variable! syntax below
REM create empty folder structure
ROBOCOPY %backup_dest_dir%\latest %backup_dest_dir%\%backup_start_datetime% /MIR /R:2 /W:1 /XF *

REM create hardlinks for files
ECHO Creating hardlinks
FOR /R %backup_dest_dir%\latest %%a IN (*) DO (
  SET backup_source_file=%%a
  SET backup_dest_file=!backup_source_file:%backup_dest_dir%\latest=%backup_dest_dir%\%backup_start_datetime%!
  MKLINK /H "!backup_dest_file!" "!backup_source_file!"
)

Stephen Thomas
8/21/2007 4:39:02 AM #

Last year, I implemented the same kind of hard-link-based redundancy-avoiding Robocopy backup you're talking about for a school server running Windows Server 2003; like yours, my prototype system was based on CMD scripts and Robocopy and FSUTIL HARDLINK.  It worked fine - it wasn't even terribly slow - but once I had a few tens of sets of hardlinked trees on my backup drive and went to try to delete the oldest one, the deletion took literally hours.  Didn't matter whether I did it with Explorer, or rmdir /s, or with Cygwin; deleting the oldest tree was always at least twice as slow as running my script to build an entirely new one.

I came to the conclusion that NTFS is simply not very good at dealing with filesystems that have millions of directory entries in total.

Simply turning on Server 2003 Volume Shadow Copy on the backup drive, and running a regular Robocopy job to back up files that had changed on the main drive, gives me similar levels of space efficiency, and a backup drive that's actually usable; and the Previous Versions client makes the backups available to workstations in much the same way as my old hard-linked trees did.

If there's some easy way to un-cripple XP's shadow copy stuff so it works like what's in Server 2003, that would be good to know about.

David Jade
8/21/2007 5:42:41 PM #

I too ran into the issue of slow directory tree deletions. One of the things that I did was to only do tree deletions locally on the backup server and to use "rd", which for whatever reason is magnitudes faster than anything else at directory removal. To do this I had to use the hacky solution of using PSEXEC to run "rd /s" remotely. A terrible hack, but it's been working for months. Someday I'll make a service that I can trigger to do the work locally, but for now this works.

Julian Ross
7/22/2008 8:56:19 AM #

Great article!
A couple of questions... Are you still using NTbackup to create the system state?

Also, just out of curiosity.. if I understood correctly, the main reason for not using rsync was its inability to replicate all file attributes etc., right? (Can it at least create hardlinks on NTFS through cygwin?)

Again, thanks
Julian

David Jade
7/23/2008 11:00:17 AM #

Yes, on Server 2003 & XP systems I still use NTBackup for backing up system state. On Vista there is no option other than to image the system disk (I use something called DriveSnapshot rather than the Vista built-in stuff).

In fact I image all of my disks periodically as well as using my backup process and just used an image backup plus my nightly data backups when my laptop drive recently failed. I was back up and running in under 2 hours, once I had the new drive.

And yes, I don't use rsync because it cannot back up all of my data's state. It does not support NTFS ACLs, alternative streams, junctions, or symbolic links. It will however properly create NTFS hardlinks.

david

Julian Ross
8/13/2008 2:56:28 AM #

Would using DefineDosDevice to assign a drive letter to a shadow copy mean that \\?\GLOBALROOT paths would be accessible from C#/managed code through the new drive letter?

This hasn't worked yet since I haven't managed to make DefineDosDevice work consistently yet...  (Drive shows up without a name, as "Disconnected" or many times does not show up at all from Explorer).. Not sure if it's even meant to show up or just be available from the command prompt only.

I want to make sure it's actually worth it (re. C# question above) before I keep banging my head :S

Julian Ross
10/10/2008 12:51:45 AM #

Have you noticed that: if you put CreateHardlink() in a loop to hardlink for example 20,000 files, then the time taken to complete the process increases in proportion to the number of hardlinks the files already have?

(in other words, each execution of the loop takes longer and longer)

Felix Brodmann
11/20/2008 3:07:19 AM #

I made the hardlink backup with the unix command cp (with cygwin), so I need no script, only one command.

cp -lrv /cygdrive/xx /cygdrive/yy

I see another problem with robocopy: for changed files, the old file must be deleted first to break the hardlink before it copies the new file over the old one.

Thorsten Albrecht
3/30/2009 3:46:39 AM #

An alternative to developing a hardlink-based backup yourself is the free open source software package "rsnapshot":

http://www.rsnapshot.org/

It works without any problem under cygwin.

Thorsten

David Jade
3/30/2009 10:50:29 AM #

True, but like all cygwin/rsync solutions it lacks support on a Windows system for:

* Utilizing VSS snapshots so that in-use files can be safely backed up
* Backing up using the Windows backup process mode, which allows you to traverse security permissions
* Backing up NTFS security info
* Backing up NTFS alternative streams

David

Júlio Maranhão
4/4/2009 8:18:13 PM #

About cygwin/rsync solutions:

1) VSS snapshots can be made by means of vshadow.exe and dosdev.exe. They allow any program to use VSS promptly.

2) No ADS backups. EFS uses ADS, but I do not recommend EFS in a backup. If you lose the user account that encrypted the files, you are lost. Just use TrueCrypt or another password-based solution.

3) Windows backup process mode. Just use an Admin account and punish users that change ACL to block Admins.

4) NTFS ACLs. Neither robocopy (XP) nor xxcopy handles ACL changes as "files need to be copied". As the data is much more important than the ACLs, I stick with rsnapshot, rdiff-backup, ntbackup or a commercial enterprise-grade solution, depending on my requirements. They are apps that just work.

Robocopy is a buggy app (XP). Maybe a total reworking using PowerShell and/or .NET and/or Win32/64 API can make an NTFS-compatible rsync type app.

Cheers.

Júlio

Stephen Thomas
10/2/2009 7:38:25 AM #

Grrr. I'm back on the forest-of-hardlinks trail again, having pretty much given up on persistent volume shadow copies.

I'd developed quite a nifty script for deleting old shadows, which, if shadows worked as advertised, should have given me a Time Machine-like set of backups (30 dailies, 13 weeklies and 21 monthlies), but I have been bitten *twice* now by Windows just summarily deciding to delete *all* the existing shadow copies and start over, for no reason I can understand.

It might have something to do with large amounts of change in the "live" drive contents compared to that of the shadows, but with Windows's usual inscrutable failure to log things properly I doubt I'll ever find out.

Also, the Previous Versions workstation client regularly fails to work as it should: the Previous Versions tab often mysteriously goes missing from file or folder property sheets that should have one, even when all shadow copies are present and correct on the backup drive. This makes the concept of self-service backup retrieval mostly unworkable.

So here we go, NTFS. Let's see whether you really can be persuaded to deal reliably with a directory tree containing millions of entries.

Grrrr.

It's almost enough to make a body virtualize the entire environment and use the VM host's volume snapshot tools to do what NTFS should be doing natively.
