Automated Backups with Existing Tools
by Peter Hickman, 02/10/2004
The big difference between the world of Unix and that other place is that in the other place they have applications and we have tools.
Applications require learning, not just parameters and keystrokes but a whole way of working. If the programmer who created the word processor thinks you should spell-check your documents before you save them, then check them you shall.
Tools, on the other hand, also require learning, but instead they ask "What do you want to do?" The choice is yours. And with a good toolbox full of these, anything is possible.
Backup -- The Job Nobody Wants to Do
To back up my iMac, I bought a small FireWire hard disk and used Carbon Copy Cloner to make my backups. Although this software is simple and effective, my setup had two flaws. First, the backups only happened when I remembered to initiate them, and second, I had only one backup, dating from whenever I last made it. I wanted something that addressed these issues, so I sat down and listed my requirements:
- Automatic backup, preferably when I was not using the Mac.
- Several generations of backup available.
- As cheap as possible.
I did look at some commercial software. But in addition to the cost of the software itself, I would need to get a bigger drive and learn how to use the application (see paragraph 1). The application and I did not get along; it soon found itself on eBay, and I was back where I started, along with the investment of a 120 Gb disk. It was time to dig into my existing toolbox and see what we could cook up.
Apple already provides a tool to make backups -- ASR (Apple Software Restore), which allows one volume to be cloned onto another. In addition to this, we also have the multifunctional hdiutil, which can create, mount, and unmount disk images. With these and a handful of standard Unix tools, we are well on our way to a happy ending.
We need to create a disk image of the required size on the external drive, mount it, clone the source drive into the image, and unmount it.
Writing the Code
Let's write the code (the line numbers here are just for reference).
First we need to name a few things. SOURCE is the volume
that we are backing up -- in this case root. FILEDEST is
the volume where the backup will be stored. All disk images require
a VOLUMENAME, otherwise they get called untitled, which is really not a lot of use to anyone, so we set it to backup_YYYY-MM-DD and use it as the basis of the IMAGENAME.
1:#!/bin/sh
2:############################################
3:# Set some variables
4:############################################
5:### The name of the volume we wish to backup
6:SOURCE='/'
7:### The volume onto which we are putting the backup
8:FILEDEST='/Volumes/Overflow'
9:### Creating the backup disk image file and volume name
10:VOLUMENAME=backup_`date +%Y-%m-%d`
11:IMAGENAME=$FILEDEST/$VOLUMENAME.dmg
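A slightly more defensive variant of these variables is sketched below. It quotes every expansion so that a destination path containing spaces still works, and it adds a pre-flight check that the destination volume is actually mounted. The `dest_ready` helper and the environment override for `FILEDEST` are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/sh
SOURCE='/'
# Allow the destination to be overridden from the environment (assumption).
FILEDEST=${FILEDEST:-/Volumes/Overflow}
VOLUMENAME="backup_$(date +%Y-%m-%d)"
IMAGENAME="$FILEDEST/$VOLUMENAME.dmg"

# Hypothetical pre-flight check: the destination must be an existing,
# writable directory (an unmounted drive leaves no /Volumes entry).
dest_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

if dest_ready "$FILEDEST"; then
    echo "Will write $IMAGENAME"
else
    echo "Destination $FILEDEST is missing or not writable" >&2
fi
```

Checking the destination first avoids creating a large image file on the boot volume by accident when the external drive is not plugged in.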
Next we need to know how big to create the image. Line 15 tells us the
disk usage in Mb of the SOURCE drive, line 16 gets the
actual Mb used and line 17 adds 5% to the size just to make sure that
there is actually enough room.
12:############################################
13:# Find out the size we require (disk usage in Mb + 5%)
14:############################################
15:SIZE=`df -m $SOURCE | grep '^/'`
16:SIZE=`echo $SIZE | cut -d" " -f3`
17:SIZE=`dc -e "$SIZE $SIZE 20 / + n"`
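For systems where `df -m` is not available, the same figure can be derived from kilobytes. The sketch below assumes `df -Pk` (POSIX output format) is supported and uses shell integer arithmetic instead of `dc`; the helper names `mb_with_margin` and `used_kb` are hypothetical.

```shell
#!/bin/sh
# Convert used kilobytes to megabytes plus a 5% margin.
# Integer arithmetic only; both steps round up so the image is never too small.
mb_with_margin() {
    kb=$1
    mb=$(( (kb + 1023) / 1024 ))      # round up to whole megabytes
    echo $(( mb + (mb + 19) / 20 ))   # add 5% (one twentieth), rounded up
}

# Used space of the volume holding a path, in kilobytes.
# -P keeps each filesystem on one line; column 3 is "Used".
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

SOURCE='/'
SIZE=$(mb_with_margin "$(used_kb "$SOURCE")")
echo "Image size for $SOURCE: ${SIZE} MB"
```

Rounding up at both steps errs on the side of a slightly larger image, which matches the intent of the safety margin.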
Next we create the image file on line 21 and mount it on line 25. As
we mount (attach) the new volume we are grabbing the device name, /dev/disk1s4
for example, that we will require when we unmount the image at the end
of the script.
18:############################################
19:# Create the image
20:############################################
21:hdiutil create -quiet -megabytes $SIZE -fs HFS+ \
-volname $VOLUMENAME $IMAGENAME
22:############################################
23:# Mount the image and capture the device name
24:############################################
25:DEVICE=`hdiutil attach $IMAGENAME | grep $VOLUMENAME | cut -d" " -f1`
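Matching the volume name with a plain `grep` can pick up more than one line if one name is a prefix of another. A stricter way to pull out the device is sketched below: it matches the mount point exactly. The `device_for_volume` helper is hypothetical, and the sample text only illustrates the general shape of `hdiutil attach` output (device, content type, mount point); real output will differ.

```shell
#!/bin/sh
# Print the device node whose mount point is /Volumes/<volume name>.
# Reads `hdiutil attach` output on stdin. Assumes the volume name
# contains no whitespace, as is true for backup_YYYY-MM-DD.
device_for_volume() {
    awk -v mp="/Volumes/$1" '$NF == mp { print $1 }'
}

# Illustrative sample only; device numbers and layout will vary.
sample_output='/dev/disk1          Apple_partition_scheme
/dev/disk1s1        Apple_partition_map
/dev/disk1s2        Apple_HFS           /Volumes/backup_2004-02-10'

DEVICE=$(printf '%s\n' "$sample_output" | device_for_volume backup_2004-02-10)
echo "Device to detach: $DEVICE"

# In the real script (Mac only) this would become:
#   DEVICE=$(hdiutil attach "$IMAGENAME" | device_for_volume "$VOLUMENAME")
#   [ -n "$DEVICE" ] || { echo "attach failed" >&2; exit 1; }
```

Testing for an empty `DEVICE` before running asr stops the script early if the attach step failed.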
Now we can back up the SOURCE into the image file. The full
path for asr is given so that the command can be found
when we run it under the more limited PATH that cron provides.
26:############################################
27:# Do the actual backup
28:############################################
29:/usr/sbin/asr -source $SOURCE -target /Volumes/$VOLUMENAME -noprompt
Finally we need to unmount the image using the DEVICE we
captured on line 25.
30:############################################
31:# Unmount the image
32:############################################
33:hdiutil detach -quiet $DEVICE
Making Things Even Easier
So we have a tool that will back up a designated volume into an image
file. It has to be run as root, so we enter
sudo backupimage, and almost everything we wanted is ours.
Now we want to automate it so that we no longer have to remember to do this
chore, and it can execute when we are tucked safely in bed. In the world
of Unix we have cron, which runs tasks for us at regular
intervals. So we need to set up a cron entry as root to run our script
once a week (or the frequency you prefer).
[peterhickman]$ sudo crontab -e -u root
0 0 * * 0 /usr/local/bin/backupimage
Should things go wrong, cron will mail you if it can, but if sendmail is not configured or running on your machine
(and by default it is not running) then look at /var/mail/root,
which will tell you what the problem was.
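If mail delivery is not set up, one alternative is to have the job keep its own log. The wrapper below is a minimal sketch of that idea; the `run_logged` helper and the log location are assumptions, not something the original script provides.

```shell
#!/bin/sh
# Assumed log location; override by setting LOGFILE in the environment.
LOGFILE=${LOGFILE:-/tmp/backupimage.log}

# Run a command, appending its output and exit status to the log file,
# so failures are visible even when cron cannot send mail.
run_logged() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') starting: $*" >> "$LOGFILE"
    "$@" >> "$LOGFILE" 2>&1
    status=$?
    echo "=== finished with exit status $status" >> "$LOGFILE"
    return $status
}

# Example call from a cron-driven script:
#   run_logged /usr/local/bin/backupimage
```

A simpler variant is to redirect in the crontab line itself, for example by appending `>> /var/log/backupimage.log 2>&1` to the entry (the log path here is again only a placeholder).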
So there we have it. All we ever needed to do our backups, other than the external disk, was in our toolbox all the time. External hard disks are cheap and there is no reason to buy additional software.
Other Issues
It's important to remember that your external hard disk, although large, is not infinite. Every now and then check that you're not going to run out of space. That too could be automated, but would detract from the simplicity of our current solution -- another problem for another day. Also just because the script runs without error does not mean that it has worked; open your backup image file and rummage around. Are the files there? Are they correct?
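The free-space check mentioned above could be automated along these lines. This is only a sketch: `free_mb` and `has_room` are hypothetical helpers, and `df -Pk` is assumed to be available.

```shell
#!/bin/sh
# Free space on the volume holding a path, in whole megabytes.
# -P keeps each filesystem on one line; column 4 is "Available".
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Succeeds only if at least needed_mb megabytes are free under path.
has_room() {
    needed_mb=$1
    path=$2
    [ "$(free_mb "$path")" -ge "$needed_mb" ]
}

echo "Free on /: $(free_mb /) MB"

# In the backup script, before creating the image:
#   has_room "$SIZE" "$FILEDEST" || { echo "not enough space" >&2; exit 1; }
```

This only guards against running out of room; it does not replace opening the image and checking that the files are really there.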
You can automate your chores but you cannot automate your responsibility.
Final Thoughts
Thanks to solid built-in tools like these, Macs are putting the power back into the hands of users who like to fiddle under the hood. Almost everything we need is there. We just forget sometimes.
Peter Hickman is currently working as a programmer for Semantico, which specializes in online reference works and Access Control Systems. When not programming or reading about programming he can be found sleeping.
Showing messages 1 through 44 of 44.
-
incremental backup without archive bit?
2005-08-07 07:07:38 gebseng [Reply | View]
-
incremental backup without archive bit?
2005-12-07 09:43:12 mp459 [Reply | View]
I, too, use rsync. I do not have an external HD, but I _do_ have an external DVD burner, and internal CD burner. I do a full backup to my DVD and then incrementals (differentials, actually) using rsync to a disk image which I then burn to CD. Using the --compare-dest=the_DVD arg in rsync, I can keep a full backup on a DVD and differentials on CDs.
-
incremental backup without archive bit?
2005-08-07 13:38:52 peterhickman [Reply | View]
Well for each backup archive you could create a catalog of the files that are in the archive along with their size, creation and modification dates and a sha1 checksum. Then when creating an incremental archive you only have to compare the files you are processing to the catalog to decide which files to include.
You do not necessarily need to keep the old archives available but you would need to keep the last full catalog to hand.
In fact there is a tool in OS X called mtree that does just that. It builds a catalog of the filesystem and can be used to compare the catalog against the current filesystem and report any differences such as files added or removed, file sizes or checksums changing. From this report you could then decide which files to add to your incremental archive.
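The catalog idea can be sketched with standard tools. The version below is an assumption-laden illustration: it uses `cksum` (a CRC, not sha1) only because it is in POSIX, it ignores dates, and the helper names are hypothetical. Because it compares content rather than timestamps, a newly added file with an old creation date still shows up as changed.

```shell
#!/bin/sh
# Write "checksum size path" lines for every regular file under a directory.
# Sorted in the C locale so that comm can compare two catalogs.
make_catalog() {
    find "$1" -type f -exec cksum {} + | LC_ALL=C sort
}

# Print the paths that are new or whose checksum or size changed
# between an old catalog file and a new catalog file.
changed_files() {
    LC_ALL=C comm -13 "$1" "$2" | cut -d' ' -f3-
}

# Usage (catalog paths are placeholders; keep them outside the source tree):
#   make_catalog "$HOME/Documents" > /tmp/catalog.new
#   changed_files /tmp/catalog.old /tmp/catalog.new
```

Only the last full catalog needs to be kept to hand, as the comment says; the old archives themselves can be stored away.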
-
New error under OS X 10.4.x
2005-05-27 07:50:13 todd_a [Reply | View]
I've found the script in this article very useful and have used it a number of times. However, since upgrading to 10.4 I now get an error I wasn't getting before:
Validating target...done
Validating source...done
Validating sizes...done
Restoring...
Copying "/" (/dev/disk0s3) to "/Volumes/backup_2005-05-26" (/dev/disk2s2)...
asr: did not copy blessed information to target, which may have missing or out-of-date blessed folder information.
I'm currently on 10.4.1 and didn't run the backup while on 10.4.0 so I'm not sure if the error started at 10.4.0 or 10.4.1.
Is this error something I need to be concerned about?
-
Interesting Article
2004-03-22 11:08:12 duckfoo [Reply | View]
But couldn't you save yourself some time and create a sparse disk image? A sparse disk image, if I understand correctly, can dynamically increase in size as needed, and should make your shell script easier. -
Interesting Article
2004-03-24 13:31:04 peterhickman [Reply | View]
There is a simpler solution that has the image file built up as it goes.
hdiutil create -srcfolder $SOURCE -fs HFS+ -format UDRW -volname $VOLUMENAME $IMAGENAME
This does not create a compressed image but avoids the whole 'create a blank image' thing. It makes the script shorter and more portable, however the whole backup process actually takes longer (1 hour 6 minutes versus 52 minutes for my 10 Gb backup).
For small amounts of data, say 10 to 40 Gb, it doesn't take too long, up to 4 and a half hours. But when the amount of data gets past 70 Gb it becomes the difference between 6 hours and nearly 8 hours.
Did I really just say that 40 Gb was a small amount of data? -
Re: There is a simpler solution
2005-06-27 11:45:54 LHaim [Reply | View]
Hello Peter,
Thanks for posting your script ("original" & "simpler"). I tried them both, and failed.
First, "simpler": here's what happened after running the script (writing out to a FW external with 85GB free space; data is 41 GB)
prompt> backupscript_simpler
Initializing...
Creating...
Copying...
.......
Finishing...
hdiutil: create failed - Bad file descriptor
prompt>
-----
Next "original"
prompt> backupscript_original
(script crashed so ran line-by-line from shell)
prompt> /bin/sh
sh-2.05b# SOURCE='/'
sh-2.05b# FILEDEST='/Volumes/MAXTOR_2/Mac_backups'
sh-2.05b# VOLUMENAME=backup_`date +%Y-%m-%d`
sh-2.05b# IMAGENAME=$FILEDEST/$VOLUMENAME.dmg
sh-2.05b# SIZE=`df -m $SOURCE | grep '^/'`
sh-2.05b# SIZE=`echo $SIZE | cut -d" " -f3`
sh-2.05b# SIZE=`dc -e "$SIZE $SIZE 20 / + n"`
sh-2.05b# hdiutil create -quiet -megabytes $SIZE -fs HFS+ -volname $VOLUMENAME $IMAGENAME
sh-2.05b# DEVICE=`hdiutil attach $IMAGENAME | grep $VOLUMENAME | cut -d" " -f1`
load_hdi: IOHDIXControllerArrivalCallback: timed out waiting for IOKit to finish matching.
hdiutil: attach failed - No such file or directory
sh-2.05b#
---
Result: no backup.
Any comments, observations, suggestions?
Thanks!!! -
Re: There is a simpler solution
2005-06-28 14:00:14 peterhickman [Reply | View]
Not seen that one before. You could try setting SOURCE to some smaller directory such as /Users/fred/Documents or whatever and running it with that.
Also try removing the -quiet option from the hdiutil line and see if that tells us anything interesting.
After the hdiutil create, does the IMAGENAME exist? You should be able to mount it even if it is empty.
-
one liner for SIZE
2004-03-09 11:33:50 rread [Reply | View]
SIZE=`df -k / | awk '/^\// {printf("%d\n", (($3 / 1024) * 1.05))}'`
-
been using rsync with no problems
2004-02-16 17:51:42 darndog [Reply | View]
cron calls incremental backups of my users folder every morning at 6am except for Mondays, when the destination is synced to source. Nice to see so many free solutions to personal backups on OS X though. dD
-
Compression?
2004-02-15 17:22:05 lonney [Reply | View]
I'm going to get a FireWire disk soon for backups of my shiny new PowerBook 12" DVI (it's my first Apple and I love it to bits).
Nice article. I recently asked this question on IRC in one of the Mac channels, but the answer was just to be lazy and use CCC.
This is good; it's nice and simple, just what I was after.
Are disk images compressed? If not, is it possible to compress them and still work with them in the same way?
-
Compression?
2004-02-16 15:05:06 peterhickman [Reply | View]
By default the images created are compressed, but as my script creates a blank image of a fixed size first, the compression is to no avail. However, as you may have seen suggested in the feedback, there is another way of creating an image. Keep lines 1 to 11 of my original script, delete the rest, and then add the following line.
hdiutil create -srcfolder $SOURCE -fs HFS+ -volname $VOLUMENAME $IMAGENAME
If you hold on for 24 hours, I am testing this and will tell you how it goes. I don't expect it will be amazingly smaller (but we shall see), but it should be quicker.
And you should be able to use the resulting image files in the same way as before. -
Compression?
2004-02-16 23:57:40 peterhickman [Reply | View]
I take that back; it does indeed compress well: my 11Gb went down to 5.5Gb. However, and there is always one, the backup took longer, almost twice as long, and the image file takes a damn long time to mount.
Presumably uncompressing a 5Gb image takes some time.
-
no "-m" flag for Jaguar's df command
2004-02-14 18:15:13 dorkypants [Reply | View]
df -m must be a Panther thing, it provokes a usage message on Jaguar (10.2.8). df -k reports in kbytes and the result can be divided by 1024 to get SIZE in Mbytes.
-
Try SuperDuper
2004-02-14 05:49:21 mjeb [Reply | View]
Copy, clone, rollback.
http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription.html
-
Alternative method
2004-02-13 03:46:15 Fraser Speirs | [Reply | View]
I didn't want to back up everything every time, so I adapted your method slightly. What I did was this:
1. Create a temp folder
2. Create a list of excluded directories (Music, Pictures, Movies, .Trash)
3. Use rsync to sync my home directory to the temp directory, using --exclude-from to avoid copying those big directories
4. Use hdiutil's -srcfolder option to image the temp directory
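The four steps above might look roughly like the following. This is a sketch under assumptions: the temp file locations and the `stage_home` helper are hypothetical, the exclude patterns are anchored to the top of the home directory, and a stock rsync may not carry over Mac-specific metadata such as resource forks.

```shell
#!/bin/sh
# Step 2: one exclude pattern per line. A leading slash anchors the
# pattern to the top of the directory being copied.
EXCLUDES=$(mktemp /tmp/backup-excludes.XXXXXX)
cat > "$EXCLUDES" <<'EOF'
/Music
/Pictures
/Movies
/.Trash
EOF

# Step 3: copy src into the staging directory, skipping the excluded
# directories. The trailing slashes make rsync copy directory contents.
stage_home() {
    rsync -a --exclude-from="$EXCLUDES" "$1/" "$2/"
}

# Steps 1 and 4 on a Mac (names are placeholders):
#   STAGING=$(mktemp -d /tmp/backup-staging.XXXXXX)
#   stage_home "$HOME" "$STAGING"
#   hdiutil create -srcfolder "$STAGING" -fs HFS+ -volname "$VOLUMENAME" "$IMAGENAME"
```

Staging needs enough free space for a second copy of everything that is not excluded, which is the price of keeping the image small.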
-
RsyncX, rsnapshot
2004-02-12 20:34:19 dameronm [Reply | View]
Here's something to try:
RsyncX http://www.macosxlabs.org/rsyncx/rsyncx.html
gives fast, incremental syncing with hfs support
rsnapshot http://www.rsnapshot.org/
uses rsync to create hardlinked "snapshots" of Disk/Folder state without duplication of file content. You get full backups, with previous versions available, very little wasted space, and all accessible with standard file system tools.
That should take care of all your requirements, yes? :)
-
How about this?
2004-02-12 05:41:37 rspeed [Reply | View]
I would absolutely love to see something like this that uses CVS instead of disk images. It would be a whole lot more space efficient and it would be much easier to see when changes were made. It seems like it would be a near-perfect use of the technology. -
How about this?
2004-02-12 20:51:53 jslabovitz [Reply | View]
CVS is problematic since it doesn't retain any sort of metadata (eg, resource forks, Finder info, etc.), nor does it work well with binary files at all.
The "Subversion" system might work better; I don't know.
I've been using rdiff-backup. It's a Python program wrapped around the librsync library, which enables incremental backups, optionally networked via ssh. The nice thing is that the backup location always has the full version of the current file; older revisions are stored as diffs. Rdiff-backup works great with binary files, and recent versions even handle all the Mac metadata.
Unfortunately, there's no GUI for rdiff-backup. CCC is definitely easier to use. Personally, I use both: CCC for an emergency, gotta-get-working-NOW backup, and rdiff-backup for long-term, incremental backups.
--John Labovitz
-
cron and sleep
2004-02-11 17:21:38 dejones1 [Reply | View]
How are you handling the fact that OS X does not run cron jobs if it is asleep? cf <http://docs.info.apple.com/article.html?artnum=107388>, where Apple recommends either using third party software or running the tasks manually. -
cron and sleep
2004-02-15 07:57:22 peterhickman [Reply | View]
Well I've run some tests and it doesn't handle sleep mode. -
cron and sleep
2004-02-23 16:42:34 dejones1 [Reply | View]
What will work (sort of) is to load anacron. At least that way the job, even if scheduled when the system is asleep, will be run sometime after the scheduled time. Of course, for running a backup it would be much nicer to have it kick off in the middle of the night when the system is not being used.
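Without installing anything, a similar catch-up effect can be approximated by running a small check frequently and letting it decide whether a backup is overdue. This is a sketch of that idea, not something tested against sleep behaviour; the stamp file path and the `backup_due` helper are assumptions.

```shell
#!/bin/sh
# Succeeds when the stamp file is missing, or when it was last modified
# more than the given number of full days ago.
backup_due() {
    stamp=$1
    days=$2
    [ -e "$stamp" ] || return 0
    [ -n "$(find "$stamp" -mtime +"$days" -print)" ]
}

# Hypothetical body of an hourly cron job:
#   STAMP=/var/db/backupimage.stamp
#   if backup_due "$STAMP" 6; then
#       /usr/local/bin/backupimage && touch "$STAMP"
#   fi
```

The stamp is only touched after a successful run, so a failed backup is retried on the next check rather than a week later.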
-
srcfolder
2004-02-11 11:41:45 chneeb [Reply | View]
If you use the -srcfolder or -srcdir option of hdiutil, you don't need to calculate the size, mount the image, copy the $SOURCE and unmount the image. hdiutil does everything for you. I only tried this with directories, not with whole partitions.
Christian -
srcfolder
2004-02-12 09:20:44 franiglesias [Reply | View]
Sorry. I've been reading man hdiutil (Mac OS X 10.2) but didn't find anything about -srcfolder. Is it a Panther-only feature?
-
srcfolder
2004-02-12 13:03:17 peterhickman [Reply | View]
The -srcfolder flag is available under the create option.
"-srcfolder directory specifies the image size based on the contents of directory. -srcfolder also specifies that the contents of directory should populate the resulting image. -srcfolder copies file by file, creating an optimized filesystem on the destination image (which then could be restored by asr(8)). -srcdir is a synonym for -srcfolder."
This is under Panther; not sure about Jaguar.
-
Calculation is wrong
2004-02-10 23:50:50 chrisridd [Reply | View]
The calculation on line 17 actually adds 20% to the size of the disk, not 5%! To really just add 5%, divide the disk size by 20 instead of 5.
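The arithmetic behind this can be checked with shell integer arithmetic. The `padded` helper below is hypothetical; it mirrors the "size plus size divided by N" form of the `dc` expression, using a round figure of 1000 Mb.

```shell
#!/bin/sh
# Size after adding a margin of size / divisor, in integer arithmetic.
padded() {
    size=$1
    divisor=$2
    echo $(( size + size / divisor ))
}

padded 1000 5     # prints 1200: dividing by 5 adds 20%
padded 1000 20    # prints 1050: dividing by 20 adds 5%
```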
-
Errors
2004-02-10 22:21:19 jb- [Reply | View]
I'm getting an error that it can't attach in the hdiutil attach step. The disk is still churning away and creating the 82GB file for the backups. Outside of a for/next loop to make it wait, any suggestions?
-
Errors
2004-02-11 01:42:03 peterhickman [Reply | View]
Not sure what the problem could be. For me a 12 Gb backup takes around 52 minutes, so it would take around six hours for you, all things being equal.
If you could email me your values for SOURCE and FILEDEST along with the output of df, I'll have a quick look at it.
-
Errors
2004-02-11 07:26:50 jb- [Reply | View]
It is creating the .dmg properly. However, I'm getting these errors:
hdiutil: attach failed - No such file or directory
Validating target..."/Volumes/backup_2004-02-10" is not a volume
couldn't validate target - No such file or directory.
When I do an ls the file is there but is still growing in size and I can see the disk writing (creating the empty disk image). If I wait until the image has been created and then comment out the hdiutil create line it seems to work fine (it's running now). I could add a SLEEP NNN line but I'm looking for a cleaner way.
-
Errors
2004-02-11 07:43:45 peterhickman [Reply | View]
The error makes sense if the hdiutil create is still running when it attempts to attach the .dmg. If it was still creating it then it is not formatted correctly.
I note that there appears to be a stray pipe '|' character at the end of line 21. Don't know how that got there. Is the pipe in the script you are using?
-
Why Not Ditto?
2004-02-10 22:03:01 datasetgo [Reply | View]
I just posted a brief how-to for backups using ditto rather than ASR and hdiutil. From what I understand, CarbonCopyCloner uses ditto for its backup routines. Plus, it's a one-line script - much easier to implement.
Go to my site, DataSetGo to check it out. -
Why Not Ditto?
2004-02-11 08:27:24 sharumpe [Reply | View]
The reason is that the intent was to copy to a dynamically created disk image, not to another disk as a whole. You could probably do the same thing using ditto instead of asr, though since ASR is built for cloning drives, whereas ditto is built for copying files (similar but distinct), ASR may offer benefits over ditto. (I don't know that for sure)
Mr. Sharumpe -
Why Not Ditto?
2005-06-27 11:31:21 LHaim [Reply | View]
I know it's been a long time, but I tried your "ditto -rsrcFork -c -k /path/of/src/dir /path/to/dest/archive/backup.zip" (with appropriate replacements of paths & names)
Got the following errors:
ditto: //automount/Servers: Operation not supported
ditto: //automount/static: Operation not supported
ditto: //dev/fd/3: Bad file descriptor
And ended up with a 2.55 GByte .zip file for a 60 GB HD w/41 GB on it.
Any explanations/help?
In general, I've had lots of frustration trying (and failing) to do backup on this Mac using different tools/approaches.
Thanks. -
Why Not Ditto?
2004-02-12 11:01:15 ponder [Reply | View]
Ditto is capable of creating a single archive file. No need to worry about a static-size image and ditto can compress on the fly. You can pipe the results to another application or store them locally. Here's a piece of my own backup script, run from cron:
ditto -c -z -rsrc $filesystem - | ssh backup@$DUMPHOST dd of=$filename
-
ASR vs. CCC
2004-02-10 21:14:11 fdiv_bug [Reply | View]
Thanks for posting an informative article -- /usr/sbin/asr is indeed a very handy utility. There are, however, some issues which I'd like to raise with your new backup strategy.
The first is that you mention that Carbon Copy Cloner requires you to remember to run it, which is not the case. Once you've configured a clone operation, but before you click the Clone button, click the Scheduler button. This will enable you to make a scheduled backup without you having to remember to do anything.
The second issue I have is that ASR seems to be a bit inefficient. You'll end up with multiple disk images over time, which is fine I suppose, but each one is making a full dupe of your hard drive. Setting up CCC with the proper preference settings and psync requires it to only copy what's changed, thus saving time on your backup.
Thanks again for the informative article! -
ASR vs. CCC
2004-02-11 01:34:04 peterhickman [Reply | View]
You are correct that Carbon Copy Cloner allows scheduling, that is an error on my behalf. Indeed CCC is a great tool and is very adaptable. However one of my requirements was for several generations of backups so that I could restore to a previous state.
I never worked out how to get this to work with CCC but my script did what I wanted. Being a programmer I must confess to coding up my own solution to a problem rather than looking for pre-existing solutions. -
ASR vs. CCC
2004-02-12 08:23:28 kelleherk [Reply | View]
In fact, when you set up a schedule in CCC, it actually creates a full custom shell script and adds a cron task to the system crontab to execute it. So the CCC program itself does not actually open and run ... it has just automatically created a sophisticated synchronize shell script (which you can customize yourself if you are a programmer).
The nice thing is that with the psync functionality enabled in the preferences, the backup is super quick. I have an external 60GB FireWire drive on my desk at home. Each night when I get home I just plug the FW drive into my PowerBook (also with 60GB drive) and at midnight, the CCC backup script runs.
I have about 45GB used on my drive and the whole backup task, which includes permissions repair and syncing the changes to the backup drive, takes about 18 minutes every night usually.
The only drawback is not having incremental restore points. But, for me having a backup clone of my drive lets me not worry about Powerbook theft, field damage, hard drive failure, major upgrades, trying a hack on the OS, etc.
Try CCC .. it's great and free! ... all you need is a FW drive the same size as your internal drive.






The obvious problem is that the backup software can determine whether or not to add a file only by its creation or modification date. BUT a file that has been added to the source folder since the last backup, that has NOT been modified, AND has an older creation date (e.g., an older photo imported from a digital camera) will be left out of a new backup and will not be backed up at all.