Cleaning iPhoto
by brian d foy, 02/27/2004
It seems that almost every other soldier in Iraq has a digital camera and a CD burner. So not only do I have my own photos, but I have a collection of CDs of photos from other soldiers, some of whom I know and some I don't. These CDs have been copied and passed around so much that we do not know who took some of the photos or which unit they were in. Indeed, some pictures show up on different CDs as people merged photo collections to create new ones.
Organizing these thousands of pictures by hand was a daunting task, especially since iPhoto is so slow, so I came up with some scripts to do it for me. The scripts are only a bit faster, but at least I can leave them running while I go do something else. I ended up using a mix of AppleScript and Perl's Mac::Glue module.
Upgrading to iPhoto 4
I left for the Middle East before Apple released iPhoto 4. Apple skipped version 3 and released something reportedly faster than its predecessor, iPhoto 2.
I had to wait a while to get iPhoto 4 because it comes with iLife 4. That package includes iTunes 4, which is available separately for free, and two applications that I did not want, iDVD and iMovie, which makes the entire package more than I could download even if Apple offered that as an option. I had to buy the entire package for $49, which Apple shipped to me on a CD and DVD.
The upgrade was not much help -- certainly not $49 worth of help since even iPhoto 4 is sluggish on my G4 PowerBook. I still have hope that iPhoto will one day be usable, which is why I keep using it.
Using Multiple Libraries
I could only tolerate the previous versions of iPhoto if I kept the library size under a couple hundred photos, even though I have thousands of pictures. The file ~/Library/Preferences/com.apple.iPhoto.plist tells iPhoto which directory holds the photo library. I can change that myself, if I like, or I can use a program like iPhoto Library Manager or iPhoto Buddy.
If I use multiple libraries, I can keep a small number of pictures in each library so I can burn an entire library onto one CD. So far, iPhoto does not have a way to archive large libraries across multiple CDs. iPhoto tends to be responsive only with smaller libraries, which was a big problem with iPhoto 2. And although iPhoto 4 improves on this, it still has a way to go.
Finding Thumbnails
Some of the CDs that we passed around had thumbnails of the full-sized photos, as part of some HTML export feature of some photo software. I wanted to get rid of those.
When I delete a photo with any of the scripts in this article, I do not really remove it from iPhoto. iPhoto just moves it to the special library named Trash, so I can still recover it if I make a mistake. When I really want to get rid of the photos, I empty the Trash with the "Empty Trash" item in the File menu. I keep a backup copy of all my iPhoto libraries in case I make a really big mistake (it has only happened a couple of times).
I wrote a script to go through each photo in the Photo Library and check each photo's dimensions. If the photo is smaller than a certain size, I remove it. I first tried this with AppleScript.
To start, I get the count of photos in the Photo Library, then start processing them in reverse by their order in the library. If I start at the beginning, then remove a photo, all of the photos after it move down a number and I end up skipping some photos, and when I get to the end, I will try to access a photo number that no longer exists.
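The index problem does not depend on iPhoto. As a minimal sketch in plain Perl, assuming a made-up list of widths in place of the real library, removing items while counting up skips the neighbor of each removed item, while counting down does not:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical stand-in for the library: each "photo" is just a width.
# Widths under 200 play the role of thumbnails.
my @widths = ( 100, 150, 640, 120, 800 );

# Forward: after a splice, the next item slides into the current
# index, and the loop steps right past it.
my @forward = @widths;
for( my $index = 0; $index < @forward; $index++ )
    {
    splice @forward, $index, 1 if $forward[$index] < 200;
    }

# Reverse: only items I have already checked change position.
my @reverse = @widths;
for( my $index = $#reverse; $index >= 0; $index-- )
    {
    splice @reverse, $index, 1 if $reverse[$index] < 200;
    }

print "forward: @forward\n";   # the 150 survives by mistake
print "reverse: @reverse\n";   # only the large images remain
```

The forward loop here re-checks the list length on every pass, so it only demonstrates the skipping. With a count fixed up front, it would also run past the end of the shortened list.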
For each photo index, I check the height and width of the photo. I figure the image is a thumbnail if either of those dimensions is less than 200 pixels, an arbitrary number I chose as the threshold. If either test is true, I remove the image. I have to remove the image from the Photo Library album itself, because if I remove it from an album that I created, it is still in the Photo Library.
When I run any of the scripts in this article, I should not interact with iPhoto, not even to look at photos in another album. These scripts pretend to be me doing the same thing, and when I manually play with iPhoto while the scripts do their work, the scripts get confused.
tell application "iPhoto"
    set myAlbum to photo library album
    repeat with myIndex from (count of photos in photo library album) to 1 by -1
        set thisPhoto to photo myIndex of photo library album
        if width of thisPhoto < 200 or height of thisPhoto < 200 then
            remove thisPhoto from photo library album
        end if
    end repeat
end tell
For my library of approximately 6,000 photos, this script takes all night to run. I do not automatically empty the Trash (although iPhoto has an AppleScript command for that) so that I can manually inspect it to check if I accidentally deleted something I want to keep.

I tried a Perl version of this AppleScript, using the Mac::Glue module, to check if it would be any faster. I translated it as closely as I could to make a fair comparison, and this script finished the same library (which I had restored from a backup copy) in about an hour. That is quite the speed-up! Chris Nandor and Simon Cozens introduced Mac::Glue in an earlier article.
#!/usr/bin/perl
use warnings;
use strict;

use Mac::Glue;

my $iphoto  = Mac::Glue->new( "iPhoto" );
my $album   = "photo library album";
my $library = $iphoto->prop( $album );

my $count = $library->prop( "photos" )->count;
print "My count is $count\n";

# Work backward so a removal never renumbers a photo I have yet to check
for( my $index = $count; $index > 0; $index-- )
    {
    my $photo  = $library->obj( photo => $index );
    my $width  = $photo->prop( "width" )->get;
    my $height = $photo->prop( "height" )->get;
    next unless( defined $width and defined $height );

    # Anything under 200 pixels on either side counts as a thumbnail
    if( $width < 200 or $height < 200 )
        {
        print "\t--->deleting $index: w $width h $height\n";
        $photo->remove();
        }
    }
Now all of the thumbnails in my library should be in the Trash, where I can recover them if I like. Once I inspect them and ensure I am not getting rid of anything I want to keep, I empty the Trash.
Getting Rid of GIFs
Not only does a Perl version of the equivalent AppleScript run a lot faster, I get to use all of the power of Perl to decide what I want to do in between the interactions with iPhoto.
When I imported some photos, I found some GIF images that look like they were probably background images for the web pages their photo manager created when exporting the photos. I also found one CD that had GIF slides from a PowerPoint presentation, which I thought was odd.
I start with the same Perl program I used earlier, but I change the test to check the image's file name (an image also has a title, a name, and a path). The "image_filename" property includes the full file name with its extension, and if that ends in .gif, no matter the case (so .GIF matches too), I remove the image. Again, these images end up in the Trash, and I can manually inspect them before I finally get rid of them.
#!/usr/bin/perl
use warnings;
use strict;

use Mac::Glue;

my $iphoto  = Mac::Glue->new( "iPhoto" );
my $album   = "photo library album";
my $library = $iphoto->prop( $album );

my $count = $library->prop( "photos" )->count;
print "My count is $count\n";

for( my $index = $count; $index > 0; $index-- )
    {
    my $photo = $library->obj( photo => $index );
    my $name  = $photo->prop( "image_filename" )->get;

    # Skip photos with no file name before trying to print or match it
    next unless defined $name;
    print "$index: $name\n";

    # Match a trailing .gif in any mix of upper and lower case
    if( $name =~ /\.gif$/i )
        {
        print "\t--->deleting $name\n";
        $photo->remove();
        }
    }
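The file name test can be tried on its own, without iPhoto. A small sketch, using made-up file names, shows what the pattern accepts and what it leaves alone:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up file names for illustration; none come from a real library.
my @names = qw( bg.gif SLIDE01.GIF DSC00042.JPG gifts.jpg notes.gif.txt );

# \. matches a literal dot, $ anchors the match at the end of the
# name, and /i ignores case so .GIF and .Gif match too.
my @gifs = grep { /\.gif$/i } @names;

print "would remove: @gifs\n";
```

Anchoring the pattern at the end of the name is what keeps files such as gifts.jpg and notes.gif.txt out of the list.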
I could also do this with AppleScript if I boned up on its string handling features. I would much rather use my AppleScript In a Nutshell book printed on real paper than make my way through Apple's PDF version of its more than 300-page AppleScript Language Guide.
Finding Duplicates
iPhoto can generally keep me from importing duplicate images by popping up a dialog that shows me an image already in the Photo Library and the one I am importing if it thinks they are the same. And most of the time iPhoto is right … but not always.
Instead of telling iPhoto to automatically not import duplicates, I tell it to import all files, then handle it myself later. This way I can walk away from the computer while iPhoto imports a CD full of pictures, which takes a long time, without worrying about it being held up with a dialog to which I do not respond.

I found removing duplicate images a bit more tricky than my previous clean-ups. I assume that "duplicate" means the exact same image file, not just the same photo with a different name, different comments, or other different meta-information that someone might have changed.
I use the MD5 digest to determine which photos are the same. This digest is a digital fingerprint of the file, and two different files are very unlikely to share a fingerprint. For each photo, I get the image_path property, which is the full path to the image. For each path, I get the MD5 digest by reading the file directly in the Perl program without going through iPhoto. If that particular digest does not exist in my %digests hash, I add it. If the digest does exist, I must have seen the exact same file before, which means the one I am currently processing is a duplicate, so I remove it.
#!/usr/bin/perl
use warnings;
use strict;

use Digest::MD5;
use Mac::Glue;

my $iphoto  = Mac::Glue->new( "iPhoto" );
my $album   = "photo library album";
my $library = $iphoto->prop( $album );

my $count = $library->prop( "photos" )->count;
print "My count is $count\n";

my $md5     = Digest::MD5->new();
my %digests = ();   # digest => path of the first file seen with it

PHOTO: for( my $index = $count; $index > 0; $index-- )
    {
    my $photo = $library->obj( photo => $index );
    my $path  = $photo->prop( "image_path" )->get;
    next unless defined $path;

    # Read the file directly, as raw bytes, without going through iPhoto
    open my $fh, '<', $path or do { warn "$path: $!\n"; next PHOTO };
    binmode $fh;
    $md5->addfile( $fh );
    close $fh;

    my $digest = $md5->hexdigest;

    if( exists $digests{ $digest } )
        {
        print "$digests{ $digest }\n -->$path\n";
        $photo->remove;
        }
    else
        {
        print "$index->$path: $digest\n";
        $digests{ $digest } = $path;
        }

    # hexdigest already resets the object; this is only a safeguard
    $md5->reset;
    }
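The digest bookkeeping also works outside of iPhoto. As a sketch, assuming a throwaway directory with three made-up files (two of them byte-for-byte identical), the same %digests technique picks out the copy. The file names and contents are invented for the example:

```perl
#!/usr/bin/perl
use warnings;
use strict;

use Digest::MD5;
use File::Spec;
use File::Temp qw(tempdir);

# Build a scratch directory with made-up files. a.jpg and copy.jpg
# have the same bytes; b.jpg differs.
my $dir = tempdir( CLEANUP => 1 );
my %contents = (
    'a.jpg'    => "first image bytes",
    'b.jpg'    => "second image bytes",
    'copy.jpg' => "first image bytes",
    );

foreach my $name ( sort keys %contents )
    {
    my $path = File::Spec->catfile( $dir, $name );
    open my $out, '>', $path or die "$path: $!\n";
    binmode $out;
    print {$out} $contents{$name};
    close $out or die "$path: $!\n";
    }

my %digests    = ();   # digest => first file name seen with it
my @duplicates = ();

foreach my $name ( sort keys %contents )
    {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '<', $path or do { warn "$path: $!\n"; next };
    binmode $fh;

    # A fresh object per file, so nothing carries over between files
    my $digest = Digest::MD5->new->addfile( $fh )->hexdigest;
    close $fh;

    if( exists $digests{ $digest } )
        {
        push @duplicates, $name;
        print "$name duplicates $digests{ $digest }\n";
        }
    else
        {
        $digests{ $digest } = $name;
        }
    }
```

Because the digest comes only from the file's bytes, copy.jpg is caught even though its name differs from a.jpg.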
I have to use the image_path property because it is the only property that should always point to the right file. If I import two files with the same name (and they have the same modification year and month, because iPhoto sorts them into directories based on that), iPhoto changes the new file's name so it does not overwrite the existing one. However, iPhoto retains the original file name, without the extension, for the title, which is the text I see with the photo when I select "View Title" from iPhoto's View menu.

Conclusion
iPhoto 4 made a lot of improvements, but it is still slow enough that I manage it with a lot of batch processing through either AppleScript or an equivalent Perl script. Mac::Glue can have a speed advantage over AppleScript, and it lets me use all of the power of Perl. Either way, once I script a task, I can walk away from my computer while iPhoto slogs along.
brian d foy is a Perl trainer for Stonehenge Consulting Services and is the publisher of The Perl Review.
-
Mac::Glue error
2007-05-29 12:52:35 Kreme
-
Mac::Glue error
2007-05-29 13:59:02 brian d foy
You have to create the glue for iPhoto first. See the Mac::Glue documentation or the link to the Mac::Glue article.
Good luck :)
-
That's honorable
2004-03-01 05:32:24 r_miller
It's honorable that you have written this up for us, but Apple® has dropped the ball here. Users should not have to fire up a Perl script to get rid of extra images we don't want. I don't want to have to learn Perl to do this. I know AppleScript, but should I have to use it for this?
Also, I am still using iPhoto 2, so it is a little disheartening to learn that my copy of iPhoto 4, which I have not loaded yet, may not increase the speed of iPhoto. I will give it a try once I decide to go with Panther full-time. Anyway, thanks for the article.
-
That's honorable
2004-03-01 22:53:30 brian d foy
I can understand your disappointment, but I have a different perspective. I would like to control other applications with whichever tool I decide to use, whether that is AppleScript, Perl, or something else. I cannot expect the application designers to anticipate everything I might want to do. Indeed, I would rather have the iPhoto developers concentrate on getting rid of the file-based storage system they use on the backend.
-
That's honorable
2004-03-05 11:08:25 martshal
I hate to bring discord to this happy gathering, but I will.
I think this reply demonstrates why a lot of *nix users have taken so long to recognize the Mac as a reasonable alternative to other operating systems. It was (pre-OS X) an operating system that was SO user-centric that the users gave up any responsibility for what they generate. And, on the flip side, if an advanced system user asked how anything complicated was done with a Mac, they were likely to be met with "well why would you want to" or "who would do something like that". As if the only things worth doing were things that you could associate with pretty pictures.
As polite as this user may be, he still sounds like a user stomping his feet, demanding the developer think of everything he could possibly imagine, give him an eloquent graphical way to do it, and do it yesterday.
The reason that Unix tools are generally command-line and composable is that every application wasn't designed to do everything (except maybe emacs:-), and the USER took responsibility for what he generated, provided he had some chain of tools he could pipe together to do it.
-
RE
2004-03-02 13:47:37 r_miller
Oh I understand that point. I do not think Apple should cripple a program so that advanced users cannot alter things to get what they want out of it. However, the ability to reduce image (size) and discard duplicates is in such demand it should have been included all along. That's all I was trying to say. Yeah, I am glad that Apple is leaving the backdoor open so developers can come up with new solutions. I just think Apple should do some of this on the front end. iPhoto I think is great, but its feature set is not much and now that we are paying for it, I expect more.
-
RE
2004-03-05 14:20:42 schmiddi
I think that one has to keep in mind that iPhoto is not a photo editing application. Apple always said and markets iPhoto as nothing more than a shoebox replacement, somewhere where you store your pictures. This includes turning them around, giving them names, maybe some minor picture enhancement, but if you want to do some picture changes then you should get Photoshop or Photoshop Elements.
You said you are paying for it, and yes you do, but you not only get iPhoto but three other apps, most of all GarageBand. I can understand if people don't need all four of them or only want one and think Apple should sell them individually for $10, but the price is not bad, and for what iPhoto does I think $10 to $15 is still a great deal. It is the easiest way to manage your digital pictures and that's its purpose, not to be something more. That Apple left the back door open for developers is great.
-
Use the Force, Luke.
2004-02-28 08:21:04 hhas
tell application "iPhoto"
    remove photos of photo library album whose width < 200 or height < 200
end tell
-
Use the Force, Luke.
2005-01-21 11:07:30 pixel
ok, so that might be shortened, and it MIGHT work faster, but it sure eats up memory for some reason. I killed iPhoto after it ran for 20 minutes, was using over 500MB of physical and 1.5GB of virtual memory... and it was steadily rising. Anyone have any ideas?
-
Use the Force, Luke.
2007-02-15 17:19:36 paulskinner
In this shorter version you're asking AppleScript to first iterate through every item in your library and retrieve the properties from each one of them, then compare two values from those properties to your given values. After this you then have it store the items that match the range you test for into a list. It may be storing a reference (memory cheap) or it may store the value (memory expensive) of the database entry.
This is indeed very inefficient despite being concise.
This version is more memory and processor efficient. I ran it against my 22000 image library and checked 5500 in ten minutes.
tell application "iPhoto"
    tell photo library album
        set c to count of photos
        repeat with myIndex from c to 1 by -1
            tell photo myIndex
                set {thisWidth, thisHeight} to {width, height}
            end tell
            if thisWidth < 200 or thisHeight < 200 then
                remove photo myIndex
            end if
        end repeat
    end tell
end tell
-
Use the Force, Luke.
2007-04-14 16:59:47 sttkaufman
Great AppleScript, thanks. Is it possible to do the "remove duplicate" script written in Perl as an AppleScript? That is my real problem. Thousands of duplicates from merging three semi-duplicate iPhoto libraries on three machines. Now I have a mess on my hands. I would like to search for duplicate images that may or may not have different names, and choose the largest res, keep that one, and throw out the others. Any thoughts or hints? Anyone tried this in AppleScript?
Stephen
-
Use the Force, Luke.
2007-02-15 17:26:30 paulskinner
I forgot to mention that if you're running iPhoto on an older or lower-powered system you should go to the appearance pane of the iPhoto preferences and turn off drop shadow and outline options for a substantial speed improvement.
Seeing the dates on this article I hope you're back in the States Brian, but perhaps not... if you're still over there, keep your head down.
-
Use the Force, Luke.
2004-03-01 22:58:44 brian d foy
Thanks for the reduction. :)
I'll have to test this to see if it comes out faster, and I suspect it will.






And sure enough, there is no iPhoto in that location
This is right after doing a perl -MCPAN -e 'install "Mac::Glue"' which succeeded, so Mac::Glue is installed.