How do you check if an image file (tiff, psd, psb) is corrupted, other than looking at its thumbnail with Bridge, or opening it on Photoshop? With a small Python script and ImageMagick!
Background
The client of mine I work as a retoucher for had some problems with the so-called Data Migration (the dull, time consuming, and error prone process of transferring a lifetime backup from old, once very expensive external drives to a set of new, somehow still equally expensive external drives). As a result, he got some corrupted files here and there in the destination drives – that’s the reason why you migrate data: the source has insufficient capacity, it has become unstable, obsolete, or both combined. Problem is that we’re talking about several TB of data, mostly as .psb files (ranging from about 1 up to 20GB each), and it goes without saying that opening them all in Photoshop is not an option; nor you can trust Adobe Bridge thumbnails – provided that you’ve set the preferences to render previews for big files too – because it’s a manual process anyway. Even if I’m paid by the hour, staring at thumbs is not my preferred way to get blind. After some research, I’ve found no way (other than the one I’m about to describe) to check for psd/psb files corruption in an automated fashion. Which seems to me quite odd – if you have better, i.e. faster and/or simpler, solutions, please do suggest them to me in the comments below.
What you need
Python 3, and ImageMagick. Both will work either on Mac or Windows: I’ve no experience of them on the latter platform, so I will just assume that you will be successful in following the installation instruction provided in the official home pages. PC owners: read at least the part relative to ImageMagick. Mac users: read it all.
Python 3
If you’re on a Mac like me, you already have Python installed. Chances are that it is version 2.7, or another one but 3. Open the Terminal and type:
This is what I still get after having installed Python 3.6 myself, via Homebrew. I’m no Python expert, so it took me some Google time to understand that on a Mac you can still have the python
command pointing to the System’s old 2.7 version, even if you’ve freshly installed the new one. Solutions involve manually changing symlinks (power users advise against it), or using one of the available packages to create isolate Python environments (e.g. virtualenv, pyenv, etc. A list of them is found here). I couldn’t make neither of them to work in a reasonable amount of time, so I’ve resorted to simply use the python3
command, e.g.
ImageMagick
ImageMagick is a multiplatform, open source commandline utility that performs a huge amount of tasks on all kinds of image files. I’ve installed via Homebrew, but it turns out that, at least on the Mac, it doesn’t come by default with the proper Delegates (aka Libraries) to deal with .psb files, which is what I needed the most. Finding the proper way to do so proved almost impossible to me: while reading the source code documentation (the last thing I wanted to do was to compile it from the source), I’ve discovered that via Homebrew you can list all the possible installation options for a package:
So, after a first installation (without psb support), with no clear hint about the proper option(s) to use in my case, and even less spare time to test, I’ve chained them all – at least the seemingly appropriate ones, with little worries about being redundant. At all events, no one was watching me, nor would have ever known :-) The embarrassing line I’ve used is:
It worked, so I was a happy camper.
The Python Script
Which is far from perfect, but it does the job – I’m sure that a proper Python developer can make it much better: it comes from surgical copy&paste from various Google Search result, plus a very light editing on my side.
How it works
The basic is the identify
call (that comes from ImageMagick), which is set to -verbose
. This is what performs the check: the rest is just looping through the filesystem, looking for the appropriate file extension, and logging a message.
How to use it
Save this on a file with a .py
extension, and then run it with the python3
command on a terminal, e.g.
Before doing so, do change the content of the folderToCheck
variable with an actual folder on your disk (with absolute path), and the fileExtension
too: I’ve used .psb
, but you can change it to .psd
, .tif
, .jpg
, etc. As a result you’ll get a log in the Terminal; I’ve used a nifty, cheap application called Code Runner for such tests, and this is the result:
As you see, I’m just logging OK/ERROR with the path, very basic. What to do with this newly acquired piece of knowledge is up to you. Please note that:
- The script processes nested folders too.
- It is awfully slow and hungry: it eats CPU cycles and RAM. But it’s automatic, so heck!
- The file extension is case sensitive, so
".JPG"
is different from".jpg"
.
How to make it better
Few suggestions for the skilled Python developer (which I’m not, alas)
- Write on a log file instead of the console
- Keep track of the processing status and resume from there
- Display the advancement status (say, “image 34 of 320”)
- Make the file extension case insensitive.
If you know how to do any of this, please share your knowledge in the comments! Thank you!