Creating JPG images from PDFs using Perl and ImageMagick

One thing that has always bugged me is the excessive bandwidth PDFs use when presenting schematics. We also have a fair number of users who aren't computer-literate enough to know what Acrobat Reader is, although this seems to be less of an issue lately. Many of our vendors supply only PDFs, and with the thousands upon thousands of schematics we have, manipulating each one by hand just isn't an option. Enter ImageMagick.


ImageMagick has interfaces for many languages, but I prefer Perl (the Image::Magick module, also known as PerlMagick) for manipulating images. Most of the concepts carry over to other languages even though the syntax differs. So, to begin, I'll load the modules and set the path I want to convert files from. The [path to images] placeholder and the numeric settings in the snippets below are the items you'll most likely want to change for your own implementation.
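use strict;
use warnings;
use Image::Magick;
use IO::Dir;
my $path = '[path to images]';   # placeholder: your PDF directory, ending in a slash
tie my %dir, 'IO::Dir', $path;    # the keys of %dir are the filenames in $path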
At this point, we've loaded Image::Magick and tied the hash %dir to the directory with IO::Dir, so the keys of %dir are the filenames in $path. Now we can start processing the files in that directory.
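foreach my $file (keys %dir) {
    next unless $file =~ /\.pdf$/i;          # only convert PDFs (skips ".", "..", etc.)
    my $pImg = Image::Magick->new;
    (my $jpg = $file) =~ s/\.pdf$/.jpg/i;    # foo.pdf -> foo.jpg
    next if (-e "$path$jpg");                # already converted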
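The selection logic in this loop (only PDFs, and only those without a JPG yet) can be tried on its own, without ImageMagick installed. Here is a minimal pure-Perl sketch; `pdfs_to_convert` is a hypothetical helper, not part of the post, and it uses `opendir`/`readdir` plus `File::Spec->catfile` so the directory path doesn't need a trailing slash.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list [pdf_path, jpg_path] pairs for every *.pdf in
# $dir that does not already have a .jpg with the same base name.
sub pdfs_to_convert {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = readdir $dh;
    closedir $dh;

    my @pairs;
    for my $file (sort @entries) {
        next unless $file =~ /\.pdf\z/i;          # skips ".", "..", and non-PDFs
        (my $jpg = $file) =~ s/\.pdf\z/.jpg/i;    # "plan.PDF" -> "plan.jpg"
        my $jpg_path = File::Spec->catfile($dir, $jpg);
        next if -e $jpg_path;                      # already converted
        push @pairs, [ File::Spec->catfile($dir, $file), $jpg_path ];
    }
    return @pairs;
}
```

Each pair could then feed the Ping, Read, and Write calls in place of the "$path$file" and "$path$jpg" strings.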
At this point, we've created our Image object and know the input and output filenames. Next, we need to figure out the density at which to open the file. I suggest opening it at 3 to 5 times your final resolution, and there's a simple way to work that out. This is our next chunk of code.
    my ($width, $height, $size, $format) = $pImg->Ping("$path$file");
    next unless $width;                        # Ping failed; skip this file
    my $density = int(118800 / $width);        # 550 px * 3 * 72 dpi / page width in points
    $density = 60  if $density < 60;           # floor: sufficient quality
    $density = 250 if $density > 250;          # ceiling: limits memory use
    $density = "${density}x$density";          # e.g. "194x194"
Ping gives us the dimensions of the file (along with its size and format, which we don't use here). One item to note: the dimensions are for the first page, not the entire document. But since we're aiming for a fixed final width, that doesn't really matter. I chose 550 pixels as our ideal width to fit on the site. ImageMagick renders PDFs at 72 dpi by default, so Ping reports the page width in points, and a page opened at a given density comes out (width in points * density / 72) pixels wide. To open it at about 3x the final width, the calculation is 550 (end width) * 3 * 72, or 118800, divided by the page width. The result is then formatted like "200x200". I also added a minimum of 60x60 and a maximum of 250x250 to make sure the quality is sufficient without using too much memory. You'll want to experiment with these numbers a bit.
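The density arithmetic is easy to check outside ImageMagick. This sketch assumes ImageMagick's default 72 dpi rendering for PDFs, so a page that Ping reports as W pixels wide is W points wide and renders at W * density / 72 pixels; `density_for` and its option names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: ImageMagick rasterizes a PDF at 72 dpi unless told otherwise,
# so Ping() reports the page width in points, and a page opened at density D
# comes out (width_pts * D / 72) pixels wide.
sub density_for {
    my ($width_pts, %opt) = @_;
    my $target     = $opt{target}     // 550;  # final JPG width in pixels
    my $oversample = $opt{oversample} // 3;    # open at ~3x the final width
    my $min        = $opt{min}        // 60;   # quality floor
    my $max        = $opt{max}        // 250;  # memory ceiling

    my $density = int($target * $oversample * 72 / $width_pts);
    $density = $min if $density < $min;
    $density = $max if $density > $max;
    return "${density}x$density";             # the "DxD" string passed to Set()
}
```

For a US Letter page (612 points wide) this returns "194x194", which renders the page about 1650 pixels wide; an 11x17 sheet (792 points) gets "150x150", and anything wider than 1980 points falls to the 60x60 floor.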

Now we set the density on our Image object (it has to be set before the file is read to take effect), read all pages of the file, and stack them if there is more than one page. This is actually pretty simple.
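    $pImg->Set(density => $density);
    my $err = $pImg->Read("$path$file");       # loads every page of the PDF
    warn "$file: $err\n" if "$err";
    if ($pImg->[1]) {                          # more than one page: stack them top to bottom
        $pImg = $pImg->Append(stack => 'true');
    }
    next unless (ref $pImg && $pImg->[0]);     # need at least one page to continue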
The last line checks that we have at least one page; otherwise the code that follows would fail. Now that we've got the image loaded and all the pages stacked on top of each other, it's time to manipulate. Oh, goody!
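Stacking also explains why multi-page schematics come out tall: appending pages top to bottom produces an image as wide as the widest page and as tall as all the pages combined, and the resize later keeps that aspect ratio. Here is a small pure-Perl sketch with hypothetical helpers `stacked_size` and `scaled_height`; it ignores the Trim step, which shaves off margins first.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Appending pages top to bottom yields an image as wide as the widest page
# and as tall as all pages combined.
sub stacked_size {
    my @pages = @_;                      # list of [width, height] pairs
    return (max(map { $_->[0] } @pages), sum(map { $_->[1] } @pages));
}

# Height after resizing to a fixed width while keeping the aspect ratio
# (the same arithmetic as $newheight in the post, truncated to an integer).
sub scaled_height {
    my ($width, $height, $new_width) = @_;
    return int($height * $new_width / $width);
}
```

Three pages of 1650 x 2134 pixels each stack to 1650 x 6402 and end up as a 550 x 2134 JPG.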

We're going to start by trimming the whitespace from the edges, then convert the image to RGB (most of our schematics are black and white) so we can add our domain name to it in color. We'll also Despeckle, Sharpen, and adjust the Contrast before changing it to a JPG and resizing it.
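    $pImg->Trim();                               # remove whitespace around the edges
    $pImg->Quantize(colorspace => 'RGB');        # B&W -> RGB so colored text can be added
    $pImg->Despeckle();
    $pImg->Sharpen(radius => 0, sigma => 1);     # assumed radius/sigma; tune for your drawings
    $pImg->Contrast(sharpen => 'true');
    $pImg->Set(magick => 'jpg', compression => 'JPEG', quality => '51');
    ($height, $width) = $pImg->Get('rows', 'columns');
    my $newwidth = 550;                          # final width in pixels
    my $newheight = int($height * $newwidth / $width);   # keep the aspect ratio
    $pImg->Resize(width => $newwidth, height => $newheight, blur => '1', filter => 'Box');
    $pImg->Annotate(text => '', align => 'Left', x => $newwidth, y => $newheight - 15, fill => 'Blue', rotate => '270', pointsize => '4');   # put your domain name in text
    $pImg->Write("$path$jpg");                   # save under the .jpg filename
}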
Hey, we're done! Some of these settings may need to be tweaked for your purposes, but since most PDFs are text-heavy and we've already tuned most of the settings for text, it should be pretty close to what will work for you. Let me know if you find settings that look even better; we're getting reasonably good results with these now. Happy converting!

