The satisfaction of a learning week!

Every so often you are given a job, usually an incredibly repetitive, boring and tedious one, that just calls out for automating even if it is a one-off. Most times you start investigating and soon realise that by the time you get anywhere with code, automating the process will take longer than doing the job by hand. In this case, however, it was soon apparent that not only would automating be quicker, the whole job would actually be easier as well.

We were recently given a whole bundle of PDF files that needed uploading to a customer's website. Easy, you might think, but no: each PDF needed to be viewable directly on the website and broken up into segments across different pages. On top of that, the PDFs were made up of scanned images of handwritten work.

The easiest way to do this was to extract an individual image for each page of every PDF file. After a very brief check of various editors that didn't seem easy at all, so I fell back on some old favourites here: Linux command line tools, in this case ImageMagick.

convert -density 300 MyFile.pdf FileImages/page-%d.png

That command runs through one PDF and dumps out a single PNG file for each page. I did find that it needed quite a bit of memory and stopped before finishing some of the larger PDFs.

convert -density 300 MyFile.pdf[0-14] FileImages/page-%d.png

This command adds a limiter so only the first 15 pages are processed. Breaking the job down this way made it easy to work through all the files.
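Rather than typing the ranges by hand for every chunk, a short loop can do the slicing. Treat this as a sketch rather than gospel: it leans on pdfinfo (from poppler-utils, not ImageMagick) to count the pages, and on ImageMagick's -scene option, which as far as I can tell sets the starting number for the %d in the output name so the chunks don't overwrite each other:

# Count the pages, then convert in 15-page chunks to keep memory use down
PAGES=$(pdfinfo MyFile.pdf | awk '/^Pages:/ {print $2}')
for ((START=0; START<PAGES; START+=15)); do
   END=$((START+14)); ((END>=PAGES)) && END=$((PAGES-1))
   convert -density 300 -scene $START "MyFile.pdf[$START-$END]" FileImages/page-%d.png
done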

The resulting images can be rather large, and since they are eventually destined for a web page we want to optimise them before uploading.

mogrify -path "$3" -filter Triangle -define filter_support=2 -thumbnail "$2" -unsharp 0.25x0.08+8.3+0.045 -dither None -posterize 136 -quality 82 -define jpeg:fancy-upsampling=off -define png:compression-filter=5 -define png:compression-level=9 -define png:compression-strategy=1 -define png:exclude-chunk=all -interlace none -colorspace sRGB "$1"

That command takes its variables from the command line, then compresses and resizes your images. Thanks go to Dave Newton for that one (and this next bit). It is still quite a lot of work to run through individual files for the volumes we had, but Dave had a few more tips:

smartresize() {
   mogrify -path "$3" -filter Triangle -define filter_support=2 -thumbnail "$2" -unsharp 0.25x0.08+8.3+0.045 -dither None -posterize 136 -quality 82 -define jpeg:fancy-upsampling=off -define png:compression-filter=5 -define png:compression-level=9 -define png:compression-strategy=1 -define png:exclude-chunk=all -interlace none -colorspace sRGB "$1"
}

On Linux, add the above to your .bashrc file (if you're using Bash, obviously) and you can use:

smartresize MyFile.png 750 smaller/

This takes your file, resizes it to the given width (750 pixels here) and writes the result out to the smaller/ directory.

With a small Bash script:

for FILE in *.png; do smartresize "$FILE" 750 smaller/; done

We can then run through a whole folder of images in one go. The files end up far better compressed than with a lot of automated tools (thanks Dave), and I now have a handy set of tools for automating things in the future.
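Putting the pieces together, the whole job boils down to something like this rough sketch (the folder layout is just my example, and for the really big PDFs you'd swap in the chunked loop from earlier):

for PDF in *.pdf; do
   OUT="${PDF%.pdf}"    # one output folder per PDF so page numbers don't collide
   mkdir -p "$OUT/smaller"
   convert -density 300 "$PDF" "$OUT/page-%d.png"
   for PAGE in "$OUT"/*.png; do smartresize "$PAGE" 750 "$OUT/smaller/"; done
done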

Obviously none of this is new to a lot of people, but until now I wasn't aware you could add functions to your .bashrc file for easier, shorter commands. It's always good when you learn something new like this!
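For instance, the extraction step from earlier collapses into a function just as nicely (pdf2png is simply a name I picked):

pdf2png() {
   convert -density 300 "$1" "$2/page-%d.png"
}

After which pdf2png MyFile.pdf FileImages does the same as the long command at the top.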
