Extract images from a PDF document

27 January, 2008 - 20:06

When you want to extract a bitmap image from a PDF document, it is tempting to do the "print screen" trick. The drawback of this approach is that you'll inevitably lose quality: the image pixels will typically not map one-to-one to your screen's pixels, because the PDF viewer resamples, scales (or even rotates) the image when rendering the page.

There are probably a lot of tools out there that extract the bitmap image correctly. I guess this functionality is built into Adobe Acrobat Reader. But if you're in my situation (no desire to use Adobe's bloat) or you just need a small, handy command line tool for Linux (or other "unixes"): try pdfimages. It's part of the xpdf package, which is probably available for all major Linux distributions.

Usage is very straightforward:

pdfimages -j foo.pdf bar

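This will extract all images from foo.pdf and save them to bar-000.jpg, bar-001.jpg, bar-002.jpg, etc. Note that the -j option only applies to images that are stored as JPEG (DCT) inside the PDF: those are written out as .jpg files. Images stored in any other format, and all images when -j is omitted, are written as .pbm (monochrome) or .ppm files instead.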
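If you have a whole pile of PDF documents, the same command can be wrapped in a small loop. The following is just a minimal sketch, not something from the xpdf package: the function names image_root and extract_images and the "-images" directory suffix are made up for illustration, and it assumes pdfimages is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: derive the image root (output prefix) from a PDF path,
# e.g. "docs/foo.pdf" -> "foo".
image_root() {
    basename "$1" .pdf
}

# Hypothetical helper: for every PDF given as an argument, extract its images
# into a separate directory named after the document ("foo-images" for foo.pdf).
extract_images() {
    for pdf in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched "*.pdf").
        [ -f "$pdf" ] || continue
        root=$(image_root "$pdf")
        mkdir -p "$root-images"
        # Same call as above: -j keeps embedded JPEG images as .jpg files.
        pdfimages -j "$pdf" "$root-images/$root"
    done
}
```

After sourcing this in your shell, something like extract_images *.pdf should leave you with one directory of images per document. Using a separate directory for each PDF keeps the numbered output files of different documents from getting mixed up.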

Inspired by http://www.boekhoff.info/?pid=linux&tip=extract-images-from-pdf-files