PDF

PDF to PostScript conversion: pdf2ps versus pdftops

5 February, 2010 - 12:16

Occasionally, I have to convert a PDF file to PostScript (e.g. for subsequent processing with some PostScript utility). On the Linux command line, I know of two options: pdf2ps and pdftops. I also know that one of the two has some issues and the other is better. But because their names are so similar, I can't manage to remember which one to use. This post should put an end to that!

[Spoiler alert and a questionable mnemonic: pdftops is da top.]
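
For reference, a minimal sketch of what the pdftops call looks like when wrapped in a small shell function. The helper names ps_name and pdf_to_ps are made up for this example; the only real command involved is pdftops itself, which takes the PDF file and, optionally, the PostScript file to write.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any package): wrap pdftops so the
# output name is derived from the input name.

# Print the PostScript file name for a PDF: report.pdf -> report.ps
ps_name() {
    printf '%s\n' "${1%.pdf}.ps"
}

# Convert one PDF to PostScript; fails early when pdftops is not installed.
pdf_to_ps() {
    if ! command -v pdftops >/dev/null 2>&1; then
        echo "pdftops not found" >&2
        return 1
    fi
    pdftops "$1" "$(ps_name "$1")"
}
```

Calling pdf_to_ps report.pdf should then leave a report.ps next to the original file.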

Flag a PDF file as binary for Subversion

11 December, 2008 - 01:05

Sometimes, Subversion thinks that a PDF is a text file, instead of binary data. This can hurt during commits or diffs, because Subversion tries to do textual diffs with binary data.

Solution: you can explicitly flag the file as PDF data, and Subversion will handle it as binary from then on:

svn propset svn:mime-type application/pdf yourPDFfile.pdf
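
If several PDFs need the same treatment, the command can be looped. The following is just a sketch under the assumption that all files passed in are already under version control; the function name flag_pdf_binary is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: set the PDF MIME type on every file given as argument.
# Stops at the first file for which svn reports an error.
flag_pdf_binary() {
    for f in "$@"; do
        svn propset svn:mime-type application/pdf "$f" || return 1
    done
}
```

For example: flag_pdf_binary docs/*.pdf. Afterwards, svn propget svn:mime-type yourPDFfile.pdf shows which value is set, and the property change still has to be committed like any other change.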

Extract images from a PDF document

27 January, 2008 - 20:06

When you want to extract a bitmap image from a PDF document, it is tempting to do the "print screen" trick. The drawback of this approach is that you'll inevitably lose quality: the image pixels will typically not map to your screen's pixels in a one-to-one fashion because of the decimation/resampling/scaling (or even rotation) when viewing the PDF document.

There are probably a lot of tools out there that extract the bitmap image correctly. I guess this functionality is built into Adobe Acrobat Reader. But if you're in my situation (no desire to use Adobe's bloat) or you just need a small, handy command-line tool for Linux (or other "unixes"), try pdfimages. It's part of the xpdf package, which is probably available for all major Linux distributions.

Usage is very straightforward:

pdfimages -j foo.pdf bar

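This will extract all images from foo.pdf and save them as bar-000.jpg, bar-001.jpg, bar-002.jpg, etc. Strictly speaking, option -j only writes JPEG files for images that are stored as JPEG (DCT) data in the PDF; any other images end up as PBM/PPM files.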
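
Because a PDF with many images produces many files, it can be handy to send them to a directory of their own. Here is a rough sketch of that idea; the function names image_root and extract_images and the default directory name images are assumptions for this example, and only pdfimages and its -j option come from the usage above.

```shell
#!/bin/sh
# Hypothetical helpers: extract images into a separate directory.

# Print the output prefix for a PDF: foo.pdf + out -> out/foo
# The directory defaults to "images" when no second argument is given.
image_root() {
    printf '%s/%s\n' "${2:-images}" "$(basename "$1" .pdf)"
}

# Create the output directory, then run pdfimages with the derived prefix.
extract_images() {
    root=$(image_root "$1" "${2:-}")
    mkdir -p "$(dirname "$root")" || return 1
    pdfimages -j "$1" "$root"
}
```

With this, extract_images foo.pdf bar should write files named along the lines of bar/foo-000.jpg instead of filling up the current directory.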

Inspired by http://www.boekhoff.info/?pid=linux&tip=extract-images-from-pdf-files

Extracting fonts from PDFs

19 June, 2006 - 15:56

Thanks to Planet Ubuntu NL I found a blog entry by Pascal de Bruijn with a hack to extract fonts from a PDF, using pdftops and FontForge. When I have some time, I definitely should try this.