## Annotate or write on top of a pdf file

Sometimes I need to annotate a pdf file, either to take notes or, because I have ugly handwriting, to fill it out as a form; for the latter, I mean the case where the pdf file has no form fields you can type into with Adobe Reader. This post describes some programs for annotating pdf files. Xournal is useful when you just want to write on top of a file and print the result; it converts the content of the original pdf file to an image and saves the old and new contents together as a pdf file. I use Xournal on my Asus T101MT touchscreen netbook. Okular seems good, except that you can’t save the annotations and the original content as one file, or even print the annotations together with the original content.

I just tested Foxit Phantom via Wine and I am able to annotate, save the new pdf file, and print the old+new contents. For form-filling, I recommend printing to pdf as I noticed some glitches with how contents are displayed in the saved pdf files.

## Extracting pages in a pdf file

This post shows how to use the pdftk command (The PDF Toolkit) to extract pages from a pdf file:

```sh
pdftk myDocument.pdf cat 1-9 15 17-19 26-end output removedPages.pdf
```
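
The page list after `cat` names the pages to keep, so removing pages means working out the complement by hand. Here is a minimal sketch of a helper that does that arithmetic. `keep_ranges` is a hypothetical function of my own, not part of pdftk, and it assumes you already know the total page count (for instance from the `NumberOfPages` line of `pdftk myDocument.pdf dump_data`).

```shell
#!/usr/bin/env bash
# keep_ranges TOTAL PAGE...
# Print pdftk-style page ranges covering pages 1..TOTAL except the listed
# PAGEs. Hypothetical helper for illustration; not a pdftk feature.
keep_ranges() {
    local total=$1; shift
    local drop=" $* "          # space-padded list of pages to remove
    local out=() start="" p

    for ((p = 1; p <= total; p++)); do
        if [[ $drop != *" $p "* ]]; then
            # page is kept: open a new run if none is open
            if [[ -z $start ]]; then start=$p; fi
        elif [[ -n $start ]]; then
            # page is dropped: close the run that ended on page p-1
            if (( start == p - 1 )); then
                out+=("$start")
            else
                out+=("$start-$((p - 1))")
            fi
            start=""
        fi
    done

    # a run still open at the last page is written with pdftk's "end" keyword
    if [[ -n $start ]]; then
        if (( start == total )); then out+=("end"); else out+=("$start-end"); fi
    fi
    echo "${out[*]}"
}
```

For a 30-page file, `pdftk myDocument.pdf cat $(keep_ranges 30 10 11 12 13 14 16 20 21 22 23 24 25) output removedPages.pdf` expands to the same page list as the command above.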


## Embed all fonts in a pdf file

I recently had to embed all fonts in a pdf file for electronic submission of my dissertation. Embedding of fonts is usually required for publishing academic articles as well.

I generate my pdf files mainly with LaTeX using the pdflatex command. This post shows how one can embed the fonts used by the tex file itself; this option was already turned on (by default?) on my Ubuntu 11.04 laptop with TeX Live (2009?). However, this does not embed fonts from pdf files (figures) that are included in the tex file. This post and this post show how to embed all fonts used in a pdf file:

```sh
pdf2ps myfile.pdf
ps2pdf13 -dPDFSETTINGS=/prepress myfile.ps myfile_embedded.pdf
```

or

```sh
gs -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sPAPERSIZE=letter -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer \
   -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true \
   -sOutputFile=myfile_embedded.pdf -f myfile.pdf
```


My embedFontsPdf.sh script:

```sh
#! /bin/bash

# http://colinm.org/tips/latex
for file in "$@"
do
    bn=$(basename "$file")
    NameNoExt=${bn%.*}  ## no extension
    Ext=${bn##*.}       ## extension http://www.linuxforums.org/forum/programming-scripting/128625-how-get-file-extension-without-dot.html
    gs -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -sPAPERSIZE=letter -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer \
       -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true \
       -sOutputFile="${NameNoExt}_FontsEmbedded.pdf" -f "$file"
done
```
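
The output name in the script comes from splitting the file name at its last dot. A small sketch of just that step, using bash parameter expansion; `split_name` is a hypothetical name I made up for the demonstration.

```shell
#!/usr/bin/env bash
# split_name PATH
# Print "<name without extension> <extension>" for PATH.
# Hypothetical helper, for illustration only.
split_name() {
    local bn
    bn=$(basename "$1")       # drop any leading directories
    # ${bn%.*}  removes the shortest trailing ".something"
    # ${bn##*.} removes everything up to and including the last dot
    echo "${bn%.*} ${bn##*.}"
}
```

Calling `split_name /tmp/thesis.final.pdf` prints `thesis.final pdf`. A name with no dot at all comes back unchanged in both positions, so the extension is only meaningful when the file actually has one.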

Use pdffonts to check that all fonts are embedded.
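
Reading the pdffonts table by eye gets tedious with many fonts. Here is a rough sketch of a filter that prints only the fonts that are not embedded. `unembedded_fonts` is a hypothetical name, and the column positions are an assumption: I am assuming the output ends with the columns `emb sub uni` followed by a two-number object ID, which is why the script counts fields from the right (the type column can contain spaces).

```shell
#!/usr/bin/env bash
# unembedded_fonts
# Read pdffonts output on stdin and print the name of every font whose
# "emb" column is "no". Assumes each row ends with:
#   emb sub uni <object number> <generation number>
# so "emb" is the fifth field from the right. Skips the two header lines.
unembedded_fonts() {
    awk 'NR > 2 && NF >= 6 && $(NF-4) == "no" { print $1 }'
}
```

Run it as `pdffonts myfile_embedded.pdf | unembedded_fonts`; empty output means every listed font is embedded. Font names containing spaces would be cut at the first space, so treat the printed names as hints.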

## Emacs keybindings in xpdf and xdvi

I’ve been using xpdf and xdvi for reading documents I’ve downloaded or my LaTeX-generated documents on the laptop more and more these days due to their speed. The one thing I require (desire) in all programs I use on a day to day basis, especially when I have to navigate the file, is to have emacs keybindings.

To do so for the two readers, first, add the following to ~/.Xresources:

```
! look in man xdvi
xdvi.mainTranslations: #override \
Ctrl<Key>v: down-or-next()\n\
Alt<Key>v: up-or-previous()\n\
Alt Shift <Key><: goto-page(1)\n\
Alt Shift <Key>>: goto-page()\n\
Ctrl<Key>f: right(0.015)\n\
Ctrl<Key>b: left(0.015)\n\
Ctrl<Key>n: down(0.015)\n\
Ctrl<Key>p: up(0.015)\n\
<Key>l: right(0.015)\n\
<Key>h: left(0.015)\n\
<Key>j: down(0.015)\n\
<Key>k: up(0.015)\n\
Ctrl<Key>s: find()\n
xdvi*geometry: 1350x700
xdvi*shrinkFactor: 4

! look in man xpdf and xpdfrc; ~/.xpdfrc
xpdf*geometry: 1350x700
xpdf*initialZoom: width
```

Also, create and add the following to ~/.xpdfrc:

```
initialZoom width
continuousView yes
bind ctrl-v any pageDown
bind alt-v any pageUp
bind alt-shift-< any gotoPage(1)
bind alt-shift-> any gotoLastPage
bind ctrl-n any scrollDown(16)
bind ctrl-p any scrollUp(16)
bind ctrl-f any scrollRight(16)
bind ctrl-b any scrollLeft(16)
bind h any scrollLeft(16)
bind l any scrollRight(16)
bind k any scrollUp(16)
bind j any scrollDown(16)
bind ctrl-s any find
```

Now I can at least navigate the file with emacs keybindings.

I would also like to get this to work on ghostview (for postscript files) or djview (for djvu files), but have yet to find out how to do so on these programs (or found programs for these formats that allow custom keybindings). I’ve written about using emacs’ doc-view to view all my files, but it can be sub-optimal:

• the conversion process on large files can take a long time,
• no “continuous” view mode,
• searching the text does not highlight the text,
• and the file doesn’t update when the dvi or pdf file is updated (nor is there a keybinding to update it), which matters when editing and recompiling LaTeX files. UPDATE: actually, you can refresh using the r keybinding; it works quite fast and stays on the same page you are on.

Please do let me know if you know how to make custom keybindings on ghostview or djview (or similar, fast programs). Also let me know if I am not aware of any features in doc-view that would make my life easier. Thanks!

## Foxit Phantom, an Adobe Acrobat alternative, for Linux via WINE

There are plenty of open source tools on Linux that can assist with pdf files. However, sometimes you just have to do something that requires Adobe Acrobat (e.g., editing pdf files). You can’t install Adobe Acrobat directly on Linux; I currently use Adobe Acrobat via a Windows virtual machine via VirtualBox. This post lists some popular pdf tools, and among them are tools from Foxit. The reader can be installed on Linux, but not Phantom, their Acrobat alternative. However, you can install the Windows version through WINE.

The installation file is an MSI file. Double-clicking an MSI doesn’t work. To launch it, this post shows:

```sh
wine msiexec /i myfile.msi
```


## Convert raster image to vector image

I wanted to convert some raster images (bmp, png, etc.) to vector images (svg, pdf, eps) so that they can be rescaled easily. I found Vector Magic, which used to be free from Stanford University but now costs a grip; however, they do allow some free conversions. This post let me know that Potrace is implemented in Inkscape. I just used Potrace directly since it is command-line based.

```sh
potrace image.bmp     ## output image.eps by default
potrace -s image.bmp  ## output image.svg; transparent
```

Now I can easily convert basic artwork to vector images!

## Convert eps to pdf with the correct page size or boundaries

By itself, ps2pdf always converted the eps file to a pdf with a different paper size. I finally discovered how to get the right paper size via this post. Here is my eps2pdf.sh script:

```sh
#! /bin/bash

## look in comments
## http://opendevice.blogspot.com/2007/05/eps-to-pdf-how-to-avoid-clipping.html
for file in "$@"
do
    ps2pdf -dEPSCrop "$file"
done
```

Sometimes I want to download all files on a page. The flashgot plugin works, but it involves clicking which can be a pain if you have a lot of pages to download.

Recently I’ve been wanting to download pdfs off a page. I found out that I can do so with wget on the command line:

```sh
wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf' 'url'
```


However, the page I wanted to download from uses the same name for a lot of the pdfs. The above command should not overwrite files (.1, .2, …, is appended to the name). However, the accept list (pdf) then deletes these renamed files, since they no longer match the pattern. The issue is brought up here. To get around this, do

```sh
wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*' 'url'
```


Note that it is best to put the url in quotes; without them I was having an issue where the same files were being downloaded.
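
To see why the renamed copies get thrown away, it helps to check file names against the accept list by hand. The sketch below only imitates wget's wildcard matching with bash's own pattern matching, so treat it as an approximation of what `-A` does rather than wget's actual code; `accepted` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# accepted NAME LIST
# Succeed if NAME matches any of the comma-separated glob patterns in LIST.
# Approximates wget's -A wildcard handling with bash pattern matching.
accepted() {
    local name=$1 pattern
    local -a patterns
    # split on commas without letting the shell expand the globs
    IFS=',' read -r -a patterns <<< "$2"
    for pattern in "${patterns[@]}"; do
        # unquoted right-hand side is treated as a pattern by [[ ]]
        if [[ $name == $pattern ]]; then return 0; fi
    done
    return 1
}
```

With this, `accepted report.pdf.1 '*.pdf'` fails, while `accepted report.pdf.1 '*.pdf,*.pdf.*'` succeeds, which matches the behaviour I saw with the two commands.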

Also, check here for an example with cookies and referrers.

UPDATE 7/27/2010

Suppose I want to download a list of links that differ very little (say, by a number), for example http://example.com/whatever1 through http://example.com/whatever100. A simple bash script with wget:

```sh
#! /usr/bin/env bash
## wget_script.sh

## wget "$1.fmatt" -O fmatt.pdf
## wget "$1.bmatt" -O bmatt.pdf
for chapnum in $(seq 1 "$2"); do
    wget "$1$chapnum" -O "ch$chapnum.pdf"
    ##echo $1$chapnum
done
```

I can now do wget_script.sh "http://example.com/whatever" 100 to download the 100 files.
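
A variation on the same idea, as a sketch: generate the list of URLs first and hand the whole list to wget with its `-i` option, which reads URLs from a file. `numbered_urls` and `urls.txt` are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# numbered_urls BASE N
# Print BASE1, BASE2, ..., BASEN, one URL per line.
# Hypothetical helper, for illustration only.
numbered_urls() {
    local base=$1 n=$2 i
    for ((i = 1; i <= n; i++)); do
        printf '%s%d\n' "$base" "$i"
    done
}
```

Then `numbered_urls "http://example.com/whatever" 100 > urls.txt` followed by `wget -i urls.txt` fetches the same 100 links. The difference from the script above is that the files keep the names from the URLs instead of being renamed to ch1.pdf, ch2.pdf, and so on, and the list can be inspected or edited before downloading.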