wget to mass download files

Sometimes I want to download all the files linked on a page. The FlashGot plugin works, but it involves a lot of clicking, which can be a pain if you have many pages to download from.

Recently I’ve been wanting to download PDFs off a page. I found out that I can do so with wget on the command line:

<pre class="src src-sh">wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf' 'url'</pre>
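For reference, here is the same command with each flag annotated (descriptions paraphrased from the wget manual; 'url' is still a placeholder for the page holding the links):

<pre class="src src-sh"># -U firefox      send "firefox" as the User-Agent string
# -r -l1          recurse, but only one level deep
# -nd             don't recreate the site's directory structure locally
# -e robots=off   ignore robots.txt
# -A '*.pdf'      accept (and keep) only files matching *.pdf
wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf' 'url'</pre>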

However, the page I wanted to download from uses the same file name for many of the PDFs. The command above should not overwrite files (a .1, .2, … suffix is appended), but the accept list (-A '*.pdf') causes wget to delete the suffixed files afterwards, since they no longer match the pattern. The issue is brought up here. To get around this, do

<pre class="src src-sh">wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*' 'url'</pre>

Note that it is best to put the URL in quotes, since without them I was having an issue where the same files were being downloaded.
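My guess is that shell metacharacters were to blame: an unquoted URL containing characters like & or ? gets interpreted by the shell instead of being passed to wget whole (the URL below is made up for illustration):

<pre class="src src-sh"># unquoted: the shell treats & as "run in background" and the URL is split
# wget http://example.com/page?a=1&b=2

# quoted: the whole URL reaches wget intact
wget 'http://example.com/page?a=1&b=2'</pre>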

I’ll be using wget more for downloading from now on!

Also, check here for an example with cookies and referrers.
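Roughly, such a command might look like the following (the cookie file and URLs are made up for illustration; --load-cookies and --referer are standard wget options):

<pre class="src src-sh"># reuse cookies exported from the browser, and set the Referer header
wget --load-cookies cookies.txt \
     --referer='http://example.com/listing' \
     -U firefox -r -l1 -nd -e robots=off -A '*.pdf' \
     'http://example.com/listing'</pre>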

UPDATE 7/27/2010

Suppose I want to download a list of links that differ only slightly, say by a number: for example, http://example.com/whatever1 through http://example.com/whatever100. A simple bash script with wget does the trick:

wget_script.sh:

<pre class="src src-sh">#! /usr/bin/env bash

## wget "$1.fmatt" -O fmatt.pdf
## wget "$1.bmatt" -O bmatt.pdf
for chapnum in $(seq 1 "$2"); do
    wget "$1$chapnum" -O "ch$chapnum.pdf"
    ## echo "$1$chapnum"
done</pre>

I can now do wget_script.sh "http://example.com/whatever" 100 to download the 100 files.
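A minimal usage sketch, assuming the script is saved as wget_script.sh in the current directory:

<pre class="src src-sh">chmod +x wget_script.sh
./wget_script.sh "http://example.com/whatever" 100
## downloads whatever1 through whatever100 as ch1.pdf ... ch100.pdf</pre>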