wget to mass download files

Sometimes I want to download all the files linked on a page. The FlashGot plugin works, but it involves clicking, which can be a pain if you have a lot of pages to download.

Recently I’ve been wanting to download PDFs off a page. It turns out I can do so with wget on the command line:

 <pre class="src src-sh">wget -U firefox -r -l1 -nd -e <span style="color: #eedd82;">robots</span>=off -A <span style="color: #ffa07a;">'*.pdf'</span> <span style="color: #ffa07a;">'url'</span>

However, the page I wanted to download from used the same name for a lot of the PDFs. The above command should not overwrite files (.1, .2, …, gets appended), but the accept list (-A '*.pdf') deletes these appended files because they no longer match the pattern. The issue is brought up here. To get around this, do

 <pre class="src src-sh">wget -U firefox -r -l1 -nd -e <span style="color: #eedd82;">robots</span>=off -A <span style="color: #ffa07a;">'*.pdf,*.pdf.*'</span> <span style="color: #ffa07a;">'url'</span>

Note that it is best to put the url in quotes, since I was having an issue where the same files were being downloaded.

I’ll be using wget more for downloading from now on!

Also, check here for an example with cookies and referrers.
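
Something along these lines should work for that case (a sketch only; cookies.txt and the listing-page URL are placeholders I made up, and the cookie file has to be exported from the browser first):

    wget --load-cookies cookies.txt \
         --referer 'http://example.com/listing-page' \
         -U firefox -r -l1 -nd -e robots=off -A '*.pdf' 'url'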

UPDATE 7/27/2010

Suppose I want to download a list of links that differ only slightly (say, by a number). For example, http://example.com/whatever1 through http://example.com/whatever100. A simple bash script with wget:

 <pre class="src src-sh">wget_script.sh

#! /usr/bin/env bash

## wget “$1.fmatt” -O fmatt.pdf ## wget “$1.bmatt” -O bmatt.pdf for chapnum in seq 1 $2; do wget “$1$chapnum” -O ch$chapnum.pdf ##echo $1$chapnum done

I can now do wget_script.sh "http://example.com/whatever" 100 to download the 100 files.
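
As an aside, for a plain numeric range like this, bash brace expansion can do the same job without a script; the files just keep their URL-derived names (whatever1, whatever2, …) instead of ch1.pdf, ch2.pdf, …:

    wget 'http://example.com/whatever'{1..100}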

7 comments

  1. The url you posted is to an html website. From there, there is a link to the pdf file. The tricky thing is that the page contains frames: one has the pdf file, and the right-side one also has the pdf link. I’m sure you can tweak wget to pick up the pdf link. The file still ends in .pdf.

  2. Thanks for your help. Yeah, that’s the tricky part. I don’t want to have to enter every single pdf link into Wget; that would defeat the whole purpose of using it in the first place.

  3. I don’t think you’ve searched hard enough into wget. I am nearly 100% positive it would work. Did you even try what I posted? If that doesn’t work, continue searching and reading up on wget.

  4. I’m sure it will work too; I just have to learn more about Wget (today is the first time I’ve used the program). I can’t try it right now because I’m at work, so I’ll have to wait until I get home.

  5. But what about a site where you need to log in (with a password) in order to get to the page where the sought files are?
