Wget examples for a Linux newbie

By David Pratt / 2 Comments / Published: 06-05-11

Having ditched Windows entirely at the start of the year and switched to Linux (Ubuntu flavour), I’ve discovered a range of tools that would have come in very handy in my days as a Windows user. One of them is wget, a command-line program for downloading files from web servers. That in itself doesn’t sound very exciting, but once you start wielding some of its options you can do some interesting things with it. To showcase what wget can do, here is a collection of one-liners that you might find interesting or useful. I haven’t come up with them all myself; most were collected from forums and sites around the net such as commandlinefu:

[To understand what the options after the wget command do, you’ll need to refer to the wget documentation]

Download a single file

Start with an easy one!

wget http://www.example.com/archive.zip
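
For bigger files, two extra options are worth knowing: -c picks up a partial download where it left off, and --limit-rate caps the bandwidth used. Here is a minimal sketch wrapped in a helper I've called fetch_file (the function name and the 200k default are my own, not part of wget):

```shell
# Hypothetical helper: download a URL, resuming a partial file if one exists.
# Usage: fetch_file URL [RATE]
fetch_file() {
    url=$1
    rate=${2:-200k}   # assumed default cap; pick whatever suits your connection
    # -c            continue a partially downloaded file instead of restarting
    # --limit-rate  cap the download speed (k and m suffixes are accepted)
    wget -c --limit-rate="$rate" "$url"
}

# Example call (commented out so nothing is downloaded when the file is sourced):
# fetch_file http://www.example.com/archive.zip 500k
```

If the download gets interrupted, running the same call again from the same directory should resume it rather than start over.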

Download an entire website

If you don’t want to be courteous, you can leave out the --random-wait switch, at the risk of getting banned. If you only want to download the site to a certain depth, use the -l switch followed by a number, e.g. “-l 2” for a depth of 2. Downloading an entire website can also be achieved using the --mirror option.

wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com
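
As far as I can tell from the wget manual, --random-wait only varies the delay set with --wait, so on its own it may not slow anything down. Below is a more polite variant with an explicit base wait and a depth limit, sketched as a helper I've named mirror_politely (the name and the defaults are assumptions of mine). It leaves out "-e robots=off" and "-U mozilla" on purpose, so it will respect robots.txt:

```shell
# Hypothetical helper: recursively download a site to a limited depth,
# pausing between requests.
# Usage: mirror_politely URL [DEPTH] [WAIT_SECONDS]
mirror_politely() {
    url=$1
    depth=${2:-2}       # assumed default: follow links two levels deep
    wait_secs=${3:-1}   # assumed default: one second base pause
    # --wait         base pause between requests, in seconds
    # --random-wait  vary that pause so requests are not evenly spaced
    # -r -l DEPTH    recurse, but only DEPTH levels deep
    # -p             also fetch images, CSS and other page requisites
    wget --wait="$wait_secs" --random-wait -r -l "$depth" -p "$url"
}

# Example call (commented out so nothing is downloaded when the file is sourced):
# mirror_politely http://www.example.com 2 1
```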

Download an entire ftp directory using wget

Handy if you don’t want to spark up an FTP client.

wget -r ftp://username:password@ftp.example.com
wget -r --ftp-user=username --ftp-password=password ftp://ftp.example.com

Check for broken links using wget as a spider

This is the command-line equivalent of a Windows-based tool called Xenu's Link Sleuth. It will spider an entire website, ignoring robots.txt, and write everything it finds to a log file (wget.log) that you can then search for broken links. Handy.

wget --spider -o wget.log -e robots=off --wait 1 -r -p http://www.example.com
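
The log gets long on a big site, so it helps to pull out just the failures. Here is a rough sketch of a helper I've called list_404s (the name is mine). It assumes wget's default verbose log layout, where each request starts with a line like "--2011-05-06 09:42:01--  http://www.example.com/page.html" and the status appears on an "awaiting response..." line. That layout may differ between wget versions and locales, so treat it as a starting point:

```shell
# Hypothetical helper: print each URL in a wget log that came back as a 404.
# Usage: list_404s LOGFILE
list_404s() {
    awk '
        # Request lines start with "--<timestamp>--" and end with the URL.
        /^--/ { url = $NF }
        # Status lines carry the HTTP response code after "awaiting response...".
        /awaiting response\.\.\. 404/ { print url }
    ' "$1" | sort -u
}

# Example call, once the spider run above has finished:
# list_404s wget.log
```

Swapping 404 for another status code (or a pattern such as 5[0-9][0-9]) should pick up other kinds of failure in the same way.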

Get server information

wget -S -q -O /dev/null http://www.example.com 2>&1 | grep -i 'Server:'

Diff remote webpages using wget

diff <(wget -q -O - http://www.example.com) <(wget -q -O - http://www.example2.com)

Schedule a download

If you've got a big file to download, you could schedule getting it in this manner. Couple this command with a cron job and you could feasibly create a snapshot of a website at given time intervals.

echo 'wget http://www.example.com' | at 01:00
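
For the cron idea, something along these lines might do. It is a sketch of a helper I've called snapshot_site that mirrors a site into a directory named after today's date, so each run keeps its own copy. The function name, the "snapshots" default directory and the crontab path are all made up for illustration:

```shell
# Hypothetical helper: mirror a site into a directory named after today's date.
# Usage: snapshot_site URL [BASE_DIR]
snapshot_site() {
    url=$1
    base_dir=${2:-snapshots}   # assumed default location for the copies
    # --mirror  turn on recursion and timestamping for a full copy
    # --wait=1  pause a second between requests
    # -P        directory prefix that downloaded files are saved under
    wget --mirror --wait=1 -P "$base_dir/$(date +%F)" "$url"
}

# An illustrative crontab entry (path and schedule are assumptions) that would
# run a script containing the call below every day at 01:00:
#   0 1 * * * /home/user/bin/snapshot.sh
# snapshot_site http://www.example.com
```

Unlike at, which runs a job once, cron repeats it on the given schedule, which is what you want for regular snapshots.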

Category: Tech

Posted: on May 6th, 2011 at 9:42 am.

2 Responses to “Wget examples for a Linux newbie”

Mike June 26th, 2012 at 4:36 am

Good stuff.

Do you know how to set Wget to download a web page behind Digest Authentication? I just keep getting 401 errors.

John August 15th, 2014 at 1:00 am

Your --random-wait example lacks a --wait=.

According to the --random-wait man page, “This option causes the time between requests to vary between 0 and 2*wait seconds, where wait was specified using the --wait option, in order to mask wget’s presence…”.
