wget
What it is
wget is a command-line utility for downloading files from the internet over HTTP, HTTPS, and FTP. It’s ideal for recursive downloads, mirroring websites, or simply fetching single files reliably.
Installation
Linux
sudo apt update && sudo apt install wget # Debian/Ubuntu
sudo yum install wget # CentOS/RHEL 7 and older
sudo dnf install wget # Fedora, RHEL/CentOS 8 and newer
sudo pacman -S wget # Arch Linux
macOS
Using Homebrew:
brew install wget
Windows
Download the executable from the GNU Wget for Windows project and add its directory to your system’s PATH.
Core Concepts
- Recursive Download: wget can download entire websites or directory structures by following links.
- Mirroring: It can create a local copy of a remote site, preserving directory structure and downloading only newer files.
- Resuming Downloads: If a download is interrupted, wget can continue from where it left off.
- Robots.txt: By default, wget respects the robots.txt file on web servers, which specifies which parts of a site crawlers should not access.
Commands / Usage
Downloading Single Files
- Download a file and save it with its original name:
    wget https://example.com/files/document.pdf
  Downloads document.pdf from the specified URL.
- Download a file and save it with a different name:
    wget -O report.pdf https://example.com/files/document.pdf
  Downloads document.pdf and saves it locally as report.pdf.
- Download a file and continue a previous download:
    wget -c https://example.com/large_archive.zip
  If large_archive.zip was partially downloaded, this command will resume the download.
- Download a file without showing progress:
    wget -q https://example.com/script.sh
  Downloads script.sh silently.
- Download a file and timestamp it:
    wget --timestamping https://example.com/latest_data.csv
  Downloads latest_data.csv only if the remote file is newer than the local copy (if it exists).
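The resume and rename options above combine naturally for large files. The following is a minimal sketch, assuming a hypothetical helper name (fetch_file) and an arbitrary retry count of 5; neither comes from wget itself.

```shell
# fetch_file URL [OUTPUT_NAME]
# Hypothetical helper (not part of wget): resume a partial download and
# retry a few times before giving up.
fetch_file() {
    url=$1
    out=${2:-}
    if [ -n "$out" ]; then
        # -c resumes a partial file; -O chooses the local file name.
        wget -c --tries=5 -O "$out" "$url"
    else
        # Without -O, wget keeps the name taken from the URL.
        wget -c --tries=5 "$url"
    fi
}
```

Calling fetch_file https://example.com/large_archive.zip a second time after an interruption should pick up where the first attempt stopped, provided the server supports range requests.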
Downloading Multiple Files
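- Download multiple files from a list:
    wget -i urls.txt
  Downloads all URLs listed in the urls.txt file, one URL per line.
- Download files matching a pattern (globbing, FTP only):
    wget "ftp://ftp.example.com/pub/images/*.jpg"
  Downloads all files ending with .jpg in the images directory. Globbing works only for FTP URLs, and the URL should be quoted so the shell does not expand the pattern itself. Over HTTP, use a recursive download with an accept list instead, for example wget -r -l 1 -nd -A jpg https://example.com/images/.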
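For HTTP URLs, where wget has no directory listing to match a pattern against, one option is to generate the list that -i reads. This is a sketch only: the helper name (write_url_list) and the log_NN.txt naming scheme are assumptions made for illustration.

```shell
# write_url_list BASE_URL FIRST LAST OUTFILE
# Hypothetical helper: writes one URL per line, with zero-padded numbers,
# in the format that "wget -i" expects.
# FIRST and LAST must be plain decimals without leading zeros.
write_url_list() {
    base=$1
    i=$2
    last=$3
    outfile=$4
    : > "$outfile"                      # start with an empty file
    while [ "$i" -le "$last" ]; do
        printf '%s/log_%02d.txt\n' "$base" "$i" >> "$outfile"
        i=$((i + 1))
    done
}
```

A typical use would be write_url_list https://example.com/logs 1 10 urls.txt followed by wget -i urls.txt.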
Recursive Downloads and Mirroring
- Recursively download a website (basic):
    wget -r https://example.com/blog/
  Downloads the blog directory and its contents recursively.
- Recursively download a website and follow links across hosts (use with caution):
    wget -r -H https://example.com/
  Downloads recursively and follows links to other hosts.
- Download a website and limit recursion depth:
    wget -r -l 2 https://example.com/
  Recursively downloads files up to 2 levels deep, counted from the starting URL.
- Mirror a website locally:
    wget --mirror https://example.com/
  Downloads the entire site, preserving directory structure and timestamps. Equivalent to -r -N -l inf --no-remove-listing. robots.txt is still honored unless you also pass -e robots=off.
- Download a website, preserving directory structure and only downloading newer files:
    wget -nv -N -r -l inf --no-remove-listing https://example.com/
  Non-verbose, timestamping, infinite recursion, no removal of listing files, preserves structure.
- Download only HTML files recursively:
    wget -r -A "*.html" https://example.com/
  Recursively downloads files from example.com that end with .html.
- Download recursively, excluding certain file types:
    wget -r -R "*.mp3,*.wav" https://example.com/music/
  Recursively downloads from music/ but excludes .mp3 and .wav files.
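The depth limit and accept list above are often used together. Here is a minimal sketch, assuming a hypothetical helper name (fetch_tree) and a one-second pause between requests chosen purely as a polite default.

```shell
# fetch_tree URL DEPTH EXT [EXT...]
# Hypothetical helper: recursive download limited to DEPTH levels,
# restricted to the given file extensions, staying below the start URL.
fetch_tree() {
    url=$1
    depth=$2
    shift 2
    # Join the remaining arguments with commas, e.g. "jpg,png".
    accept=$(IFS=,; printf '%s' "$*")
    wget -r -l "$depth" --no-parent --wait=1 -A "$accept" "$url"
}
```

For example, fetch_tree https://example.com/gallery/ 1 jpg png would run wget with -A jpg,png at depth 1. At least one extension is expected; the sketch does not handle an empty list.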
Controlling Output and Behavior
- Save to a specific directory:
    wget -P /path/to/save/ https://example.com/files/data.zip
  Saves data.zip into /path/to/save/.
- Download a page and convert links for local viewing:
    wget --span-hosts --convert-links --adjust-extension --page-requisites --no-parent https://example.com/
  Downloads a single page for offline viewing: fetches the files needed to display it (images, CSS) even when they live on other hosts, converts links to work offline, and adjusts extensions (e.g., .php to .html). --no-parent only takes effect when combined with -r, where it keeps wget from ascending to parent directories.
- Set user agent:
    wget --user-agent="MyCustomAgent/1.0" https://example.com/api/data
  Sends a custom User-Agent header with the request.
- Use basic authentication:
    wget --user=myuser --password=mypassword https://example.com/protected/file.txt
  Provides username and password for HTTP basic authentication.
- Limit download speed:
    wget --limit-rate=200k https://example.com/large_file.iso
  Limits the download speed to 200 kilobytes per second.
- Retry failed downloads:
    wget --tries=5 --waitretry=5 https://example.com/unreliable_file.tar.gz
  Tries the download up to 5 times with linear backoff between retries: 1 second after the first failure, 2 after the second, and so on, up to a maximum of 5 seconds.
- Download from FTP:
    wget ftp://ftp.example.com/pub/file.txt
  Downloads file.txt from an FTP server.
- Download from FTP with anonymous login:
    wget --ftp-user=anonymous --ftp-password=user@example.com ftp://ftp.example.com/pub/data.zip
  Logs into an FTP server using anonymous credentials. wget also logs in anonymously by default when no FTP credentials are given.
- Turn off robots.txt checking:
    wget -r --execute robots=off https://example.com/
  Ignores the robots.txt file on the server. robots.txt is only consulted during recursive downloads, so the setting has no effect on single-file fetches.
- Turn off timestamping:
    wget --no-timestamping https://example.com/current_version.txt
  Downloads the file regardless of its modification time compared to the local file. Timestamping is off by default, so this is only needed to override timestamping = on in a wgetrc file. Note that -N does the opposite: it turns timestamping on.
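Passwords given with --password are visible to other users through the process list. One alternative is to let wget read credentials from ~/.netrc, which it consults when authentication is required and no credentials were passed on the command line. The sketch below assumes a hypothetical helper name (add_netrc_entry); the optional FILE argument exists mainly so the helper can be tried out without touching the real ~/.netrc.

```shell
# add_netrc_entry HOST USER PASSWORD [FILE]
# Hypothetical helper: appends a credentials line in netrc format and
# makes sure the file is readable by its owner only.
add_netrc_entry() {
    host=$1
    user=$2
    pass=$3
    file=${4:-$HOME/.netrc}
    old_umask=$(umask)
    umask 077                           # a newly created file gets mode 600
    printf 'machine %s login %s password %s\n' "$host" "$user" "$pass" >> "$file"
    umask "$old_umask"
    chmod 600 "$file"                   # tighten an already existing file too
}
```

After add_netrc_entry example.com myuser mypassword, a plain wget https://example.com/protected/file.txt should authenticate without any credentials on the command line. Interactive use can also rely on --ask-password, which prompts for the password instead.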
Common Patterns
- Download a website for offline viewing:
    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/
  Creates a local, browsable copy of a website: mirrors it, rewrites links for offline use, fixes file extensions, and fetches the images and stylesheets each page needs.
- Download all images from a webpage:
    wget -nd -r -l 1 -A jpg,jpeg,png,gif https://example.com/gallery/
  Recursively downloads from /gallery/ (depth 1), accepts only image files, and saves them without creating a directory structure (-nd).
- Download a file and pipe its content to another command:
    wget -qO- https://example.com/config.yaml | yq eval '.port' -
  Downloads config.yaml silently (-q), outputs its content to stdout (-O-), and pipes it to yq to extract the port value.
- Download a series of numbered files:
    wget https://example.com/logs/log_{01..10}.txt
  Downloads log_01.txt through log_10.txt. The expansion is done by the shell (bash or zsh brace expansion), not by wget: the shell turns the pattern into ten URLs before wget runs.
- Download a file and save it to /tmp:
    wget -P /tmp/ https://example.com/archive.tar.gz
  Saves the downloaded file directly into the /tmp directory.
- Download a file only if it’s newer than the local copy:
    wget -N https://example.com/data.json
  If data.json already exists locally, wget will check the server’s modification time and only download if the remote file is newer.
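When these patterns run inside scripts, the exit status of wget tells you why a download failed. The mapping below follows the exit codes listed in the GNU Wget manual; the helper name (explain_wget_status) is an assumption made for this sketch. Keep in mind that in a pipeline the shell reports the status of the last command, so wget's own status is only visible when wget runs on its own or when bash's pipefail option is set.

```shell
# explain_wget_status CODE
# Hypothetical helper: translates a wget exit status into a short message.
explain_wget_status() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "parse error (command line, .wgetrc or .netrc)" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Usage (commented out so the sketch has no side effects when sourced):
#   wget -q -N https://example.com/data.json
#   explain_wget_status "$?"
```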
Gotchas
- robots.txt by default: wget respects robots.txt. If you are trying to download content that is disallowed by robots.txt, you’ll need to use --execute robots=off. Be mindful of website terms of service.
- Infinite Recursion: -r on its own stops at a depth of 5, but with -l inf or --mirror a recursive download can run for a very long time and consume significant bandwidth and disk space. Set a reasonable depth with -l, and use --no-parent to stay below the starting directory.
- --mirror behavior: --mirror is a shortcut for -r -N -l inf --no-remove-listing. It implies recursive download, timestamping, infinite depth, and keeping .listing files.
- User Agent Spoofing: Some websites block default wget user agents. You might need to use --user-agent to mimic a browser.
- HTTPS Certificates: For self-signed or untrusted HTTPS certificates, you might need --no-check-certificate, but this is a security risk and should be used with extreme caution.
- Encoding Issues: Sometimes filenames with special characters or non-ASCII characters can be problematic. wget has options like --restrict-file-names=windows or --restrict-file-names=unix to help, but it’s not always perfect.
- FTP Passive Mode: wget defaults to passive FTP mode, which is usually required when behind a firewall. If you encounter FTP connection issues, you can try --no-passive-ftp to switch to active mode, though this is rarely needed.