wget Download Tool

wget cheatsheet: download files from the command line. wget URL, wget -O name URL, wget -c resume, wget -r recursive site mirror, wget --user/--password for auth.

wget

What it is

wget is a command-line utility for downloading files from the internet over HTTP, HTTPS, and FTP. It’s ideal for recursive downloads, mirroring websites, or simply fetching single files reliably.

Installation

Linux

sudo apt update && sudo apt install wget  # Debian/Ubuntu
sudo yum install wget                  # CentOS/RHEL 7 and older
sudo dnf install wget                  # Fedora, RHEL/CentOS 8 and newer
sudo pacman -S wget                    # Arch Linux

macOS

Using Homebrew:

brew install wget

Windows

Download the executable from the GNU Wget for Windows project and add its directory to your system’s PATH.

Core Concepts

  • Recursive Download: wget can download entire websites or directory structures by following links.
  • Mirroring: It can create a local copy of a remote site, preserving directory structure and downloading only newer files.
  • Resuming Downloads: If a download is interrupted, wget can continue from where it left off.
  • Robots.txt: When downloading recursively, wget respects the robots.txt file on web servers by default. robots.txt specifies which parts of a site crawlers should not access.

Commands / Usage

Downloading Single Files

  • Download a file and save it with its original name:

    wget https://example.com/files/document.pdf
    

    Downloads document.pdf from the specified URL.

  • Download a file and save it with a different name:

    wget -O report.pdf https://example.com/files/document.pdf
    

    Downloads document.pdf and saves it locally as report.pdf.

  • Download a file and continue a previous download:

    wget -c https://example.com/large_archive.zip
    

    If large_archive.zip was partially downloaded, this command will resume the download.

  • Download a file without printing any output (quiet mode):

    wget -q https://example.com/script.sh
    

    Downloads script.sh silently.

  • Download a file with timestamping (--timestamping, short form -N):

    wget --timestamping https://example.com/latest_data.csv
    

    Downloads latest_data.csv only if the remote file is newer than the local copy (if it exists).
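
wget -c only resumes when you run it again, and wget's own retries (--tries) stop on errors it treats as fatal, such as a refused connection. A minimal POSIX sh sketch that reruns wget -c until the file is complete or a retry budget runs out; fetch_resume, MAX_ATTEMPTS, RETRY_DELAY, and WGET are hypothetical names chosen for this example, and setting WGET=echo turns it into a dry run.

```shell
#!/bin/sh
# fetch_resume URL: hypothetical helper that reruns `wget -c URL` until it
# succeeds or MAX_ATTEMPTS runs out. WGET defaults to the real wget; set
# WGET=echo for a dry run that only prints the arguments.
fetch_resume() {
  url=$1
  max=${MAX_ATTEMPTS:-5}
  attempt=1
  # -c picks up from the partial file left by the previous attempt
  until ${WGET:-wget} -c "$url"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${RETRY_DELAY:-10}s" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-10}"
  done
  return 0
}
```

Usage: fetch_resume https://example.com/large_archive.zip. If refused connections are the only problem, wget's --retry-connrefused option, which treats them as transient, may be enough on its own.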

Downloading Multiple Files

  • Download multiple files from a list:

    wget -i urls.txt
    

    Downloads all URLs listed in the urls.txt file, one URL per line.

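  • Download files matching a pattern (globbing, FTP only):

    wget "ftp://ftp.example.com/pub/images/*.jpg"
    

    Downloads all files ending with .jpg in the pub/images directory of the FTP server. Quote the URL so the shell does not expand the * itself. wget does not support globbing for HTTP URLs; for those, use a recursive download with an accept list, e.g. wget -r -l 1 -nd -A "*.jpg" https://example.com/images/ (this only works if the server returns a page that links to the images).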
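
Because -i simply reads one URL per line, the list can be generated by a script. A minimal POSIX sh sketch, assuming the hypothetical log_NN.txt naming from the numbered-files pattern under Common Patterns; unlike brace expansion, it also works in shells such as dash.

```shell
#!/bin/sh
# Write one URL per line to urls.txt for `wget -i urls.txt`.
# The log_NN.txt file names are hypothetical placeholders.
i=1
while [ "$i" -le 10 ]; do
  # %02d zero-pads the number, so 1 becomes 01
  printf 'https://example.com/logs/log_%02d.txt\n' "$i"
  i=$((i + 1))
done > urls.txt
```

Then run wget -i urls.txt. With -i -, wget reads the URLs from standard input, so the loop's output could also be piped straight into wget -i -.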

Recursive Downloads and Mirroring

  • Recursively download a website (basic):

    wget -r https://example.com/blog/
    

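    Downloads /blog/ and the pages it links to, recursively, up to wget's default depth of 5. Without --no-parent (-np), wget also follows links outside /blog/ on the same host.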

  • Recursively download a website and follow links across hosts (use with caution):

    wget -r -H https://example.com/
    

    Downloads recursively and follows links to other hosts.

  • Download a website and limit recursion depth:

    wget -r -l 2 https://example.com/
    

    Follows links at most 2 hops away from the starting page. The depth counts links followed, not directory levels.

  • Mirror a website locally:

    wget --mirror https://example.com/
    

    Downloads the entire site with infinite recursion depth and timestamping, preserving the directory structure. It still respects robots.txt; add -e robots=off to ignore it. Equivalent to -r -N -l inf --no-remove-listing.

  • Download a website, preserving directory structure and only downloading newer files:

    wget -nv -N -r -l inf --no-remove-listing https://example.com/
    

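    Equivalent to --mirror plus -nv: non-verbose output (-nv), timestamping (-N), recursion (-r) with infinite depth (-l inf), and keeping FTP .listing files (--no-remove-listing). The remote directory structure is preserved by default.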

  • Download only HTML files recursively:

    wget -r -A "*.html" https://example.com/
    

    Recursively downloads files from example.com that end with .html.

  • Download recursively, excluding certain file types:

    wget -r -R "*.mp3,*.wav" https://example.com/music/
    

    Recursively downloads from music/ but excludes .mp3 and .wav files.
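
The Gotchas section warns about runaway recursion, so it can help to bake limits into every recursive run. A sketch under stated assumptions: crawl, DEPTH, and WGET are hypothetical names, and --wait/--random-wait (real wget options not covered above) pause between requests to reduce load on the server.

```shell
#!/bin/sh
# crawl URL: hypothetical wrapper for a bounded, polite recursive download.
#   -r -l DEPTH              recurse at most DEPTH link hops (default 2 here)
#   --no-parent              never climb above the starting directory
#   --wait=1 --random-wait   pause roughly 0.5 to 1.5 s between requests
# Set WGET=echo to print the arguments instead of downloading.
crawl() {
  ${WGET:-wget} -r -l "${DEPTH:-2}" --no-parent --wait=1 --random-wait "$1"
}
```

For example, DEPTH=3 crawl https://example.com/blog/ downloads up to three link hops below /blog/, and WGET=echo crawl https://example.com/blog/ shows the command it would run.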

Controlling Output and Behavior

  • Save to a specific directory:

    wget -P /path/to/save/ https://example.com/files/data.zip
    

    Saves data.zip into /path/to/save/.

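  • Download a single page with everything needed to view it offline:

    wget --page-requisites --convert-links --adjust-extension --span-hosts https://example.com/
    

    Downloads the page plus the images, CSS, and scripts it needs (--page-requisites), fetching them from other hosts such as CDNs if necessary (--span-hosts). It rewrites links so the copy works offline (--convert-links) and appends .html to HTML files that lack the extension, e.g. page.php is saved as page.php.html (--adjust-extension). Without -r, only this one page is downloaded, not the whole site.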

  • Set user agent:

    wget --user-agent="MyCustomAgent/1.0" https://example.com/api/data
    

    Sends a custom User-Agent header with the request.

  • Use basic authentication:

    wget --user=myuser --password=mypassword https://example.com/protected/file.txt
    

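    Provides username and password for HTTP basic authentication (the same options also work for FTP). The password is saved in your shell history; --ask-password prompts for it interactively instead.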

  • Limit download speed:

    wget --limit-rate=200k https://example.com/large_file.iso
    

    Limits the download speed to 200 kilobytes per second.

  • Retry failed downloads:

    wget --tries=5 --waitretry=5 https://example.com/unreliable_file.tar.gz
    

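    Makes up to 5 attempts in total (the default is 20). Between retries wget backs off linearly, waiting 1 second after the first failure, 2 after the second, and so on, up to the 5-second maximum set by --waitretry. Fatal errors such as 404 Not Found or a refused connection are not retried.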

  • Download from FTP:

    wget ftp://ftp.example.com/pub/file.txt
    

    Downloads file.txt from an FTP server.

  • Download from FTP with anonymous login:

    wget --ftp-user=anonymous --ftp-password=user@example.com ftp://ftp.example.com/pub/data.zip
    

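    Logs in as anonymous with an email address as the password. wget already logs in anonymously when no credentials are given, so these options mainly matter when a server expects a particular anonymous password.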

  • Turn off robots.txt checking:

    wget --execute robots=off https://example.com/
    

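    Ignores the robots.txt file on the server (short form: -e robots=off). This only matters for recursive downloads, because wget does not check robots.txt when fetching individual files.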

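  • Force a fresh download regardless of timestamps:

    wget -O current_version.txt https://example.com/current_version.txt
    

    Timestamping is off by default, so wget always downloads the file, but if current_version.txt already exists, plain wget saves the new copy as current_version.txt.1. Naming the output file with -O overwrites the existing copy instead. Note that -N does the opposite: it turns timestamping on.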
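
Options you use on every run can live in a wgetrc startup file instead; each line is a wgetrc command of the same kind --execute accepts (robots=off in the example above). A sketch, assuming a hypothetical project.wgetrc in the current directory; the WGETRC environment variable tells wget to load that file in place of ~/.wgetrc.

```shell
#!/bin/sh
# Write a hypothetical per-project startup file with wgetrc commands.
# Each "name = value" line matches a command-line option used above.
cat > project.wgetrc <<'EOF'
# --tries=5 --waitretry=5 --limit-rate=200k --user-agent=...
tries = 5
waitretry = 5
limit_rate = 200k
user_agent = MyCustomAgent/1.0
EOF

# Use it for a single run (it is loaded in place of ~/.wgetrc):
#   WGETRC=./project.wgetrc wget https://example.com/large_file.iso
# One-off settings can go on the command line with -e instead:
#   wget -e limit_rate=200k https://example.com/large_file.iso
```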

Common Patterns

  • Download a website for offline viewing:

    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/
    

    This is a comprehensive command to create a local, browsable copy of a website.

  • Download all images from a webpage:

    wget -nd -r -l 1 -A jpg,jpeg,png,gif https://example.com/gallery/
    

    Recursively downloads from /gallery/ (depth 1), accepts only image files, and saves them without creating a directory structure (-nd).

  • Download a file and pipe its content to another command:

    wget -qO- https://example.com/config.yaml | yq eval '.port' -
    

    Downloads config.yaml silently (-q), outputs its content to stdout (-O-), and pipes it to yq to extract the port value.

  • Download a series of numbered files:

    wget https://example.com/logs/log_{01..10}.txt
    

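    Downloads log_01.txt through log_10.txt. The shell, not wget, expands {01..10} into ten separate URLs before wget runs. Zero-padded ranges need bash 4+ or zsh; plain sh such as dash passes the braces through literally.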

  • Download a file and save it to /tmp:

    wget -P /tmp/ https://example.com/archive.tar.gz
    

    Saves the downloaded file directly into the /tmp directory.

  • Download a file only if it’s newer than the local copy:

    wget -N https://example.com/data.json
    

    If data.json already exists locally, wget will check the server’s modification time and only download if the remote file is newer.
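
One caveat for the pipe pattern: a pipeline's exit status is normally that of its last command, so wget -qO- URL | yq ... can report success even when the download failed. In bash, set -o pipefail makes the pipeline fail when any stage fails. So the sketch runs offline, it swaps wget for a hypothetical download function that exits with status 4, the code wget uses for a network failure.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any stage fails, not just the last one.
set -o pipefail

# Hypothetical stand-in for: wget -qO- https://example.com/config.yaml
# It prints nothing and exits 4, the status wget uses for network failures.
download() { return 4; }

if download | cat > /dev/null; then
  echo "ok"
else
  # $? here is the pipeline's status: 4 with pipefail, 0 without it
  echo "download failed with status $?"
fi
```

In a real script, put wget -qO- https://example.com/config.yaml | yq eval '.port' - where download | cat appears.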

Gotchas

  • robots.txt by default: During recursive downloads, wget respects robots.txt. If you need content that robots.txt disallows, use --execute robots=off (short form: -e robots=off). Be mindful of website terms of service.
  • Infinite Recursion: Plain -r stops at a default depth of 5, but --mirror and -l inf remove that limit, so a recursive download can run for a very long time and consume significant bandwidth and disk space. Set a reasonable depth with -l and use --no-parent to stay below the starting directory.
  • --mirror behavior: --mirror is a shortcut for -r -N -l inf --no-remove-listing. It implies recursive download, timestamping, infinite depth, and keeping .listing files.
  • User Agent Spoofing: Some websites block default wget user agents. You might need to use --user-agent to mimic a browser.
  • HTTPS Certificates: For self-signed or untrusted HTTPS certificates, you might need --no-check-certificate, but this is a security risk and should be used with extreme caution.
  • Encoding Issues: Sometimes filenames with special characters or non-ASCII characters can be problematic. wget has options like --restrict-file-names=windows or --restrict-file-names=unix to help, but it’s not always perfect.
  • FTP Passive Mode: wget uses passive FTP mode by default, which is usually what you need behind a firewall. If FTP connections still fail, try active mode with --no-passive-ftp.