What it is
Weights & Biases (wandb) is a tool for experiment tracking, model versioning, and hyperparameter sweeps, and its CLI allows you to interact with these features from your terminal.
Installation
Linux / macOS
pip install wandb
Windows
pip install wandb
Core Concepts
- Run: A single execution of your training script. Each run is logged to Weights & Biases.
- Project: A collection of related runs. You can group experiments by project.
- Entity: Your username or team name on Weights & Biases. Runs are associated with an entity.
- Artifact: A versioned file or directory logged to Weights & Biases, often used for datasets or model checkpoints.
- Sweep: An automated hyperparameter optimization process.
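As a rough illustration of how these concepts nest, the sketch below joins an entity, a project, and a run ID into the single path that identifies a run. The helper names run_path and run_url are hypothetical, and the URL layout is an assumption about the hosted service at wandb.ai.

```python
def run_path(entity: str, project: str, run_id: str) -> str:
    """Join entity, project and run ID into one 'entity/project/run_id' path."""
    for part in (entity, project, run_id):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{entity}/{project}/{run_id}"


def run_url(entity: str, project: str, run_id: str, base: str = "https://wandb.ai") -> str:
    """Dashboard URL for a run (assumed layout of the hosted service)."""
    return f"{base}/{entity}/{project}/runs/{run_id}"


print(run_path("my-username", "my-image-classification", "abc123"))
print(run_url("my-username", "my-image-classification", "abc123"))
```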
Commands / Usage
Authentication
- Log in to your W&B account:
    wandb login
  This prompts for your API key, which you can find on your W&B settings page.
- Log in with an API key (e.g., in CI/CD):
    wandb login YOUR_API_KEY
  Replace YOUR_API_KEY with your actual W&B API key.
- Log out:
    wandb logout
  If your installed version does not provide this subcommand, wandb login --relogin forces a fresh login instead.
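For CI/CD, a common alternative to passing the key on the command line is the WANDB_API_KEY environment variable. The sketch below reads it and logs in from Python. The helper names are hypothetical, and wandb is imported lazily so the key lookup can be checked without the package installed.

```python
import os


def resolve_api_key(env=None):
    """Return the API key from WANDB_API_KEY, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


def login_from_env():
    """Log in with the key from the environment; fail loudly if it is missing."""
    import wandb  # lazy import: only needed when actually logging in

    key = resolve_api_key()
    if key is None:
        raise RuntimeError("WANDB_API_KEY is not set")
    return wandb.login(key=key)
```

Call login_from_env() at the start of a CI job, with the key stored as a secret rather than in the repository.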
Starting and Managing Runs
- Initialize a new run (usually done within your script):
    import wandb
    wandb.init(project="my-image-classification", entity="my-username")
  When run as part of your training script, this starts a new W&B run associated with the project "my-image-classification" and the entity "my-username".
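- Resume a previous run (done within your script; the CLI has no resume subcommand):
    import wandb
    wandb.init(project="my-image-classification", id="RUN_ID", resume="must")
  Replace RUN_ID with the ID of the run you want to resume. This is typically used when your script was interrupted and you want to continue logging to the same run.
- List the runs in a project (through the Python public API, since the CLI has no run-listing subcommand):
    import wandb
    runs = wandb.Api().runs("my-username/my-image-classification")
  Returns the runs of the given entity/project, which you can iterate over.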
- View a specific run:
    import wandb
    run = wandb.Api().run("my-username/my-image-classification/RUN_ID")
    print(run.name, run.state, run.url)
  Fetches a single run by its entity/project/run ID path and prints its name, state, and dashboard URL.
- Get a run’s ID:
    import wandb
    runs = wandb.Api().runs("my-username/my-image-classification", order="-created_at")
    print(runs[0].id)
  Retrieves the ID of the most recently created run in the specified project and entity.
- View run details in the UI:
    wandb sync --sync-all
  After running your script with wandb.init(), this uploads any runs in the local wandb/ directory that have not been synced yet and prints a link to each run’s dashboard.
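One way to make a script resumable is to persist the run ID next to the script and pass it back to wandb.init() on the next start. This is a minimal sketch under assumptions: the file name wandb_run_id.txt and both helper names are made up for illustration, and the ID is a random hex string rather than one generated by W&B.

```python
import secrets
from pathlib import Path


def load_or_create_run_id(id_file):
    """Return the run ID stored in id_file, creating a new one on first use."""
    path = Path(id_file)
    if path.exists():
        return path.read_text().strip()
    run_id = secrets.token_hex(4)  # 8 hex characters, unique enough for a sketch
    path.write_text(run_id)
    return run_id


def init_resumable_run(project, id_file="wandb_run_id.txt"):
    """Start a run, or continue the earlier one if id_file already holds an ID."""
    import wandb  # lazy import: only needed when a run is actually started

    run_id = load_or_create_run_id(id_file)
    # resume="allow" continues the run if the ID exists, otherwise creates it
    return wandb.init(project=project, id=run_id, resume="allow")
```

Delete the ID file when you want the next start to create a fresh run.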
Artifacts (Datasets & Models)
- Log an artifact (e.g., a dataset):
    wandb artifact put my-dataset.tar.gz --name my-dataset --type dataset --description "My training dataset"
  Uploads my-dataset.tar.gz as an artifact named my-dataset of type dataset.
- Log a model checkpoint as an artifact:
    # Assuming your model is saved to models/best.pt
    wandb artifact put models/best.pt --name my-model --type model --description "Best model checkpoint"
  Uploads models/best.pt as an artifact named my-model of type model.
- Download an artifact:
    wandb artifact get my-username/my-project/my-dataset:latest
  Downloads the latest version of the my-dataset artifact from your project.
- List artifacts:
    wandb artifact ls my-username/my-project
  Lists the artifacts in the given entity/project path.
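The same upload can be done from Python, and artifact references such as my-username/my-project/my-dataset:latest can be taken apart programmatically. The sketch below shows both. The helper names are hypothetical, and falling back to the latest alias when none is given is this sketch's own choice.

```python
def split_artifact_ref(ref):
    """Split 'entity/project/name:alias' into its parts (alias defaults to 'latest')."""
    path, _, alias = ref.partition(":")
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity/project/name[:alias], got {ref!r}")
    entity, project, name = parts
    return {"entity": entity, "project": project, "name": name, "alias": alias or "latest"}


def log_file_as_artifact(project, file_path, name, artifact_type):
    """Upload one file as an artifact, roughly what 'wandb artifact put' does."""
    import wandb  # lazy import: only needed for the actual upload

    with wandb.init(project=project, job_type="upload") as run:
        artifact = wandb.Artifact(name=name, type=artifact_type)
        artifact.add_file(file_path)
        run.log_artifact(artifact)


print(split_artifact_ref("my-username/my-project/my-dataset:latest"))
```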
Sweeps (Hyperparameter Optimization)
- Create a sweep configuration file: Create a YAML file (e.g., sweep.yaml):
    program: train.py
    method: random
    metric:
      name: val_loss
      goal: minimize
    parameters:
      learning_rate:
        min: 0.0001
        max: 0.1
      batch_size:
        values: [32, 64, 128]
- Create a sweep:
    wandb sweep sweep.yaml
  This registers a new sweep with the W&B server based on your configuration and outputs a sweep ID.
- Run a sweep agent:
    wandb agent YOUR_SWEEP_ID
  Replace YOUR_SWEEP_ID with the ID obtained from the wandb sweep command. The agent repeatedly fetches new hyperparameter configurations from the sweep and runs your train.py script with them.
- Limit the number of runs per agent:
    wandb agent YOUR_SWEEP_ID --count 5
  Stops this agent after 5 runs. To run the sweep in parallel, start wandb agent YOUR_SWEEP_ID in several terminals or on several machines.
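To get a feel for what a random sweep hands to each run, the sketch below mirrors the sweep.yaml above as a Python dict and draws a few configurations from it. It only illustrates the idea and is not W&B's sampling code: sample_parameters is a hypothetical helper, and uniform sampling between min and max is an assumption.

```python
import random

# Python equivalent of the sweep.yaml shown above
sweep_config = {
    "program": "train.py",
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [32, 64, 128]},
    },
}


def sample_parameters(parameters, rng):
    """Draw one configuration: pick from 'values', or sample uniformly in [min, max]."""
    sampled = {}
    for name, spec in parameters.items():
        if "values" in spec:
            sampled[name] = rng.choice(spec["values"])
        elif "min" in spec and "max" in spec:
            sampled[name] = rng.uniform(spec["min"], spec["max"])
        else:
            raise ValueError(f"unsupported parameter spec for {name!r}: {spec!r}")
    return sampled


rng = random.Random(0)  # seeded so repeated runs print the same draws
for _ in range(3):
    print(sample_parameters(sweep_config["parameters"], rng))
```

A dict like sweep_config can also be registered from Python with wandb.sweep(sweep_config, project="my-project"), and wandb.agent(sweep_id, function=train, count=5) then runs a training function of your own for up to five runs.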
Other Useful Commands
- View your W&B settings:
    wandb status
  Displays your current W&B configuration, such as the default entity and project.
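- Configure W&B settings:
    wandb init --project my-new-default-project --entity my-username
  Sets the default project and entity for the current directory. Hyperparameters such as learning_rate are not CLI settings; pass them to wandb.init(config={"learning_rate": 0.01}) in your script.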
- Sync local runs:
    wandb sync /path/to/local/wandb/dir
  Uploads logged data from a local W&B run directory to the W&B server.
- Pull files from a run:
    wandb pull RUN_ID
  Downloads the files of a specific run from the W&B server to your local machine. Pass --project and --entity if the run is not in your default project.
Common Patterns
- Logging metrics and hyperparameters within a script:
    import random
    import wandb

    # Hyperparameters go into config; metrics go through wandb.log()
    wandb.init(project="my-experiment", config={"epochs": 10})
    for epoch in range(wandb.config.epochs):
        loss = 1.0 / (epoch + 1) + random.random() * 0.1
        accuracy = 0.8 + epoch * 0.01
        wandb.log({"epoch": epoch, "loss": loss, "accuracy": accuracy})
    wandb.log({"final_accuracy": accuracy})  # Log a final metric
    wandb.finish()  # Mark the run as complete
  This demonstrates how wandb.log() is used within a Python script to send metrics and other data to the current W&B run.
- Using W&B to manage datasets for reproducible training:
    # 1. Upload your dataset as an artifact (shell)
    wandb artifact put data/train.csv --name data-versioning-demo/my-data --type dataset --description "Raw training data"

    # 2. In your training script, use the artifact (Python)
    import wandb
    import pandas as pd

    run = wandb.init(project="data-versioning-demo")
    artifact = run.use_artifact("my-data:latest")
    artifact_dir = artifact.download()
    train_df = pd.read_csv(f"{artifact_dir}/train.csv")
    # ... proceed with training using train_df
    run.finish()
  This pattern shows how to version datasets using W&B artifacts. The artifact is uploaded into the same project the training script uses, and each run records which version it consumed. For strict reproducibility, pin a version alias such as my-data:v0 instead of latest.
- Running a hyperparameter sweep and monitoring it:
    # 1. Define your sweep in sweep.yaml
    # 2. Create the sweep
    wandb sweep sweep.yaml
    # (Copy the sweep ID from the output)
    # 3. Start an agent to run the sweep
    wandb agent YOUR_SWEEP_ID
    # 4. Monitor the sweep's progress in your browser, at the sweep URL that wandb sweep printed
  This workflow illustrates the typical process of setting up, running, and monitoring hyperparameter optimization jobs.
- Syncing local runs after an interruption: If your training script crashes, W&B typically keeps the logs locally. You can then sync them:
    wandb sync /path/to/your/project/wandb/run-YYYYMMDD_HHMMSS-RUN_ID
  This uploads whatever was logged before the crash.
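To find out which run directories exist locally before syncing, a small script can scan the wandb/ folder. The sketch below assumes directories are named run-YYYYMMDD_HHMMSS-RUN_ID, with an offline- prefix for offline runs; both the pattern and the helper name local_run_dirs are assumptions to check against your own wandb/ directory.

```python
import re
from pathlib import Path

# Assumed naming scheme: [offline-]run-YYYYMMDD_HHMMSS-RUN_ID
RUN_DIR_PATTERN = re.compile(r"^(?:offline-)?run-\d{8}_\d{6}-(?P<run_id>.+)$")


def local_run_dirs(wandb_dir):
    """Return (directory name, run ID) pairs for run directories, sorted by name."""
    found = []
    for entry in sorted(Path(wandb_dir).iterdir()):
        match = RUN_DIR_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            found.append((entry.name, match.group("run_id")))
    return found
```

Each returned directory name can be appended to the wandb/ path and passed to wandb sync.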
Gotchas
- wandb.init() must be called inside your script: CLI commands like wandb login handle authentication and configuration, but a run itself is started programmatically within your Python code using wandb.init().
- Run ID vs. Sweep ID: Be careful to distinguish between a RUN_ID (for a specific experiment execution) and a SWEEP_ID (for a hyperparameter optimization job).
- Artifact Naming Conventions: While flexible, it’s good practice to use clear and consistent names for your artifacts (e.g., dataset-v1, model-resnet50-epoch-10).
- wandb.finish(): Explicitly calling wandb.finish() is good practice to ensure all logged data is flushed and the run is marked as completed, especially in notebooks, when one process starts several runs, or when control flow is complex.
- Default Project/Entity: If you don’t specify project or entity in wandb.init(), W&B falls back to the WANDB_PROJECT and WANDB_ENTITY environment variables, or to the defaults of your directory settings and account. If no project is set anywhere, the run is placed in a project named "uncategorized".
- Local Data Storage: W&B writes run data to a wandb/ directory within your project and, in the default online mode, uploads it in the background while the script runs. wandb sync is only needed for runs recorded offline or for runs whose upload was interrupted.
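One way to make sure finish() is always called, even when training raises, is to wrap the run in a context manager. The object returned by wandb.init() can itself be used in a with statement. The sketch below spells the same idea out with a hypothetical tracked_run helper that receives the init function as an argument, so it can be exercised without a W&B connection.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(init, **init_kwargs):
    """Start a run with init(**init_kwargs) and always finish it afterwards."""
    run = init(**init_kwargs)
    try:
        yield run
    except BaseException:
        run.finish(exit_code=1)  # mark the run as failed, then re-raise
        raise
    else:
        run.finish()
```

Typical use would be: with tracked_run(wandb.init, project="my-experiment") as run: run.log({"loss": 0.1}).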