dbt CLI

dbt CLI cheatsheet — run models, tests, seeds, snapshots, docs. dbt run --select models/staging, dbt test, dbt docs generate && dbt docs serve. Full reference.

8 min read

What it is

dbt (data build tool) is a command-line interface that enables data analysts and engineers to transform data in their warehouse more effectively by applying software engineering best practices to SQL-based analytics code. You reach for dbt when you need to build reliable, maintainable, and testable data transformations in your data warehouse.

Installation

Linux

# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.

# Using bash (for specific versions, less common)
curl -L https://raw.githubusercontent.com/dbt-labs/dbt-core/main/scripts/install.sh | bash

Mac

# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.

# Using Homebrew
brew install dbt-core # Note: Homebrew might lag behind pip for the latest adapter versions.
brew install dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.

Windows

# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.

Core Concepts

  • Models: SQL SELECT statements that define your data transformations. These are typically saved as .sql files in your models/ directory and dbt compiles them into tables or views in your data warehouse.
  • Sources: Definitions of raw data tables in your data warehouse that you’re bringing into dbt. These are typically defined in .yml files and allow you to document and test your raw data.
  • Seeds: CSV files in your data/ directory that dbt loads directly into your data warehouse as tables. Useful for small, relatively static datasets like country codes or currency lists.
  • Tests: Assertions about your data. dbt has built-in tests (like unique and not_null) and allows custom SQL tests.
  • Documentation: Descriptions for models, columns, sources, etc., written in Markdown and defined in .yml files. dbt can compile this into a website.
  • Project: A directory containing your dbt models, tests, documentation, and configuration files (dbt_project.yml).
  • Profile: A configuration file (profiles.yml) that tells dbt how to connect to your data warehouse and specifies settings for different environments (e.g., dev, prod).
  • Manifest: A JSON file (manifest.json) generated by dbt that contains a comprehensive map of your project’s resources (models, sources, tests, etc.) and their relationships.
  • Run Results: A JSON file (run_results.json) generated by dbt after a run, containing details about the execution of each model and test.
  • Artifacts: A general term for the output files generated by dbt after a run, including manifest.json, run_results.json, catalog.json, and logs.

Commands / Usage

Project Initialization and Management

  • dbt init <project_name> Initialize a new dbt project in a directory named <project_name>.
    dbt init my_dbt_project
    
  • dbt debug Run a series of checks to ensure dbt is configured correctly and can connect to your data warehouse.
    dbt debug
    
  • dbt list List all resources in your dbt project (models, tests, sources, etc.).
    dbt list
    # List only models
    dbt list --resource-type model
    # List models tagged with 'marketing'
    dbt list --select tag:marketing
    
  • dbt seed Load data from CSV files in your data/ directory into your data warehouse.
    dbt seed
    # Seed only specific CSV files
    dbt seed --select my_users.csv
    
  • dbt clean Remove artifact directories (target/, logs/) and compiled SQL files.
    dbt clean
    

Running Transformations and Tests

  • dbt run Compile SQL and execute your dbt models in the data warehouse.
    dbt run
    # Run only models tagged with 'staging'
    dbt run --select tag:staging
    # Run a specific model
    dbt run --select my_model
    # Run models that depend on 'my_upstream_model'
    dbt run --select my_upstream_model+
    # Run models that are upstream of 'my_downstream_model'
    dbt run --select +my_downstream_model
    # Run models that are upstream or downstream of 'my_model'
    dbt run --select my_model+ +my_model
    # Run models in a specific directory
    dbt run --select models/marts/core/
    # Run models excluding those tagged with 'deprecated'
    dbt run --exclude tag:deprecated
    # Run models that haven't run successfully yet in this invocation
    dbt run --fail-fast
    # Select models by their package name
    dbt run --select dbt_project_evaluator
    # Select models by directory and exclude another
    dbt run --select models/staging/ --exclude models/staging/legacy/
    # Run models with a specific state (e.g., only those that have changed)
    dbt run --state <path_to_previous_run_artifacts>
    
  • dbt test Run tests defined in your project.
    dbt test
    # Run tests for a specific model
    dbt test --select my_model
    # Run only custom SQL tests
    dbt test --resource-type test --data
    # Run only generic tests (unique, not_null, etc.)
    dbt test --resource-type test --exclude config:materialized:view
    # Run tests for models tagged with 'finance'
    dbt test --select tag:finance
    # Run tests for models upstream of 'my_model'
    dbt test --select +my_model
    # Run tests for models downstream of 'my_model'
    dbt test --select my_model+
    # Run tests for models that have changed since the last run
    dbt test --state <path_to_previous_run_artifacts>
    
  • dbt build Run models, tests, seeds, and snapshots, in the correct order.
    dbt build
    # Build only models and their downstream dependencies
    dbt build --select my_model+
    # Build models tagged with 'prod' and their tests
    dbt build --select tag:prod --resource-type model,test
    

Documentation and Catalog

  • dbt docs generate Generate the documentation website for your dbt project.
    dbt docs generate
    
  • dbt docs serve Serve the documentation website locally.
    dbt docs serve
    # Serve on a different port
    dbt docs serve --port 8081
    
  • dbt compile Compile SQL files into executable SQL for your data warehouse, but do not run them. Useful for debugging or inspecting generated SQL.
    dbt compile
    # Compile a specific model
    dbt compile --select my_model
    
  • dbt snapshot Run snapshot configurations to track historical changes in your data.
    dbt snapshot
    # Snapshot only a specific snapshot definition
    dbt snapshot --select my_snapshot_config
    

Project Configuration and State

  • dbt deps Install dependencies specified in your packages.yml file.
    dbt deps
    
  • dbt state compare Compare the state of two dbt projects (e.g., development vs. production).
    dbt state compare open <path_to_prod_artifacts> <path_to_dev_artifacts>
    

Common Patterns

  • Running only changed models:

    dbt run --state path/to/previous/run/artifacts
    

    (Requires manifest.json and run_results.json from a previous run, typically from your CI/CD environment.)

  • Running models and then testing them:

    dbt run --select my_model && dbt test --select my_model
    
  • Building a specific model and its downstream dependencies, then testing:

    dbt build --select my_model+
    
  • Generating documentation and serving it:

    dbt docs generate
    dbt docs serve
    
  • Viewing compiled SQL for a model:

    dbt compile --select my_model
    # Then inspect the compiled SQL in the 'target/compiled/<project_name>/models/' directory
    
  • Running models in development with a specific target:

    # Assuming 'profiles.yml' has a 'dev' target profile
    dbt --profile my_project_profile --target dev run
    
  • Creating a staging model from a source:

    -- models/staging/stg_users.sql
    select
        id as user_id,
        first_name,
        last_name,
        email,
        created_at,
        updated_at
    

{% raw %} from {{ source('raw_data', 'users') }} {% endraw %} yaml # models/staging/schema.yml version: 2

sources: - name: raw_data database: my_database schema: public tables: - name: users identifier: users_table # Optional: if the table name in the warehouse is different

models: - name: stg_users columns: - name: user_id description: "Unique identifier for the user." tests: - unique - not_null - name: first_name description: "The user’s first name." # … other columns ```

  • Creating a fact table (mart) from staging models:
    -- models/marts/fct_orders.sql
    select
        o.order_id,
        o.order_date,
        o.amount,
        u.user_id,
        u.first_name,
        u.last_name
    

{% raw %} from {{ ref('stg_orders') }} as o {% endraw %} {% raw %} join {{ ref('stg_users') }} as u {% endraw %} on o.user_id = u.user_id where o.status = 'completed' ```

Gotchas

  • profiles.yml location: By default, dbt looks for profiles.yml in ~/.dbt/. If it’s not there or you have multiple profiles, you might need to specify it using dbt --profile <profile_name> ....
  • Adapter installation: dbt-core is the engine, but you need a specific adapter for your data warehouse (e.g., dbt-postgres, dbt-snowflake). Ensure you install both.
  • dbt run vs. dbt build: dbt run only executes models. dbt build is more comprehensive and includes running seeds, snapshots, and tests in the correct order. Use build for most end-to-end runs. {% raw %}
  • Materializations: By default, dbt creates models as views. To create tables, incremental models, or ephemeral models, you need to specify materialized='table', materialized='incremental', or materialized='ephemeral' in your model’s .yml configuration or in the model’s SQL file using {{ config(...) }}. {% endraw %} {% raw %}
  • Dependencies and graph: dbt run respects model dependencies. If model_b.sql references {{ ref('model_a') }}, model_a will be run before model_b. The + and ~ selectors are crucial for managing these dependencies. {% endraw %}
  • dbt clean deletes artifacts: Be aware that dbt clean removes the target/ directory where compiled SQL, manifest.json, and run_results.json are stored. This is useful for a clean slate but can remove files needed for state comparison if not managed carefully.
  • dbt seed behavior: dbt seed by default drops and recreates tables. Use dbt seed --full-refresh for explicit full refreshes, or consider using incremental loading strategies for larger datasets.
  • dbt test --state: This feature relies on having accurate manifest.json and run_results.json from a previous, successful run. Ensure these artifacts are preserved in your CI/CD pipeline.
  • Database/Schema configuration: Ensure your profiles.yml correctly specifies the database and schema where dbt should create resources. These are often environment-specific.