What it is
dbt (data build tool) is a command-line interface that enables data analysts and engineers to transform data in their warehouse more effectively by applying software engineering best practices to SQL-based analytics code. You reach for dbt when you need to build reliable, maintainable, and testable data transformations in your data warehouse.
Installation
Linux
# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.
# Using bash (for specific versions, less common)
curl -L https://raw.githubusercontent.com/dbt-labs/dbt-core/main/scripts/install.sh | bash
Mac
# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.
# Using Homebrew
brew install dbt-core # Note: Homebrew might lag behind pip for the latest adapter versions.
brew install dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.
Windows
# Using pip (recommended)
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-redshift, dbt-bigquery, etc.
Core Concepts
- Models: SQL
SELECTstatements that define your data transformations. These are typically saved as.sqlfiles in yourmodels/directory and dbt compiles them into tables or views in your data warehouse. - Sources: Definitions of raw data tables in your data warehouse that you’re bringing into dbt. These are typically defined in
.ymlfiles and allow you to document and test your raw data. - Seeds: CSV files in your
data/directory that dbt loads directly into your data warehouse as tables. Useful for small, relatively static datasets like country codes or currency lists. - Tests: Assertions about your data. dbt has built-in tests (like
uniqueandnot_null) and allows custom SQL tests. - Documentation: Descriptions for models, columns, sources, etc., written in Markdown and defined in
.ymlfiles. dbt can compile this into a website. - Project: A directory containing your dbt models, tests, documentation, and configuration files (
dbt_project.yml). - Profile: A configuration file (
profiles.yml) that tells dbt how to connect to your data warehouse and specifies settings for different environments (e.g., dev, prod). - Manifest: A JSON file (
manifest.json) generated by dbt that contains a comprehensive map of your project’s resources (models, sources, tests, etc.) and their relationships. - Run Results: A JSON file (
run_results.json) generated by dbt after a run, containing details about the execution of each model and test. - Artifacts: A general term for the output files generated by dbt after a run, including
manifest.json,run_results.json,catalog.json, and logs.
Commands / Usage
Project Initialization and Management
dbt init <project_name>Initialize a new dbt project in a directory named<project_name>.dbt init my_dbt_projectdbt debugRun a series of checks to ensure dbt is configured correctly and can connect to your data warehouse.dbt debugdbt listList all resources in your dbt project (models, tests, sources, etc.).dbt list # List only models dbt list --resource-type model # List models tagged with 'marketing' dbt list --select tag:marketingdbt seedLoad data from CSV files in yourdata/directory into your data warehouse.dbt seed # Seed only specific CSV files dbt seed --select my_users.csvdbt cleanRemove artifact directories (target/,logs/) and compiled SQL files.dbt clean
Running Transformations and Tests
dbt runCompile SQL and execute your dbt models in the data warehouse.dbt run # Run only models tagged with 'staging' dbt run --select tag:staging # Run a specific model dbt run --select my_model # Run models that depend on 'my_upstream_model' dbt run --select my_upstream_model+ # Run models that are upstream of 'my_downstream_model' dbt run --select +my_downstream_model # Run models that are upstream or downstream of 'my_model' dbt run --select my_model+ +my_model # Run models in a specific directory dbt run --select models/marts/core/ # Run models excluding those tagged with 'deprecated' dbt run --exclude tag:deprecated # Run models that haven't run successfully yet in this invocation dbt run --fail-fast # Select models by their package name dbt run --select dbt_project_evaluator # Select models by directory and exclude another dbt run --select models/staging/ --exclude models/staging/legacy/ # Run models with a specific state (e.g., only those that have changed) dbt run --state <path_to_previous_run_artifacts>dbt testRun tests defined in your project.dbt test # Run tests for a specific model dbt test --select my_model # Run only custom SQL tests dbt test --resource-type test --data # Run only generic tests (unique, not_null, etc.) dbt test --resource-type test --exclude config:materialized:view # Run tests for models tagged with 'finance' dbt test --select tag:finance # Run tests for models upstream of 'my_model' dbt test --select +my_model # Run tests for models downstream of 'my_model' dbt test --select my_model+ # Run tests for models that have changed since the last run dbt test --state <path_to_previous_run_artifacts>dbt buildRun models, tests, seeds, and snapshots, in the correct order.dbt build # Build only models and their downstream dependencies dbt build --select my_model+ # Build models tagged with 'prod' and their tests dbt build --select tag:prod --resource-type model,test
Documentation and Catalog
dbt docs generateGenerate the documentation website for your dbt project.dbt docs generatedbt docs serveServe the documentation website locally.dbt docs serve # Serve on a different port dbt docs serve --port 8081dbt compileCompile SQL files into executable SQL for your data warehouse, but do not run them. Useful for debugging or inspecting generated SQL.dbt compile # Compile a specific model dbt compile --select my_modeldbt snapshotRun snapshot configurations to track historical changes in your data.dbt snapshot # Snapshot only a specific snapshot definition dbt snapshot --select my_snapshot_config
Project Configuration and State
dbt depsInstall dependencies specified in yourpackages.ymlfile.dbt depsdbt state compareCompare the state of two dbt projects (e.g., development vs. production).dbt state compare open <path_to_prod_artifacts> <path_to_dev_artifacts>
Common Patterns
-
Running only changed models:
dbt run --state path/to/previous/run/artifacts(Requires
manifest.jsonandrun_results.jsonfrom a previous run, typically from your CI/CD environment.) -
Running models and then testing them:
dbt run --select my_model && dbt test --select my_model -
Building a specific model and its downstream dependencies, then testing:
dbt build --select my_model+ -
Generating documentation and serving it:
dbt docs generate dbt docs serve -
Viewing compiled SQL for a model:
dbt compile --select my_model # Then inspect the compiled SQL in the 'target/compiled/<project_name>/models/' directory -
Running models in development with a specific target:
# Assuming 'profiles.yml' has a 'dev' target profile dbt --profile my_project_profile --target dev run -
Creating a staging model from a source:
-- models/staging/stg_users.sql select id as user_id, first_name, last_name, email, created_at, updated_at
{% raw %}
from {{ source('raw_data', 'users') }}
{% endraw %}
yaml
# models/staging/schema.yml
version: 2
sources: - name: raw_data database: my_database schema: public tables: - name: users identifier: users_table # Optional: if the table name in the warehouse is different
models: - name: stg_users columns: - name: user_id description: "Unique identifier for the user." tests: - unique - not_null - name: first_name description: "The user’s first name." # … other columns ```
- Creating a fact table (mart) from staging models:
-- models/marts/fct_orders.sql select o.order_id, o.order_date, o.amount, u.user_id, u.first_name, u.last_name
{% raw %} from {{ ref('stg_orders') }} as o {% endraw %} {% raw %} join {{ ref('stg_users') }} as u {% endraw %} on o.user_id = u.user_id where o.status = 'completed' ```
Gotchas
profiles.ymllocation: By default, dbt looks forprofiles.ymlin~/.dbt/. If it’s not there or you have multiple profiles, you might need to specify it usingdbt --profile <profile_name> ....- Adapter installation:
dbt-coreis the engine, but you need a specific adapter for your data warehouse (e.g.,dbt-postgres,dbt-snowflake). Ensure you install both. dbt runvs.dbt build:dbt runonly executes models.dbt buildis more comprehensive and includes running seeds, snapshots, and tests in the correct order. Usebuildfor most end-to-end runs. {% raw %}- Materializations: By default, dbt creates models as views. To create tables, incremental models, or ephemeral models, you need to specify
materialized='table',materialized='incremental', ormaterialized='ephemeral'in your model’s.ymlconfiguration or in the model’s SQL file using{{ config(...) }}. {% endraw %} {% raw %} - Dependencies and graph:
dbt runrespects model dependencies. Ifmodel_b.sqlreferences{{ ref('model_a') }},model_awill be run beforemodel_b. The+and~selectors are crucial for managing these dependencies. {% endraw %} dbt cleandeletes artifacts: Be aware thatdbt cleanremoves thetarget/directory where compiled SQL,manifest.json, andrun_results.jsonare stored. This is useful for a clean slate but can remove files needed for state comparison if not managed carefully.dbt seedbehavior:dbt seedby default drops and recreates tables. Usedbt seed --full-refreshfor explicit full refreshes, or consider using incremental loading strategies for larger datasets.dbt test --state: This feature relies on having accuratemanifest.jsonandrun_results.jsonfrom a previous, successful run. Ensure these artifacts are preserved in your CI/CD pipeline.- Database/Schema configuration: Ensure your
profiles.ymlcorrectly specifies thedatabaseandschemawhere dbt should create resources. These are often environment-specific.