What it is
The Airflow command-line interface (CLI) lets you manage and interact with an Airflow environment: run its core services, trigger and manage DAG runs, and monitor task execution.
Installation
Linux/macOS
Install using pip:
pip install apache-airflow
If you need a specific database backend (e.g., PostgreSQL):
pip install 'apache-airflow[postgres]'
Quote the package name so shells such as zsh do not treat the square brackets as a glob pattern.
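For reproducible installs, the Airflow installation guide recommends pinning dependencies with a constraints file that matches both the Airflow version and the Python version. A minimal sketch, assuming the constraints URL pattern published in that guide; the helper names airflow_constraint_url and current_python_version are hypothetical, and 2.10.5 below is only an example version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the constraints-file URL for an Airflow
# version and a Python "major.minor" version, following the URL pattern
# from the Airflow installation guide.
airflow_constraint_url() {
  local airflow_version="$1"
  local python_version="$2"
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' \
    "$airflow_version" "$python_version"
}

# Hypothetical helper: print the "major.minor" version of python3 on PATH.
# Use the same interpreter that your pip installs into.
current_python_version() {
  python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")'
}
```

Usage, with the extra from above:
pip install "apache-airflow[postgres]==2.10.5" --constraint "$(airflow_constraint_url 2.10.5 "$(current_python_version)")"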
Windows
Airflow is primarily developed and tested on Unix-like systems. While it can run on Windows, it’s not officially supported and may lead to unexpected issues. It’s highly recommended to use a Linux environment (e.g., WSL, Docker) for Airflow development and production.
If you must install on Windows:
- Install Python.
- Install pip.
- Open Command Prompt or PowerShell as Administrator.
- Install Airflow:
pip install apache-airflow
- You may encounter issues with certain dependencies; consult the Airflow documentation for platform-specific troubleshooting.
Core Concepts
- DAG (Directed Acyclic Graph): A collection of tasks with defined dependencies, organized in a way that reflects their relationships and execution order.
- Task: A single unit of work within a DAG.
- Task Instance: A specific run of a task for a specific DAG run.
- DAG Run: A specific execution of a DAG for a given logical date.
- Operator: A template for a type of task (e.g., BashOperator, PythonOperator, PostgresOperator).
- Connection: A set of credentials and parameters Airflow uses to connect to external services.
- Variable: A key-value store for storing configuration or dynamic information.
Commands / Usage
DAG Management
List DAGs
airflow dags list
List all DAGs available in your Airflow environment.
airflow dags list --subdir /path/to/your/dags
List DAGs from a specific directory.
Show DAG Information
airflow dags show my_dag_id
Print the DAG’s tasks and their dependencies as a graph in DOT (Graphviz) format.
airflow dags show my_dag_id --save my_dag.png
Render the graph straight to an image file; the format is taken from the file extension. This requires Graphviz to be installed. Alternatively, pipe the DOT output to Graphviz yourself:
airflow dags show my_dag_id | dot -Tpng > my_dag.png
Enable/Disable DAGs
airflow dags pause my_dag_id
Pause a DAG so the scheduler stops scheduling new runs for it.
airflow dags unpause my_dag_id
Unpause a DAG.
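To pause or unpause a list of DAGs, a small loop around airflow dags pause and airflow dags unpause is enough. A sketch; toggle_dags is a hypothetical helper name, and it keeps going after a failure so that one bad DAG ID does not leave the remaining DAGs untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper.
# Usage: toggle_dags pause|unpause DAG_ID...
# Returns 2 for an unknown action, 1 if any single call failed, else 0.
toggle_dags() {
  local action="$1"
  shift
  case "$action" in
    pause|unpause) ;;
    *) echo "action must be pause or unpause, got: $action" >&2; return 2 ;;
  esac
  local dag_id status=0
  for dag_id in "$@"; do
    # One CLI call per DAG; remember failures but continue with the rest.
    if ! airflow dags "$action" "$dag_id"; then
      echo "failed to $action: $dag_id" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: toggle_dags pause etl_daily etl_hourly (the DAG IDs are examples).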
Trigger DAGs
airflow dags trigger my_dag_id
Trigger a new DAG run for my_dag_id.
airflow dags trigger my_dag_id -c '{"key": "value"}'
Trigger a DAG run with configuration JSON.
airflow dags trigger my_dag_id --run-id manual__2023-10-27T10:00:00+00:00
Trigger a DAG run with a specific run ID.
Delete DAGs
airflow dags delete my_dag_id --yes
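Delete all metadata for a DAG (its DAG runs, task instances, and related records) from the database. Use --yes to skip confirmation. The DAG file itself is not removed, so the DAG reappears the next time the file is parsed.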
Task Management
List Tasks in a DAG
airflow tasks list my_dag_id
List all tasks within a specific DAG.
airflow tasks list my_dag_id --tree
Display tasks in a tree format, showing dependencies.
Show Task Information
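airflow tasks state my_dag_id my_task_id manual__2023-10-27T10:00:00+00:00
Show the state of a task instance in a specific DAG run, identified here by its run ID. The last argument is required and accepts either a run ID or a logical date.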
Run Tasks
airflow tasks run my_dag_id my_task_id 2023-10-27
Run a specific task instance for a given DAG ID and logical date.
airflow tasks run my_dag_id my_task_id 2023-10-27 --local
Run the task instance directly in the current process instead of sending it to the configured executor. Its state is still recorded in the metadata database.
airflow tasks test my_dag_id my_task_id 2023-10-27
Test a task instance. The task runs without checking its dependencies, and its state is not recorded in the metadata database.
Clear Task States
airflow tasks clear my_dag_id --task-regex 'my_task_.*' -s 2023-10-26 -e 2023-10-27
Clear task states for tasks matching a regex within a date range.
airflow tasks clear my_dag_id --only-failed -s 2023-10-27 -e 2023-10-27
Clear only the failed task instances within the date range.
airflow tasks clear my_dag_id --yes
Clear all task instances for the specified DAG. Use --yes to skip confirmation.
Task Instance States
airflow tasks state my_dag_id my_task_id 2023-10-27
Get the state of a specific task instance for a given logical date.
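When scripting around a run you triggered, it can help to poll this command until the task finishes. A sketch, assuming the state is printed on the last line of output (earlier lines may be log messages); wait_for_task is a hypothetical helper, and its attempt count and interval are arbitrary defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll `airflow tasks state` until a terminal state.
# Usage: wait_for_task DAG_ID TASK_ID DATE_OR_RUN_ID [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Prints the final state. Returns 0 for success/skipped,
# 1 for failed/upstream_failed, 2 on timeout.
wait_for_task() {
  local dag_id="$1" task_id="$2" when="$3"
  local max_attempts="${4:-60}" sleep_seconds="${5:-30}"
  local attempt state=""
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Keep only the last output line, assumed to hold the state.
    state="$(airflow tasks state "$dag_id" "$task_id" "$when" | tail -n 1)"
    case "$state" in
      success|skipped) echo "$state"; return 0 ;;
      failed|upstream_failed) echo "$state"; return 1 ;;
    esac
    # Any other state (queued, running, up_for_retry, ...) keeps polling.
    if ((attempt < max_attempts)); then
      sleep "$sleep_seconds"
    fi
  done
  echo "timed out (last state: ${state:-unknown})" >&2
  return 2
}
```

Usage: wait_for_task my_dag_id my_task_id 2023-10-27 20 15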
List DAG Runs and Task Instances
airflow dags list-runs -d my_dag_id --start-date 2023-10-26 --end-date 2023-10-27
List the DAG runs of my_dag_id within a date range. Add --state running to show only runs that are currently in progress.
airflow tasks states-for-dag-run my_dag_id manual__2023-10-27T10:00:00+00:00
List every task instance in a DAG run together with its state.
Connections
List Connections
airflow connections list
List all connections configured in Airflow.
airflow connections list -o json
Print the connections as JSON (other output formats: table, yaml, plain). The command has no option to filter by connection type, so filter its output instead.
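To list connections of one type (e.g., PostgreSQL), you can filter the JSON output of airflow connections list -o json. A sketch, assuming each entry in that output carries conn_id and conn_type fields and that python3 is available (Airflow itself requires Python); conn_ids_by_type is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of connections whose conn_type equals
# the first argument, one per line.
conn_ids_by_type() {
  local conn_type="$1"
  airflow connections list -o json | python3 -c '
import json
import sys

wanted = sys.argv[1]
# The CLI output is assumed to be a JSON list of connection objects.
for conn in json.load(sys.stdin):
    if conn.get("conn_type") == wanted:
        print(conn["conn_id"])
' "$conn_type"
}
```

Usage: conn_ids_by_type postgres. With jq installed, airflow connections list -o json | jq -r '.[] | select(.conn_type == "postgres") | .conn_id' is an equivalent one-liner.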
Add Connection
airflow connections add 'my_postgres_conn' --conn-type 'postgres' --conn-host 'localhost' --conn-login 'user' --conn-password 'password' --conn-schema 'mydb' --conn-port 5432
Add a new connection with specified details.
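Connections can also be supplied through environment variables named AIRFLOW_CONN_ followed by the upper-cased connection ID, with the connection written as a URI; airflow connections add accepts the same URI form through --conn-uri. A sketch that defines the connection above that way; conn_env_var is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the environment variable name Airflow checks
# for a connection ID (AIRFLOW_CONN_ + upper-cased ID).
conn_env_var() {
  printf 'AIRFLOW_CONN_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Define my_postgres_conn for Airflow processes started from this shell.
# URI layout: <conn_type>://<login>:<password>@<host>:<port>/<schema>
export "$(conn_env_var my_postgres_conn)=postgres://user:password@localhost:5432/mydb"
```

Special characters in the login or password must be URL-encoded inside the URI. A connection defined this way lives only in the process environment; it is not stored in the metadata database.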
Delete Connection
airflow connections delete 'my_postgres_conn'
Delete a connection by its connection ID.
Get Connection
airflow connections get 'my_postgres_conn'
Retrieve the details of a specific connection.
Variables
List Variables
airflow variables list
List all variables stored in Airflow.
Set Variable
airflow variables set my_variable_key 'my_variable_value'
Set or update a variable.
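airflow variables set my_config '{"key1": "value1", "key2": 123}'
Store a JSON document as the value of a single variable. To set many variables at once, put them in a file and use airflow variables import.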
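A variable whose value is JSON text can be read back in a DAG with Variable.get("my_config", deserialize_json=True), which fails if the stored text is not valid JSON. A sketch of a guard that validates the value before storing it; set_json_variable is a hypothetical helper that uses python3 for the check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store KEY=VALUE as an Airflow variable only if
# VALUE parses as JSON. Returns 1 without calling Airflow otherwise.
# Usage: set_json_variable KEY JSON_TEXT
set_json_variable() {
  local key="$1" value="$2"
  if ! printf '%s' "$value" | python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null; then
    echo "not valid JSON for variable: $key" >&2
    return 1
  fi
  airflow variables set "$key" "$value"
}
```

Usage: set_json_variable my_config '{"key1": "value1", "key2": 123}'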
Get Variable
airflow variables get my_variable_key
Retrieve the value of a specific variable.
airflow variables get --json my_variable_key
Retrieve a variable and parse its stored value as JSON; the stored value must be valid JSON.
Delete Variable
airflow variables delete my_variable_key
Delete a variable.
Import/Export Variables
airflow variables import /path/to/variables.json
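Import variables from a JSON file. The file must contain a single JSON object that maps variable keys to values, e.g. {"key1": "value1", "key2": {"nested": true}}.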
airflow variables export /path/to/variables.json
Export all variables to a JSON file.
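Export and import pair naturally as a backup and restore step, for example before an upgrade. A sketch; backup_variables is a hypothetical helper, and the timestamped file name is an arbitrary convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export all variables to a timestamped JSON file in
# DIR (created if missing) and print the file path on stdout.
# Usage: backup_variables [DIR]
backup_variables() {
  local dir="${1:-.}"
  local path
  mkdir -p "$dir" || return 1
  path="$dir/variables-$(date -u +%Y%m%dT%H%M%SZ).json"
  # Send the CLI's own status message to stderr so stdout holds only the path.
  airflow variables export "$path" >&2 || return 1
  printf '%s\n' "$path"
}
```

To restore, pass the printed path to airflow variables import.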
Core Airflow Operations
Initialize Database
airflow db migrate
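Initialize or upgrade the Airflow metadata database. Run it after installing or upgrading Airflow. (airflow db migrate is available from Airflow 2.7; older releases use airflow db init and airflow db upgrade.)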
Start Webserver
airflow webserver -p 8080
Start the Airflow web UI on port 8080.
Start Scheduler
airflow scheduler
Start the Airflow scheduler process, which monitors DAGs and triggers task runs.
Check Database Connection
airflow db check
Verify that Airflow can connect to its metadata database.
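In container setups the scheduler or webserver may start before the database is reachable, so a common pattern is to retry airflow db check until it succeeds. A sketch; wait_for_db and its defaults are hypothetical, and recent Airflow releases may offer retry options on db check itself (see airflow db check --help).

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry `airflow db check` until it succeeds.
# Usage: wait_for_db [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Returns 0 once the check passes, 1 if every attempt failed.
wait_for_db() {
  local max_attempts="${1:-30}" sleep_seconds="${2:-2}" attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if airflow db check >/dev/null 2>&1; then
      return 0
    fi
    echo "database not ready (attempt $attempt/$max_attempts)" >&2
    # Do not sleep after the final failed attempt.
    if ((attempt < max_attempts)); then
      sleep "$sleep_seconds"
    fi
  done
  return 1
}
```

Usage in an entrypoint script: wait_for_db 30 2 && airflow db migrate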
Version
airflow version
Display the installed Airflow version.
Common Patterns
Running Airflow Locally for Development
- Initialize the database:
airflow db migrate
- Create an admin user (if needed):
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com \
--password admin
- Start the webserver:
airflow webserver -p 8080
- Start the scheduler in a separate terminal:
airflow scheduler
Testing a Task Locally
airflow tasks test my_dag_id my_task_id 2023-10-27
This is invaluable for debugging individual task logic without needing to trigger a full DAG run or rely on the scheduler.
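Because airflow tasks test does not check dependencies, you can step through a DAG by testing its tasks one after another. A sketch; test_tasks is a hypothetical helper, and it runs tasks in exactly the order you list them, so pass them in dependency order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `airflow tasks test` for each listed task of one
# DAG, in the given order, stopping at the first failure.
# Usage: test_tasks DAG_ID LOGICAL_DATE TASK_ID...
test_tasks() {
  local dag_id="$1" logical_date="$2"
  shift 2
  local task_id
  for task_id in "$@"; do
    echo "==> testing $task_id" >&2
    if ! airflow tasks test "$dag_id" "$task_id" "$logical_date"; then
      echo "task failed: $task_id" >&2
      return 1
    fi
  done
  return 0
}
```

Usage: test_tasks my_dag_id 2023-10-27 extract transform load (the task IDs are examples).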
Clearing Task Instances for Reruns
To rerun a specific task and all its downstream tasks for a particular date:
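airflow tasks clear my_dag_id --task-regex '^my_failing_task$' --downstream -s 2023-10-27 -e 2023-10-27 --yes
This clears the state of my_failing_task and every task downstream of it for the logical date 2023-10-27, allowing them to be rescheduled. Without --downstream, only tasks matching the regex are cleared; the ^ and $ anchors stop the regex from also matching task IDs such as my_failing_task_2.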
Triggering a DAG with Specific Configuration
airflow dags trigger my_dag_id --run-id custom_run_20231027 --conf '{"key1": "value1", "date": "2023-10-27"}'
This triggers a DAG run with a custom run ID and passes a JSON configuration object that can be accessed within the DAG tasks using dag_run.conf.
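When the run ID and the conf are derived from the same date, a small wrapper keeps them consistent. A sketch; trigger_for_date, the custom_run_ prefix, and the date conf key are illustrative choices, not Airflow conventions. Validating the date first also guarantees the interpolated JSON stays well formed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: trigger DAG_ID with a run ID and conf built from one
# YYYY-MM-DD date. Returns 2 if the date is malformed.
# Usage: trigger_for_date DAG_ID YYYY-MM-DD
trigger_for_date() {
  local dag_id="$1" day="$2"
  if [[ ! "$day" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "expected a YYYY-MM-DD date, got: $day" >&2
    return 2
  fi
  # 2023-10-27 -> custom_run_20231027
  local run_id="custom_run_${day//-/}"
  local conf
  conf="$(printf '{"date": "%s"}' "$day")"
  airflow dags trigger "$dag_id" --run-id "$run_id" --conf "$conf"
}
```

Run IDs must be unique within a DAG, so triggering the same date twice fails with an error instead of creating a duplicate run.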
Managing Secrets with Connections
Instead of hardcoding credentials in DAGs, use Airflow connections:
airflow connections add 'my_s3_conn' \
--conn-type 'aws' \
--conn-extra '{"aws_access_key_id": "AKIA...", "aws_secret_access_key": "SECRET..."}'
Then reference the connection by its ID in S3-related operators and hooks, typically through their aws_conn_id parameter (aws_conn_id='my_s3_conn').
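Typing secrets directly into the command leaves them in your shell history. A sketch that builds the same connection from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY already set in the environment; add_s3_conn_from_env is a hypothetical helper, it assumes the key values need no JSON escaping, and the values are still visible in the process list while the command runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create an AWS connection (default ID my_s3_conn)
# from credentials in the environment instead of literal values.
# Usage: add_s3_conn_from_env [CONN_ID]
add_s3_conn_from_env() {
  local conn_id="${1:-my_s3_conn}"
  if [[ -z "${AWS_ACCESS_KEY_ID:-}" || -z "${AWS_SECRET_ACCESS_KEY:-}" ]]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  local extra
  # Assumes the key values contain no quotes or backslashes.
  extra="$(printf '{"aws_access_key_id": "%s", "aws_secret_access_key": "%s"}' \
    "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY")"
  airflow connections add "$conn_id" --conn-type aws --conn-extra "$extra"
}
```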
Gotchas
- airflow db migrate is essential: Always run airflow db migrate after installing or upgrading Airflow to ensure your metadata database schema is up to date.
- Scheduler vs. webserver: You need both airflow scheduler and airflow webserver running for a functional Airflow environment; they are separate processes. Tasks that use deferrable operators (the deferred state) additionally need airflow triggerer.
- Logical date vs. execution date: Airflow identifies DAG runs by a "logical date", called the "execution date" in older releases and in some CLI help text. airflow tasks run and airflow tasks state require a logical date or run ID, while airflow tasks clear accepts an optional date range. Be mindful of timezone configuration: dates given without an explicit UTC offset are parsed in Airflow's configured default timezone.
- airflow tasks test vs. airflow tasks run: test is for debugging one task in isolation; it skips dependency checks and does not record state in the metadata database. run executes a task instance as part of a DAG run and records its state in the database.
- airflow tasks clear behavior: Clearing resets task instances so the scheduler runs them again. A task instance that is still running when cleared is moved to the restarting state, and Airflow terminates the current attempt before rescheduling it, so clear running tasks deliberately.
- Task instance states: Understand the common states (scheduled, queued, running, success, failed, upstream_failed, skipped, up_for_retry, up_for_reschedule, deferred). The CLI can help you inspect and manage them.
- Configuration: Many Airflow settings are controlled via airflow.cfg or environment variables, and CLI commands run within the context of this configuration.
- Permissions: Ensure the user running CLI commands can access the Airflow metadata database and, where relevant, the underlying execution environment (e.g., read access to the DAG files).
- Version compatibility: The CLI ships as part of the apache-airflow package, so the available commands and flags differ between releases. Check airflow version and use the CLI reference for that release.
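Every airflow.cfg option can be overridden with an environment variable named AIRFLOW__{SECTION}__{KEY}, with double underscores around the section name. A sketch that builds the name and sets one option; airflow_config_env_var is a hypothetical helper. Running airflow config get-value core load_examples afterwards prints the value Airflow resolves, which confirms that the override took effect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the environment variable name that overrides
# the given airflow.cfg option, AIRFLOW__{SECTION}__{KEY}, upper-cased.
# Usage: airflow_config_env_var SECTION KEY
airflow_config_env_var() {
  printf 'AIRFLOW__%s__%s\n' \
    "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" \
    "$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
}

# Example: stop loading the bundled example DAGs for processes started
# from this shell (same effect as load_examples = False under [core]).
export "$(airflow_config_env_var core load_examples)=False"
```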