perf Linux Profiler

perf cheatsheet — profile CPU, count events, record and analyze. perf stat, perf record -g, perf report, perf top. Linux performance analysis with hardware counters.

What it is

perf is a powerful performance analysis tool for Linux that allows you to profile CPU usage, trace kernel events, and gather hardware performance counter data. You reach for perf when you need to understand where your application or system is spending its time, identify performance bottlenecks, or debug performance regressions.

Installation

Linux

perf is developed in the Linux kernel source tree (tools/perf) and is packaged by most distributions. On Ubuntu the package must match your running kernel version.

# Ubuntu
sudo apt update
sudo apt install linux-tools-$(uname -r) linux-tools-common

# Debian
sudo apt install linux-perf

# Fedora
sudo dnf install perf

# Arch Linux
sudo pacman -S perf

Mac & Windows

perf is a Linux-specific tool and is not available on macOS or Windows natively. For performance profiling on these systems, you would use platform-specific tools like Instruments (macOS) or Visual Studio Profiler (Windows).

Core Concepts

  • Events: perf can track various events. These can be hardware-based (like CPU cycles, cache misses), software-based (like context switches, page faults), or instrumentation points such as tracepoints and dynamic probes.
  • Sampling: The most common usage of perf involves sampling. It periodically interrupts the CPU and records the current program counter (and optionally other information). This gives a statistical overview of where the program spends its time.
  • Tracepoints: perf can also listen to kernel tracepoints, which are specific, well-defined points in the kernel code where instrumentation is added to record events.
  • Kprobes/Uprobes: Dynamic kernel probes (kprobes) and userspace probes (uprobes) allow you to instrument arbitrary kernel functions or userspace functions, respectively, for detailed tracing.
  • Sessions: A typical workflow is a record-then-analyze session: perf record writes samples to a perf.data file, and perf report, perf annotate, or perf script read that file afterwards.
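
To make the sampling idea concrete, here is a minimal sketch that turns raw samples into the kind of percentage breakdown perf report shows. It assumes a simplified input of one symbol name per sample; the function name sample_overhead and the symbol names are hypothetical, and real perf samples carry far more information.

```shell
#!/usr/bin/env bash
# sample_overhead: read one symbol name per line (one line per sample)
# and print each symbol's share of all samples, highest first.
sample_overhead() {
    awk '{ count[$1]++; total++ }
         END { for (s in count) printf "%.1f%% %s\n", 100 * count[s] / total, s }' |
        sort -rn
}

# Hypothetical samples: three landed in parse_input, one in write_output.
printf '%s\n' parse_input parse_input write_output parse_input | sample_overhead
```

With more samples the percentages converge on where the program really spends its time, which is why longer runs or higher sampling frequencies give more stable profiles.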

Commands / Usage

Recording Performance Data

The primary command for collecting data is perf record.

  • Basic CPU profiling:

    sudo perf record ./my_application arg1 arg2
    

    Samples CPU cycles (the default event) while ./my_application runs and writes the result to perf.data.

  • Profiling for a specific duration:

    sudo perf record -a -g -- sleep 10
    

    Samples all CPUs for 10 seconds with call graphs. Without -a, perf would profile only the sleep command itself.

  • Profiling specific events:

    sudo perf record -e cycles,instructions ./my_application
    

    Records CPU cycles and instructions executed.

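  • Profiling several hardware events:

    sudo perf record -e cycles,instructions,cache-misses,branch-misses ./my_application
    

    perf has no "all" event, so name each event explicitly. The CPU has a limited number of counters, and perf multiplexes events when you request more than fit.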

  • Profiling specific software events:

    sudo perf record -e context-switches,page-faults ./my_application
    

    Records context switches and page faults.

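  • Defining a kernel probe (kprobe):

    sudo perf probe --add do_sys_openat2
    

    Creates a dynamic probe event named probe:do_sys_openat2. Kernel function names vary between versions (older kernels have do_sys_open instead); sudo perf probe --funcs lists the functions available on your system.

  • Profiling specific kernel functions:

    sudo perf record -e probe:do_sys_openat2 ./my_application
    

    Records calls to the probed kernel function. Remove the probe afterwards with sudo perf probe --del probe:do_sys_openat2.

  • Profiling userspace functions (uprobes):

    sudo perf probe -x /path/to/my_lib.so my_function
    sudo perf record -e probe_my_lib:my_function ./my_application
    

    Records calls to my_function within my_lib.so. perf probe prints the name of the event it created; the probe_my_lib group shown here is a placeholder, so pass the printed name to -e.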

  • Enabling call graph recording (DWARF/Frame pointers):

    sudo perf record -g ./my_application
    

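    Records call graphs. -g unwinds with frame pointers by default, so binaries built without them produce truncated stacks; use --call-graph dwarf to unwind from DWARF debugging information instead.

  • Recording tracepoint data (for perf script):

    sudo perf record -e syscalls:sys_enter_read -o perf.data.raw ./my_application
    

    Records every hit of the sys_enter_read tracepoint into perf.data.raw. Events must be given with -e; anything after -- is treated as the command to run.

  • Buffering options:

    sudo perf record -m 128M ./my_application
    

    Sets the size of each ring buffer to 128 MB. A bare number is a page count and must be a power of two; a size with a B/K/M/G suffix is rounded up to a power-of-two number of pages.

  • Overwriting buffer on overflow:

    sudo perf record --overwrite ./my_application
    

    Uses an overwritable ring buffer, so the newest data replaces the oldest when the buffer fills. (-W is a different option: it enables weighted sampling.)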
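
Because perf record -m takes either a page count that must be a power of two or a size with a unit suffix, it can help to see the arithmetic. The sketch below computes the page count for a given size in megabytes; pages_for_size is a hypothetical helper, and the 4096-byte page size in the example is an assumption (the function falls back to getconf PAGESIZE when none is given).

```shell
#!/usr/bin/env bash
# pages_for_size MEGABYTES [PAGE_SIZE_BYTES]
# Print the smallest power-of-two page count that holds MEGABYTES.
pages_for_size() {
    local mb=$1
    local page=${2:-$(getconf PAGESIZE)}
    # Pages needed before rounding, using ceiling division.
    local need=$(( (mb * 1024 * 1024 + page - 1) / page ))
    local pages=1
    while [ "$pages" -lt "$need" ]; do
        pages=$(( pages * 2 ))
    done
    echo "$pages"
}

pages_for_size 100 4096   # 25600 pages needed, rounded up to 32768
```

So a 100 MB request with 4 KB pages ends up as a 128 MB buffer (32768 pages).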

Analyzing Performance Data

perf record writes its samples to a perf.data file in the current directory. Analyze that file with perf report, perf annotate, or perf script. perf stat counts events live instead and is covered separately below.

perf report - Summary View

  • Default report:

    sudo perf report
    

    Opens an interactive TUI to analyze the recorded perf.data.

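  • Report with several recorded events:

    sudo perf report --group
    

    Shows events that were recorded as a group, for example with -e '{cycles,instructions}', side by side. perf report has no -e option; without --group, each recorded event is reported separately.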

  • Report for a specific process/thread:

    sudo perf report --tid 1234
    

    Shows only samples from thread ID 1234. Use --pid to filter by process ID instead.

  • Report with call graphs expanded:

    sudo perf report -g
    

    Displays call graphs in the interactive report.

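  • Report grouped by chosen sort keys:

    sudo perf report --sort=dso,symbol
    

    Groups samples by shared object and symbol. Entries are already ordered from highest to lowest overhead by default, so no option is needed for that.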

  • Report for specific DSO (Dynamic Shared Object):

    sudo perf report --dsos my_application
    

    Filters the report to show only symbols from my_application.

  • Report for specific symbol:

    sudo perf report --symbols my_function
    

    Filters the report to show only my_function. Spell the option out in full, since --symbol is also a prefix of --symbol-filter.

  • Dumping report to stdout:

    sudo perf report --stdio
    

    Prints the report to standard output instead of the TUI.
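
For scripted checks it can help to post-process the --stdio output. The sketch below assumes the usual layout, where header lines start with # and each data row begins with an overhead percentage. hot_entries and the sample rows are hypothetical, and the layout can differ between perf versions and options (call graphs add extra lines).

```shell
#!/usr/bin/env bash
# hot_entries MIN_PERCENT: read report text on stdin and keep only
# the rows whose overhead is at or above MIN_PERCENT.
hot_entries() {
    awk -v min="$1" '
        /^#/ || NF == 0 { next }      # skip header comments and blank lines
        $1 ~ /^[0-9.]+%$/ {           # data rows start with a percentage
            pct = $1
            sub(/%/, "", pct)
            if (pct + 0 >= min + 0) print
        }'
}

# Hypothetical rows standing in for: sudo perf report --stdio --no-children
hot_entries 5 <<'EOF'
# Overhead  Command  Shared Object  Symbol
    62.50%  my_app   my_app         [.] parse_input
     4.10%  my_app   libc.so.6      [.] memcpy
EOF
```

In real use, pipe the report in directly, for example sudo perf report --stdio --no-children | hot_entries 5.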

perf annotate - Source Code Annotation

  • Annotate default recording:

    sudo perf annotate
    

    Opens an interactive view of the disassembly, interleaved with source lines when debugging information is available, annotated with per-instruction sample percentages.

  • Annotate specific symbol:

    sudo perf annotate my_function
    

    Annotates the source code for my_function.

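  • Annotate to standard output:

    sudo perf annotate --stdio my_function
    

    Prints the annotated listing for my_function instead of opening the TUI. perf annotate has no call-graph option; use perf report -g for caller and callee information.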

perf script - Raw Event Data Processing

perf script is used to process the raw event data, often for scripting or custom analysis.

  • Basic script output:

    sudo perf script
    

    Prints the recorded events in a human-readable format.

  • Filtering script output:

    sudo perf script | grep "my_function"
    

    Filters script output for lines containing "my_function".

  • Selecting output fields:

    sudo perf script -F comm,pid,tid,time,event,ip,sym
    

    Prints only the listed fields for each sample. The output is plain text, not JSON, and overhead is not a valid field (percentages are computed by perf report).

  • Processing specific perf.data file:

    sudo perf script -i my_custom.data
    

    Processes a specific data file.

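  • Dumping call graphs with script:

    sudo perf record --call-graph dwarf ./my_application
    sudo perf script > callgraph.txt
    

    Call graphs are captured at record time (here unwound using DWARF debugging information); perf script then prints each sample followed by its stack.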

perf stat - Event Counting

perf stat runs a command and reports statistics on various events without creating a perf.data file.

  • Basic statistics:

    sudo perf stat ./my_application arg1
    

    Runs ./my_application and prints summary statistics for CPU cycles, instructions, etc.

  • Counting specific events:

    sudo perf stat -e cycles,instructions,cache-references,cache-misses ./my_application
    

    Counts specific hardware events.

  • Counting software events:

    sudo perf stat -e context-switches,page-faults ./my_application
    

    Counts software events.

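  • Counting more hardware events (detailed mode):

    sudo perf stat -d ./my_application
    

    Adds cache-related counters to the default set. perf has no "all" event; repeat -d for more detail or list events explicitly with -e.

  • Counting tracepoints:

    sudo perf stat -e 'syscalls:sys_enter_openat' ./my_application
    

    Counts hits of a tracepoint, written as subsystem:name. openat is used here because modern glibc implements open() through the openat system call.

  • Counting kprobes:

    sudo perf probe --add ksys_read
    sudo perf stat -e probe:ksys_read ./my_application
    

    Counts calls to the ksys_read kernel function once a probe has been defined for it. The function name is kernel-version dependent; check sudo perf probe --funcs.

  • Counting uprobes:

    sudo perf probe -x /path/to/my_lib.so my_function
    sudo perf stat -e probe_my_lib:my_function ./my_application
    

    Counts calls to my_function in my_lib.so. Use the event name that perf probe prints; probe_my_lib is a placeholder for its group.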

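  • Per-process statistics:

    sudo perf stat -p $(pidof -s my_application)
    

    Attaches to a running process and counts until you press Ctrl-C. -p takes a comma-separated list of PIDs, so pidof -s is used to return a single one.

  • Per-thread statistics:

    sudo perf stat -t 1234
    

    Counts events for thread ID 1234 only. -t expects a thread ID, which you can find with ps -L -p <pid>.

  • System-wide statistics:

    sudo perf stat -a ./my_application
    

    Counts events on all CPUs, for every process, while ./my_application runs. This is not the default; without -a, only the command and its children are counted.

  • Statistics for a specific CPU:

    sudo perf stat -C 0 ./my_application
    

    Counts everything that runs on CPU 0 while ./my_application runs. It does not pin the application to that CPU.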

  • Interval statistics:

    sudo perf stat -I 1000 ./my_application
    

    Prints statistics every 1000 milliseconds.

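  • Outputting to CSV:

    sudo perf stat -e cpu-cycles,instructions -x, ./my_application
    

    Outputs statistics in CSV format. -x sets the field separator; perf stat has no --csv option. Results go to standard error unless you pass -o FILE.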
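
A common use of the CSV output is deriving instructions per cycle (IPC). The sketch below assumes each CSV line starts with the count, then the unit, then the event name, as perf stat -x, normally prints. ipc_from_csv and the sample numbers are hypothetical, and event names can differ on some systems (for example on hybrid CPUs).

```shell
#!/usr/bin/env bash
# ipc_from_csv: read `perf stat -x,` output on stdin and print
# instructions per cycle. Exits non-zero if no cycle count is found.
ipc_from_csv() {
    awk -F, '
        $3 ~ /^(cpu-)?cycles/ { cycles = $1 + 0 }   # field 1 = count, field 3 = event
        $3 ~ /^instructions/  { insns  = $1 + 0 }
        END {
            if (cycles > 0) printf "%.2f\n", insns / cycles
            else exit 1
        }'
}

# Hypothetical CSV lines in the count,unit,event,... layout:
printf '%s\n' '2000,,cycles,1000,100.00,,' \
              '5000,,instructions,1000,100.00,2.50,insn per cycle' |
    ipc_from_csv   # prints 2.50
```

In real use, write the counts to a file with -o stat.csv and run ipc_from_csv < stat.csv.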

Listing Available Events

  • List all available events:

    perf list
    

    Shows the hardware, software, cache, PMU, and tracepoint events perf can track. Probes you have defined with perf probe are listed as well.

  • List hardware events:

    perf list hardware
    
  • List software events:

    perf list software
    
  • List tracepoints:

    sudo perf list tracepoint
    
  • List defined kprobes and uprobes:

    sudo perf probe --list
    
  • List functions that can be probed in a library:

    sudo perf probe -x /path/to/my_lib.so --funcs
    

Other Useful Commands

  • Top-like view of events:

    sudo perf top
    

    Shows a real-time, top-like display of functions consuming CPU cycles.

  • Top-like view with specific events:

    sudo perf top -e cache-misses
    

    Shows functions ranked by the number of sampled cache misses.

  • Top-like view for kernel space:

    sudo perf top -U
    

    Shows only kernel functions. -U (--hide_user_symbols) hides user-space symbols.

  • Top-like view for user space:

    sudo perf top -K
    

    Shows only user-space functions. -K (--hide_kernel_symbols) hides kernel symbols.

  • Benchmarking:

    sudo perf bench sched messaging
    

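    Runs the built-in scheduler messaging benchmark. Running perf bench with no arguments lists the available benchmark collections.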

  • Listing CPUs:

    lscpu
    

    perf has no subcommand for listing CPUs; lscpu shows the CPUs you can select with -C.

Common Patterns

  • Profile application and analyze call graphs:

    sudo perf record -g -o perf.data ./my_application
    sudo perf report -g
    

    This is the most common workflow for understanding where an application is spending CPU time, including function call relationships.

  • Identify cache miss hotspots:

    sudo perf record -e cache-misses,cache-references -o perf.data ./my_application
    sudo perf report
    

    Focuses on memory access patterns.

  • Profile system-wide for a short period:

    sudo perf record -a -- sleep 5
    sudo perf report
    

    Captures system-wide activity for 5 seconds.

  • Count specific system calls:

    sudo perf stat -e 'syscalls:sys_enter_read' -a ./my_application
    

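    Counts read system calls made by all processes, system-wide, while ./my_application runs.

  • Trace specific kernel function calls and analyze with script:

    sudo perf probe --add do_sys_openat2
    sudo perf record -e probe:do_sys_openat2 -a -o kopen.data -- sleep 10
    sudo perf script -i kopen.data | grep "my_application"
    

    Defines a probe on the kernel's open path, records every hit system-wide for 10 seconds, and filters the output for events from my_application. The function name do_sys_openat2 is kernel-version dependent.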

  • Profile a running process by PID:

    sudo perf record -p $(pgrep -d, my_daemon) -o daemon.data
    sudo perf report -i daemon.data
    

    Attaches to the running my_daemon process and profiles it until you press Ctrl-C. pgrep -d, joins multiple matching PIDs with commas, the format -p expects.

  • Monitor a running process at intervals:

    sudo perf stat -p $(pgrep -d, my_daemon) -e cycles,instructions -I 1000
    

    Prints cycle and instruction counts for the running process every second until interrupted.

  • Find functions causing high I/O wait (indirectly): CPU sampling only sees code that is running, so time spent blocked on I/O does not appear directly. You can still see how much CPU time goes to I/O paths by looking for kernel functions related to I/O (e.g., __do_page_cache_readahead, blk_mq_dispatch_request; exact names vary by kernel version).

    sudo perf record -a -g -e cycles -o perf.data -- sleep 10
    sudo perf report -g
    # Look for kernel functions in the report
    
    
  • Using perf with grep for specific events:

    sudo perf record -e 'sched:*' -o trace.data ./my_app
    sudo perf script -i trace.data | grep "sched_switch"
    

    Records all scheduler tracepoints and then filters for context switch events.
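
When a process name matches several PIDs, perf's -p option wants them as one comma-separated list, while pgrep prints one per line and pidof separates them with spaces. A minimal sketch of a helper that normalizes either form follows; join_pids is a hypothetical name, not part of perf.

```shell
#!/usr/bin/env bash
# join_pids: read whitespace-separated PIDs on stdin and print them
# as a single comma-separated list suitable for perf's -p option.
join_pids() {
    tr -s '[:space:]' ',' | sed 's/^,//; s/,$//'
}

# Hypothetical PIDs, one per line as pgrep would print them.
printf '%s\n' 1201 1207 1315 | join_pids   # prints 1201,1207,1315
```

It could then be used as sudo perf record -p "$(pgrep my_daemon | join_pids)" -o daemon.data.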

Gotchas

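  • Permissions: Whether you need root depends on the /proc/sys/kernel/perf_event_paranoid setting. With common defaults, unprivileged users can at most profile their own user-space code; system-wide profiling, kernel profiling, tracepoints, and probes generally need root (sudo) or the CAP_PERFMON capability.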
  • perf.data file size: Recording can generate very large perf.data files, especially when profiling for long durations or with many events.
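  • Call Graph Accuracy: Call graph recording (-g) unwinds with frame pointers by default, which requires code compiled with -fno-omit-frame-pointer. For binaries without frame pointers, use --call-graph dwarf (needs DWARF debugging information) or --call-graph lbr on CPUs that support it. If none of these is available, call graphs can be incomplete or inaccurate.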
  • Kernel Module Events: Profiling kernel modules might require loading them before perf record starts, or using kprobe events with the module name.
  • Event Availability: Not all CPUs support all hardware performance counters. perf list will show what’s available on your system.
  • Sampling Frequency: perf record samples at a fixed default frequency (typically 4000 Hz in recent versions). Raise it with -F to catch short-lived activity, or lower it (e.g., perf record -F 99) to reduce overhead and perf.data size.
  • perf top vs perf report: perf top provides real-time insights, while perf report analyzes a recorded perf.data file, offering more detailed and persistent analysis.
  • System-wide vs Process-specific: Using -a for system-wide profiling can be noisy if you’re trying to debug a specific application. Use -p <pid> or -t <tid> for targeted profiling.
  • Interpreting perf stat output: The units and meaning of events (e.g., cycles, instructions, L1-dcache-load-misses) require some understanding of CPU architecture.
  • perf script verbosity: The raw output of perf script can be overwhelming. Use formatting flags (-F) or pipe to grep/awk for targeted analysis.
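
For the permissions gotcha, a quick way to see what an unprivileged user can do is to read the perf_event_paranoid level. The sketch below summarizes the levels documented for the kernel sysctl; describe_paranoid is a hypothetical helper, the wording of each summary is a paraphrase, and values above 2 (used by some distributions) are treated like 2 here as an assumption.

```shell
#!/usr/bin/env bash
# describe_paranoid LEVEL: summarize what users without extra
# privileges may do at a given perf_event_paranoid level.
describe_paranoid() {
    local level=$1
    if [ "$level" -le -1 ]; then
        echo "almost all events allowed for all users"
    elif [ "$level" -eq 0 ]; then
        echo "no raw tracepoint access without privileges"
    elif [ "$level" -eq 1 ]; then
        echo "no system-wide CPU event access without privileges"
    else
        echo "user-space profiling only without privileges"
    fi
}

# Report the current setting when the sysctl file is readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    describe_paranoid "$(cat "$paranoid_file")"
fi
```

If the level is too strict for your use case, running perf under sudo is usually simpler than changing the system-wide setting.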