What it is
Jupyter Notebook is an interactive, web-based computing environment that allows you to create and share documents containing live code, equations, visualizations, and narrative text. It’s ideal for data exploration, analysis, machine learning, and teaching.
Installation
Using pip:
pip install notebook
Using conda:
conda install notebook
Starting a Notebook Server:
Navigate to your desired working directory in your terminal and run:
jupyter notebook
This will start a local web server and open a new tab in your default browser, showing the Jupyter file browser.
Core Concepts
- Notebook: A document that contains a list of cells.
- Cell: The basic building block of a notebook. There are two main types:
  - Code Cells: Contain executable code in a specific programming language (most commonly Python).
  - Markdown Cells: Contain rich text, formatted using Markdown, for explanations, documentation, and narrative.
- Kernel: The "computational engine" that runs the code in your notebook. Each notebook is connected to a single kernel. For Python, this is typically ipykernel.
- Kernel Manager: Manages the lifecycle of kernels.
- Web Application: The user interface you interact with in your browser.
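To make the "document that contains a list of cells" idea concrete, here is a minimal sketch of what an .ipynb file looks like on disk: plain JSON in the nbformat 4 layout. The cell contents and the "Sales analysis" title are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout: metadata plus a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # The kernel the notebook asks for; "python3" is the usual ipykernel name.
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        # A Markdown cell only needs its source text (illustrative content).
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sales analysis\n", "Notes go here."]},
        # A code cell also stores its outputs and an execution counter.
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["total = 2 + 3\n", "total"],
        },
    ],
}

# Serialize it the way it would sit on disk in an .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Read it back: list the cell types and pull out the code cell sources.
loaded = json.loads(ipynb_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
code_sources = ["".join(cell["source"]) for cell in loaded["cells"] if cell["cell_type"] == "code"]

print(cell_types)
print(code_sources[0])
```

Because the file is ordinary JSON, notebooks can be inspected or post-processed with standard tools, although the nbformat library is the safer choice for anything beyond a quick look.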
Commands / Usage
Starting and Managing Notebooks
- Start the Jupyter Notebook server:
    jupyter notebook
  Starts the server and opens the dashboard in your browser.
- Start the JupyterLab server:
    jupyter lab
  Starts JupyterLab, a more integrated and extensible environment.
- List available kernels:
    jupyter kernelspec list
  Shows the names and locations of installed kernels.
- Install a new kernel (e.g., for a different Python environment):
    python -m ipykernel install --user --name=my-env --display-name="Python (my-env)"
  Registers the current Python environment as a Jupyter kernel, making it selectable in the notebook interface. Run it from inside the target environment, which must have ipykernel installed.
- Shut down a notebook server: Press Ctrl+C twice in the terminal where the server is running.
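For orientation, the kernel installation command essentially writes a small kernel.json file into one of the directories that jupyter kernelspec list reports. The sketch below builds an approximation of that file for the my-env example; the exact fields vary between ipykernel versions, so treat it as an illustration rather than the precise output.

```python
import json
import sys

# Approximate contents of the kernel.json that registering "my-env" produces.
kernel_spec = {
    # Command Jupyter runs to start the kernel; {connection_file} is filled in by Jupyter.
    "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    # Label shown in the notebook's kernel picker (from --display-name).
    "display_name": "Python (my-env)",
    "language": "python",
}

kernel_json = json.dumps(kernel_spec, indent=2)
print(kernel_json)
```

The first argv entry is the interpreter of the environment that ran the install command, which is why the command has to be run from inside the environment you want to expose.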
Working with Cells (within the Notebook Interface)
- Create a new cell: Press B in command mode (cell is outlined in blue) to add a cell below the current one. Press A to add a cell above.
- Change cell type: Press M in command mode to change the cell to Markdown. Press Y in command mode to change the cell to Code.
- Run a cell: Press Shift+Enter to run the current cell and select the next one. Press Ctrl+Enter to run the current cell and keep it selected. Press Alt+Enter to run the current cell and insert a new cell below it.
- Cut/Copy/Paste cells: X to cut, C to copy, V to paste in command mode.
- Delete a cell: Press D,D (press D twice) in command mode.
- Undo cell deletion: Press Z in command mode.
- Move cells up/down: Use the arrow buttons in the toolbar or the move commands in the Edit menu. The Up Arrow and Down Arrow keys in command mode only move the selection between cells; they do not reorder them.
- Merge cells: Select multiple cells (Shift + Up/Down Arrow) and press Shift+M in command mode.
- Split cell at cursor: Ctrl+Shift+- (while the cursor is in the cell).
- Toggle line numbers: Press L in command mode.
- Clear output of all cells: Go to Cell -> All Output -> Clear in the menu.
- Restart kernel: Go to Kernel -> Restart in the menu. This clears all variables and other in-memory state; it does not re-run any cells. To re-run everything from the top, use Kernel -> Restart & Run All.
- Interrupt kernel: Go to Kernel -> Interrupt in the menu. This stops the currently running code.
Common Keyboard Shortcuts (Command Mode - Blue border)
- Esc: Enter Command Mode
- Enter: Enter Edit Mode (Green border)
- A: Insert cell above
- B: Insert cell below
- M: Change cell to Markdown
- Y: Change cell to Code
- D,D: Delete cell
- Z: Undo cell deletion
- Shift+Up/Down: Select multiple cells
- C: Copy selected cell(s)
- X: Cut selected cell(s)
- V: Paste cell(s) below cursor
- Shift+V: Paste cell(s) above cursor
- Shift+M: Merge selected cells
- Ctrl+Shift+-: Split cell at cursor
- L: Toggle line numbers for current cell
- Shift+L: Toggle line numbers for all cells
- F: Find and replace in current notebook
- Ctrl+S: Save notebook
Common Keyboard Shortcuts (Edit Mode - Green border)
- Ctrl+Enter: Run cell, keep focus
- Shift+Enter: Run cell, select next
- Alt+Enter: Run cell, insert below
- Tab: Code completion or indent
- Shift+Tab: Tooltip/documentation for object under cursor
- Ctrl+A: Select all text in cell
- Ctrl+Z: Undo typing
- Ctrl+/: Comment/uncomment line(s)
Common Patterns
- Importing common data science libraries:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
  Standard imports for data manipulation, numerical operations, and plotting.
- Loading a CSV file with pandas:
    df = pd.read_csv('data/sales_data.csv')
    df.head()
  Reads a CSV into a DataFrame and displays the first 5 rows.
- Basic data exploration:
    df.info()  # prints its summary directly and returns None, so no print() is needed
    print(df.describe())
    print(df.isnull().sum())
  Gets summary information, descriptive statistics, and counts of missing values.
- Creating a simple plot:
    plt.figure(figsize=(10, 6))
    sns.scatterplot(data=df, x='revenue', y='profit')
    plt.title('Revenue vs. Profit')
    plt.show()
  Generates a scatter plot using Seaborn.
- Magic Commands (line magics start with %, cell magics with %%):
  - Displaying plots inline:
      %matplotlib inline
    Ensures plots are displayed directly within the notebook output.
  - Timing code execution:
      %timeit df['column'].mean()
    Measures the execution time of a single statement.
      %%timeit
      for i in range(1000):
          pass
    Measures the execution time of an entire cell. %%timeit must be the first line of the cell.
  - Running shell commands:
      !ls -l
    Executes a shell command and displays its output.
  - Loading code from a Python file:
      %load my_script.py
    Loads the content of my_script.py into a code cell.
  - Writing cell contents to a file:
      %%writefile output.txt
      This is some text.
      It will be saved to output.txt.
    Writes the cell's content (not its output) to the specified file.
- Exporting notebook to other formats:
    jupyter nbconvert my_notebook.ipynb --to pdf
    jupyter nbconvert my_notebook.ipynb --to html
    jupyter nbconvert my_notebook.ipynb --to script
  Converts the notebook to PDF, HTML, or a Python script. PDF export also requires a LaTeX installation.
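The magic commands listed above are IPython features, so they do not work in a plain Python script, such as one you maintain outside the notebook. The sketch below shows rough standard-library stand-ins for %timeit, ! and %%writefile. They are approximations rather than what IPython does internally: %timeit, for instance, picks the number of loops automatically and reports more statistics. The temporary directory and the timed statement are chosen only for illustration.

```python
import subprocess
import sys
import tempfile
import timeit
from pathlib import Path

# Rough stand-in for `%timeit sum(range(1000))`: time many runs, report the mean.
runs = 1000
total_seconds = timeit.timeit("sum(range(1000))", number=runs)
print(f"{total_seconds / runs * 1e6:.2f} microseconds per run")

# Rough stand-in for `!command`: run a program and capture what it prints.
# sys.executable is used instead of `ls` so the example works on any OS.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())

# Rough stand-in for `%%writefile output.txt`: write text to a file.
# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "output.txt"
    out_path.write_text("This is some text.\nIt will be saved to output.txt.\n")
    written = out_path.read_text()
print(written, end="")
```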
Gotchas
- Cell Execution Order: Cells run in whatever order you execute them, not necessarily top to bottom, and they all share one kernel. If you run cells out of order, you can get inconsistent results or errors because variables or functions might not be defined yet. The execution count shown beside each code cell records the order in which cells were last run. When in doubt, restart the kernel and run all cells (Kernel -> Restart & Run All).
- Global State: Variables and imported modules persist between cell executions within the same kernel session. This can lead to unexpected behavior if you modify a variable in one cell and then re-run another cell that assumes its original value.
- Kernel Crashes: If your code enters an infinite loop or consumes too much memory, the kernel can become unresponsive or crash. Try Kernel -> Interrupt first for a runaway loop; if the kernel has died, you'll need to restart it, which loses all in-memory state.
- %run vs. import: Using %run my_script.py executes the script and loads its variables and functions into the current notebook's namespace, making them directly available. Using import my_script imports it as a module, requiring you to access its contents via my_script.function_name. An import is also cached: re-running the import cell does not pick up edits to the file unless you reload the module (importlib.reload) or restart the kernel.
- Kernel Disconnection: If your browser is closed or the connection to the server is lost, the kernel might continue running in the background. You might need to manually shut down orphaned kernels via the Jupyter server dashboard, or stop the server itself with jupyter notebook stop.
- Large Outputs: Cells with very large outputs (e.g., printing a huge DataFrame or generating many plot elements) can slow down the notebook interface or consume significant memory. Consider truncating or summarizing large outputs, for example with df.head() instead of displaying the full DataFrame.
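The execution-order and global-state gotchas can be reproduced outside Jupyter. The following is a small simulation, not how the kernel is actually implemented: a single dictionary stands in for the kernel's namespace, and three made-up "cells" are run against it with exec in different orders.

```python
# One shared dictionary stands in for the kernel's namespace.
kernel_namespace = {}

# Three hypothetical cells, keyed by their position in the notebook.
cells = {
    1: "price = 100",
    2: "price = price * 2",
    3: "report = f'price is {price}'",
}


def run_cell(number):
    """Execute one cell's source in the shared namespace, like pressing Shift+Enter."""
    exec(cells[number], kernel_namespace)


# Running cell 3 first fails because `price` does not exist yet.
try:
    run_cell(3)
except NameError as error:
    print("NameError:", error)

# Running top to bottom gives the intended result.
for number in (1, 2, 3):
    run_cell(number)
print(kernel_namespace["report"])  # price is 200

# Re-running cell 2 on its own silently changes the state cell 3 depends on.
run_cell(2)
run_cell(3)
print(kernel_namespace["report"])  # price is 400
```

Nothing in the notebook text changed between the two reports; only the run history did. That is why Restart & Run All is the reliable check that a notebook reproduces from a clean state.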