Python dominates data science because it’s intuitive, powerful, and backed by amazing libraries like pandas, numpy, and scikit-learn. Whether you’re just starting out or need a quick refresher, this cheat sheet hopefully has everything you need.
Note
I mainly created this post to try out the subpost feature in this Astro template, which I think is great. I didn't see this feature the last time I used erudite, so I had to copy the structure Delba uses (or used) in her portfolio blog. That is still the best way to implement it, but for now I'll write the post in erudite and maybe incorporate that feature later.
Working with Files
The working directory
The working directory is the folder Python uses by default to resolve relative file paths (e.g., C:/file/path).
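To see that default in action: a relative name such as `data.csv` is resolved against whatever `os.getcwd()` returns at that moment. The sketch below switches into a temporary folder (created with `tempfile`, so your real project is untouched), creates a file by relative name, and then restores the original working directory. The file name `data.csv` is only a placeholder.

```python
import os
import tempfile

original_wd = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # make the temporary folder the working directory
    try:
        # A relative path is interpreted relative to the working directory
        with open("data.csv", "w") as f:
            f.write("name,score\n")
        print(os.path.abspath("data.csv"))  # full path inside the temp folder
        found = "data.csv" in os.listdir(".")
    finally:
        os.chdir(original_wd)  # always restore the previous working directory

print(found)  # True
```

Restoring the directory inside `finally` matters: code that calls `os.chdir()` and forgets to switch back changes how every later relative path is resolved.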
import os
# Get current working directory
wd = os.getcwd()  # '/current/path'
# List files in directory
os.listdir(wd)
# Change working directory
os.chdir('new/working/directory')
# Common file operations
os.rename('old.txt', 'new.txt')  # Rename
os.remove('file.txt')  # Delete
os.mkdir('new_folder')  # Create folder
Operators
Operators let you perform mathematical operations, comparisons, and logical tests. Master these fundamentals first.
Arithmetic Operators
# Addition
10 + 2  # 12
# Subtraction
10 - 2  # 8
# Multiplication
4 * 6  # 24
# Division (always returns a float)
22 / 7  # 3.142857...
# Floor (integer) division
22 // 7  # 3
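One detail worth knowing: `//` rounds down (toward negative infinity), not toward zero, so results for negative numbers can be surprising. `%` is defined to match, so `a == (a // b) * b + a % b` always holds, and the built-in `divmod()` returns both parts at once.

```python
# Floor division rounds toward negative infinity
print(22 // 7)    # 3
print(-22 // 7)   # -4 (not -3)

# int() truncates toward zero instead
print(int(-22 / 7))  # -3

# The remainder takes the sign of the divisor, so the identity holds
print(-22 % 7)    # 6
assert (-22 // 7) * 7 + (-22 % 7) == -22

# divmod() returns (quotient, remainder) in one call
print(divmod(22, 7))  # (3, 1)
```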
# Power (exponentiation)
3 ** 4  # 81
# Modulo (remainder)
22 % 7  # 1
Assignment Operators
# Assign a value
a = 5
# Change list item
x = [0, 2, 3]
x[0] = 1  # x is now [1, 2, 3]
Comparison Operators
# Test equality
3 == 3  # True
# Test inequality
3 != 3  # False
# Greater than
3 > 1  # True
# Greater than or equal
3 >= 3  # True
# Less than
3 < 4  # True
# Less than or equal
3 <= 4  # True
Logical Operators
# Logical NOT
not (2 == 2)  # False
# Logical AND
(1 != 1) and (1 < 1)  # False
# Logical OR
(1 == 1) or (1 < 1)  # True
Lists
Lists are the bread and butter of data science. They store sequences of values: numbers, text, even other lists!
Use lists when you need ordered data that you’ll iterate through or transform.
Creating Lists
# Create lists with [], elements separated by commas
x = [1, 3, 2, 4]
fruits = ['apple', 'banana', 'orange']
mixed = [1, 'hello', 3.14, True]
List Functions and Methods
# Return sorted copy
sorted([3, 1, 2])  # [1, 2, 3]
# Sort in place
x.sort()
# Reverse order
reversed(x)  # Returns reversed iterator
# Reverse in place
x.reverse()
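Note the difference between the functions and the methods here: `sorted()` and `reversed()` leave the original list alone, while `.sort()` and `.reverse()` modify it in place and return `None`. Assigning the result of `.sort()` is a common slip:

```python
x = [1, 3, 2, 4]

# sorted() builds a new list; x is unchanged
y = sorted(x)
print(x, y)  # [1, 3, 2, 4] [1, 2, 3, 4]

# .sort() changes x itself and returns None
result = x.sort()
print(x, result)  # [1, 2, 3, 4] None

# reversed() gives an iterator; wrap it in list() to see the values
print(list(reversed(x)))  # [4, 3, 2, 1]
```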
# Count elements
x.count(2)  # Number of times 2 appears
Selecting List Elements
Lists are zero-indexed (first element has index 0).
x = ['a', 'b', 'c', 'd', 'e']
x[0]  # 'a' (first element)
x[-1]  # 'e' (last element)
x[1:3]  # ['b', 'c'] (index 1 inclusive to index 3 exclusive)
x[2:]  # ['c', 'd', 'e'] (index 2 to the end)
x[:3]  # ['a', 'b', 'c'] (start up to index 3 exclusive)
Concatenating Lists
x = [1, 3, 6]
y = [10, 15, 21]
x + y  # [1, 3, 6, 10, 15, 21]
3 * x  # [1, 3, 6, 1, 3, 6, 1, 3, 6]
Dictionaries
Think of dictionaries as lookup tables. Perfect for storing structured data, survey responses, and configuration settings.
Use dictionaries when you need fast lookups by name/key rather than by position.
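To make "lookup by key" concrete: finding one student's score in a list of pairs means scanning until a match turns up, while a dictionary goes straight to the value (average constant time, since Python dictionaries are hash tables). The names and scores are made-up sample data, and `find_score` is a hypothetical helper.

```python
# Sample data as a list of (name, score) pairs
pairs = [('Alice', 85), ('Bob', 92), ('Charlie', 78)]

# List: scan every pair until the name matches
def find_score(pairs, name):
    for student, score in pairs:
        if student == name:
            return score
    return None

# Dictionary: look the value up directly by key
scores = dict(pairs)

print(find_score(pairs, 'Bob'))  # 92
print(scores['Bob'])             # 92
```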
Creating Dictionaries
# Create a dictionary with {}
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}
scores = {'math': 95, 'science': 87, 'history': 92}
Dictionary Functions and Methods
x = {'a': 1, 'b': 2, 'c': 3}
x.keys()  # dict_keys(['a', 'b', 'c'])
x.values()  # dict_values([1, 2, 3])
x['a']  # 1 (get value by key)
x.get('d', 0)  # 0 (get with default)
Dictionary Operations
# Add or update
student['gpa'] = 3.75
# Remove
del student['age']
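`del` raises a `KeyError` if the key is not there. When a missing key is expected, `dict.pop()` with a default removes the key if present and returns the default otherwise:

```python
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}

# pop() removes the key and returns its value
age = student.pop('age', None)
print(age)  # 22

# With a default, a missing key is not an error
missing = student.pop('age', None)
print(missing)  # None

# del on a missing key raises KeyError
try:
    del student['age']
except KeyError:
    print("'age' was already removed")
```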
# Check if key exists
'name' in student  # True
Strings
Work with text data efficiently. String manipulation is essential for cleaning data and extracting insights.
Creating Strings
In data science: You’ll parse filenames, clean text columns, extract patterns.
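A small sketch of that kind of work, using only built-in string methods; the filename pattern `sales_2023_q1.csv` and the messy city values are hypothetical examples:

```python
# Hypothetical filename following a name_year_quarter.csv pattern
filename = "sales_2023_q1.csv"

# Split off the extension, then split the stem on underscores
stem, extension = filename.rsplit('.', 1)
dataset, year, quarter = stem.split('_')
print(dataset, int(year), quarter.upper())  # sales 2023 Q1

# Clean inconsistent text values: trim whitespace, normalize case
raw_cities = ["  new york", "NEW YORK ", "Boston"]
cities = [c.strip().title() for c in raw_cities]
print(cities)  # ['New York', 'New York', 'Boston']

# Rebuild one string from the unique values with join()
print(", ".join(sorted(set(cities))))  # Boston, New York
```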
# Single line strings
"DataCamp"
'DataCamp'
# Escape quotes
"He said, \"DataCamp\""
# Multi-line strings
"""A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning"""
String Operations
s = "DataCamp"  # avoid naming it str, which would shadow the built-in str()
s[0]  # 'D' (first character)
s[0:4]  # 'Data' (substring)
s.upper()  # 'DATACAMP'
s.lower()  # 'datacamp'
s.title()  # 'Datacamp'
s.replace('a', 'e')  # 'DeteCemp' (replaces every match)
Combining Strings
"Data" + "Framed"  # 'DataFramed'
3 * "data "  # 'data data data '
"beekeepers".split('e')  # ['b', '', 'k', '', 'p', 'rs']
Functions
Functions transform data from one shape to another. They’re the building blocks of data pipelines.
Basic Functions
def calculate_mean(numbers):
    """Calculate the mean of a list of numbers."""
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
# Usage
temperatures = [72, 68, 75, 82, 77]
avg_temp = calculate_mean(temperatures)
print(f"Average: {avg_temp}°F")
Function Parameters
# Default parameters
def greet(name="Guest"):
    return f"Hello, {name}"
greet()  # 'Hello, Guest'
greet("Alice")  # 'Hello, Alice'
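One caveat with default parameters: the default value is evaluated once, when the function is defined, not on every call. A mutable default such as a list is therefore shared between calls. The usual fix is a `None` default. The function names below are hypothetical.

```python
# Pitfall: the same list object is reused on every call
def add_item_buggy(item, items=[]):
    items.append(item)
    return items

print(add_item_buggy('a'))  # ['a']
print(add_item_buggy('b'))  # ['a', 'b'] (the 'a' from the first call is still there)

# Fix: use None as the default and create a new list inside
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item('a'))  # ['a']
print(add_item('b'))  # ['b']
```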
# Multiple return values
def stats(data):
    return min(data), max(data), sum(data) / len(data)
min_val, max_val, mean = stats([1, 5, 3, 9, 2])
Comprehensions
Python’s superpower for data transformations. List and dictionary comprehensions are usually more readable than the equivalent loops, and often faster.
List Comprehensions
# Traditional loop
squared = []
for x in range(1, 6):
    squared.append(x ** 2)
# List comprehension
squared = [x ** 2 for x in range(1, 6)]
# With condition (filtering)
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
# Result: [4, 16, 36, 64, 100]
Dictionary Comprehensions
Transform data structures efficiently:
# Create dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Filter and transform
temperatures = {'Mon': 72, 'Tue': 68, 'Wed': 75, 'Thu': 82}
hot_days = {day: temp for day, temp in temperatures.items() if temp > 75}
# {'Thu': 82}
Built-in Functions
Python’s built-in functions save you time on everyday tasks. Learn these well.
enumerate()
Loop with both index and value together:
grades = [85, 92, 78, 96]
for index, grade in enumerate(grades):
    print(f"Student {index + 1}: {grade}%")
zip()
Combine multiple lists:
students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]
for student, score in zip(students, scores):
    print(f"{student}: {score}")
# Create dictionary
student_dict = dict(zip(students, scores))
Error Handling
Real data is messy. Handle errors gracefully or your entire pipeline breaks.
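A typical case is turning a column of text values into numbers when some entries are blank or malformed. The sketch below catches the `ValueError` raised by `float()` for each bad entry and records `None` instead of stopping; the sample values and the `parse_numbers` name are hypothetical.

```python
def parse_numbers(values):
    """Convert strings to floats, using None for entries that fail."""
    parsed = []
    for value in values:
        try:
            parsed.append(float(value))
        except ValueError:
            # Blank or malformed text: keep a placeholder and keep going
            parsed.append(None)
    return parsed

raw = ['3.5', '', 'n/a', '42', ' 7 ']
print(parse_numbers(raw))  # [3.5, None, None, 42.0, 7.0]
```

Catching only `ValueError` (rather than a bare `except:`) keeps genuine bugs, such as passing a non-iterable, visible.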
def safe_divide(num1, num2):
    """Safely divide two numbers."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both values must be numbers")
        return None
Modules
Organize your code into reusable modules. Essential for building larger projects.
Importing Packages
# Import without alias
import pandas
# Import with alias
import pandas as pd
# Import specific object
from pandas import DataFrame
Creating Your Own Module
# data_utils.py
"""Utility functions for data science."""
def mean(data):
    """Calculate the mean of a dataset."""
    return sum(data) / len(data)
def median(data):
    """Calculate the median of a dataset."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]
Using Modules
# Import your module
import data_utils
# Use functions
temperatures = [72, 68, 75, 82, 77]
avg = data_utils.mean(temperatures)
Standard Library Modules
Python’s standard library is a goldmine for data science:
collections
Advanced data structures for complex operations:
from collections import Counter, defaultdict
# Counter for frequency analysis
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
vote_counts = Counter(votes)
print(vote_counts.most_common(2))
# [('Alice', 2), ('Bob', 2)]
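`Counter` accepts any iterable, so word frequencies are one `split()` away, and two counters can be added to merge their counts. The sentences here are made-up sample text.

```python
from collections import Counter

# Count words in two made-up survey responses
first = Counter("data is messy and data is big".split())
second = Counter("clean data is useful".split())

print(first['data'])  # 2

# Adding Counters sums the counts key by key
combined = first + second
print(combined.most_common(2))  # [('data', 3), ('is', 3)]
```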
# defaultdict for grouping values without checking whether a key exists
student_scores = defaultdict(list)
student_scores['Alice'].append(95)
student_scores['Bob'].append(87)
csv
Read and write CSV files, an everyday task in data science:
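Writing uses `csv.DictWriter`, the counterpart of `DictReader`. The sketch below writes two rows to an in-memory buffer (`io.StringIO`, standing in for a real file) and reads them back; with a real file, open it with `newline=''` as the `csv` module documentation recommends. Every value comes back as a string.

```python
import csv
import io

rows = [{'name': 'Alice', 'score': 85}, {'name': 'Bob', 'score': 92}]

# Write: the header comes from fieldnames, then one line per dict
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'score'])
writer.writeheader()
writer.writerows(rows)

# Read back: rewind the buffer and parse it again
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded[1]['name'], loaded[1]['score'])  # Bob 92

# CSV stores text, so convert numeric columns explicitly
total = sum(int(row['score']) for row in loaded)
print(total)  # 177
```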
import csv
# Reading CSV files
with open('data.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['name'], row['score'])
json
Work with JSON data from APIs and files:
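For files, `json.dump()` and `json.load()` write to and read from an open file object directly, while `dumps()`/`loads()` work with strings. The filename `results.json` is a placeholder, and the sketch writes it into a temporary folder.

```python
import json
import os
import tempfile

data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'results.json')

    # Write: indent=2 makes the file human-readable
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)

    # Read it back into Python objects
    with open(path) as f:
        loaded = json.load(f)

print(loaded == data)  # True
```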
import json
# Convert to JSON string
data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}
json_string = json.dumps(data)
# Parse from JSON string
parsed = json.loads(json_string)
Lambda Functions
One-line functions for quick operations:
# Lambda for quick calculations
square = lambda x: x ** 2
# Use with built-in functions
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))
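`map()` with a `lambda` and a list comprehension produce the same result here; many Python programmers find the comprehension easier to read, and it skips the extra `list()` call:

```python
numbers = [1, 2, 3, 4, 5]

via_map = list(map(lambda x: x ** 2, numbers))
via_comprehension = [x ** 2 for x in numbers]

print(via_map)                       # [1, 4, 9, 16, 25]
print(via_map == via_comprehension)  # True
```

Lambdas still shine as `key=` arguments, as in the sorting example that follows.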
# Sorting with custom key
students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78},
]
# Sort by score
sorted_students = sorted(students, key=lambda x: x['score'], reverse=True)
Quick Reference Summary
Bookmark this page for instant lookup!
Data Structures Cheat Sheet
Lists & Dictionaries - Your Main Tools
# Creating collections
[1, 2, 3]  # List of numbers
['a', 'b', 'c']  # List of strings
{'name': 'Alice', 'age': 22}  # Dictionary
x[0]  # Access first element
x[-1]  # Access last element
x[1:4]  # Slice elements 1-3
Quick Snippets:
- len(x) - Get length
- x.append(item) - Add to list
- x.keys() / x.values() - Dictionary methods
- 'key' in x - Check membership
Transformations - Pythonic Way
List Comprehensions vs Loops
# Old way
result = []
for x in data:
    result.append(x * 2)
# Pythonic way
result = [x * 2 for x in data]
# With condition
evens = [x for x in data if x % 2 == 0]
Dictionary Comprehensions
{x: x**2 for x in range(5)}  # Create mapping
{x: x for x in data if x > 0}  # Filter while mapping
Functions & Control Flow
Essential Patterns
# Define function
def clean_data(x):
    return x.strip()
# Lambda (one-liner)
lambda x: x * 2
# Error handling
try:
    result = x / y
except ZeroDivisionError:
    result = 0
Most Important Rules 🎯
- Lists → Ordered data, iterations
- Dictionaries → Key-value lookups
- Comprehensions → Fast transformations
- Functions → Reusable logic
- Error handling → Real-world resilience