Python Basics Cheat Sheet

A quick-reference cheat sheet of Python basics for data science, meant as a refresher for beginners and intermediate users.

Oct 25, 2025 · 8 min read

Series: Data Science from Scratch (part 1 of 3)

Python dominates data science because it's intuitive, powerful, and backed by excellent libraries such as pandas, NumPy, and scikit-learn. Whether you're just starting out or need a quick refresher, this cheat sheet covers the essentials.

Pro tip: Keep this page bookmarked! You'll reference it constantly as you work through data science projects.

Working with Files

The Working Directory

The working directory is the folder Python uses to resolve relative file paths, so open('data.csv') looks for data.csv there (e.g., C:/file/path).

```python
import os

# Get current working directory
wd = os.getcwd()  # e.g. '/current/path'

# List files in directory
os.listdir(wd)

# Change working directory
os.chdir('new/working/directory')

# Common file operations
os.rename('old.txt', 'new.txt')  # Rename
os.remove('file.txt')            # Delete
os.mkdir('new_folder')           # Create folder
```
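
To see the working directory in action, here is a minimal sketch that creates a file through a relative path and checks where it landed. It works inside a throwaway folder from the standard-library tempfile module (not used above) and switches back to the starting directory afterwards, so nothing outside that folder is touched.

```python
import os
import tempfile
from pathlib import Path

original_dir = os.getcwd()

# Use a throwaway folder so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        # 'notes.txt' is a relative path, so it is created in the working directory
        Path("notes.txt").write_text("hello")
        listing = os.listdir(".")
        full_path = Path("notes.txt").resolve()          # absolute path inside tmp
        same_folder = full_path.parent == Path(tmp).resolve()
    finally:
        # Switch back before the temporary folder is deleted
        os.chdir(original_dir)

print(listing, same_folder)  # ['notes.txt'] True
```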

Operators

Operators let you perform mathematical operations, comparisons, and logical tests. Master these fundamentals first.

Arithmetic Operators

```python
# Addition
10 + 2  # 12

# Subtraction
10 - 2  # 8

# Multiplication
4 * 6  # 24

# Division (always returns a float)
22 / 7  # 3.142857...

# Floor (integer) division: rounds down
22 // 7  # 3

# Power (exponentiation)
3 ** 4  # 81

# Modulo (remainder)
22 % 7  # 1
```
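
Floor division and modulo can surprise you with negative numbers, because // rounds down toward negative infinity rather than toward zero. A short demonstration, which also uses the built-in divmod() (not covered above) to get both results in one call:

```python
# // rounds down (toward negative infinity), not toward zero
print(-22 // 7)                     # -4, since -22 / 7 is about -3.14
# % follows the sign of the divisor, so (a // b) * b + a % b == a
print(-22 % 7)                      # 6
print((-22 // 7) * 7 + (-22 % 7))   # -22

# divmod() returns (quotient, remainder) in one call
print(divmod(22, 7))                # (3, 1)

# Handy for unit conversions, e.g. 135 seconds -> 2 min 15 s
minutes, seconds = divmod(135, 60)
print(minutes, seconds)             # 2 15
```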

Assignment Operators

```python
# Assign a value
a = 5

# Change a list item by index
x = [10, 20, 30]
x[0] = 1  # x is now [1, 20, 30]
```
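
Python also has augmented assignment operators that update a variable from its current value. A short sketch with illustrative values; note that += on a list extends the existing list object in place, which matters when two names refer to the same list:

```python
a = 5
a += 2    # same as a = a + 2  -> 7
a *= 3    # same as a = a * 3  -> 21
a -= 1    # 20
a //= 6   # 3
a **= 2   # 9

# For lists, += extends the same object in place
nums = [1, 2]
alias = nums      # a second name for the same list
nums += [3]
print(alias)      # [1, 2, 3], the change is visible through both names
```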

Comparison Operators

```python
# Test equality
3 == 3  # True

# Test inequality
3 != 3  # False

# Greater than
3 > 1  # True

# Greater than or equal
3 >= 3  # True

# Less than
3 < 4  # True

# Less than or equal
3 <= 4  # True
```
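
Comparisons can be chained, which reads naturally for range checks, and they also work on strings. A few examples worth knowing (the variable age is just an illustration):

```python
age = 34

# Chained comparison: same as (18 <= age) and (age < 65)
print(18 <= age < 65)        # True

# Strings compare character by character (lexicographically)
print("apple" < "banana")    # True
# Uppercase letters sort before lowercase ones
print("Zebra" < "apple")     # True

# == compares values, so an int and an equal float are equal
print(1 == 1.0)              # True
```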

Logical Operators

```python
# Logical NOT
not (2 == 2)  # False

# Logical AND
(1 != 1) and (1 < 1)  # False

# Logical OR
(1 == 1) or (1 < 1)  # True
```
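
The and/or operators short-circuit: Python stops evaluating as soon as the result is known, and both return one of their operands rather than a strict True/False. That is handy for guarding risky expressions and for fallbacks (the variable names here are illustrative):

```python
count = 0
total = 50

# `and` stops at the first falsy value, so total / count never runs here
has_high_avg = count != 0 and total / count > 10
print(has_high_avg)          # False, no ZeroDivisionError

# `or` returns the first truthy value, a quick way to supply a default
name = "" or "Unknown"
print(name)                  # Unknown
```

Be careful with the or fallback: it also replaces legitimate falsy values such as 0 or an empty list.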

Lists

Lists are the bread and butter of data science. They store sequences of values: numbers, text, even other lists!

Use lists when you need ordered data that you'll iterate through or transform.

Creating Lists

```python
# Create lists with [], elements separated by commas
x = [1, 3, 2, 4]
fruits = ['apple', 'banana', 'orange']
mixed = [1, 'hello', 3.14, True]
```

List Functions and Methods

```python
x = [1, 3, 2, 4]

# Return sorted copy
sorted([3, 1, 2])  # [1, 2, 3]

# Sort in place
x.sort()  # x is now [1, 2, 3, 4]

# Reverse order
reversed(x)  # Returns a reverse iterator; use list(reversed(x)) to see the values

# Reverse in place
x.reverse()  # x is now [4, 3, 2, 1]

# Count elements
x.count(2)  # 1 (number of times 2 appears)
```
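
A common slip is writing x = x.sort(): list.sort() works in place and returns None, while sorted() returns a new list. The sketch below contrasts the two and also shows the key= parameter, which both accept:

```python
x = [1, 3, 2, 4]

# sorted() builds a new list and leaves x unchanged
y = sorted(x, reverse=True)
print(y, x)                    # [4, 3, 2, 1] [1, 3, 2, 4]

# list.sort() modifies x in place and returns None
result = x.sort()
print(result, x)               # None [1, 2, 3, 4]

# reversed() yields an iterator; list() materializes it
print(list(reversed(x)))       # [4, 3, 2, 1]

# key= sorts by a derived value, here the length of each word
words = ['kiwi', 'fig', 'banana']
print(sorted(words, key=len))  # ['fig', 'kiwi', 'banana']
```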

Selecting List Elements

Lists are zero-indexed: the first element has index 0.

```python
x = ['a', 'b', 'c', 'd', 'e']

x[0]      # 'a' (first element)
x[-1]     # 'e' (last element)
x[1:3]    # ['b', 'c'] (index 1 inclusive to index 3 exclusive)
x[2:]     # ['c', 'd', 'e'] (index 2 to the end)
x[:3]     # ['a', 'b', 'c'] (start up to index 3, exclusive)
```
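
Slices also accept an optional third value, the step, written x[start:stop:step]. A few patterns that come up often:

```python
x = ['a', 'b', 'c', 'd', 'e']

x[-2:]     # ['d', 'e'] (last two elements)
x[::2]     # ['a', 'c', 'e'] (every second element)
x[1:4:2]   # ['b', 'd'] (index 1 up to 4, step 2)
x[::-1]    # ['e', 'd', 'c', 'b', 'a'] (reversed copy)
x[10:]     # [] (slicing past the end is fine; x[10] would raise IndexError)
```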

Concatenating Lists

```python
x = [1, 3, 6]
y = [10, 15, 21]

x + y           # [1, 3, 6, 10, 15, 21]
3 * x           # [1, 3, 6, 1, 3, 6, 1, 3, 6]
```
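
Repetition with * has a catch when the list holds other lists: the inner list is not copied, so every row is the same object. A small demonstration, with a comprehension as the usual fix:

```python
# Repeating a list of lists repeats references to ONE inner list
shared = [[0] * 3] * 2
shared[0][0] = 1
print(shared)     # [[1, 0, 0], [1, 0, 0]] (both rows changed)

# A comprehension builds an independent inner list for each row
separate = [[0] * 3 for _ in range(2)]
separate[0][0] = 1
print(separate)   # [[1, 0, 0], [0, 0, 0]]
```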

Dictionaries

Think of dictionaries as lookup tables. Perfect for storing structured data, survey responses, and configuration settings.

Use dictionaries when you need fast lookups by name (key) rather than by position.

Creating Dictionaries

```python
# Create a dictionary with {}
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}
scores = {'math': 95, 'science': 87, 'history': 92}
```

Dictionary Functions and Methods

```python
x = {'a': 1, 'b': 2, 'c': 3}

x.keys()        # dict_keys(['a', 'b', 'c'])
x.values()      # dict_values([1, 2, 3])
x['a']          # 1 (get value by key)
x.get('d', 0)   # 0 (get with default)
```
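
The get() default is the basis of a common counting pattern. A minimal sketch with made-up words that tallies occurrences, then loops over the result with items(), which yields (key, value) pairs:

```python
words = ['apple', 'banana', 'apple', 'cherry', 'apple']

counts = {}
for word in words:
    # get() returns 0 the first time a word is seen, so no KeyError
    counts[word] = counts.get(word, 0) + 1

print(counts)   # {'apple': 3, 'banana': 1, 'cherry': 1}

# items() yields (key, value) pairs for looping
for word, n in counts.items():
    print(f"{word}: {n}")
```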

Dictionary Operations

```python
# Add or update
student['gpa'] = 3.75

# Remove
del student['age']

# Check if key exists
'name' in student  # True
```
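
del raises a KeyError if the key is missing. When a key may or may not be there, pop() with a default is safer, and update() adds or overwrites several keys at once. A short sketch on a fresh example dictionary:

```python
student = {'name': 'Alice', 'grade': 'A'}

# pop() with a default returns the default instead of raising KeyError
removed = student.pop('age', None)
print(removed)     # None, and student is unchanged

# update() adds new keys and overwrites existing ones
student.update({'grade': 'A+', 'year': 3})
print(student)     # {'name': 'Alice', 'grade': 'A+', 'year': 3}
```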

Strings

Work with text data efficiently. String manipulation is essential for cleaning data and extracting insights.

Creating Strings

In data science, you'll use strings to parse filenames, clean text columns, and extract patterns.

```python
# Single line strings
"DataCamp"
'DataCamp'

# Escape quotes
"He said, \"DataCamp\""

# Multi-line strings
"""
A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning
"""
```

String Operations

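```python
s = "DataCamp"  # avoid naming a variable str, which hides the built-in str()

s[0]                 # 'D' (first character)
s[0:4]               # 'Data' (substring)
s.upper()            # 'DATACAMP'
s.lower()            # 'datacamp'
s.title()            # 'Datacamp' (capitalizes each word, lowercases the rest)
s.replace('a', 'e')  # 'DeteCemp' (replaces every occurrence)
```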

Combining Strings

```python
"Data" + "Framed"        # 'DataFramed'
3 * "data "              # 'data data data '
"beekeepers".split('e')  # ['b', '', 'k', '', 'p', 'rs']
```
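
The reverse of split() is join(), and two other tools you'll reach for constantly when cleaning text are strip() and f-string formatting (f-strings also appear in the function examples that follow). A few one-liners with made-up values:

```python
# join() glues a list of strings together with a separator
"_".join(['sales', '2024', 'q1'])   # 'sales_2024_q1'

# strip() removes leading and trailing whitespace, including newlines
"  Alice Smith \n".strip()          # 'Alice Smith'

# f-strings embed values; :.2f formats a number with 2 decimal places
avg = 74.8
f"Average: {avg:.2f}"               # 'Average: 74.80'

# Check prefixes and suffixes, e.g. when filtering filenames
"data.csv".endswith(".csv")         # True
```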

Functions

Functions transform data from one shape to another. They're the building blocks of data pipelines.

Functions keep your code DRY (Don't Repeat Yourself). Write once, use everywhere!

Basic Functions

```python
def calculate_mean(numbers):
    """Calculate the mean of a list of numbers."""
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)

# Usage
temperatures = [72, 68, 75, 82, 77]
avg_temp = calculate_mean(temperatures)
print(f"Average: {avg_temp}°F")  # Average: 74.8°F
```

Function Parameters

```python
# Default parameters
def greet(name="Guest"):
    return f"Hello, {name}"

greet()           # 'Hello, Guest'
greet("Alice")    # 'Hello, Alice'

# Multiple return values
def stats(data):
    return min(data), max(data), sum(data)/len(data)

min_val, max_val, mean = stats([1, 5, 3, 9, 2])
```
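
One pitfall with default parameters: a default value is evaluated once, when the function is defined, so a mutable default such as a list is shared between calls. The function names below are hypothetical; the None-default pattern is the standard fix:

```python
# Risky: the default list is created once and shared between calls
def add_score(score, scores=[]):
    scores.append(score)
    return scores

add_score(90)   # [90]
add_score(85)   # [90, 85] (the earlier call's value leaked in)

# Safer: use None as the default and create a fresh list inside
def add_score_safe(score, scores=None):
    if scores is None:
        scores = []
    scores.append(score)
    return scores

add_score_safe(90)   # [90]
add_score_safe(85)   # [85]
```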

Comprehensions

Python's superpower for data transformations. List and dictionary comprehensions are usually more concise, and often faster, than equivalent loops.

List Comprehensions

Comprehensions are usually somewhat faster than an equivalent loop that calls append(), and they're more Pythonic.

```python
# Traditional loop
squared = []
for x in range(1, 6):
    squared.append(x ** 2)

# List comprehension (same result: [1, 4, 9, 16, 25])
squared = [x ** 2 for x in range(1, 6)]

# With condition (filtering)
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
# Result: [4, 16, 36, 64, 100]
```
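
Where the if goes matters: an if at the end filters items out, while an if/else before the for transforms every item. Nested for clauses read left to right, like nested loops. A sketch with illustrative values:

```python
numbers = [3, 8, 5, 12]

# Filter: `if` at the end keeps only matching items
big = [n for n in numbers if n > 4]
# [8, 5, 12]

# Transform: `if ... else` before `for` labels every item
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]
# ['odd', 'even', 'odd', 'even']

# Nested loops: the first `for` is the outer loop
pairs = [(a, b) for a in [1, 2] for b in ['x', 'y']]
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```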

Dictionary Comprehensions

Transform data structures efficiently:

```python
# Create dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Filter and transform
temperatures = {'Mon': 72, 'Tue': 68, 'Wed': 75, 'Thu': 82}
hot_days = {day: temp for day, temp in temperatures.items() if temp > 75}
# {'Thu': 82}
```

Built-in Functions

Python's built-in functions are always available, no import needed, and they save you a lot of boilerplate. Learn these well.

enumerate()

Loop with both index and value together:

```python
grades = [85, 92, 78, 96]

for index, grade in enumerate(grades):
    print(f"Student {index + 1}: {grade}%")
```
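
enumerate() takes an optional start argument, so the index + 1 arithmetic can be dropped:

```python
grades = [85, 92, 78, 96]

# start=1 makes the counter begin at 1 instead of 0
for number, grade in enumerate(grades, start=1):
    print(f"Student {number}: {grade}%")

pairs = list(enumerate(grades, start=1))
print(pairs[:2])   # [(1, 85), (2, 92)]
```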

zip()

Iterate over several lists in parallel, pairing up their elements:

```python
students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]

for student, score in zip(students, scores):
    print(f"{student}: {score}")

# Create dictionary
student_dict = dict(zip(students, scores))  # {'Alice': 85, 'Bob': 92, 'Charlie': 78}
```
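
zip() stops silently at the shortest input, which can hide a missing value. itertools.zip_longest() from the standard library pads the shorter input instead. A small sketch with one score deliberately missing:

```python
from itertools import zip_longest

students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92]   # one score is missing

# zip() silently stops at the shortest input
paired = list(zip(students, scores))
print(paired)   # [('Alice', 85), ('Bob', 92)]

# zip_longest() pads the shorter input with fillvalue instead
padded = list(zip_longest(students, scores, fillvalue=None))
print(padded)   # [('Alice', 85), ('Bob', 92), ('Charlie', None)]
```

On Python 3.10 and later, zip(students, scores, strict=True) raises a ValueError when the lengths differ, which is another way to catch the problem early.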

Error Handling

Real data is messy. Handle errors gracefully, or a single bad value can crash your entire pipeline.

```python
def safe_divide(num1, num2):
    """Safely divide two numbers."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both values must be numbers")
        return None
```
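
A typical data-cleaning case is converting text to numbers: float() raises ValueError for anything that isn't numeric, including an empty string. A minimal sketch with made-up sample values that keeps the good values and counts the bad ones:

```python
raw_values = ['12.5', 'n/a', '7', '', '3.25']

clean = []
skipped = 0
for value in raw_values:
    try:
        clean.append(float(value))
    except ValueError:
        # float() could not parse this entry (e.g. 'n/a' or '')
        skipped += 1

print(clean)     # [12.5, 7.0, 3.25]
print(skipped)   # 2
```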

Always handle edge cases in data science. Missing values, type mismatches, and division by zero are common!

Modules

Organize your code into reusable modules. Essential for building larger projects.

Importing Packages

```python
# Import without alias
import pandas

# Import with alias
import pandas as pd

# Import specific object
from pandas import DataFrame
```

Creating Your Own Module

```python
# data_utils.py
"""Utility functions for data science."""

def mean(data):
    """Calculate the mean of a dataset."""
    return sum(data) / len(data)

def median(data):
    """Calculate the median of a dataset."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]
```
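
For these two functions in particular, the standard library already ships tested versions in the statistics module, which is useful for checking your own implementation. A quick comparison on sample temperatures:

```python
import statistics

temperatures = [72, 68, 75, 82, 77]

print(statistics.mean(temperatures))     # 74.8
print(statistics.median(temperatures))   # 75
# With an even count, median averages the two middle values
print(statistics.median([1, 2, 3, 4]))   # 2.5
```

One difference: the hand-written mean above fails with ZeroDivisionError on an empty list, while statistics.mean raises statistics.StatisticsError.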

Using Modules

```python
# Import your module (data_utils.py must sit next to your script or elsewhere on sys.path)
import data_utils

# Use functions
temperatures = [72, 68, 75, 82, 77]
avg = data_utils.mean(temperatures)
```

Standard Library Modules

Python's standard library is a goldmine for data science:

collections

Specialized container types for counting and grouping:

```python
from collections import Counter, defaultdict

# Counter for frequency analysis
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
vote_counts = Counter(votes)
print(vote_counts.most_common(2))
# [('Alice', 2), ('Bob', 2)]

# defaultdict creates a default value (here an empty list) for missing keys
student_scores = defaultdict(list)
student_scores['Alice'].append(95)
student_scores['Bob'].append(87)
```
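
defaultdict(list) shines when grouping rows by a key, because the first append for a new key creates its empty list automatically. A sketch that groups made-up (name, score) records and averages each group:

```python
from collections import defaultdict

records = [('Alice', 95), ('Bob', 87), ('Alice', 88)]

# Missing keys start as an empty list, so no "if name not in ..." check is needed
by_student = defaultdict(list)
for name, score in records:
    by_student[name].append(score)

averages = {name: sum(s) / len(s) for name, s in by_student.items()}
print(averages)   # {'Alice': 91.5, 'Bob': 87.0}
```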

csv

Read and write CSV files, a staple format in data science:

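```python
import csv

# Reading CSV files (the csv docs recommend opening files with newline='')
with open('data.csv', newline='') as file:
    reader = csv.DictReader(file)  # uses the header row as dictionary keys
    for row in reader:
        print(row['name'], row['score'])
```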
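
Writing works the same way with csv.DictWriter. The sketch below uses io.StringIO as an in-memory stand-in for a file so it runs without touching disk (with a real file, open it with open('out.csv', 'w', newline='')). Reading the data back also shows that every value comes back as a string:

```python
import csv
import io

rows = [{'name': 'Alice', 'score': 95}, {'name': 'Bob', 'score': 87}]

# io.StringIO stands in for a real file object
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'score'])
writer.writeheader()
writer.writerows(rows)

# Rewind and read it back: all values are strings now
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded[0])   # {'name': 'Alice', 'score': '95'}

# Convert numeric columns explicitly
scores = [int(row['score']) for row in loaded]   # [95, 87]
```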

json

Work with JSON data from APIs and files:

```python
import json

# Convert to JSON string
data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}
json_string = json.dumps(data)

# Parse from JSON string
parsed = json.loads(json_string)
```
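
dumps()/loads() work with strings; their siblings dump()/load() write to and read from file objects directly, and indent= makes the output readable. Here io.StringIO stands in for a real file opened with open('data.json', 'w'):

```python
import io
import json

data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}

# indent=2 pretty-prints with two-space indentation
print(json.dumps(data, indent=2))

# dump() writes to a file object, load() reads from one
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
restored = json.load(buffer)
print(restored == data)   # True
```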

Most modern APIs return JSON. You'll use json constantly when working with external data sources.

Lambda Functions

One-line functions for quick operations:

Use lambdas with map(), filter(), and sorted(). They're perfect for applying simple transformations.

```python
# Lambda for quick calculations
square = lambda x: x ** 2

# Use with built-in functions
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))  # [1, 4, 9, 16, 25]

# Sorting with custom key
students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78}
]

# Sort by score, highest first
sorted_students = sorted(students, key=lambda x: x['score'], reverse=True)
```
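
filter() keeps the items for which a function returns True. A short sketch comparing it with the equivalent comprehension, plus operator.itemgetter(), a standard-library alternative to a lambda for sort keys:

```python
from operator import itemgetter

numbers = [1, 2, 3, 4, 5, 6]

# filter() keeps items for which the lambda returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))   # [2, 4, 6]

# Same result as a comprehension, which many find easier to read
evens_alt = [n for n in numbers if n % 2 == 0]        # [2, 4, 6]

students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78},
]
# itemgetter('score') does the same job as lambda s: s['score']
top = sorted(students, key=itemgetter('score'), reverse=True)
print([s['name'] for s in top])   # ['Bob', 'Alice', 'Charlie']
```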

Quick Reference Summary

Bookmark this page for instant lookup!

Data Structures Cheat Sheet

Lists & Dictionaries - Your Main Tools

```python
# Creating collections
[1, 2, 3]                     # List of numbers
['a', 'b', 'c']               # List of strings
{'name': 'Alice', 'age': 22}  # Dictionary

# Accessing elements (example list)
x = [10, 20, 30, 40, 50]
x[0]                          # 10 (first element)
x[-1]                         # 50 (last element)
x[1:4]                        # [20, 30, 40] (indices 1 to 3)
```

Quick Snippets:

len(x) - Get length
x.append(item) - Add to list
x.keys() / x.values() - Dictionary methods
'key' in x - Check membership

Transformations - Pythonic Way

List Comprehensions vs Loops

```python
data = [1, 2, 3, 4]  # example input

# Old way
result = []
for x in data:
    result.append(x * 2)

# Pythonic way
result = [x * 2 for x in data]  # [2, 4, 6, 8]

# With condition
evens = [x for x in data if x % 2 == 0]  # [2, 4]
```

Dictionary Comprehensions

```python
data = [-2, 0, 3, 5]  # example input

{x: x**2 for x in range(5)}    # Create mapping: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
{x: x for x in data if x > 0}  # Filter while mapping: {3: 3, 5: 5}
```

Functions & Control Flow

Essential Patterns

```python
# Define function
def clean_data(x):
    return x.strip()

# Lambda (one-liner)
double = lambda x: x * 2

# Error handling
x, y = 10, 0
try:
    result = x / y
except ZeroDivisionError:
    result = 0
```

Most Important Rules 🎯

Lists → Ordered data, iterations
Dictionaries → Key-value lookups
Comprehensions → Fast transformations
Functions → Reusable logic
Error handling → Real-world resilience