Crash Course for Data Science

Python dominates data science because it’s intuitive, powerful, and backed by excellent libraries like pandas, NumPy, and scikit-learn. Whether you’re just starting out or need a quick refresher, this cheat sheet covers the core Python you’ll reach for most often.

Note

I’m mainly writing this post to try out the subpost feature in this Astro template, which I think is great. I didn’t see this feature the last time I used erudite, so I had to copy the structure delba uses (or maybe used) in her portfolio blog. That is still the best way to implement it, but for the time being I’ll add a post in erudite and maybe incorporate that feature later.

Working with Files

The working directory

The working directory is the folder Python resolves relative file paths against (e.g., C:/file/path).

import os

# Get current working directory
wd = os.getcwd()  # '/current/path'

# List files in directory
os.listdir(wd)

# Change working directory
os.chdir('new/working/directory')

# Common file operations
os.rename('old.txt', 'new.txt')  # Rename
os.remove('file.txt')            # Delete
os.mkdir('new_folder')           # Create folder
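
The same file operations can also be written with pathlib from the standard library. What follows is a minimal sketch rather than part of the original cheat sheet: it runs inside a temporary directory so nothing on your disk is touched, and the file and folder names are placeholders.

```python
import os
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the example has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Create a folder and an empty file (placeholder names).
    (base / "new_folder").mkdir()
    (base / "old.txt").touch()

    # Rename the file, then list what the directory contains.
    (base / "old.txt").rename(base / "new.txt")
    names = sorted(p.name for p in base.iterdir())

    # os.path.join builds the same path as the / operator.
    same_path = os.path.join(tmp, "new.txt") == str(base / "new.txt")

print(names)  # ['new.txt', 'new_folder']
```

Building paths with / or os.path.join, rather than gluing strings together, keeps the code portable across operating systems.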

Operators

Operators let you perform mathematical operations, comparisons, and logical tests. Master these fundamentals first.

Arithmetic Operators

# Addition
10 + 2  # 12

# Subtraction
10 - 2  # 8

# Multiplication
4 * 6  # 24

# Division
22 / 7  # 3.142857...

# Integer division
22 // 7  # 3

# Power (exponentiation)
3 ** 4  # 81

# Modulo (remainder)
22 % 7  # 1
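
Integer division and modulo behave in a way that surprises people once negative numbers are involved. The short sketch below (variable names are only for illustration) shows that // rounds down rather than toward zero, and that % follows the sign of the divisor.

```python
# Floor division rounds toward negative infinity, not toward zero.
q_pos = 22 // 7    # 3
q_neg = -22 // 7   # -4

# The remainder takes the sign of the divisor.
r_pos = 22 % 7     # 1
r_neg = -22 % 7    # 6

# divmod() returns the quotient and remainder together.
both = divmod(22, 7)  # (3, 1)

# The identity a == (a // b) * b + (a % b) holds for negatives too.
check = (-22 // 7) * 7 + (-22 % 7)  # -22
```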

Assignment Operators

# Assign a value
a = 5

# Change list item (x must already exist as a list)
x = [3, 2, 4]
x[0] = 1  # x is now [1, 2, 4]

Comparison Operators

# Test equality
3 == 3  # True

# Test inequality
3 != 3  # False

# Greater than
3 > 1  # True

# Greater than or equal
3 >= 3  # True

# Less than
3 < 4  # True

# Less than or equal
3 <= 4  # True

Logical Operators

# Logical NOT
not (2 == 2)  # False

# Logical AND
(1 != 1) and (1 < 1)  # False

# Logical OR
(1 == 1) or (1 < 1)  # True

Lists

Lists are the bread and butter of data science. They store sequences of values: numbers, text, even other lists!

Use lists when you need ordered data that you’ll iterate through or transform.

Creating Lists

# Create lists with [], elements separated by commas
x = [1, 3, 2, 4]
fruits = ['apple', 'banana', 'orange']
mixed = [1, 'hello', 3.14, True]

List Functions and Methods

# Return sorted copy
sorted([3, 1, 2])  # [1, 2, 3]

# Sort in place
x.sort()

# Reverse order
reversed(x)  # Returns reversed iterator

# Reverse in place
x.reverse()

# Count elements
x.count(2)  # Number of times 2 appears
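
The difference between the functions that return something new and the methods that modify the list in place is easy to miss. A small sketch, using a throwaway list, makes it concrete:

```python
x = [3, 1, 2]

# sorted() builds a new list; x itself is untouched.
y = sorted(x)

# reversed() returns a lazy iterator; wrap it in list() to see the values.
z = list(reversed(x))

# sort() changes x in place and returns None.
result = x.sort()

print(y)       # [1, 2, 3]
print(z)       # [2, 1, 3]
print(x)       # [1, 2, 3]
print(result)  # None
```

A common slip is writing x = x.sort(), which replaces the list with None.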

Selecting List Elements

Lists are zero-indexed (first element has index 0).

x = ['a', 'b', 'c', 'd', 'e']

x[0]      # 'a' (first element)
x[-1]     # 'e' (last element)
x[1:3]    # ['b', 'c'] (index 1 inclusive, index 3 exclusive)
x[2:]     # ['c', 'd', 'e'] (index 2 to the end)
x[:3]     # ['a', 'b', 'c'] (start up to index 3, exclusive)

Concatenating Lists

x = [1, 3, 6]
y = [10, 15, 21]

x + y           # [1, 3, 6, 10, 15, 21]
3 * x           # [1, 3, 6, 1, 3, 6, 1, 3, 6]

Dictionaries

Think of dictionaries as lookup tables. Perfect for storing structured data, survey responses, and configuration settings.

Use dictionaries when you need fast lookups by name/key rather than position.

Creating Dictionaries

# Create a dictionary with {}
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}
scores = {'math': 95, 'science': 87, 'history': 92}

Dictionary Functions and Methods

x = {'a': 1, 'b': 2, 'c': 3}

x.keys()        # dict_keys(['a', 'b', 'c'])
x.values()      # dict_values([1, 2, 3])
x['a']          # 1 (get value by key)
x.get('d', 0)   # 0 (get with default)
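
Square brackets and get() differ in how they treat a missing key, which matters when the data is incomplete. The sketch below reuses the same small dictionary; the helper variable names are only for illustration.

```python
x = {'a': 1, 'b': 2, 'c': 3}

# Square-bracket access raises KeyError when the key is absent.
try:
    x['d']
    missing_raised = False
except KeyError:
    missing_raised = True

# get() returns None (or the default you pass) instead of raising.
no_default = x.get('d')
with_default = x.get('d', 0)

# items() yields (key, value) pairs, which is handy for looping.
pairs = [(key, value) for key, value in x.items()]
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```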

Dictionary Operations

1
# Add or update
2
student['gpa'] = 3.75
3

4
# Remove
5
del student['age']
6

7
# Check if key exists
8
'name' in student  # True

Strings

Work with text data efficiently. String manipulation is essential for cleaning data and extracting insights.

Creating Strings

In data science, you’ll parse filenames, clean text columns, and extract patterns.

# Single line strings
"DataCamp"
'DataCamp'

# Escape quotes
"He said, \"DataCamp\""

# Multi-line strings
"""
A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning
"""

String Operations

text = "DataCamp"  # avoid naming a variable str, which shadows the built-in type

text[0]           # 'D' (first character)
text[0:4]         # 'Data' (substring)
text.upper()      # 'DATACAMP'
text.lower()      # 'datacamp'
text.title()      # 'Datacamp'
text.replace('a', 'e')  # 'DeteCemp' (replace all)
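
String methods can be chained, which is how most text cleaning is done in practice. Here is a rough sketch of tidying column headers; the headers and the clean_name helper are made up for illustration.

```python
raw_columns = ["  First Name ", "LAST name", " Zip Code  "]  # made-up headers

def clean_name(name):
    """Trim whitespace, lowercase, and join the words with underscores."""
    words = name.strip().lower().split()  # split() with no argument splits on whitespace
    return "_".join(words)

cleaned = [clean_name(col) for col in raw_columns]
print(cleaned)  # ['first_name', 'last_name', 'zip_code']
```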

Combining Strings

"Data" + "Framed"        # 'DataFramed'
3 * "data "              # 'data data data '
"beekeepers".split('e')  # ['b', '', 'k', '', 'p', 'rs']

Functions

Functions transform data from one shape to another. They’re the building blocks of data pipelines.

Basic Functions

def calculate_mean(numbers):
    """Calculate the mean of a list of numbers."""
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)

# Usage
temperatures = [72, 68, 75, 82, 77]
avg_temp = calculate_mean(temperatures)
print(f"Average: {avg_temp}°F")

Function Parameters

# Default parameters
def greet(name="Guest"):
    return f"Hello, {name}"

greet()           # 'Hello, Guest'
greet("Alice")    # 'Hello, Alice'

# Multiple return values
def stats(data):
    return min(data), max(data), sum(data)/len(data)

min_val, max_val, mean = stats([1, 5, 3, 9, 2])

Comprehensions

Python’s superpower for data transformations. List and dictionary comprehensions are usually more concise than the equivalent loops, and often faster.

List Comprehensions

# Traditional loop
squared = []
for x in range(1, 6):
    squared.append(x ** 2)

# List comprehension
squared = [x ** 2 for x in range(1, 6)]

# With condition (filtering)
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
# Result: [4, 16, 36, 64, 100]

Dictionary Comprehensions

Transform data structures efficiently:

# Create dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Filter and transform
temperatures = {'Mon': 72, 'Tue': 68, 'Wed': 75, 'Thu': 82}
hot_days = {day: temp for day, temp in temperatures.items() if temp > 75}

Built-in Functions

Python’s built-in functions save you time on everyday looping tasks. Learn these well.

enumerate()

Loop with both index and value together:

grades = [85, 92, 78, 96]

for index, grade in enumerate(grades):
    print(f"Student {index + 1}: {grade}%")

zip()

Combine multiple lists:

students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]

for student, score in zip(students, scores):
    print(f"{student}: {score}")

# Create dictionary
student_dict = dict(zip(students, scores))
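
One thing to watch for: zip() stops at the shortest input without warning, so a missing value silently drops a row. The sketch below, with one score left out on purpose, contrasts that with itertools.zip_longest from the standard library.

```python
from itertools import zip_longest

students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92]  # one score missing on purpose

# zip() silently stops when the shorter list runs out.
pairs = list(zip(students, scores))
print(pairs)  # [('Alice', 85), ('Bob', 92)]

# zip_longest() pads the shorter input with fillvalue instead.
padded = list(zip_longest(students, scores, fillvalue=None))
print(padded)  # [('Alice', 85), ('Bob', 92), ('Charlie', None)]
```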

Error Handling

Real data is messy. Handle errors gracefully or your entire pipeline breaks.

def safe_divide(num1, num2):
    """Safely divide two numbers."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both values must be numbers")
        return None
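
To see why this matters in a pipeline, here is a sketch that runs safe_divide over a few rows, two of which are deliberately bad. The function is repeated so the block runs on its own, and the rows are invented for illustration.

```python
def safe_divide(num1, num2):
    """Same function as above, repeated so this block is self-contained."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both values must be numbers")
        return None

# Hypothetical (total, count) pairs: one zero divisor, one string.
rows = [(10, 2), (5, 0), ("7", 1), (9, 3)]

results = [safe_divide(total, count) for total, count in rows]
valid = [r for r in results if r is not None]

print(results)  # [5.0, None, None, 3.0]
print(valid)    # [5.0, 3.0]
```

The bad rows are reported and skipped, and the remaining rows still get processed.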

Modules

Organize your code into reusable modules. Essential for building larger projects.

Importing Packages

# Import without alias
import pandas

# Import with alias
import pandas as pd

# Import specific object
from pandas import DataFrame

Creating Your Own Module

# data_utils.py
"""Utility functions for data science."""

def mean(data):
    """Calculate the mean of a dataset."""
    return sum(data) / len(data)

def median(data):
    """Calculate the median of a dataset."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]
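
When you write helpers like these, a quick way to check them is to compare against the statistics module in the standard library, which provides its own mean and median. A minimal sketch, with the median function repeated so the block runs on its own:

```python
import statistics

def median(data):
    """Same median as in the module above, repeated to keep this runnable."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]

odd_sample = [72, 68, 75, 82, 77]
even_sample = [1, 2, 3, 4]

print(median(odd_sample), statistics.median(odd_sample))    # 75 75
print(median(even_sample), statistics.median(even_sample))  # 2.5 2.5
```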

Using Modules

# Import your module
import data_utils

# Use functions
temperatures = [72, 68, 75, 82, 77]
avg = data_utils.mean(temperatures)

Standard Library Modules

Python’s standard library is a goldmine for data science:

collections

Advanced data structures for complex operations:

from collections import Counter, defaultdict

# Counter for frequency analysis
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
vote_counts = Counter(votes)
print(vote_counts.most_common(2))
# [('Alice', 2), ('Bob', 2)]

# defaultdict for grouping values under a key (missing keys start as an empty list)
student_scores = defaultdict(list)
student_scores['Alice'].append(95)
student_scores['Bob'].append(87)
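
A typical use of defaultdict(list) is grouping records by key before summarising them. The records below are invented for illustration:

```python
from collections import defaultdict

# Made-up (student, score) records.
records = [('Alice', 95), ('Bob', 87), ('Alice', 88), ('Bob', 91)]

scores_by_student = defaultdict(list)
for name, score in records:
    # A missing key gets an empty list automatically, so append always works.
    scores_by_student[name].append(score)

averages = {name: sum(vals) / len(vals) for name, vals in scores_by_student.items()}

print(dict(scores_by_student))  # {'Alice': [95, 88], 'Bob': [87, 91]}
print(averages)                 # {'Alice': 91.5, 'Bob': 89.0}
```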

csv

Read and write CSV files—essential for data science:

import csv

# Reading CSV files (newline='' is what the csv module recommends when opening files)
with open('data.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['name'], row['score'])
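
Writing works much the same way with csv.DictWriter. The sketch below writes to an in-memory io.StringIO buffer instead of a real file so it can run anywhere; with a real file you would use open(..., 'w', newline='') in its place. The rows are made up.

```python
import csv
import io

rows = [
    {'name': 'Alice', 'score': 95},
    {'name': 'Bob', 'score': 87},
]

# io.StringIO stands in for a file opened with open(..., 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'score'])
writer.writeheader()
writer.writerows(rows)

# Read the text back; every value comes back as a string.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back[0]['score'])  # 95 (a str, not an int)
```

Because CSV stores everything as text, numeric columns need converting (for example with int() or float()) after reading.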

json

Work with JSON data from APIs and files:

import json

# Convert to JSON string
data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}
json_string = json.dumps(data)

# Parse from JSON string
parsed = json.loads(json_string)
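
For files, the json module has dump() and load(), the counterparts of dumps() and loads() without the trailing "s". A small sketch that round-trips through a temporary file (the filename is a placeholder):

```python
import json
import os
import tempfile

data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'student.json')  # placeholder filename

    # dump() writes to a file object; indent makes the file human-readable.
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)

    # load() reads it back into Python objects.
    with open(path) as f:
        loaded = json.load(f)

print(loaded == data)  # True
```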

Lambda Functions

One-line functions for quick operations:

# Lambda for quick calculations
square = lambda x: x ** 2

# Use with built-in functions
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))

# Sorting with custom key
students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78}
]

# Sort by score
sorted_students = sorted(students, key=lambda x: x['score'], reverse=True)
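
For the common case of sorting by a dictionary key, operator.itemgetter from the standard library is an alternative to a lambda. This sketch shows that both give the same order; which one reads better is a matter of taste.

```python
from operator import itemgetter

students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78},
]

# Two equivalent ways to sort by score, highest first.
by_lambda = sorted(students, key=lambda s: s['score'], reverse=True)
by_getter = sorted(students, key=itemgetter('score'), reverse=True)

names = [s['name'] for s in by_getter]
print(names)  # ['Bob', 'Alice', 'Charlie']
```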

Quick Reference Summary

Bookmark this page for instant lookup!

Data Structures Cheat Sheet

Lists & Dictionaries - Your Main Tools

# Creating collections
[1, 2, 3]                     # List of numbers
['a', 'b', 'c']               # List of strings
{'name': 'Alice', 'age': 22}  # Dictionary
x[0]                          # Access first element
x[-1]                         # Access last element
x[1:4]                        # Slice elements 1-3

Quick Snippets:

len(x) - Get length
x.append(item) - Add to list
x.keys() / x.values() - Dictionary methods
'key' in x - Check membership

Transformations - Pythonic Way

List Comprehensions vs Loops

# Old way
result = []
for x in data:
    result.append(x * 2)

# Pythonic way
result = [x * 2 for x in data]

# With condition
evens = [x for x in data if x % 2 == 0]

Dictionary Comprehensions

{x: x**2 for x in range(5)}    # Create mapping
{x: x for x in data if x > 0}  # Filter while mapping

Functions & Control Flow

Essential Patterns

# Define function
def clean_data(x):
    return x.strip()

# Lambda (one-liner)
lambda x: x * 2

# Error handling
try:
    result = x / y
except ZeroDivisionError:
    result = 0

Most Important Rules 🎯

Lists → Ordered data, iterations
Dictionaries → Key-value lookups
Comprehensions → Fast transformations
Functions → Reusable logic
Error handling → Real-world resilience