Crash Course for Data Science
· December 15, 2025 · 5 min read (18 min read total) · 2 parts

Python dominates data science because it’s intuitive, powerful, and backed by mature libraries like pandas, NumPy, and scikit-learn. Whether you’re just starting out or need a quick refresher, this cheat sheet covers the core Python syntax you’ll reach for most often.

Note

I’m mainly writing this post to try out the subpost feature in this Astro template, which I think is great. I didn’t see this feature the last time I used erudite, so I had to copy the structure delba uses (or maybe used) in her portfolio blog. That is still the best way to implement it. For the time being I’ll add a post in erudite, and later I may incorporate that feature as well.

Working with Files

The working directory

The working directory is where Python looks for files by default when you give it a relative path (e.g., C:\file\path on Windows or /file/path on macOS and Linux).

import os
# Get current working directory
wd = os.getcwd() # '/current/path'
# List files in directory
os.listdir(wd)
# Change working directory
os.chdir('new/working/directory')
# Common file operations
os.rename('old.txt', 'new.txt') # Rename
os.remove('file.txt') # Delete
os.mkdir('new_folder') # Create folder
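Hard-coded paths and calls to `os.chdir` can make scripts fragile when they move between machines. As a minimal sketch, the snippet below does the same create, list, and delete steps with `pathlib` inside a throwaway temporary directory. The folder and file names (`raw`, `scores.txt`) are made up for illustration.

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so nothing is left on disk afterwards
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "raw"            # hypothetical folder name
    data_dir.mkdir()                        # like os.mkdir

    file_path = data_dir / "scores.txt"     # hypothetical file name
    file_path.write_text("95\n87\n")        # create a small file

    names = sorted(p.name for p in data_dir.iterdir())  # like os.listdir
    exists_before = file_path.exists()
    file_path.unlink()                      # like os.remove
    exists_after = file_path.exists()
```

Building paths with `/` on `Path` objects avoids worrying about which separator the operating system expects.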

Operators

Operators let you perform mathematical operations, comparisons, and logical tests. Master these fundamentals first.

Arithmetic Operators

# Addition
10 + 2 # 12
# Subtraction
10 - 2 # 8
# Multiplication
4 * 6 # 24
# Division
22 / 7 # 3.142857...
# Integer division
22 // 7 # 3
# Power (exponentiation)
3 ** 4 # 81
# Modulo (remainder)
22 % 7 # 1
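Integer division and modulo behave in a way that surprises people once negative numbers show up. A short sketch of the edge cases worth remembering:

```python
# Floor division rounds toward negative infinity, not toward zero
floor_result = -7 // 2                  # -4

# The remainder takes the sign of the divisor
mod_result = -7 % 2                     # 1

# divmod returns the quotient and remainder in one call
quotient, remainder = divmod(22, 7)     # (3, 1)

# True division always returns a float, even for whole results
true_div = 10 / 2                       # 5.0
```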

Assignment Operators

# Assign a value
a = 5
# Change list item
x = [0, 2, 3]
x[0] = 1 # x is now [1, 2, 3]

Comparison Operators

# Test equality
3 == 3 # True
# Test inequality
3 != 3 # False
# Greater than
3 > 1 # True
# Greater than or equal
3 >= 3 # True
# Less than
3 < 4 # True
# Less than or equal
3 <= 4 # True

Logical Operators

# Logical NOT
not (2 == 2) # False
# Logical AND
(1 != 1) and (1 < 1) # False
# Logical OR
(1 == 1) or (1 < 1) # True
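Two behaviors of these operators come up constantly when filtering data: comparisons can be chained, and `and`/`or` stop evaluating as soon as the result is known. A small sketch, with illustrative variable names:

```python
age = 25
# Chained comparison: same as (18 <= age) and (age < 65)
in_range = 18 <= age < 65

values = []
# `and` stops at the first falsy operand, so values[0] is never evaluated
first_is_positive = bool(values) and values[0] > 0

# `or` returns the first truthy operand, handy for fallback values
label = "" or "unknown"
```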

Lists

Lists are the bread and butter of data science. They store sequences of values: numbers, text, even other lists!

Use lists when you need ordered data that you’ll iterate through or transform.

Creating Lists

# Create lists with [], elements separated by commas
x = [1, 3, 2, 4]
fruits = ['apple', 'banana', 'orange']
mixed = [1, 'hello', 3.14, True]

List Functions and Methods

# Return sorted copy
sorted([3, 1, 2]) # [1, 2, 3]
# Sort in place
x.sort()
# Reverse order
reversed(x) # Returns reversed iterator
# Reverse in place
x.reverse()
# Count elements
x.count(2) # Number of times 2 appears
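A common slip is assigning the result of `x.sort()` to a variable. The sketch below contrasts the in-place methods with the functions that return a new object, and shows the `key` argument for custom orderings:

```python
x = [3, 1, 2]
result = x.sort()                   # sorts x in place and returns None

words = ['banana', 'kiwi', 'apple']
by_length = sorted(words, key=len)  # new list ordered by word length

descending = sorted(x, reverse=True)    # new list, x is unchanged
rev = list(reversed(x))             # reversed() gives an iterator, so wrap it
```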

Selecting List Elements

Lists are zero-indexed (first element has index 0).

x = ['a', 'b', 'c', 'd', 'e']
x[0] # 'a' (first element)
x[-1] # 'e' (last element)
x[1:3] # ['b', 'c'] (1st inclusive, 3rd exclusive)
x[2:] # ['c', 'd', 'e'] (2nd to end)
x[:3] # ['a', 'b', 'c'] (0th to 3rd exclusive)
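Slices also accept a step, and every slice returns a new list rather than a view of the original. A brief sketch:

```python
x = ['a', 'b', 'c', 'd', 'e']

every_other = x[::2]     # start to end, step 2
backwards = x[::-1]      # negative step walks the list in reverse
last_two = x[-2:]        # negative start counts from the end

shallow = x[:]           # a full slice makes a (shallow) copy
shallow[0] = 'z'         # changing the copy leaves x untouched
```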

Concatenating Lists

x = [1, 3, 6]
y = [10, 15, 21]
x + y # [1, 3, 6, 10, 15, 21]
3 * x # [1, 3, 6, 1, 3, 6, 1, 3, 6]
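`+` builds a new list, while `extend` and `append` modify a list in place, and the last two are easy to mix up. A quick sketch of the difference:

```python
x = [1, 3, 6]
y = [10, 15, 21]

combined = x + y         # new list; x and y are unchanged

x.extend(y)              # adds each element of y to x, in place

nested = [1, 3, 6]
nested.append(y)         # adds y itself as a single element
```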

Dictionaries

Think of dictionaries as lookup tables. Perfect for storing structured data, survey responses, and configuration settings.

Use dictionaries when: You need fast lookups by name/key rather than position.

Creating Dictionaries

# Create a dictionary with {}
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}
scores = {'math': 95, 'science': 87, 'history': 92}

Dictionary Functions and Methods

x = {'a': 1, 'b': 2, 'c': 3}
x.keys() # dict_keys(['a', 'b', 'c'])
x.values() # dict_values([1, 2, 3])
x['a'] # 1 (get value by key)
x.get('d', 0) # 0 (get with default)

Dictionary Operations

# Add or update
student['gpa'] = 3.75
# Remove
del student['age']
# Check if key exists
'name' in student # True
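A few more everyday dictionary patterns, sketched with the same kind of `student` record: looping over key-value pairs, removing a key safely, and updating several keys at once.

```python
student = {'name': 'Alice', 'age': 22, 'grade': 'A'}

# Iterate over key-value pairs (insertion order is preserved)
lines = [f"{key}: {value}" for key, value in student.items()]

# pop removes a key and returns its value
removed = student.pop('age')
# A default avoids a KeyError when the key is missing
missing = student.pop('gpa', None)

# update merges another dictionary in place
student.update({'grade': 'A+', 'gpa': 3.75})
```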

Strings

Work with text data efficiently. String manipulation is essential for cleaning data and extracting insights.

Creating Strings

In data science: You’ll parse filenames, clean text columns, extract patterns.

# Single line strings
"DataCamp"
'DataCamp'
# Escape quotes
"He said, \"DataCamp\""
# Multi-line strings
"""
A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning
"""

String Operations

text = "DataCamp" # avoid naming a variable `str`, which shadows the built-in type
text[0] # 'D' (first character)
text[0:4] # 'Data' (substring)
text.upper() # 'DATACAMP'
text.lower() # 'datacamp'
text.title() # 'Datacamp'
text.replace('a', 'e') # 'DeteCemp' (replace all)
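In practice these methods are usually chained to clean up messy input. Here is a minimal sketch. The raw string and the file name `sales_2024.csv` are invented for illustration.

```python
raw = "  Alice , BOB,charlie  "

# Split on commas, trim whitespace, and normalize capitalization
names = [part.strip().title() for part in raw.split(',')]

# join is the inverse of split
joined = "; ".join(names)

filename = "sales_2024.csv"                 # hypothetical file name
stem, extension = filename.rsplit('.', 1)   # split once, from the right
is_csv = filename.endswith('.csv')
```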

Combining Strings

"Data" + "Framed" # 'DataFramed'
3 * "data " # 'data data data '
"beekeepers".split('e') # ['b', '', 'k', '', 'p', 'rs']

Functions

Functions transform data from one shape to another. They’re the building blocks of data pipelines.

Basic Functions

def calculate_mean(numbers):
    """Calculate the mean of a list of numbers."""
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)

# Usage
temperatures = [72, 68, 75, 82, 77]
avg_temp = calculate_mean(temperatures)
print(f"Average: {avg_temp}°F")

Function Parameters

# Default parameters
def greet(name="Guest"):
    return f"Hello, {name}"

greet() # 'Hello, Guest'
greet("Alice") # 'Hello, Alice'

# Multiple return values
def stats(data):
    return min(data), max(data), sum(data)/len(data)

min_val, max_val, mean = stats([1, 5, 3, 9, 2])
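Default parameters pair well with keyword arguments, which let callers override only the setting they care about. The `normalize` function below is a hypothetical helper written for this example, not part of any library:

```python
def normalize(values, lower=0.0, upper=1.0):
    """Rescale values linearly into the range [lower, upper]."""
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:
        # All values are identical, so map everything to the lower bound
        return [lower for _ in values]
    return [lower + (v - smallest) / span * (upper - lower) for v in values]

default_scaled = normalize([10, 20, 30])              # uses both defaults
percent_scaled = normalize([10, 20, 30], upper=100)   # overrides one by name
```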

Comprehensions

Python’s superpower for data transformations. List and dictionary comprehensions are usually more concise than the equivalent loop, and often a little faster.

List Comprehensions

# Traditional loop
squared = []
for x in range(1, 6):
    squared.append(x ** 2)
# List comprehension
squared = [x ** 2 for x in range(1, 6)]
# With condition (filtering)
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
# Result: [4, 16, 36, 64, 100]
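Comprehensions can also choose between two values per element or flatten nested lists. A short sketch, with made-up scores and a 60-point pass mark chosen only for the example:

```python
scores = [48, 75, 91, 60]

# A conditional expression goes before `for` and transforms every element
labels = ['pass' if s >= 60 else 'fail' for s in scores]

# Two `for` clauses read left to right, like nested loops
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [value for row in matrix for value in row]

# Curly braces without a colon build a set, which drops duplicates
unique_lengths = {len(word) for word in ['a', 'bb', 'cc']}
```

Note the difference in position: a trailing `if` filters elements out, while a leading `x if cond else y` keeps every element and changes its value.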

Dictionary Comprehensions

Transform data structures efficiently:

# Create dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Filter and transform
temperatures = {'Mon': 72, 'Tue': 68, 'Wed': 75, 'Thu': 82}
hot_days = {day: temp for day, temp in temperatures.items() if temp > 75}
# {'Thu': 82}

Built-in Functions

Python’s built-in functions are available without any import and save you time. Learn these well.

enumerate()

Loop with both index and value together:

grades = [85, 92, 78, 96]
for index, grade in enumerate(grades):
    print(f"Student {index + 1}: {grade}%")
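`enumerate` takes a `start` argument, which removes the need for `index + 1`. A small sketch using the same grades:

```python
grades = [85, 92, 78, 96]

# start=1 makes the counter begin at 1 instead of 0
lines = [f"Student {n}: {grade}%" for n, grade in enumerate(grades, start=1)]

# enumerate also helps find the position of the best value
best_index = max(enumerate(grades), key=lambda pair: pair[1])[0]
```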

zip()

Combine multiple lists:

students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]
for student, score in zip(students, scores):
    print(f"{student}: {score}")
# Create dictionary
student_dict = dict(zip(students, scores))
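One thing to watch with `zip`: it silently stops at the shortest input. The sketch below shows that behavior, the `itertools.zip_longest` alternative, and how to "unzip" pairs again. The lists are deliberately of unequal length.

```python
from itertools import zip_longest

students = ['Alice', 'Bob', 'Charlie']
scores = [85, 92]                       # one score is missing on purpose

# zip stops at the shortest input, so Charlie is dropped
pairs = list(zip(students, scores))

# zip_longest pads the shorter input instead
padded = list(zip_longest(students, scores, fillvalue=None))

# zip(*pairs) turns a list of pairs back into separate sequences
names, values = zip(*pairs)
```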

Error Handling

Real data is messy. Handle errors gracefully or your entire pipeline breaks.

def safe_divide(num1, num2):
    """Safely divide two numbers."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both values must be numbers")
        return None
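The same pattern is useful when converting messy text columns to numbers. This is a minimal sketch: `to_float` is a hypothetical helper and the `raw` values are invented to cover the typical failure cases.

```python
def to_float(value, default=None):
    """Convert value to float, returning default if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError: value is None; ValueError: text that is not a number
        return default

raw = ['3.5', 'n/a', None, '42', '']
cleaned = [to_float(v) for v in raw]
valid = [v for v in cleaned if v is not None]
```

Catching only the specific exceptions you expect keeps genuine bugs from being hidden.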

Modules

Organize your code into reusable modules. Essential for building larger projects.

Importing Packages

# Import without alias
import pandas
# Import with alias
import pandas as pd
# Import specific object
from pandas import DataFrame

Creating Your Own Module

data_utils.py
"""Utility functions for data science."""

def mean(data):
    """Calculate the mean of a dataset."""
    return sum(data) / len(data)

def median(data):
    """Calculate the median of a dataset."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]

Using Modules

# Import your module
import data_utils
# Use functions
temperatures = [72, 68, 75, 82, 77]
avg = data_utils.mean(temperatures)
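Before writing your own helpers, it is worth checking whether the standard library already covers the case. As a rough sketch, the built-in `statistics` module computes the same mean and median as the hand-written `data_utils` functions:

```python
import statistics

temperatures = [72, 68, 75, 82, 77]

avg = statistics.mean(temperatures)     # arithmetic mean
mid = statistics.median(temperatures)   # middle value of the sorted data
```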

Standard Library Modules

Python’s standard library is a goldmine for data science:

collections

Advanced data structures for complex operations:

from collections import Counter, defaultdict
# Counter for frequency analysis
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
vote_counts = Counter(votes)
print(vote_counts.most_common(2))
# [('Alice', 2), ('Bob', 2)]
# defaultdict creates a missing key's value on first access (here, an empty list)
student_scores = defaultdict(list)
student_scores['Alice'].append(95)
student_scores['Bob'].append(87)
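A typical use of `defaultdict` is grouping records by key before summarizing them. The sketch below uses invented `(name, score)` records:

```python
from collections import Counter, defaultdict

records = [('Alice', 95), ('Bob', 87), ('Alice', 88)]

# Group scores by student without checking whether the key exists first
scores_by_student = defaultdict(list)
for name, score in records:
    scores_by_student[name].append(score)

# Summarize each group
averages = {name: sum(s) / len(s) for name, s in scores_by_student.items()}

# Counter accepts any iterable, including a generator expression
counts = Counter(name for name, _ in records)
```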

csv

Read and write CSV files—essential for data science:

import csv
# Reading CSV files
with open('data.csv', 'r', newline='') as file: # newline='' is recommended by the csv docs
    reader = csv.DictReader(file)
    for row in reader:
        print(row['name'], row['score'])
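Writing works the same way with `csv.DictWriter`. To keep this sketch self-contained it writes to an in-memory `io.StringIO` buffer instead of a real file. The rows are invented for the example.

```python
import csv
import io

rows = [{'name': 'Alice', 'score': 95}, {'name': 'Bob', 'score': 87}]

# Write a header line followed by one line per dictionary
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'score'])
writer.writeheader()
writer.writerows(rows)

# Read it back; csv always returns strings, so convert numbers yourself
buffer.seek(0)
reader = csv.DictReader(buffer)
loaded = [{'name': r['name'], 'score': int(r['score'])} for r in reader]
```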

json

Work with JSON data from APIs and files:

import json
# Convert to JSON string
data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}
json_string = json.dumps(data)
# Parse from JSON string
parsed = json.loads(json_string)
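For files, `json.dump` and `json.load` (without the trailing `s`) work on file objects directly. A minimal sketch using a temporary directory; the file name `student.json` is an assumption for illustration:

```python
import json
import tempfile
from pathlib import Path

data = {'name': 'Alice', 'age': 22, 'scores': [95, 87, 92]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'student.json'       # hypothetical file name

    # indent=2 makes the file human-readable
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2)

    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

# Caveat: JSON object keys are always strings, so int keys do not survive
int_keys = json.loads(json.dumps({1: 'a'}))
```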

Lambda Functions

One-line functions for quick operations:

# Lambda for quick calculations
square = lambda x: x ** 2
# Use with built-in functions
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))
# Sorting with custom key
students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78}
]
# Sort by score
sorted_students = sorted(students, key=lambda x: x['score'], reverse=True)
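The same `key` idea works with `max` and `filter`, and `operator.itemgetter` is a ready-made alternative to a lambda that only looks up a key. A short sketch, with an 80-point threshold picked just for the example:

```python
from operator import itemgetter

students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Charlie', 'score': 78},
]

# itemgetter('score') behaves like lambda s: s['score']
by_score = sorted(students, key=itemgetter('score'), reverse=True)

# max and min accept the same key argument
top = max(students, key=lambda s: s['score'])

# filter keeps the items for which the function returns True
passed = list(filter(lambda s: s['score'] >= 80, students))
```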

Quick Reference Summary

Bookmark this page for instant lookup!

Data Structures Cheat Sheet

Lists & Dictionaries - Your Main Tools

# Creating collections
[1, 2, 3] # List of numbers
['a', 'b', 'c'] # List of strings
{'name': 'Alice', 'age': 22} # Dictionary
x[0] # Access first element
x[-1] # Access last element
x[1:4] # Slice elements 1-3

Quick Snippets:

  • len(x) - Get length
  • x.append(item) - Add to list
  • x.keys() / x.values() - Dictionary methods
  • 'key' in x - Check membership

Transformations - Pythonic Way

List Comprehensions vs Loops

# Old way
result = []
for x in data:
    result.append(x * 2)
# Pythonic way
result = [x * 2 for x in data]
# With condition
evens = [x for x in data if x % 2 == 0]

Dictionary Comprehensions

{x: x**2 for x in range(5)} # Create mapping
{x: x for x in data if x > 0} # Filter while mapping

Functions & Control Flow

Essential Patterns

# Define function
def clean_data(x):
    return x.strip()
# Lambda (one-liner)
lambda x: x * 2
# Error handling
try:
    result = x / y
except ZeroDivisionError:
    result = 0

Most Important Rules 🎯

  • Lists → Ordered data, iterations
  • Dictionaries → Key-value lookups
  • Comprehensions → Fast transformations
  • Functions → Reusable logic
  • Error handling → Real-world resilience