Backpropagation
The algorithm that calculates how much each weight in a network contributed to the error, making gradient descent possible at scale.
The core idea comes from calculus, specifically the chain rule. If the output depends on a middle layer, which depends on an earlier layer, you can calculate the overall effect by multiplying the individual effects together. Backpropagation applies this reasoning systematically, working backward through every layer from the output to the input.
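The chain rule's "multiply the individual effects" can be seen in a tiny numeric sketch. The functions f and g below are hypothetical, chosen only to make the arithmetic checkable; the final comparison against a finite-difference estimate confirms the multiplied local derivatives give the right overall effect.

```python
# Chain rule in miniature: y = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 1.
# (Illustrative example; f and g are made up for the sketch.)

def g(x):
    return 3 * x + 1

def f(u):
    return u ** 2

x = 2.0
u = g(x)                 # forward: inner function first
y = f(u)                 # forward: outer function second

df_du = 2 * u            # local effect of f at u
du_dx = 3                # local effect of g at x
dy_dx = df_du * du_dx    # chain rule: multiply the local effects -> 42.0

# Sanity check against a finite-difference approximation
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
assert abs(dy_dx - numeric) < 1e-4
```

Backpropagation is this same multiplication, applied layer by layer instead of function by function.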
The name is literal: the error signal propagates backward through the network. At each layer, the algorithm figures out how much that layer's weights contributed to the final mistake and computes exactly how they should change. The result is a gradient for every single weight in the model.
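That backward flow can be written out by hand for a network shrunk to one weight per layer. Everything here (the weights, the single training example, the squared-error loss) is a made-up toy, but the structure is the real thing: the error signal enters at the output and each layer peels off its own gradient on the way down.

```python
# Toy two-layer network, one weight per layer (hypothetical values):
#   h = w1 * x,  y_hat = w2 * h,  loss = (y_hat - y) ** 2

x, y = 1.5, 1.0          # one training example
w1, w2 = 0.8, -0.5       # the network's weights

# Forward pass: compute and remember each intermediate value
h = w1 * x
y_hat = w2 * h
loss = (y_hat - y) ** 2

# Backward pass: the error signal flows output -> layer 2 -> layer 1
d_y_hat = 2 * (y_hat - y)    # how the loss changes with the output
d_w2 = d_y_hat * h           # layer 2's gradient: its weight's share of the blame
d_h = d_y_hat * w2           # pass the signal down through layer 2
d_w1 = d_h * x               # layer 1's gradient
```

Note that d_h is computed only so the signal can keep flowing: each layer produces both a gradient for its own weights and a message for the layer below.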
Why It Unlocked Deep Learning
Before backpropagation, training a network with many layers was essentially impractical. You could measure how wrong the output was, but there was no efficient way to figure out which weights, buried several layers deep, were responsible. The only options were slow numerical approximations that did not scale.
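A sketch of that slow alternative makes the scaling problem concrete: estimating a gradient numerically means nudging one weight, re-running the entire network, and repeating for every weight. The toy "network" below is invented for illustration (a bare chain of multiplications), but the cost accounting is the point.

```python
# Finite-difference gradients: one weight nudged at a time, with a full
# re-evaluation each time. Cost grows linearly with the number of weights.

def forward(weights, x):
    """Toy 'network': a chain of scalar multiplications (hypothetical)."""
    out = x
    for w in weights:
        out = w * out
    return out

def loss(weights, x, y):
    return (forward(weights, x) - y) ** 2

weights = [0.9, 1.1, 0.7, 1.3]
x, y = 1.0, 0.5

h = 1e-6
grads, forward_passes = [], 0
for i in range(len(weights)):
    bumped = weights.copy()
    bumped[i] += h                    # nudge one weight...
    grads.append((loss(bumped, x, y) - loss(weights, x, y)) / h)
    forward_passes += 2               # ...at the cost of two full evaluations
```

Four weights already cost eight forward passes. A model with millions of weights would need millions of re-evaluations per training step, which is exactly the wall backpropagation removed.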
Backpropagation changed this by computing all gradients in a single backward pass. It made depth practical. For the first time, you could train networks with many layers without the cost exploding with each layer you added.
The algorithm was known in various forms for years, but it was the 1986 paper by Rumelhart, Hinton, and Williams that demonstrated it working on real problems and convinced the research community it was viable. That moment is often cited as the beginning of the modern neural network era.
Today, when you write loss.backward() in PyTorch or call .fit() in Keras, backpropagation is what runs. The framework builds a graph of all computations during the forward pass, then differentiates through that graph automatically. The math is the same as in 1986. The scale is not.
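What "builds a graph and differentiates through it" means can be sketched in a few dozen lines. This is a deliberately stripped-down toy in the spirit of minimal autodiff demos, not how PyTorch is actually implemented: each operation records its inputs and a small backward rule, and backward() replays those rules in reverse order.

```python
# A minimal reverse-mode autodiff sketch (illustrative only; real frameworks
# are far more general). Each Value remembers how it was computed.

class Value:
    def __init__(self, data, parents=(), backward_fn=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward_fn

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(ab)/da = b
            other.grad += self.data * out.grad   # d(ab)/db = a
        out._backward = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad                # addition passes gradients through
            other.grad += out.grad
        out._backward = backward_fn
        return out

    def backward(self):
        # Order the graph so parents come before children, then apply the
        # chain rule in reverse -- the single backward pass.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# Forward pass builds the graph; backward() fills in every gradient at once.
w, x, b = Value(3.0), Value(2.0), Value(1.0)
loss = w * x + b
loss.backward()
# w.grad == 2.0, x.grad == 3.0, b.grad == 1.0
```

One backward() call produces the gradient for every input, which is the property the article describes: all gradients in a single backward pass.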