
Backpropagation

The algorithm that calculates how much each weight in a network contributed to the error, making gradient descent possible at scale.

Deep Learning · Training · Neural Networks
2 MIN READ · May 11, 2025
// chain rule applied backward through the network
Backpropagation is the algorithm that calculates how much each weight in a neural network contributed to the error, making it possible to update all weights efficiently in a single backward pass.

The core idea comes from calculus, specifically the chain rule. If the output depends on a middle layer, which depends on an earlier layer, you can calculate the overall effect by multiplying the individual effects together. Backpropagation applies this reasoning systematically, working backward through every layer from the output to the input.
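To make that concrete, here is a minimal sketch of the chain rule on a single scalar path. The functions and numbers are placeholders chosen purely for illustration:

```python
# y = f(g(h(x))), so dy/dx = f'(g(h(x))) * g'(h(x)) * h'(x).
# h, g, f below are arbitrary example functions, not anything canonical.

import math

x = 0.5
h = x * 2.0          # earlier layer:  h(x) = 2x
g = math.tanh(h)     # middle layer:   g(h) = tanh(h)
y = g ** 2           # output:         f(g) = g^2

# Local derivatives, evaluated where each function was applied
dy_dg = 2 * g                  # f'(g)
dg_dh = 1 - math.tanh(h) ** 2  # g'(h)
dh_dx = 2.0                    # h'(x)

# Chain rule: multiply the local effects together, working backward
dy_dx = dy_dg * dg_dh * dh_dx
print(dy_dx)
```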

The name is literal: the error signal propagates backward through the network. At each layer, the algorithm figures out how much that layer's weights contributed to the final mistake and computes exactly how they should change. The result is a gradient for every single weight in the model.
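A rough sketch of what that looks like for a tiny one-hidden-layer network. The sizes, the sigmoid activation, and the squared-error loss are illustrative assumptions, not details from any particular paper or framework:

```python
# One forward pass, then one backward pass that yields a gradient
# for every weight in the network. All names and shapes are toy choices.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))        # one input example
t = np.array([1.0])              # target
W1 = rng.normal(size=(4, 3))     # hidden-layer weights
W2 = rng.normal(size=(1, 4))     # output-layer weights

# Forward pass: keep the intermediate values, they are reused going backward
z1 = W1 @ x
h = 1.0 / (1.0 + np.exp(-z1))    # sigmoid hidden activations
y = W2 @ h                       # network output
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: the error signal flows output -> hidden,
# producing a gradient for every weight in one sweep
dy = y - t                       # dL/dy
dW2 = np.outer(dy, h)            # dL/dW2
dh = W2.T @ dy                   # error signal passed back to the hidden layer
dz1 = dh * h * (1.0 - h)         # through the sigmoid's local derivative
dW1 = np.outer(dz1, x)           # dL/dW1

print(loss, dW1.shape, dW2.shape)
```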

Why It Unlocked Deep Learning

Before backpropagation, training a network with many layers was essentially impractical. You could measure how wrong the output was, but there was no efficient way to figure out which weights, buried several layers deep, were responsible. The only options were slow numerical approximations that did not scale.
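For a sense of why the numerical route does not scale, here is a sketch of a finite-difference gradient estimate; loss_fn and weights are placeholders. Every single weight needs its own pair of perturbed forward passes:

```python
import numpy as np

def numerical_gradient(loss_fn, weights, eps=1e-5):
    """Estimate dLoss/dWeights by perturbing one weight at a time."""
    grad = np.zeros_like(weights)
    for i in range(weights.size):            # one loop iteration per weight
        w_plus = weights.copy()
        w_plus.flat[i] += eps
        w_minus = weights.copy()
        w_minus.flat[i] -= eps
        # two full forward passes just to estimate this single weight's gradient
        grad.flat[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad

# Toy usage: quadratic loss over 5 weights, true gradient is 2w
w = np.arange(5, dtype=float)
print(numerical_gradient(lambda v: np.sum(v ** 2), w))  # ~[0, 2, 4, 6, 8]
```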

Backpropagation changed this by computing all gradients in a single backward pass. It made depth practical. For the first time, you could train networks with many layers without the cost exploding with each layer you added.

The algorithm was known in various forms for years, but it was the 1986 paper by Rumelhart, Hinton, and Williams that demonstrated it working on real problems and convinced the research community it was viable. That moment is often cited as the beginning of the modern neural network era.

Today, when you write loss.backward() in PyTorch or call .fit() in Keras, backpropagation is what runs. The framework builds a graph of all computations during the forward pass, then differentiates through that graph automatically. The math is the same as in 1986. The scale is not.
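A minimal PyTorch sketch of that flow; the model, data, and loss here are placeholders for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))
x = torch.randn(8, 3)            # a small batch of inputs
t = torch.randn(8, 1)            # targets

y = model(x)                     # forward pass: the computation graph is recorded here
loss = nn.functional.mse_loss(y, t)
loss.backward()                  # backpropagation: gradients for every weight

print(model[0].weight.grad.shape)   # torch.Size([4, 3])
```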

/ RELATED CONCEPTS
Activation Functions: The non-linear functions applied after each layer that give neural networks the ability to learn complex patterns.
Gradient Descent: The algorithm that teaches a neural network to get better over time by nudging its weights in the right direction.
Vanishing Gradient Problem: Why deep networks and recurrent models struggle to learn when the error signal fades out before reaching the early layers.
Transformers: The architecture behind almost every modern AI model, from ChatGPT to translation to image generation.