Gradient Descent

by saju

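Why we need it: to adjust the best-fit line so that its loss is as small as possible.
What it does: it iteratively updates the weight and bias values, each time moving them in the direction that reduces the loss (opposite to the gradient), until the loss reaches a minimum.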
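A minimal sketch of that iteration, assuming a hypothetical toy loss L(w, b) = (w - 2)^2 + (b + 1)^2. It was chosen only because its minimum is known to be at w = 2, b = -1, and it is not the loss of the reference problem. The learning rate (0.1) and the number of steps (100) are also assumed values. Each step moves w and b a small amount against the gradient.

```python
def toy_loss(w: float, b: float) -> float:
    """Hypothetical bowl-shaped loss with its minimum at w = 2, b = -1."""
    return (w - 2) ** 2 + (b + 1) ** 2


def toy_gradient(w: float, b: float) -> tuple[float, float]:
    """Partial derivatives of toy_loss with respect to w and b."""
    return 2 * (w - 2), 2 * (b + 1)


def gradient_descent(
    w: float, b: float, learning_rate: float = 0.1, steps: int = 100
) -> tuple[float, float]:
    """Repeatedly step w and b opposite to the gradient to reduce the loss."""
    for _ in range(steps):
        dw, db = toy_gradient(w, b)
        w -= learning_rate * dw  # move the weight downhill
        b -= learning_rate * db  # move the bias downhill
    return w, b


w, b = gradient_descent(w=0.0, b=0.0)
print(round(w, 4), round(b, 4), toy_loss(w, b))
```

With this toy loss and a learning rate of 0.1, every step multiplies the distance to the minimum (w - 2 and b + 1) by 1 - 2 * 0.1 = 0.8, so after 100 steps w and b are extremely close to 2 and -1. A learning rate that is too large can overshoot the minimum instead of settling into it.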

[Figure: Gradient descent illustration]

As a reference example, we use the following problem:

[Screenshot: reference example problem]

Goal: To find a straight line that best represents the relationship between input features and output labels in our data.
Model Equation:

f_w,b(x) = (w * x) + b
- x: Input feature (e.g., Pounds in 1000s)
- y: Actual label (e.g., Miles per gallon)
- f_w,b(x): Our model's prediction (the point on the line for a given x)
- w (Weight): Controls the slope or steepness of the line.
- b (Bias): Controls the y-intercept (where the line crosses the Y-axis).
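The model equation translates directly into code. This is a minimal sketch: the function name `predict` and the values of w, b, and x below are hypothetical, picked only to show the arithmetic, and are not fitted to the reference data.

```python
def predict(x: float, w: float, b: float) -> float:
    """Model prediction f_w,b(x) = w * x + b for a single input x."""
    return w * x + b


# Hypothetical parameters, not fitted to any real data.
w, b = -4.0, 40.0
for x in [2.0, 3.0, 4.0]:  # hypothetical inputs (e.g., pounds in 1000s)
    print(x, predict(x, w, b))
```

Changing w tilts the line, while changing b shifts the whole line up or down without changing its slope; at x = 0 the prediction is exactly b.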

Why we need a loss function: We need a way to quantify how "wrong" our current line is. The lower the loss, the better the fit.
Formula:

Loss (MSE) = 1/M * Σ(i=1 to M) (f_w,b(x^(i)) - y^(i))^2
- M: Number of training examples
- (f_w,b(x^(i)) - y^(i)): The error for a single training example (prediction minus actual). It is squared so that every term is non-negative and larger errors are penalized more heavily.
- Σ: Sums the squared errors over all M examples.
- 1/M: Averages the squared errors.
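The MSE formula can be computed term by term: the error for each example, its square, the sum over all M examples, and the division by M. A minimal sketch in plain Python; the name `mse_loss` and the toy data points are hypothetical and used only for illustration.

```python
def mse_loss(xs: list[float], ys: list[float], w: float, b: float) -> float:
    """Mean squared error of the line f_w,b(x) = w * x + b over M examples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y  # prediction minus actual
        total += error ** 2  # squared: non-negative, big errors weigh more
    return total / m  # the 1/M average


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
print(mse_loss(xs, ys, w=2.0, b=0.0))   # one candidate line
print(mse_loss(xs, ys, w=2.5, b=-0.5))  # another candidate line
```

The second candidate line fits these toy points more closely, so its loss is lower (1/12 versus 1/3). Gradient descent automates this comparison by repeatedly nudging w and b toward values with lower loss.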