Gradient Descent Learning Rate and Iterations
In this short exercise, I explored how gradient descent reduces a cost function by repeatedly updating model parameters.
Google Colab notebook: Open in Google Colab
The tutorial uses a simple linear regression example where the input values are:
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
This data follows the relationship:
y = 2x + 3
The goal of gradient descent is to learn values for the slope m and intercept b that minimise the cost. In this case, the cost function is mean squared error, which measures how far the model’s predictions are from the true values.
Experiment Setup
I changed two values in the tutorial:
iterations: how many times gradient descent updates the parameterslearning_rate: how large each update step is
The original tutorial uses:
iterations = 100
learning_rate = 0.08
I tested several combinations and recorded the final cost.
Results
| Experiment | Iterations | Learning Rate | Final Cost | Final m | Final b |
|---|---|---|---|---|---|
| Fewer iterations | 10 | 0.08 | 12.0468 | 1.6309 | 1.0383 |
| Default tutorial | 100 | 0.08 | 0.0041 | 2.0405 | 2.8536 |
| More iterations | 1000 | 0.08 | ~0.0000 | 2.0000 | 3.0000 |
| Small learning rate | 100 | 0.01 | 0.4804 | 2.4485 | 1.3809 |
| Moderate learning rate | 100 | 0.05 | 0.0321 | 2.1144 | 2.5870 |
| Too high learning rate | 100 | 0.10 | Diverged | Very large negative value | Very large negative value |
Observations
Increasing the number of iterations helped the model get closer to the correct line. With only 10 iterations, the cost dropped from the initial value of 89 to about 12.05, but the model was still far from the ideal values of m = 2 and b = 3.
With 100 iterations and a learning rate of 0.08, the cost decreased significantly to around 0.0041. The learned parameters were close to the expected values, with m ≈ 2.04 and b ≈ 2.85.
When I increased the number of iterations to 1000 while keeping the learning rate at 0.08, gradient descent reached the expected result almost exactly. The cost became effectively zero, and the model learned m = 2 and b = 3.
The learning rate had a major effect on the result. A smaller learning rate such as 0.01 still reduced the cost, but convergence was slower. A moderate learning rate such as 0.05 performed better, but 0.08 gave the best result among the stable values I tested.
However, increasing the learning rate too much caused the algorithm to diverge. With learning_rate = 0.10, the cost became extremely large instead of decreasing. This shows that gradient descent can overshoot the minimum when the step size is too large.
Conclusion
This exercise shows that gradient descent depends heavily on both the number of iterations and the learning rate. More iterations give the algorithm more chances to reduce the cost, but only if the learning rate is appropriate.
A learning rate that is too small makes training slow, while a learning rate that is too large can make the model unstable and cause the cost to increase. In this example, learning_rate = 0.08 with enough iterations gave the best balance between speed and stability.
