    PyTorch for Neural Networks Part 6: Understanding Epochs and Loss

    In the previous article, we prepared everything we need to optimize our neural network and find the ideal value for the final bias.

    Now we’ll begin the optimization process, one step at a time, keeping each step simple and the progress visible.


    Creating the Optimizer

    First, we create an optimizer object. We’ll use Stochastic Gradient Descent (SGD) to optimize final_bias:

    from torch.optim import SGD
    optimizer = SGD(model.parameters(), lr=0.1)
    

    Passing model.parameters() to SGD hands the optimizer every parameter in the model. Parameters with requires_grad=False never receive a gradient, so the optimizer leaves them unchanged. In our case, only final_bias has requires_grad=True, so it is the only parameter that will be updated during training.
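
    To make this concrete, here is a minimal sketch. TinyModel, its weight, and the single data point are hypothetical stand-ins, not the model from this series, and the squared-error loss is an assumption. It only illustrates that a parameter with requires_grad=False stays fixed while final_bias moves.

```python
import torch
import torch.nn as nn
from torch.optim import SGD


class TinyModel(nn.Module):
    """Hypothetical stand-in: one frozen weight and one trainable bias."""

    def __init__(self):
        super().__init__()
        # Frozen parameter: autograd never computes a gradient for it.
        self.weight = nn.Parameter(torch.tensor(2.0), requires_grad=False)
        # Trainable parameter, playing the role of final_bias.
        self.final_bias = nn.Parameter(torch.tensor(0.0), requires_grad=True)

    def forward(self, x):
        return self.weight * x + self.final_bias


model = TinyModel()
optimizer = SGD(model.parameters(), lr=0.1)

# One illustrative update on a single made-up data point.
x, y = torch.tensor(1.0), torch.tensor(3.0)
loss = (model(x) - y) ** 2  # assumed squared-error loss
loss.backward()
optimizer.step()

# List the parameters that are eligible for updates.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
print(model.weight.item(), model.final_bias.item())
```

    After the step, weight still holds its initial value, while final_bias has moved away from 0.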

    Here, lr is the learning rate, set to 0.1. It controls how large each update step is during optimization.
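
    The effect of lr can be seen with plain arithmetic. The sketch below assumes the basic gradient descent update (new value = old value - lr * gradient), which is what SGD does when momentum and weight decay are left at their defaults. The starting value and the gradient are made up for illustration.

```python
def sgd_step(value, gradient, lr):
    """Basic gradient descent update: move against the gradient, scaled by lr."""
    return value - lr * gradient


bias = 0.0       # hypothetical starting value
gradient = -2.0  # hypothetical gradient of the loss with respect to the bias

small_step = sgd_step(bias, gradient, lr=0.01)
article_step = sgd_step(bias, gradient, lr=0.1)  # the lr used in this article

print(small_step, article_step)
```

    With the same gradient, lr=0.1 moves the value ten times further than lr=0.01.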


    Understanding Epochs

    Before we continue, let’s clarify one key term: an epoch is one complete pass through the entire training dataset.

    In this example, our training data contains 3 data points. Every time all 3 points are passed through the model once, we call it one epoch.


    Running the Optimization Loop

    We can start the optimization with a loop that counts epochs:

    for epoch in range(100):
        ...
    

    This loop will run the training process 100 times. In other words, the model will see the full training dataset 100 times.
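
    As a quick check of the bookkeeping, the sketch below counts how many data points are visited in total. The three input values are hypothetical placeholders. Only the dataset size (3) and the number of epochs (100) come from the article.

```python
inputs = [0.0, 0.5, 1.0]  # hypothetical values for the 3 training points
num_epochs = 100

points_seen = 0
for epoch in range(num_epochs):
    # One epoch: every training point is visited exactly once.
    for x in inputs:
        points_seen += 1

print(points_seen)  # 300 = 100 epochs * 3 points
```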


    Tracking the Loss

    Next, we initialize a variable called total_loss. This stores the loss, a number that measures how far the model’s predictions are from the training data: the smaller the loss, the better the fit.

    Here’s a simple way to see what loss reflects. In the figure below, the unoptimized model fits the training data poorly. The residuals (the differences between the model’s predictions and the true values) are large, so the loss is also relatively large.

    Now imagine the model improves and fits the training data more closely. The residuals become smaller. In this case, the loss becomes smaller because the predictions are closer to the correct values.
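
    The link between residuals and loss can be sketched in a few lines. This example assumes the loss is the sum of squared residuals, which is one common choice. The article has not yet said which loss function it uses, and all numbers below are made up.

```python
def sum_squared_residuals(predictions, targets):
    """Assumed loss: add up the squared difference for every data point."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


targets = [0.0, 1.0, 0.0]   # hypothetical true values
poor_fit = [1.0, 2.0, 1.0]  # hypothetical predictions far from the targets
good_fit = [0.1, 0.9, 0.1]  # hypothetical predictions close to the targets

poor_loss = sum_squared_residuals(poor_fit, targets)
good_loss = sum_squared_residuals(good_fit, targets)

print(poor_loss, good_loss)
```

    The poor fit has a residual of 1 at every point, giving a loss of 3. The closer fit has much smaller residuals and therefore a much smaller loss.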

    So during each epoch, we use total_loss to track how well the model fits the training data. Watching it decrease helps you see learning in action.
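
    One way this tracking might look is sketched below. It is not the finished loop from this series. The data values, the predict function, the squared-error loss, and the choice to update once per epoch are all assumptions made for illustration, and the loop runs only 10 epochs to keep the output short.

```python
import torch
from torch.optim import SGD

# Hypothetical training data: three points, matching the article's dataset size.
inputs = torch.tensor([0.0, 0.5, 1.0])
labels = torch.tensor([0.0, 1.0, 0.0])

# Hypothetical stand-in for the model: a fixed part plus one trainable bias.
final_bias = torch.tensor(0.0, requires_grad=True)
optimizer = SGD([final_bias], lr=0.1)


def predict(x):
    return 2.0 * x + final_bias


loss_history = []
for epoch in range(10):
    total_loss = 0.0  # reset at the start of every epoch
    for x, y in zip(inputs, labels):
        loss = (predict(x) - y) ** 2  # assumed squared residual for one point
        loss.backward()               # accumulate the gradient for final_bias
        total_loss += float(loss)     # add this point's loss as a plain number
    optimizer.step()       # one update per epoch, using the accumulated gradient
    optimizer.zero_grad()  # clear gradients before the next epoch
    loss_history.append(total_loss)
    print(f"epoch {epoch}: total_loss = {total_loss:.4f}")
```

    In this toy setup, total_loss starts at 4 and shrinks with every epoch, which is the downward trend to look for when training.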

    We will continue building the optimization process in the next article.
