For each experiment we will run two epochs and observe the training loss. We set the same initial seed for every run to (hopefully) ensure reproducibility, and we swap in different validation functions to explore how each change affects reproducibility.
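For reference, seeding was done along these lines (a minimal sketch; the helper name `seed_everything` and the exact set of libraries seeded are assumptions, not the original code):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed every RNG the training run might consume."""
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy RNG (e.g. for augmentations)
    torch.manual_seed(seed)            # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)   # PyTorch CUDA RNGs, if a GPU is used


seed_everything(0)
```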
======== Training With Validation ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
Val [Epoch 1, 2500] loss: 1.453
[Epoch 2, Iter 2000] loss: 1.465
[Epoch 2, Iter 4000] loss: 1.457
[Epoch 2, Iter 6000] loss: 1.411
[Epoch 2, Iter 8000] loss: 1.365
[Epoch 2, Iter 10000] loss: 1.388
[Epoch 2, Iter 12000] loss: 1.359
Val [Epoch 2, 2500] loss: 1.277
Finished Training
======== Training Without Validation ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
[Epoch 2, Iter 2000] loss: 1.463
[Epoch 2, Iter 4000] loss: 1.458
[Epoch 2, Iter 6000] loss: 1.410
[Epoch 2, Iter 8000] loss: 1.361
[Epoch 2, Iter 10000] loss: 1.386
[Epoch 2, Iter 12000] loss: 1.359
Finished Training
We can see from the output that the training losses match across the first epoch but diverge in the second. In other words, with the same experiment settings and seemingly the same hyperparameters, we are getting different results.
Investigation
As discussed before, PyTorch’s RNG is consumed whenever a random function is called. My first instinct was that the model itself was causing the differing results, since validation runs the model’s forward pass. Specifically, the dropout layer seemed like the likely culprit.
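The exact network is not shown here, but the setup is roughly a small image-classification CNN with a dropout layer that can be toggled off for the next experiment. A representative sketch (the architecture details are assumptions, not the original model):

```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Small CNN with an optional dropout layer (illustrative only)."""

    def __init__(self, use_dropout: bool = True):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        # Dropout consumes the RNG on every training-mode forward pass.
        self.dropout = nn.Dropout(p=0.5) if use_dropout else nn.Identity()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.flatten(1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```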
======== Training With Validation without Dropout ========
[Epoch 1, Iter 2000] loss: 2.111
[Epoch 1, Iter 4000] loss: 1.805
[Epoch 1, Iter 6000] loss: 1.639
[Epoch 1, Iter 8000] loss: 1.546
[Epoch 1, Iter 10000] loss: 1.529
[Epoch 1, Iter 12000] loss: 1.478
Val [Epoch 1, 2500] loss: 1.418
[Epoch 2, Iter 2000] loss: 1.413
[Epoch 2, Iter 4000] loss: 1.406
[Epoch 2, Iter 6000] loss: 1.346
[Epoch 2, Iter 8000] loss: 1.312
[Epoch 2, Iter 10000] loss: 1.325
[Epoch 2, Iter 12000] loss: 1.290
Val [Epoch 2, 2500] loss: 1.298
Finished Training
======== Training Without Validation without Dropout ========
[Epoch 1, Iter 2000] loss: 2.111
[Epoch 1, Iter 4000] loss: 1.805
[Epoch 1, Iter 6000] loss: 1.639
[Epoch 1, Iter 8000] loss: 1.546
[Epoch 1, Iter 10000] loss: 1.529
[Epoch 1, Iter 12000] loss: 1.478
[Epoch 2, Iter 2000] loss: 1.413
[Epoch 2, Iter 4000] loss: 1.406
[Epoch 2, Iter 6000] loss: 1.346
[Epoch 2, Iter 8000] loss: 1.312
[Epoch 2, Iter 10000] loss: 1.325
[Epoch 2, Iter 12000] loss: 1.290
Finished Training
Amazingly, the problem appears to be fixed as the results are the same! However, does this mean we can never use randomness in our models?
Of course, we should still be able to use dropout layers in our models and get reproducible results. The issue is not with the model; rather, it is with the PyTorch DataLoader itself. During validation, calling net.eval() disables the dropout layer, so the forward pass during validation should not be the issue. The runs match here only because the model no longer consumes any randomness at all! To illustrate that the problem is with the DataLoader, let’s remove the forward pass from the validation function altogether, as sketched below.
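A minimal sketch of what such a validation function looks like (the names `validate_no_forward`, `net`, and `val_loader` are assumptions; the point is that only the iteration over the DataLoader remains):

```python
import torch


def validate_no_forward(net, val_loader, epoch):
    """Validation loop with the forward pass removed (illustrative sketch).

    Merely starting the for loop builds a DataLoader iterator, and that
    alone turns out to change the RNG state for the next training epoch."""
    net.eval()
    with torch.no_grad():
        for images, labels in val_loader:
            pass  # no forward pass, no loss computation
    # Nothing was computed, so just emit a placeholder line for the log.
    print(f"Val [Epoch {epoch}, 1] loss: 0.000")
    net.train()
```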
======== Training With Validation Skip Forward Pass ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
Val [Epoch 1, 1] loss: 0.000
[Epoch 2, Iter 2000] loss: 1.465
[Epoch 2, Iter 4000] loss: 1.457
[Epoch 2, Iter 6000] loss: 1.411
[Epoch 2, Iter 8000] loss: 1.365
[Epoch 2, Iter 10000] loss: 1.388
[Epoch 2, Iter 12000] loss: 1.359
Val [Epoch 2, 1] loss: 0.000
Finished Training
======== Training Without Validation ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
[Epoch 2, Iter 2000] loss: 1.463
[Epoch 2, Iter 4000] loss: 1.458
[Epoch 2, Iter 6000] loss: 1.410
[Epoch 2, Iter 8000] loss: 1.361
[Epoch 2, Iter 10000] loss: 1.386
[Epoch 2, Iter 12000] loss: 1.359
Finished Training
As we can see, the results still differ after the first epoch. The only difference between the two runs is the for loop over the validation DataLoader object, so this is where the reproducibility issue must originate. Let’s dig further into the DataLoader. If we look at the DataLoader’s __iter__() method, which is called when the for loop’s iterator is created, we see that it constructs either a _SingleProcessDataLoaderIter or a _MultiProcessingDataLoaderIter object. Both of these classes derive from the base class _BaseDataLoaderIter, and in its __init__() method we find the following line of code.
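In recent PyTorch releases that line reads roughly as follows (quoted from torch/utils/data/dataloader.py; the exact form varies slightly between versions):

```python
# From _BaseDataLoaderIter.__init__ in torch/utils/data/dataloader.py.
# When loader.generator is None, random_() draws from the default
# generator, consuming the global RNG state.
self._base_seed = torch.empty((), dtype=torch.int64).random_(generator=loader.generator).item()
```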
At last we have found the source of our problem! This call consumes PyTorch’s RNG, leaving it in a different state when training resumes in the next epoch. Because our model’s forward pass uses dropout, which draws fresh random numbers on every call, the different RNG state means different elements of the input tensor get zeroed by dropout, producing different output tensors, different losses, and, overall, different results!
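A quick way to see the effect in isolation (a standalone toy demonstration, not part of the original experiments):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a = F.dropout(torch.ones(5), p=0.5)

torch.manual_seed(0)
_ = torch.empty((), dtype=torch.int64).random_()  # emulate the DataLoader's seed draw
b = F.dropout(torch.ones(5), p=0.5)

print(torch.equal(a, b))  # typically False: a different mask is sampled
```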
Fix
With the issue identified, the fix is simply to ensure that the RNG state after validation finishes is the same as it was before the validation for loop started. We can do this with the torch.get_rng_state() and torch.set_rng_state() functions, as sketched below.
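A minimal sketch of the fix (function and variable names such as `validate`, `val_loader`, and `criterion` are assumptions; the essential part is saving and restoring the RNG state around the validation loop):

```python
import torch


def validate(net, val_loader, criterion, epoch):
    """Run validation without disturbing the global RNG state."""
    rng_state = torch.get_rng_state()  # save the CPU RNG state
    # If training on a GPU, the CUDA RNG states can be saved and restored
    # the same way, e.g. with torch.cuda.get_rng_state_all().

    net.eval()
    running_loss, n_batches = 0.0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = net(images)
            running_loss += criterion(outputs, labels).item()
            n_batches += 1
    print(f"Val [Epoch {epoch}, {n_batches}] loss: {running_loss / n_batches:.3f}")
    net.train()

    torch.set_rng_state(rng_state)  # restore the RNG state for the next training epoch
```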
======== Training With Validation Ensure RNG State ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
Val [Epoch 1, 2500] loss: 1.453
[Epoch 2, Iter 2000] loss: 1.463
[Epoch 2, Iter 4000] loss: 1.458
[Epoch 2, Iter 6000] loss: 1.410
[Epoch 2, Iter 8000] loss: 1.361
[Epoch 2, Iter 10000] loss: 1.386
[Epoch 2, Iter 12000] loss: 1.359
Val [Epoch 2, 2500] loss: 1.273
Finished Training
======== Training Without Validation ========
[Epoch 1, Iter 2000] loss: 2.131
[Epoch 1, Iter 4000] loss: 1.837
[Epoch 1, Iter 6000] loss: 1.676
[Epoch 1, Iter 8000] loss: 1.588
[Epoch 1, Iter 10000] loss: 1.567
[Epoch 1, Iter 12000] loss: 1.532
[Epoch 2, Iter 2000] loss: 1.463
[Epoch 2, Iter 4000] loss: 1.458
[Epoch 2, Iter 6000] loss: 1.410
[Epoch 2, Iter 8000] loss: 1.361
[Epoch 2, Iter 10000] loss: 1.386
[Epoch 2, Iter 12000] loss: 1.359
Finished Training
With our new understanding of PyTorch RNGs and DataLoaders, we can now more confidently run reproducible experiments and derive robust conclusions!