Regressing sinusoids for a given number of iterations
In a previous series of posts I wrote about how using a small batch size led to better results when regressing a sinusoid with stochastic gradient descent: https://fulkast.medium.com/sinusoid-regression-the-stats-receipts-87785297eb7c
However, I wasn't fully convinced that the smaller batch size was the sole cause of the better performance, because in the previous formulation a smaller batch size also meant more optimisation iterations per epoch.
In this post I am going to show that even when the number of optimisation iterations is held fixed across all batch sizes, smaller batch sizes still perform better on the task of sinusoid regression.
In my new formulation, every training run is shown the same number of batches, regardless of the batch size used. This differs from my previous formulation in that training runs with bigger batch sizes will now effectively "see" more samples of the data (really large batch sizes might see the entire dataset more than once). In the previous formulation, all training runs were set up to see the full dataset exactly once, i.e. one epoch.
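The protocol above can be sketched in a few lines of NumPy. This is my own minimal illustration, not the post's actual experiment: the random-Fourier feature map, learning rate, and iteration count are all assumptions I made so that a linear model can fit the sinusoid. The key point is that `num_iterations` is fixed and batches are sampled with replacement, so larger batch sizes consume more total samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: regress y = sin(x) on [0, 2*pi).
N = 256
x = rng.uniform(0.0, 2 * np.pi, size=N)
y = np.sin(x)

# Fixed random-Fourier feature map so a *linear* model can fit a sinusoid.
# (Illustrative choice of mine, not from the original post.)
D = 32
W = rng.normal(size=D)
b = rng.uniform(0.0, 2 * np.pi, size=D)

def features(x):
    # Shape: (len(x), D)
    return np.cos(np.outer(x, W) + b)

def train(batch_size, num_iterations=2000, lr=0.01):
    """SGD for a FIXED number of iterations, independent of batch size.

    Batches are sampled with replacement, so runs with large batch sizes
    may see the whole dataset more than once -- the "new formulation".
    """
    theta = np.zeros(D)
    for _ in range(num_iterations):
        idx = rng.integers(0, N, size=batch_size)
        X = features(x[idx])
        err = X @ theta - y[idx]
        grad = 2.0 * X.T @ err / batch_size  # gradient of the batch MSE
        theta -= lr * grad
    # Report the full-dataset MSE after training.
    return float(np.mean((features(x) @ theta - y) ** 2))

for bs in (4, 32, 256):
    print(f"batch size {bs:3d}: final MSE = {train(bs):.4f}")
```

Every call to `train` runs exactly `num_iterations` gradient steps, so the only thing that varies between runs is the batch size itself.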
I'm pretty convinced now that, controlling for all other parameters, smaller batch sizes (though not a batch size of 1) yield better training results when regressing sinusoids. Next up, I'm going to see whether this observation holds for other tasks, such as regressing affine functions and, eventually, maybe even images!