Regressing sinusoids for a given number of iterations
In a previous series of posts I wrote about how using a small batch size led to better results when regressing a sinusoid with stochastic gradient descent: https://fulkast.medium.com/sinusoid-regression-the-stats-receipts-87785297eb7c
However, I wasn't fully convinced that the smaller batch size was the sole cause of the better performance, because in the previous formulation a smaller batch size also meant more optimisation iterations per epoch.
In this post I am going to show that even when the number of optimisation iterations is held fixed across all batch sizes, smaller batch sizes still perform better on the task of sinusoid regression.
In my new formulation, every training run is shown the same number of batches, regardless of the batch size used. This differs from my previous formulation in that training runs with bigger batch sizes will now effectively "see" more samples of the data (really large batch sizes might see the entire dataset more than once). In the previous formulation, all training runs were set up to see the full dataset exactly once, i.e. one epoch.
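The protocol above can be sketched in a few lines of NumPy. This is my own minimal illustration, not the post's actual experiment: the random-Fourier feature map, learning rate, and iteration count are all assumptions I made so that a linear model can fit the sinusoid. The key point is that `num_iterations` is fixed and batches are sampled with replacement, so larger batch sizes consume more total samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: regress y = sin(x) on [0, 2*pi).
N = 256
x = rng.uniform(0.0, 2 * np.pi, size=N)
y = np.sin(x)

# Fixed random-Fourier feature map so a *linear* model can fit a sinusoid.
# (Illustrative choice of mine, not from the original post.)
D = 32
W = rng.normal(size=D)
b = rng.uniform(0.0, 2 * np.pi, size=D)

def features(x):
    # Shape: (len(x), D)
    return np.cos(np.outer(x, W) + b)

def train(batch_size, num_iterations=2000, lr=0.01):
    """SGD for a FIXED number of iterations, independent of batch size.

    Batches are sampled with replacement, so runs with large batch sizes
    may see the whole dataset more than once -- the "new formulation".
    """
    theta = np.zeros(D)
    for _ in range(num_iterations):
        idx = rng.integers(0, N, size=batch_size)
        X = features(x[idx])
        err = X @ theta - y[idx]
        grad = 2.0 * X.T @ err / batch_size  # gradient of the batch MSE
        theta -= lr * grad
    # Report the full-dataset MSE after training.
    return float(np.mean((features(x) @ theta - y) ** 2))

for bs in (4, 32, 256):
    print(f"batch size {bs:3d}: final MSE = {train(bs):.4f}")
```

Every call to `train` runs exactly `num_iterations` gradient steps, so the only thing that varies between runs is the batch size itself.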
I'm pretty convinced now that, controlling for all other parameters, smaller batch sizes (though not a batch size of 1) yield better training results when regressing sinusoids. Next up, I'm going to see whether this observation holds for other tasks, such as regressing affine functions and, eventually, maybe even images!