SGD without replacement
Posted on:
This article is in continuation of my previous blog, and discusses about the work by Prateek Jain, Dheeraj Nagaraj and Praneeth Netrapalli 2019 which attempts to answer Léon Bottou’s (2009) open question of understanding SGD without replacement. The authors provide tight rates for SGD without replacement for general smooth, and general smooth and strongly convex functions using the method of exchangeable pairs to bound Wasserstein distances, and techniques from optimal transport. They show that SGD without replacement on general smooth and strongly convex functions can achieve a rate of \(\mathcal{O}\left( \frac{1}{K^2} \right)\) where \(K\) is the number of passes over the data. This result requires \(K\in\mathcal{\Omega}(\kappa^2)\) where $\kappa$ is the condition number of the problem. They show that SGD without replacement matches the rate of SGD with replacement when $K$ is smaller than \(\mathcal{O}(\kappa^2)\). This is strictly better than SGD with replacement which has a rate of \(\mathcal{O}\left( \frac{1}{K} \right)\) in the similar setting. Their analysis does not require the Hessian Lipschitz condition as required by other previous works and holds for any general smooth and strongly convex function. They also show that SGD without replacement is at least as good as SGD with replacement for general smooth convex functions in the absence of strong convexity.
The complete PDF post can be viewed here.