Manuscript on analyzing Approximate SGD

Bin Hu and I have recently posted a manuscript on arXiv entitled “Analysis of Approximate Stochastic Gradient using quadratic constraints and sequential semidefinite programs”. Approximate Stochastic Gradient refers to the standard SGD algorithm, widely used for solving optimization problems, with the twist that the gradient information is noisy.
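To make that concrete, here is a rough sketch of one step of approximate SGD. It is only an illustration: the parameter names `delta` (relative error level) and `c` (absolute error level) are mine, and the precise noise model is spelled out in the paper.

```python
import numpy as np

def approximate_sgd_step(x, grad, stepsize, rng, delta=0.1, c=0.01):
    """One SGD step using an inexact gradient.

    The (stochastic) gradient grad(x) is perturbed by a random error of
    size delta * ||grad(x)|| + c. The names delta and c are illustrative;
    the precise noise model is defined in the manuscript.
    """
    g = grad(x)
    e = rng.standard_normal(g.shape)
    e *= (delta * np.linalg.norm(g) + c) / np.linalg.norm(e)
    return x - stepsize * (g + e)
```

The two error terms correspond roughly to the multiplicative and additive noise models discussed next.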

In this paper, we investigate the cases where the noise is additive, multiplicative, or both. We also consider different assumptions on the objective function: for example, the case where each component function is smooth (but not necessarily convex) while the sum of the component functions is strongly convex. More generally, our approach allows one to place different constraints on the individual functions and on their sum, and to obtain worst-case (and average-case) performance guarantees. Each case reduces to checking the feasibility of a small semidefinite program, which can be done numerically or analytically.
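To give a flavor of that last point, here is a hedged sketch of the kind of numerical feasibility test involved: given matrices A, B describing the algorithm, a matrix M encoding a quadratic constraint on the gradients, and a candidate rate rho, check whether a small LMI is feasible. This is a generic Lyapunov/quadratic-constraint template written in CVXPY; the actual matrices and constraints in the manuscript differ.

```python
import cvxpy as cp
import numpy as np

def rate_is_certified(A, B, M, rho, eps=1e-6):
    """Check feasibility of a small LMI certifying convergence rate rho.

    A, B model the algorithm as a linear system, M encodes a quadratic
    constraint on the gradient, and rho is the candidate rate. This is a
    generic template, not the LMI from the manuscript.
    """
    n = A.shape[0]
    P = cp.Variable((n, n), symmetric=True)   # Lyapunov matrix
    lam = cp.Variable(nonneg=True)            # multiplier for the constraint
    G = cp.bmat([[A.T @ P @ A - rho**2 * P, A.T @ P @ B],
                 [B.T @ P @ A,              B.T @ P @ B]])
    constraints = [P >> eps * np.eye(n), G + lam * M << 0]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve(solver=cp.SCS)
    return prob.status == cp.OPTIMAL
```

Bisecting on rho then yields the smallest rate certified by a test of this form.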

Tuning the stepsize presents interesting challenges, because you generally have to choose between converging rapidly to a large ball and converging slowly to a smaller ball. We investigate what happens with a constant stepsize, and we also look at the optimal case, i.e., tuning the stepsize optimally at each iteration by solving a Bellman equation. This approach gives insight into the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy.
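Here is a toy illustration of the constant-stepsize trade-off, using a one-dimensional quadratic with additive gradient noise (my own example, not one from the paper):

```python
import numpy as np

def run_sgd(stepsize, n_iters=2000, noise_std=1.0, mu=1.0, seed=0):
    """Run SGD on f(x) = (mu/2) * x**2 with additive gradient noise.

    The error contracts by a factor (1 - stepsize * mu) per step, but the
    iterates settle into a noise ball whose radius grows with the stepsize.
    """
    rng = np.random.default_rng(seed)
    x, errors = 10.0, []
    for _ in range(n_iters):
        g = mu * x + noise_std * rng.standard_normal()  # noisy gradient
        x -= stepsize * g
        errors.append(abs(x))
    return errors

# Large stepsize: fast initial progress, larger residual error.
# Small stepsize: slow initial progress, smaller residual error.
for alpha in (0.5, 0.05):
    errs = run_sgd(alpha)
    print(f"stepsize {alpha}: final error ~ {np.mean(errs[-500:]):.3f}")
```

Shrinking the stepsize over time trades these two regimes against each other, which is exactly the kind of trade-off the Bellman-equation approach in the paper navigates.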