Colloquium - Edgar Dobriban

Institution: University of Pennsylvania
Title: Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Date: November 23, 2021
Location: Zoom
Time: 10:20 AM - 11:10 AM Eastern Time

Modern methods for learning from data depend on many tuning parameters, such as the step size for optimization methods and the regularization strength for regularized learning methods. Since performance can depend strongly on these parameters, it is important to compare classes of methods, not just particular tuned instances. Here, we aim to compare classes of estimators via the relative performance of the best method in each class. This allows us to rigorously quantify the tuning sensitivity of learning algorithms. As an illustration, we investigate the statistical estimation performance of ridge regression with a uniform grid of regularization parameters, and of gradient descent iterates with a fixed step size, in the standard linear model with a random isotropic ground truth parameter.
(1) For orthogonal designs, we find the exact minimax optimal class of estimators, showing it is equal to gradient descent with a polynomially decaying learning rate. We find the exact suboptimality of ridge regression and of gradient descent with a fixed step size, showing that it decays as either 1/k or 1/k^2 for specific ranges of the number k of estimators in the class.
(2) For general designs with a large number of non-zero eigenvalues, we find that gradient descent outperforms ridge regression when the eigenvalues decay slowly, as a power law with exponent less than unity. If instead the eigenvalues decay quickly, as a power law with exponent greater than unity or exponentially, we find that ridge regression outperforms gradient descent.
Our results highlight the importance of tuning parameters. In particular, while optimally tuned ridge regression is the best estimator in our case, it can be outperformed by gradient descent when both are restricted to being tuned over a finite regularization grid. This is work with Dominic Richards and Patrick Rebeschini.
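The comparison described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' method: it simulates a standard linear model with an isotropic random ground truth, builds a class of ridge estimators on a uniform regularization grid and a class of gradient descent iterates with a fixed step size, and reports the best-in-class estimation error for each. All dimensions, grid ranges, and the step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard linear model: y = X @ theta + noise, with isotropic random theta.
n, d = 200, 50
X = rng.standard_normal((n, d)) / np.sqrt(n)
theta = rng.standard_normal(d)
y = X @ theta + 0.5 * rng.standard_normal(n)

# Class 1: ridge regression over a uniform grid of k regularization strengths.
def ridge(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

k = 20
ridge_errs = [np.linalg.norm(ridge(lam) - theta) ** 2
              for lam in np.linspace(0.01, 2.0, k)]

# Class 2: the first k gradient descent iterates with a fixed step size,
# started at zero, for the least-squares objective.
def gd_iterates(step, k_max):
    th = np.zeros(d)
    iterates = []
    for _ in range(k_max):
        th = th - step * (X.T @ (X @ th - y))
        iterates.append(th.copy())
    return iterates

gd_errs = [np.linalg.norm(th - theta) ** 2 for th in gd_iterates(0.5, k)]

# Compare classes by the performance of the best estimator in each.
print(f"best ridge error over the grid: {min(ridge_errs):.3f}")
print(f"best GD iterate error:          {min(gd_errs):.3f}")
```

Which class wins depends on the spectrum of X (here roughly Marchenko-Pastur), echoing the eigenvalue-decay dichotomy stated in result (2).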

Edgar Dobriban is an assistant professor of statistics & computer science at the University of Pennsylvania. He obtained a PhD in statistics from Stanford University in 2017, and a BA in Mathematics from Princeton University in 2012. His research interests include the statistical analysis of large datasets, and the theoretical analysis of machine learning. He has received a Theodore W. Anderson award for the best PhD in theoretical statistics from Stanford University, and an NSF CAREER award.



July 22, 2021 MSU statistician attains uncommon Institute of Mathematical Statistics "Annals quadfecta" status

June 24, 2021 STT presents William L. Harkness Award to PhD student

PhD student Nilanjan Chakraborty was recently presented with the William L. Harkness Award for outstanding teaching by a graduate student.

May 20, 2021 STT Undergraduate Student Andrew McDonald Awarded Goldwater Scholarship

Mr. Andrew McDonald, an Honors College junior majoring in Computer Science in the College of Engineering, and in Statistics and Advanced Mathematics in the College of Natural Science, has been named a recipient of the nationally competitive Barry M. Goldwater Scholarship.

Upcoming Events

Colloquium - Gongjun Xu

Institution: University of Michigan
Title: Identifiability and Estimation of Structured Latent Attribute Models
Date: December 7, 2021
Location: Zoom
Time: 10:20 AM - 11:10 AM Eastern Time