# Towards Understanding Generalization via Decomposing Excess Risk Dynamics

```bibtex
@article{Teng2021TowardsUG,
  title   = {Towards Understanding Generalization via Decomposing Excess Risk Dynamics},
  author  = {Jiaye Teng and Jianhao Ma and Yang Yuan},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2106.06153}
}
```

Generalization is one of the critical issues in machine learning. However, traditional methods like uniform convergence are not powerful enough to fully explain generalization, since they may yield vacuous bounds even in the overparameterized linear regression regime. An alternative is to analyze the generalization dynamics and derive algorithm-dependent bounds, e.g., via stability. Unfortunately, stability-based bounds still fall short of explaining the remarkable generalization ability of…
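The kind of excess-risk decomposition the abstract alludes to can be illustrated with a minimal Monte Carlo sketch (the setup and names below are illustrative, not the paper's code): over many training-set draws, the average excess risk of a ridge estimator splits exactly into a squared-bias term and a variance term.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 5, 50, 200
w_true = rng.normal(size=d)
sigma = 0.5  # label-noise standard deviation

def fit_ridge(X, y, lam=1e-2):
    # Closed-form ridge estimator.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Draw many independent training sets and record the fitted weights.
ws = []
for _ in range(trials):
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)
    ws.append(fit_ridge(X, y))
ws = np.array(ws)

# With isotropic Gaussian inputs, the excess risk of w is ||w - w_true||^2,
# and its average over training sets splits exactly into bias^2 + variance.
w_bar = ws.mean(axis=0)
bias_sq = np.sum((w_bar - w_true) ** 2)
variance = np.mean(np.sum((ws - w_bar) ** 2, axis=1))
excess = np.mean(np.sum((ws - w_true) ** 2, axis=1))
print(bias_sq, variance, excess)
```

The identity holds because the cross term vanishes when averaging around the mean estimator; the paper's contribution concerns how such terms evolve along the training trajectory, which this static sketch does not capture.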

#### References

*Showing 1–10 of 58 references.*

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

- Computer Science, Mathematics
- ICLR
- 2020

A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives; it is further demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.

Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

- Mathematics, Computer Science
- NeurIPS
- 2020

This work describes an interpretable, symmetric decomposition of the variance into terms associated with the randomness from sampling, initialization, and the labels; computes the high-dimensional asymptotic behavior of this decomposition for random-feature kernel regression; and analyzes the strikingly rich phenomenology that arises.
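The flavor of such a symmetric decomposition can be sketched numerically on a toy random-feature regression (illustrative only, not the paper's setting): averaging the prediction over one randomness source at a time yields an exact ANOVA-style split of its variance into sampling, initialization, and interaction terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 30, 3, 20
w_true = rng.normal(size=d)
x_test = rng.normal(size=d)

def predict(data_seed, init_seed):
    # One draw of each randomness source: the training sample (data_seed)
    # and the random-feature initialization (init_seed).
    dr = np.random.default_rng(data_seed)
    ir = np.random.default_rng(init_seed)
    X = dr.normal(size=(n, d))
    y = X @ w_true + 0.3 * dr.normal(size=n)
    W = ir.normal(size=(d, p)) / np.sqrt(d)      # fixed random features
    a = np.linalg.lstsq(np.tanh(X @ W), y, rcond=None)[0]
    return np.tanh(x_test @ W) @ a

S = 15  # seeds per randomness source
preds = np.array([[predict(100 + i, 200 + j) for j in range(S)]
                  for i in range(S)])

total_var = preds.var()
var_sampling = preds.mean(axis=1).var()   # explained by the data draw
var_init = preds.mean(axis=0).var()       # explained by initialization
var_interaction = total_var - var_sampling - var_init
print(var_sampling, var_init, var_interaction)
```

The residual (interaction) term is non-negative here because the cross terms in the two-way ANOVA identity cancel exactly; the cited work carries out the analogous decomposition analytically in the high-dimensional limit.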

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

- Mathematics, Computer Science
- COLT
- 2018

This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, with important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

- Computer Science, Mathematics
- UAI
- 2017

By optimizing the PAC-Bayes bound directly, the approach of Langford and Caruana (2001) is extended to obtain nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples.

Uniform convergence may be unable to explain generalization in deep learning

- Computer Science, Mathematics
- NeurIPS
- 2019

Through numerous experiments, this work casts doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.

Train faster, generalize better: Stability of stochastic gradient descent

- Computer Science, Mathematics
- ICML
- 2016

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically…
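The stability argument behind this result can be probed with a toy experiment (a hypothetical sketch, not the paper's proof): run SGD with a shared example order on two datasets differing in a single point, and track how far the two parameter trajectories drift apart as the number of steps grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 40, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def sgd(Xs, ys, steps, lr=0.05, seed=0):
    # Plain SGD on squared loss; the seed fixes the example order so that
    # both runs see the same index sequence (the coupling used in
    # stability analyses).
    order = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(steps):
        i = order.integers(len(ys))
        w -= lr * (Xs[i] @ w - ys[i]) * Xs[i]
    return w

# Neighboring dataset: identical except for one replaced example.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = X2[0] @ rng.normal(size=d)

gaps = {}
for steps in (10, 100, 1000):
    gaps[steps] = np.linalg.norm(sgd(X, y, steps) - sgd(X2, y2, steps))
    print(steps, gaps[steps])
```

Uniform stability controls the generalization gap by bounding exactly this parameter drift as a function of the step count and step sizes, which is why fewer iterations ("training faster") give stronger guarantees.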

Stability of SGD: Tightness Analysis and Improved Bounds

- Computer Science, Mathematics
- ArXiv
- 2021

It is shown that for general datasets, the existing stability analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions, and novel, improved data-dependent bounds are given.

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

- Computer Science, Mathematics
- NeurIPS
- 2019

It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, via SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.

Stability and Generalization of Learning Algorithms that Converge to Global Optima

- Mathematics, Computer Science
- ICML
- 2018

This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function, establishing novel generalization bounds for learning algorithms that converge to global minima.

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

- Computer Science, Mathematics
- ICML
- 2019

This paper analyzes training and generalization for a simple two-layer ReLU network with random initialization, providing the following improvements over prior work: a tighter characterization of training speed, an explanation for why training a neural network with random labels leads to slower training, and a data-dependent complexity measure.