Smoothness of Functions Learned by Neural Networks
| Field | Value |
|---|---|
| Thesis title (Czech) | Hladkost funkcí naučených neuronovými sítěmi |
| Thesis title (English) | Smoothness of Functions Learned by Neural Networks |
| Keywords (Czech) | strojové učení, neuronové sítě, hladkost, zobecňování |
| Keywords (English) | machine learning, neural networks, smoothness, generalization |
| Academic year of announcement | 2019/2020 |
| Thesis type | bachelor's thesis |
| Language | English |
| Department | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor | Mgr. Tomáš Musil |
| Author | hidden - assigned and confirmed by the Student Affairs Office |
| Date of registration | 12.05.2020 |
| Date of assignment | 12.05.2020 |
| Confirmed by Student Affairs Office | 22.05.2020 |
| Date and time of defence | 07.07.2020 09:00 |
| Date of electronic submission | 04.06.2020 |
| Date of print submission | 04.06.2020 |
| Date of defence | 07.07.2020 |
| Reviewers | RNDr. Milan Straka, Ph.D. |
Guidelines
Classical machine learning theory states that models are subject to the bias-variance tradeoff: if a model's capacity is too low, it underfits the training set and therefore also performs poorly on the test set. If its capacity is too high, it overfits the training set, achieving (near-)perfect performance on it, but again performs poorly on the test set because it picks up spurious patterns in the data.
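The classical tradeoff is easy to reproduce with polynomial regression; a minimal sketch (not part of the assignment, with hypothetical data: a noisy sine target and illustrative degrees 1, 4, and 15) showing train versus held-out error as capacity grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(3x) on [-1, 1] (illustrative target)."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(3.0 * x) + rng.normal(0.0, 0.3, n)

x_tr, y_tr = make_data(20)    # small training set
x_te, y_te = make_data(200)   # held-out test set

def mse(deg):
    """Fit a degree-`deg` polynomial to the training set; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, deg)
    err = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return err(x_tr, y_tr), err(x_te, y_te)

for deg in (1, 4, 15):
    tr, te = mse(deg)
    print(f"degree {deg:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training error decreases monotonically with degree (the model classes are nested), while the low-degree fit underfits and the high-degree fit chases the noise.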
Recent research shows that this does not apply to modern neural networks (NNs): they often have the capacity to perfectly fit (interpolate) the training set, yet despite being extremely "overfit" they generalize well. This suggests that some form of implicit regularization in the training process biases the learned functions in a way that is good for generalization. How this implicit regularization works, however, is an open problem.

In this thesis we will explore the hypothesis that training NNs with gradient descent tends to produce smooth functions, where "smoothness" is understood in some intuitive sense. If we also assume that smooth functions generalize well, this would explain why NNs generalize. We focus on the first part of the hypothesis: whether training NNs yields smooth functions.

We formalize this notion of smoothness and run experiments to determine under what conditions smooth functions are actually learned. Specifically, we will propose measures of function complexity (the inverse of smoothness) and measure the complexity of NNs trained with various hyperparameters. We will use the simplest NN possible: a two-layer network with ReLU activation. We will compare the computed complexity of NNs with that of other models, such as polynomial interpolation. We will use synthetic datasets at first and later move on to simple real data (MNIST, CIFAR). In this way, we will determine which training procedures lead to smooth functions being learned. The empirical study may also inform theoretical development: based on the data, one may be able to formulate precise conditions under which smooth functions are learned.
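The plan above can be sketched end to end on a tiny synthetic task. The following is a minimal illustration, not the thesis's actual method: the complexity measure chosen here (total variation of the function's derivative, approximated on a dense grid) and all names (`TwoLayerReLU`, `complexity`) are assumptions for the sake of the example. It trains a two-layer ReLU network with full-batch gradient descent and compares its complexity with that of the interpolating polynomial through the same points:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class TwoLayerReLU:
    """f(x) = sum_i v[i] * relu(w[i]*x + b[i]) + c  (scalar input and output)."""

    def __init__(self, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 1.0, hidden)
        self.b = rng.normal(0.0, 1.0, hidden)
        self.v = rng.normal(0.0, 0.1, hidden)
        self.c = 0.0

    def __call__(self, x):
        return relu(np.outer(x, self.w) + self.b) @ self.v + self.c

    def gd_step(self, x, y, lr):
        """One full-batch gradient descent step on the mean squared error."""
        pre = np.outer(x, self.w) + self.b            # (n, hidden) pre-activations
        h = relu(pre)
        err = (h @ self.v + self.c) - y               # (n,) residuals
        n = len(x)
        gh = np.outer(err, self.v) * (pre > 0) / n    # backprop through the ReLU
        self.v -= lr * (h.T @ err / n)
        self.c -= lr * err.mean()
        self.w -= lr * (gh.T @ x)
        self.b -= lr * gh.sum(axis=0)

def complexity(f, lo=-1.0, hi=1.0, grid=2000):
    """One possible complexity proxy: total variation of f' on a dense grid.
    For a piecewise-linear ReLU net this sums the slope jumps at its kinks."""
    xs = np.linspace(lo, hi, grid)
    deriv = np.diff(f(xs)) / np.diff(xs)              # finite-difference derivative
    return float(np.abs(np.diff(deriv)).sum())

# Tiny synthetic regression task: 8 points of a sine curve.
x = np.linspace(-1.0, 1.0, 8)
y = np.sin(3.0 * x)

net = TwoLayerReLU()
mse0 = float(np.mean((net(x) - y) ** 2))
for _ in range(3000):
    net.gd_step(x, y, lr=0.01)
mse1 = float(np.mean((net(x) - y) ** 2))

# Baseline from the assignment: the interpolating polynomial through the same points.
poly_coeffs = np.polyfit(x, y, deg=len(x) - 1)
poly = lambda t: np.polyval(poly_coeffs, t)

print(f"train MSE: {mse0:.4f} -> {mse1:.4f}")
print(f"complexity(net)  = {complexity(net):.3f}")
print(f"complexity(poly) = {complexity(poly):.3f}")
```

The same skeleton extends to the planned experiments by sweeping hyperparameters (width, learning rate, number of steps) and recording the resulting complexity; the actual measures and datasets are to be developed in the thesis itself.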
References
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, 2017.
Hartmut Maennel, Olivier Bousquet, and Sylvain Gelly. Gradient Descent Quantizes ReLU Network Features. arXiv preprint arXiv:1803.08367, 2018.
Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine learning practice and the bias-variance trade-off. arXiv preprint arXiv:1812.11118, 2018.
Behnam Neyshabur et al. Exploring generalization in deep learning. In Advances in Neural Information Processing Systems, 2017.
Stuart Geman, Elie Bienenstock, and René Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992. doi: 10.1162/neco.1992.4.1.1. URL https://doi.org/10.1162/neco.1992.4.1.1
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning, volume 1. Springer, 2001.