Smoothness of Functions Learned by Neural Networks
Keywords (Czech): | strojové učení, neuronové sítě, hladkost, zobecňování |
Keywords (English): | machine learning, neural networks, smoothness, generalization |
Assignment guidelines |
Classic machine learning theory states that models are subject to the bias-variance tradeoff. If their capacity is too low, they underfit the training set and therefore perform poorly on the test set as well. If it is too high, they overfit the training set, displaying (near-)perfect performance on it, but again perform poorly on the test set because they pick up spurious patterns in the data.
Recent research shows that this picture does not apply to modern neural networks (NNs): they often have the capacity to perfectly fit (interpolate) the training set, yet despite being extremely “overfit” they generalize well. This suggests that the training process performs some form of implicit regularization, biasing the learned functions in a way that benefits generalization. How this implicit regularization works is an open problem.
In the thesis, we will explore the hypothesis that training NNs with gradient descent tends to learn smooth functions, where “smoothness” is initially understood in an intuitive sense. If we also assume that smooth functions generalize well, this would explain why NNs generalize. We focus on the first part of the hypothesis: whether training NNs yields smooth functions. We will formalize the notion of smoothness and run experiments to see under what conditions smooth functions are actually learned.
Specifically, we will propose measures of function complexity (the inverse of smoothness) and measure the complexity of NNs trained with various hyperparameters. We will use the simplest NN possible: a two-layer network with ReLU activation. We will compare its computed complexity with that of other models, such as polynomial interpolation. We will use synthetic datasets at first and later move on to simple real data (MNIST, CIFAR). This way, we will determine which training procedures lead to smooth functions being learned. The empirical study may also lead to theoretical developments: based on the data, one may be able to formulate precise conditions under which smooth functions are learned. |
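To make the planned comparison concrete, the following is a minimal sketch of how a complexity measure could be evaluated for a two-layer ReLU network and for polynomial interpolation on a synthetic 1-D dataset. The assignment does not fix any particular measure, so everything here is an assumption made for illustration: the measure `slope_variation` (total variation of the finite-difference slope), the helper `two_layer_relu`, the target function |x|, and the hand-picked (not trained) network weights.

```python
import numpy as np


def two_layer_relu(x, w, b, a):
    """Two-layer ReLU network on 1-D inputs: f(x) = sum_j a[j] * relu(w[j] * x + b[j])."""
    hidden = np.maximum(0.0, np.outer(x, w) + b)  # shape (len(x), n_hidden)
    return hidden @ a


def slope_variation(f, lo=-1.0, hi=1.0, n=2001):
    """Hypothetical complexity measure (an assumption, not the thesis's definition):
    total variation of the slope of f on [lo, hi], estimated on a uniform grid."""
    grid = np.linspace(lo, hi, n)
    slopes = np.diff(f(grid)) / np.diff(grid)  # finite-difference first derivative
    return float(np.sum(np.abs(np.diff(slopes))))  # summed changes in slope


# Synthetic dataset: 9 equispaced samples of |x| on [-1, 1].
x_train = np.linspace(-1.0, 1.0, 9)
y_train = np.abs(x_train)

# Hand-picked weights, NOT the result of training:
# relu(x) + relu(-x) = |x|, which interpolates the data exactly.
w = np.array([1.0, -1.0])
b = np.zeros(2)
a = np.ones(2)


def relu_net(x):
    return two_layer_relu(x, w, b, a)


# Degree-8 polynomial passing through the same 9 points.
coeffs = np.polyfit(x_train, y_train, deg=len(x_train) - 1)


def poly(x):
    return np.polyval(coeffs, x)


relu_complexity = slope_variation(relu_net)
poly_complexity = slope_variation(poly)
print(f"two-layer ReLU network:   {relu_complexity:.3f}")
print(f"polynomial interpolation: {poly_complexity:.3f}")
```

Both models interpolate the training points, but the polynomial oscillates between the nodes near the interval ends, so under this particular measure it comes out as more complex than the piecewise-linear network. In an actual experiment, the hand-picked weights would be replaced by weights obtained from gradient descent under the hyperparameters being studied.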
