Neighborhood components analysis and machine learning
| Field | Value |
|---|---|
| Thesis title in Czech: | Analýza sousedních komponent a strojové učení |
| Thesis title in English: | Neighborhood components analysis and machine learning |
| Key words: | KNN, NCA, FNCA, kernel trick, TSKNN, TSNCA, klasifikace |
| English key words: | KNN, NCA, FNCA, kernel trick, TSKNN, TSNCA, classification |
| Academic year of topic announcement: | 2017/2018 |
| Thesis type: | Bachelor's thesis |
| Thesis language: | English |
| Department: | Department of Probability and Mathematical Statistics (32-KPMS) |
| Supervisor: | prof. RNDr. Jaromír Antoch, CSc. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 31.10.2017 |
| Date of assignment: | 31.10.2017 |
| Confirmed by Study dept. on: | 15.12.2017 |
| Date and time of defence: | 13.09.2018 09:00 |
| Date of electronic submission: | 17.05.2018 |
| Date of submission of printed version: | 20.07.2018 |
| Date of defence: | 13.09.2018 |
| Opponents: | doc. RNDr. Matúš Maciak, Ph.D. |
Guidelines
Neighbourhood components analysis (NCA) aims at "learning" a distance metric by finding a linear transformation of the input data such that the average leave-one-out classification performance is maximized in the transformed space. The key insight of the algorithm is that the matrix A corresponding to the transformation can be found by defining a differentiable objective function of A and then applying an iterative solver such as conjugate gradients. One benefit of the algorithm is that the number of classes can be determined as a function of A, up to a scalar constant; used in this way, the algorithm therefore also addresses the issue of model selection.
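The scheme above (a learned linear map A, optimized iteratively against a leave-one-out style objective, then plugged into a nearest-neighbour classifier) can be sketched with scikit-learn, whose `NeighborhoodComponentsAnalysis` implements the Goldberger et al. objective; the dataset and parameter choices here are illustrative, not taken from the thesis:

```python
# Minimal NCA sketch: learn a linear transformation A, then classify with kNN
# in the transformed space. Dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# n_components < n_features: the learned A is rectangular, so NCA also acts
# as a supervised dimensionality-reduction step.
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
knn = make_pipeline(nca, KNeighborsClassifier(n_neighbors=3))
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```

Internally the solver maximizes the expected number of correctly classified points under a softmax over transformed distances, which is exactly the differentiable surrogate for leave-one-out kNN accuracy described above.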
Main goals of the thesis are as follows:

- to describe the basic algorithms;
- to characterize their properties;
- to compare the considered algorithms with other approaches traditionally used for classification and model selection, e.g. SVM;
- to illustrate the advantages and disadvantages of the selected approaches on real, nontrivial examples.
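The comparison with SVM called for above could be set up along the following lines; this is a sketch using scikit-learn and a stock dataset (the thesis itself works in Matlab, so the tooling here is an assumption), with 5-fold cross-validated accuracy as the common yardstick:

```python
# Sketch of the NCA + kNN vs. SVM comparison: same data, same CV splits,
# mean accuracy as the comparison metric. All choices are illustrative.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

models = {
    "NCA + 3-NN": make_pipeline(
        StandardScaler(),
        NeighborhoodComponentsAnalysis(random_state=0),
        KNeighborsClassifier(n_neighbors=3),
    ),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```

Standardizing the features first matters for both methods, since both the initial Euclidean metric in NCA and the RBF kernel in the SVM are sensitive to feature scales.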
References
1) Goldberger, J., Roweis, S., Hinton, G., Salakhutdinov, R. Neighborhood components analysis. Department of Computer Science, University of Toronto, working paper.
2) Yang, W., Wang, K., Zuo, W. Fast neighborhood component analysis. Neurocomputing 83, 2012, 31-37.
3) Qin, C., Song, S., Huang, G., Zhu, L. Unsupervised neighborhood component analysis for clustering. Neurocomputing 168, 2015, 609-617.
4) Schölkopf, B., Smola, A., Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Max-Planck-Institut für biologische Kybernetik, working paper.
5) https://github.com/danoneata/fast-nca
6) https://wiki.math.uwaterloo.ca/statwiki/index.php?title=neighbourhood_Components_Analysis
7) Matlab 2016b
9) Everitt, B. S., Landau, S., Leese, M., Stahl, D. Miscellaneous clustering methods, in Cluster Analysis, 5th edition, John Wiley & Sons, Chichester, UK, 2011.
10) Samworth, R. J. Optimal weighted nearest neighbour classifiers. Annals of Statistics 40(5), 2733-2763, 2012.
11) Schölkopf, B. The kernel trick for distances. Microsoft Research technical report, Cambridge, UK.