Neighborhood components analysis and machine learning
| Field | Value |
|---|---|
| Thesis title in Czech: | Analýza sousedních komponent a strojové učení |
| Thesis title in English: | Neighborhood components analysis and machine learning |
| Key words: | KNN, NCA, FNCA, kernel trick, TSKNN, TSNCA, klasifikace |
| English key words: | KNN, NCA, FNCA, kernel trick, TSKNN, TSNCA, classification |
| Academic year of topic announcement: | 2017/2018 |
| Thesis type: | Bachelor's thesis |
| Thesis language: | English |
| Department: | Department of Probability and Mathematical Statistics (32-KPMS) |
| Supervisor: | prof. RNDr. Jaromír Antoch, CSc. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 31.10.2017 |
| Date of assignment: | 31.10.2017 |
| Confirmed by Study dept. on: | 15.12.2017 |
| Date and time of defence: | 13.09.2018 09:00 |
| Date of electronic submission: | 17.05.2018 |
| Date of submission of printed version: | 20.07.2018 |
| Date of defence: | 13.09.2018 |
| Opponents: | doc. RNDr. Matúš Maciak, Ph.D. |
Guidelines
Neighbourhood components analysis (NCA) aims at "learning" a distance metric by finding a linear transformation of the input data such that the average leave-one-out classification performance is maximized in the transformed space. The key insight of the algorithm is that the matrix A corresponding to the transformation can be found by defining a differentiable objective function of A and then applying an iterative solver such as conjugate gradients. One benefit of the algorithm is that the number of classes can be determined as a function of A, up to a scalar constant; used in this way, the algorithm therefore also addresses the issue of model selection.
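The scheme above (a learned linear map A, optimized iteratively against a leave-one-out style objective, then plugged into a nearest-neighbour classifier) can be sketched with scikit-learn, whose `NeighborhoodComponentsAnalysis` implements the Goldberger et al. objective; the dataset and parameter choices here are illustrative, not taken from the thesis:

```python
# Minimal NCA sketch: learn a linear transformation A, then classify with kNN
# in the transformed space. Dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# n_components < n_features: the learned A is rectangular, so NCA also acts
# as a supervised dimensionality-reduction step.
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
knn = make_pipeline(nca, KNeighborsClassifier(n_neighbors=3))
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```

Internally the solver maximizes the expected number of correctly classified points under a softmax over transformed distances, which is exactly the differentiable surrogate for leave-one-out kNN accuracy described above.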
Main goals of the thesis are as follows:

- to describe the basic algorithms;
- to characterize their properties;
- to compare the considered algorithms with other approaches traditionally used for classification and model selection, e.g. SVM;
- to illustrate the advantages and disadvantages of the selected approaches on real, nontrivial examples.
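The comparison with SVM called for above could be set up along the following lines; this is a sketch using scikit-learn and a stock dataset (the thesis itself works in Matlab, so the tooling here is an assumption), with 5-fold cross-validated accuracy as the common yardstick:

```python
# Sketch of the NCA + kNN vs. SVM comparison: same data, same CV splits,
# mean accuracy as the comparison metric. All choices are illustrative.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

models = {
    "NCA + 3-NN": make_pipeline(
        StandardScaler(),
        NeighborhoodComponentsAnalysis(random_state=0),
        KNeighborsClassifier(n_neighbors=3),
    ),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```

Standardizing the features first matters for both methods, since both the initial Euclidean metric in NCA and the RBF kernel in the SVM are sensitive to feature scales.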
References
1) Goldberger, J., Roweis, S., Hinton, G., Salakhutdinov, R. Neighborhood components analysis. Department of Computer Science, University of Toronto, working paper.
2) Yang, W., Wang, K., Zuo, W. Fast neighborhood component analysis. Neurocomputing 83, 2012, 31-37.
3) Qin, C., Song, S., Huang, G., Zhu, L. Unsupervised neighborhood component analysis for clustering. Neurocomputing 168, 2015, 609-617.
4) Schölkopf, B., Smola, A., Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Max-Planck-Institut für biologische Kybernetik, working paper.
5) https://github.com/danoneata/fast-nca
6) https://wiki.math.uwaterloo.ca/statwiki/index.php?title=neighbourhood_Components_Analysis
7) Matlab 2016b
9) Everitt, B. S., Landau, S., Leese, M., Stahl, D. Miscellaneous clustering methods, in Cluster Analysis, 5th edition, John Wiley & Sons, Chichester, UK, 2011.
10) Samworth, R. J. Optimal weighted nearest neighbour classifiers. Annals of Statistics 40(5), 2733-2763, 2012.
11) Schölkopf, B. The kernel trick for distances. Microsoft Research technical report, Cambridge, UK.