Thesis (Selection of subject)


Dance Recognition from Audio Recordings

Thesis title in Czech:	Rozpoznávání tance ze zvukových záznamů
Thesis title in English:	Dance Recognition from Audio Recordings
Key words:	ballroom, dance, genre, classification, CNN, audio, music
Academic year of topic announcement:	2018/2019
Thesis type:	diploma thesis
Thesis language:	English
Department:	Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Supervisor:	Jan Čech
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	24.04.2019
Date of assignment:	26.04.2019
Confirmed by Study dept. on:	15.05.2019
Date and time of defence:	03.02.2020 09:00
Date of electronic submission:	06.01.2020
Date of submission of printed version:	07.01.2020
Date of proceeded defence:	03.02.2020
Opponents:	Mgr. Josef Moudřík

Guidelines

Recognizing the dance (Waltz, Tango, Rumba, etc.) which fits the music being played is a challenging problem for non-professionals. To the best of our knowledge, an automatic system able to predict the matching dance categories, given a short music sample, does not exist yet. Developing such a system is realistic considering recent progress in deep convolutional neural networks.

A related problem, recognizing the dance category from a visual recording, is also largely unexplored. It is interesting because it requires distinguishing fine-grained human movements in sequential data with certain repetitive patterns, and it is thus complementary to standard human action or activity recognition, which has been well studied.

Guidelines:
-------------
(1) Review the literature on dance recognition and related problems.
(2) Collect a dataset of labelled examples, namely short audio or audio/visual recordings with annotated labels of the dance categories. Hundreds to thousands of examples will be required.
(3) Audio Recognition: Train a deep net classifier that takes music samples as input and predicts the label of the dance category, i.e. the dance style that fits the music. Evaluate the classifier on an independent test set.
(4) Optionally, Visual Recognition: Train a deep net classifier that takes a video stream with a dance as an input and predicts the label of the dance category. Evaluate the classifier on an independent test set.
(5) Consider a fusion of both Audio and Video classifiers.
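One simple way to approach step (5) is late fusion: each classifier outputs a probability per dance category, and the two outputs are combined by a weighted average. The sketch below is only an illustration under that assumption; the guidelines do not prescribe a fusion method, and the function names `fuse_predictions` and `predict_dance`, the `audio_weight` parameter, and the example probabilities are all hypothetical.

```python
from typing import Dict


def fuse_predictions(audio_probs: Dict[str, float],
                     video_probs: Dict[str, float],
                     audio_weight: float = 0.5) -> Dict[str, float]:
    """Late fusion: weighted average of two per-category probability maps.

    audio_weight is a hypothetical tuning parameter in [0, 1]; the video
    classifier receives the remaining weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_probs.keys() != video_probs.keys():
        raise ValueError("both classifiers must score the same dance categories")

    fused = {
        label: audio_weight * audio_probs[label]
        + (1.0 - audio_weight) * video_probs[label]
        for label in audio_probs
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    # Renormalize so the fused scores again sum to one.
    return {label: score / total for label, score in fused.items()}


def predict_dance(fused_probs: Dict[str, float]) -> str:
    """Return the dance category with the highest fused probability."""
    return max(fused_probs, key=fused_probs.get)


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the thesis.
    audio = {"Waltz": 0.6, "Tango": 0.3, "Rumba": 0.1}
    video = {"Waltz": 0.2, "Tango": 0.7, "Rumba": 0.1}
    print(predict_dance(fuse_predictions(audio, video)))  # prints Tango
```

The weight would normally be chosen on a validation set rather than on the test set, so that the final evaluation remains independent.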

The thesis will investigate an open research problem. The goal is to review the literature, collect an appropriate dataset, propose and train a model (a classifier based on a deep network), and evaluate the model on an independent test set, i.e. a portion of the dataset unseen during training.
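The independent test set mentioned above can be obtained by setting aside a fixed fraction of each dance category before any training takes place. The following is a minimal sketch under assumptions of our own: examples are represented as (recording identifier, dance label) pairs, and the helper name `stratified_split`, the default 20% test fraction, and the seed are hypothetical choices, not values taken from the thesis.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (recording identifier, dance label)


def stratified_split(examples: Sequence[Example],
                     test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Split examples into train and test lists, category by category.

    Every dance category contributes roughly test_fraction of its
    examples (and at least one) to the test list.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")

    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Example] = []
    test: List[Example] = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

If several short samples are cut from the same recording, the split is better done per recording than per sample, so that excerpts of one track never appear on both sides.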

References

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. ISBN 9780262035613. http://www.deeplearningbook.org

Hareesh Bahuleyan. Music Genre Classification using Machine Learning Techniques. arXiv preprint arXiv:1804.01149, 2018.

Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros. Everybody dance now. arXiv preprint arXiv:1808.07371, 2018.

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick. Mask R-CNN. In Proc. IEEE ICCV, 2017, pp. 2980--2988.

Joao Carreira, Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proc. IEEE CVPR, 2017, pp. 4724--4733.

Diogo C. Luvizon, David Picard, Hedi Tabia. 2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning. In Proc. IEEE CVPR, 2018, pp. 5137--5146.