Thesis (Selection of subject) (version: 390)
Thesis details
Restoring and improving the technical quality of audio recordings using machine learning methods
Thesis title in Czech: Restaurování a vylepšování technické kvality zvukových nahrávek metodami strojového učení
Thesis title in English: Restoring and improving the technical quality of audio recordings using machine learning methods
Key words (in Czech): Hluboká neuronová síť|audio|Konvoluční neuronová síť|TensorFlow|kvalita
English key words: Deep Neural Network|audio|Convolutional Neural Network|TensorFlow|quality
Academic year of topic announcement: 2020/2021
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. Nino Peterek, Ph.D.
Author: Mgr. Adam Lechovský - assigned and confirmed by the Study Dept.
Date of registration: 07.05.2021
Date of assignment: 07.05.2021
Confirmed by Study dept. on: 25.05.2021
Date and time of defence: 07.09.2022 09:00
Date of electronic submission: 21.07.2022
Date of submission of printed version: 25.07.2022
Date of proceeded defence: 07.09.2022
Opponents: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Guidelines
The thesis will focus on the use of current machine learning methods to improve the quality of variously damaged or dynamically imbalanced audio recordings.
Open source audio data and artificially degraded versions of the data will be used for training and evaluation.
The thesis will use evaluation procedures to objectively and subjectively capture the technical quality of audio recordings.
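The pipeline the guidelines describe, artificially degrading clean open-source audio to create training pairs and then scoring reconstructions with an objective metric, can be sketched as follows. This is a minimal illustration only: the degradation choices (additive white noise at a target SNR, followed by hard clipping) and the SNR metric are common examples, not the specific procedures used in the thesis.

```python
import numpy as np

def degrade(clean, noise_snr_db=10.0, clip_level=0.5, seed=0):
    """Artificially degrade a clean signal: add white noise at a
    target SNR, then hard-clip. (Illustrative choices, not taken
    from the thesis itself.)"""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(clean))
    # scale the noise so that signal power / noise power hits the target SNR
    sig_pow = np.mean(clean ** 2)
    noise_pow = np.mean(noise ** 2)
    noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (noise_snr_db / 10)))
    return np.clip(clean + noise, -clip_level, clip_level)

def snr_db(reference, estimate):
    """Simple objective quality metric: SNR of an estimate
    measured against the clean reference, in dB."""
    residual = reference - estimate
    return 10 * np.log10(np.mean(reference ** 2) / np.mean(residual ** 2))

# a synthetic 440 Hz tone stands in for a clean open-source recording
t = np.linspace(0, 1, 16000, endpoint=False)
clean = 0.8 * np.sin(2 * np.pi * 440 * t)
noisy = degrade(clean)
```

A model trained on such (degraded, clean) pairs would then be evaluated by comparing `snr_db(clean, model(noisy))` against `snr_db(clean, noisy)`; subjective evaluation (e.g. listening tests) complements this, since SNR alone does not capture perceived quality.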
References
Kamath, U., Liu, J., Whitaker, J.: Deep learning for NLP and speech recognition. Springer, 2019. doi: 10.1007/978-3-030-14596-5
https://link.springer.com/book/10.1007/978-3-030-14596-5

Watanabe, Shinji, et al.: New Era for Robust Speech Recognition. Springer, 2017. doi: 10.1007/978-3-319-64680-0
https://link.springer.com/book/10.1007/978-3-319-64680-0

Jiang, L., Hu, R., Wang, X., Zhang, M.: Low bitrates audio bandwidth extension using a deep auto-encoder. In: Ho, Y.-S., Sang, J., Ro, Y.M., Kim, J., Wu, F. (eds.) PCM 2015. LNCS, vol. 9314, pp. 528–537. Springer, Heidelberg, 2015. doi: 10.1007/978-3-319-24075-6_51
https://link.springer.com/chapter/10.1007/978-3-319-24075-6_51

Mack, W. and Habets, E. A. P.: Declipping Speech Using Deep Filtering. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019, pp. 200-204. doi: 10.1109/WASPAA.2019.8937287.
https://ieeexplore.ieee.org/abstract/document/8937287

Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., Virtanen, T.: Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016, pp. 272-276. doi: 10.1109/GlobalSIP.2016.7905846.
https://ieeexplore.ieee.org/abstract/document/7905846
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html