Natural Language Correction With Focus on Czech
Thesis title in Czech: | Automatická korekce textu se zaměřením na češtinu |
---|---|
Thesis title in English: | Natural Language Correction With Focus on Czech |
Keywords (Czech): | automatická korekce textu, oprava gramatiky, generování diakritiky, datasety, zpracování přirozeného jazyka |
Keywords (English): | natural language correction, grammatical error correction, diacritics restoration, datasets, Czech |
Academic year of topic announcement: | 2016/2017 |
Thesis type: | dissertation |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Milan Straka, Ph.D. |
Author: | Mgr. Jakub Náplava, Ph.D. - assigned and confirmed by the Study Department |
Date of registration: | 20.09.2017 |
Date of assignment: | 20.09.2017 |
Date of confirmation by the Study Department: | 03.10.2017 |
Date and time of defence: | 28.06.2022 13:00 |
Date of electronic submission: | 01.04.2022 |
Date of printed submission: | 01.04.2022 |
Date of defence held: | 28.06.2022 |
Opponents: | Roman Grundkiewicz |
| Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines |
In recent years, deep neural networks have been used to solve complex machine-learning problems and have achieved state-of-the-art results in many areas. Since 2014, deep neural networks have also been applied to natural language processing, improving state-of-the-art results in machine translation, dependency parsing, named entity recognition, and many other text processing applications.
One such interesting and useful application is natural language correction, which aims to correct a variety of errors in input text, ranging from simple spelling errors and missing diacritical marks to complex errors such as syntactic grammatical errors, or even stylistic and semantic errors. To the best of our knowledge, deep neural networks are state-of-the-art in English grammatical error correction (Chollampatt et al., 2016; Xie et al., 2016) and in Czech diacritization, spelling error correction, and grammatical error correction (Náplava, 2017); they provide models that can be trained end-to-end and require only annotated training data and a large plain-text corpus. However, many challenges remain unsolved, for instance: devising training methods that overcome the lack of annotated data, making better use of unannotated data, designing neural network architectures capable of complex error correction (grammatical, stylistic, and semantic errors), and constructing a single model capable of correcting a wide variety of error types. Furthermore, to allow practical use, the runtime performance of existing models has to be improved. The goal of the thesis is to improve natural language correction performance, most likely by utilizing deep learning methods. |
References |
- Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng: Neural Language Correction with Character-Based Attention. https://arxiv.org/abs/1603.09727
- Shamil Chollampatt, Kaveh Taghipour, Hwee Tou Ng: Neural Network Translation Models for Grammatical Error Correction. https://arxiv.org/abs/1606.00189
- Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation. https://arxiv.org/abs/1610.03017
- Jakub Náplava: Natural Language Correction, Master thesis, 2017. To be submitted.
- Michal Richter: Advanced Czech Spellchecker, Master thesis, 2010. https://is.cuni.cz/webapps/zzp/detail/45334/ |