Thesis details
Natural Language Correction With Focus on Czech
Thesis title in Czech: Automatická korekce textu se zaměřením na češtinu
Thesis title in English: Natural Language Correction With Focus on Czech
Keywords (Czech): automatická korekce textu|oprava gramatiky|generování diakritiky|datasety|zpracování přirozeného jazyka
Keywords (English): natural language correction|grammatical error correction|diacritics restoration|datasets|Czech
Academic year of topic announcement: 2016/2017
Thesis type: dissertation
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Milan Straka, Ph.D.
Author: Mgr. Jakub Náplava, Ph.D. - assigned and confirmed by the Study Department
Date of registration: 20.09.2017
Date of assignment: 20.09.2017
Date of confirmation by the Study Department: 03.10.2017
Date and time of defence: 28.06.2022 13:00
Date of electronic submission: 01.04.2022
Date of submission of printed version: 01.04.2022
Date of defence held: 28.06.2022
Opponents: Roman Grundkiewicz
  Mgr. et Mgr. Ondřej Dušek, Ph.D.
 
 
Guidelines
In recent years, deep neural networks have been used to solve complex machine-learning problems and have achieved state-of-the-art results in many areas. Since 2014, deep neural networks have also been used in natural language processing, improving state-of-the-art results in machine translation, dependency parsing, named entity recognition and many other text processing applications.

One such interesting and very useful application is natural language correction, which aims to correct a variety of errors in input text, ranging from simple spelling errors and missing diacritical marks to complex errors such as syntactic grammatical errors, or even stylistic and semantic errors.
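
To make the simplest of these error types concrete, the following minimal Python sketch shows how text with missing diacritical marks relates to its corrected form. It is an illustration only, not a method described in this assignment: the helper names `strip_diacritics` and `make_pair` are hypothetical, and the sketch assumes that removing Unicode combining marks is an acceptable way to simulate undiacritized Czech input.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove diacritical marks by dropping Unicode combining characters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


def make_pair(sentence: str) -> tuple[str, str]:
    """Return an (input, target) pair for a diacritics restoration system.

    Hypothetical helper: the input is the sentence without diacritics,
    the target is the original, correctly diacritized sentence.
    """
    return strip_diacritics(sentence), sentence


if __name__ == "__main__":
    source, target = make_pair("Příliš žluťoučký kůň úpěl ďábelské ódy.")
    print(source)  # Prilis zlutoucky kun upel dabelske ody.
    print(target)
```

A correction system for this error type would have to map the first printed line back to the second; the other error types listed above have no such simple inverse.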

Deep neural networks are, to the best of our knowledge, the state of the art in English grammatical error correction (Chollampatt et al., 2016; Xie et al., 2016) and in Czech diacritization, spelling error correction and grammatical error correction (Náplava, 2017). They provide models that can be trained end-to-end and require only annotated training data and a large plain-text corpus. However, many challenges remain unsolved: for instance, devising training methods that overcome the lack of annotated data, making better use of unannotated data, designing neural network architectures capable of correcting complex errors (grammatical, stylistic, semantic), or constructing a single model capable of correcting a wide variety of error types. Furthermore, to allow practical use, the runtime performance of existing models has to be improved.

The goal of the thesis is to improve natural language correction performance, most likely by using deep learning methods.
References
- Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng: Neural Language Correction with Character-Based Attention. https://arxiv.org/abs/1603.09727

- Shamil Chollampatt, Kaveh Taghipour, Hwee Tou Ng: Neural Network Translation Models for Grammatical Error Correction. https://arxiv.org/abs/1606.00189

- Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation. https://arxiv.org/abs/1610.03017

- Jakub Náplava: Natural Language Correction, Master's thesis, 2017. To be submitted.

- Michal Richter: Advanced Czech Spellchecker, Master's thesis, 2010. https://is.cuni.cz/webapps/zzp/detail/45334/
 