Natural Language Correction
Thesis title in Czech: | Automatická oprava pravopisu |
---|---|
Thesis title in English: | Natural Language Correction |
Keywords (Czech): | oprava pravopisu, kontrola pravopisu, zpracování přirozeného jazyka, hluboké učení |
Keywords (English): | language correction, spell checking, natural language processing, deep learning |
Academic year of topic announcement: | 2016/2017 |
Thesis type: | Master's thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Milan Straka, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 13.02.2017 |
Date of assignment: | 13.02.2017 |
Date of confirmation by the Study Department: | 20.02.2017 |
Date and time of defence: | 07.06.2017 09:30 |
Date of electronic submission: | 11.05.2017 |
Date of printed submission: | 11.05.2017 |
Date of completed defence: | 07.06.2017 |
Opponents: | Mgr. Pavel Straňák, Ph.D. |
Guidelines |
In recent years, deep neural networks have been used to solve complex machine-learning problems and have achieved state-of-the-art results in many areas. Since 2014, deep neural networks have also been applied to natural language text processing, improving state-of-the-art results in machine translation, dependency parsing, named entity recognition and many other text processing applications.
Motivated by these accomplishments, this thesis aims to design, implement and evaluate a language correction tool based on a deep learning approach. The tool is an automatic spellchecker able to correct a variety of phenomena (i/y confusion, diacritical marks, spelling errors, grammatical errors, etc.), depending on the given training data. The design can be based on (Ziang Xie et al., 2016), a state-of-the-art deep-network-based grammar checker for English. As usual with deep learning, the tool should be trained end-to-end, requiring only training data and a large plain-text corpus. Its performance should be evaluated on Czech, using the CzeSL (Czech as a Second Language) corpus with annotated spelling and grammatical errors. The accuracy of the tool should be compared to that of existing systems, notably the Korektor tool developed in the Master's thesis "Advanced Czech Spellchecker", and other existing spellcheckers. |
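One way a plain-text corpus could feed end-to-end training is by corrupting clean sentences to obtain (noisy, clean) pairs. The sketch below is only an illustration under that assumption, not the method prescribed by the assignment or by (Ziang Xie et al., 2016). The function names `strip_diacritics`, `swap_i_y` and `make_pair`, as well as the error rates, are hypothetical.

```python
import random
import unicodedata
from typing import Tuple


def strip_diacritics(text: str) -> str:
    """Remove combining marks, simulating text typed without diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def swap_i_y(text: str, rate: float, rng: random.Random) -> str:
    """Swap i/y (and í/ý) with probability `rate` per eligible character."""
    table = {"i": "y", "y": "i", "í": "ý", "ý": "í"}
    return "".join(
        table[ch] if ch in table and rng.random() < rate else ch
        for ch in text
    )


def make_pair(clean: str, rng: random.Random) -> Tuple[str, str]:
    """Build one (noisy source, clean target) pair; rates are assumptions."""
    noisy = swap_i_y(clean, rate=0.3, rng=rng)
    if rng.random() < 0.5:  # assumed share of diacritics-free inputs
        noisy = strip_diacritics(noisy)
    return noisy, clean
```

Calling `make_pair(sentence, random.Random(0))` on each corpus sentence would yield reproducible training pairs. Synthetic noise of this kind covers only the i/y and diacritics phenomena, so annotated data such as CzeSL would still be needed for evaluation and for real grammatical errors.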
References |
- Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng: Neural Language Correction with Character-Based Attention. https://arxiv.org/abs/1603.09727
- Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation. https://arxiv.org/abs/1610.03017
- Michal Richter: Advanced Czech Spellchecker, Master's thesis. https://is.cuni.cz/webapps/zzp/detail/45334/ |