Thesis topics (Thesis selection)

Natural Language Correction

Thesis title in Czech:	Automatická oprava pravopisu
Thesis title in English:	Natural Language Correction
Keywords (Czech):	oprava pravopisu, kontrola pravopisu, zpracování přirozeného jazyka, hluboké učení
Keywords (English):	language correction, spell checking, natural language processing, deep learning
Academic year of topic announcement:	2016/2017
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Milan Straka, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	13.02.2017
Date of assignment:	13.02.2017
Date of confirmation by the Study Department:	20.02.2017
Date and time of defence:	07.06.2017 09:30
Date of electronic submission:	11.05.2017
Date of printed submission:	11.05.2017
Date of completed defence:	07.06.2017
Opponents:	Mgr. Pavel Straňák, Ph.D.

Guidelines

In recent years, deep neural networks have been used to solve complex machine-learning problems and have achieved state-of-the-art results in many areas. Since 2014, deep neural networks have also been applied to natural language processing, improving state-of-the-art results in machine translation, dependency parsing, named entity recognition and many other text processing applications.

Inspired by these accomplishments, the goal of this thesis is to design, implement and evaluate a language correction tool based on deep learning. The tool is an automatic spellchecker able to correct a variety of phenomena (i/y, diacritical marks, spelling errors, grammatical errors, etc.), depending on the training data it is given.
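As an illustration of how the corrected phenomena can depend on the training data, the following minimal sketch builds (noisy, correct) sentence pairs for diacritics restoration by stripping diacritical marks from clean text. This is an assumption made for illustration only; the assignment does not prescribe how training data is obtained, and the names `strip_diacritics` and `make_training_pairs` are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (hacek, acute accent, ring) from the text."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_training_pairs(sentences):
    """Hypothetical helper: pair each clean sentence with its stripped version."""
    return [(strip_diacritics(sentence), sentence) for sentence in sentences]


pairs = make_training_pairs(["Příliš žluťoučký kůň"])
print(pairs)
```

A model trained on such pairs would learn only diacritics restoration; other error types would need training pairs that actually contain them.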

The design of the correction tool can build on Xie et al. (2016), a state-of-the-art grammar checker for English based on deep neural networks. As is usual with deep learning, the language correction tool should be trained end-to-end, requiring only training data and a large plain-text corpus.

The performance of the correction tool should be evaluated on Czech, using the CzeSL (Czech as a Second Language) corpus with annotated spelling and grammatical errors. The accuracy of the tool should be compared to existing systems, notably the Korektor tool developed in the Master's thesis "Advanced Czech Spellchecker", and to other existing spellcheckers.
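The assignment does not fix an evaluation metric. One possible way to compare systems is sketched below: token-level precision, recall and F1 of the proposed corrections. The sketch assumes that the source, gold and system outputs are already tokenized and aligned one-to-one (no token insertions or deletions); the function name `correction_scores` and the sample tokens are hypothetical.

```python
def correction_scores(source, gold, predicted):
    """Token-level precision, recall and F1 of corrections.

    Assumes the three token lists are aligned one-to-one.
    """
    if not (len(source) == len(gold) == len(predicted)):
        raise ValueError("token lists must have equal length")

    # Tokens the system changed, tokens that needed changing,
    # and tokens the system changed to exactly the gold form.
    proposed = sum(1 for s, p in zip(source, predicted) if s != p)
    needed = sum(1 for s, g in zip(source, gold) if s != g)
    correct = sum(
        1 for s, g, p in zip(source, gold, predicted) if s != p and p == g
    )

    precision = correct / proposed if proposed else 1.0
    recall = correct / needed if needed else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


source = ["Prilis", "zlutoucky", "kun"]
gold = ["Příliš", "žluťoučký", "kůň"]
predicted = ["Příliš", "zlutoucky", "kuň"]
precision, recall, f1 = correction_scores(source, gold, predicted)
print(precision, recall, f1)
```

Running the same function on the output of each system under comparison, over the same annotated data, would give directly comparable scores. Corpora whose annotations insert or delete tokens would need an alignment step first.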

References

- Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng: Neural Language Correction with Character-Based Attention. https://arxiv.org/abs/1603.09727

- Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation. https://arxiv.org/abs/1610.03017

- Michal Richter: Advanced Czech Spellchecker, Master's thesis. https://is.cuni.cz/webapps/zzp/detail/45334/