Thesis topics (Thesis selection)

Identification of typical features of machine translation

Thesis title in Czech:	Identifikace typických rysů strojového překladu
Thesis title in English:	Identification of typical features of machine translation
Keywords in Czech:	strojový překlad|neuronové sítě|deep learning|strojové učení|NLP|zpracování přirozeného jazyka
Keywords in English:	machine translation|neural networks|deep learning|machine learning|natural language processing
Academic year of topic announcement:	2022/2023
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	Mgr. Jindřich Libovický, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	24.10.2022
Date of assignment:	24.10.2022
Date of confirmation by the Study Department:	27.03.2023
Date and time of defence:	03.06.2025 09:00
Date of electronic submission:	29.04.2025
Date of printed submission:	30.04.2025
Date of completed defence:	03.06.2025
Opponents:	Mgr. Martin Popel, Ph.D.

Guidelines

In some domains and under limited circumstances, machine translation reaches such output quality that human evaluators can hardly distinguish human from machine translation. On the other hand, training a machine learning model that distinguishes authentic from generated text is relatively simple. Recently, interpretable text classifiers have been developed that can indicate which parts of a sentence their decisions were based on. This will be the starting point of the thesis.

The thesis will proceed in two steps. The first step is to develop a classifier that distinguishes human from machine translation when the machine translation is of high quality. The models will probably be based on pre-trained multilingual Transformer models, such as XLM-R. The second step will use model interpretability methods (such as integrated gradient saliency) to analyze which features allow the models to distinguish generated text from authentic text. The analysis will lead to hypotheses on why this distinction might be difficult for human evaluators.
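
As a rough illustration of the second step, the following sketch applies integrated gradients to a toy logistic classifier instead of a fine-tuned Transformer. The weights, bias, feature values and all-zero baseline are hypothetical and chosen only for demonstration; a real setup would attribute over token embeddings of a model such as XLM-R. The final check uses the completeness property of integrated gradients: the attributions should sum approximately to the difference between the prediction for the input and the prediction for the baseline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Toy probability that the input is machine-translated."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def gradient(weights, bias, features):
    """Analytic gradient of the logistic output w.r.t. each feature."""
    p = predict(weights, bias, features)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, bias, features, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    totals = [0.0] * len(features)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (x - b) for x, b in zip(features, baseline)]
        grad = gradient(weights, bias, point)
        totals = [t + g for t, g in zip(totals, grad)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(x - b) * t / steps for x, b, t in zip(features, baseline, totals)]


# Hypothetical toy values, not taken from any trained model.
weights = [1.5, -2.0, 0.3]
bias = -0.1
features = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(weights, bias, features, baseline)
output_difference = predict(weights, bias, features) - predict(weights, bias, baseline)
completeness_gap = abs(sum(attributions) - output_difference)
```

In this toy case, features with positive weights receive positive attributions and push the prediction towards "machine-translated", while the feature with a negative weight receives a negative attribution.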

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

Sundararajan, M., Taly, A., & Yan, Q. (2017, July). Axiomatic attribution for deep networks. In International conference on machine learning (pp. 3319-3328). PMLR.

Ippolito, D., Duckworth, D., Callison-Burch, C., & Eck, D. (2020, July). Automatic Detection of Generated Text is Easiest when Humans are Fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 1808-1822).

Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2020, July). Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8440-8451).