Identification of typical features of machine translation
Thesis title in Czech: | Identifikace typických rysů strojového překladu |
---|---|
Thesis title in English: | Identification of typical features of machine translation |
Keywords (Czech): | strojový překlad, neuronové sítě, deep learning, strojové učení, NLP, zpracování přirozeného jazyka |
Keywords (English): | machine translation, neural networks, deep learning, machine learning, natural language processing |
Academic year of announcement: | 2022/2023 |
Thesis type: | master's thesis |
Thesis language: | |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | Mgr. Jindřich Libovický, Ph.D. |
Student: | hidden (assigned and confirmed by the Student Affairs Department) |
Registration date: | 24.10.2022 |
Assignment date: | 24.10.2022 |
Confirmed by the Student Affairs Department: | 27.03.2023 |
Guidelines
In some domains and under limited circumstances, machine translation reaches such high output quality that human evaluators can hardly distinguish human translation from machine translation. On the other hand, training a machine learning model that distinguishes authentic from generated text is relatively simple. Recently, interpretable text classifiers have been developed that can show which parts of a sentence their decisions were based on. Such classifiers will be the starting point of the thesis.
The thesis will proceed in two steps. The first goal is to develop a classifier that distinguishes between human and machine translation when the machine translation is of high quality. The models will probably be based on pre-trained multilingual Transformer models, such as XLM-R. The second step will be to use model interpretability methods (such as integrated-gradients saliency) to analyze which features allow the models to distinguish generated text from authentic text. The analysis will lead to hypotheses on why this distinction might be difficult for human evaluators.
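Integrated gradients (Sundararajan et al., 2017) attribute a model's output to its input features: the attribution of feature i is (x_i − x'_i) multiplied by the average gradient of the output with respect to x_i along the straight line from a baseline input x' to the actual input x. For Transformer classifiers, a common choice is to compute this on the input token embeddings and sum over the embedding dimensions, giving one score per token. The following is a minimal sketch under stated assumptions, not the method of the thesis: a hypothetical toy logistic classifier over mean-pooled embeddings stands in for a fine-tuned XLM-R model, its gradient is written out analytically instead of being obtained by automatic differentiation, the all-zero baseline is an assumption, and the names `score`, `score_grad`, and `integrated_gradients` are illustrative only. The path integral is approximated with a midpoint Riemann sum.

```python
import math
from typing import List

Matrix = List[List[float]]  # one embedding vector per token


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(emb: Matrix, w: List[float], b: float) -> float:
    """Hypothetical toy classifier: P(machine translation) from mean-pooled embeddings."""
    n = len(emb)
    pooled = [sum(row[j] for row in emb) / n for j in range(len(w))]
    return sigmoid(sum(wj * pj for wj, pj in zip(w, pooled)) + b)


def score_grad(emb: Matrix, w: List[float], b: float) -> Matrix:
    """Analytic gradient of `score` with respect to every embedding entry."""
    s = score(emb, w, b)
    n = len(emb)
    # d score / d emb[t][j] = s * (1 - s) * w[j] / n, identical for every token t
    return [[s * (1.0 - s) * wj / n for wj in w] for _ in emb]


def integrated_gradients(emb: Matrix, baseline: Matrix, w: List[float],
                         b: float, steps: int = 64) -> List[float]:
    """Per-token integrated-gradients attributions (midpoint Riemann sum)."""
    grad_sum = [[0.0] * len(w) for _ in emb]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight-line path from the baseline to the input.
        point = [[bv + alpha * (ev - bv) for ev, bv in zip(e_row, b_row)]
                 for e_row, b_row in zip(emb, baseline)]
        for t, g_row in enumerate(score_grad(point, w, b)):
            for j, g in enumerate(g_row):
                grad_sum[t][j] += g
    # Multiply the averaged gradient by (input - baseline) and sum over the
    # embedding dimensions to get one attribution score per token.
    return [sum((ev - bv) * gs / steps for ev, bv, gs in zip(e_row, b_row, g_row))
            for e_row, b_row, g_row in zip(emb, baseline, grad_sum)]


if __name__ == "__main__":
    tokens = ["the", "cat", "sat"]                # hypothetical sentence
    emb = [[0.2, -0.1], [1.5, 0.3], [-0.4, 0.8]]  # made-up 2-d embeddings
    baseline = [[0.0, 0.0] for _ in emb]          # assumed all-zero baseline
    w, b = [1.2, -0.7], 0.1                       # made-up classifier weights
    attributions = integrated_gradients(emb, baseline, w, b)
    for tok, a in zip(tokens, attributions):
        print(f"{tok}: {a:+.4f}")
    # Completeness check: both numbers should agree closely.
    print(sum(attributions), score(emb, w, b) - score(baseline, w, b))
```

Because integrated gradients satisfy the completeness property, the per-token attributions should sum, up to the error of the Riemann approximation, to the difference between the model scores for the input and for the baseline; the last printed line is a quick sanity check of this. With a real Transformer model the gradients would typically come from an autodiff framework; for PyTorch, the Captum library provides integrated-gradients implementations.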
References
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems, 30.
Sundararajan, M., Taly, A., & Yan, Q. (2017, July). Axiomatic Attribution for Deep Networks. In International Conference on Machine Learning (pp. 3319-3328). PMLR.
Ippolito, D., Duckworth, D., Callison-Burch, C., & Eck, D. (2020, July). Automatic Detection of Generated Text is Easiest when Humans are Fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 1808-1822).
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2020, July). Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8440-8451).