Thesis (Selection of subject) (version: 368)
Thesis details
Deep Language Understanding by Deep Learning
Thesis title in Czech: Porozumění textu pomocí hlubokého strojového učení
Thesis title in English: Deep Language Understanding by Deep Learning
Key words: strojové učení, porozumění textu, hluboké neuronové sítě, čeština, angličtina
English key words: machine learning, deep text understanding, deep neural networks, Czech language, English language
Academic year of topic announcement: 2022/2023
Thesis type: dissertation
Thesis language:
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: prof. RNDr. Jan Hajič, Dr.
Author:
Guidelines
Despite applications such as machine translation, true language understanding remains elusive. The thesis will focus on analyzing plain Czech and English text into a knowledge representation graph, using supervised training of deep neural networks (DNNs). Basic tools are available (up to the level of natural language syntax), but the semantic and knowledge extraction part is unsolved and will be the main problem to tackle. Datasets are available for at least two types of meaning/knowledge graphs (trees and DAGs). Deep learning will be the main tool for learning the relation between text and the selected meaning representation, for both Czech and English. Properly designed experiments will be used to test various system configurations, and the results will be evaluated with the standard metrics used in meaning representation and language understanding. Evaluation will also be extended to downstream applications such as information extraction (IE), textual entailment, or question answering, with the meaning representation serving as the formal means of representing knowledge.
References
Banarescu, L. et al. (2013). Abstract Meaning Representation for Sembanking. In: Proceedings of the 7th Linguistic Annotation Workshop (LAW), Sofia, Bulgaria. https://aclanthology.info/papers/W13-2322/w13-2322
May, J. and Priyadarshi, J. (2017). SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). https://aclanthology.info/papers/S17-2090/s17-2090
Abstract Meaning Representation (AMR) Release 2.0, Linguistic Data Consortium, LDC2017T10. https://catalog.ldc.upenn.edu/LDC2017T10
Prague Dependency Treebank 3.5 (PDT 3.5). https://ufal.mff.cuni.cz/pdt3.5; detailed documentation of the tectogrammatical layer at https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/index.html
Straka, M. ÚFAL course NPFL117 (formerly NPFL114). http://ufal.mff.cuni.cz/courses/npfl114/1718-summer; lectures online at https://slideslive.com/s/milan-straka-10654
Preliminary scope of work
Although applications of machine learning to natural language analysis, such as machine translation, already exist, true understanding of the meaning of language (text) is still far off. The core of this dissertation will therefore be to develop methods and concrete models for analyzing Czech and English text and expressing it in a chosen form of knowledge representation. Basic tools for linguistic analysis will be available to the doctoral student, so the emphasis will be on developing methods for semantic analysis and for conversion into a meaning (i.e., knowledge) representation. Data (manually analyzed texts) for deep learning and evaluation are also available, using different types of formal representation (either trees or DAGs). No knowledge of any particular linguistic theory is assumed.
Preliminary scope of work in English
Despite applications such as machine translation, true language understanding remains elusive. The thesis will focus on analyzing plain Czech and English text into a knowledge representation graph, using supervised training of deep neural networks (DNNs). Basic tools are available (up to the level of natural language syntax), but the semantic and knowledge extraction part is unsolved and will be the main problem to tackle. Datasets are available for at least two types of meaning/knowledge graphs (trees and DAGs).
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html