Mining texts at the discourse level
Thesis title in Czech: | Dolování textu na úrovni diskursu |
---|---|
Title in English: | Mining texts at the discourse level |
Keywords in Czech: | dobývání informací z textu, výstavba diskurzu, formální konceptuální analýza |
Keywords in English: | text mining, discourse structure, formal concept analysis |
Academic year of announcement: | 2013/2014 |
Thesis type: | master's thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
Student: | hidden (assigned and confirmed by the Study Department) |
Registration date: | 13.02.2014 |
Assignment date: | 13.02.2014 |
Confirmed by the Study Department on: | 26.02.2014 |
Defence date and time: | 08.09.2014 00:00 |
Electronic version submitted on: | 30.07.2014 |
Printed version submitted on: | 31.07.2014 |
Defence held on: | 08.09.2014 |
Reviewers: | Mgr. Michal Novák, Ph.D. |
Guidelines |
The goal of this thesis is to lay the foundations of a new approach to text mining for extracting knowledge in a given domain. The approach combines two formal methods: discourse modelling, which comes from Natural Language Processing (NLP), and Formal Concept Analysis (FCA), a classification method used in Data Mining (DM). It aims to show that there are alternatives to the numerical methods based on shallow semantic representations of texts (bag of words, ...) that are currently widely used in Text Mining, Information Retrieval and Knowledge Extraction from Texts. It should favour “deep” semantic methods so as to be able to synthesise the content of a set of texts. The experimental domain could be the study of rare diseases. The result of the process could thus be regarded as a summary of a collection of texts.
The thesis aims to mine a collection of textual documents on a given domain in order to discover recurrent parts of documents that could be used to complete and enrich domain knowledge. Texts, or parts of texts, should be represented by a set of discourse representations. Texts should be classified using pattern structures in formal concept analysis, where the similarity between two texts is defined in accordance with an algebra of discourse relations. |
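The following is a rough illustration of the classification step. This minimal sketch treats each text as a set of discourse-relation triples and enumerates the resulting pattern concepts. It rests on strong assumptions. The triples, topic labels, toy texts and function names (`meet`, `intent`, `extent`, `pattern_concepts`) are all hypothetical. The similarity operation is plain set intersection, which reduces the pattern structure to ordinary formal concept analysis. The algebra on discourse relations envisaged in the assignment would replace `meet` with a finer operation.

```python
from functools import reduce
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical discourse relation instance: (relation, topic_of_arg1, topic_of_arg2).
Relation = Tuple[str, str, str]
# A text's description: the set of discourse relations found in it.
Description = FrozenSet[Relation]
Concept = Tuple[FrozenSet[str], Description]


def meet(d1: Description, d2: Description) -> Description:
    """Similarity of two descriptions: the relations they share.
    Assumption: set intersection stands in for an algebra on discourse
    relations, which would define a finer-grained meet."""
    return d1 & d2


def subsumes(general: Description, specific: Description) -> bool:
    """True if `general` is at most as specific as `specific` (meet(g, s) == g)."""
    return meet(general, specific) == general


def intent(objects: Iterable[str], descriptions: Dict[str, Description]) -> Description:
    """A -> A^box: meet of the descriptions of all texts in a non-empty set A."""
    return reduce(meet, (descriptions[o] for o in objects))


def extent(d: Description, descriptions: Dict[str, Description]) -> FrozenSet[str]:
    """d -> d^box: all texts whose description subsumes d."""
    return frozenset(name for name, desc in descriptions.items() if subsumes(d, desc))


def pattern_concepts(descriptions: Dict[str, Description]) -> Set[Concept]:
    """Naively enumerate every pattern concept (A, d) with non-empty A by
    closing each non-empty subset of texts; exponential, so small inputs only."""
    names = sorted(descriptions)
    concepts: Set[Concept] = set()
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            d = intent(subset, descriptions)
            concepts.add((extent(d, descriptions), d))
    return concepts


# Invented toy data: three texts on a rare disease, described by hand.
EXAMPLE_TEXTS: Dict[str, Description] = {
    "t1": frozenset({("Explanation", "symptom", "gene"), ("Elaboration", "disease", "symptom")}),
    "t2": frozenset({("Explanation", "symptom", "gene"), ("Contrast", "treatment", "placebo")}),
    "t3": frozenset({("Elaboration", "disease", "symptom"), ("Contrast", "treatment", "placebo")}),
}

if __name__ == "__main__":
    for ext, d in sorted(pattern_concepts(EXAMPLE_TEXTS), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(ext), sorted(d))
```

On the toy data this yields seven concepts. For instance, t1 and t2 are grouped by their shared Explanation relation, and the top concept contains all three texts with an empty common description. The closure works because, for any subset S of texts, taking the extent of S's common description and then its intent returns that same description.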
Recommended literature |
Amblard, M., Pogodalla, S. Modeling the Dynamic Effects of Discourse: Principles and Frameworks. In Rebuschi, M., Batt, M., Heinzmann, G., Lihoreau, F., Musiol, M., Trognon, A. (Eds.) Interdisciplinary Works in Logic, Epistemology, Psychology and Linguistics: Dialogue, Rationality, and Formalism. Logic, Argumentation & Reasoning, Vol. 3. Dordrecht: Springer, 2014.
Roze, C. Towards a Discourse Relation Algebra for Comparing Discourse Structures. In Constraints in Discourse (CID 2011), Agay, France, 2011.
Kaytoue-Uberall, M., Kuznetsov, S. O., Napoli, A., Duplessis, S. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, 2010.
Coulet, A., Domenach, F., Kaytoue, M., Napoli, A. Using Pattern Structures for Analyzing Ontology-Based Annotations. In ICFCA 2013. |