Thesis topics (Thesis selection)

Mining texts at the discourse level

Thesis title in Czech:	Dolování textu na úrovni diskursu
Thesis title in English:	Mining texts at the discourse level
Keywords (Czech):	dobývání informací z textu, výstavba diskurzu, formální konceptuální analýza
Keywords (English):	text mining, discourse structure, formal concept analysis
Academic year of topic announcement:	2013/2014
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Pavel Pecina, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	13.02.2014
Date of assignment:	13.02.2014
Date of confirmation by the Study Department:	26.02.2014
Date and time of defence:	08.09.2014 00:00
Date of electronic submission:	30.07.2014
Date of submission of printed version:	31.07.2014
Date of completed defence:	08.09.2014
Opponents:	Mgr. Michal Novák, Ph.D.

Guidelines

The goal of this thesis is to lay the basis for a new approach to text mining that extracts knowledge in a given domain. The approach combines two formal methods: discourse modelling, which comes from Natural Language Processing (NLP), and Formal Concept Analysis, a classification method used in Data Mining (DM). It aims to show that there are alternatives to the numerical methods currently in wide use in Text Mining, Information Retrieval, and Knowledge Extraction from Texts, which rely on low-semantic representations of texts (e.g. bag of words). It should favour "deep" semantic methods so as to be able to synthesize the content of a set of texts. The result of the process could thus be considered a summary of a collection of texts. The experimental domain could be the study of rare diseases.

The thesis aims to mine a collection of textual documents in a given domain in order to discover recurrent parts of documents that could be used to complete and enrich domain knowledge. Texts or parts of texts should be represented by sets of discourse representations. Texts should be classified using pattern structures in formal concept analysis, where the similarity between two texts is defined in accordance with an algebra on discourse relations.
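
As a rough illustration of the classification step, the sketch below treats each text as a set of discourse-relation triples and uses plain set intersection as the similarity (meet) operation of the pattern structure. This is a simplifying assumption: the guidelines call for a similarity based on an algebra on discourse relations, which is not specified here. The toy texts, the relation names, and the helper functions `meet`, `extent`, and `pattern_concepts` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each text is described by a set of
# (relation, argument1, argument2) discourse triples.
texts = {
    "t1": frozenset({("Explanation", "symptom", "mutation"),
                     ("Elaboration", "disease", "symptom")}),
    "t2": frozenset({("Explanation", "symptom", "mutation"),
                     ("Contrast", "treatment", "placebo")}),
    "t3": frozenset({("Explanation", "symptom", "mutation"),
                     ("Elaboration", "disease", "symptom"),
                     ("Contrast", "treatment", "placebo")}),
}


def meet(d1, d2):
    """Similarity of two descriptions; assumed here to be set intersection."""
    return d1 & d2


def extent(pattern, data):
    """Names of all texts whose description contains the pattern."""
    return frozenset(name for name, desc in data.items() if pattern <= desc)


def pattern_concepts(data):
    """Naively enumerate pattern concepts as a dict {extent: intent}.

    The intents are obtained by closing the text descriptions under meet;
    this is fine for toy data but not efficient for real collections.
    """
    patterns = set(data.values())
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(patterns), 2):
            m = meet(a, b)
            if m not in patterns:
                patterns.add(m)
                changed = True
    return {extent(p, data): p for p in patterns}


concepts = pattern_concepts(texts)
for ext, intent in sorted(concepts.items(), key=lambda kv: -len(kv[0])):
    print(sorted(ext), sorted(intent))
```

Each resulting pair groups the texts that share a common set of discourse relations. In principle, replacing `meet` with an operation drawn from a discourse relation algebra would leave the rest of the sketch unchanged.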

References

Amblard, M., Pogodalla, S. Modeling the Dynamic Effects of Discourse: Principles and Frameworks. In Rebuschi, M., Batt, M., Heinzmann, G., Lihoreau, F., Musiol, M., Trognon, A. (Eds.), Interdisciplinary Works in Logic, Epistemology, Psychology and Linguistics: Dialogue, Rationality, and Formalism. Logic, Argumentation & Reasoning, Vol. 3. Dordrecht: Springer, 2014.

Charlotte Roze. Towards a Discourse Relation Algebra for Comparing Discourse Structures. In Constraints in Discourse (CID 2011), Agay, France, 2011.

M. Kaytoue-Uberall, S.O. Kuznetsov, A. Napoli, and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, 2010.

Adrien Coulet, Florent Domenach, Mehdi Kaytoue, and Amedeo Napoli. Using Pattern Structures for Analyzing Ontology-based Annotations. In ICFCA 2013.