Thesis topics (Thesis selection)

Semantic relation extraction from unstructured data in the business domain

Thesis title in Czech:	Extrakce sémantických vztahů z nestrukturovaných dat v komerční sféře
Thesis title in English:	Semantic relation extraction from unstructured data in the business domain
Keywords (Czech):	Nestrukturovaná Data, Získávání informací, Určování vztahů mezi entitami, Textová analytika, Distant Supervision, Snowball
Keywords (English):	Unstructured data, Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball
Academic year of topic announcement:	2014/2015
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Pavel Pecina, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	27.03.2015
Date of assignment:	27.03.2015
Date of confirmation by the Study Department:	15.07.2015
Date and time of defence:	08.06.2016 09:00
Date of electronic submission:	13.05.2016
Date of printed submission:	13.05.2016
Date of completed defence:	08.06.2016
Opponents:	doc. RNDr. Vladislav Kuboň, Ph.D.

Guidelines

Text analytics in the business domain is a growing field of research. A significant part of the data held by large companies is raw unstructured text (e.g. clients' emails), often linked to structured data (e.g. clients' profiles). Nevertheless, this information is rarely utilized. To take advantage of the textual information, it first has to be converted to a structured form. One way to accomplish this is to extract named entities from the text together with the relations between them. The extracted relations could then be used in predictive models and play a role in decision-making.

The goal of the thesis is to apply methods for semantic relation extraction to enrich available structured data with relations extracted from unstructured textual data. The work will include a comparison of several methods for relation extraction, such as the semi-supervised Snowball [1] and DIPRE [2] and the weakly supervised Distant Supervision [3], and their modification to fit the use case (e.g. by utilizing the existing structured data). The methods will be applied to a set of Czech texts from a business domain provided by an industrial partner and evaluated on a manually annotated sample of the data.
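
As a rough illustration of the distant-supervision idea in [3], and of how existing structured data might be utilized, the sketch below labels a sentence with a relation whenever both entities of a known pair occur in it. It is not the method of the thesis: the names KNOWN_RELATIONS and label_sentences, the example records, and the English sentences are all hypothetical, and plain substring matching stands in for proper named entity recognition.

```python
from typing import Dict, List, Tuple

# Hypothetical structured records: (entity_1, entity_2) -> relation name.
KNOWN_RELATIONS: Dict[Tuple[str, str], str] = {
    ("Jan Novak", "Acme Bank"): "client_of",
}


def label_sentences(
    sentences: List[str],
    known: Dict[Tuple[str, str], str],
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, entity_1, entity_2, relation) for every sentence
    that mentions both entities of a known pair (case-insensitive)."""
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for (e1, e2), relation in known.items():
            # Assumption: a substring match approximates entity recognition.
            if e1.lower() in lowered and e2.lower() in lowered:
                labeled.append((sentence, e1, e2, relation))
    return labeled
```

Sentences labeled this way would serve only as noisy training examples for a relation classifier, since a sentence can mention both entities without expressing the relation.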

References

[1] Agichtein, E., & Gravano, L. (2000). Snowball: Extracting relations from large plain-text collections. In Proceedings of the fifth ACM conference on Digital libraries. ACM.

[2] Brin, S. (1998). Extracting patterns and relations from the World-Wide Web. In Proceedings of the 1998 International Workshop on the Web and Databases (WebDB’98).

[3] Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 (pp. 1003-1011). Association for Computational Linguistics.