Thesis (Selection of subject) (version: 368)
Thesis details
Optimization and Refinement of XML Schema Inference Approaches
Thesis title in Czech: Optimization and Refinement of XML Schema Inference Approaches
Thesis title in English: Optimization and Refinement of XML Schema Inference Approaches
Key words: XML, XML schéma, odvozování schématu, odvozování regulárních výrazů z pozitivních příkladů
English key words: XML, XML schema, schema inference, inference of regular expressions from positive examples
Academic year of topic announcement: 2010/2011
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 04.11.2010
Date of assignment: 04.11.2010
Date and time of defence: 05.09.2011 09:30
Date of electronic submission: 04.08.2011
Date of submission of printed version: 05.08.2011
Date of completed defence: 05.09.2011
Opponents: Mgr. Jakub Stárka, Ph.D.
 
 
 
Guidelines
Several existing works address the (semi)automatic inference of an XML schema from a given set of XML documents. Although most of these approaches focus on inferring correct and optimal regular expressions, the schemas they output are still quite complex and unnatural.
The aim of this thesis is to investigate various aspects of this problem. First, it is necessary to analyze the existing solutions and to compare and discuss their outputs. The core of the work is the proposal and implementation of an original method that optimizes and refines existing approaches to obtain more realistic and natural schemas. For this purpose, the approach may exploit, e.g., detailed analyses of the input data, user interaction, or various metrics. The thesis will include suitable experimental results.
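To make the problem statement concrete, the following is a minimal Python sketch of inferring a content model from positive examples. It is not the method of this thesis or of any cited work: the function names `collect_child_sequences` and `naive_content_model`, the sample documents, and the count-based heuristic are all assumptions chosen for illustration. The sketch collects the child-name sequences observed for each element and generalizes them by occurrence counts only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_child_sequences(xml_documents):
    """Map each element name to the child-name sequences observed for it."""
    sequences = defaultdict(list)
    for text in xml_documents:
        root = ET.fromstring(text)
        for element in root.iter():
            sequences[element.tag].append(tuple(child.tag for child in element))
    return dict(sequences)


def naive_content_model(examples):
    """Generalize positive examples into a crude content-model string.

    Hypothetical heuristic: child names are listed in order of first
    appearance, each with an occurrence marker derived from the minimum
    and maximum counts seen. Ordering conflicts and co-occurrence
    constraints are ignored.
    """
    order = []
    for seq in examples:
        for name in seq:
            if name not in order:
                order.append(name)
    parts = []
    for name in order:
        counts = [seq.count(name) for seq in examples]
        lo, hi = min(counts), max(counts)
        if lo == 0:
            marker = "?" if hi == 1 else "*"
        else:
            marker = "" if hi == 1 else "+"
        parts.append(name + marker)
    return ", ".join(parts) if parts else "EMPTY"


# Illustrative input documents (invented for this sketch).
docs = [
    "<book><title/><author/><author/><year/></book>",
    "<book><title/><author/></book>",
]
models = {
    tag: naive_content_model(seqs)
    for tag, seqs in collect_child_sequences(docs).items()
}
print(models["book"])
```

A heuristic this simple tends to overgeneralize once the input grows, for example by accepting any number of authors when the data only ever shows one or two. That is one way inferred schemas can end up less natural than hand-written ones.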
References
Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml

W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/

Mlýnková, I. - Pokorný, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML. Lecture notes. Charles University, Prague, Czech Republic, September 2006.

Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master's thesis, MFF UK, 2005. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vosta.pdf

Vyhnanovská, J.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Master's thesis, MFF UK, 2009. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vyhnanovska.pdf

Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.

Christoph Neumann. Converting deterministic finite automata to regular expressions. 2005.

Yo-Sub Han and Derick Wood. Obtaining shorter regular expressions from finite-state automata. Theor. Comput. Sci., 370(1-3):110-120, 2007.
 