Optimization and Refinement of XML Schema Inference Approaches
Thesis title in Czech: | Optimization and Refinement of XML Schema Inference Approaches |
---|---|
Thesis title in English: | Optimization and Refinement of XML Schema Inference Approaches |
Key words: | XML, XML schéma, odvozování schématu, odvozování regulárních výrazů z pozitivních příkladů |
English key words: | XML, XML schema, schema inference, inference of regular expressions from positive examples |
Academic year of topic announcement: | 2010/2011 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | doc. RNDr. Irena Holubová, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 04.11.2010 |
Date of assignment: | 04.11.2010 |
Date and time of defence: | 05.09.2011 09:30 |
Date of electronic submission: | 04.08.2011 |
Date of submission of printed version: | 05.08.2011 |
Date of proceeded defence: | 05.09.2011 |
Opponents: | Mgr. Jakub Stárka, Ph.D. |
Guidelines |
Several existing works address the problem of (semi)automatic inference of an XML schema for a given set of XML documents. Even though most of the approaches focus on inferring correct and optimal regular expressions, the results they output are still quite complex and unnatural.
The aim of this work is to research various aspects of this problem. First, it is necessary to analyze the existing solutions and to compare and discuss their outputs. The core of the work is the proposal and implementation of an original method that optimizes and refines existing approaches to obtain more realistic and natural schemas. For this purpose the approach can exploit, e.g., detailed analyses of the input data, user interaction, or various metrics. The work will include suitable experimental results. |
References |
Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml
W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/
Mlýnková, I. - Pokorný, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML. Lecture notes. Charles University, Prague, Czech Republic, September 2006.
Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Diploma thesis, MFF UK, 2005. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vosta.pdf
Vyhnanovská, J.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Diploma thesis, MFF UK, 2009. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vyhnanovska.pdf
Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
Neumann, C.: Converting deterministic finite automata to regular expressions. 2005.
Han, Y.-S. - Wood, D.: Obtaining shorter regular expressions from finite-state automata. Theor. Comput. Sci., 370(1-3):110-120, 2007. |