Automatic Construction of an XML Schema for a Given Set of XML Documents
Thesis title in Czech: | Automatic Construction of an XML Schema for a Given Set of XML Documents |
---|---|
Thesis title in English: | Automatic Construction of an XML Schema for a Given Set of XML Documents |
Academic year of topic announcement: | 2007/2008 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | doc. RNDr. Irena Holubová, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 22.10.2007 |
Date of assignment: | 22.10.2007 |
Date and time of defence: | 25.05.2009 00:00 |
Date of electronic submission: | 25.05.2009 |
Date the defence took place: | 25.05.2009 |
Opponents: | doc. Mgr. Martin Nečaský, Ph.D. |
Guidelines |
Statistical analyses of real-world XML data show that a significant portion of XML documents have no appropriate XML schema. Even when they do have one, the XML Schema language is used even less often. This is probably because constructing an XML schema manually is not an easy task and because the XML Schema language is relatively complex.
The aim of this work is to research various aspects of the problem of automatically constructing an XML schema for a given set of XML documents. First, it is necessary to analyze existing solutions and discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method of automatic construction that addresses the disadvantages and shortcomings identified. One possible approach is to focus on newer XML Schema constructs, e.g., inheritance, global and local items, and groups of elements and attributes, in combination with user interaction. The work will include suitable experimental results. |
References |
Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml
W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/
Mlýnková, I. - Pokorný, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML. Lecture notes. Charles University, Prague, Czech Republic, September 2006.
Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master's thesis, MFF UK, 2005. http://kocour.ms.mff.cuni.cz/~mlynkova/dp/Vosta.ps
Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00: Proc. of the 5th ACM Conf. on Digital Libraries, pages 67-76, New York, NY, USA, 2000. ACM Press.
Garofalakis, M. - Gionis, A. - Rastogi, R. - Seshadri, S. - Shim, K.: XTRACT: a System for Extracting Document Type Descriptors from XML Documents. In SIGMOD '00: Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, pages 165-176, New York, NY, USA, 2000. ACM Press.
Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996. |