Thesis (Selection of subject)
Thesis details
Inference of Advanced Schemas for XML Documents Using Non-W3C Schema Languages
Thesis title in Czech:
Thesis title in English: Inference of Advanced Schemas for XML Documents Using Non-W3C Schema Languages
Key words: XML, schema, inference, XML schema languages
English key words: XML, schema, inference, XML schema languages
Academic year of topic announcement: 2010/2011
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: RNDr. Jakub Klímek, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 12.11.2010
Date of assignment: 12.11.2010
Guidelines
Inferring XML schemas from sets of XML documents is not a new concept. However, with the development of more recent, more effective, and more user-friendly languages such as Schematron and RELAX NG, it may be possible to infer XML schemas with advanced properties. First, it is necessary to analyze the potential benefits of these more powerful schema languages and to discuss their advantages and disadvantages compared with existing approaches. The aim of this work is to research potential improvements over current (semi)automatic schema inference methods, with regard to expressing data coherence more accurately. The core of the work is the proposal and implementation of a new method that focuses on expressing constraints better and on compensating for the weak points of grammar-based schema languages. The work will include suitable experimental results.
References
[1] ISO/IEC 19757-3:2006: Information technology -- Document Schema Definition Language (DSDL) -- Part 3: Rule-based validation -- Schematron http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip

[2] ISO/IEC 19757-2:2008: Information technology -- Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG http://standards.iso.org/ittf/PubliclyAvailableStandards/c052348_ISO_IEC_19757-2_2008(E).zip

[3] XML Path Language (XPath) 2.0, W3C Recommendation 23 January 2007. http://www.w3.org/TR/xpath20/

[4] Mlynkova, I.: An Analysis of Approaches to XML Schema Inference. SITIS'08, Bali, Indonesia, November/December 2008. IEEE Computer Society Press, 2008.

[5] Opocenska, K. - Kopecky, M.: Incox - a Language for XML Integrity Constraints Description. In DATESO'08, pages 1-12. CEUR-WS.org, 2008.

[6] Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master's thesis, MFF UK, 2005. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vosta.pdf

[7] Vyhnanovská, J.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Master's thesis, MFF UK, 2009. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vyhnanovska.pdf

[8] Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00: Proc. of the 5th ACM Conf. on Digital Libraries, pages 67-76, New York, NY, USA, 2000. ACM Press.

[9] Fassetti, F. - Fazzinga, B.: FOX: Inference of Approximate Functional Dependencies from XML Data. In DEXA'07, pages 10-14, Washington, DC, USA, 2007. IEEE.

[10] Fan, W.: XML Constraints: Specification, Analysis, and Applications. In DEXA'05, pages 805-809, 2005. IEEE.
 