Thesis (Selection of subject) (version: 368)
Thesis details
Extraction and representation of unified metadata from files and file systems based on data formats
Thesis title in Czech: Extrakce a reprezentace jednotných metadat ze souborů a souborových systémů na základě datových formátů
Thesis title in English: Extraction and representation of unified metadata from files and file systems based on data formats
Key words: RDF|formáty souborů|analýza formátu souborů|média|metadata|extrakce informací
English key words: RDF|file formats|file format analysis|media|metadata|information extraction
Academic year of topic announcement: 2021/2022
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: RNDr. Jakub Klímek, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 03.03.2022
Date of assignment: 03.03.2022
Confirmed by Study dept. on: 29.03.2022
Date and time of defence: 06.06.2023 09:00
Date of electronic submission: 24.04.2023
Date of submission of printed version: 09.05.2023
Date the defence took place: 06.06.2023
Opponents: RNDr. Martin Svoboda, Ph.D.
 
 
 
Guidelines
Many Internet archives of digital resources, such as the Internet Archive [1] or Wikimedia Commons [2], provide ways of annotating the data. However, they do not offer automated means of extracting the structures stored within the data itself and representing them in a non-proprietary form. Examples of such structures are the contents of file archives, image or music metadata, and resources within executable files.
The student will become familiar with the RDF data model [3] and with the standards for the representation of media types [4] and the identification of resources on the Internet [5][6][7].
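As an illustration of the identification standards cited above, the following minimal Python sketch derives two kinds of identifiers from raw file content: an "ni" URI as defined in RFC 6920 [7] and a "data" URL as defined in RFC 2397 [6]. The function names `ni_uri` and `data_url` and the default media type are assumptions made for this example only; they are not prescribed by the assignment.

```python
import base64
import hashlib


def ni_uri(data: bytes) -> str:
    """Name content by its SHA-256 digest in the RFC 6920 "ni" URI form."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 encodes the digest as base64url without trailing "=" padding.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # The authority part is left empty, hence the three slashes.
    return f"ni:///sha-256;{value}"


def data_url(data: bytes, media_type: str = "application/octet-stream") -> str:
    """Embed content directly in an RFC 2397 "data" URL using base64."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


if __name__ == "__main__":
    content = b"Hello World!"
    print(ni_uri(content))
    print(data_url(content, "text/plain"))
```

The "ni" form identifies content by hash and stays short regardless of file size, whereas the "data" form carries the content itself and is therefore only practical for small resources.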
The student will design, implement, document, evaluate and test an extensible tool for representing and describing data structures and metadata obtained via analysis of files based on their file format and content, supporting selected file formats. The result of the analysis will be represented in RDF, with emphasis on standardized or prevalent vocabularies [8]. The thesis will also include a couple of use cases for such a representation of the contents of files.
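To make the intended kind of output more concrete, the sketch below describes a single file as a handful of RDF triples serialized in N-Triples, using terms from Schema.org [8]. It is only an illustration under stated assumptions: the function `describe_file`, the helper `nt_literal`, the choice of `MediaObject`, `name`, `encodingFormat` and `contentSize`, and the use of an RFC 6920 "ni" URI as the subject are all choices made for this example, and the media type is supplied by the caller rather than detected from the content.

```python
import base64
import hashlib

SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_literal(text: str) -> str:
    """Quote a string as a plain N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_file(name: str, data: bytes, media_type: str) -> str:
    """Return N-Triples describing a file, keyed by a hash-based identifier."""
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    subject = f"<ni:///sha-256;{b64}>"  # assumption: content hash as subject
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SCHEMA}MediaObject>"),
        (subject, f"<{SCHEMA}name>", nt_literal(name)),
        (subject, f"<{SCHEMA}encodingFormat>", nt_literal(media_type)),
        # Size is given in bytes; the "B" unit suffix is a choice of this sketch.
        (subject, f"<{SCHEMA}contentSize>", nt_literal(f"{len(data)} B")),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(describe_file("hello.txt", b"Hello World!", "text/plain"))
```

Because the subject is derived from the content alone, two copies of the same file found in different archives would receive the same identifier, which is one possible way to link their descriptions.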
References
[1] Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine, https://archive.org/
[2] Wikimedia Commons, https://commons.wikimedia.org/
[3] RDF 1.1 Concepts and Abstract Syntax, W3C, https://www.w3.org/TR/rdf11-concepts/
[4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, DOI 10.17487/RFC2046, November 1996, <https://www.rfc-editor.org/info/rfc2046>.
[5] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005, <https://www.rfc-editor.org/info/rfc3986>.
[6] Masinter, L., "The "data" URL scheme", RFC 2397, DOI 10.17487/RFC2397, August 1998, <https://www.rfc-editor.org/info/rfc2397>.
[7] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013, <https://www.rfc-editor.org/info/rfc6920>.
[8] Schema.org, https://schema.org/
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html