Extraction and representation of unified metadata from files and file systems based on data formats
Thesis title in Czech: | Extrakce a reprezentace jednotných metadat ze souborů a souborových systémů na základě datových formátů |
---|---|
Thesis title in English: | Extraction and representation of unified metadata from files and file systems based on data formats |
Key words: | RDF|formáty souborů|analýza formátu souborů|média|metadata|extrakce informací |
English key words: | RDF|file formats|file format analysis|media|metadata|information extraction |
Academic year of topic announcement: | 2021/2022 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | RNDr. Jakub Klímek, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 03.03.2022 |
Date of assignment: | 03.03.2022 |
Confirmed by Study dept. on: | 29.03.2022 |
Date and time of defence: | 06.06.2023 09:00 |
Date of electronic submission: | 24.04.2023 |
Date of submission of printed version: | 09.05.2023 |
Date of completed defence: | 06.06.2023 |
Opponents: | RNDr. Martin Svoboda, Ph.D. |
Guidelines |
Many Internet archives of digital resources, such as the Internet Archive [1] or Wikimedia Commons [2], provide ways of annotating the data but offer no automated means of extracting the structures stored within the data itself and representing them in a non-proprietary form. Examples of such structures include the contents of file archives, image or music metadata, and resources within executable files.
The student will become familiar with the RDF data model [3], the standards for representing media types [4], and the standards for identifying resources on the Internet [5][6][7]. The student will design, implement, document, evaluate, and test an extensible tool that represents and describes data structures and metadata obtained by analysing files according to their file format and content, with support for selected file formats. The results of the analysis will be represented in RDF, with emphasis on standardized or prevalent vocabularies [8]. The thesis will also include a couple of use cases for such a representation of file contents. |
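As an illustration of how the cited standards could fit together, the following minimal sketch names a file's content with an RFC 6920 "ni" URI, embeds the same content in an RFC 2397 "data" URL, and emits a few N-Triples statements using Schema.org terms. It is not the tool described in the guidelines: the function names (`ni_uri`, `data_uri`, `describe_file`) and the choice of the `MediaObject` class and its properties are assumptions made only for this example.

```python
import base64
import hashlib

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "https://schema.org/"


def ni_uri(data: bytes) -> str:
    """Name the content by its SHA-256 digest (RFC 6920, empty authority)."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url encoding without the trailing "=" padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{encoded}"


def data_uri(data: bytes, media_type: str) -> str:
    """Embed the content itself in a base64-encoded "data" URL (RFC 2397)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


def literal(text: str) -> str:
    """Serialize a plain string literal for N-Triples (basic escapes only)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_file(name: str, data: bytes, media_type: str) -> list:
    """Return N-Triples lines describing one file (hypothetical vocabulary choice)."""
    subject = f"<{ni_uri(data)}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}MediaObject> .",
        f"{subject} <{SCHEMA}name> {literal(name)} .",
        f"{subject} <{SCHEMA}encodingFormat> {literal(media_type)} .",
        f"{subject} <{SCHEMA}contentSize> {literal(str(len(data)))} .",
    ]


if __name__ == "__main__":
    content = b"Hello World!"
    print(data_uri(content, "text/plain"))
    for triple in describe_file("hello.txt", content, "text/plain"):
        print(triple)
```

Using the hash-based URI as the RDF subject means that identical content found in different places (for example, the same image inside two archives) receives the same identifier, while the "data" URL is practical only for small embedded resources.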
References |
[1] Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine, https://archive.org/
[2] Wikimedia Commons, https://commons.wikimedia.org/
[3] RDF 1.1 Concepts and Abstract Syntax, W3C, https://www.w3.org/TR/rdf11-concepts/
[4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, DOI 10.17487/RFC2046, November 1996, <https://www.rfc-editor.org/info/rfc2046>.
[5] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005, <https://www.rfc-editor.org/info/rfc3986>.
[6] Masinter, L., "The "data" URL scheme", RFC 2397, DOI 10.17487/RFC2397, August 1998, <https://www.rfc-editor.org/info/rfc2397>.
[7] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013, <https://www.rfc-editor.org/info/rfc6920>.
[8] Schema.org, https://schema.org/ |