Efficient representation of k-mer sets
Thesis title in Czech: | Efektivní reprezentace množin k-merů |
---|---|
Thesis title in English: | Efficient representation of k-mer sets |
Key words: | množiny k-merů|nejkratší nadřetězec|bioinformatika|hladový algoritmus |
English key words: | k-mer sets|shortest superstring|bioinformatics|greedy algorithm |
Academic year of topic announcement: | 2021/2022 |
Thesis type: | Bachelor's thesis |
Thesis language: | English |
Department: | Computer Science Institute of Charles University (32-IUUK) |
Supervisor: | Mgr. Pavel Veselý, Ph.D. |
Author: | hidden |
Date of registration: | 27.07.2022 |
Date of assignment: | 04.08.2022 |
Confirmed by Study dept. on: | 15.02.2023 |
Date and time of defence: | 07.09.2023 09:00 |
Date of electronic submission: | 12.07.2023 |
Date of submission of printed version: | 12.07.2023 |
Date of completed defence: | 07.09.2023 |
Opponents: | doc. Mgr. Petr Kolman, Ph.D. |
Advisors: | Karel Břinda |
Guidelines |
This thesis will focus on efficient representations of sets of k-mers, which are substrings of length k obtained from a DNA sequence. The student will study state-of-the-art methods from the literature (e.g., simplitigs) and compare them experimentally with approximation algorithms for the shortest superstring problem. A further aim is to generalize existing concepts for representing k-mer sets into a single overarching definition.
The thesis is a continuation of the student's Individual Software Project. |
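As a rough illustration of the objects named in the guidelines, the sketch below extracts the k-mer set of a DNA string and then builds a superstring of those k-mers with the textbook greedy heuristic (repeatedly merge the two strings with the largest overlap). It is not taken from the thesis. The function names `kmer_set`, `overlap`, and `greedy_superstring` are hypothetical, the quadratic search is chosen for readability rather than speed, and reverse complements are ignored as a simplifying assumption.

```python
def kmer_set(sequence, k):
    """Return the set of all substrings of length k (k-mers) of sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge the ordered pair with maximum overlap.

    The result contains every input string as a substring, but it is only an
    approximation of the shortest superstring, not guaranteed to be optimal.
    """
    pool = sorted(strings)  # sorted so that ties are broken deterministically
    while len(pool) > 1:
        ov, i, j = max(
            ((overlap(a, b), i, j)
             for i, a in enumerate(pool)
             for j, b in enumerate(pool) if i != j),
            key=lambda t: t[0],
        )
        merged = pool[i] + pool[j][ov:]
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)]
        pool.append(merged)
    return pool[0] if pool else ""


kmers = kmer_set("ACGTAC", 3)
superstring = greedy_superstring(kmers)
print(sorted(kmers), superstring)
```

A single superstring is the most constrained kind of plain-text representation. Simplitigs and related representations instead store the k-mer set as several strings.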
References |
K Břinda, M Baym, G Kucherov: Simplitigs as an efficient and scalable representation of de Bruijn graphs. Genome biology, 2021.
A Rahman, P Medvedev: Representation of k-mer Sets Using Spectrum-Preserving String Sets. International Conference on Research in Computational Molecular Biology, 2020.
S Schmidt, S Khan, J Alanko, AI Tomescu: Matchtigs: minimum plain text representation of kmer sets. bioRxiv, 2021.
D Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997. |