Thesis (Selection of subject) (version: 348)
Assignment details
Efficient representation of k-mer sets
Thesis title in Czech: Efektivní reprezentace množin k-merů
Thesis title in English: Efficient representation of k-mer sets
Key words: množiny k-merů|nejkratší nadřetězec|bioinformatika|hladový algoritmus
English key words: k-mer sets|shortest superstring|bioinformatics|greedy algorithm
Academic year of topic announcement: 2021/2022
Type of assignment: Bachelor's thesis
Thesis language: English
Department: Computer Science Institute of Charles University (32-IUUK)
Supervisor: Mgr. Pavel Veselý, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 27.07.2022
Date of assignment: 04.08.2022
Confirmed by Study dept. on: 15.02.2023
Advisors: Karel Břinda
This thesis will focus on efficient representations of k-mer sets, where a k-mer is a substring of length k of a DNA sequence. The student will study state-of-the-art methods from the literature (e.g., simplitigs) and compare them experimentally with approximation algorithms for the shortest superstring problem. A further aim is to generalize existing concepts for representing k-mer sets into a single overarching definition.

The thesis is a continuation of the student's Individual Software Project.
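
As a rough illustration of the objects named in the assignment, the following Python sketch builds the k-mer set of a short DNA string and then applies the textbook greedy heuristic for the shortest superstring problem (repeatedly merging the two strings with the largest overlap). The function names `kmer_set`, `overlap`, and `greedy_superstring` are hypothetical and chosen for this example only; the code is a quadratic-time sketch under these assumptions, not the method of the thesis or of any cited paper.

```python
from __future__ import annotations


def kmer_set(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(kmers: set[str]) -> str:
    """Greedy heuristic: repeatedly merge the ordered pair with the largest overlap.

    Illustrative only; every input k-mer ends up as a substring of the result,
    but the result is not guaranteed to be a shortest superstring.
    """
    strings = sorted(kmers)  # sorted so that tie-breaking is deterministic
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    ov = overlap(a, b)
                    if ov > best[0]:
                        best = (ov, i, j)
        ov, i, j = best
        merged = strings[i] + strings[j][ov:]
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0] if strings else ""


if __name__ == "__main__":
    kmers = kmer_set("ACGTACGA", 3)
    superstring = greedy_superstring(kmers)
    print(sorted(kmers), superstring)
```

A superstring built this way may contain additional k-mers that are not in the input set, which is one way in which it differs from spectrum-preserving representations such as simplitigs.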
K Břinda, M Baym, G Kucherov: Simplitigs as an efficient and scalable representation of de Bruijn graphs. Genome Biology, 2021.
A Rahman, P Medvedev: Representation of k-mer Sets Using Spectrum-Preserving String Sets. International Conference on Research in Computational Molecular Biology, 2020.
S Schmidt, S Khan, J Alanko, AI Tomescu: Matchtigs: minimum plain text representation of kmer sets. bioRxiv, 2021.
D Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.