Course, academic year 2023/2024
Machine Learning in Bioinformatics - NAIL107
Title: Strojové učení v bioinformatice
Guaranteed by: Department of Software and Computer Science Education (32-KSVI)
Faculty: Faculty of Mathematics and Physics
Valid: from 2020
Semester: summer
E-Credits: 5
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: RNDr. František Mráz, CSc.
Is incompatible with: NAIX107
Is interchangeable with: NAIX107
Annotation -
Last update: G_I (23.05.2014)
Traditional computer science techniques and algorithms fail to solve complex biological problems. Machine learning techniques, however, can be applied to analyse and process huge volumes of biological data. The lecture presents several areas where machine learning is used to process biological data. Students are expected to know the basics of bioinformatics, which they can learn in the course Bioinformatics Algorithms (NTIN084) or in a similar course at another school.
Course completion requirements -
Last update: RNDr. František Mráz, CSc. (17.02.2020)

A) The seminar

Assignments and quizzes will be published step by step in an accompanying Moodle course.

Assignments:

Each assignment has a deadline by which it should be submitted for grading. A draft solution can be edited at any time, but the time of submission is the time you click the "Submit solution" button. After clicking this button you can no longer edit your submission, but you can ask your teacher by e-mail to return the assignment to the draft state. The teacher will grade each submitted assignment with 0-10 points. During the semester, you will solve 4 assignments.

A typical solution will consist of a text describing the solution and the code of the program or script used to solve the assignment. Submit the text as a PDF file or, alternatively, as an RTF file; submit the source code as plain ASCII files. Alternatively, the description and the code can be submitted together in a single file in the form of a Jupyter notebook.

Warning: If N≥2 participants of the course submit solutions that are very similar or identical, all these solutions will be treated as a single solution. It will be graded with B points according to its quality, and each student who submitted it will receive only the integer part of B/N points.
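The point-sharing rule can be illustrated with a short Python sketch. The function name `shared_points` is hypothetical and not part of the course materials; the code only restates the arithmetic "integer part of B/N" under the assumption that B is a non-negative whole number of points.

```python
def shared_points(b: int, n: int) -> int:
    """Points each student receives when n students submit the same solution.

    b -- points awarded to the shared solution (0-10)
    n -- number of students who submitted it
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if b < 0:
        raise ValueError("b must be non-negative")
    # Integer part of B/N; for non-negative integers this is floor division.
    return b // n


# A solution worth 9 points submitted by 2 students: each of them gets 4 points.
print(shared_points(9, 2))
```

Note that with N=1 the function simply returns B, so an independently written solution keeps its full score.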

Quizzes:

Besides the assignments, you will solve several short on-line quizzes during the term, worth at most 10 points altogether. Each quiz will also have a deadline. In contrast to assignments, a quiz cannot be solved after its deadline.

To obtain credits for the seminar, it is necessary:

  1. To solve all the assignments and to obtain at least 1 point for each solution. WARNING: a late submission will be penalized by 1 point for each started week of delay after the deadline.
  2. To prepare a term project and present it in a seminar in the last week of the term, or on a date during the following exam period that will be set in a seminar in the last week of the term. The topic of the project will be discussed in a seminar in the middle of the term. Each project will be graded with up to 15 points according to its quality.
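The late-submission penalty from condition 1 can be sketched in Python as follows. The function names `late_penalty` and `penalized_points` are hypothetical, and the sketch assumes that a score cannot drop below 0, which the rules above do not state explicitly.

```python
from datetime import datetime, timedelta


def late_penalty(deadline: datetime, submitted: datetime) -> int:
    """Points deducted: 1 for each started week of delay after the deadline."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # submitted on time
    full_weeks, remainder = divmod(delay, timedelta(weeks=1))
    # A partially elapsed week counts as a started week.
    return full_weeks + (1 if remainder else 0)


def penalized_points(graded: int, deadline: datetime, submitted: datetime) -> int:
    """Score after the penalty (assumption: never below 0)."""
    return max(0, graded - late_penalty(deadline, submitted))


deadline = datetime(2024, 3, 10, 23, 59)
submitted = datetime(2024, 3, 19, 8, 0)  # a little over 8 days late
print(penalized_points(8, deadline, submitted))  # 2 started weeks, so 8 - 2 = 6
```

Because at least 1 point per assignment is required for the credit, a heavily delayed submission can fail this condition even if the solution itself is correct.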

The quizzes are not among the necessary conditions for obtaining credits for the seminar. During seminars, it is possible to obtain additional points:

  • for demonstrating a solution of a problem assigned during a seminar - 1 point,
  • for demonstrating a solution submitted for an assignment in Moodle (after its deadline) - the integer part of half the number of points the teacher awarded for the solution.

All points obtained during the seminars will account for up to 40% of the final score of the exam.

Continuous work throughout the whole term is required to obtain the credits; therefore, there will be no additional opportunities to acquire them later.

B) The lecture

As mentioned above, points acquired in the seminar will account for up to 40% of the final score for the exam. The exam at the end of the term will contribute up to the remaining 60% of the final score. The following table gives the final grade according to the achieved score:

Grade:  grade 1     grade 2    grade 3    failure
Score:  100%–86%    85%–71%    70%–56%    less than 56%
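The grading scheme can be sketched in Python as follows. The function name `final_grade` is hypothetical. The sketch assumes the seminar contribution has already been converted to a value between 0 and 40 percentage points (the conversion from seminar points is not specified here), and that fractional totals fall into a band by its lower bound.

```python
def final_grade(seminar_pct: float, exam_pct: float) -> str:
    """Map the seminar share (0-40) and the exam share (0-60) to a final grade."""
    if not 0 <= seminar_pct <= 40:
        raise ValueError("seminar share must be between 0 and 40")
    if not 0 <= exam_pct <= 60:
        raise ValueError("exam share must be between 0 and 60")
    total = seminar_pct + exam_pct
    # Assumption: each band is entered at its lower bound (86, 71, 56).
    if total >= 86:
        return "grade 1"
    if total >= 71:
        return "grade 2"
    if total >= 56:
        return "grade 3"
    return "failure"


print(final_grade(32, 45))  # 77% in total, so "grade 2"
```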

Literature -
Last update: RNDr. František Mráz, CSc. (09.09.2015)

[1] Mitchell, T.: Machine Learning. McGraw Hill, 1997.

[2] Kinser, J.: Python for bioinformatics. Jones and Bartlett Publishers, Sudbury, Massachusetts, 2009.

[3] Inza, I., Calvo, B., Armañanzas, R., Bengoetxea, E., Larrañaga, P., Lozano, J.A.: Machine learning: an indispensable tool in bioinformatics. Methods Mol Biol. 2010;593:25-48.

[4] Yang, Z. R.: Machine learning approaches to bioinformatics. Science, Engineering, and Biology Informatics - Vol. 4. World Scientific, 2010.

[5] Zhang, Y., Rajapakse, J. C.: Machine learning in bioinformatics. Wiley series on bioinformatics, Wiley, Hoboken, N.J., 2009.

[6] Alpaydin, E.: Introduction to machine learning. 3rd ed., The MIT Press, 2014.

Syllabus -
Last update: G_I (23.05.2014)

1. Data preprocessing.

2. How to compare machine learning algorithms.

3. Methods of supervised learning: classification (decision trees, Bayesian classifiers, logistic regression, discriminant analysis, nearest neighbour, support vector machines, neural networks, combination of classifiers - boosting) and their applications in genomics, proteomics and systems biology.

4. Methods of unsupervised learning: clustering (partition clustering, k-means, hierarchical clustering, validation of clustering) and its application in bioinformatics.

5. Probabilistic graphical models (Bayesian networks, Gaussian networks) and their applications (in genomics and systems biology).

6. Optimization and its application in bioinformatics.

The lecture is accompanied by a seminar, where the methods from the lecture will be applied to real and artificial biological data. The algorithms will be implemented mainly in Python, an interactive language, using libraries for machine learning and for processing biological data. The seminar concludes with student projects.

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html