Cpt S 471/571: LECTURE NOTES
Topic | Instructor Scribes/Lecture Notes | Other References |
Course introduction | PDF PPT. Please also read the course policies published on the course website. | |
Intro. to Comp. Bio. & Bioinformatics | Molecular Biology Primer | Additional slides: PDF PPT |
Inexact/Approximate Matching | | |
Sequence alignment introduction & Global and Local Alignments using Dynamic Programming | Sequence alignment. Global Alignment: Needleman-Wunsch algorithm PDF; Local Alignment: Smith-Waterman algorithm PDF | Additional notes: PDF; Example alignment: PPT; Handbook of Comp. Mol. Bio - Chapter 1 |
Alignment with Affine Gap Penalty Function | Affine Gap Alignment: Modeling and Algorithm PDF | Handbook of Comp. Mol. Bio - Chapter 1 |
Space-Optimal Global Alignment | Linear-space optimal alignment: Hirschberg's technique: PDF; Forward propagation technique: PDF | |
Semi-Global Alignment Algorithm | Motivating the case for identifying end-to-end overlaps: Genome Assembly; Identifying end-gap-free alignments: Semi-global Alignment PDF | |
K-band algorithm | k-banded algorithm for runtime reduction PDF | |
Edit Distance | Edit distance: algorithm and properties PDF | |
Exact Matching | | |
Exact Matching overview | Motivation for identifying exact matches PDF - A filtering-based approach - The longest common substring problem and its relation to genome assembly | |
Lookup tables | Fixed-length exact matching (using k-mers and lookup tables) PDF; Database search using BLAST: PDF PPT | Handbook of Comp. Mol. Bio - Chapter 5 |
String "Trie" data structures | Tries and PATRICIA trees PDF | Handbook of Comp. Mol. Bio - Chapter 5 |
Suffix Tree data structure | Suffix Trees: Definition and properties PDF | Handbook of Comp. Mol. Bio - Chapters 5, 6 |
Suffix Trees: Basic Applications | Suffix tree applications: PDF - Pattern matching - Longest repeat problem - Longest common substring problem and generalized suffix trees - Detecting palindromes for genomic applications - Suffix-prefix matches | Handbook of Comp. Mol. Bio - Chapters 5, 6 |
Suffix Trees: Construction | Naive algorithm, suffix links, and McCreight's algorithm for linear-time suffix tree construction. Example for S = BANANA$: PDF; Example for S = MISSISSIPPI$: PDF; Example for S = AAAAAAA$: slides; Complexity analysis of McCreight's algorithm (showing that it runs in linear time): PDF | Handbook of Comp. Mol. Bio - Chapters 5, 6 |
Lowest Common Ancestor Algorithm | Bender-Farach algorithm for constant-time LCA querying PDF | Handbook of Comp. Mol. Bio - Chapters 5, 6 |
Suffix Trees: More Applications | Maximal matches | Handbook of Comp. Mol. Bio - Chapters 5, 6 |
Coronavirus genome analysis (Project 3) | DNA fingerprinting and similarity matrix computation PDF | |
Suffix Arrays and Burrows-Wheeler Transform (BWT) | BWT Index PDF | Handbook of Comp. Mol. Bio - Chapter 7 |
Probabilistic Modeling for Biological Sequence Analysis | | |
Probabilistic Modeling, Markov Chains, Hidden Markov Models (HMMs) | Introduction to probabilistic modeling PDF; Probability primer and Markov chains; HMM definition, Viterbi decoding, Forward and Backward algorithms | Durbin et al. - Chapters 1-4 |
Genome-Scale Problems | | |
Genome assembly: algorithms and data structures | Handbook of Comp. Mol. Bio - Chapters 8, 9, 13 | |
Phylogenetic tree reconstruction | Gusfield - Chapter 17 | |
Course and test review | review |
GENERAL READING AND REFERENCES
- Selected chapters from the Handbook of Computational Molecular Biology (Aluru, 2006): Chapter 1, Chapter 5, Chapter 6
- Here is a good position paper by Sean Eddy about the general direction of computational biology & bioinformatics.
- A list of course-relevant journals in the area of bioinformatics and computational biology:
  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
  - Journal of Computational Biology