Cpt S 471/571: LECTURE NOTES

PDF links to lecture notes/scribes will be updated as the course progresses.

Topic Instructor Scribes/Lecture Notes Other References
Course introduction
PDF PPT
Please also go through the course policies published on the course website.


Intro. to Comp. Bio & Bioinformatics Molecular Biology Primer
 
Additional Slides: PDF  PPT
Inexact/Approximate Matching
Sequence alignment introduction
& Global and Local Alignments
using Dynamic Programming
 Sequence alignment:
    Global Alignment: Needleman-Wunsch algorithm  PDF
    Local Alignment: Smith-Waterman algorithm  PDF
Additional notes: PDF   
Example alignment: PPT

Handbook of Comp. Mol. Bio - Chapter 1
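
The global alignment recurrence named above can be illustrated with a minimal sketch. It is not taken from the lecture notes: the function name and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions chosen only for illustration, and it computes the optimal score only, not the alignment itself.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of s and t under a linear gap penalty.

    Scoring parameters are illustrative assumptions, not course-mandated values.
    """
    m, n = len(s), len(t)
    # Row 0: aligning the empty prefix of s with t[:j] costs j gaps.
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: aligning s[:i] with the empty prefix of t costs i gaps.
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align s[i-1] with t[j-1]
                prev[j] + gap,      # s[i-1] aligned against a gap
                curr[j - 1] + gap,  # t[j-1] aligned against a gap
            )
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows of the table gives O(n) space for the score; recovering the alignment itself needs either the full table or a linear-space technique such as Hirschberg's.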
Alignment with Affine Gap Penalty function
Affine Gap Alignment: Modeling and Algorithm  PDF Handbook of Comp. Mol. Bio - Chapter 1
Space-Optimal Global Alignment Linear space optimal alignment:
     Hirschberg's technique:  PDF
     Forward propagation technique: PDF

Semi-Global algorithm Motivating the case for identifying end-to-end overlaps:
     Genome Assembly
Identifying end-gap-free alignments: Semi-global Alignment     PDF

K-band algorithm
k-banded algorithm for runtime reduction  PDF


Edit Distance
Edit distance: algorithm and properties PDF

Exact Matching
Exact Matching overview
 Motivation for identifying exact matches   PDF
   - A filtering based approach and longest common substring problem
          & relation to genome assembly   


Look-up tables
Fixed-length exact matching (using k-mers and lookup tables)   PDF
    
Database search using BLAST:  PDF  PPT
   

Handbook of Comp. Mol. Bio - Chapter 5
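
As a rough illustration of fixed-length exact matching with a lookup table, the sketch below indexes every k-mer of a text by its start positions and uses the first k characters of a pattern as a seed. The function names `build_kmer_index` and `find_exact` are hypothetical, and the dictionary-based table is an assumption for readability, not the layout used in the lecture notes.

```python
from collections import defaultdict


def build_kmer_index(text, k):
    """Map each k-mer of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def find_exact(index, text, pattern, k):
    """Return start positions of pattern in text (requires len(pattern) >= k).

    The first k characters of the pattern act as a seed; each seed hit
    is then verified against the full pattern.
    """
    hits = []
    for pos in index.get(pattern[:k], []):
        if text.startswith(pattern, pos):
            hits.append(pos)
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    idx = build_kmer_index(genome, 3)
    print(find_exact(idx, genome, "ACGT", 3))
```

The seed-then-verify pattern is the same filtering idea that seed-based database search tools build on, although real tools use far more elaborate scoring and extension steps.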
String "Trie" data structures
Tries and PATRICIA trees   PDF
Handbook of Comp. Mol. Bio - Chapter 5
Suffix Tree  data structure
Suffix Trees - Definition and properties   PDF Handbook of Comp. Mol. Bio - Chapters 5, 6
Suffix Trees: Basic Applications
Suffix tree applications:   PDF
   - Pattern matching
   - Longest repeat problem
   - Longest Common Substring problem and Generalized suffix trees
   - Detecting palindromes for genomic applications
   - Suffix-Prefix Matches


Handbook of Comp. Mol. Bio - Chapters 5, 6
Suffix Trees: Construction Naive algorithm, suffix links, and McCreight's algorithm for
     linear-time suffix tree construction  PDF

Example for S = BANANA$    PDF
Example for S = MISSISSIPPI$   PDF
Example for S = AAAAAAA$    slides

Complexity analysis for McCreight's algorithm (showing that it is linear time)  PDF
Handbook of Comp. Mol. Bio - Chapters 5, 6
Lowest Common Ancestor Algorithm
Bender and Farach-Colton algorithm for constant-time LCA querying  PDF
Handbook of Comp. Mol. Bio - Chapters 5, 6
Suffix Trees: More Applications
Maximal matches  Handbook of Comp. Mol. Bio - Chapters 5, 6
Coronavirus genome analysis (Project 3)
   - DNA fingerprinting and similarity matrix computation  PDF
Suffix Arrays and Burrows Wheeler Transform (BWT)
BWT Index PDF
Handbook of Comp. Mol. Bio - Chapter 7
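
For orientation, the Burrows-Wheeler Transform can be sketched naively by sorting all rotations of the text. This is an illustrative assumption-laden version, not the indexed construction covered in the notes: it assumes the text ends with a unique sentinel `$` that sorts before every other character, and both function names are hypothetical.

```python
def bwt(text):
    """Naive BWT: last column of the sorted rotation matrix.

    Assumes text ends with a unique sentinel '$' that is smaller than
    every other character. Quadratic space; for illustration only.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last):
    """Naive inversion: repeatedly prepend the BWT column and re-sort."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original text is the row that ends with the sentinel.
    return next(row for row in table if row.endswith("$"))


if __name__ == "__main__":
    transformed = bwt("BANANA$")
    print(transformed)
    print(inverse_bwt(transformed))
```

Practical BWT indexes derive the transform from a suffix array and invert it with the last-to-first mapping rather than by re-sorting a table.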
Probabilistic Modeling for Biological Sequence Analysis
Probabilistic Modeling, Markov Chains, Hidden Markov Models (HMMs)
Introduction to probabilistic modeling      PDF
Probabilities primer and Markov Chains
HMM definition, Viterbi decoding, Forward and Backward algorithms
Durbin et al. - Chapters 1-4
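
Viterbi decoding can be sketched on a small two-state HMM. Everything specific here is an assumption made for illustration: the fair/loaded die model (states `F` and `L`), the start, transition and emission probabilities, and the function name `viterbi`. The sketch works in log space to avoid numerical underflow on long sequences.

```python
import math

# Illustrative two-state model: a fair die (F) and a loaded die (L).
# All probabilities below are assumptions chosen for the example.
STATES = "FL"
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def viterbi(obs):
    """Return the most probable state path for a list of die rolls."""
    if not obs:
        return ""
    # score[s] = log-probability of the best path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for x in obs[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = best
            new_score[s] = (
                score[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
            )
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi([1, 3, 6, 6, 6, 6, 6, 2, 4]))
```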
Genome-scale problems
Genome assembly: algorithms and data structures

Handbook of Comp. Mol. Bio - Chapters 8, 9, 13
Phylogenetic tree reconstruction

Gusfield - Chapter 17
Course and test review  review


GENERAL READING AND REFERENCES

    - Selected chapters from the Handbook of Computational Molecular Biology (Aluru, 2006): Chapter 1, Chapter 5, Chapter 6

    - Here is a good position paper by Sean Eddy about the general direction of computational biology & bioinformatics.

    - A list of course-relevant journals in the area of bioinformatics and computational biology:

            Bioinformatics

            BMC Bioinformatics

            Genome Research

            IEEE/ACM Transactions on Computational Biology and Bioinformatics

            Journal of Computational Biology

            Nucleic Acids Research

            PLoS Computational Biology