CPT S 580: FUNDAMENTAL ALGORITHMS IN COMPUTATIONAL GENOMICS
SPRING 2008
LOCATION: Sloan 32
TIME: MW 16:15-17:30
INSTRUCTOR: ANANTH KALYANARAMAN, EME 237, Pullman
CONTACT EMAIL: ananth@eecs.wsu.edu
ANNOUNCEMENTS
- (4/16) Today' class is in SLOAN 9.
- (4/3) Survey presentation schedules are posted below. The location is SLOAN 175.
- (4/1) Presentation Dates: 4/21 and 4/23. Final project report due on 5/2 by 9am.
- (3/19) Survey project topics and guidelines have been posted here
- (3/19) HW3 has been posted below
PREREQUISITES
* Familiarity with design and analysis of algorithms is required (CptS 450 or
equivalent).
* Familiarity with basic probability concepts.
* NO prior background in biology is assumed. Basic concepts required to understand and appreciate the importance of a problem will be described on a need-to-know basis, while the primary emphasis will be on the design of algorithmic techniques and applying them on real data.
* Some programming experience in modular/OOP languages (eg., C/C++). This is not a requirement although it is one that may prove handy for describing algorithms in a pseudocode format.
Interested students who do not meet the above criteria are highly encouraged to contact the instructor to discuss the possibility of enrolling into the course. The course is also open to CptS/CptE undergraduate seniors satisfying the above criteria, with prior permission from the instructor.
REFERENCE TEXTS
The primary course materials will be lecture slides, handouts and class notes. The following texts are suggested for further reference and reading.
* Handbook of Computational Molecular Biology, Srinivas Aluru (Editor), 2005, ISBN: 1584884061
* Biological Sequence Analysis: Probabilistic Models of Protein and Nucleic Acids, Durbin et al. 1999, ISBN: 0521629713
* Algorithms on strings, trees and sequences: Computer Science and Computational Biology, Dan Gusfield, 1997, ISBN: 0521585198
COURSE POLICY
The grade policy will be: 15% midterm exam, 45% homeworks, 35% survey project, 5% class participation.
Homework policy: All homeworks and assignments should be done individually unless it is explicitly stated that collaboration is allowed in the homework problem set. "Collaboration" is defined as a constructive discussion involving a maximum of 3 individuals, with the discussion aimed at obtaining better understanding of the problem and general/broad approach ideas that can lead to a solution. No sharing of answers is permitted and the writing should be your own. Collaboration should be acknowledged in the answer sheet and no points will be deducted for that. Any deviation from this policy will be treated as cheating and will be subject to academic dishonesty code.
Exam policy: Closed-book, closed-notes, comprehensive
HOMEWORKS
- HW1 PDF
- HW2 PDF
- HW3 PDF
SURVEY PROJECTS
Survey project topics and guidelines have been posted here
Final Presentation Schedule: (Sloan 175)
4/21/2007 ********* 16:15-16:35 - Linmin Yang "Gene Regulatory Networks" 16:40-17:00 - Vandhana Krishnan "Repeat Identification"
17:05-17:25 - Gaurav Kulkarni "De novo Peptide Identification"
4/23/2007 *********
16:15-16:35 - Pawan Agarwal "Haplotype Inference" 16:40-17:00 - Geetika Singla "Machine Learning for Bioinformatics"
COURSE SYLLABUS
Course introduction (PPT )
Introduction to computational biology and bioinformatics (PPT)
- Sequence alignments
- dynamic programming (Handout of an example alignment)
- alignment models: global/local/semi-global/affine gap penalty
- space-optimal global alignment
- K-band algorithm
- Reading: Aluru - Chapter 1
Probabilistic sequence alignment
- Hidden Markov Models
- Viterbi decoding, Forward and Backward algorithms
- HMM-based sequence alignment
- Reading: Durbin et al. - Chapters 1-4
Exact Matching
- Look-up tables and string indices
- Suffix trees, construction algorithms (McCreights, Weiner, Ukkonen)
- Suffix tree applications: Pattern matching, longest common substring, LCA (Bender-Farach), RNAi elements
- Suffix array and LCP array, space efficiency
- Reading: Aluru - Chapters 5, 6
Genome scale problems
- Sequence assembly, genome sequencing, Lander-Waterman model
- Comparative genomics, syntenic alignments, MUMer
- Reading: Aluru - Chapters 8,9, 13
Phylogenetics
- Evolutionary trees, ultrametric trees
- The perfect phylogeny problem
- Reading: Gusfield - Chapter 17
GENERAL READING AND REFERENCES
- Here is a good position paper by Sean Eddy about the general direction of computational biology & bioinformatics.
- A hardly comprehensive but course-relevant list of Journals in the area:
Journals:
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Journal of Computational Biology
ARCHIVE OF OLD COURSE WEBPAGES
- (2/20) HW2 has been posted below
- (1/30) HW1 has been posted below
- (1/18) LOCATION CHANGED:
The new course location is Sloan 32 for the rest of this semester.