CPT S 571 SURVEY PROJECT (Spring 2017)

(applicable only for graduate students - i.e., students registered for CPT S 571 credits)

Important Dates:

April 1, 2017: All Survey Projects decided in consultation with instructor

April {19, 21, 24, 26, 28}: Oral presentations

Presentation Schedule

April 19 (Wednesday):    Lei Cai, Arnab Mishra

Di Lena, Pietro, Ken Nagata, and Pierre Baldi. "Deep architectures for protein contact map prediction." Bioinformatics 28, no. 19 (2012): 2449-2457.

Borgwardt, Karsten M., Cheng Soon Ong, Stefan Schönauer, S. V. N. Vishwanathan, Alex J. Smola, and Hans-Peter Kriegel. "Protein function prediction via graph kernels." Bioinformatics 21, no. suppl 1 (2005): i47-i56.

April 21 (Friday):    Hongyang Gao, Saghan Mudhbari, Hao Yuan


Saeys, Yvan, Iñaki Inza, and Pedro Larrañaga. "A review of feature selection techniques in bioinformatics." Bioinformatics 23, no. 19 (2007): 2507-2517.


Müller, Hans-Michael, Eimear E. Kenny, and Paul W. Sternberg. "Textpresso: an ontology-based information retrieval and extraction system for biological literature." PLoS Biol 2, no. 11 (2004): e309.


Wang, Sheng, Jian Peng, Jianzhu Ma, and Jinbo Xu. "Protein secondary structure prediction using deep convolutional neural fields." Scientific reports 6 (2016).

April 24 (Monday):    Ankita Tanwar, Zhengyang Wang, Aditi Thuse

Rodriguez-Esteban, Raul. "Biomedical text mining and its applications." PLoS Comput Biol 5, no. 12 (2009): e1000597.


Alipanahi, Babak, Andrew Delong, Matthew T. Weirauch, and Brendan J. Frey. "Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning." Nature biotechnology 33, no. 8 (2015): 831-838.

Abuín, José M., Juan C. Pichel, Tomás F. Pena, and Jorge Amigo. "SparkBWA: speeding up the alignment of high-throughput DNA sequencing data." PLoS ONE 11, no. 5 (2016): e0155461.

April 26 (Wednesday):    James Irwin, Md. Omar Faruk, Md. Rakib Islam

Chuang, Li-Yeh, Hsiu-Chen Huang, Ming-Cheng Lin, and Cheng-Hong Yang. "Particle swarm optimization with reinforcement learning for the prediction of CpG islands in the human genome." PLoS ONE 6, no. 6 (2011): e21036.

Kalaitzis, Alfredo A., and Neil D. Lawrence. "A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression." BMC bioinformatics 12, no. 1 (2011): 180.



Moore, Jason H., Folkert W. Asselbergs, and Scott M. Williams. "Bioinformatics challenges for genome-wide association studies." Bioinformatics 26, no. 4 (2010): 445-455.

April 28 (Friday):    Reza Chowdhury, Dan Jinguji, Alex Joens


Choi, Jeong-Hyeon, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert, and John K. Colbourne. "A machine-learning approach to combined evidence validation of genome assemblies." Bioinformatics 24, no. 6 (2008): 744-750.

Sastry, Anand, Jonathan Monk, Hanna Tegel, Mathias Uhlén, Bernhard O. Palsson, Johan Rockberg, and Elizabeth Brunk. "Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression." Bioinformatics (Oxford, England) (2017).

Survey Project Report Details:

May 4, 2017: Final Survey Report due via OSBLE+ (by NOON, Thursday, May 4, 2017).

For this survey project, you will work as individuals, pick one paper that is representative of a subtopic in the broader area of bioinformatics and computational biology, and conduct a thorough evaluation of the paper. More specifically, there will be two components for this project. One is an oral presentation in the class, and the other is a written report. Details follow:

Component 1) Oral presentation (schedule of presentations):

The oral presentation should provide an overview of the whole paper to the audience. Although I don't want to enforce a rigid format, I expect all presentations to contain the following SIX core components (preferably in this order; some degree of freedom allowed):
I) Introduction and scientific motivation, II) A clear statement of the contributions made in the paper, III) Related body of work and its limitations, IV) A clear statement of the problem definition (what is the input, what is the output), V) Main ideas of the solution(s) proposed (I understand that it may not be possible to cover low-level algorithmic details in the time provided, but make sure you describe at least the meat of the algorithmic ideas in the paper), VI) Experimental results and any related assessment.

Presentations will be scheduled to happen in class during the weeks of April 18 and April 25. The exact schedule will be decided and announced in early April, once all paper choices have been made.

The allotted duration for each presentation will be 15 minutes (13 mins talk + 2 mins Q&A). Presentations must finish within their allotted time. The presentation itself should be in either PowerPoint or PDF and should be emailed to me (ananth@eecs.wsu.edu) by 5 PM on the *day before* your presentation. I will then make the slides available online on the course website. For presentations, please bring your own laptop. If you need me to bring a laptop for you, please let me know at least a day in advance.

Some general tips to make your presentation effective:

- The slide material should be prepared keeping in mind that the audience is this class of students. Use the knowledge of what was covered and what was not covered in the course so far as a way to judge how to present the material to the class.

- Be creative in the way you organize your slides. E.g., don't waste time discussing an outline of your talk since this is a short talk; instead, use the time to motivate the scientific problem. There also needs to be a logical flow from one slide to the next. Abrupt transitions often lead to more questions than answers. Also, don't read directly from the slides. Use the slide material to support your verbal statements.

- Use figures instead of text wherever possible. Slides crowded with text are highly discouraged. Also, use animations to guide your presentation wherever possible.

- The main goal of the presentation is to educate the students and instructor about the paper you read. There will not be any time to go into a critical evaluation of the paper during the oral presentation, so focus instead on describing the problem and its methods in detail.

Component 2) Written Report (PDF only, SUBMIT in OSBLE+'s dropbox by NOON, THURSDAY, MAY 4, 2017):

The written report should not exceed 3 single-spaced pages in 11-point font. This includes space for the main text alongside figures and tables. For the bibliography (alphabetically listed), you can use up to an extra 2 pages.

The written report should be entirely yours (no text taken from the paper you read), and should primarily contain a critique of the paper you reviewed with the following Section layout addressing the stated points within them:

Section I: Significance of the contributions: In what ways do you think the paper's contributions are important? How did it advance the state of the art at its time of publication? Are the contributions still relevant today (i.e., do you see evidence of the method still being used and/or cited), or have they been superseded by more recent methods? What are the primary computational challenges of the underlying problem? Is the complexity of the proposed method justified given the computational challenge(s) of the problem (or do you think there could be a much simpler solution)? Is the article technically sound (i.e., correct) or do you think there are some errors/flaws? Are there any constraints or assumptions underlying the method that would limit its applicability in practice? If it makes it any easier, you can organize this particular section as one of "Strengths" vs. "Weaknesses". In any case, make sure you touch upon all the above points in the writeup for this section.

Section II: A brief survey of related algorithms: What other algorithms are there on this topic? How are they different from this paper's method - e.g., in scope or effectiveness or performance?

Section III: Availability of software and tools: Is there any software or implementation toolkit available for the method proposed in this paper? If so, in what form - i.e., web-based, standalone, languages/platform? Provide a web link here if available.

Section IV: Current and future trends: In your assessment of the paper, what do you think are the outstanding challenges that remain on this problem? Do you have any ideas for potential improvements or developmental pathways for this line of research?

Section V: References: Compile a comprehensive bibliography (in any standard scientific format of your choice, e.g., IEEE, ACM, Nature, Science). The list should include *all and only those* papers that you have cited in the main body of the text. Make sure you cite those references from the main body of the text in an appropriate way. For example, if you are using the IEEE format, you will number the references and cite those numbers in the main text.


Grading will be a combination of both your talk (60%) and report (40%). It will factor in how well you have addressed all the points mentioned above.