RESEARCH
My primary research interests lie at the intersection of computer science and biology, specifically in computational genomics. I focus on the design and development of efficient algorithms and scalable software tools for the analysis of genomic data. The problem areas and applications of interest are as follows:
(i) Genome and repetitive pattern discovery: Assembling a genome from its numerous fragments is a computationally challenging task. Plant genomes are particularly challenging because of their highly complex genomic structure and evolutionary history. We have developed a software system called PaCE, which can efficiently exploit thousands of processors and their memory for the clustering and assembly of millions of genomic fragments. The software has been successfully applied to maize gene-enriched assembly and within the maize genome sequencing consortium (news release), reducing the time to solution from tens of days to a matter of hours. It is also used for clustering millions of Expressed Sequence Tags. I am also involved in developing pattern discovery tools for the de novo identification of both structurally categorized and unknown (novel) repetitive substructures within genomes.
(ii) Comparative genomics: Comparing multiple genomes and multiple genomic loci provides valuable insights into the genomic differences and similarities across organisms. I am interested in studying synteny and genome rearrangements among genomes from a diverse set of species. In collaboration with Dr. Amit Dhingra’s laboratory (WSU), we are developing new comparative techniques to enable PCR-based sequencing of organellar genomes. An ongoing project in this collaboration is the apple genome initiative (news release).
(iii) Gene-to-function (association) mapping: Identifying the gene(s) responsible for a key functional trait is a fundamental problem in genomics. In collaboration with Dr. Kulvinder Gill’s laboratory (WSU), we have been analyzing wheat marker data to identify statistically significant correlations that may exist between gene/marker data and observed functional traits.
(iv) Metagenomic analysis: A metagenome is the pool of microbial genomes collected from environmental samples. I am interested in developing new analytical and computational capabilities that would enable profiling and understanding the genomic content of community data.
(v) High-performance computing: With every new breakthrough in sequencing and other wet-lab technologies, there has been an avalanche of biological data deposited in public databases. Computational tools are therefore becoming an indispensable resource for automated hypothesis testing, modeling and discovery. If analysis is to keep pace with data generation, the development of high-performance computing (HPC) solutions becomes imperative. To this end, a general emphasis in my research is on developing HPC solutions that exploit the compute power and memory capacity of state-of-the-art supercomputing technologies.
Currently, my research is supported in part by the WSU Office of Research, the WSU Foundation, and the School of EECS.