Assefaw Gebremedhin, Papers on Bioinformatics

Title: RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
Authors: H.N. Catanese, K.A. Brayton, A.H. Gebremedhin
Status: BMC Genomics, 2016, 17:422.


Background: Short-sequence repeats (SSRs) occur in both prokaryotic and eukaryotic DNA, inter- and intragenically, and may be exact or inexact copies. When heterogeneous SSRs are present in a given locus, the pattern of distinct repeats can be used to genotype strains. Cataloguing and tracking these repeats is difficult because many independent research groups are involved in identifying them, and the task is error-prone when done manually.

Results: We developed RepeatAnalyzer, a new software tool capable of tracking, managing, analysing and cataloguing SSRs and genotypes, using Anaplasma marginale as a model species. RepeatAnalyzer's analysis capability includes novel metrics for measuring regional genetic diversity (corresponding to the variety and regularity of SSR occurrence). As part of its visualization capabilities, RepeatAnalyzer produces high-quality maps of the geographic distribution of genotypes or SSRs over a region of interest. RepeatAnalyzer's repeat identification functionality was validated for all SSRs and genotypes reported in 21 publications, using 380 A. marginale isolates gathered from the five publications within that list that provided access to their isolates. The tool produced accurate genotyping results in every case. In addition, it uncovered a number of errors in the published literature: 11 cases where SSRs were misreported, 5 cases where two different SSRs had been given the same name, and 16 cases where two or more names had been given to a single SSR. The analysis and visualization functionalities of the tool are demonstrated using several examples.
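The naming errors listed above (one name shared by two different SSRs, or several names attached to a single SSR) can be detected mechanically once every repeat is stored together with its sequence. The following is a minimal Python sketch under stated assumptions, not RepeatAnalyzer's actual implementation: it assumes each catalog record is a hypothetical (name, amino-acid sequence) pair and that a genotype is the ordered list of repeat names found at a locus. The names R1, R2, R3 and the short sequences are toy placeholders, not real A. marginale repeats.

```python
from collections import defaultdict


def audit_catalog(records):
    """Flag naming conflicts in (name, sequence) records (hypothetical format).

    Returns (homonyms, synonyms):
      homonyms: names attached to more than one distinct sequence
      synonyms: sequences attached to more than one distinct name
    """
    seqs_by_name = defaultdict(set)
    names_by_seq = defaultdict(set)
    for name, seq in records:
        seq = seq.upper()  # normalise case so 'aaaagg' and 'AAAAGG' match
        seqs_by_name[name].add(seq)
        names_by_seq[seq].add(name)
    homonyms = {n: s for n, s in seqs_by_name.items() if len(s) > 1}
    synonyms = {q: n for q, n in names_by_seq.items() if len(n) > 1}
    return homonyms, synonyms


def genotype(repeats, catalog):
    """Translate an ordered list of repeat sequences into repeat names.

    Sequences missing from the catalog become '?' for manual review.
    """
    return [catalog.get(seq.upper(), "?") for seq in repeats]


# Toy records (placeholders, not real repeats).
toy_records = [
    ("R1", "AAAAGG"),
    ("R2", "AAAAGS"),
    ("R3", "aaaagg"),  # same sequence as R1 under another name: a synonym
    ("R2", "AAAAGT"),  # name R2 reused for a different sequence: a homonym
]
homonyms, synonyms = audit_catalog(toy_records)
print(homonyms)  # R2 maps to two sequences
print(synonyms)  # AAAAGG carries two names

toy_catalog = {"AAAAGG": "R1", "AAAAGS": "R2"}
print(genotype(["AAAAGG", "aaaags", "CCCCCC"], toy_catalog))
```

Keying the index on the sequence rather than the name is the design choice that makes both kinds of conflict visible: a name is only a label, so two labels on one sequence or one label on two sequences show up as sets with more than one member.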

Conclusions: RepeatAnalyzer is a robust software tool that can be used for storing, managing, and analysing short-sequence repeats for the purpose of strain identification. The tool can be used for any set of SSRs regardless of species. When applied to A. marginale, our test case, we show that genotype lengths for a given region follow a normal distribution, while SSR frequencies follow a power-law-like distribution. Further, we find that over 90% of repeats are 28 to 29 amino acids long, in agreement with conventional wisdom. Lastly, our analysis reveals that the most common edit distance between repeats is five or six. This is counter-intuitive: we expected it to be closer to one, which corresponds to the simplest change from one repeat to another.
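The abstract does not spell out how edit distance is defined. Assuming it is the standard Levenshtein distance (unit-cost insertions, deletions, and substitutions of single amino acids), the distribution of pairwise distances between repeats could be computed as in the sketch below. The function names and toy sequences are hypothetical and are not taken from RepeatAnalyzer.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-residue
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_histogram(repeats):
    """Count edit distances over all unordered pairs of repeats."""
    return Counter(edit_distance(a, b) for a, b in combinations(repeats, 2))


# Toy repeats (placeholders, not real A. marginale sequences).
toy_repeats = ["AAAAGG", "AAAAGS", "AAATGS"]
hist = distance_histogram(toy_repeats)
print(hist)                 # Counter({1: 2, 2: 1})
print(hist.most_common(1))  # [(1, 2)]
```

A single substitution, the "simplest change" mentioned above, gives a distance of one. Under this assumed definition, the key returned by `most_common(1)` on the real repeat set would correspond to the value the abstract reports as five or six.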

Title: Characterization of Anaplasma marginale subspecies centrale using msp1aS genotyping reveals a wildlife reservoir
Authors: Z.T.H. Khumalo, H.N. Catanese, N. Leisching, P. Hove, N.E. Collins, M.E. Chaisi, A.H. Gebremedhin, M.C. Oosthuizen and K.A. Brayton
Status: Journal of Clinical Microbiology, 2016, 54(10):2503-2512.


Bovine anaplasmosis, caused by the intraerythrocytic rickettsial pathogen Anaplasma marginale, is endemic in South Africa. Anaplasma marginale subspecies centrale (A. centrale) also infects cattle; however, it causes a milder form of anaplasmosis and is used as a live vaccine against A. marginale. There has been less interest in the epidemiology of A. centrale, and, as a result, there are few reports of natural infections with this organism. When it is detected in cattle, it is often assumed to be due to vaccination, and in most cases it is reported as a co-infection with A. marginale without characterization of the strain. In this study, a total of 380 blood samples from wild ruminant species and cattle, collected from biobanks, national parks, and other regions of South Africa, were used in duplex real-time PCR assays to simultaneously detect A. marginale and A. centrale. PCR results indicated a high occurrence of A. centrale infections, ranging from 25% to 100% in national parks. Samples positive for A. centrale were further characterized using the msp1aS gene, a homolog of msp1α of A. marginale that contains repeats at the 5' end useful for genotyping strains. A total of 47 Msp1aS repeats were identified, corresponding to 32 A. centrale genotypes detected in cattle, buffalo, and wildebeest. RepeatAnalyzer was used to examine strain diversity. Our results demonstrate a diversity of A. centrale strains from cattle and wildlife hosts in South Africa and indicate the utility of msp1aS as a genotypic marker for A. centrale strain diversity.