Computational Approach for Mining Simple Sequence Repeats in Expressed Sequence Tags
Abstract
Expressed sequence tags (ESTs), short sequences of cDNA, are mined to identify and characterize simple sequence repeats (SSRs) for studying genetic variation. Web-based tools often become unusable when their servers are no longer maintained, and the few available stand-alone tools lack adequate processing capabilities. Therefore, a simple-to-use, fast, portable stand-alone tool has been developed to process multiple EST files without size limitations, with proper input validation and the ability to retrieve additional genome-related features. The tool is implemented in Java and uses the microsatellite search algorithm with a dictionary-based approach (MISA, a Perl script), which is called via the command line for data mining. A parallel module retrieves additional information from GenBank files. Within the pipeline, Primer3 is invoked to design primers in batch. With an extended interface built in Java NetBeans, the tool gives novice users a simple, interactive stand-alone application for microsatellite mining, statistical analysis, and primer design on one platform. The repeat-number and interruption parameters can be reset through the graphical interface. The tool offers interactive modules with proper validation, batch processing, and cost-effective analysis of tandem repeats compared with its peers; the source code can be upgraded in the future.
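To illustrate what mining perfect SSRs with an adjustable repeat-number threshold can look like, here is a minimal Java sketch. It is not the tool's source code and does not reproduce MISA's implementation: the class name `SsrSketch`, the method `findPerfectRepeats`, the regular-expression search, and the thresholds in the usage note are all assumptions made for illustration, and interruptions between repeats are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and approach are assumptions, not the tool's code.
public class SsrSketch {

    /** One perfect repeat: its motif, number of copies, and 0-based start offset. */
    public static final class Repeat {
        public final String motif;
        public final int copies;
        public final int start;

        Repeat(String motif, int copies, int start) {
            this.motif = motif;
            this.copies = copies;
            this.start = start;
        }
    }

    /**
     * Finds perfect tandem repeats of a fixed motif length that occur
     * at least minCopies times in a row (case-insensitive, A/C/G/T only).
     */
    public static List<Repeat> findPerfectRepeats(String sequence, int motifLength, int minCopies) {
        if (motifLength < 1 || minCopies < 2) {
            throw new IllegalArgumentException("motifLength must be >= 1 and minCopies >= 2");
        }
        List<Repeat> hits = new ArrayList<>();
        // A motif of motifLength bases, followed by at least (minCopies - 1) more copies of itself.
        Pattern pattern = Pattern.compile(
                "([ACGT]{" + motifLength + "})\\1{" + (minCopies - 1) + ",}");
        Matcher matcher = pattern.matcher(sequence.toUpperCase());
        while (matcher.find()) {
            String motif = matcher.group(1);
            // Skip motifs that are themselves repeats of a shorter unit (e.g. "AA" as a dinucleotide).
            if (isPeriodic(motif)) {
                continue;
            }
            int copies = matcher.group().length() / motifLength;
            hits.add(new Repeat(motif, copies, matcher.start()));
        }
        return hits;
    }

    /** True if the motif is built from a shorter unit repeated, such as "AA" or "ATAT". */
    private static boolean isPeriodic(String motif) {
        return (motif + motif).indexOf(motif, 1) < motif.length();
    }
}
```

For example, `SsrSketch.findPerfectRepeats("TTAGAGAGAGAGCC", 2, 3)` reports a single dinucleotide repeat with motif `AG`. Raising or lowering the `minCopies` argument plays the role of resetting the repeat-number parameter described above; the values used here are illustrative, not the tool's defaults.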