
Genbit Compress Tool (GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio (Bits/Base) of Genomes


Affiliations
1 DMSSVH College of Engineering, India
2 Jawaharlal Nehru Technological University, India
 

We present a compression tool, "GenBit Compress", for genetic sequences, based on our newly proposed "GenBit Compress Algorithm". The tool achieves the best compression ratios for entire genomes (DNA sequences), and our compression results show that the GenBit Compress algorithm outperforms the other genome compression algorithms on non-repetitive DNA sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences: measured against a fixed 2-bit code for the four nucleotides (2 bits per base), they only expand them in size. In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings that are duplicated except for a few random mutations. For this reason, most DNA compressors work by searching for and encoding approximate repeats. We depart from this strategy by searching for and encoding only exact repeats. Our proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes and accepts inputs of up to 800,000 (8 lakh) characters. While achieving the best compression ratios for DNA sequences, our GenBit Compress program also significantly improves on the running time of all previous DNA compressors. Assigning binary bits to fragments of the DNA sequence is a concept introduced in this program for the first time in DNA compression.
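The abstract does not spell out how binary bits are assigned to fragments. The Java sketch below is a hypothetical illustration only, not the GenBit Compress algorithm. It assumes a fragment length of 4 bases, a 2-bit code per base (A=00, C=01, G=10, T=11), and a 1-bit flag that replaces any fragment exactly repeating the one before it. It then reports the compression ratio in bits per base. The class and method names (`FragmentBitCoder`, `encode`, `decode`, `bitsPerBase`) are invented for this example.

```java
import java.util.BitSet;

/**
 * Hypothetical illustration only, NOT the published GenBit Compress algorithm.
 * Splits a DNA string into fixed-length fragments, writes each base as a 2-bit
 * code, and replaces a fragment that exactly repeats the previous fragment with
 * a single flag bit. Use one coder instance per sequence.
 */
public class FragmentBitCoder {
    static final int K = 4;               // assumed fragment length in bases
    static final String BASES = "ACGT";   // A=00, C=01, G=10, T=11

    private final BitSet bits = new BitSet();
    private int nbits = 0;                // number of bits written so far

    private void write(boolean bit) {
        bits.set(nbits++, bit);
    }

    /** Encodes seq (characters A, C, G, T only) and returns the number of bits used. */
    public int encode(String seq) {
        String prev = null;
        for (int i = 0; i < seq.length(); i += K) {
            String frag = seq.substring(i, Math.min(i + K, seq.length()));
            if (frag.equals(prev)) {
                write(true);              // flag 1: exact repeat of previous fragment
                continue;
            }
            write(false);                 // flag 0: literal fragment follows
            for (char c : frag.toCharArray()) {
                int code = BASES.indexOf(c);
                if (code < 0) {
                    throw new IllegalArgumentException("not a nucleotide: " + c);
                }
                write((code & 2) != 0);   // high bit of the 2-bit code
                write((code & 1) != 0);   // low bit of the 2-bit code
            }
            prev = frag;
        }
        return nbits;
    }

    /** Rebuilds a sequence of `length` bases (the original length) from the encoded bits. */
    public String decode(int length) {
        StringBuilder out = new StringBuilder(length);
        String prev = "";
        int pos = 0;
        while (out.length() < length) {
            if (bits.get(pos++)) {
                out.append(prev);         // flag 1: copy the previous fragment
                continue;
            }
            int fragLen = Math.min(K, length - out.length());
            StringBuilder frag = new StringBuilder(fragLen);
            for (int j = 0; j < fragLen; j++) {
                int code = (bits.get(pos) ? 2 : 0) | (bits.get(pos + 1) ? 1 : 0);
                pos += 2;
                frag.append(BASES.charAt(code));
            }
            prev = frag.toString();
            out.append(prev);
        }
        return out.toString();
    }

    /** Compression ratio in bits per base. */
    public static double bitsPerBase(int totalBits, int bases) {
        return (double) totalBits / bases;
    }

    public static void main(String[] args) {
        String seq = "ACGTACGTACGTTTGA";
        FragmentBitCoder coder = new FragmentBitCoder();
        int used = coder.encode(seq);
        System.out.println("bits = " + used);                                   // bits = 20
        System.out.println("bits/base = " + bitsPerBase(used, seq.length()));   // bits/base = 1.25
        System.out.println("round trip ok = " + seq.equals(coder.decode(seq.length())));
    }
}
```

In the 16-base demo sequence, the two repeated "ACGT" fragments cost one bit each, giving 20 bits in total, or 1.25 bits per base. For fragments that do not repeat, the flag adds 1/K bits per base (2.25 bits per base for K = 4). This toy scheme therefore only beats the plain 2-bit code when exact repeats are frequent.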

Keywords

Compression, Biocompress, Gencompress, Compression Ratio, Encode, Decode.

Authors

P. Raja Rajeswari
DMSSVH College of Engineering, India
Allam Appa Rao
Jawaharlal Nehru Technological University, India
