6. Perl program for calculating the length, total bases, GC and AT counts
- Introductory Workbook on Perl for Biology Students

The objective of the program is to calculate the length of a given DNA sequence, total its bases, and count the occurrences of the dinucleotides GC and AT in it.

 

First we assign the DNA sequence to a scalar variable called $DNA. Its length is obtained with the built-in function “length”, and the result is stored in $length. The number of ‘A’ bases is counted in line 6 with the translation operator: $DNA=~tr/A// returns the number of ‘A’ characters it finds, and because the replacement list is empty, each ‘A’ is replaced by itself and the sequence is left unchanged. Putting the expression in brackets and assigning it to another variable stores that count. The numbers of ‘C’, ‘G’ and ‘T’ bases are obtained in the same way. We then total the four counts with ‘+’, as in ordinary arithmetic; subtraction, multiplication and division work the same way with ‘-’, ‘*’ and ‘/’.

To count the GC dinucleotides (line 13) we substitute each GC with GC itself. The substitution operator with the g modifier returns the number of substitutions it made, and replacing GC with GC does not distort the DNA, so we get the number of occurrences while the sequence stays the same. We did not use translation in line 13 because tr works on single characters, not on a word of more than one letter: tr/GC// would count every ‘G’ and every ‘C’ rather than the pair GC. The number of ATs is obtained in the same way.
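The two counting operators described above can be tried on their own. The following is a minimal sketch rather than part of the original program: the variable names $before, $count_A, $gc_by_subst and $gc_by_match are made up for illustration, and the global match in list context is an alternative technique that the program itself does not use.

```perl
use strict;
use warnings;

my $DNA    = "TACCGTGTAAGCTGCGTATGCGATCGTACGCGTGTGCGGT";
my $before = $DNA;    # copy kept only to show that $DNA is not altered

# tr/// with an empty replacement list and no /d modifier only counts:
# each matched character is replaced by itself.
my $count_A = ($DNA =~ tr/A//);

# s///g returns the number of substitutions made; replacing GC with GC
# rewrites the string with identical content.
my $gc_by_subst = ($DNA =~ s/GC/GC/g);

# Alternative (not in the original program): a global match in list
# context returns all matches, and assigning that list to () in scalar
# context gives the number of matches without any substitution.
my $gc_by_match = () = $DNA =~ /GC/g;

print "A bases: $count_A\n";
print "GC by substitution: $gc_by_subst, GC by matching: $gc_by_match\n";
print "sequence unchanged: ", ($DNA eq $before ? "yes" : "no"), "\n";
```

Both ways of counting GC should agree; the matching form simply avoids rewriting the string.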

 

The percentage of GC is obtained by dividing the number of GC dinucleotides by the total number of bases and multiplying by 100 (line 19).
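Written out for the sequence used here, the formula is 5 GC dinucleotides / 40 bases × 100 = 12.5. A minimal sketch of that arithmetic follows; the subroutine name percent_of_total is hypothetical, and the guard against an empty sequence is an added assumption that the original program does not contain.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of $count relative to $total.
# Returns 0 when $total is 0 instead of dividing by zero
# (an assumed safeguard, not part of the original program).
sub percent_of_total {
    my ($count, $total) = @_;
    return 0 if $total == 0;
    return $count / $total * 100;
}

# 5 GC dinucleotides in a sequence of 40 bases
my $GCper = percent_of_total(5, 40);
printf "the percentage of GC: %.1f\n", $GCper;
```

Note that this is the number of GC dinucleotide occurrences relative to the number of bases, which is what the program computes; it is not the fraction of bases that are G or C.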

 

  1. #Calculating the length, total nucleotides, dinucleotide sequence GC and AT counts
  2. $DNA = "TACCGTGTAAGCTGCGTATGCGATCGTACGCGTGTGCGGT";
  3. #length of DNA
  4. $length = length($DNA);
  5. print "the length of DNA $length\n";
  6. $countA = ($DNA =~ tr/A//);
  7. $countC = ($DNA =~ tr/C//);
  8. $countG = ($DNA =~ tr/G//);
  9. $countT = ($DNA =~ tr/T//);
  10. $Total = $countA + $countC + $countG + $countT;
  11. print "total bases in DNA $Total:\n";
  12. #count of GC
  13. $GC = ($DNA =~ s/GC/GC/g);
  14. print "the total number of dinucleotide GC in DNA :$GC:\n";
  15. #count of AT
  16. $AT = ($DNA =~ s/AT/AT/g);
  17. print "the total number of dinucleotide AT in DNA:$AT:\n";
  18. #percentage of GC
  19. $GCper = ($GC / $Total) * 100;
  20. print "the percentage of GC: $GCper:\n";
  21. exit;

 

RESULTS:

 

the length of DNA 40
total bases in DNA 40:
the total number of dinucleotide GC in DNA :5:
the total number of dinucleotide AT in DNA:2:
the percentage of GC: 12.5:
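As a cross-check on the totals above, the four base counts can also be tallied in a single pass over the sequence with a hash. This is a sketch of an alternative technique, not the method the program uses; the names %count, $total and $base are assumptions made for illustration.

```perl
use strict;
use warnings;

my $DNA = "TACCGTGTAAGCTGCGTATGCGATCGTACGCGTGTGCGGT";

# Tally every character of the sequence in a hash keyed by base.
my %count;
$count{$_}++ for split //, $DNA;

# Sum the four expected bases; a base that never occurs counts as 0.
my $total = 0;
for my $base (qw(A C G T)) {
    my $n = $count{$base} || 0;
    print "$base: $n\n";
    $total += $n;
}
print "total bases in DNA $total\n";
```

If the sequence contained any character other than A, C, G or T, this total would be smaller than the value returned by length, which makes the comparison a simple sanity check on the input.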


Updated on: 30 Jan 2009
