

12. In silico Salient features of DNA
- Introductory Workbook on Perl for Biology Students

This program demonstrates a few salient features of a DNA sequence read from a file. We first store the file name in the scalar variable $DNA. We then open the file, read its first line and store it in $DNA2. Because $DNA2 is a scalar, it holds only one value at a time, so the remaining lines of the file are not stored. We then open the same file again, this time reading its contents into the array @DNA, so that every line of the file is stored as one element of the array.
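The difference between reading into a scalar and reading into an array can be sketched as follows. This is only an illustration under assumptions: the two-line file contents, the temporary file and the helper names read_first_line and read_all_lines are hypothetical and are not part of the workbook program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical two-line sequence file, created only so the sketch runs on its own.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "ATGCCGTA\nGGCTAGCA\n";
close $tmp_fh;

# Scalar context: <$fh> returns just the next line of the file.
sub read_first_line {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $first = <$fh>;
    close $fh;
    return $first;
}

# List context: <$fh> returns every remaining line, one per array element.
sub read_all_lines {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

my $first = read_first_line($filename);
my @all   = read_all_lines($filename);
print "first line: $first";                       # first line: ATGCCGTA
print "number of lines: ", scalar(@all), "\n";    # number of lines: 2
```

The same filehandle read gives one line or all lines depending only on whether the result is assigned to a scalar or to an array.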


We call chomp on the array (@DNA) to remove the trailing newline (the “enter” character) from each element. The chop command, by contrast, removes the last character of a value, whatever that character is. We demonstrate chop here: when it is applied to the array @DNA (line 16), the last character of each element is removed. Because the newlines have already been removed by chomp, this deletes the last base of each line.
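A minimal sketch of the difference between chomp and chop, assuming two short hypothetical lines in place of the real file contents:

```perl
use strict;
use warnings;

# Hypothetical lines as they would arrive from a file, newline included.
my @lines = ("ATGC\n", "GGTA\n");

chomp(@lines);               # removes only the trailing newline of each element
my @after_chomp = @lines;

chop(@lines);                # removes the last character, whatever it is
my @after_chop = @lines;

print "after chomp: @after_chomp\n";   # after chomp: ATGC GGTA
print "after chop:  @after_chop\n";    # after chop:  ATG GGT
```

Calling chop after chomp therefore shortens each sequence by one base, which is worth keeping in mind when the lengths are compared later.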


We use the “join” command to make one complete DNA string ($DNA3) from the array elements. In line 22, the complement of the DNA string is obtained by translating A to T, C to G, G to C and T to A, and the result is saved back in $DNA3. A point to note here is that the translation works character by character and not on words. We then translate T to U to get the RNA, and find the length of the RNA with the “length” function. Next we find the total number of bases (nucleotides) in the string. A translation with an empty replacement list leaves the string unchanged and returns the number of characters it matched, so lines 31 to 34 count each nucleotide in turn and line 35 adds up the four counts.
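The join, complement, transcription and counting steps can be sketched on a short example. The input strands and the variable names below are hypothetical; unlike the workbook program, this sketch copies each result into a new variable so that every intermediate string stays available.

```perl
use strict;
use warnings;

my @strands = ("ATGC", "GGTA");       # hypothetical input lines
my $dna = join('', @strands);         # "ATGCGGTA"

# Copy, then translate the copy character by character.
(my $complement = $dna) =~ tr/ACGT/TGCA/;   # "TACGCCAT"
(my $rna = $complement) =~ tr/T/U/;         # "UACGCCAU"

# With an empty replacement list, tr leaves the string unchanged
# and returns the number of matching characters.
my $count_a = ($rna =~ tr/A//);
my $count_c = ($rna =~ tr/C//);
my $count_g = ($rna =~ tr/G//);
my $count_u = ($rna =~ tr/U//);
my $total   = $count_a + $count_c + $count_g + $count_u;

print "complement: $complement\n";                    # complement: TACGCCAT
print "RNA: $rna\n";                                  # RNA: UACGCCAU
print "length: ", length($rna), " total: $total\n";   # length: 8 total: 8
```

When the sequence contains only A, C, G and U, the total of the four counts equals the length of the string; a difference between the two would point to other characters in the input.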


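We find the GC percentage by counting how many times GC is substituted with itself, dividing that count by the total and multiplying by 100. We substitute GC with itself because we do not want to alter the sequence, only to know how many GCs are present. Note that this counts occurrences of the two-letter string GC, not individual G and C bases; the conventional GC content would instead be the number of G plus the number of C ($b and $c in this program) divided by the total. The number of adenines has already been calculated in line 31, so we simply copy it to $A. We then use the substitution operator to replace AUG with “start” and UAG with “Stop”. Finally, we chop the last nucleotide from the RNA string.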
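The following sketch compares the GC count used in this program with a per-base count of G and C, and then marks the start and stop codons. The RNA string and the variable names are hypothetical, chosen only so that the numbers are easy to check by hand.

```perl
use strict;
use warnings;

my $rna   = "AUGGCGCUAG";      # hypothetical RNA string
my $total = length($rna);      # 10

# Substituting GC with itself leaves the string unchanged and
# returns how many times the two-letter string GC occurred.
my $gc_pairs = ($rna =~ s/GC/GC/g);     # 2

# Counting single bases instead: every G and every C.
my $g_plus_c = ($rna =~ tr/GC//);       # 6

my $pair_percent = $gc_pairs / $total * 100;
my $gc_content   = $g_plus_c / $total * 100;

# Replace the codons in a copy so the original sequence survives.
(my $annotated = $rna) =~ s/AUG/start/g;
$annotated =~ s/UAG/Stop/g;

print "GC pairs: $gc_pairs ($pair_percent%)\n";    # GC pairs: 2 (20%)
print "G+C bases: $g_plus_c ($gc_content%)\n";     # G+C bases: 6 (60%)
print "annotated: $annotated\n";                   # annotated: startGCGCStop
```

The two percentages differ considerably, so it helps to state which of the two a reported GC percentage refers to.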


  1. # filehandle: read a single strand (the first line only)
  2. $DNA = "sample.seq";
  3. open(FILE, '<', $DNA) or die "Cannot open $DNA: $!";
  4. $DNA2 = <FILE>;
  5. close FILE;
  6. print "the sequence of single strand of DNA:$DNA2\n";
  7. # filehandle: read multiple strands of DNA (all lines)
  8. open(FILE, '<', $DNA) or die "Cannot open $DNA: $!";
  9. @DNA = <FILE>;
  10. close FILE;
  11. print "the sequences of multiple strands of DNA:@DNA\n";
  12. # chomp the multiple strands of DNA (remove trailing newlines)
  13. chomp(@DNA);
  14. print "the result of chomp:@DNA\n";
  15. # chop the sequences (remove the last character of each)
  16. chop(@DNA);
  17. print "the result of chop function:@DNA\n";
  18. # join the strands into one string
  19. $DNA3 = join('', @DNA);
  20. print "the result of join:$DNA3\n";
  21. # translate to make a complementary copy of the joined strand
  22. $DNA3 =~ tr/ACGT/TGCA/;
  23. print "the complementary result:$DNA3\n";
  24. # make a transcribed copy of the string (T becomes U)
  25. $DNA3 =~ tr/T/U/;
  26. print "the transcribed RNA:$DNA3\n";
  27. # total length of RNA
  28. $length = length($DNA3);
  29. print "the length of RNA:$length\n";
  30. # total number of nucleotides (tr with an empty replacement only counts)
  31. $a = ($DNA3 =~ tr/A//);
  32. $b = ($DNA3 =~ tr/C//);
  33. $c = ($DNA3 =~ tr/G//);
  34. $d = ($DNA3 =~ tr/U//);
  35. $Total = $a + $b + $c + $d;
  36. print "the total nucleotides:in RNA:$Total\n";
  37. # percent GC count (occurrences of the string "GC")
  38. $GCcount = ($DNA3 =~ s/GC/GC/g);
  39. print "the total number of GC in DNA :$GCcount:\n";
  40. $GCper = ($GCcount / $Total) * 100;
  41. print "the GC percentage:$GCper\n";
  42. # number of A (already counted in line 31)
  43. $A = $a;
  44. print "the total number of Adenines:$A\n";
  45. # substitute the start and stop codons
  46. $DNA3 =~ s/AUG/start/g;
  47. $DNA3 =~ s/UAG/Stop/g;
  48. print "the start and stop codon:$DNA3\n";
  49. # chop the terminal nucleotide of RNA
  50. chop($DNA3);
  51. print "the result of chop:$DNA3\n";
  52. exit;





the sequence of single strand of DNA:



the sequences of multiple strands of DNA:





the result of join:


the complementary result:


the transcribed RNA:


the length of RNA:64

the total nucleotides:in RNA:64

the total number of GC in DNA :7:

the GC percentage:10.9375

the total number of Adenines:18

the start and stop codon:


the result of chop:


updated on: 30 Jan 2009
