I. Introduction
- Introductory Workbook on Perl for Biology Students

Perl scripts can be used to examine virtually any feature of a nucleotide string in silico. In the present endeavour, a few exercises are designed to reveal the inherent features of DNA using Perl scripts. Perl can be downloaded from the internet and installed on a Unix-like platform such as Kubuntu (a Linux distribution). In this booklet, the exercises begin with elementary arithmetic calculations and then extend the same operations to a DNA string. DNA is one of the best biological materials on which to practise Perl, because a vast array of information about it has been deciphered over the years through various biochemical tools. The language of DNA is written with four nucleotides. Their unique, species-specific combination constitutes genomic DNA, which is also the source of proteins, including the malformed proteins that cause disease. Hence, the central dogma, in which genes are transcribed into RNA and the transcripts are translated into proteins, is the crux that allows computer languages to percolate into biological systems. Moreover, the number of nucleotides in a genome is enormous, and the variations within genes (polymorphisms), together with the binding sites (footprints) of transcription factors and the enzymatic machinery, all add to the complexity of how genomic DNA functions in vivo. In such a scenario, programming languages such as Perl have come to the rescue of biologists trying to unravel the mysteries of creation. This booklet deals with a few exercises in the in silico evaluation of DNA properties: deriving the complementary strand, transcribing DNA into RNA, identifying start and stop signals, finding the percent GC and the length of each strand, concatenating and joining two strings, chopping the terminal residues of RNA, and translating the DNA genetic code into a protein string, all using Perl syntax.
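
Several of the DNA operations listed above take only a few lines of Perl. The script below is a minimal sketch, not the workbook's own exercises: the subroutine names (complement, reverse_complement, transcribe, percent_gc, translate) and the example sequence are hypothetical, and translation assumes the standard genetic code, read from the first base and stopped at the first stop codon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper subroutines illustrating the DNA operations in the
# text; the names and the example sequence are not from the workbook.

# Complementary strand: swap A<->T and G<->C base by base.
sub complement {
    my ($dna) = @_;
    (my $comp = uc $dna) =~ tr/ACGT/TGCA/;
    return $comp;
}

# Reverse complement: the complementary strand read 5' -> 3'.
sub reverse_complement {
    return scalar reverse complement($_[0]);
}

# Transcription: the RNA transcript matches the coding strand with U for T.
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Percent GC: tr/// with an empty replacement list only counts matches.
sub percent_gc {
    my ($dna) = @_;
    my $len = length $dna;
    return 0 unless $len;
    my $gc = ($dna =~ tr/GCgc//);
    return 100 * $gc / $len;
}

# Standard genetic code (NCBI table 1) packed into one string. With the
# bases ranked T=0, C=1, A=2, G=3, codon b1 b2 b3 maps to the character at
# 16*b1 + 4*b2 + b3; '*' marks a stop codon.
my %BASE_INDEX = (T => 0, C => 1, A => 2, G => 3);
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Translation: read codons from the first base and stop at the first stop
# codon; codons containing other characters (e.g. N) become X.
sub translate {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ tr/U/T/;    # accept an RNA string as well as DNA
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my @b = split //, substr($seq, $i, 3);
        if (grep { !exists $BASE_INDEX{$_} } @b) {
            $protein .= 'X';
            next;
        }
        my $aa = substr($AMINO,
            16 * $BASE_INDEX{ $b[0] } + 4 * $BASE_INDEX{ $b[1] } + $BASE_INDEX{ $b[2] }, 1);
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

my $example = 'ATGGCCATTGTAATGGGCCGCTGA';    # hypothetical example sequence
print "Length:             ", length($example), "\n";
print "Complement:         ", complement($example), "\n";
print "Reverse complement: ", reverse_complement($example), "\n";
print "RNA transcript:     ", transcribe($example), "\n";
printf "GC content:         %.1f%%\n", percent_gc($example);
print "Protein:            ", translate($example), "\n";
```

The packed 64-character string is one common way to hold the genetic code without writing out a 64-entry hash; a hash keyed by codon works equally well and may be easier for beginners to read.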


Perl is a scripting language developed by Larry Wall in 1987, who designed it for Unix systems. The name is often expanded as Practical Extraction and Report Language, although strictly this expansion is a backronym. In the jargon of computer science, scripting languages are often called interpreted languages. Perl, more precisely, is both compiled and interpreted: the perl interpreter first compiles the whole script into an internal form and then runs it, so a script can be edited and rerun immediately, without a separate compilation step. Perl is ubiquitous and powerful, and it supports structured programming, advanced data structures and object-oriented programming.


A Unix-like system such as Kubuntu is one of the best environments for writing Perl scripts. The first line of a script is called the ‘shebang' and begins with a hash and an exclamation mark (#!/usr/bin/perl -w); it tells the system which interpreter to run, and the -w switch turns on warnings. The symbol ‘#' at the start of any other line marks that line as a comment. Names prefixed with $ are scalar variables, which hold a single value such as one string, and names prefixed with @ are arrays, which hold a list of values. Two strings of random nucleotide sequences are written in a text file as shown in the Appendix. The executable script files for the individual exercises are then named as ‘'. The strings of nucleotides are retrieved from the text file into variables, and the ‘chomp' and ‘join' functions then turn the array of strings into a single string: chomp removes the trailing newline from each line, and join concatenates the lines. The exercises are designed to work both with a single string and with an array of strings, which represent scalar data and list data respectively. The simplest example chosen here is two strings of nucleotides written in the text file "krupa.seq". A few short Perl steps retrieve one string to begin with, using a ‘$' scalar variable, and later both strings, using an ‘@' array. The two strings are brought onto one continuous line with ‘chomp' and then joined with ‘join'. Taking the joined string as a template, the following properties of DNA are derived in silico: length, complementary strand, transcribed strand, substitution of ‘start' and ‘stop' codons, the total count of nucleotides in the transcribed string, the GC count and percent GC, and the effect of the chop function. The last exercise deals with retrieving the nucleotide sequence of a gene of our choice from Web sources on the internet. These exercises give beginners an impetus to practise Perl scripting and to visualize the features of DNA in silico.
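
The file-handling steps described above (reading the lines into an array, chomp, join, then deriving properties from the joined string) might look like the sketch below. It assumes the input file holds plain nucleotide lines with no header line; the subroutine names and the demonstration sequence are hypothetical, and when no file name is given the script reads an in-memory string instead of krupa.seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # same effect as the -w switch on the shebang line

# Read all lines from a filehandle into an array (@), strip each trailing
# newline with chomp, and join the pieces into one scalar ($) string.
sub read_sequence {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;
    return join '', @lines;
}

# Count each nucleotide; tr/// returns the number of characters matched.
sub base_counts {
    my ($seq) = @_;
    return (
        A => ($seq =~ tr/Aa//),
        C => ($seq =~ tr/Cc//),
        G => ($seq =~ tr/Gg//),
        T => ($seq =~ tr/Tt//),
    );
}

# 0-based positions of the first ATG and of the first stop codon (TAA, TAG
# or TGA) in the same reading frame after it; -1 means "not found".
sub start_and_stop {
    my ($seq) = @_;
    $seq = uc $seq;
    my $start = index($seq, 'ATG');
    return (-1, -1) if $start < 0;
    for (my $i = $start + 3; $i + 3 <= length($seq); $i += 3) {
        return ($start, $i) if substr($seq, $i, 3) =~ /^(?:TAA|TAG|TGA)$/;
    }
    return ($start, -1);
}

# Use a file named on the command line (e.g. krupa.seq) if one is given;
# otherwise read a hypothetical two-line sequence from an in-memory string.
my $demo   = "CCATGGCTAAG\nTAAGC\n";
my $source = @ARGV ? $ARGV[0] : \$demo;
open(my $fh, '<', $source) or die "Cannot open input: $!\n";
my $dna = read_sequence($fh);
close $fh;

my %count = base_counts($dna);
my ($start, $stop) = start_and_stop($dna);
(my $rna = $dna) =~ tr/Tt/Uu/;
my $removed = chop $rna;    # chop deletes and returns the last character

print "Joined sequence: $dna (length ", length($dna), ")\n";
print "Base counts:     ", join(', ', map { "$_=$count{$_}" } sort keys %count), "\n";
print "GC count:        ", $count{G} + $count{C}, "\n";
print "Start at $start, in-frame stop at $stop\n";
print "RNA after chop:  $rna (removed '$removed')\n";
```

In the demo string the ATG starts at position 2 and the first in-frame stop (TAA) starts at position 11; the out-of-frame TAA at position 7 is skipped because only every third position after the start is checked. tr/// is used for counting because it returns the number of matching characters directly, without an explicit loop.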

updated on: 30 Jan 2009