An overview, benefits and methodology of HGP
Shotgun sequencing is a method used in genetics for sequencing long DNA strands. Since the chain termination method of DNA sequencing can only be used for fairly short strands, it is necessary to divide longer sequences up and then assemble the results to give the overall sequence. In chromosome walking, this division is done by progressing through the entire strand, piece by piece; shotgun sequencing uses a faster, but more complex, process to assemble random pieces of the sequence. In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence.
For example, consider the following two rounds of shotgun reads:
Original strand : AGCATGCTGCAGTCATGCTTAGGCTA
First round of shotgun reads : AGCATGCTGCAG
Second round of shotgun reads : TTAGGCTA
In this extremely simplified example, the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is further complicated by the great abundance of repetitive sequence, meaning that similar short reads could come from completely different parts of the sequence.
Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.
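The overlap-based assembly described above can be sketched with a greedy merge: repeatedly join the two sequences whose ends overlap the most. This is a minimal illustration, not the method used by any real assembler. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are assumptions, and the third read in the usage example is a hypothetical bridging read invented so that the two reads shown above can be connected. The sketch ignores sequencing errors, repeats, reverse-complement reads, and reads contained inside other reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap of min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = [
    "AGCATGCTGCAG",    # read shown in the example above
    "TTAGGCTA",        # read shown in the example above
    "GCAGTCATGCTTAG",  # hypothetical bridging read (assumption)
]
print(greedy_assemble(reads))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

With real data a greedy merge like this is easily misled by repeats, which is one reason high coverage and paired reads matter.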
Whole genome shotgun sequencing
High-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Since the chain termination method usually can only produce reads between 500 and 1000 bases long, in all but the smallest clones, mate pairs will rarely overlap.
Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as NL/G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x coverage.
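The coverage formula NL/G can be checked with a few lines of code. This is a small sketch; the function names `coverage` and `reads_needed` are illustrative assumptions, and `reads_needed` simply inverts the same formula to estimate how many reads a target coverage would require.

```python
import math


def coverage(genome_length: int, num_reads: int, avg_read_length: float) -> float:
    """Average coverage N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * avg_read_length / genome_length


def reads_needed(genome_length: int, avg_read_length: float, target: float) -> int:
    """Smallest number of reads N such that N * L / G >= target."""
    return math.ceil(target * genome_length / avg_read_length)


# The worked example from the text: G = 2,000, N = 8, L = 500.
print(coverage(2000, 8, 500))       # 2.0
# Reads needed for 12x coverage of the same hypothetical genome.
print(reads_needed(2000, 500, 12))  # 48
```

Coverage is only an average: because fragments are random, some bases will be covered by more reads and some by fewer.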
History
The idea for the shotgun technique came from the use of an algorithm that combined sequence information from many small fragments of DNA to reconstruct a genome. This technique was pioneered by Frederick Sanger to sequence the genome of the phage Φ-X174, a tiny virus called a bacteriophage, which in 1977 became the first fully sequenced DNA genome. The technique was called shotgun sequencing because the genome was broken into millions of pieces as if it had been blasted with a shotgun. In order to scale up the method, both the sequencing and genome assembly had to be automated, as they were in the 1980s.
The modern whole genome shotgun technique came into its own in 1995 with the sequencing of the genome of the first free-living organism, the 1.8 million base pair genome of the bacterium Haemophilus influenzae. It involved automated sequencers, longer individual reads (approximately 500 base pairs at that time), and paired sequences separated by a fixed distance of around 2,000 base pairs. These were critical elements enabling the development of the first genome assembly programs for reconstruction of this bacterial genome.
Three years later, in 1998, the announcement by the newly formed Celera Genomics that it would scale up the shotgun sequencing method to the human genome was greeted with much skepticism in some circles.
How it was accomplished
The Celera group used the technique known as the "whole-genome shotgun" technique. The shotgun technique breaks the DNA into fragments of various sizes, ranging from 2,000 to 150,000 base pairs in length, forming what is called a DNA "library". Using an automated DNA sequencer, the DNA is read in 800 bp lengths from both ends of each fragment. This method became a standard approach to the sequencing and assembly of bacterial genomes beginning in 1995, when the first bacterial genome, Haemophilus influenzae, was sequenced. Using a complex genome assembly algorithm and a powerful computer, the pieces are combined and the genome can be reconstructed from the millions of short, 800 base pair reads.
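Building a library and reading both ends of each fragment can be imitated on a toy scale. The sketch below is only an illustration under stated assumptions: the function names `random_fragments` and `mate_pair` are hypothetical, fragments are drawn uniformly at random, and the second read of each pair is reported as the reverse complement of the fragment's far end, on the assumption that it is read from the opposite strand.

```python
import random

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def random_fragments(genome: str, sizes: list[int], count: int, seed: int = 0) -> list[str]:
    """Cut `count` random fragments whose lengths are drawn from `sizes`."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(count):
        size = min(rng.choice(sizes), len(genome))
        start = rng.randint(0, len(genome) - size)
        fragments.append(genome[start:start + size])
    return fragments


def mate_pair(fragment: str, read_len: int = 800) -> tuple[str, str]:
    """Read `read_len` bases inward from each end of a fragment."""
    forward = fragment[:read_len]
    reverse = reverse_complement(fragment[-read_len:])
    return forward, reverse


# Toy usage with a tiny fragment and 3-base reads.
print(mate_pair("ACGTTTGGCA", read_len=3))  # ('ACG', 'TGC')
```

Because the approximate fragment size is known, each mate pair tells an assembler roughly how far apart its two reads lie, which helps order pieces across regions that individual reads cannot span.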
Sequencing
In genetics and biochemistry, sequencing means determining the primary structure (or primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence, which succinctly summarizes much of the atomic-level structure of the sequenced molecule.
DNA sequencing
DNA sequencing is the process of determining the nucleotide order of a given DNA fragment. Currently, most DNA sequencing is performed using the chain termination method developed by Frederick Sanger. This technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. However, new sequencing technologies such as 454 Sequencing and Pyrosequencing are gaining an increasing share of the sequencing market.
The sequence of DNA encodes the necessary information for living things to survive and reproduce. Determining the sequence is therefore useful in 'pure' research into why and how organisms live, as well as in applied subjects. Because of the key nature of DNA to living things, knowledge of DNA sequence may be useful in practically any biological research. For example, in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, research into pathogens may lead to treatments for contagious diseases. Biotechnology is a burgeoning discipline, with the potential for many useful products and services.
Sanger sequencing
In chain terminator sequencing (Sanger sequencing), extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.
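The logic of reading a sequence from size-separated, terminated fragments can be illustrated with an idealized model. In this sketch (the names `terminated_lengths` and `read_gel` are assumptions), each of the four reactions yields one fragment for every position where its terminator was incorporated, primer length is ignored, and every fragment is detected perfectly. Real data are far noisier.

```python
def terminated_lengths(synthesized: str, terminator: str) -> list[int]:
    """Fragment lengths produced when `terminator` stops extension.

    `synthesized` is the newly made strand (complementary to the template).
    """
    return [i + 1 for i, base in enumerate(synthesized) if base == terminator]


def read_gel(synthesized: str) -> str:
    """Recover the sequence by ordering all terminated fragments by length."""
    # One "lane" per terminating nucleotide, as in four separate reactions.
    lanes = {base: terminated_lengths(synthesized, base) for base in "ACGT"}
    # Shorter fragments migrate further, so sorting by length gives base order.
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


print(terminated_lengths("GATTACA", "A"))  # [2, 5, 7]
print(read_gel("GATTACA"))                 # GATTACA
```

Note that the sequence read this way is that of the synthesized strand, which is complementary to the template.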
Dye terminator sequencing
An alternative to the labeling of the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labeling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, each of which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template-dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability.
This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labeled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.
454 Sequencing
454 Sequencing, which was developed for commercial use in the early 2000s by 454 Life Sciences, uses a technique similar to pyrosequencing to sequence roughly 20 megabases in a 4.5-hour run of their sequencing machine. In this method, single-stranded DNA is annealed to beads and amplified via emPCR. These DNA-bound beads are then placed into wells on a fiber-optic chip along with enzymes which produce light in the presence of ATP. When free nucleotides are washed over this chip, light is produced as ATP is generated when nucleotides pair with their complementary bases. Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument. The signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow (for example, in a homopolymer stretch).
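The relationship between nucleotide flows and signal strength can be sketched as a simple "flowgram". This is an idealized model under stated assumptions: the function names `flowgram` and `decode` are hypothetical, the flow order `TACG` is chosen only for illustration, and the signal is taken to be exactly equal to the homopolymer length, with no noise.

```python
from itertools import cycle


def flowgram(synthesized: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Return (nucleotide, signal) for each flow until the strand is complete.

    `synthesized` is the strand being built; the signal for a flow is the
    number of consecutive bases incorporated during that flow.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(synthesized):
            break
        run = 0
        # A homopolymer stretch is incorporated within a single flow.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Turn ideal flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


flows = flowgram("TTAGGGC")
print(flows)          # [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(decode(flows))  # TTAGGGC
```

In practice the signal is an analog light intensity, so long homopolymer stretches are harder to count exactly than this model suggests.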
RNA sequencing
RNA is less stable in the cell, and also more prone to nuclease attack experimentally. As RNA is generated by transcription from DNA, the information is already present in the cell's DNA. However, it is sometimes desirable to sequence RNA molecules. In particular, in eukaryotes RNA molecules are not necessarily co-linear with their DNA template, as introns are excised. To sequence RNA, the usual method is first to reverse transcribe the sample to generate DNA fragments. These can then be sequenced as described above.
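Both points in this paragraph, intron removal and reverse transcription, can be illustrated with string operations. This is a toy sketch: the function names `splice` and `reverse_transcribe`, the example transcript, and the intron coordinates are all hypothetical, and the model ignores priming, strand synthesis details, and any sequence signals that define real introns.

```python
# Complement table mapping RNA bases to the DNA bases that pair with them.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) index intervals."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])
    return "".join(exons)


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


# Hypothetical transcript: exon "AUG", intron "GUAAGCAG", exon "GCU".
pre_mrna = "AUGGUAAGCAGGCU"
mature = splice(pre_mrna, [(3, 11)])
print(mature)                      # AUGGCU
print(reverse_transcribe(mature))  # AGCCAT
```

The spliced sequence is shorter than the stretch of DNA it came from, which is why sequencing the RNA (via its DNA copy) can reveal information that is not obvious from the genomic sequence alone.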
updated on: 25 Feb 2008