
Human Genome Project


By Sachin Chorge

Submitted on January 2008

Accepted on February 2008 

The Human Genome Project (HGP) is a project to map and sequence the 3 billion nucleotides contained in the human genome and to identify all the genes present in it. There have been two human genome projects: the first is the international HGP, carried out by a group of international government bodies and organisations, and the second was run by a private company, Celera Genomics.

The Human Genome Project (HGP) was one of the great feats of exploration in history: an inward voyage of discovery rather than an outward exploration of the planet or the cosmos, and an international research effort to sequence and map all of the genes (together known as the genome) of members of our species, Homo sapiens. Completed in April 2003, the HGP gave us, for the first time, the ability to read nature's complete genetic blueprint for building a human being.

Human Genome

The human genome is the genome of Homo sapiens, which is composed of 24 distinct chromosomes (22 autosomal + X + Y) with a total of approximately 3 billion DNA base pairs containing an estimated 20,000-25,000 genes. The Human Genome Project produced a reference sequence of the euchromatic human genome, which is used worldwide in biomedical sciences. The human genome is much more gene-sparse than was predicted at the outset of the Human Genome Project, with only about 1.5% of the total length serving as protein-coding exons.

There are an estimated 20,000-25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and could continue to drop further. Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome. Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA the function of which, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome size. Much of this consists of repeat elements, transposons, and pseudogenes, but there is also a large amount of sequence that does not fall under any known classification.

Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA.

The reference sequence itself is also not fully complete, for two main reasons. First, it is important to realize that the central regions of each chromosome, known as centromeres, are highly repetitive DNA sequences that are difficult to sequence using current technology. The centromeres are millions (possibly tens of millions) of base pairs long, and for the most part these are entirely unsequenced. Second, the ends of the chromosomes, called telomeres, are also highly repetitive, and for most of the 46 chromosome ends these too are incomplete. We do not know precisely how much sequence remains before we reach the telomeres of each chromosome, but as with the centromeres, current technology does not make it easy to get there. It is likely that the centromeres and telomeres will remain unsequenced until new technology is developed that allows us to sequence them. Other than these regions, there remain a few dozen gaps scattered around the genome, some of them rather large, but there is hope that all these will be closed in the next couple of years.

In summary: our best estimates of total genome size indicate that we have completed about 92% of the genome. Most of the remaining DNA is highly repetitive and unlikely to contain genes, but we cannot truly know until we sequence all of it. Understanding the functions of all the genes and their regulation is far from complete. The roles of junk DNA, the evolution of the genome, the differences between individuals and races, and many other questions are still the subject of intense study by laboratories all over the world.

International HGP

The Project was launched in 1986 by Charles DeLisi, who was then Director of the US Department of Energy's Health and Environmental Research Programs. He was later awarded the Presidential Citizens Medal by President Clinton for his seminal role in the Project. The goals and general strategy of the Project were outlined in a two-page memo to the Assistant Secretary in April 1986, which helped garner support from the DOE, the United States Office of Management and Budget (OMB) and the United States Congress, especially Senator Pete Domenici. A series of Scientific Advisory meetings and complex negotiations with senior Federal officials resulted in a line item for the Project in the 1987 Presidential budget submission to the Congress.

Initiation of the Project was the culmination of several years of work supported by the US Department of Energy, in particular a feasibility workshop in 1986 and a subsequent detailed description of the Human Genome Initiative in a report that led to the formal sanctioning of the initiative by the Department of Energy. This 1987 report stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "Knowledge of the human genome is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.

James D. Watson was Head of the National Center for Human Genome Research at the National Institutes of Health (NIH) in the United States starting from 1988. Largely due to his disagreement with his boss, Bernadine Healy, over the issue of patenting genes, he was forced to resign in 1992. He was replaced by Francis Collins in April 1993, and the name of the Center was changed to the National Human Genome Research Institute (NHGRI) in 1997.

National Institutes of Health

The National Institutes of Health (NIH) is an agency of the United States Department of Health and Human Services and is the primary agency of the United States government responsible for biomedical research. The Institutes are responsible for 28% - about $28 billion - of the total biomedical research funding spent annually in the U.S., with most of the rest coming from industry.

Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as huge advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by then US President Bill Clinton and British Prime Minister Tony Blair on June 26, 2000). Ongoing sequencing led to the announcement of the essentially complete genome in April 2003, five years earlier than planned. In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome was published in the journal Nature.

Celera Genomics & HGP

In 1998, a similar, privately funded quest was launched by the American researcher Craig Venter and his firm Celera Genomics. The $300 million Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project.

Celera Genomics was established in May 1998 by the Perkin-Elmer Corporation (and was later purchased by Applera Corporation), with Dr. J. Craig Venter from The Institute for Genomic Research (TIGR) as its first president. While at TIGR, Venter and Hamilton Smith led the first successful effort to sequence an entire organism's genome, that of the Haemophilus influenzae bacterium. Celera was formed for the purpose of generating and commercializing genomic information to accelerate the understanding of biological processes.

The rise and fall of Celera as an ambitious competitor of the Human Genome Project is the main subject of the book The Genome War by James Shreeve, who takes a strong pro-Venter point of view. (He followed Venter around for two years in the process of writing the book.) A view from the public effort's side is that of Nobel laureate Sir John Sulston in his book The Common Thread: A Story of Science, Politics, Ethics and the Human Genome.

Celera used a newer, riskier technique called whole genome shotgun sequencing, which had been used to sequence bacterial genomes up to 6 million base pairs in length, but not for anything nearly as large as the 3 billion base pair human genome.

Celera initially announced that it would seek patent protection on "only 200-300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100-300 targets. Contrary to its public promises, the firm eventually filed patent applications on 6,500 whole or partial genes.

Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 90% of the genome, with much of the remaining 10% filled in later. In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and again in 2005, filling in roughly 8% of the remaining sequence.

The HGP is the best known of many international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, plants, and many microbial organisms and parasites.

In 2005, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project reached as high as 2,000,000. The number continues to fluctuate, and it is now expected that it will take many years to agree on a precise value for the number of genes in the human genome.


The goals of the original HGP were not only to determine all 3 billion base pairs in the human genome with a minimal error rate, but also to identify all the genes in this vast amount of data. This part of the project is still ongoing, although a preliminary count indicates about 30,000 genes in the human genome, which is far fewer than predicted by most scientists.

Another goal of the HGP was to develop faster, more efficient methods for DNA sequencing and sequence analysis and to transfer these technologies to industry.

The sequence of the human DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the University of California, Santa Cruz, and Ensembl, present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data themselves are difficult to interpret without them.

The process of identifying the boundaries between genes and other features in raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.

Another, often overlooked, goal of the HGP is the study of its ethical, legal, and social implications. It is important to research these issues and find the most appropriate solutions before they become large dilemmas whose effect will manifest in the form of major political concerns.

All humans have unique gene sequences; therefore the data published by the HGP do not represent the exact sequence of each and every individual's genome. It is the combined genome of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying differences among individuals involves single nucleotide polymorphisms (SNPs) and the HapMap.

How it was accomplished

The publicly funded groups (the NIH, the Sanger Institute in Great Britain, and numerous groups from around the world) broke the genome into large pieces, approximately 150,000 base pairs in length. These pieces are cloned as "bacterial artificial chromosomes", or BACs, so called because they can be inserted into bacteria where they are copied by the bacterial replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The larger, 150,000 base pair chunks were then stitched together to create chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing. By comparison, the whole-genome shotgun (WGS) method is faster and cheaper, and by 2003 - thanks to the availability of clever assembly algorithms - it had become the standard approach to sequencing most mammalian genomes.

Whose genome was sequenced?

In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong. It has been informally reported, and is well known in the genomics community, that much of the DNA for the public HGP came from a single anonymous male donor from the state of New York.

Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Using sperm does provide all chromosomes for study, including equal numbers of sperm with the X (female) or Y (male) sex chromosomes. HGP scientists also used white cells from the blood of female donors so as to include female-originated samples. One minor technical issue is that sperm samples contain only half as much DNA from the X and Y chromosomes as from the other 22 chromosomes (the autosomes); this happens because each sperm cell contains only one X or one Y chromosome, but not both. Thus in 100 sperm cells, on average there will be 50 X and 50 Y chromosomes, as compared to 100 copies of each of the other chromosomes.

Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of SNP groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.

In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of those in the pool.


Shotgun sequencing

Shotgun sequencing is a method used in genetics for sequencing long DNA strands. Since the chain termination method of DNA sequencing can only be used for fairly short strands, it is necessary to divide longer sequences up and then assemble the results to give the overall sequence. In chromosome walking, this division is done by progressing through the entire strand, piece by piece; shotgun sequencing uses a faster, but more complex, process to assemble random pieces of the sequence. In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence.

For example, consider the following two rounds of shotgun reads:

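Original strand               : AGCATGCTGCAGTCATGCTTAGGCTA
First round of shotgun reads  : AGCATGCTGCAGTCATGCT-------
                                -------------------TAGGCTA
Second round of shotgun reads : AGCATG--------------------
                                ------CTGCAGTCATGCTTAGGCTA
Reconstruction                : AGCATGCTGCAGTCATGCTTAGGCTA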
In this extremely simplified example, the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is additionally aggravated by the great abundance of repetitive sequence, meaning similar short reads could come from completely different parts of the sequence.

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.
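The overlap-and-merge idea can be sketched in a few lines of Python. This is only a toy illustration, not the assembler used by either project: the function names (overlap, greedy_assemble), the minimum overlap of 3 bases and the four reads are all assumptions made for the example, and the sketch ignores sequencing errors, repeats and reverse-complement reads, which real assemblers must handle.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the largest suffix/prefix overlap."""
    # Drop exact duplicates, then reads wholly contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the pieces as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Four hypothetical reads taken from the example strand
    # AGCATGCTGCAGTCATGCTTAGGCTA.
    example_reads = [
        "AGCATGCTGCAGTCATGCT",
        "TAGGCTA",
        "AGCATG",
        "CTGCAGTCATGCTTAGGCTA",
    ]
    print(greedy_assemble(example_reads))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

Greedy merging is the simplest possible strategy; with repetitive sequence it can join reads from unrelated regions, which is exactly the difficulty described above.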

Whole genome shotgun sequencing

High-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Since the chain termination method usually can only produce reads between 500 and 1000 bases long, in all but the smallest clones, mate pairs will rarely overlap.

Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as NL/G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x coverage.
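The coverage formula NL/G is easy to check numerically. The short Python sketch below reproduces the worked example; the helper reads_needed, which inverts the formula, is an addition for illustration, and the combination of 500-base reads with 12X coverage over 3 billion base pairs is only a rough back-of-the-envelope assumption, not a figure reported by the project.

```python
import math


def coverage(num_reads, avg_read_length, genome_length):
    """Average coverage N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * avg_read_length / genome_length


def reads_needed(target_coverage, avg_read_length, genome_length):
    """Smallest number of reads N such that N * L / G >= target_coverage."""
    if avg_read_length <= 0:
        raise ValueError("avg_read_length must be positive")
    return math.ceil(target_coverage * genome_length / avg_read_length)


if __name__ == "__main__":
    # The worked example: 8 reads of 500 nucleotides over a 2,000 bp genome.
    print(coverage(8, 500, 2000))  # 2.0

    # Rough illustration only: 12X coverage of 3 billion bp with 500-base reads.
    print(reads_needed(12, 500, 3_000_000_000))  # 72000000
```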


The idea for the shotgun technique came from the use of an algorithm that combined sequence information from many small fragments of DNA to reconstruct a genome. This technique was pioneered by Frederick Sanger to sequence the genome of phage Φ-X174, a tiny virus (a bacteriophage) whose genome was the first to be fully sequenced, in 1977. The technique was called shotgun sequencing because the genome was broken into millions of pieces as if it had been blasted with a shotgun. In order to scale up the method, both the sequencing and genome assembly had to be automated, as they were in the 1980s.

The modern whole genome shotgun technique came into its own with the sequencing of the first free-living organism, the 1.8 million base pair genome of the bacterium Haemophilus influenzae, in 1995. It involved the use of automated sequencers, longer individual sequences (approximately 500 base pairs at that time), and paired sequences separated by a fixed distance of around 2,000 base pairs. These were critical elements enabling the development of the first genome assembly programs for reconstruction of this bacterial genome.

Three years later, in 1998, the announcement by the newly formed Celera Genomics that it would scale up the shotgun sequencing method to the human genome was greeted with much skepticism in some circles.

How it was accomplished

The Celera group used the “whole-genome shotgun” technique. The shotgun technique breaks the DNA into fragments of various sizes, ranging from 2,000 to 150,000 base pairs in length, forming what is called a DNA "library". Using an automated DNA sequencer, the DNA is read in 800 bp lengths from both ends of each fragment. This method became a standard approach to the sequencing and assembly of bacterial genomes beginning in 1995, when the first bacterial genome, that of Haemophilus influenzae, was sequenced. Using a complex genome assembly algorithm and a powerful computer, the pieces are combined and the genome can be reconstructed from the millions of short, 800 base pair fragments.


In genetics and biochemistry, sequencing means to determine the primary structure (or primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule.

DNA sequencing

DNA sequencing is the process of determining the nucleotide order of a given DNA fragment. Currently, most DNA sequencing is performed using the chain termination method developed by Frederick Sanger. This technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. However, new sequencing technologies such as 454 Sequencing and Pyrosequencing are gaining an increasing share of the sequencing market.

The sequence of DNA encodes the necessary information for living things to survive and reproduce. Determining the sequence is therefore useful in 'pure' research into why and how organisms live, as well as in applied subjects. Because of the key nature of DNA to living things, knowledge of DNA sequence may come in useful in practically any biological research. For example, in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, research into pathogens may lead to treatments for contagious diseases. Biotechnology is a burgeoning discipline, with the potential for many useful products and services.

Sanger sequencing

In chain terminator sequencing (Sanger sequencing), extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.
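The logic of reading a sequence from chain-terminated fragments can be illustrated with a small Python sketch. It is a deliberately idealised model and rests on several assumptions: four separate reactions, one per terminating nucleotide; every possible terminated fragment present; and perfect size separation. The function names and the example strand GATTACA are invented for the illustration.

```python
def chain_termination_fragments(new_strand):
    """Return, for each terminator base, the lengths of the terminated fragments.

    `new_strand` is the strand synthesised from the primer, written 5'->3'.
    In the reaction containing terminator X, synthesis can stop at every
    position where X is incorporated, giving one fragment length per X.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        if base not in lanes:
            raise ValueError("unexpected base: " + base)
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = chain_termination_fragments("GATTACA")
    print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
    print(read_gel(lanes))  # GATTACA
```

Sorting the fragments by length plays the role of electrophoresis here: the shortest fragment identifies the first base after the primer, the next shortest the second base, and so on.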

Dye terminator sequencing

An alternative to the labeling of the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labeling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, each of which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template-dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability.

This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labeled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.

454 Sequencing

454 Sequencing, which was developed for commercial use in the early 2000s by 454 Life Sciences, uses a technique similar to pyrosequencing to sequence roughly 20 megabases in a 4.5-hour run of their sequencing machine. In this method, single-stranded DNA is annealed to beads and amplified via emulsion PCR (emPCR). These DNA-bound beads are then placed into wells on a fiber-optic chip along with enzymes which produce light in the presence of ATP. When free nucleotides are washed over this chip, light is produced as ATP is generated when nucleotides join with their complementary base pairs. Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument. The signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow, so a homopolymer stretch produces a proportionally stronger signal.
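The relationship between nucleotide flows and signal strength can be sketched as a simple "flowgram" in Python. This is an idealised model with noise-free signals; the cyclic flow order TACG, the function names and the example sequence are assumptions chosen for illustration rather than details taken from the text.

```python
def flowgram(synthesised, flow_order="TACG"):
    """Simulate ideal signals for a strand being synthesised 5'->3'.

    Each flow washes one nucleotide over the well. The signal is the number
    of bases incorporated in that flow, so a homopolymer run of length n
    gives a signal of n and a non-matching flow gives 0.
    """
    synthesised = synthesised.upper()
    if any(base not in flow_order for base in synthesised):
        raise ValueError("sequence contains a base outside the flow order")

    signals = []
    pos = 0
    while pos < len(synthesised):
        for nucleotide in flow_order:
            run = 0
            while pos < len(synthesised) and synthesised[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
    return signals


def decode(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGGGC")
    print(flows)
    # [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1), ('G', 0)]
    print(decode(flows))  # TTAGGGC
```

In a real instrument the signals are noisy, which makes long homopolymer runs hard to call exactly; the sketch sidesteps this by using integer counts.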

RNA sequencing

RNA is less stable than DNA in the cell, and also more prone to nuclease attack experimentally. As RNA is generated by transcription from DNA, the information is already present in the cell's DNA. However, it is sometimes desirable to sequence RNA molecules. In particular, in eukaryotes RNA molecules are not necessarily co-linear with their DNA template, as introns are excised. To sequence RNA, the usual method is first to reverse transcribe the sample to generate DNA fragments. These can then be sequenced as described above.
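At the level of sequence, reverse transcription amounts to writing the complementary DNA (cDNA) strand of the RNA. A minimal Python sketch, assuming an error-free copy of the full-length RNA and using an invented function name and example sequence:

```python
# Base-pairing rules for copying RNA into complementary DNA.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the first-strand cDNA, written 5'->3', for an RNA given 5'->3'."""
    try:
        # The cDNA is antiparallel to the RNA, so read the RNA backwards.
        return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))
    except KeyError as err:
        raise ValueError("unexpected RNA base: " + str(err)) from None


if __name__ == "__main__":
    print(reverse_transcribe("AUGGCC"))  # GGCCAT
```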


The work on interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, disorders of hemostasis, cystic fibrosis, liver diseases and many others. Also, the etiologies for cancers, Alzheimer's disease and other areas of clinical interest are considered likely to benefit from genome information, which may lead in the long term to significant advances in their management.

There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed down the search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene, or other data types.

Further, deeper understanding of the disease processes at the level of molecular biology may determine new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.

The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of the theory of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.

The Human Genome Diversity Project (HGDP), spin-off research aimed at mapping the DNA that varies between human ethnic groups, was rumored to have been halted, but it actually did continue and to date has yielded new conclusions. In the future, the HGDP could possibly expose new data in disease surveillance, human development and anthropology. The HGDP could unlock secrets behind, and create new strategies for managing, the vulnerability of ethnic groups to certain diseases (see race in biomedicine). It could also show how human populations have adapted to these vulnerabilities.

What's Turning Genomics Vision Into Reality

In "A Vision for the Future of Genomics Research," published in the April 24, 2003 issue of the journal Nature, the National Human Genome Research Institute (NHGRI) details a myriad of research opportunities in the genome era. This backgrounder describes a few of the more visible, large-scale opportunities.

The International HapMap Project

Launched in October 2002 by NHGRI and its partners, the International HapMap Project has enlisted a worldwide consortium of scientists with the goal of producing the "next-generation" map of the human genome to speed the discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease.

Expected to take three years to complete, the "HapMap" will chart genetic variation within the human genome at an unprecedented level of precision. By comparing genetic differences among individuals and identifying those specifically associated with a condition, consortium members believe they can create a tool to help researchers detect the genetic contributions to many diseases. Whereas the Human Genome Project provided the foundation on which researchers are making dramatic genetic discoveries, the HapMap will begin building the framework to make the results of genomic research applicable to individuals.

ENCyclopedia Of DNA Elements (ENCODE)

This NHGRI-led project is designed to develop efficient ways of identifying and precisely locating all of the protein-coding genes, non-protein-coding genes and other sequence-based, functional elements contained in the human DNA sequence. Creating this monumental reference work will help scientists mine and fully utilize the human sequence, gain a deeper understanding of human biology, predict potential disease risk, and develop new strategies for the prevention and treatment of disease.

The ENCODE project will begin as a pilot, in which participating research teams will work cooperatively to develop efficient, high-throughput methods for rigorously and fully analyzing a defined set of target regions comprising approximately 1 percent of the human genome. Analysis of this first 30 megabases (Mb) of human genome sequence will allow the project participants to test and compare a variety of existing and new technologies to find the functional elements in human DNA.

Chemical Genomics

NHGRI is exploring the acquisition and/or creation of publicly available libraries of organic chemical compounds, also referred to as small molecules, for use by basic scientists in their efforts to chart biological pathways. Such compounds have a number of attractive features for genome analysis, including their wide structural diversity, which mirrors the diversity of the genome; their ability in many cases to enter cells readily; and the fact that they can often serve as starting points for drug development. The use of these chemical compounds to probe gene function will complement more conventional nucleic acid approaches.

This initiative offers enormous potential. However, it is a fundamentally new approach to genomics, and largely new to basic biomedical research as a whole. As a result, substantial investments in physical and human capital will be needed. NHGRI is currently planning for these needs, which will include large libraries of chemical compounds (500,000 - 1,000,000 total); capacity for robotic-enabled, high-throughput screening; and medicinal chemistry to convert compounds identified through such screening into useful biological tools.

Genomes to Life

The Department of Energy's "Genomes to Life" program focuses on single-cell organisms, or microbes. The fundamental goal is to understand the intricate details of the life processes of microbes so well that computational models can be developed to accurately describe and predict their responses to changes in their environment.

"Genomes to Life" aims to understand the activities of single-cell organisms on three levels: the proteins and multi-molecular machines that perform most of the cell's work; the gene regulatory networks that control these processes; and microbial associations or communities in which groups of different microbes carry out fundamental functions in nature. Once researchers understand how life functions at the microbial level, they hope to use the capabilities of these organisms to help meet many of our national challenges in energy and the environment.

Structural Genomics Consortium

Structural genomics is the systematic, high-throughput generation of the three-dimensional structure of proteins. The ultimate goal for studying the structural genomics of any organism is the complete structural description of all proteins encoded by the genome of that organism. Such three-dimensional structures will be crucial for rational drug design, for diagnosis and treatment of disease, and for advancing our understanding of basic biology. A broad collection of structures will provide valuable biological information beyond that which can be obtained from individual structures.


Sequencing the genome has brought us unprecedented knowledge. We've identified genes that cause debilitating conditions. We can understand genetic mechanisms behind cancer and heart disease. The technologies that allow us to sequence genes also allow us to match DNA samples in criminal cases and paternity suits - and to know if a child will be born with Tay-Sachs or sickle-cell anemia. Anthropologists have a new tool for tracing the ancient migrations of people around the globe, and by comparing our DNA to the DNA of mice, worms, and plants, we've been able to quantify the interrelatedness of all life on the planet. In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers.




Human Chromosome

The goals of HGP are:
• Identify all the approximately 20,000-25,000 genes in human DNA,
• Determine the sequences of the 3 billion chemical base pairs that make up human DNA,
• Store this information in databases,
• Improve tools for data analysis,
• Transfer related technologies to the private sector, and
• Address the ethical, legal, and social issues (ELSI) that may arise from the project.

Part of a radioactively labelled sequencing gel.

View of the start of an example dye-terminator read

ss-DNA immobilized on a bead

DNA-bound beads placed into wells

The Genome Sequencer 20 (GS20) for 454 Sequencing