The human genome is the genome of Homo sapiens. It is composed of 24 distinct chromosomes (22 autosomes plus X and Y) with a total of approximately 3 billion DNA base pairs, containing an estimated 20,000-25,000 protein-coding genes. The Human Genome Project produced a reference sequence of the euchromatic human genome, which is used worldwide in the biomedical sciences.
The human genome is much more gene-sparse than was predicted at the outset of the Human Genome Project: protein-coding sequences (specifically, coding exons) comprise less than 1.5% of its total length. The estimated number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and it could drop further.
Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA whose function, if any, remains unknown. These regions make up the vast majority of the genome, by some estimates 97%. Much of this DNA consists of repeat elements, transposons, and pseudogenes, but there is also a large amount of sequence that does not fall under any known classification. Much of it may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA.
The reference sequence is also incomplete, for two main reasons. First, the central regions of each chromosome, known as centromeres, consist of highly repetitive DNA sequences that are difficult to sequence using current technology. The centromeres are millions (possibly tens of millions) of base pairs long, and for the most part they are entirely unsequenced. Second, the ends of the chromosomes, called telomeres, are also highly repetitive, and the sequence at most of the 48 chromosome ends (two for each of the 24 chromosomes) is likewise incomplete.
We do not know precisely how much sequence remains before we reach the telomeres of each chromosome, but as with the centromeres, current technology does not make it easy to get there. The centromeres and telomeres are likely to remain unsequenced until new technology capable of reading them is developed. Other than these regions, there remain a few dozen gaps scattered around the genome, some of them rather large, but there is hope that all of these will be closed in the next couple of years.
In summary, our best estimates of total genome size indicate that we have sequenced about 92% of the genome. Most of the remaining DNA is highly repetitive and unlikely to contain genes, but we cannot truly know until we sequence all of it. Our understanding of the functions of all the genes and of their regulation is also far from complete. The roles of "junk" DNA, the evolution of the genome, the differences between individuals and populations, and many other questions are still the subject of intense study by laboratories all over the world.
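
To give a sense of scale, the rounded figures quoted above can be turned into rough base-pair counts. The following is a minimal back-of-the-envelope sketch, not a measurement: it assumes a genome of exactly 3 billion base pairs, treats 1.5% as an upper bound on the coding fraction, and applies the 92% completeness estimate to that same total. The function name `genome_estimates` and its parameters are made up for this illustration.

```python
# Rough arithmetic on the rounded figures quoted in the text.
# All inputs are approximations; the results are order-of-magnitude guides only.

GENOME_BP = 3_000_000_000           # approximate genome size in base pairs
CODING_PER_MILLE = 15               # coding exons: at most 1.5% of the genome
SEQUENCED_PERCENT = 92              # estimated share of the genome sequenced
GENE_COUNT_RANGE = (20_000, 25_000) # estimated number of protein-coding genes


def genome_estimates(genome_bp=GENOME_BP,
                     coding_per_mille=CODING_PER_MILLE,
                     sequenced_percent=SEQUENCED_PERCENT,
                     gene_count_range=GENE_COUNT_RANGE):
    """Return rough base-pair counts derived from the quoted percentages.

    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    # Upper bound on the total length of protein-coding sequence.
    coding_bp = genome_bp * coding_per_mille // 1000
    # Sequence still missing from the reference (centromeres, telomeres, gaps).
    unsequenced_bp = genome_bp * (100 - sequenced_percent) // 100
    low_genes, high_genes = gene_count_range
    # Upper bound on average coding sequence per gene: the smallest value
    # comes from the highest gene count, the largest from the lowest.
    coding_bp_per_gene = (coding_bp // high_genes, coding_bp // low_genes)
    return {
        "coding_bp": coding_bp,
        "unsequenced_bp": unsequenced_bp,
        "coding_bp_per_gene": coding_bp_per_gene,
    }


if __name__ == "__main__":
    estimates = genome_estimates()
    print(estimates["coding_bp"])           # 45000000
    print(estimates["unsequenced_bp"])      # 240000000
    print(estimates["coding_bp_per_gene"])  # (1800, 2250)
```

Under these assumptions, protein-coding sequence amounts to at most about 45 million base pairs, while roughly 240 million base pairs remain unsequenced. By these rough figures, the unsequenced portion is several times longer than all coding exons combined.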