High-throughput analysis of gene expression offers a powerful means of studying how genes work and of uncovering the secrets encoded in genome sequences. Differential gene expression, which plays a key role in various cellular processes, can be quantified by analyzing a large number of transcription products. To this end, several large-scale transcript detection technologies have been developed, chief among which are variants of microarray technology [1,2], expressed sequence tags (ESTs), and serial analysis of gene expression (SAGE). Although each of these has its own limitations [5-10], when combined with bioinformatics and statistical analysis they have been successful in revealing genes that are differentially expressed across tissues or across physiological or phenotypic states, and in yielding unprecedented insights into the complicated interactions of expressed genes and their cellular functions [10-12].
In this work, the EST database for human and mouse was analyzed to identify tissue-specific and differentially expressed genes. ESTs are "single-pass" sequences of randomly selected clones of expressed genes from specific tissues, organs, or cell types. Because EST clone frequency is, in principle, proportional to the expression level of the corresponding gene in the sampled tissue, tissue-specific or differentially expressed genes can be identified by significant differences in the number of their ESTs observed in unbiased cDNA libraries from different tissues [13,14]. Data on ESTs have been accumulating in the public domain for more than a decade and, at present, there are more than 5.3 million entries for human and more than 3.9 million for mouse. ESTs are also well organized in UniGene clusters, which are linked to other types of information, allowing gene-centered analysis.
Several EST-based tools have been developed to extract gene expression profiles. BodyMap uses its own standardized and non-normalized EST libraries exclusively for high-quality expression profiling, but its sample size of fewer than half a million EST sequences from 64 human and 39 mouse tissues may not give a complete picture of genome-wide gene expression [17,18]. TissueInfo and ExQuest (Expressional Quantification of ESTs) are similar to each other in that both compare EST sequences against dbEST using MegaBlast to extract the tissue information associated with each matching EST. However, they do not provide quantified expression profiles for genes identified as differentially expressed under a specified statistical cut-off.
The present work adopted a gene-centered strategy, taking advantage of the well-annotated and widely used UniGene clusters, in which ESTs are grouped in units of genes. This allows genes to be searched directly, eliminates the need for sequence comparison (a computationally expensive procedure given the number of ESTs accumulated in the database), and avoids difficulties in matching and distinguishing between homologous genes.
Because some of the EST libraries were derived from unspecified tissues or under artificially modified expression conditions, we removed 1,898 such human libraries (out of 8,145; 23.3%) and 211 such mouse libraries (out of 841; 25.1%) from our analysis (see Methods) and organized the rest into a hierarchy of manually curated tissue/organ classes. These EST data were then subjected to the statistical test of Audic and Claverie, known as the A-C test, which has been shown to perform better than several other statistical tests for pairwise comparison of gene expression data in tag sampling experiments. In all, genes preferentially expressed in different tissues at various levels of specificity in 157 human and 108 mouse tissues were identified. The results were evaluated by comparison with microarray results for 17 tissues, with the reported expression of several genes in different tissues, and with the genes reported to be expressed in a given tissue [24-29]. The expression profiles of human-mouse orthologous genes that were differentially expressed in normal tissues were also compared and analyzed.
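To make the pairwise comparison concrete, the following is a minimal Python sketch of the A-C statistic for a single gene. It uses the standard published form of the Audic-Claverie probability, p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)], where x and y are the EST counts of the gene in two libraries (or pooled tissue classes) containing N1 and N2 ESTs in total. The function names, and the choice to report the smaller of the two cumulative tails as the p-value, are illustrative assumptions and are not taken from this work; the exact cut-off procedure used here is described in Methods.

```python
import math


def ac_log_prob(x, y, n1, n2):
    """Log of the Audic-Claverie probability p(y | x).

    x, y   : EST counts for one gene in libraries 1 and 2
    n1, n2 : total number of ESTs in libraries 1 and 2
    """
    r = n2 / n1
    # Log-gamma keeps the factorial terms stable for large counts.
    return (y * math.log(r)
            + math.lgamma(x + y + 1)
            - math.lgamma(x + 1)
            - math.lgamma(y + 1)
            - (x + y + 1) * math.log1p(r))


def ac_prob(x, y, n1, n2):
    """Probability of observing y ESTs in library 2 given x in library 1."""
    return math.exp(ac_log_prob(x, y, n1, n2))


def ac_tail_pvalue(x, y, n1, n2):
    """Smaller of the two cumulative tails of p(. | x) at the observed y.

    Assumption for illustration: a one-tailed p-value is reported; other
    conventions (for example, doubling the tail) are also in use.
    """
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))  # P(Y <= y)
    upper = 1.0 - lower + ac_prob(x, y, n1, n2)               # P(Y >= y)
    return min(1.0, max(0.0, min(lower, upper)))


if __name__ == "__main__":
    # Hypothetical counts: 2 ESTs out of 10,000 versus 30 ESTs out of 12,000.
    print(ac_tail_pvalue(2, 30, 10_000, 12_000))
```

Because the test depends on the library sizes N1 and N2 as well as on the raw counts, the same pair of counts can be significant for two small libraries and unremarkable when one library is much larger than the other.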