The release of the human genome sequence, the accumulation of public
databases, and technological developments for detecting large-scale gene
expression profiles have enabled integrated genome-wide studies of
complicated pathological states, such as cancers. Tissue- or
organ-specific gene expression patterns, as well as genome mapping, have
been reported via mining of databases including dbEST and SAGEMap [12,18].
These computational methods have proved effective for comparisons
among tissues under different physiological or pathological states.
We thus developed an integrated dbEST mining procedure, GetUni. In this
study, we utilized GetUni for genome-wide transcriptome analysis of
normal mucosa, cancers and related precancerous lesions of the colon. The
efficiency of this software package was cross-validated against the profiles
of UniGene cDNA-based libraries using the Xprofile online tool.
However, we did not focus on differential gene expression patterns
among these states, since the ESTs in dbEST were generated by various methods
by different contributors. To our knowledge, this is the first report on
computational genome-wide transcriptome analysis.

In the initial analysis of these 4 transcriptomes, we noticed
a prominent phenomenon: all genes or gene transcripts from N, IBD
and A were also found in cancers. Besides the fact that the total number of ESTs in T
was much larger than in the other 3 libraries, one
seemingly paradoxical interpretation was the complicated
heterogeneity of colorectal cancers with regard to histogenesis,
morphology and molecular genetics. ESTs are highly redundant cDNA
fragments derived from either the 3'- or 5'-regions of human genes.
For example, there were 20,370 and 279,163 ESTs, and finally 4,108 and
14,879 genes, in the N and T libraries, respectively. The average number of ESTs per
gene in T relative to N (18.7622, 279,163/14,879 versus 4.9586,
20,370/4,108) was approximately 5 and amounted to 7 after normalization
for EST clustering efficiency (71.86% of ESTs in T versus 51.34% in N
could be clustered into UniGenes). Considering that there
are about 13 times more ESTs in T than in N, it is reasonable to
believe that at least 50% of the additional ESTs in the T library could be ascribed
to redundancy, suggesting that the increase in ESTs was not the only explanation
for the broad pattern of gene expression in cancers. Multiple pathways
contributing to colorectal cancers are well established, including the
adenoma-carcinoma sequence, inflammatory bowel
disease-dysplasia-cancer, hyperplastic polyposis-intraepithelial
neoplasia (IEN)-cancer or juvenile polyposis-IEN-cancer, as well as
direct malignant transformation from normal mucosa (de novo).
As for morphological features, various histological subtypes with
varied differentiation are fully accepted, including tubular
adenocarcinoma, mucinous carcinoma, signet-ring cell carcinoma,
medullary carcinoma and undifferentiated carcinoma. A long-standing
concept in cancer biology is that tumours arise and grow as a result of
"tumour stem cells" or "stem cells" (multipotent progenitor cells with
the capacity for self-renewal) carrying multiple additional mutations.
Consistent with this idea, a defined minority of these cells might be
able to proliferate, differentiate, dedifferentiate and
transdifferentiate, resulting in heterogeneous gene expression patterns.
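
The redundancy argument made earlier in this paragraph rests on simple arithmetic over the EST and gene counts quoted for the N and T libraries. The following is a minimal sketch of that arithmetic, not the GetUni procedure itself; the variable and function names are illustrative assumptions, and the normalization for clustering efficiency is left out because the text does not specify it in enough detail to reproduce.

```python
# EST and gene counts for the N (normal mucosa) and T (cancer) libraries,
# copied from the text. All names below are illustrative, not GetUni's.
libraries = {
    "N": {"ests": 20_370, "genes": 4_108},
    "T": {"ests": 279_163, "genes": 14_879},
}


def ests_per_gene(library):
    """Average number of ESTs supporting each gene in one library."""
    return library["ests"] / library["genes"]


per_gene = {name: ests_per_gene(lib) for name, lib in libraries.items()}

# How many times larger T is than N, in raw ESTs and in distinct genes.
est_fold = libraries["T"]["ests"] / libraries["N"]["ests"]
gene_fold = libraries["T"]["genes"] / libraries["N"]["genes"]

for name, value in per_gene.items():
    print(f"{name}: {value:.4f} ESTs per gene")
print(f"T has {est_fold:.1f}x the ESTs of N but only {gene_fold:.1f}x the genes")
```

Because the EST count grows much faster than the gene count, a large share of the extra ESTs in T must map to genes that are already represented, which is the sense in which the text attributes them to redundancy.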

This study also found that transcript variants were quite common,
since 2,355 out of 14,879 genes had at least 2 transcripts. We
believe that, with the ever-increasing number of ESTs in dbEST, more transcript
variants will be discovered in these colonic tissues. A big surprise
of the finished human genome is that it contains no more than 25,000
genes, barely more than the worm Caenorhabditis elegans.
Considering the myriad cellular processes that keep our body
functioning, a clear and reinforcing realization is that many genes
encode more than one protein, a view replacing the old notion of one
gene, one protein. One way that the human genome performs such complex
functions with so few genes is alternative splicing, which plays
important roles in development, physiology, and disease. A genome-wide
survey of human alternative pre-mRNA splicing indicated that at least
74% of human multi-exon genes are alternatively spliced.
Furthermore, another intriguing phenomenon in this preliminary study is
that the average number of transcripts per gene in T is higher than
in the other profiles, suggesting that increased alternative
splicing might be advantageous for colorectal cancers, considering
their more complicated biological behaviours and functions compared with those
of normal tissue or benign lesions in the colon.

Unexpectedly, enrichment of ribosome genes in the KEGG pathway analysis was
highest in the 2 precancerous lesions, A and IBD, and lowest in cancer, as
suggested by GOTM analysis when we compared these 4 libraries.
Ribosomal proteins, the major components of the ribosome, which is the
protein synthesis center of the cell, play critical roles in
physiological and pathological situations. The fundamental
physiological function of colonocytes is secretion, which involves many
ribosomes, as evidenced by electron microscopy. In addition, there is
highly active renewal and proliferation in crypt cells. In cancer, however,
this secretory capability is lost or impaired due to
dedifferentiation, resulting in limited enrichment of ribosome
genes in cancers, even though other oncogenic proteins might be
actively synthesized and the absolute number of ribosome genes
remains high. This hypothesis is supported by a recent
immunohistochemical study, in which 10 of 12 ribosomal proteins
stained more strongly in normal mucosa than in colon cancers.
In adenoma and IBD, 2 intermediate states between normal mucosa and
cancers, we hypothesized that the secretory ability was largely retained
and that there was additional new protein synthesis to maintain the
transformed phenotype in IEN or dysplasia. In our previous analysis of
differentially expressed genes between colonic adenoma and normal mucosa, 6
out of 62 differentially expressed genes encoded ribosomal proteins.
Ribosomal proteins S11 and L7 were upregulated in colonic adenomas compared
with normal mucosa or cancers, as indicated by microarray.
An early study also demonstrated that increased mRNA levels of several
ribosomal proteins were present in colorectal tumors and polyps.
All these findings imply that increased synthesis of ribosomes
might be an important indicator of precursor lesions of colorectal
cancers.
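
Pathway enrichment of the kind discussed here is commonly assessed with a hypergeometric test, which asks how surprising it is to see a given number of pathway genes in a library if genes were drawn at random from a reference set. The sketch below illustrates that general technique only; whether GOTM computes exactly this statistic is an assumption, the function name is hypothetical, and the counts in the example are invented for illustration and are not results from this study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of observing at least `hits` pathway genes.

    `sample` genes are drawn without replacement from `population`
    reference genes, of which `annotated` belong to the pathway.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to show the call: a reference set of
# 20,000 genes, 80 of them in the pathway, and a library of 4,000 genes
# containing 30 pathway members (16 would be expected by chance).
p_example = hypergeom_enrichment_p(20_000, 80, 4_000, 30)
print(f"enrichment p-value: {p_example:.3g}")
```

A small p-value indicates that the pathway is over-represented in the library relative to the reference set. Comparing such values across libraries is one way to rank enrichment, although libraries of very different sizes, such as T and N here, are not directly comparable on p-value alone.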

Other findings in this study were also intriguing. Genes in the
KEGG Glycolysis/Gluconeogenesis pathway were
significantly more enriched in adenomas and IBD than in cancers. This
is consistent with a recent report on the ApcMin/+ mouse model of colon
tumours [26]. Aberrant glucose metabolism might emerge in the
precancerous stage, earlier than previously expected. Five thioredoxin
family members were present in library A. These enriched genes might
play important roles in protection against oxidative injury, inhibition of apoptosis,
cell proliferation and differentiation [27,28].

In total, 111 members of the 7-transmembrane receptor (rhodopsin family)
superfamily were found in cancers. Small GTP-binding protein-coupled
receptors, endocrine or neuroendocrine receptors, and cytokine
receptors were included in this category. Their roles in colon cancer
have been well documented recently [29,30].

Finally, we analyzed the expression of 8 putative genes in
colorectal cancers. The preliminary data suggested that all these genes
were variably expressed in cancer tissues and cell lines. In particular,
SOX9 was upregulated in most colorectal cancers. This is consistent
with our latest immunostaining results (unpublished data), indicating
that SOX9 may play oncogenic roles and serve as an independent adverse
prognosticator in colorectal cancers. A recent study revealed that SOX9
is an intestinal crypt-specific transcription factor and a downstream
target of β-catenin. SOX9 may thus be a potential target gene for
prognostic assessment and therapeutic intervention in colorectal cancers.