Login

Join for Free!
19188 members
table of contents table of contents

Their analysis suggests that miRNA genes, which are about 1% of all human …


Biology Articles » Biochemistry » Nucleic Acid Biochemistry » Human MicroRNA Targets » Results

Results
- Human MicroRNA Targets

Prediction of miRNA Targets

Using currently known mammalian miRNA sequences, we scanned 3′ untranslated regions (UTRs) from the human (Homo sapiens), mouse (Mus musculus), and rat (Rattus norvegicus) genomes for potential target sites. The scanning algorithm was based on sequence complementarity between the mature miRNA and the target site, binding energy of the miRNA–target duplex, and evolutionary conservation of the target site sequence and target position in aligned UTRs of homologous genes. We identified as conserved across mammals a total of 2,273 target genes with more than one target site at 90% conservation of target site sequence (Tables S2 and S3) and 660 target genes at 100% conservation. We also scanned the zebrafish (Danio rerio) and fugu (Fugu rubripes) fish genomes for potential targets using known and predicted miRNAs (Figure 1; Tables S4 and S5) and identified 1,578 target genes with two or more conserved miRNA sites between the two fish species.

In addition to the analysis of 3′ UTRs, we also scanned all protein-coding regions for high-scoring miRNA target sites. For convenience, these results are reported both as hits in cDNAs (coding plus noncoding; Table S6) and as hits in coding regions (Table S7), with cross-references in the UTR target tables (number of hits in the coding region for each UTR in Tables S2 and S3).

The algorithm and cutoff parameters were chosen to provide a flexible mechanism for position-specific constraints and to capture what is currently known about experimentally verified miRNA target sites: (1) nonuniform distribution of the number of sequence-complementary target sites for different miRNAs; (2) 5′–3′ asymmetry (the complementary pairing of about ten nucleotides at the 5′ end is more important than that of the ten nucleotides at the 3′ end [Doench and Sharp 2004], and the matches near the 3′ end can to a limited extent compensate for weaker 5′ binding); and (3) influence of G:U wobbles on binding (Doench and Sharp 2004). In choosing these parameters, we drew on experience from careful analysis of target predictions in Drosophila (Enright et al. 2003) as well as proposed human targets of virus-encoded miRNAs (Pfeffer et al. 2004).

To facilitate evaluation of predicted targets and design of new experiments, we provide methods and results in a convenient and transparent form. We make the miRanda software freely available under an open-source license, so that researchers can adjust the algorithm, numerical parameters, and position-specific rules. We also provide web resources, including a viewer for browsing potential target sites, conserved with or without positional constraints, on aligned UTRs, with periodic updates (http://www.microrna.org, as well as links to these targets from the miRNA registry site RFAM (http://www.sanger.ac.uk; Griffiths-Jones 2004). We provide both high-scoring targets, as strong candidates for validation experiments, and lower-scoring targets, which may have a role in broader background regulation of protein dose. Expression information (see Table S3) for miRNAs and mRNAs provides an additional filter for validation experiments, in addition ranking target sites by complementarity and evolutionary conservation.

Validation of Target Predictions

Only a small number of target sites of target genes regulated by miRNAs have been experimentally verified, so we sought direct and indirect evidence to help validate or invalidate the proposed set of mammalian targets. (1) We compared predicted targets with experimentally verified targets in mammals, C. elegans, and D. melanogaster, as well as their mammalian homologs. (2) We compared predicted target numbers from real and shuffled miRNA sequences and estimated the rate of false-positive predictions. (3) We assessed the enrichment of miRNA targets in mRNAs that are known cargoes of FMRP, an RNA-binding protein known to be involved in translational regulation.

Agreement with known targets We previously used known miRNA sites for the let-7 and lin-4 miRNAs in Drosophila to develop the target prediction method and check for consistency (Enright et al. 2003). More recent experimental target identification provides independent control data. Recent work in C. elegans (Vella et al. 2004) has narrowed the originally reported list of six target sites for let-7 in the UTR of lin-41 down to three elements, two target sites, and a 27-nt intervening sequence (a possible binding site for another factor). The surviving two target sites have high alignment scores, S = 115 and S = 110, while the other four sites are below threshold (Enright et al. 2003), fully consistent with the experimental results. As one of the confirmed sites has a single-residue bulge, target prediction methods that require a perfect run of base pairs near the 5′ end of the miRNA would not detect it, while our method does. lsy-6, a recently experimentally identified miRNA in C. elegans, controls left–right neuronal asymmetry via cog-1, an Nkx-type homeobox gene; the cog-1 gene has a target site in its 3′ UTR, which also has a high score (S = 125) and passes the conservation filter.

Experiments in D. melanogaster have identified six new miRNA–target gene pairs: miR-7 targets the notch signaling genes HLHm3, HLHm4, and hairy, and miR-2b targets the genes reaper, grim, and sickle (Stark et al. 2003). Consistent with these experiments, our target predictions in D. melanogaster (Enright et al. 2003) ranked HLHm3, hairy, and HLHm4 at positions 1, 3, and 7, respectively, in the list of 143 target genes for miR-7 (Enright et al. 2003). Similarly, our predictions ranked reaper, grim, and sickle at positions 3, 11, and 19, respectively, among the other 120 predicted target genes for miR-2c. We also predicted miR-6 to target this group of pro-apoptotic genes, with sites that have lower scores than the miR-2 family but are conserved in D. pseudoobscura. Unfortunately, one cannot in general use these predicted and then validated target sites (Stark et al. 2003) for the derivation of new prediction rules, as the set of targets tested is limited to the type predicted and is not exhaustive.

Indirect validation comes from the prediction that mammalian orthologs of some of the known miRNA targets in C. elegans and D. melanogaster are miRNA targets. An example is the proposed conservation of the miRNA–target relationship lin-4:lin-28 (we use the notation miRNA:mRNA for a miRNA–target pair), first discovered in worm (Moss and Tang 2003): we detect target sites in human lin-28 for the lin-4 miRNA homolog miR-125. We also confirm the human analog of a let-7:lin-28 relation predicted in C. elegans (Reinhart et al. 2000). In summary, the predicted target sites on human lin-28 are miR-125 (1 site), let-7b (2 sites; Moss and Tang 2003), miR-98 (2 sites), and miR-351 (1 site). Another known lin-4 and let-7 target in C. elegans is lin-41. The human homolog of lin-41 (sequence provided by F. J. Slack, personal communication) and another closely related gene (encoding Tripartite motif protein 2) are predicted as high-ranking targets of let-7 and miR-125 (the human homolog of lin-4) (see Tables S2 and S3). Another known instance of miRNA target regulation in worms is the regulation of cog-1 by the lsy-6 miRNA (Johnston and Hobert 2003). Although there is no obvious homolog of lsy-6 in mammals, the vertebrate homolog of the target gene cog-1, nkx-6.1, is a conserved target for five different miRNAs in our predictions (see Table S2).

The comparison of our results with known targets shows that our method can detect most (but not all) known target sites and target genes at reasonably high rank. However, given the small number of experimentally verified miRNA–target pairs, additional validation tests are desirable, such as statistical tests using randomization of miRNA sequences to estimate false positives.

Estimate of false positives As a computational control of the validity of the prediction method, one can perform a statistical test that attempts to estimate the probability that a predicted site is incorrect. Here, a “false positive” is a predicted target site of a real miRNA on a real mRNA that has passed all relevant thresholds but is incorrect in that it is not biologically meaningful. The statement “not biologically meaningful” is rarely clearly defined, but can reasonably be taken to mean that no functionally effective miRNA:mRNA interaction occurs under conditions of co-expression at physiological concentration, where “functionally effective” is defined in terms of detectable changes of phenotypic attributes.

Technically, an estimate of the false-positive rate can be obtained by computing (directly or via randomization) the background distribution of scores for biologically non-meaningful miRNA target sites and then deriving the probability that a non-meaningful target site passes all score thresholds, i.e., for a single aggregate score, that the incorrect site has a score T > Tc, where Tc is a fixed threshold that may be, in general, different for each miRNA. We chose to estimate the background distribution using shuffled miRNAs obtained by swapping randomly selected pairs of bases of each given miRNA 1,000 times, keeping the nucleotide composition constant. The shuffled miRNA sequences were scanned against human, mouse, and rat 3′ UTR sequences exactly as for the prediction procedure for real miRNA sequences. In the procedure, a miRNA:mRNA match site is predicted to be a target site if it passes three thresholds, S > Sc for match score, |ΔG| > |ΔGc| for free energy of duplex formation, and C > Cc for conservation, where C reflects a binary evaluation of orthology of mRNAs, similarity of position of the site on the mRNA, and a threshold percentage of conserved residues in the two mRNA target sites. Finally, the predicted target sites for a set of shuffled miRNAs are counted and then averaged over a total of ten randomized runs. The percentage of false positives for target transcripts with more than two, three, and four sites is 39%, 30%, and 24%, respectively, using a non-permissive conservation threshold of 100% for target site sequences (Figure 2). In addition, the false-positive rate for single sites with a score of more than 110 is approximately 35%.

To provide a realistic estimate of false positives using randomization, the distribution of scores from random trials (“random-false”) should be similar to the distribution of incorrect (non-meaningful) hits from real trials (“real-false”). The difference between these two distributions is difficult to compute in principle, as very few validated correct predictions are known at present. For human sequences, without any conservation filter, we obtained a total of 2,538,431 predicted target sites for real miRNAs, and, for shuffled miRNAs, on average, 2,033,701 (± 82,172) target sites—a difference of 20%. This difference may be indicative of a biological signal in the raw score (S) and energy (ΔG) calculated by the miRanda algorithm or may be due to different polynucleotide compositions of shuffled miRNAs compared to real miRNAs. Even if this difference represents a real effect, by far the most predictive criterion for accurate target detection is conservation of target sites across species, and not alignment scores or energies (20% compared to a factor of three, see Figure 2; Table S8). As a consequence, the current set of predicted targets rests heavily on the criterion of conservation of miRNA:mRNA match between different species. We believe this to be essentially true for all currently published target prediction methods.

Indirect experimental support: FMRP-associated mRNAs An excellent opportunity to test our target predictions comes from experiments showing the association of mRNAs and miRNAs with proteins involved in translational control, even if these experiments do not provide information on specific miRNA:mRNA pairings. In particular, FMRP, which may regulate translation in neurons, not only associates with hundreds of mRNAs (Brown et al. 2001; Chen et al. 2003; Denman 2003; Miyashiro et al. 2003; Waggoner and Liebhaber 2003) and with miRNAs (Jin et al. 2004), but also associates with components of the miRNA processing machinery, Dicer, and the mammalian homologs of AGO1 and AGO2 (Jin et al. 2004). If all FMRP-bound mRNAs are regulated by miRNAs, one should see a large enrichment of predicted targets among such mRNAs. We tested this hypothesis with 397 FMRP-associated mRNAs taken from a number of recent experiments (Brown et al. 2001; Chen et al. 2003; Denman 2003; Miyashiro et al. 2003; Waggoner and Liebhaber 2003).

Are FMRP-bound messages enriched in predicted targets? Using five different datasets (Table S9), we predicted that 74% of FMRP-associated messages are miRNA target genes (294 of 397 mRNAs). This corresponds to an enrichment factor of about five compared to the 59 targets one would expect from our analysis in a randomly chosen set of 397 mRNAs, where 59/397 equals 4,462/29,785 (4,462 predicted mammalian target mRNAs pass the 90% conservation filter for one or more sites per transcript out of a total of 29,785 transcripts). This suggests that in the 397 FMRP target genes, 59 should pass the filters. The enrichment factor does not vary much with the cutoff parameters used in target prediction (data not shown), but is subject to some uncertainty because of potential false-positive predictions. The enrichment of miRNA:FMRP interaction is consistent with the hypothesis that translational control involving FMRP protein is executed in a complex that involves one or more miRNAs interacting with transcripts at specific sites. Note that this analysis supports the validity of target gene prediction, not the identity of the controlling miRNA or the accuracy of specific sites.

An additional validation test involved FMRP cargoes that had been identified in more than one study, using independent experimental methods. For example, the mRNAs of 14 genes (Brown et al. 2001) were overrepresented in both the polyribosome fraction of mouse fragile X cells and in co-immunoprecipitation with mouse brain mRNPs that contain FMRP. Almost all of the 14 genes are predicted targets with more than one conserved site (11 of 12 annotated UTRs; Table S9). In some cases, expression data provide additional support: postsynaptic density protein 95 (PSD95)–associated (SAPAP4), a neuron-specific protein, is regulated by many miRNAs highly expressed in rat brain primary cortical neurons (Kim et al. 2004).

In summary, the three validation approaches (retrospective, statistical, and indirect experimental) suggest that the current version of the miRanda algorithm, in spite of clear limitations, can predict true miRNA targets at reasonable accuracy, provided that (1) the targets are detected as conserved and (2) the gene contains more than one miRNA target site or a single high-scoring site (S > 110, approximately, including sites with almost perfect complementarity suggestive of mRNA cleavage).

Overview of Mammalian miRNA Target Genes

More than 2,000 mammalian targets. We predicted 2,273 genes as targets with two or more miRNA target sites in their 3′ UTRs conserved in mammals at 90% target site conservation (see Tables S2 and S3). This means we predicted approximately 9% of protein-coding genes to be under miRNA regulation. In addition, we predicted another 2,128 genes with only one target site, but the false-positive rate for these is significantly higher (Figure 2). Of these, the top-scoring 480 genes (S > 110) have an estimated false-positive rate comparable to that of genes with multiple sites and thus also are good candidates for experimental verification. Some of the genes with single sites may contain additional sites that we cannot detect for a number of reasons, including truncated UTRs. A significant subset of the total number of single-site target genes (7%) has near complementary single sites. These near complementary sites may indicate cleavage, for which additional sites may not be necessary. The targets listed in Table 1 were selected for variety of function, variation in number of sites, and varied extent of conservation (some are also conserved in fish). Somewhat surprisingly, the number of predicted targets per miRNA varies greatly, from zero (for seven miRNAs) to 268 (for let-7b), but the distribution is nonuniform (mean = 7.1, standard deviation = 4.7; Figure 3). This indicates a range of specificity for most miRNAs and suggests that regulation of one message by one miRNA is rare.

Functional analysis We analyzed the distribution of functional annotation for all targets of all miRNAs using Gene Ontology (GO) terms (see Materials and Methods; Table S10) and domain annotations from InterPro (Mulder et al. 2003). The target genes reflected a broad range of biological functions (Figure S1). The most enriched GO term was “ubiquitin-protein ligase activity,” with 3.3-fold enrichment (Table S10). Since ubiquitination is a process controlling the quantity of specific proteins in a cell at specific times, miRNA regulation of components of the ubiquitin pathway could increase protein levels. Other overrepresented functional terms were “neurogenesis” (3.2-fold), “protein serine/threonine kinase” (2.5-fold), and “protein-tyrosine kinase activity” (2.5-fold).

The four domains most overrepresented in predicted targets relative to all genes were Homeobox domain, 5.3-fold; KH domain, 4.0-fold; and Guanine-nucleotide dissociation stimulator CDC25 domain, 3.4-fold (Figure S1; Table S10). Interestingly, KH domains are RNA-binding domains found in a wide range of proteins such as hnRNPk, FMR1, and NOVA-1. In addition to the Homeobox domain, other DNA-binding domains and domains associated with chromatin regulation were also enriched, suggesting that miRNAs in animals target the transcription machinery disproportionately, as they do in plants. Another overrepresented domain was semaphorins (3.0-fold). The semaphorins and plexins (semaphorin receptors) are involved in axon guidance, angiogenesis, cell migration, the immune system, and the adult nervous system (Pasterkamp and Verhaagen 2001). Many semaphorins and their receptors are predicted targets of brain-expressed miRNAs (e.g., let-7c, miR-125b, miR-153, miR-103, miR-323, miR-326, and miR-337). The plexins dimerize with Neuropilin (NP1) to signal the Semaphorin ligand attachment; neuropilin is a predicted high-ranking target of let-7g and miR-130, both brain-expressed miRNAs. A significant proportion of ephrin receptors (seven out of ten genes) and ephrin ligands (five out of seven genes) are predicted targets. The family of ephrins is involved in boundary formation, cell migration, axon guidance, synapse formation, and angiogenesis, and the ephrin ligand, EphA2, which contains a conserved cytoplasmic polyadenylation element (CPE) motif, is considered to be under translational regulation in axon growth cones (Steward and Schuman 2003). Although many members of the ephrin family are predicted targets of brain-expressed miRNAs, they appear to be targeted by different miRNAs, consistent with differential regulation. In Drosophila, both ephrin and EphR, closest to the mammalian B class of the ephrin family, also are predicted miRNA targets.

Do specific miRNAs target particular functional groups? We analyzed each miRNA individually for GO term and domain enrichment (Table S10). The targets of some miRNAs were strongly enriched in certain categories, e.g., miR-105 in “small GTPase mediated signal transduction” (5-fold), miR-208 in “transcription factor” (6-fold), and miR-7, which lies in the intron of the hnRNPk (an RNA-binding protein) gene, in “RNA binding proteins.” Neuronal differentiation of embryonic carcinoma cells by retinoic acid in both mice and humans is coupled to induction of let-7b, miR-30, miR-98, miR-103, and miR-135 (Sempere et al. 2004), and their targets are enriched in “neurogenesis” (3.5-fold). miR-124a and miR-125, both highly and specifically expressed in brain, preferentially target RNA-binding proteins. Thirty-one new miRNAs (miR-322miR-352) cloned from rat neuronal polyribosomes have a large number of neuronal target genes and share many targets, e.g., miR-352 and miR-327 target 5HT-2c, and miR-340, -328, -326, -331, and -333 potentially target beta-catenin, which is implicated in various stages of neural differentiation.

Two highly expressed miRNAs in the thymus, miR-181a and miR-142–3p are key components of a molecular circuitry that modulates hematopoietic lineage (Chen et al. 2004). Ectopic expression of miR-181a causes a 2-fold increase in the cells of the B cell lymphoid lineage. Some of our high-ranking targets for miR-181a may provide clues for the mechanism of this effect. Germ cell nuclear factor GCNF (NR6A1) (the second-highest-ranked target for miR-181a) is expressed in the thymus and bone marrow. miR-181a itself is encoded on the antisense strand of an intron of GCNF. We also predict that the gene Bcl11b, known to affect B cell growth, is a target of miR-181a, ranking third, as well as Lim/homeobox protein LHX9, recently found expressed in developing thymus (Woodside et al. 2004).

FMRP cargo mRNAs regulated by miRNAs FMRP is composed of several RNA-binding domains (two KH and one RRG) that bind messages. The specific binding motifs for FMRP on messages are incompletely known, but are thought to include G-quartet patterns and/or U-rich sequences (Dolzhanskaya et al. 2003; Ramos et al. 2003). We predicted 294 mRNAs known to be FMRP cargoes as miRNA targets (see Table S9). The most reliable of these (Table 2) reflect high confidence in experimental identification of FMRP association or conservation of target site between mammals and fish.

Alzheimer's disease amyloid protein Amyloid precursor protein (APP) is an FMRP-bound protein that is translationally regulated. The APP transcript contains a 29-nt motif at position 200 in the 3′ UTR that is known to aid destabilization of the APP mRNA in certain nutrient conditions and that binds nucleolin, a protein associated with RNPs containing FMRP (Rajagopalan and Malter 2000). In addition, there is an 81-nt sequence at position 630 in the APP 3′ UTR that is required for the TGFbeta-induced stabilization of the APP mRNA (Amara et al. 1999). We predicted APP as a target, with a total score of S = 708 with a minimum of eight miRNA sites, including two let-7 top-ranking sites that are conserved in human, mouse, and rat. One of the predicted miRNA target sites in the APP UTR lies in the 81-nt region (Figure 4), and another is within 30 nt of the motif at position 200.

Other APP-interacting proteins, APP-binding family B member 1 (mir-9, miR-340, and miR-135b), APP-binding family member 2 (let-7 and miR-218), and APP-binding family 2 (miR-188 and miR-206) were also predicted targets, some of which had near exact target site matches. In summary, the APP gene appears to be subject to translational regulation by the combinatorial control of a number of different miRNAs.

PSD95 and synaptic processes PSD95 and similar scaffolding molecules, link the NMDA receptor with intracellular enzymes that mediate signaling; this process is involved in the development and maintenance of synaptic function and synaptic plasticity, and interference in this process is implicated in schizophrenia and bipolar disorder (Beneyto and Meador-Woodruff 2003). FMRP binds PSD95 and is required for mGluR-dependent translation of PSD95 (Todd et al. 2003). PSD95 is a high-ranking target of miR-125, miR-135, miR-320, and miR-327, all of which are either exclusively expressed in brain or enriched in brain tissue (Lagos-Quintana et al. 2002; Krichevsky et al. 2003; Sempere et al. 2004). In particular, large transcript numbers of miR-125b are found copurified with polyribosomes in rat neurons in (Kim et al. 2004). PSD95 has one reported G-quartet in its 3′ UTR at position 648 (Todd et al. 2003), further suggesting it as an in vivo FMRP target. We predicted an additional G-quartet site at position 205–235 in the 3′ UTR of PSD95. One of the miRNA (miR-125) target sites overlaps with the G-quartets, raising the possibility that miRNAs directly compete with FMRP to bind the message in this location. Likewise, NAP-22, which has three miRNA target sites (see Table S9), has a miR-207 target site that overlaps with a G-quartet (Darnell et al. 2001).

Other PSD95 family members are also involved in synaptic processes, in particular, in the integration of NMDA signaling in the synaptic membrane. All PSD95 family members in mammals (also known as discs large 1–5), SAP90, and CamKII are predicted miRNA targets (see Table S9), as well as mGluR, the protein product of which is an agonist that induces the rapid translation of PSD95 (Todd et al. 2003) and three NMDA receptor subunits (see Table S9). These results suggest that miRNAs may be involved in NMDA and glutamate receptor signaling to coordinate and integrate information, with specificity achieved through the combinatorial action of different miRNAs.

Components of RNPs Regulated by miRNAs

FMRP-associated proteins FMRP binds its own mRNA, implying negative feedback if the binding inhibits FMRP production (Ceman et al. 1999). The fact that miRNAs target transcripts for FMRP and FMRP-binding proteins suggests another negative feedback loop in which high levels of these proteins inhibit their own production (depending, of course, on the concentration of miRNAs and mRNAs) (Figure 4). The genes for six FMRP-associated (not associated at the same time) proteins, hnRNP A1, Pur-alpha, Pur-beta, Staufen, AGO-2, and PABP, are predicted miRNA targets. This indicates that FMRP-containing RNPs are under miRNA regulation. FXR2, a gene similar to FMR1 is also a miRNA target in human, mouse, rat, and fish. Details of the implied feedback regulation and differential control of RNP action remain to be determined.

RISC. Our data suggest that the RNAi–miRNA machinery itself is under miRNA regulation; for example Dicer appears to be controlled by let-7 and miR-15b; Ago-1 by let-7 and miR-29b/c; Ago-2 by miR-138; Ago-3 by miR-138, miR-25, and miR-103; and Ago-4 by miR-27a/b. Dicer and two of the Argonautes also are predicted to be targets in both zebrafish and fugu. The let-7 sites on the 3′ UTR of Dicer and Ago-1 (Figure 4) will accommodate most of the let-7 variants with similar scores. The variants of let-7 are expressed in a wide range of tissues and developmental stages, suggesting broad regulation of Dicer and Ago-1 by miRNAs. In contrast, the only miRNA that targets Ago-2 is miR-138, which has so far been cloned only once in the cerebellum (Lagos-Quintana et al. 2002). The target site for miR-138 has only one mismatch at position 8; this may induce a siRNA-like cleavage of the message (Hutvágner and Zamore 2002a; Doench et al. 2003). Ago-3 is also a top target for miR-138, with only two mismatches in its site. We suggest that some miRNAs targeting this machinery (e.g., let-7, miR-27, miR-29, and miR-103) are expressed fairly widely, while others (e.g., miR-138 and miR-25) have lower and more restricted expression.

Other RNPs. The highly conserved RNA-binding proteins, ELAV-like proteins (HuR, HuB, HuC, and HuD), contain three RNA-recognition motifs, which bind AU-rich elements in 3′ UTRs of a subset of target mRNAs (Good 1995). These AU-rich elements increase the proteins' cytoplasmic stability and increase translatability (Perrone-Bizzozero and Bolognani 2002). Experiments have identified 18 mRNAs bound to HuB in retinoic-acid-induced cells; of the 14 we were able to map unambiguously, 12 are predicted miRNA target genes: Elavl1 (known to regulate its own mRNA), Gap-43, c-fos, PN-1, Krox-24, CD51, CF2R, CTCF, NF-M, GLUT-1, c-myc, and N-cadherin (Tenenbaum et al. 2000). Three of the ELAV-like genes themselves are also targets of a large number of miRNAs (see Tables S2 and S3; Figure 4). This is yet another example of miRNAs predicted to target the bound messages of RNA-binding proteins and of the regulation of RNA-binding genes by miRNAs.

Cytoplasmic Polyadenylation Binding Proteins Regulated by miRNAs

We predicted all four human cytoplasmic polyadenylation binding proteins (CPEBs) known in mammals as miRNA targets ranked within the top 170 target genes with 6–20 sites in their UTRs (Figure 4; Table S11). Indeed, CPEB2 is the highest-ranking gene of all transcripts. The orthologs to CPEB1 in fish and fly (known as orb in D. melanogaster) are also predicted as targets. CPEB is an RNA-binding protein first shown to activate translationally dormant mRNAs by regulating cytoplasmic polyadenylation in Xenopus oocytes (Hake and Richter 1994). It also regulates dendritic synaptic plasticity (Mendez and Richter 2001; Richter 2001) and dendritic mRNA transport (Mendez and Richter 2001; Huang et al. 2003) and facilitates transport of mRNAs in dendrites together with kinesin and dynein in RNPs (Huang et al. 2003). CPEB binds to its target message through the CPE motif (UUUUAU), which must be within a certain distance of the hexanucleotide AAUAAA. CPEB keeps messages in their dormant state until phosphorylated, after which it activates polyadenylation (Mendez et al. 2000), thereby activating translation or degradation (Mendez et al. 2002). In addition, CPEB co-fractionates with the postsynaptic density fraction in mouse synaptosomes, consistent with translation of stored mRNAs in dendrites being part of the mechanism of synaptic plasticity. We have three more lines of evidence suggesting the notion that translational regulation by CPEB is linked to miRNA regulation. First, our target list and the list of genes regulated by CPEB significantly overlap. There are nine genes known to be CPEB-regulated, seven of which are predicted targets: alpha-CAMIIK, Map 2, Inositol 1, 4–5-Triphosphate Receptor type 1, Ephrin A receptor class A type 2, SCP-1, and CPEB3 (Mendez and Richter 2001). Second, CPEB is known to self-regulate in D. melanogaster (Tan et al. 2001). The CPEB1 homolog in fly, orb, and CPEBs in vertebrates are predicted miRNA targets. Third, the gene most correlated in expression to the CPEB homolog in D. melanogaster is a Piwi protein (Sting), a member of the Argonaute family (Pal-Bhadra et al. 2002; Stuart et al. 2003) that is involved in translational regulation and in the RISC.

Among the predicted miRNA targets, 115 genes also contained CPE motifs, which were conserved in at least two mammals in the same positions in the UTRs and are therefore candidates for CPEB regulation (Table S11; see Materials and Methods). Our predictions include HuB, HuR, Eif-4 gamma, DAZ associated protein 2, VAMP-2 (known to be posttranscriptionally regulated), Presynaptic protein SAP102, and brain-derived neurotrophic factor precursor. Taken together these data suggest that the CPEB genes, the known CPEB-regulated genes, and the predicted CPEB-regulated genes are strong miRNA target candidates and provide rich ground for experimentation.

Targets of Cancer-Related miRNAs

Deregulated expression of certain miRNAs has been linked to human proliferative diseases such as B cell chronic lymphocytic leukemia (Calin et al. 2002; Lagos-Quintana et al. 2003) and colorectal neoplasia (Michael et al. 2003). Recent analysis of the genomic location of known miRNA genes suggested that 50% of miRNA genes are in cancer-associated genomic regions or in fragile sites (Calin et al. 2004). The miRNAs miR-15 and miR-16 are located within a 30-kb region at Chromosome 13q14, a region deleted in 50% of B cell chronic lymphocytic leukemias, 50% of mantle cell lymphomas, 16%–40% of multiple myelomas, and 60% of prostate cancers (Calin et al. 2002). Furthermore, miR-15 and miR-16 are down-regulated, or their loci lost, in 68% of B cell chronic lymphocytic leukemias (Calin et al. 2002). Similarly, miR-143 and miR-145 are down-regulated at the adenomatous and cancer stages of colorectal neoplasia (Michael et al. 2003), and miR-155 is up-regulated in children with Burkitt lymphoma (Metzler et al. 2004).

Our method predicted cancer-specific (by annotation) gene targets of miR-15a, miR-15b, miR-16, miR-143, miR-145, and miR-155. The target genes and their miRNA regulators are as follows: (1) CNOT7, a gene expressed in colorectal cell lines and primary tumors (Flanagan et al. 2003) (miR-15a); (2) LASS2, a tumor metastasis suppressor (Pan et al. 2001) (miR-15b); (3) ING4, a homolog of the tumor suppressor p33 ING1b, which stimulates cell cycle arrest, repair, and apoptosis (Shiseki et al. 2003) (miR-143); (4) Gab1, encoding multivalent Grb2-associated docking protein, which is involved in cell proliferation and survival (Yart et al. 2003) (miR-155); and (5) COL3A1, a gene up-regulated in advanced carcinoma (Tapper et al. 2001) (miR-145).

miR-16 has a tantalizing number of high-ranking targets that are cancer associated and specifically involved in the Sumo pathway There is increasing evidence that Sumo controls pathways important for the surveillance of genome integrity (Muller et al. 2004). The first- and fifth-highest-ranked targets of miR-16 are Sumo-1 activating and conjugating enzymes, respectively. The top two single-site targets for miR-16 are an Activin type II receptor gene (TGFbeta signaling) and Hox-A5, both known to be dysregulated at the level of protein expression in colon cancers (Wang et al. 2001). Both of these sites show near perfect complementary matching between miR-16 and the target genes (indicating possible cleavage). Both of these target genes are also targets for another cancer related miRNA, miR-15.

Targets Conserved between Mammals and Fish

Roughly 55 miRNAs have identical mature sequences in fugu and mammals, and 80 have very similar sequences in the two species; additional fish miRNA sequences can be predicted with confidence based on sequence similarity. It is therefore reasonable to expect that the targets of these probably functionally homologous miRNAs are orthologous genes in the different species. To follow up on this hypothesis, we assessed conservation of mammalian miRNA–target pairs between the 2,273 mammalian and 1,578 fish miRNA targets (with more than one target site per UTR). The analysis yielded 240 target genes conserved between mammals and fish. The number 240 is probably an underestimate because of several factors, including: (1) unfinished annotation of genomes, particularly rat and fugu; (2) ambiguity in assigning orthologs; and (3) lack of UTR information.

The full set of conserved target genes between fish and mammals indicates a wide functional range of conserved targets (Table S12). Many Hox genes are conserved as targets, including the miR-196 targets, Hox-A4:miR-34a, Hox-C9:let-7b (near prefect complementary match), and Hox-B5:miR-27b. Examples from the notch signaling pathway include miR-30:hairy enhancer of split 1 (Hes1) and miR-152:noggin.

Targets Conserved between Vertebrates and Flies

Twenty-eight of the 78 identified miRNAs in flies have apparent mammalian homologs. Based on this remarkable conservation across hundreds of millions of years, it is reasonable to expect that there is some conservation of target sites, target genes, and target pathways between flies and humans. Most strikingly we can identify hox genes and axon guidance genes as common targets between vertebrates and flies, e.g., capicua and sex combs reduced (one of the vertebrate homologs of Hox-A5). The hox gene cluster in Drosophila contains high-ranking predicted targets (Enright et al. 2003) of miR-10 and miR-iab-4, and the hox gene cluster in mammals contains high-ranking targets of miR-196. These miRNAs are themselves located in the hox gene region. We predicted miR-iab-4–3p to target abd-B in Drosophila, a gene related to the ancestral hox-7 cluster, the ancestral parent of many of the predicted targets of miR-196. Axon guidance receptors and ligands conserved as targets include Lar, ephrins, and slits. Human slit1 is a top target of miR-218, which itself is transcribed from the intron of slit2, suggesting down-regulation of slit1 by transcription of slit2. We expect that there are many more conserved targets but we are hindered by the difficulty of mapping orthologous genes between human and fish. Future work will elucidate to what extent there are common pathways regulated by common miRNAs between vertebrates and invertebrates.

Target Sites in Protein-Coding Sequences

Experiments suggest that miRNA target sites in metazoans are preferentially in UTRs, not in coding regions. If this is true, a correct target site prediction method should predict a larger number of targets in UTRs than in coding regions. Alternatively, target sites in coding regions may so far have escaped experimental verification, especially in plants, in which targets of miRNAs in coding regions are the rule, not the exception.

To investigate this issue we computed the average density of target sites for high-scoring targets (S > 130) and before application of conservation filters. The statistical assessment of the influence of conservation filters in coding regions would have raised complicated issues, as nearly two-thirds of nucleotides in coding regions are conserved between mammalian genomes to preserve amino acid sequences. Interestingly, we found, on average, 11 pre-conservation target sites per 1 million nucleotides in coding regions, versus 15 such target sites per 1 million nucleotides in UTRs. This is consistent with a stronger “raw” prediction signal in UTRs and may indicate a lower number of biologically relevant target sites in coding regions in mammals, consistent with early experimental findings.

As a guide to experimentation, we report all sites in coding regions with an alignment score above 110 for miRNAs of length up to 20 nt and an alignment score above 130 for miRNAs longer than 20 nt (scores depend on the length). These cutoff scores approximately correspond to a 75% complementary match between miRNA and target, leaving open the question of how many match pairs are needed to lead to translational inhibition in coding regions, by any mechanism. We identified 942 genes that contained such sites in their coding regions. Strikingly, there was only one site with a perfect match, and this was for the imprinted miR-127, known to be antisense to the reciprocally imprinted retrotransposon-like gene on the opposite strand (Seitz et al. 2003). Of the 942 genes, 25% have been otherwise identified as targets based on conserved target sites in their UTRs. However, only five genes have targets sites in their UTRs complementary to the same miRNA that targets the coding region (see Table S3, columns H and I). For example, miR-211 has a near perfect complementary site in the coding region of a gene of unknown function (Ensembl ID ENSG00000134030, containing an Eif-4 gamma domain) and also has two conserved “normal” sites in the UTR. Similarly, miR-198 has a site in the coding region, as well as conserved sites in the UTR region, of a sodium and chloride GABA transporter (Ensembl ID ENSG00000157103). However, we see no trend for miRNAs that have conserved sites in UTRs to have additional sites in the coding region; rather, stronger target sites for a given miRNA tend to be confined either to the UTR or the coding region and are rarely in both.

Target Sites with Near Perfect Matches in cDNAs

We scanned all cDNAs for high-scoring matches without using conservation to check for high-scoring targets, which we may have missed through strict conservation rules (see Table S6). Over 40 genes contain sites that have near perfect complementarity to a miRNA (S >120), and these target genes may be cleaved rather than translationally repressed as in the case of miR-196 and Hox-B8. For example miR-298, an embryonic-stem-cell-specific miRNA (Houbaviy et al. 2003), has a near match with MCL-1, and miR-328 (neuronally expressed) has a near match with LIMK-1, which is known to be involved in synapse formation and function. miR-129, expressed in mouse cerebellum, has a near perfect complementary match with Musashi-1, which is an RNA-binding gene essential for neural development, regulated in the cerebellum, and up-regulated in medulloblastoma (Yokota et al. 2004).

Comparison of miRNA Target Prediction Methods

Recently, several computational methods for the prediction of miRNA targets have been developed (Enright et al. 2003; Lewis et al. 2003; Rajewsky and Socci 2003; Stark et al. 2003; Kiriakidou et al. 2004; Rehmsmeier et al. 2004). Two of these have been applied to mammalian miRNAs, as described in Lewis et al. (2003) and Kiriakidou et al. (2004). We now compare and contrast these two methods with each other and with the current version of our method, as further developed from miRanda 1.0 and as applied to mammalian and vertebrate genomes (Enright et al. 2003). We compare algorithms and target lists, as an aid to the design of experiments.

The three prediction methods share the goal of identifying mRNAs targeted by miRNAs. All three use sequence complementarity, free energy calculations of duplex formation, and evolutionary arguments in developing a scoring scheme for evaluation of potential targets. Results are reported as lists of target sites and lists of target genes containing such sites. The three methods differ, however, in important technical details, such as the datasets of miRNA and UTR sequences and the algorithm and scoring scheme, as well as the report format. We now summarize these technical differences and compare the lists of resulting target genes for a common subset of miRNAs. The interpretation of such comparisons is hampered by the fact that selection criteria and the use of numerical cutoffs differ conceptually, and genomic coverage is nonuniform.

In the first method, Lewis et al. used 79 miRNAs in human, mouse, and rat, seeking targets in a UTR dataset extracted from the June 2003 version of the Ensembl database. The UTR dataset had 14,300 ortholog triplets conserved between human, mouse, and rat and 17,000 ortholog pairs between human and mouse. All annotated UTRs were extended by 2 kb of 3′ flanking sequence. The algorithm required exact complementarity of a 7-nt miRNA “seed” sequence, defined as positions 2–8 from the 5′ end of the miRNA, to a potential target site on the mRNA, followed by optimization of mRNA–miRNA duplex free energies between an extended window of 35 additional bases of the mRNA and the rest of the miRNA. Target genes were ranked using a composite scoring function, which took into account all sites for a particular miRNA on a given mRNA. Conserved miRNA:mRNA pairs were required to involve orthologs of miRNA and mRNA in human, mouse, and rat, but there was no requirement for conservation of target site sequence (beyond the seed match) or position on the mRNA. Using shuffled miRNA sequences, with the constraint that shuffled controls match real miRNAs in relevant sequence properties, the false-positive rate of predictions was estimated to be 50% for target genes conserved between mouse and human, 31% for target genes conserved in human, mouse, and rat, and 22% for target genes identified in fugu as well as mammals. As a final result, Lewis et al. reported 400 conserved target genes for the 79 miRNAs. Among these targets, 107 genes were reported as conserved in the fish fugu.

In the second method, Kiriakidou et al. used 94 miRNAs in human and mouse, seeking targets in a dataset of 13,000 UTRs conserved in mouse and human (from Ensembl, date not given). The algorithm used a 38-nt sliding mRNA window and calculation of miRNA–mRNA duplex free energies, keeping duplexes with energies below −20 kcal/mol. The duplexes were further filtered using a set of requirements regarding matches and loop lengths in certain positions, as derived and extrapolated from experimental tests involving a predicted target site for let-7b miRNA on the UTR of the human homolog of worm lin-28. The target site sequence was engineered into a Luciferase reporter, followed by sequence variation of the target site and test of an initial set of 15 predictions in the same reporter assay. Using shuffled miRNA sequences, and applying the same rules and parameters, the false-positive rate of predictions was estimated to be 50% for targets conserved between human and mouse. As a final result, Kiriakidou et al. reported 5,031 human targets, with 222 reported as conserved in the mouse.

In the third method (this work), we used 218 mammalian miRNAs and 29,785 transcripts derived from Ensembl (Table 3) and, as a final result, report 4,467 target genes. What are the main differences between these three prediction methods? Comparison of the total number of predicted target genes is not very informative, as different datasets and cutoffs were used. We attempted to remove one of the technical differences, by explicitly comparing reported targets for the same set of 79 miRNAs used by Lewis et al. (although significant differences remained in the sets of UTR sequences used): the overlap of target genes between Kiriakidou et al. (out of 189) and Lewis et al. (out of 400) was 10.6%; the overlap between Lewis et al. (out of 400) and this work (out of 2,673) was 46%; and the overlap between Kiriakidou et al. (out of 189) and this work (out of 2,673) was 49%. In each case the totals (“out of”) are the number of target genes for the common set of 79 miRNAs and the percentage is relative to the smaller set of two compared. The obvious reason for the larger overlap with our results, 46% and 49% respectively, is the larger number of targets in our predictions, which in turn is primarily the result of choice of cutoff.

Direct comparison of the three prediction methods is complicated by the fact that the noticeable differences between the target lists of the three methods are due to the aggregate effects of datasets, algorithm, including selection rules, use of conservation, and cutoffs. The following characteristics of the three methods underlie these differences and should be taken into consideration when choosing targets for experimentation. (1) As to UTR datasets, Lewis et al., with the earliest published report, used a smaller set of UTRs, with some likelihood of false positives as a result of UTR extension. The UTR sets used in this work, the third in terms of publication date, are the most comprehensive and plausibly the most reliable (as of February 2004). (2) As to miRNA datasets, there was an increase from 79 for Lewis et al. to 94 for Kiriakidou et al. to 218 miRNAs used in this work. (3) As to the cooperativity of binding, the scoring system of Lewis et al. evaluated cooperativity of multiple target sites by the same miRNA on a target gene, but disregarded multiple target sites from different miRNAs on one gene; that of Kiriakidou et al. focused on single sites; and that of this work gave high scores to multiple hits on a target gene, no matter whether these hits involved the same miRNA or different miRNAs. These tendencies are not exclusive where scores involve functions of several real numbers, with cutoffs applied to the aggregate score; e.g., our method also allows strong single target sites. (4) As to assessment of false positives using statistical methods based on shuffling, the comparison of percentages is inconclusive, as the statistics of the background distribution of true negatives is not well known. It appears certain, however, from both Lewis et al. and this work, that statistical confidence increases with the extent of conservation among increasingly distant species. (5) As to validation experiments, each of the methods used a different type and set, with mixed overall conclusions. On the reassuring side, there was direct validation of some of the predicted target sites of Lewis et al. and of Kiriakidou et al. using reporter constructs in cell lines. We found some agreement between the sites validated in this way and our predicted targets (details in Table S13), but in some cases we predicted different details of target sites for a given experimentally tested miRNA:mRNA pair. Also, Kiriakidou et al. used a series of such experiments to extrapolate from a set of specific sequence variants to general rules for identification of target sites. However, serious doubts about the validity of any set of rules persist as there is very little in vivo validation in which native levels of specific miRNAs are shown to interact with identified native mRNA targets with observable phenotypic consequences under normal physiological conditions. (6) As to differences in algorithm, one can state opinions about the strengths or weaknesses of each particular algorithm, but the relationship between each prediction method and the actual in vivo process by which miRNAs have functional interactions with their target mRNAs remains unclear or, at best, unproven. In summary, in our view, each of the three methods, including the one in this work, falls substantially short of capturing the full detail of physical, temporal, and spatial requirements of biologically significant miRNA–mRNA interaction. As such, the target lists remain largely unproven, but useful hypotheses.

The predicted targets are useful in practice for the design of experiments as they increase the efficiency of validation experiments by focusing on target lists significantly enhanced in likely targets, compared to random. It is plausible that targets near the top of lists are the most likely to lead to successful experiments. Task-specific filtering of target lists for particular planned experiments is recommended, especially with respect to cooperativity of binding (more than one site for one or more miRNAs on one gene transcript) and coincidence of expression, as new data on expression patterns of miRNAs and mRNAs in different tissues become available. For example, a recommended conservative approach to the design of experiments would use all available expression information and restrict the predicted target genes to those with two or more target sites at normal threshold (S > 90) or one target site with a higher threshold (S > 110), counting only sites with up to one G:U pair in residues 2–8 counting from the 5′ end of the miRNA.

To take into account the rapid development of this field and the likely close interaction of theory and experiment, we plan to periodically update our prediction method and parameters and make revised target lists available on http://www.microrna.org. Next, we discuss some conceptual consequences of the composition of our target list.


rating: 5.00 from 1 votes | updated on: 8 Jan 2007 | views: 715 |

Rate article:







excellent!bad…