Predicting the coding sequence

Post by mpolianthes » Sat Sep 24, 2011 4:09 pm

If you have no idea what the coding sequence of a gene is, can you predict it from the gene sequence?

Post by JackBean » Sun Sep 25, 2011 10:28 am

In bacteria it's quite simple: you just need to find START and STOP codons in frame, and that should be it. In eukaryotes it's a little more complicated, since there may be splicing.
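To make the bacterial case concrete, here is a minimal Python sketch of that in-frame START/STOP scan. The names (`find_orfs`, `min_codons`) are made up for illustration, and it only treats ATG as a start codon, which is a simplification: bacteria also use GTG and TTG, and real gene finders look at more signals than this.

```python
# Minimal ORF scan: ATG followed by the first in-frame stop, on both strands.
# Simplifying assumption: ATG is the only start codon considered.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return a list of (strand, start, end, orf_sequence) tuples.

    start/end are 0-based, end-exclusive positions on the scanned strand
    (for '-' they refer to the reverse complement, not to the input).
    min_codons counts the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Toy sequence, not a real gene.
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_codons=3):
        print(orf)
```

In practice you would raise `min_codons` to filter out short ORFs that occur by chance, and then check the candidates against known proteins.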

Cis or trans? That's what matters.

Post by merv » Sun Oct 02, 2011 5:42 pm

JackBean is right. In eukaryotic genes there is usually splicing (though some genes, such as most histone genes, have no introns).

In eukaryotes, if you know the cDNA sequence, then you can find the genomic sequence, and probably the protein sequence, quite easily using NCBI tools. However, there are usually difficulties with this:
there can be alternative splicing, and which splice forms are actually made can only be determined experimentally.
there can be multiple start codons in the sequence; usually one of them is the known start, but which one is used can only be determined experimentally.
Also, the transcript can undergo post-transcriptional modifications, such as polyadenylation (which nearly always occurs, but the poly(A) tail is a different, effectively random length in each cDNA, even for the same gene). More importantly, some mRNAs are edited so that the bases in the final RNA encode different amino acids at specific positions (termed RNA editing).
Also, in the immune system, antibody and T cell receptor V regions are modified when the cell rearranges its DNA (V(D)J recombination), joining gene segments in ways that can put sequences into different reading frames. Antibody genes also mutate their V regions (somatic hypermutation) and switch their downstream constant region (class switching), a genomic equivalent of mRNA splicing.
Similarly, trypanosomes shuffle their DNA around to produce novelty within their protein coats.
Finally, viruses, particularly RNA viruses such as HIV, mutate because of the low fidelity of the reverse transcription (RT) step. So you need to define more clearly what you mean by 'knowing' the 'gene' sequence.
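To illustrate the splicing point from the list above, here is a small Python sketch in which one genomic sequence gives two different coding sequences depending on which exons are joined. Everything in it is invented for illustration: the toy gene, the exon coordinates, and the helper names `splice` and `translate`. In a real case the exon boundaries are exactly what you would have to predict or determine experimentally.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Toy gene: three exons separated by two introns (GT...AG boundaries).
EXON1, INTRON1 = "ATGGCT", "GTAAGTTTTCAG"
EXON2, INTRON2 = "GAAAAA", "GTATGTCCCTAG"
EXON3 = "TGGTAA"
GENOMIC = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3

# Hypothetical exon coordinates (0-based, end-exclusive) on GENOMIC.
ISOFORM_A = [(0, 6), (18, 24), (36, 42)]  # all three exons
ISOFORM_B = [(0, 6), (36, 42)]            # exon 2 skipped


def splice(genomic, exons):
    """Join the exon slices of a genomic sequence into one coding sequence."""
    return "".join(genomic[start:end] for start, end in exons)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    for name, exons in (("A", ISOFORM_A), ("B", ISOFORM_B)):
        cds = splice(GENOMIC, exons)
        print(name, cds, translate(cds))
```

The same genomic sequence yields two different proteins here, which is why the genomic sequence alone does not pin down the coding sequence.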
