can protein size be predicted given genome info?

Genetics as it applies to evolution, molecular biology, and medical aspects.

Moderator: BioTeam

Postby vulpes » Tue Sep 06, 2011 2:21 am

OK, I am looking at old questions from my teacher and have come across one that I believe is misleading. I know that eukaryotes generally have larger proteins for many different reasons, but based on the information given in this question I do not think you can accurately predict which proteins will be larger without outside knowledge. Here's the question.

Assume a prokaryote encodes 3,000 proteins with a total genome size of 3 x 10^6 base pairs. In this prokaryote, about 90% of the genome is actually protein-encoding (equivalent to our exons).

Assume humans encode 30,000 proteins from a genome of 3 x 10^9 base pairs. In humans, about 1.5% of the genome is exon.

From this information, is there any reason to believe that human proteins might actually be larger on average than prokaryotic ones? Please explain your reasoning.

Just based on this info, I am pretty sure that you can't predict protein sizes. I know that eukaryotes have lots of introns and prokaryotes generally don't, and that humans have alternative splicing, which allows for different forms of the protein from the same pre-mRNA. Is there any way to predict the protein size? Thanks for any help :mrgreen:

Postby JackBean » Tue Sep 06, 2011 5:16 am

Well, 90% of 3x10^6 is approx. 3x10^6 (in nt; in AA it's 3 times less). If this is in 3,000 proteins, one protein has on average 1000 AAs.
1.5% of 3x10^9 is approx. 4.5x10^7. If this is in 30,000 proteins, one protein has on average 1500 AAs.

If you do more accurate calculations, the difference will be a little bigger.
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.

Postby vulpes » Tue Sep 06, 2011 4:53 pm

OK, I see what you did there. But when you divide the 4.5x10^7 bp by the 30,000 proteins, it would be 1,500 base pairs for 1 gene, so that would mean 500 amino acids for 1 gene, right? 3 base pairs (3 nucleotides) code for 1 amino acid. Thanks for the help too.


Re: can protein size be predicted given genome info?

Postby JackBean » Wed Sep 07, 2011 6:01 am

Yeah, I see I wrote it but didn't calculate it :oops: and the same goes for the bacteria, so it will be approx. 300 and 500 AAs on average ;)
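
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate discussed above. The function name and parameters are made up for illustration. It assumes that every coding base belongs to exactly one protein, that 3 nucleotides encode 1 amino acid, and it ignores stop codons, UTRs, and alternative splicing, so it gives only a rough average.

```python
def average_protein_length(genome_bp, coding_fraction, n_proteins):
    """Rough average protein length in amino acids.

    Assumes (for illustration only) that all coding bases are shared
    evenly among the proteins and that 3 nucleotides encode 1 amino acid.
    """
    coding_bp = genome_bp * coding_fraction   # bases that encode protein
    bp_per_gene = coding_bp / n_proteins      # average coding bases per protein
    return bp_per_gene / 3                    # codons, i.e. amino acids

# Numbers taken from the teacher's question
prokaryote_aa = average_protein_length(3e6, 0.90, 3_000)
human_aa = average_protein_length(3e9, 0.015, 30_000)

print(round(prokaryote_aa), round(human_aa))
```

With the figures from the question this comes out at roughly 300 amino acids for the prokaryote and roughly 500 for human, matching the corrected values.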

Postby greatmicrobiologist » Wed Sep 07, 2011 7:21 am

Never mind, guys, I am very weak in calculations, so I'm giving a simple logic here. :D Until we know the coding sequence, it's hard to assume the protein length. As we can easily calculate the number of amino acids but not the protein length. As in the coding sequence there will be many stop codons. So where is the stop codons present can't be estimated, and also along with the start codons respectively. Hence it's not easy to estimate the size, but the number of amino acids that may be present can be calculated! ;)

Hope I'm correct. Am I? :idea:
Saumyadip Sarkar
Student of M.Sc. Microbiology,
GITAM Institute of Science,
GITAM University, AP, India

e-mail: saumyadip.gis@gmail.com

Article Writer
http://www.microbioworld.com/

Postby JackBean » Wed Sep 07, 2011 8:36 am

What's the difference between protein length and number of amino acids?

Re: can protein size be predicted given genome info?

Postby jonmoulton » Wed Sep 07, 2011 7:07 pm

Once a full ribosome assembles at the start codon, proceeds through translation and encounters a stop codon, that's it -- that is the end of the coding sequence. The ribosome leaves the mRNA and the sequence downstream of the stop is the 3'-UTR. So, you normally don't find multiple stop codons within a coding sequence (though there are always exceptions; e.g. a "slippery" sequence can trigger translational frameshift and bring alternative stop codons in-frame).
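
The rule described above can be sketched in a few lines of Python. This is a simplified illustration, not a gene-prediction tool: the function name is hypothetical, it assumes the standard genetic code's three stop codons, it takes the first ATG/AUG in the sequence as the start (real initiation depends on sequence context), and it ignores frameshifts and other exceptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def protein_length(mrna, start_codon="ATG"):
    """Number of amino acids translated from the first start codon
    up to (not including) the first in-frame stop codon.

    Returns None if there is no start codon or no in-frame stop.
    """
    seq = mrna.upper().replace("U", "T")   # accept RNA or DNA letters
    start = seq.find(start_codon)
    if start == -1:
        return None
    length = 0
    # Step through complete codons in the reading frame set by the start codon
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return length                  # everything downstream is 3'-UTR
        length += 1
    return None

# Toy sequence: 5'-UTR "GG", then ATG GCT TGA, then downstream bases
print(protein_length("GGATGGCTTGATAAATGA"))
```

In the toy sequence, translation ends at the first in-frame TGA, so the later TAA and TGA triplets sit in the 3'-UTR and are never read as stops. The count of codons read is both the "number of amino acids" and the "protein length".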

Jack's question gets to the core of this -- protein length and # of amino acids are the same thing. Keep in mind too that the mature form of a protein might be digested by a protease to a smaller form than was originally translated. In that case, amino acids are clipped away (and with that, the polypeptide is shortened). Post-translational modifications like glycosylation can also affect protein mass, but I would not say that is a change in protein length -- length refers to the number of amino acid residues in the polypeptide.

Re: can protein size be predicted given genome info?

Postby greatmicrobiologist » Thu Sep 08, 2011 11:55 am

jonmoulton wrote:Once a full ribosome assembles at the start codon, proceeds through translation and encounters a stop codon, that's it -- that is the end of the coding sequence. The ribosome leaves the mRNA and the sequence downstream of the stop is the 3'-UTR. So, you normally don't find multiple stop codons within a coding sequence (though there are always exceptions; e.g. a "slippery" sequence can trigger translational frameshift and bring alternative stop codons in-frame).


Oooh, got my conceptions corrected. :) Thanks! :)

Re:

Postby merv » Mon Oct 03, 2011 12:06 am

greatmicrobiologist wrote:Never mind guys I am very weak in calculations and giving here a simple logic. :D Until we know the coding sequence its hard to assume the protein length. As we can easily calculate the number of amino acids


I think you mean nucleotides, not amino acids.

greatmicrobiologist wrote:easily but not the protein length. As in the coding sequence there will be many stop codons.


No, there won't, as explained in the replies following your original post, which you overlooked in your self-congratulation. Are you a time waster? If not, then I kindly suggest you work on both your English writing and comprehension, and I wish you good luck in these endeavours. False self-gratification is not going to help you, which is the only reason I point it out.

greatmicrobiologist wrote:So where is the stop codons present can't be estimated


I don't understand your English there, son.

greatmicrobiologist wrote:and also along with the start codons respectively


I am not sure if this is stated elsewhere, but some mRNAs have multiple possible start codons (well, one is used per polypeptide synthesised).

greatmicrobiologist wrote:. Hence its not easy to estimate the size but the number of amino acids may present can be calculated.! ;)


Errr, no. Although one can use software to predict each gene's primary amino acid sequence, and this can be done quite accurately (though not completely so) from a cDNA clone, predicting which exons are used without the cDNA sequence is one of the most difficult problems in biology.

In fact, this is one of the most powerful arguments for saying there are an infinite number of genes, not 30,000: a cell might always opt for a piece of DNA it has never used before (or not for a million years) that has lain idle, and who is to say it is not entitled to do so? A relatively new protein is then made, using the new exon in combination with the old promoter and some fraction of its 'usual' exons. By many distinguished professors' reckoning this would still be the old gene, whether the new exon contributes 0.01% of the final protein (or even 0%, if it is only used in a regulatory role) or 99.9% of it. Yet if it has a different function, why call it the same gene?

Perhaps 'gene', in the sense of the 30,000-odd gene estimates, would best be defined by the promoter, although defining promoters is just as debatable, partly due to the plasticity of the requirements of the RNA polymerases (promoter co-factors etc.) and the fact that there are many pseudogenes which object to the label 'pseudo'!!

greatmicrobiologist wrote:Hope I'm correct. Am I? :idea:


No, you weren't. You must be human.