BLAST

Everything on bioinformatics, the science of information technology as applied to biological research.

Moderator: BioTeam

Postby xiaolong88 » Sun Oct 11, 2009 3:17 am

I'm sure every bioinformatician has heard of BLAST. I'm a bioinformatics student using BLAST for the first time, and I need help from anyone who can guide me on how to obtain and interpret information from the results. I went through the BLAST tutorial, but I don't really understand it. I hope someone can help me.

For example, I have the following sequence:

TCCATGTCCTTTGTACAAGGAGAAGAAAGTAATGACAAAATACCTGTGGCCTTGGGCCTCAAGGAAAAGAATCTGTACCTGTCCTGCGTGTTGAAAGATGATAAGCCCACTCTACAGCTGGAGAGTGTAGATCCCAAAAATTACCCAAAGAAGAAGATGGAAAAGCGATTTGTCTTCAACAAGATAGAAATCAATAACAAGCTGGAATTTGAGTCTGCCCAGTTCCCCAACTGGTACATCAGCACCTCTCAAGCAGAAAACATGCCCGTCTTCCTGGGAGGGACCAAAGGCGGCCAGGATATAACTGACTTCACCATGCAATTTGTGTCTTCCTAAAGAGAGCTGTACCC//

How can I obtain information such as:

a) What is the length of this sequence?
b) Give the name of the sequence that contains this sequence.
c) What is the type of tissue used in obtaining the sequence information?
d) How many other sequences are aligned with this sequence, giving an alignment score of nearly 100% with a statistically very low level of significance (‘E’ value)?
e) What are the ‘score bits’ and ‘E value’ for the second aligned sequence in the BLAST result?
f) In your opinion, do the sequences that are nearly and fully aligned to the query sequence produce a functionally similar protein among the organisms?
g) Name the organisms that give a high similarity score with the query sequence.
h) What is the putative name of this query nucleotide / protein sequence?
i) What is the function of this nucleotide / protein and its role in the organism? Give reasons for your answers.
j) What is the GenBank Accession Number for the sequence?
k) Determine the author, Class Group as well as the Title of the Annotation of the molecule.
l) What is the journal / publication in which the sequence was published (including the date of publication and the author(s) address), as well as the PMID accession number of the article?
m) For each of the amino acid sequences, determine its size and its number of alpha-helices, beta-sheets and loops.


I can find some of this information, but for the rest I really don't know how or where to get it. I really hope someone can help me. I want to learn, not just get the answers from you; learning is important to me.

regards
xiaolong^^
xiaolong88
Garter
Posts: 9
Joined: Mon Feb 23, 2009 5:20 pm

Postby JackBean » Sun Oct 11, 2009 6:42 am

Hi,

You have a lot of questions, and many of them are not related to BLAST at all. BLAST is a tool for aligning sequences against each other, so BLAST by itself cannot tell you the name of a sequence. You get the name from the database record of the homologous sequence that BLAST finds.
Where do you run BLAST? Do you have a program on your PC, or do you use one on the internet?
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.
JackBean
Inland Taipan
Posts: 5657
Joined: Mon Sep 14, 2009 7:12 pm

Postby xiaolong88 » Sun Oct 11, 2009 3:29 pm

Hi JackBean,

Thanks for your reply. For your information, these are the questions I need to answer for my exercise using BLAST. I'm having difficulty using it, which is why I posted my questions here to ask for help.
I use BLAST on the internet, and I really don't know how to use it.
http://blast.ncbi.nlm.nih.gov/Blast.cgi

Thanks a lot for your information.
It's a real headache.


Re: BLAST

Postby zami'87. » Sun Oct 11, 2009 11:00 pm

It seems your sequence is from Homo sapiens interleukin 1, beta: when I ran BLAST, that was the best match (see the top result, 100%). The length of the sequence is 350. Results further down the list match less well. The E value has a statistical meaning: as the match gets worse, E increases. If E is close to 1, you can't rely on the result. Click the gene ID link for more information, or here's the link: http://www.ncbi.nlm.nih.gov/gene/3553?o ... e_RVDocSum
Your exercise covers more than BLAST. Here's the link for protein function and domain analysis: http://www.uniprot.org/uniprot/P01584

Sorry, I'm in a hurry (reading some microbiology for tomorrow), but I hope this short answer helped. Good luck.

P.S. Oh, and correct me if I'm wrong, as I'm also a newbie at this ;)
Every man is a star whose light can make shadows dance differently and change our view of landscape permanently***
zami'87.
Coral
Posts: 257
Joined: Sat Mar 05, 2005 6:56 pm
Location: Serbia

Postby JackBean » Mon Oct 12, 2009 2:01 am

zami: the E-value depends not only on the identities but also on the length of the alignment. If you had a 100% identical match of 10 bases and a 99% identical match of 350 bases, which do you think would have the lower E-value?
Second, how about letting him do his work on his own next time?

xiaolong: yeah, I thought it would be NCBI's BLAST :-D
Let's take a tour:
As you have nucleotides and wish to find homologous nucleotides, you pick nucleotide blast (blastn).
There you paste your sequence or upload a file (you can also work with several sequences at once, but they have to be distinguished from each other, as in FASTA format). Afterwards, you choose which database you wish to use. Now we know that it's a human sequence, so you can keep the human database, but otherwise the nucleotide collection is probably the best choice for a first shot ;) There are also some specialised databases like ESTs, PDB, etc.
Now you choose the program. As you wish to identify the sequence, keep it set to highly similar sequences and then just BLAST ;)
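
For the "several sequences at once" case, FASTA format just means that each sequence gets its own header line starting with ">" followed by the sequence lines. A minimal sketch of building such input; the names query1 and query2, the short sequences and the to_fasta helper are invented for illustration:

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")  # header line marks where a new sequence starts
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Two made-up queries; the result can be pasted into the blastn query box.
print(to_fasta([("query1", "TCCATGTCCTTTGTACAAGG"), ("query2", "ACGTACGTAC")]))
```

The 60-character line width is only a common convention, not something the source thread specifies.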

Now you get the results. At the top you have info about your sequence and the program/database used. Below that is a graphic of the sequences found and their similarity (colour-coded); it also shows which part of your sequence each hit matches. Just point the mouse at a line and you will see its name, E-value, etc. You can click on it to jump to the alignment.
Under this graphic you have a table with basically the same info :) From the left:
Accession - the ID of the sequence found; click on it to get to the sequence info page (including all you need to know and the sequence itself)
Description
Max score - this is calculated from the similarity and length; basically, the higher it is, the more similar the sequences are
Total score - I don't know what the difference is, but I never had to, so I guess you can live without it too, or find it in the help :)
Query coverage - this tells you how much of your sequence is covered by the one found. E.g. you can get coverage of 100% but low identity, or you can have 90% identity but only over half of your sequence
E-value - this tells you how many hits with a score at least this good you would expect to get by chance (from random sequences); that is, the lower it is (the higher the number behind e-), the more reliable the result. As I have mentioned above, it's not only about identity but also about length, because you can easily find any piece of 10 nucleotides in the 3-billion-base human genome with 100% identity, but if you have 350 nucleotides and get 70% identity, then it's still quite good.
Links - links to other databases.

With this info, try to answer as much as you can, and if there is something you don't know, feel free to ask again ;)
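
The point about length can be made concrete with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. A rough sketch, assuming a scoring of +1 per match and -2 per mismatch, illustrative parameters lambda = 1.28 and K = 0.46, no gaps, and a 3-billion-base database; real BLAST uses effective lengths and its own parameter tables, so the numbers are only indicative:

```python
import math


def raw_score(length, identity, match=1, mismatch=-2):
    """Ungapped raw score for an alignment of given length and fractional identity."""
    matches = round(length * identity)
    mismatches = length - matches
    return matches * match + mismatches * mismatch


def expected_hits(score, query_len, db_len, lam=1.28, k=0.46):
    """Karlin-Altschul E-value; lam and k are assumed, illustrative values."""
    return k * query_len * db_len * math.exp(-lam * score)


DB_LEN = 3_000_000_000  # roughly the size of the human genome, as in the post

short_hit = expected_hits(raw_score(10, 1.00), 10, DB_LEN)
long_hit = expected_hits(raw_score(350, 0.99), 350, DB_LEN)

print(f"10 bases, 100% identical: E ~ {short_hit:.1e}")
print(f"350 bases, 99% identical: E ~ {long_hit:.1e}")
```

Under these assumptions the perfect 10-base match is expected many times by chance, while the slightly imperfect 350-base match has a vanishingly small E-value.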

Postby xiaolong » Mon Oct 12, 2009 4:00 am

Hi zami,
Thank you so much for helping me and explaining how to use it and link it to other resources. It really helped me a lot. I also found some extra information along the way.
Wishing you all the best for your microbiology revision.

Hi JackBean,
You know what? I'm really happy to get your reply here. I had actually gone through the other databases before, but when it came to the BLAST exercise I ran into problems. I really didn't know how to use it. With your guide above, I think I can solve some of my difficulties.

I have done my research and found information for the sequence. I do have some questions to ask:

1. Question d) asks for sequences that score nearly 100%. Does that mean there are only two results for this example?
2. I understand your explanation of the E-value, but what does 3e-117 mean?
3. For question h), does the putative name refer to things like "adenine, guanine, cytosine", etc.?

Help me, I have spent almost a week trying to interpret and understand how to use it..haha
I hope you can help me.

Thanks a lot
^^ xiaolong
xiaolong
Garter
Posts: 3
Joined: Sun Oct 11, 2009 2:59 am

Postby JackBean » Mon Oct 12, 2009 4:19 am

1. d) is quite a stupid question: what is nearly 100%? And the score is not expressed in % at all. But if you choose the nucleotide database (not only human), you will see that there are many highly similar sequences. The first 7 sequences are probably the same, just under other accession numbers (duplicates, different tissue, etc.); the ones below are from chimpanzee, still with an E-value of 0, identity of 99% and coverage of 100%.
2. That means 3 × 10^-117 (quite low, isn't it? :) ). That is "0.", then 116 zeros, and then 3 ;) :-D
3. I guess they are asking for the name of the first record, that is "Homo sapiens interleukin 1, beta (IL1B), mRNA" ;)
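
The reading of 3e-117 in point 2 can be checked quickly. A small sketch using Python's decimal module so that the digits are exact rather than rounded by floating point:

```python
from decimal import Decimal

e_value = Decimal("3e-117")

# Scientific notation: 3e-117 is 3 multiplied by 10 to the power -117.
same_value = e_value == Decimal(3) * Decimal(10) ** -117

# Write the number out in full and count the zeros between "0." and the 3.
digits_after_point = format(e_value, "f").split(".")[1]
leading_zeros = len(digits_after_point) - len(digits_after_point.lstrip("0"))

print(same_value)     # True
print(leading_zeros)  # 116
```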

Postby xiaolong » Mon Oct 12, 2009 4:34 am

Thanks for replying again.
I now totally understand the E-value...stupid me!^^
As you mentioned above, how do I find out which tissue they used for that sequence?

Postby JackBean » Mon Oct 12, 2009 4:44 am

Sometimes you can get it from the title; for X02532.1 it's "Human mRNA for interleukin 1 beta. Peripheral blood mononuclear cells". There is also the SOURCE category (in the sequence file, not in BLAST), but that is usually just the organism.
So you will probably have to look into the article :(
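
When the submitters did record the tissue, it usually appears in the source feature of the GenBank flat file as a qualifier such as /tissue_type or /cell_type. A rough sketch of pulling qualifiers out of record text with a regular expression; the record fragment below is made up for illustration and is not the real X02532.1 entry, and the helper name is hypothetical:

```python
import re

# Made-up fragment in the layout of a GenBank FEATURES table (hypothetical values).
SAMPLE_RECORD = '''\
FEATURES             Location/Qualifiers
     source          1..350
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /tissue_type="hypothetical tissue"
                     /cell_type="hypothetical cell type"
'''


def quoted_qualifiers(record_text):
    """Return every /key="value" qualifier found in the text as a dict."""
    return dict(re.findall(r'/(\w+)="([^"]*)"', record_text))


qualifiers = quoted_qualifiers(SAMPLE_RECORD)
for key in ("organism", "tissue_type", "cell_type"):
    print(key, "->", qualifiers.get(key, "not recorded"))
```

This simple pattern collects qualifiers from all features in the text it is given, so for a full record it would need to be restricted to the source feature; a missing key is exactly the case where the article has to be consulted instead.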

Postby xiaolong » Mon Oct 12, 2009 6:08 am

Thanks JackBean,
it's really helpful. Thanks a lot.

