BLAST

I'm sure all bioinformaticians have heard of BLAST. I'm a bioinformatics student using BLAST for the first time, and I need help from anyone who can guide me on how to obtain and interpret the information in the results. I went through the BLAST tutorial, but I don't really understand it. I hope someone can help me.

For example, I have the following sequence:

TCCATGTCCTTTGTACAAGGAGAAGAAAGTAATGACAAAATACCTGTGGCCTTGGGCCTCAAGGAAAAGAATCTGTACCTGTCCTGCGTGTTGAAAGATGATAAGCCCACTCTACAGCTGGAGAGTGTAGATCCCAAAAATTACCCAAAGAAGAAGATGGAAAAGCGATTTGTCTTCAACAAGATAGAAATCAATAACAAGCTGGAATTTGAGTCTGCCCAGTTCCCCAACTGGTACATCAGCACCTCTCAAGCAGAAAACATGCCCGTCTTCCTGGGAGGGACCAAAGGCGGCCAGGATATAACTGACTTCACCATGCAATTTGTGTCTTCCTAAAGAGAGCTGTACCC//

How can I obtain information such as:

a) What is the length of this sequence?
b) Give the name of the sequence that contains this sequence.
c) What is the type of tissue used in obtaining the sequence information?
d) How many other sequences are aligned with this sequence, giving an alignment score of nearly 100% with a statistically very low level of significance (‘E’ value)?
e) What is the ‘score bits’ and ‘E value’ for the second aligned sequence in the BLAST result?
f) In your opinion, do the sequences that are nearly and fully aligned to the query sequence produce a functionally similar protein among the organisms?
g) Name the organisms that give a high similarity score with the query sequence.
h) What is the putative name of this query nucleotide / protein sequence?
i) What is the function of this nucleotide / protein and its role in the organism? Give reasons for your answers.
j) What is the GenBank Accession Number for the sequence?
k) Determine the author, Class Group as well as the Title of the Annotation of the molecule.
l) What is the journal / publication in which the sequence was published (including the date of publication and author(s) address), as well as the PMID accession number of the article?
m) For each of the amino acid sequences, determine its size and its number of alpha-helices, beta-sheets and loops.

I can find some of the information, but for some of it I really don't know how or where to look. I really hope someone can help me. I want to learn, not just get the answers from you; learning is important to me.

Regards
xiaolong^^
xiaolong88
Garter

Posts: 9
Joined: Mon Feb 23, 2009 5:20 pm

Hi,

You have a lot of questions, and many are not related to BLAST at all. BLAST is a tool for aligning sequences to one another, so BLAST by itself cannot tell you the name of your sequence. The name comes from the database record of the homologous sequence that it finds.
Where do you use BLAST? Do you have a program on your PC, or do you use one on the internet?
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.

JackBean
Inland Taipan

Posts: 5513
Joined: Mon Sep 14, 2009 7:12 pm

Hi JackBean,

Thanks for your reply. For your information, these are the questions I need to answer for my exercise using BLAST. I am having difficulty using it, which is why I posted the questions here to ask for help.
I use BLAST on the internet, and I really don't know how to use it.
http://blast.ncbi.nlm.nih.gov/Blast.cgi

Thanks a lot for your information.
xiaolong88
Garter

Posts: 9
Joined: Mon Feb 23, 2009 5:20 pm

Re: BLAST

It seems your sequence is from Homo sapiens interleukin 1, beta: when I ran BLAST, that was the best match (see the top result, 100%). The length of the sequence is 350. Results further down the list match progressively less well. The E value has a statistical meaning, and as the match gets worse, E increases. If E is close to 1, you can't rely on the result. Click the gene ID link for more information, or here's the link: http://www.ncbi.nlm.nih.gov/gene/3553?o ... e_RVDocSum
Your exercise covers more than BLAST. Here's the link for protein function and domain analysis: http://www.uniprot.org/uniprot/P01584

Sorry, I'm in a hurry (reading some microbiology for tomorrow), but I hope this short answer helped. Good luck.

P.S. Oh, and correct me if I'm wrong, as I'm also a newbie at this.
Every man is a star whose light can make shadows dance differently and change our view of landscape permanently***

zami'87.
Coral

Posts: 257
Joined: Sat Mar 05, 2005 6:56 pm
Location: Serbia

zami: the E-value does not depend only on the identities but also on the length of the aligned stretch. If you had a 100% identical match 10 bases long and a 99% identical match 350 bases long, which do you think would have the lower E-value?
Second, how about letting him do his own work next time?
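
To put rough numbers on the length question, here is a minimal sketch using the Karlin-Altschul formula E = K * m * n * exp(-lambda * S). The values of lambda and K, the +1/-2 scoring and the 3-billion-base database are assumptions chosen for illustration; real BLAST also uses effective lengths and gapped statistics, so its reported E-values will differ.

```python
import math

# Assumed Karlin-Altschul parameters (illustrative values for a +1/-2
# match/mismatch scheme); BLAST prints the ones it actually used in the
# statistics section of its report.
LAMBDA = 1.28
K = 0.46


def raw_score(length: int, identity: float, match: int = 1, mismatch: int = -2) -> int:
    """Score of an ungapped alignment of this length and fraction of identical positions."""
    matches = round(length * identity)
    return matches * match + (length - matches) * mismatch


def e_value(score: float, query_len: int, db_len: float) -> float:
    """E = K * m * n * exp(-lambda * S): expected number of chance hits scoring >= S."""
    return K * query_len * db_len * math.exp(-LAMBDA * score)


DB_LEN = 3e9  # assumption: roughly one human genome's worth of bases

short_hit = e_value(raw_score(10, 1.00), query_len=10, db_len=DB_LEN)
long_hit = e_value(raw_score(350, 0.99), query_len=350, db_len=DB_LEN)
print(f"10 bases, 100% identical: E ~ {short_hit:.3g}")
print(f"350 bases, 99% identical: E ~ {long_hit:.3g}")
```

Under these assumptions the short perfect match gets an E-value far above 1, while the long 99% match gets one that is vanishingly small.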

xiaolong: yes, I thought it would be NCBI's BLAST.
Let's take a tour:
Since you have nucleotides and want to find homologous nucleotide sequences, pick nucleotide BLAST (blastn).
There you paste your sequence or upload a file (you can also work with several sequences at once, but they have to be separated from one another, as in FASTA format). Next, choose which database to search. We now know that it's a human sequence, so you can keep the preselected one, but otherwise the nucleotide collection is probably the best choice for a first shot. There are also some specialised databases such as ESTs or PDB.
Now choose the program. Since you want to identify the sequence, keep it set to highly similar sequences, and then just click BLAST.
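
If you want to tidy the input before pasting it, something like the following could work. It is only a sketch: the helper names and the header `query1` are made up for illustration, and the example uses just the first 20 bases of the posted sequence. It strips the trailing "//" and any whitespace, then writes a FASTA record (a ">" header line followed by the sequence).

```python
def clean_sequence(raw: str) -> str:
    """Keep only the letters, uppercased (drops whitespace, digits and the '//' terminator)."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def to_fasta(name: str, raw: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record: '>' header, then lines of at most `width` bases."""
    seq = clean_sequence(raw)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Hypothetical input: only the first 20 bases of the query, with the terminator.
raw = "TCCATGTCCTTTGTACAAGG//"
record = to_fasta("query1", raw, width=10)
print(record)
print("length:", len(clean_sequence(raw)))
```

Running `len(clean_sequence(...))` on the full query gives its length directly, and several records produced by `to_fasta` can be concatenated to submit more than one sequence at once.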

Now you get the results. At the top you have information about your sequence and the program and database used. Below that is a graphic of the sequences found and how similar they are (colour-coded); it also shows where in your sequence each hit aligns. Just hover the mouse over a line and you will see its name, E-value, etc. You can click on it to jump to the alignment.
Under this graphic there is a table with basically the same information. From the left:
Accession - the ID of the sequence found; click on it to get to the sequence record page (it includes everything you need to know, plus the sequence)
Description
Max score - the score of the best single alignment between your query and that sequence; it grows with both similarity and length, so basically the higher it is, the better the match
Total score - the sum of the scores of all aligned segments from that sequence, so it equals the Max score when there is only one segment; you can usually live without it, or look it up in the help
Query coverage - this tells you how much of your sequence is covered by the hit. E.g. you can get coverage of 100% but low identity, or you can have 90% identity but over only half of your sequence
E-value - this tells you how many hits with a score at least this good you would expect by chance (from random sequence); that is, the lower it is (the larger the number after e-), the more reliable the result. As I mentioned above, it's not only about identity but also about length, because you can easily find any stretch of 10 nucleotides in the 3-billion-base human genome with 100% identity, but if you have 350 nucleotides and get 70% identity, then it's still quite good.
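
The 10-nucleotide versus 350-nucleotide comparison can be checked with a back-of-the-envelope count. This is a simplified model, not how BLAST computes E-values: it assumes every base is independent and equally likely, no gaps, one strand, and a 3-billion-base database. The function name and parameters are illustrative.

```python
from math import comb


def expected_chance_hits(length: int, min_identity_pct: int, db_len: int) -> float:
    """Expected number of database positions matching a window of `length` bases
    at `min_identity_pct` percent identity or better, purely by chance."""
    min_matches = (length * min_identity_pct + 99) // 100  # round up
    # Binomial tail: probability that at least min_matches positions agree
    # when each position agrees with probability 1/4.
    p_window = sum(
        comb(length, k) * 0.25 ** k * 0.75 ** (length - k)
        for k in range(min_matches, length + 1)
    )
    positions = db_len - length + 1
    return positions * p_window


GENOME = 3_000_000_000  # assumption: about 3 billion bases

short_exact = expected_chance_hits(10, 100, GENOME)
long_partial = expected_chance_hits(350, 70, GENOME)
print(f"10 bases at 100% identity: {short_exact:.3g} chance hits expected")
print(f"350 bases at >=70% identity: {long_partial:.3g} chance hits expected")
```

In this model a perfect 10-base match is expected to turn up thousands of times by chance, whereas a 350-base match at 70% identity is essentially never expected by chance.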

With this information, try to answer as much as you can, and whatever you can't work out, feel free to ask again.

JackBean
Inland Taipan

Posts: 5513
Joined: Mon Sep 14, 2009 7:12 pm

Hi zami,
Thank you very much for helping me and explaining how to use it and how it links to the other resources. It really helped me a lot. I also found some extra information that is good to know.
All the best with your microbiology revision.

Hi JackBean,
I'm really happy to get your reply here. I had actually gone through the other databases before, but when it came to the BLAST exercise I ran into problems; I really didn't know how to use it. With your guide above, I think I can sort out some of my difficulties.

I have done my research and found information on the sequence. I have some questions:

1. Question d) asks for sequences that score nearly 100%. Does that mean there are only two results for this example?
2. I understand your explanation of the E-value, but what does 3e-117 mean?
3. For question h), does the putative name refer to things like "adenine, guanine, cytosine", etc.?

Please help; I have spent almost a week trying to interpret this and understand how to use it, haha.
I hope you can help me.

Thanks a lot
^^ xiaolong
xiaolong
Garter

Posts: 3
Joined: Sun Oct 11, 2009 2:59 am

1. d) is quite a stupid question: what is "nearly 100%"? And the score is not expressed in % at all. But if you choose the nucleotide collection database (not only human), you will see that there are many highly similar sequences. The first 7 sequences are probably the same, just under different accession numbers (duplicates, different tissue, etc.); the ones below are from chimpanzee, still with an E-value of 0, 99% identity and 100% coverage.
2. That means 3 × 10^-117 (quite low, isn't it?). That is "0.", then 116 zeros, and then 3.
3. I guess they are asking for the name of the first record, that is "Homo sapiens interleukin 1, beta (IL1B), mRNA".
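
If the e-notation still looks odd, a quick way to see it written out is to let Python expand it. This is only an illustration of the notation; the value is the one from the question.

```python
from decimal import Decimal

e_value = Decimal("3e-117")          # the string as it appears in the BLAST table
plain = format(e_value, "f")         # fixed-point notation, no exponent
zeros = len(plain) - len("0.") - 1   # digits between the decimal point and the 3

print(plain)
print("zeros after the decimal point:", zeros)
print("smaller than 1e-5?", float("3e-117") < 1e-5)
```

The same string can be passed to `float()` when you want to sort or filter hits by E-value.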

JackBean
Inland Taipan

Posts: 5513
Joined: Mon Sep 14, 2009 7:12 pm

Now I totally understand the E-value... stupid me! ^^
As you mentioned above, how do I find out which tissue was used for that sequence?
xiaolong
Garter

Posts: 3
Joined: Sun Oct 11, 2009 2:59 am

Sometimes it's in the title; for example, for X02532.1 it's "Human mRNA for interleukin 1 beta. Peripheral blood mononuclear cells". There is also the SOURCE field (in the sequence record, not in BLAST), but that usually gives just the organism.
So you will probably have to look in the article.
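
Some GenBank records also carry qualifiers such as /tissue_type, /cell_type or /cell_line on the "source" feature in the FEATURES table, so it is worth checking there too. Below is a small sketch of pulling them out of a saved record. The excerpt and its qualifier values are invented for illustration and are not copied from X02532.1 or any real record, and the parsing is deliberately naive (it collects qualifiers from every feature, not only "source").

```python
import re

# Hypothetical excerpt in GenBank flat-file layout; the qualifier values are
# made up for illustration and are not taken from a real record.
RECORD = '''\
SOURCE      Homo sapiens (human)
FEATURES             Location/Qualifiers
     source          1..350
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /tissue_type="example tissue"
                     /cell_type="example cell type"
'''

# One qualifier per line, written as /name="value".
QUALIFIER = re.compile(r'^\s*/(\w+)="([^"]*)"', re.MULTILINE)


def source_qualifiers(record: str) -> dict[str, str]:
    """Collect /name="value" qualifiers from a flat-file excerpt into a dict."""
    return dict(QUALIFIER.findall(record))


quals = source_qualifiers(RECORD)
for key in ("tissue_type", "cell_type", "cell_line"):
    print(key, "->", quals.get(key, "not given"))
```

When none of these qualifiers is present, the title or the linked article remains the place to look.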

JackBean
Inland Taipan

Posts: 5513
Joined: Mon Sep 14, 2009 7:12 pm

Thanks, JackBean.
It's really helpful. Thanks a lot.
xiaolong
Garter

Posts: 3
Joined: Sun Oct 11, 2009 2:59 am