Prevalence of gene

Everything on bioinformatics, the science of information technology as applied to biological research.

Moderator: BioTeam


Postby marifly81 » Thu Sep 23, 2010 10:21 am

I want to find out how common my gene is within Bacteria. I have started BLASTing my gene against all sequenced bacterial genomes, but of course this is time-consuming, as I manually check every species. Are there other ways besides BLAST?
Help would be greatly appreciated! :D
marifly81
Garter
Garter
 
Posts: 1
Joined: Thu Sep 23, 2010 7:24 am

Postby bioinfo » Mon Sep 27, 2010 11:18 am

You can use whole-genome alignment tools such as:
MUMmer (Maximal Unique Match; www.tigr.org/tigr-scripts/CMR2/webmum/mumplot)
BLASTZ (http://bio.cse.psu.edu/)
LAGAN (Limited Area Global Alignment of Nucleotides; http://lagan.stanford.edu/)
PipMaker (http://bio.cse.psu.edu/cgi-bin/pipmaker?basic)
MAVID (http://baboon.math.berkeley.edu/mavid/)
GenomeVista (http://pipeline.lbl.gov/cgi-bin/GenomeVista)
If you want more information, refer to the book Essential Bioinformatics by Jin Xiong (Texas A&M University).
bioinfo
Garter
Garter
 
Posts: 33
Joined: Wed Sep 15, 2010 8:00 am

Re: Prevalence of gene

Postby xav2121 » Fri Mar 11, 2011 9:38 am

Hi marifly81!
I'm also interested in determining the prevalence of a given gene (or protein) across sequenced bacterial genomes.
I have tried the links provided by bioinfo with no success. Have you found a way to do this?
xav2121
Garter
Garter
 
Posts: 2
Joined: Fri Mar 11, 2011 9:30 am


Postby JackBean » Mon Mar 21, 2011 11:05 am

Why do you need to know exactly whether it is in any bacterial genome?

I think that at PubMed you should be able to sort your results by species or to view some tree.
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.
JackBean
Inland Taipan
Inland Taipan
 
Posts: 5652
Joined: Mon Sep 14, 2009 7:12 pm

Postby xav2121 » Fri Mar 25, 2011 10:26 am

I still haven't found a way to get the info I want. I basically want to know if a given gene is present or absent in a species. The output I would like to see would be "this gene is found in 90% of the members of this genus, or taxon, or family". I've tried a few things, like doing a BLAST search against the sequenced genomes at NCBI and getting a taxonomy report. The output tells you how many hits were found for a given species, but only if there is a hit. So I still don't know in which species the gene is absent...
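One way to get the absences as well is to start from the full list of genomes that were searched, not from the hit list. Below is a minimal sketch in Python. It assumes you already have a table mapping every searched genome to its genus, plus the set of genomes with at least one hit. The genome names, the hit set and the function name `prevalence_by_taxon` are all made up for illustration.

```python
from collections import defaultdict


def prevalence_by_taxon(genome_to_taxon, genomes_with_hit):
    """Return {taxon: (n_present, n_total, sorted list of genomes without a hit)}."""
    present = defaultdict(int)
    total = defaultdict(int)
    absent = defaultdict(list)
    for genome, taxon in genome_to_taxon.items():
        total[taxon] += 1
        if genome in genomes_with_hit:
            present[taxon] += 1
        else:
            absent[taxon].append(genome)
    return {t: (present[t], total[t], sorted(absent[t])) for t in total}


# Hypothetical input: every genome that was searched, with its genus.
genome_to_taxon = {
    "Escherichia coli K-12": "Escherichia",
    "Escherichia coli O157:H7": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Bacillus subtilis 168": "Bacillus",
    "Bacillus anthracis Ames": "Bacillus",
}
# Hypothetical result: genomes in which the search returned a hit.
genomes_with_hit = {
    "Escherichia coli K-12",
    "Escherichia coli O157:H7",
    "Bacillus subtilis 168",
}

report = prevalence_by_taxon(genome_to_taxon, genomes_with_hit)
for taxon, (n_present, n_total, missing) in sorted(report.items()):
    pct = 100.0 * n_present / n_total
    print(f"{taxon}: {n_present}/{n_total} genomes ({pct:.0f}%), no hit in: {missing}")
```

A genome listed under "no hit in" only means the search found nothing at the chosen cutoff in a genome that was actually searched; genomes that were never sequenced do not appear at all.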
xav2121
Garter
Garter
 
Posts: 2
Joined: Fri Mar 11, 2011 9:30 am

Postby JackBean » Mon Mar 28, 2011 7:26 am

The question is whether the other taxa have not been sequenced yet or whether there is really no homologue. I guess you need to narrow your search to only a few taxa and try them manually, or you can always download all the sequences from NCBI and run BLAST at home, if you have an unused computer :D
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.
JackBean
Inland Taipan
Inland Taipan
 
Posts: 5652
Joined: Mon Sep 14, 2009 7:12 pm

Postby nfellaby » Thu Jun 16, 2011 7:23 pm

My method for BLASTing large numbers of organisms was to construct a database from the SEED Network: download all the genomes of interest, save them as a single FASTA file, then BLAST the gene of interest against this file. This was all done on BioLinux and runs in Perl. I managed to get large numbers of hits with probability indicators. I still have all the syntax to hand, so get in touch if I can help.
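A rough sketch of that kind of workflow, for anyone who wants to try it. This is not the Perl pipeline described above; it is an illustration in Python that assumes NCBI BLAST+ (`makeblastdb`, `blastn`) and tabular output (`-outfmt 6`). It also assumes that every sequence ID in the combined FASTA file was prefixed with a genome name, as in `genomeA|contig1`. The file names, the ID convention and the function name `genomes_hit` are illustrative only.

```python
import csv
import io

# Assumed shell commands, run beforehand with BLAST+ (file names are placeholders):
#   makeblastdb -in all_genomes.fasta -dbtype nucl
#   blastn -query gene.fasta -db all_genomes.fasta -outfmt 6 -evalue 1e-5 -out hits.tsv


def genomes_hit(tabular, max_evalue=1e-5):
    """Map genome -> best (lowest) E-value found in BLAST tabular output.

    `tabular` is any iterable of lines in the 12-column -outfmt 6 layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(tabular, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        subject_id, evalue = row[1], float(row[10])
        genome = subject_id.split("|", 1)[0]  # assumed "genome|sequence" ID convention
        if evalue <= max_evalue and evalue < best.get(genome, float("inf")):
            best[genome] = evalue
    return best


# Invented example output; with a real run you would pass open("hits.tsv") instead.
example = io.StringIO(
    "mygene\tgenomeA|contig1\t98.5\t900\t13\t0\t1\t900\t1200\t2099\t0.0\t1580\n"
    "mygene\tgenomeA|contig7\t80.1\t300\t55\t2\t10\t309\t50\t349\t2e-40\t160\n"
    "mygene\tgenomeB|chr\t75.0\t120\t30\t0\t400\t519\t88\t207\t0.003\t40.1\n"
)
all_genomes = ["genomeA", "genomeB", "genomeC"]  # everything in the FASTA file

hits = genomes_hit(example)
for genome in all_genomes:
    print(genome, "present" if genome in hits else "absent")
```

Keeping the full list of genomes that went into the FASTA file is what lets you report "absent" as well as "present".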
Nick
nfellaby
Garter
Garter
 
Posts: 1
Joined: Thu Jun 16, 2011 7:08 pm

Postby JackBean » Thu Jun 23, 2011 7:06 am

you might be interested in this article
http://www.plosone.org/article/info%3Ad ... ne.0020892
http://www.biolib.cz/en/main/

Cis or trans? That's what matters.
JackBean
Inland Taipan
Inland Taipan
 
Posts: 5652
Joined: Mon Sep 14, 2009 7:12 pm

Postby merv » Sun Oct 02, 2011 9:01 pm

As far as I understand it:
What you are looking at is evolutionarily related genes. The problem then depends entirely upon where you draw your lines. If you play with the BLAST scores, you will get different results. Many of the tools on the net, such as BLAST at NCBI, will give you the result if you perform a BLAST (I prefer PSI-BLAST) and then set your level

Type your gene sequence into the NCBI home page; this gives you a huge number of references. Click on UniGene (note the number of links that have already been assigned).
This takes you to a page with: SELECTED PROTEIN SIMILARITIES
Comparison of cluster transcripts with RefSeq proteins. The alignments can suggest function of the cluster.
Click on the top link that matches your protein, and in the submenu I suggest protein/protein matches, which has done the BLAST for you.
In my example of CD40 I get a list of matches with the organism each sequence came from; with "Blink" it is very simple to then write all of the species down. I get as far as Cricetulus griseus, the Chinese hamster, and its tumor necrosis factor receptor superfamily member 22. As I know that CD40 is in this family, I can trust it. If I get a bacterial sequence, I know I can't. Such a thing occurs when you don't want to use Blink but try to find previously unidentified connections; then you enter a grey area occupied by people who don't like being asked to nail jelly to plates but admit that some people are better at it than others.
If you are not happy with the Blink data and want to challenge it, you can do your own BLAST: click the same link in UniGene, go to the protein, select BLAST, and then choose (in this example) the PSI-BLAST option (the default BLAST is also good), leave everything else the same, and run the alignment. Once you are familiar with that, you need to consider how many matches you want to ask for; how much of the server's time you use is worth bearing in mind. You can adjust the choice of matrix (I am not sure of the differences, but they will of course give you different answers) and you can adjust the stringency using the gap-penalties section. You can generate a lifetime's work on one gene alone with the different options, so it helps to have a guide if you do this work. PSI- and PHI-BLAST allow greater analysis based on repeated iterations, each building on the previous analysis (including the new data each time, as I understand it).
The EXPECT threshold is a means of relaxing the cutoff if you aren't getting any results: raising it lets weaker matches through. Now that we are in 2011 this is rarely required, as we have so much sequence data and most genes have homologs and have in effect been sequenced now in most organisms.
In my example, I have just found a fish match to the CD40 gene, which is most interesting! From the data I have discussed so far this would not be expected, but we know that CD40 exists in fish. So the question is: are you happy with what Blink gives you (humans and Chinese hamsters are related), or would you have wanted to get as far as fish in the analysis? I expect you have a good understanding of bacteria; I don't, hence my discussion of a mammalian gene, but the same approach still applies. If you want to limit the size of your study, just lower the EXPECT threshold.
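How much the answer depends on where you draw the line can be checked directly by counting hits at several EXPECT cutoffs. Here is a small sketch in Python. The hit list is invented (placeholder organism names and made-up E-values, not real CD40 results), and `hits_at_cutoffs` is just an illustrative helper.

```python
def hits_at_cutoffs(hits, cutoffs):
    """Return {cutoff: sorted organisms with at least one hit at E-value <= cutoff}."""
    return {c: sorted({org for org, evalue in hits if evalue <= c}) for c in cutoffs}


# Invented (organism, E-value) pairs standing in for a parsed BLAST result.
hits = [
    ("mammal_1", 0.0),
    ("mammal_2", 1e-60),
    ("fish_1", 1e-8),
    ("bacterium_1", 2.5),
]

table = hits_at_cutoffs(hits, [1e-50, 1e-5, 10])
for cutoff, organisms in table.items():
    print(f"E <= {cutoff:g}: {len(organisms)} organisms: {organisms}")
```

With these made-up numbers the strict cutoff keeps only the two mammals, the intermediate one adds the fish, and the permissive one also lets the doubtful bacterial match through.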
merv
Death Adder
Death Adder
 
Posts: 58
Joined: Sun Oct 02, 2011 2:05 pm

