Want to identify common nucleotide motifs

Everything on bioinformatics, the science of information technology as applied to biological research.

Moderator: BioTeam

Postby biznatch » Thu Nov 16, 2006 5:59 am

I'm looking for a program/website that can identify motifs common between a number of long nucleotide sequences. I'd like to compare sequences (the more the better, 5-10 would be the minimum) each up to 150kb. I've found programs for 2 long sequences, or lots of short ones, but nothing for several long ones. Anyone know of a program/website that can do this?

Thanks.
biznatch
Garter
Posts: 8
Joined: Thu Nov 16, 2006 5:53 am
Location: Canada

Postby G-Do » Thu Nov 16, 2006 3:33 pm

Hi biznatch,

Which motif-finders have you tried? MEME is my favorite. However, I'm not sure that it can be used to discover motifs in such a large sequence set - typically, the bigger your sequences are, the more difficult it is to establish the statistical significance of any particular discovered motif (because the more sequence you have, the more likely it is that a 20-mer, for example, occurs by chance).

I can come up with a list of other motif-finders, but I think most of them will suffer from the statistical problem I just described.
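To put rough numbers on the chance-occurrence point, here is a minimal sketch. It assumes a very simple background model (independent bases, each with frequency 0.25) and counts only exact matches; the function name `expected_chance_hits` is made up for illustration and is not part of MEME or any other tool.

```python
from typing import Sequence


def expected_chance_hits(sequence_lengths: Sequence[int], k: int,
                         both_strands: bool = True) -> float:
    """Expected number of exact occurrences of one specific k-mer.

    Assumption: bases are independent and equally frequent (0.25 each),
    which real genomic DNA does not satisfy. Doubling for the reverse
    strand is an approximation (it double-counts palindromic k-mers).
    """
    # Number of start positions where a k-mer fits, summed over all sequences.
    positions = sum(max(length - k + 1, 0) for length in sequence_lengths)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


if __name__ == "__main__":
    # Ten sequences of 150 kb each, as in the original question.
    lengths = [150_000] * 10
    for k in (6, 10, 20):
        print(k, expected_chance_hits(lengths, k))
```

Under this model the expected count grows linearly with total sequence length and shrinks by a factor of four for every extra base in the word. Motif-finders also allow mismatches, which this exact-match sketch ignores.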

What is the biological context for this project? These aren't 150kb promoters, are they? Maybe we can come up with another solution if I know what it is you're trying to find.
Vi veri veniversum vivus vici
G-Do
Garter
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Location: Philadelphia, PA; USA

Postby biznatch » Thu Nov 16, 2006 4:56 pm

Thanks, I've tried MEME, but it has a maximum of 60kb for all the sequences combined. I'm not really sure what I'm looking for; I'm just doing a screen for a group of genes I've found to be commonly regulated, and I want to find possible common motifs in their genomic sequences. I'd like to compare the gene itself plus about 10kb upstream and downstream. The problem is that some of the genes are very big. I'm focusing more on larger motifs, over 100bp, so that should make it easier to establish significance.
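One crude way to start a screen like this is to look for exact words that occur in every input sequence and use them as seeds for closer inspection. The sketch below is only that: exact shared k-mers, not a motif-finder and not what MEME does. The function names and the choice of k are assumptions for illustration.

```python
from typing import Iterable, Set

# Translation table for complementing A/C/G/T; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_kmers(seqs: Iterable[str], k: int, both_strands: bool = True) -> Set[str]:
    """Return every exact k-mer that occurs in all of the given sequences."""

    def kmers(seq: str) -> Set[str]:
        seq = seq.upper()
        found = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if both_strands:
            rc = reverse_complement(seq)
            found |= {rc[i:i + k] for i in range(len(rc) - k + 1)}
        return found

    sets = [kmers(s) for s in seqs]
    # With no input sequences there is nothing to intersect.
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    toy = ["AAACGTTT", "CCACGTGG", "TTACGTAA"]
    print(shared_kmers(toy, 4, both_strands=False))
```

Because it only finds identical words, a diverged 100bp element would show up, if at all, as one or more shorter shared k-mers, so a small k followed by alignment of the surrounding regions is probably more useful than k=100.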
biznatch
Garter
Posts: 8
Joined: Thu Nov 16, 2006 5:53 am
Location: Canada


Postby G-Do » Sun Nov 19, 2006 5:52 am

Hi biznatch,

Do you have any reason to expect that the elements controlling these genes' regulation are 100bp in length? Do you have any reason to expect that they are in the exons/introns, as opposed to the 5'UTR/3'UTR and 10kb promoter? I can tell you now that the usual working assumptions in promoter element detection are:

(i) Promoter elements are usually short (6-30bp)
(ii) They are usually found in the UTRs, first intron, and promoter

Granted, there is an issue of ascertainment bias in these claims, but these are the assumptions that motif-detection algorithms (like the ones MEME and AlignACE use) are based on.

If you are dead set on finding ~100bp motifs, I guess you could try local/multiple sequence alignments. A motif of that size would probably be detectable using standard local alignment statistics. Alternatively, if you expect these motifs to pop up a lot in each sequence, you could try breaking the sequences into subsequences and finding the motif inside those.
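The "break the sequences into subsequences" idea can be sketched as follows. This is a minimal illustration, assuming plain Python strings as input; the function name and the window/overlap parameters are made up here, and the right window size depends on the input limit of whichever motif-finder is used.

```python
from typing import Iterator, Tuple


def overlapping_windows(seq: str, window: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) pairs that cover seq with overlapping windows.

    If overlap >= motif_length - 1, every occurrence of a motif of that
    length lies entirely inside at least one window.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if not seq:
        return
    step = window - overlap
    start = 0
    while True:
        yield start, seq[start:start + window]
        # Stop once the current window has reached the end of the sequence.
        if start + window >= len(seq):
            break
        start += step


if __name__ == "__main__":
    for start, chunk in overlapping_windows("ACGTACGTAC", window=4, overlap=2):
        print(start, chunk)
```

Keeping the start offset with each chunk makes it possible to map any hit back to coordinates in the original sequence.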
Vi veri veniversum vivus vici
G-Do
Garter
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Location: Philadelphia, PA; USA

Postby sachin » Mon Nov 20, 2006 3:15 pm

Go to:

http://www.ncbi.nlm.nih.gov

Then go for BLAST --- nucleotide comparison programme

or FAST --- proteome comparison programme

You will get results in 15 sec.

I often use this site. It is a large database site. One of my favorites.
Senior Education Officer, BNHS, India. www.bnhs.org

Bitter Truth!
Who says reason for world war IV will be Petrol?
Reason lies in two words "Me and Mine".
sachin
King Cobra
Posts: 1517
Joined: Mon Jan 30, 2006 12:39 pm
Location: MUMBAI / INDIA

Postby G-Do » Thu Nov 23, 2006 6:00 pm

sachin@biog wrote: Go to:

http://www.ncbi.nlm.nih.gov

Then go for BLAST --- nucleotide comparison programme

or FAST --- proteome comparison programme

You will get results in 15 sec.

I often use this site. It is a large database site. One of my favorites.


How is that supposed to help find ~100mer motifs shared by several long input nucleotide sequences? We aren't interested in BLASTing against a database in this problem.
Vi veri veniversum vivus vici
G-Do
Garter
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Location: Philadelphia, PA; USA

