What are today's bioinformatic bottlenecks?

Post by GerryB » Sat May 02, 2009 9:57 am

I have some questions about practical bioinformatics bottlenecks. Which common or important bioinformatics tasks are the slowest or most limiting?
My background is in text compression, databases, and parallel processing, but I know basic bioinformatics algorithms as well, since the fields are so closely related.

But what I don't know is what, in practice, the bottleneck is for most bioinformatics users. I'm hoping to focus some new research (using cheap PC graphics cards) on bioinformatics algorithms, and I want to work on speeding up the tasks that are truly a problem. Many kinds of applications can be sped up 10-100 times on graphics hardware, so it's worthwhile to try to get some of these apps working in practice. But which ones should be sped up first? Probably the tasks that are both common AND slow.

I'd love it if you'd share your opinions, experiences, even HOPES for what kinds of tools, speedups, or new abilities you'd like. Again, I've got a good technical background, but no idea which tasks are still making actual biologists grind their teeth in frustration.

Some examples:

  • Is BLAST alignment speed an issue? Would you yell for joy if there were a new tool that gave identical results and was twice as fast?
  • Are you happy with BLAST but just want to do much bigger alignments? Stuff like "Here's my 1M nucleotide sequence, give me the top 1000 local alignments", and get an answer in 10 seconds?
  • Or maybe database searching? You want to say "here's 10000 nucleotides, please search every genome in GenBank and give me the best hits from everything! In 2 seconds like a Google search does!!"
  • Or de novo assembly? Is it a huge problem to get shotgun fragments and have to use a zillion CPU hours to assemble a genome?
  • Or maybe you always run a high-quality Smith-Waterman alignment as a double check, and it takes a week to cook, so it really becomes a big issue?
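For readers coming from the computing side: Smith-Waterman is a dynamic-programming algorithm whose table has one cell per pair of positions, so its running time grows with the product of the two sequence lengths, which is why it gets expensive on long sequences or whole databases. Below is a minimal score-only sketch, assuming illustrative match/mismatch scores and a simple linear gap penalty (real tools typically use substitution matrices and affine gaps); the function name and default scores are hypothetical choices.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (score only, no traceback).

    Scoring is an illustrative assumption: fixed match/mismatch scores and a
    linear gap penalty. Time is O(len(a) * len(b)); memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap
            left = curr[j - 1] + gap    # b[j-1] aligned against a gap
            # Local alignment: a cell never drops below zero (restart point).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 8, from the shared "ACGT" core
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently; that is one common route to parallelizing this recurrence.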

Those are just examples off the top of my head, and I don't know whether those abilities are actually desperately desired. Or of course there are likely tasks I haven't even heard of that are a limitation. Please teach me.

Again, what I'm really trying to understand is which computational tasks are common but TOO SLOW, or too size-limited (maybe BLAST is fast for you, but only because you use short sequences, since the long ones you'd rather use are too slow).

I'd appreciate any suggestions, stories, or pleas, links to other forums that might help me learn, or any other feedback.
I'll be happy to discuss what algorithms modern hardware can help with as well. You may be surprised.

Post by farful » Mon Jul 27, 2009 6:04 pm

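tblastx is too slow. It translates both the query and the database sequences in all six reading frames and compares every frame pair (6 × 6 = 36 comparisons), so if you find the correct reading frame beforehand, your computation time will decrease 36-fold. Otherwise, I'd say sequence alignment is the least of our concerns as far as bioinformatics bottlenecks go...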
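To make the 36-fold figure concrete, here is a minimal sketch (hypothetical helper names, not part of BLAST) that builds the six reading-frame translations of a DNA sequence using the standard genetic code and counts the frame pairs a tblastx-style search would compare. The two example sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; '*' is a stop, 'X' an unrecognized codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Translations of the 3 forward frames, then the 3 reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(s[f:]) for s in (dna, rev) for f in range(3)]


query = "ATGGCCATTGTAATGGGCCGC"    # hypothetical example sequence
subject = "ATGGCCATTGTTATGGGTCGC"  # hypothetical example sequence
# tblastx-style search: every query frame is compared with every subject frame.
pairs = [(q, s) for q in six_frames(query) for s in six_frames(subject)]
print(len(pairs))  # 36
```

Knowing the correct frame of both sequences up front collapses those 36 protein-level comparisons to one.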