Help with parsing and transcription factor finding

Post by Raynor5000 » Sun Feb 03, 2008 9:49 pm

I was wondering if someone could help me out with a very basic bioinformatics problem. I'm trying to find a transcription factor in a list of P. falciparum genes, so the first thing I have to do is extract the upstream region of each gene I've been given. I'm writing this program in Perl. My problem is that I'm having trouble deciding which way to orient the gene sequences I'm parsing.

The genes I'm working from are listed like this:
#tax_id chromosome chr_start chr_stop chr_orient contig ctg_start ctg_stop ctg_orient feature_name
5833 1 29733 37349 + NC_004325.1 29733 37349 + MAL1P4.01
5833 1 39205 40430 - NC_004325.1 39205 40430 - MAL1P4.02
5833 1 50586 51859 + NC_004325.1 50586 51859 + MAL1P4.04
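For illustration, here is a minimal sketch of reading rows in this layout. It is written in Python rather than Perl, and the same logic carries over. It assumes the fields are whitespace-separated and that lines starting with '#' are headers; the names GeneRecord and parse_gene_table are made up for this example.

```python
from typing import List, NamedTuple


class GeneRecord(NamedTuple):
    """One row of the gene list (hypothetical container for this sketch)."""
    tax_id: str
    chromosome: str
    chr_start: int
    chr_stop: int
    chr_orient: str
    contig: str
    ctg_start: int
    ctg_stop: int
    ctg_orient: str
    feature_name: str


def parse_gene_table(text: str) -> List[GeneRecord]:
    """Parse whitespace-separated rows, skipping blank lines and '#' header lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        records.append(GeneRecord(
            f[0], f[1], int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), f[8], f[9],
        ))
    return records


SAMPLE = """\
#tax_id chromosome chr_start chr_stop chr_orient contig ctg_start ctg_stop ctg_orient feature_name
5833 1 29733 37349 + NC_004325.1 29733 37349 + MAL1P4.01
5833 1 39205 40430 - NC_004325.1 39205 40430 - MAL1P4.02
5833 1 50586 51859 + NC_004325.1 50586 51859 + MAL1P4.04
"""

genes = parse_gene_table(SAMPLE)
for g in genes:
    print(g.feature_name, g.chr_start, g.chr_stop, g.chr_orient)
```

Converting the coordinates to integers while parsing keeps the later arithmetic (start - 2000, stop + 2000) free of string handling.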

The 3rd column (chr_start) is the position of the character where the gene starts on the chromosome, and the 4th column (chr_stop) is the position where it stops. The 5th column (chr_orient) gives the orientation of the gene, '+' or '-'.

What I have been told is that if the orientation of the gene is '+', then I need to parse out from chr_start - 2000 to chr_start to retrieve the 2000 bp upstream region of that gene. If the gene's orientation is listed as '-', then I have to parse out from chr_stop to chr_stop + 2000 and produce its inverse (ACGAAATGA -> AGTAAAGCA) to retrieve the 2000 bp upstream region of that gene.
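The coordinate rule above can be sketched as follows (in Python, as an illustration only). The sketch assumes the coordinates are 1-based and inclusive, which the file does not state, and it returns only the flanking bases as they appear on the '+' strand, without the gene itself and without any reorientation. The function name upstream_slice is hypothetical.

```python
def upstream_slice(chrom_seq: str, chr_start: int, chr_stop: int,
                   orient: str, length: int = 2000) -> str:
    """Return the plus-strand bases flanking the gene on its upstream side.

    Assumes 1-based, inclusive coordinates (an assumption, not given in the file).
    The result is NOT reoriented for '-' genes; that is a separate step.
    """
    if orient == "+":
        end = chr_start - 1            # 0-based index of the gene's first base
        begin = max(0, end - length)   # clamp at the start of the chromosome
        return chrom_seq[begin:end]
    if orient == "-":
        begin = chr_stop               # 0-based index of the base just after the gene
        return chrom_seq[begin:begin + length]
    raise ValueError(f"unexpected orientation: {orient!r}")


# Toy chromosome with a 'gene' at positions 6..10 and a 3 bp flank.
toy = "ACGTTGCAAGGCTTAACCGG"
print(upstream_slice(toy, 6, 10, "+", length=3))  # GTT
print(upstream_slice(toy, 6, 10, "-", length=3))  # GCT
```

Python slicing stops silently at the end of the string, so a '-' gene close to the chromosome end simply yields a shorter flank. A Perl substr version would need the same off-by-one care.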

My problem is that I'm not sure whether I'm supposed to inverse the sequence and also take its complement (C<->G and A<->T), which would give the reverse complement, or just inverse the sequence.
I'm pretty sure I've been told only to produce the inverse and not the complement, but while checking my parsed data I noticed that, when I only produce the inverse, the start codon for each gene comes out as TAC and not the expected ATG.
So I guess part of my question is: does it matter whether the start codon is ATG or TAC? Can the promoter bind to a gene this way?
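To make the two options concrete, here is a small Python sketch comparing "inverse only" with "inverse plus complement" (the reverse complement). The two strands of DNA are complementary and run in opposite directions, so an ATG on the '-' strand shows up as CAT when the '+' strand is read left to right. The helper names are hypothetical, and the sketch only shows what each operation produces; it does not settle which one the assignment expects.

```python
# Translation table for complementing bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_only(seq: str) -> str:
    """Reverse the sequence without changing any bases."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


# A minus-strand start codon as it appears in the plus-strand chromosome text.
plus_strand_text = "CAT"
print(reverse_only(plus_strand_text))        # TAC
print(reverse_complement(plus_strand_text))  # ATG

# The example from the post: reversing alone reproduces the quoted result.
print(reverse_only("ACGAAATGA"))             # AGTAAAGCA
print(reverse_complement("ACGAAATGA"))       # TCATTTCGT
```

Reversing alone turns CAT into TAC, which matches the TAC seen in the parsed data, while the reverse complement gives the expected ATG.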

I need to get this squared away before I start looking for my transcription factor (motif), so can someone please help me?
