need help
Moderator: BioTeam
2 posts • Page 1 of 1
I was reading a patent that describes the training of a neural network. The training set consisted of a set of genes and a set of non-genes. Could you give me some information about non-genes, namely:
1. What are non-genes?
2. How can I get their sequences?
3. What kinds of organisms have a high number of non-genes?
4. Are pseudogenes the same as non-genes?
Patent and paragraph details:
Publication number: US 2005/0136480 A1
Publication date: 23 June 2005
Paragraph no.: [0081]
Paragraph text: "The training set consists of 1610 E. coli K-12 NCBI listed protein coding genes and 3000 E. coli K-12 ORFs (a stretch of sequence of length more than 20 amino acids and having start codon, stop codon in the same frame) which have not been reported as genes (non-genes). The validation set has 1000 known genes and 1000 non-genes from E. coli K-12 distinct from those used in the training set. The test set contains another 1000 genes and 1000 non-genes from the same organism. For training of the ANN, genes and the non-genes are assigned a probability value of 1 and 0 respectively."
Can anybody explain to me what this paragraph means, and where I can get these 3000 non-genes?
What is the patent for exactly? I'll need a little more info...
First, note that I'm only talking about protein-coding genes here (since I think that's what the patent is dealing with). Protein-coding genes are simply regions that code for a protein, and non-coding regions are regions that do not. In E. coli K-12, about 85% of the sequence codes for proteins. Most bacteria have a fairly large percentage of protein-coding sequence (80%+). Also, you may want to know that every double-stranded DNA fragment has 6 possible reading frames (3 on each strand), and for a given gene only one of those frames encodes the protein. This MAY be what the patent is referring to when it talks about non-genes: ORFs that have a start and a stop codon but sit in the wrong reading frame, so they are not annotated as genes.
If I assume the patent is for a machine learning algorithm for gene prediction, this is my take on it. The authors take ORFs from the genome and separate them into genes and non-genes depending on whether the ORF is a reported protein-coding gene or not. If an ORF only partially overlaps a gene, who knows; you'd have to read more of the patent.
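To make the six-reading-frame idea concrete, here is a minimal Python sketch of how one might collect candidate ORFs and label them the way the quoted paragraph describes (1 for genes, 0 for non-genes). This is only an illustration, not the patent's actual procedure. The function names and the `annotated` set of known gene coordinates are hypothetical, and the sketch assumes ATG as the only start codon, although E. coli also uses alternative starts such as GTG.

```python
START = "ATG"                   # assumption: ATG is the only start codon considered
STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def orfs_in_strand(seq, min_aa=20):
    """Find ORFs in the 3 frames of one strand.

    Returns (start, end) pairs, 0-based, end exclusive, stop codon included.
    An ORF runs from the first ATG after the previous stop to the next
    in-frame stop and must encode more than min_aa amino acids.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3   # codons before the stop codon
                if n_aa > min_aa:
                    found.append((start, i + 3))
                start = None
    return found

def six_frame_orfs(seq, min_aa=20):
    """Return (strand, start, end) for ORFs in all 6 reading frames.

    Coordinates are always given on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    result = [("+", s, e) for s, e in orfs_in_strand(seq, min_aa)]
    rc = reverse_complement(seq)
    result += [("-", n - e, n - s) for s, e in orfs_in_strand(rc, min_aa)]
    return result

def label_orfs(orfs, annotated):
    """Label each ORF 1 if it is in the annotated gene set, else 0 (non-gene)."""
    return [(orf, 1 if orf in annotated else 0) for orf in orfs]
```

In practice the `annotated` set would have to come from a gene annotation of the genome (for example the protein-coding features listed for E. coli K-12 at NCBI), and any ORF found by the scan but absent from that list would be a candidate non-gene.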