This work suggests that such a combined approach to target selection is …
Table 1. Definition of terms.

| Term | Definition |
|---|---|
| SP-trEMBL | Sequence dataset containing 2,241,227 sequences from the Swiss-Prot (version 48.1) and TrEMBL (version 31.1) sequence databases. |
| Integr8_263 | Sequence dataset containing 913,094 sequences from 263 completed genomes listed in the Integr8 genome database. |
| Pfam_struc | Pfam-A family containing a PDB structure that has not yet been classified into the CATH domain database. |
| NewFam | Protein families generated in the Gene3D |

Table 2. Percentage of problematic and singleton domain sequences.

| Sequence dataset | Transmembrane & problematic (% of domains) | Singleton (% of domains) |
|---|---|---|
| SP-trEMBL | 18.5 | 22.6 |
| Integr8_263 | 17.9 | 24.9 |
| A. thaliana | 17.5 | 16.0 |
| B. anthracis | 20.3 | 8.6 |
| C. elegans | 19.8 | 22.1 |
| D. melanogaster | 18.7 | 18.7 |
| E. coli | 15.7 | 7.3 |
| H. sapiens | 15.9 | 20.9 |
| S. cerevisiae | 14.9 | 24.7 |
| T. maritima | 13.4 | 12.7 |

The percentage of problematic and singleton domain sequences in Swiss-Prot & TrEMBL, 263 completed genomes and eight model genomes. Problematic domains are defined as those containing transmembrane helices or significant regions of low complexity or coiled-coil.
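
The definition above can be illustrated with a minimal Python sketch that tallies the two categories over a list of domains. The `Domain` fields, the toy records and the `SIGNIFICANT_FRAC` cutoff are all hypothetical: the source does not state what counts as a "significant" region, and counting singletons only among non-problematic domains is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    has_tm_helix: bool          # predicted transmembrane helix present
    low_complexity_frac: float  # fraction of residues in low-complexity regions
    coiled_coil_frac: float     # fraction of residues in coiled-coil regions
    family_size: int            # number of sequences in the domain's family


# Hypothetical cutoff for a "significant" region; the source gives no value.
SIGNIFICANT_FRAC = 0.3


def is_problematic(d: Domain) -> bool:
    """Transmembrane helix, or significant low-complexity or coiled-coil content."""
    return (
        d.has_tm_helix
        or d.low_complexity_frac >= SIGNIFICANT_FRAC
        or d.coiled_coil_frac >= SIGNIFICANT_FRAC
    )


def is_singleton(d: Domain) -> bool:
    """A non-problematic domain with no relatives (assumed non-overlapping category)."""
    return not is_problematic(d) and d.family_size == 1


def percentages(domains):
    """Return (% problematic, % singleton) over all domains, to one decimal place."""
    n = len(domains)
    problematic = sum(is_problematic(d) for d in domains)
    singleton = sum(is_singleton(d) for d in domains)
    return round(100 * problematic / n, 1), round(100 * singleton / n, 1)


# Toy records, not data from the paper.
domains = [
    Domain("d1", True, 0.0, 0.0, 12),
    Domain("d2", False, 0.5, 0.0, 1),
    Domain("d3", False, 0.0, 0.0, 1),
    Domain("d4", False, 0.1, 0.0, 40),
    Domain("d5", False, 0.0, 0.1, 7),
]

print(percentages(domains))  # (40.0, 20.0)
```

Here `d2` is counted as problematic rather than singleton, which is what keeps the two columns from double-counting a domain under the stated assumption.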
Table 3. Current structural coverage of sequences, domains and residues in Swiss-Prot & TrEMBL.

Percentage coverage by coverage type:

| Coverage type | Per sequence (Integr8_263) | Per sequence (SP-trEMBL) | Per domain (Integr8_263) | Per domain (SP-trEMBL) | Per residue (Integr8_263) | Per residue (SP-trEMBL) |
|---|---|---|---|---|---|---|
| All sequences | 52.4 | 54.4 | 44.8 | 47.7 | 34.5 | 36.2 |
| - excluding transmembrane & problematic sequences | / | / | 53.4 | 57.3 | 41.1 | 44.1 |
| - excluding transmembrane, problematic & singleton sequences | / | / | 71.1 | 81 | 59.5 | 63.8 |

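
To show how coverage figures of this kind respond to successive exclusions, the following is a minimal Python sketch. The record layout (`category`, `has_structure`, `length`) and the toy data are hypothetical, and the per-residue measure here counts only residues inside domains, which may differ from the measure used for the table.

```python
def coverage(domains, exclude=()):
    """Return (per-domain %, per-residue %) coverage after dropping excluded categories."""
    kept = [d for d in domains if d["category"] not in exclude]
    if not kept:
        return 0.0, 0.0
    covered = [d for d in kept if d["has_structure"]]
    per_domain = 100 * len(covered) / len(kept)
    per_residue = 100 * sum(d["length"] for d in covered) / sum(d["length"] for d in kept)
    return round(per_domain, 1), round(per_residue, 1)


# Toy records, not data from the paper.
domains = [
    {"category": "ordinary", "has_structure": True, "length": 100},
    {"category": "ordinary", "has_structure": True, "length": 200},
    {"category": "ordinary", "has_structure": False, "length": 100},
    {"category": "problematic", "has_structure": False, "length": 300},
    {"category": "singleton", "has_structure": False, "length": 100},
]

print(coverage(domains))                                        # (40.0, 37.5)
print(coverage(domains, exclude=("problematic",)))              # (50.0, 60.0)
print(coverage(domains, exclude=("problematic", "singleton")))  # (66.7, 75.0)
```

Because the excluded categories rarely have structures, removing them shrinks the denominator faster than the numerator, so coverage rises at each step, as in the rows of the table.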
Table 4. Analysis of the kingdom distribution.

Kingdom distribution of structurally uncharacterised families:

| Kingdom | 2500 largest Pfam-A | All Pfam-A | All Pfam-A & NewFam |
|---|---|---|---|
| Eukaryotic | 788 | 1286 | 8240 |
| Viral | 295 | 503 | 1145 |
| Prokaryotic (5 or more) | 1381 | 1959 | 6290 |
| Prokaryotic (1 or more) | 1377 | 2114 | 8304 |
| No prokaryotic members | 1125 | 1833 | 9488 |

Analysis of the kingdom distribution of: Column 2, the largest 2500 structurally uncharacterised Pfam-A families; Column 3, all structurally uncharacterised Pfam-A families; Column 4, all structurally uncharacterised Pfam-A and NewFam families. Of the remaining structurally uncharacterised Pfam-A families, 1959 have 5 or more prokaryotic sequences; additional NewFam families can be used to build a larger target list (e.g. 2500 families shown in Figure 6, black bars).
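
The selection described in this note can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the family identifiers and counts are invented, ranking families by member count is assumed, and applying the same prokaryotic threshold to NewFam families is also an assumption.

```python
def build_target_list(pfam, newfam, size=2500, min_prokaryotic=5):
    """Pick Pfam-A families with enough prokaryotic members, then top up with NewFam."""

    def eligible(families):
        # Keep families meeting the prokaryotic threshold, largest first (assumed ranking).
        kept = [f for f in families if f["prokaryotic"] >= min_prokaryotic]
        return sorted(kept, key=lambda f: f["members"], reverse=True)

    targets = eligible(pfam)[:size]
    if len(targets) < size:
        # Fill the remaining slots from NewFam families.
        targets += eligible(newfam)[: size - len(targets)]
    return [f["id"] for f in targets]


# Hypothetical families, not data from the paper.
pfam = [
    {"id": "PF_A", "members": 900, "prokaryotic": 40},
    {"id": "PF_B", "members": 500, "prokaryotic": 2},
    {"id": "PF_C", "members": 300, "prokaryotic": 5},
]
newfam = [
    {"id": "NF_1", "members": 50, "prokaryotic": 9},
    {"id": "NF_2", "members": 80, "prokaryotic": 6},
    {"id": "NF_3", "members": 20, "prokaryotic": 0},
]

print(build_target_list(pfam, newfam, size=3))  # ['PF_A', 'PF_C', 'NF_2']
```

With real inputs, the 1959 eligible Pfam-A families would fill most of a 2500-family list and NewFam families would supply the rest.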
updated on: 28 Jul 2007