



Tables
- Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Table 1
Definition of terms.

| Term | Definition |
|---|---|
| SP-trEMBL | Sequence dataset containing 2,241,227 sequences from the Swiss-Prot (version 48.1) and TrEMBL (version 31.1) sequence databases. |
| Integr8_263 | Sequence dataset containing 913,094 sequences from 263 completed genomes listed in the Integr8 genome database. |
| Pfam_struc | Pfam-A family containing a PDB structure that has not yet been classified into the CATH domain database. |
| NewFam | Protein families generated in the Gene3D |

Table 2
Percentage of problematic and singleton domain sequences.

| Sequence dataset | Transmembrane & problematic (% of domains) | Singleton (% of domains) |
|---|---|---|
| SP-trEMBL | 18.5 | 22.6 |
| Integr8_263 | 17.9 | 24.9 |
| A. thaliana | 17.5 | 16.0 |
| B. anthracis | 20.3 | 8.6 |
| C. elegans | 19.8 | 22.1 |
| D. melanogaster | 18.7 | 18.7 |
| E. coli | 15.7 | 7.3 |
| H. sapiens | 15.9 | 20.9 |
| S. cerevisiae | 14.9 | 24.7 |
| T. maritima | 13.4 | 12.7 |

The percentage of problematic and singleton domain sequences in Swiss-Prot & TrEMBL, 263 completed genomes and eight model genomes. Problematic domains are defined as those containing transmembrane helices or significant regions of low complexity or coiled-coil.
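
The definitions in the Table 2 footnote can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Domain` record, its field names, and the reading of "singleton" as a domain whose family has exactly one member are assumptions made for illustration, and the two percentages are computed independently over all domains.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Domain:
    """Hypothetical per-domain annotation record (field names are assumptions)."""
    has_tm_helix: bool        # predicted transmembrane helix
    has_low_complexity: bool  # significant low-complexity region
    has_coiled_coil: bool     # significant coiled-coil region
    family_size: int          # number of sequences in the domain's family


def is_problematic(d: Domain) -> bool:
    # Footnote definition: any one of the three features marks the domain.
    return d.has_tm_helix or d.has_low_complexity or d.has_coiled_coil


def is_singleton(d: Domain) -> bool:
    # Assumption: a singleton belongs to a family with a single member.
    return d.family_size == 1


def percentages(domains: Iterable[Domain]) -> Tuple[float, float]:
    """Return (% problematic, % singleton), each rounded to one decimal."""
    domains = list(domains)
    if not domains:
        raise ValueError("no domains supplied")
    n = len(domains)
    problematic = sum(is_problematic(d) for d in domains)
    singleton = sum(is_singleton(d) for d in domains)
    return round(100 * problematic / n, 1), round(100 * singleton / n, 1)


example = [
    Domain(True, False, False, 10),
    Domain(False, False, False, 1),
    Domain(False, False, False, 5),
    Domain(False, True, False, 1),
]
print(percentages(example))
```

Whether the source counts a domain that is both problematic and a singleton in one column or in both is not stated; the sketch counts it in both.
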

Table 3
Current structural coverage of sequences, domains and residues in Swiss-Prot & TrEMBL. All values are percentage coverage.

| Coverage type | Per sequence (Integr8_263) | Per sequence (SP-trEMBL) | Per domain (Integr8_263) | Per domain (SP-trEMBL) | Per residue (Integr8_263) | Per residue (SP-trEMBL) |
|---|---|---|---|---|---|---|
| All sequences | 52.4 | 54.4 | 44.8 | 47.7 | 34.5 | 36.2 |
| Excluding transmembrane & problematic sequences | / | / | 53.4 | 57.3 | 41.1 | 44.1 |
| Excluding transmembrane, problematic & singleton sequences | / | / | 71.1 | 81 | 59.5 | 63.8 |
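
The three coverage measures in Table 3 can be made concrete with a small sketch. The source does not spell out how each measure is computed, so the rules below are assumptions: a sequence counts as covered when at least one of its domains has a structural annotation, and per-residue coverage counts only residues inside annotated domains. The `Domain` record and `coverage` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Domain:
    """Hypothetical domain record: length in residues and structural status."""
    length: int
    has_structure: bool


def coverage(sequences: List[List[Domain]]) -> Tuple[float, float, float]:
    """Return (per-sequence, per-domain, per-residue) coverage in percent.

    Each sequence is represented as the list of its domains (an assumption).
    """
    domains = [d for seq in sequences for d in seq]
    total_residues = sum(d.length for d in domains)
    if not sequences or not domains or total_residues == 0:
        raise ValueError("empty input")

    # A sequence is covered if any of its domains has a structure.
    covered_seqs = sum(any(d.has_structure for d in seq) for seq in sequences)
    covered_doms = sum(d.has_structure for d in domains)
    covered_res = sum(d.length for d in domains if d.has_structure)

    return (
        round(100 * covered_seqs / len(sequences), 1),
        round(100 * covered_doms / len(domains), 1),
        round(100 * covered_res / total_residues, 1),
    )


example = [
    [Domain(100, True), Domain(50, False)],
    [Domain(200, False)],
    [Domain(150, True)],
]
print(coverage(example))
```

Excluding a class of domains, as in the lower rows of the table, would correspond to filtering the input lists before calling `coverage`.
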

Table 4
Analysis of the kingdom distribution.

| Kingdom distribution of structurally uncharacterised families | 2500 largest Pfam-A | All Pfam-A | All Pfam-A & NewFam |
|---|---|---|---|
| Eukaryotic | 788 | 1286 | 8240 |
| Viral | 295 | 503 | 1145 |
| Prokaryotic (5 or more) | 1381 | 1959 | 6290 |
| Prokaryotic (1 or more) | 1377 | 2114 | 8304 |
| No prokaryotic members | 1125 | 1833 | 9488 |

Analysis of the kingdom distribution of: column 2, the largest 2500 structurally uncharacterised Pfam-A families; column 3, all structurally uncharacterised Pfam-A families; column 4, all structurally uncharacterised Pfam-A and NewFam families. Of the remaining structurally uncharacterised Pfam-A families, 1959 have 5 or more prokaryotic sequences; additional NewFam families can be used to build a larger target list (e.g. the 2500 families shown in Figure 6, black bars).
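
The row categories of Table 4 can be sketched as a simple tally over families. This is an illustration under stated assumptions, not the authors' method: the input layout (a mapping from family identifier to per-kingdom member counts), the kingdom keys, and the function name `tally_kingdoms` are hypothetical. The rows are treated as overlapping, so a family is counted in every row whose condition it meets.

```python
from typing import Dict


def tally_kingdoms(
    families: Dict[str, Dict[str, int]], min_prokaryotic: int = 5
) -> Dict[str, int]:
    """Count families per Table 4 row category.

    families maps a family id to member counts keyed by kingdom
    ("eukaryotic", "viral", "prokaryotic"); missing keys mean zero.
    """
    strict_label = f"Prokaryotic ({min_prokaryotic} or more)"
    counts = {
        "Eukaryotic": 0,
        "Viral": 0,
        strict_label: 0,
        "Prokaryotic (1 or more)": 0,
        "No prokaryotic members": 0,
    }
    for members in families.values():
        prok = members.get("prokaryotic", 0)
        if members.get("eukaryotic", 0) > 0:
            counts["Eukaryotic"] += 1
        if members.get("viral", 0) > 0:
            counts["Viral"] += 1
        if prok >= min_prokaryotic:
            counts[strict_label] += 1
        if prok >= 1:
            counts["Prokaryotic (1 or more)"] += 1
        else:
            counts["No prokaryotic members"] += 1
    return counts


example = {
    "F1": {"eukaryotic": 3, "prokaryotic": 7},
    "F2": {"viral": 2},
    "F3": {"prokaryotic": 2},
}
print(tally_kingdoms(example))
```

Under these assumptions, the families counted in the "5 or more" row would form the candidate list for prokaryotic targets described in the footnote.
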

 


Updated on: 28 Jul 2007