Tables
Sequence and structural features of carbohydrate binding in proteins and assessment of predictability using a neural network

Table 1

Procarb40 dataset. Some Pfam ID cells are left empty because no Pfam ID could be found for those entries.

Where an entry has several Pfam IDs or ligands, they are separated by semicolons; ligand names, formulas and IDs are given in the same order.

PDB ID | Pfam ID | Ligand name | Ligand formula | Ligand ID
--- | --- | --- | --- | ---
1A7C | Serpins | ALPHA-D-MANNOSE; N-ACETYL-D-GALACTOSAMINE; N-METHYLCARBONYLTHREONINE | C6 H12 O6; 3(C8 H15 N1 O6); 2(C6 H11 N1 O4) | MAN; NGA; THC
1AU1 | Interferon | ZINC ION; FUCOSE; GLUCOSE; D-GALACTOSE; ALPHA-D-MANNOSE | ZN1 2+; 2(C6 H12 O5); 4(C6 H12 O6); C6 H12 O6; 2(C6 H12 O6) | ZN; FUC; GLC; GAL; MAN
1AXM | Fibroblast Growth Factors | SELENOMETHIONINE; O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE | 6(C5 H11 N1 O2 SE1); 7(C6 H10 O10 S1); 8(C6 H13 N1 O11 S2) | MSE; IDS; SGN
1CVN | Lectin_legB | CALCIUM ION; MANGANESE (II) ION; ALPHA-D-MANNOSE | 4(CA1 2+); 4(MN1 2+); 12(C6 H12 O6) | CA; MN; MAN
1E6N | CBM_5_12; Glycol_hydro_18 | GLYCEROL; SULFATE ION; N-ACETYL-D-GLUCOSAMINE | C3 H8 O3; 12(O4 S1 2-); 10(C8 H15 N1 O6) | GOL; SO4; NAG
1FV3 | Toxin_R_bind_C; Toxin_R_bind_N; Toxin_trans | GLUCOSE; PHOSPHATE ION; D-GALACTOSE; ETHYL-TRIMETHYL-SILANE; N-ACETYL-D-GALACTOSAMINE; 5-N-ACETYL-BETA-D-NEURAMINIC ACID; 5-N-ACETYL-ALPHA-D-NEURAMINIC ACID | 2(C6 H12 O6); O4 P1 3-; 4(C6 H12 O6); 2(C5 H14 SI1); 2(C8 H15 N1 O6); 2(C11 H19 N1 O9); 4(C11 H19 N1 O9) | GLC; PO4; GAL; CEQ; NGA; SLB; NAN
1FWU | Ricin_B_lectin | FUCOSE; O3-SULFONYLGALACTOSE; ALPHA-METHYL-N-ACETYL-D-GLUCOSAMINE | C6 H12 O5; C6 H12 O9 S1; C9 H17 N1 O6 | FUC; SGA; MAG
1G1T | EGF; Lectin_C | FUCOSE; CALCIUM ION; D-GALACTOSE; O-SIALIC ACID; N-ACETYL-O-METHYL-D-GLUCOSAMINE | C6 H12 O5; CA1 2+; C6 H12 O6; C11 H19 N1 O9; C9 H17 N1 O6 | FUC; CA; GAL; SIA; 1NA
1G5N | Annexin | CALCIUM ION; N,O6-DISULFO-GLUCOSAMINE; 1,4-DIDEOXY-O2-SULFO-GLUCURONIC ACID; 1,4-DIDEOXY-5-DEHYDRO-O2-SULFO-GLUCURONIC ACID | 9(CA1 2+); 4(C6 H13 N1 O11 S2); 2(C6 H10 O8 S1); 2(C6 H8 O8 S1) | CA; SGN; IDU; UAP
1GMN | Kringle; PAN | O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE; 4-(2-HYDROXYETHYL)-1-PIPERAZINE ETHANESULFONIC ACID | 3(C6 H10 O10 S1); 2(C6 H13 N1 O11 S2); 2(C8 H18 N2 O4 S1) | IDS; SGN; EPE
1GUI | CBM4/9 | CALCIUM ION; GLYCEROL; BETA-D-GLUCOSE | CA1 2+; 5(C3 H8 O3); 6(C6 H12 O6) | CA; GOL; BGC
1GWM | Family 29 carbohydrate binding module | GLUCOSE; COBALT (II) ION; 1,2-ETHANEDIOL; BETA-D-GLUCOSE | C6 H12 O6; CO1 2+; 8(C2 H6 O2); 5(C6 H12 O6) | GLC; CO; EDO; BGC
1IW6 | Bac_rhodopsin | GLUCOSE; RETINAL; D-GALACTOSE; ALPHA-D-MANNOSE; 2,3-DI-PHYTANYL-GLYCEROL; 2,3-DI-O-PHYTANLY-3-SN-GLYCERO-1-PHOSPHORYL-3'-SN-GLYCEROL-1'-PHOSPHATE | C6 H12 O6; C20 H28 O1; C6 H12 O6; C6 H12 O6; C43 H88 O3; 4(C46 H94 O11 P2 2-) | GLC; RET; GAL; MAN; L2P; L3P
1J8R | PapG_N | GLUCOSE; D-GALACTOSE; N-ACETYL-D-GLUCOSAMINE; SELENOMETHIONINE | C6 H12 O6; 2(C6 H12 O6); C8 H15 N1 O6; 3(C5 H11 N1 O2 SE1) | GLC; GAL; NAG; MSE
1JPC | B_lectin (D-mannose binding lectin) | ALPHA-D-MANNOSE | 8(C6 H12 O6) | MAN
1LGB | Lectin_legB; Transferrin | FUCOSE; CALCIUM ION; D-GALACTOSE; MANGANESE (II) ION; ALPHA-D-MANNOSE; N-ACETYL-D-GLUCOSAMINE | C6 H12 O5; CA1 2+; C6 H12 O6; MN1 2+; 3(C6 H12 O6); 4(C8 H15 N1 O6) | FUC; CA; GAL; MN; MAN; NAG
1M5J |  | ALPHA-D-MANNOSE; O1-PENTYL-MANNOSE; 2-[N-CYCLOHEXYLAMINO]ETHANE SULFONIC ACID | 8(C6 H12 O6); C11 H22 O6; C8 H17 N1 O3 S1 | MAN; OPM; NHE
1OH4 |  | CALCIUM ION; GLYCEROL; SULFATE ION; BETA-D-MANNOSE; ALPHA-D-GALACTOSE | CA1 2+; 2(C3 H8 O3); O4 S1 2-; 5(C6 H12 O6); 2(C6 H12 O6) | CA; GOL; SO4; BMA; GLA
1Q8V | Lectin_legB | CALCIUM ION; MANGANESE (II) ION; ALPHA-D-MANNOSE; PYROGLUTAMIC ACID | 2(CA1 2+); 2(MN1 2+); 5(C6 H12 O6); 2(C5 H7 N1 O3) | CA; MN; MAN; PCA
1QFO | V-set | GLUCOSE; D-GALACTOSE; O-SIALIC ACID | 2(C6 H12 O6); 2(C6 H12 O6); 3(C11 H19 N1 O9) | GLC; GAL; SIA
1RID | Sushi | O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE | 8(C6 H10 O10 S1); 8(C6 H13 N1 O11 S2) | IDS; SGN
1SE3 | Stap_Strp_tox_C; Stap_Strp_toxin | GLUCOSE; D-GALACTOSE; O-SIALIC ACID | C6 H12 O6; C6 H12 O6; C11 H19 N1 O9 | GLC; GAL; SIA
1SL4 | Lectin_C | CALCIUM ION; ALPHA-D-MANNOSE | 3(CA1 2+); 4(C6 H12 O6) | CA; MAN
1SLC | Gal-bind_lectin | D-GALACTOSE; ALPHA-D-MANNOSE; N-ACETYL-D-GLUCOSAMINE | 4(C6 H12 O6); 6(C6 H12 O6); 6(C8 H15 N1 O6) | GAL; MAN; NAG
1T0W | Chitin_bind_1 | AMINO GROUP; N-ACETYL-D-GLUCOSAMINE | H2 N1; 3(C8 H15 N1 O6) | NH2; NAG
1T8U | Sulfotransfer_1 | SODIUM ION; SULFATE ION; O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE; ADENOSINE-3'-5'-DIPHOSPHATE; 1,4-DIDEOXY-5-DEHYDRO-O2-SULFO-GLUCURONIC ACID | 2(NA1 1+); O4 S1 2-; C6 H10 O10 S1; 2(C6 H13 N1 O11 S2); 2(C10 H15 N5 O10 P2); C6 H8 O8 S1 | NA; SO4; IDS; SGN; A3P; UAP
1ULE |  | D-GALACTOSE; N-ACETYL-D-GLUCOSAMINE | 4(C6 H12 O6); 2(C8 H15 N1 O6) | GAL; NAG
1UX7 | CBM_6 | CALCIUM ION; SULFATE ION; BETA-D-XYLOPYRANOSE | 2(CA1 2+); O4 S1 2-; 3(C5 H10 O5) | CA; SO4; XYP
1UY4 |  | SODIUM ION; CALCIUM ION; GLYCEROL; BETA-D-XYLOPYRANOSE | NA1 1+; CA1 2+; C3 H8 O3; 4(C5 H10 O5) | NA; CA; GOL; XYP
1UYY | CBM_6 | CALCIUM ION; BETA-D-GLUCOSE | 4(CA1 2+); 7(C6 H12 O6) | CA; BGC
1VBO |  | ALPHA-D-MANNOSE; N-ACETYLALANINE | 20(C6 H12 O6); 8(C5 H9 N1 O3) | MAN; AYA
1VPS | Polyoma Coat | D-GALACTOSE; O-SIALIC ACID; N-ACETYL-D-GLUCOSAMINE | 5(C6 H12 O6); 10(C11 H19 N1 O9); 5(C8 H15 N1 O6) | GAL; SIA; NAG
1W9T | CBM_6 | SODIUM ION; XYLOPYRANOSE; BETA-D-XYLOPYRANOSE | 6(NA1 1+); 2(C5 H10 O5); 8(C5 H10 O5) | NA; XYS; XYP
1XT3 | Toxin_1 | CITRIC ACID; N,O6-DISULFO-GLUCOSAMINE; 1,4-DIDEOXY-O2-SULFO-GLUCURONIC ACID | C6 H8 O7; 3(C6 H13 N1 O11 S2); 3(C6 H10 O8 S1) | CIT; SGN; IDU
2BOS | SLT beta | BUTYL GROUP; GLUCOSE; D-GALACTOSE | 3(C4 H9); 5(C6 H12 O6); 14(C6 H12 O6) | BUT; GLC; GAL
2FCP | Plug; TonB_dep_Rec | GLUCOSE; PHOSPHATE ION; D-GALACTOSE; NICKEL (II) ION; 3-OXO-BUTYRIC ACID; 3-OXO-PENTADECANOIC ACID; GLUCOSAMINE 1-PHOSPHATE; GLUCOSAMINE 4-PHOSPHATE; ETHANOL AMINE PYROPHOSPHATE; L-GLYCERO-D-MANNO-HEPTOPYRANOSE; 3-DEOXY-D-MANNO-OCT-2-ULOSONIC ACID; 2-TRIDECANOYLOXY-PENTADECANOIC ACID | C6 H12 O6; O4 P1 3-; 2(C6 H12 O6); 2(NI1 2+); C4 H6 O3; C15 H28 O3; C6 H14 N1 O8 P1; C6 H14 N1 O8 P1; C2 H9 N1 O7 P2; 2(C7 H14 O7); 2(C8 H14 O8); 2(C28 H54 O4) | GLC; PO4; GAL; NI; LIN; LIM; GP1; GP4; EA2; GMH; KDO; LIL
2MPR | LamB | GLUCOSE; CALCIUM ION | C6 H12 O6; CA1 2+ | GLC; CA
3CHB | Enterotoxin b | GLUCOSE; D-GALACTOSE; O-SIALIC ACID; N-ACETYL-D-GALACTOSAMINE; N-(EHTYLSULFITE)MORPHOLINE | 5(C6 H12 O6); 10(C6 H12 O6); 5(C11 H19 N1 O9); 5(C8 H15 N1 O6); 2(C6 H14 N1 O4 S1) | GLC; GAL; SIA; NGA; MES
3MAN | Cellulase | ALPHA-D-MANNOSE | 3(C6 H12 O6) | MAN
3MBP |  | GLUCOSE | 3(C6 H12 O6) | GLC

Malik et al. BMC Structural Biology 2007 7:1   doi:10.1186/1472-6807-7-1
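The formula column follows the PDB convention. A prefix such as `3(...)` gives the number of times that ligand occurs in the entry, and a trailing token such as `2+` gives the charge. The minimal Python sketch below decodes that notation. `parse_formula` is a hypothetical helper and is not part of any tool used by the authors.

```python
import re
from collections import Counter

# Optional copy-number prefix, e.g. "3(C8 H15 N1 O6)" -> copies = 3.
_COPIES = re.compile(r"^(\d+)\((.*)\)$")
# Element symbol followed by its count, e.g. "C8", "ZN1", "SE1".
_ELEMENT = re.compile(r"^([A-Z]{1,2})(\d+)$")
# Charge token, e.g. "2+" or "3-".
_CHARGE = re.compile(r"^(\d+)([+-])$")


def parse_formula(text: str) -> tuple[int, Counter, int]:
    """Return (copies, element counts per copy, charge per copy)."""
    text = text.strip()
    copies = 1
    match = _COPIES.match(text)
    if match:
        copies = int(match.group(1))
        text = match.group(2)
    elements: Counter = Counter()
    charge = 0
    for token in text.split():
        if (el := _ELEMENT.match(token)) is not None:
            elements[el.group(1)] += int(el.group(2))
        elif (ch := _CHARGE.match(token)) is not None:
            sign = 1 if ch.group(2) == "+" else -1
            charge = sign * int(ch.group(1))
        else:
            raise ValueError(f"unrecognised token {token!r}")
    return copies, elements, charge


# Example: the sulfate entry of 1E6N.
print(parse_formula("12(O4 S1 2-)"))
```

For example, `parse_formula("12(O4 S1 2-)")` describes 12 sulfate ions, each with four O atoms, one S atom and a charge of -2.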

Table 2

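Residue propensities for Procarb40, PDNA62 and PLD116, with the numbers of binding-site (BS) and non-binding-site (NBS) residues of each type.

Residue | Procarb40 propensity | Procarb40 BS | Procarb40 NBS | PDNA62 propensity | PDNA62 BS | PDNA62 NBS | PLD116 propensity | PLD116 BS | PLD116 NBS
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
A | 0.43 | 9 | 494 | 0.64 | 42 | 389 | 0.79 | 109 | 2684
C | 0.00 | 0 | 29 | 0.34 | 7 | 143 | 1.07 | 24 | 436
D | 1.41 | 27 | 433 | 0.36 | 18 | 292 | 0.79 | 84 | 2009
E | 1.81 | 29 | 356 | 0.39 | 32 | 510 | 0.92 | 92 | 1952
F | 0.66 | 9 | 318 | 0.77 | 33 | 245 | 1.09 | 70 | 1346
G | 0.80 | 20 | 581 | 0.71 | 46 | 372 | 1.26 | 176 | 2633
H | 1.58 | 8 | 114 | 1.08 | 39 | 194 | 2.09 | 81 | 712
I | 0.12 | 2 | 392 | 0.48 | 30 | 373 | 0.72 | 70 | 1837
K | 1.40 | 26 | 419 | 1.95 | 180 | 423 | 0.59 | 65 | 2053
L | 0.34 | 8 | 561 | 0.38 | 39 | 624 | 0.81 | 120 | 2872
M | 0.19 | 1 | 124 | 0.54 | 14 | 149 | 1.11 | 42 | 716
N | 1.96 | 38 | 429 | 1.45 | 74 | 260 | 1.17 | 92 | 1485
P | 0.40 | 5 | 297 | 0.66 | 35 | 307 | 0.45 | 38 | 1597
Q | 1.54 | 18 | 263 | 1.19 | 61 | 272 | 0.74 | 46 | 1123
R | 2.77 | 32 | 246 | 2.41 | 208 | 360 | 1.80 | 139 | 1450
S | 0.43 | 9 | 499 | 1.33 | 91 | 355 | 1.03 | 112 | 2049
T | 0.70 | 15 | 499 | 1.36 | 85 | 325 | 0.87 | 90 | 2030
V | 0.00 | 0 | 472 | 0.59 | 40 | 399 | 0.73 | 92 | 2315
W | 3.31 | 23 | 144 | 1.40 | 22 | 81 | 2.30 | 67 | 518
Y | 1.68 | 25 | 333 | 1.19 | 43 | 189 | 1.88 | 125 | 1189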
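The table does not state how propensity is computed. One common definition divides the fraction of binding-site residues that are of type i by the fraction of all residues that are of type i. This definition reproduces every Procarb40 value above to two decimal places. For W, for example, (23/304)/(167/7307) ≈ 3.31. The Python sketch below assumes that definition and uses the Procarb40 counts from the table. It was checked only against the Procarb40 column, and `propensities` is a hypothetical helper name.

```python
# Procarb40 counts from Table 2: residue -> (BS count, NBS count).
PROCARB40 = {
    "A": (9, 494), "C": (0, 29), "D": (27, 433), "E": (29, 356),
    "F": (9, 318), "G": (20, 581), "H": (8, 114), "I": (2, 392),
    "K": (26, 419), "L": (8, 561), "M": (1, 124), "N": (38, 429),
    "P": (5, 297), "Q": (18, 263), "R": (32, 246), "S": (9, 499),
    "T": (15, 499), "V": (0, 472), "W": (23, 144), "Y": (25, 333),
}


def propensities(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """(fraction of binding residues of type i) / (fraction of all residues of type i)."""
    total_bs = sum(bs for bs, _ in counts.values())
    total = sum(bs + nbs for bs, nbs in counts.values())
    return {
        res: (bs / total_bs) / ((bs + nbs) / total)
        for res, (bs, nbs) in counts.items()
    }


# Print residues from most to least enriched at carbohydrate-binding sites.
for res, p in sorted(propensities(PROCARB40).items(), key=lambda kv: -kv[1]):
    print(f"{res}  {p:.2f}")
```

A value above 1 means the residue type is over-represented at binding sites relative to its overall frequency in the dataset.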


Table 3

Comparison of binary (single-sequence) and PSSM prediction results using the jackknife leave-one-out method. Binding sites were labeled using a 3.5 Å cut-off distance between carbohydrate and protein atoms.

Data type | Validation type | Average sensitivity | Average specificity | Average net prediction | P-value
--- | --- | --- | --- | --- | ---
GalBind18 | Leave-one-out (PSSM) | 0.63 (0.19) | 0.79 (0.09) | 0.71 (0.09) | 0.08859
GalBind18 | Leave-one-out (single sequences) | 0.62 (0.26) | 0.68 (0.12) | 0.65 (0.11) |
Procarb40 | Leave-one-out (PSSM) | 0.87 (0.12) | 0.23 (0.08) | 0.55 (0.06) | 0.00209
Procarb40 | Leave-one-out (single sequences) | 0.68 (0.22) | 0.55 (0.16) | 0.61 (0.12) |

Values are averages over the leave-one-out iterations, with standard deviations in brackets. Performance varies considerably from one iteration to the next, so the standard deviations are large. P-values are from a two-tailed t-test comparing the prediction performance obtained from single sequences with that obtained from evolutionary information encoded in PSSMs. In Procarb40, evolutionary profiles give a significantly poorer result than single sequences because of a high false-positive rate (low specificity).
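Two quantities in this table can be made concrete. The caption labels a residue as binding when it lies within 3.5 Å of the carbohydrate. The average net prediction equals the mean of sensitivity and specificity, for example (0.63 + 0.79)/2 = 0.71 for GalBind18 with PSSMs. The Python sketch below rests on three assumptions: a residue counts as binding if any of its atoms is within the cut-off (inclusive) of any carbohydrate atom; sensitivity = TP/(TP + FN); and specificity = TN/(TN + FP). The function names and data layout are hypothetical, and this is not the authors' code. Averaging the per-protein scores over the leave-one-out iterations would give figures of the kind reported in the table.

```python
import math
from typing import Iterable, Sequence

Coord = tuple[float, float, float]


def label_binding_residues(
    protein_atoms: Iterable[tuple[str, Coord]],
    carbohydrate_atoms: Sequence[Coord],
    cutoff: float = 3.5,
) -> set[str]:
    """IDs of residues with any atom within `cutoff` Å of any carbohydrate atom.

    Assumption: the cut-off is inclusive (<=).
    """
    binding: set[str] = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in binding:
            continue  # residue already labelled as binding
        if any(math.dist(xyz, c) <= cutoff for c in carbohydrate_atoms):
            binding.add(residue_id)
    return binding


def score(predicted: Sequence[bool], observed: Sequence[bool]) -> tuple[float, float, float]:
    """Per-protein (sensitivity, specificity, net prediction)."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    # Assumption: an empty class scores 0.0 rather than raising an error.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Net prediction taken as the mean of sensitivity and specificity.
    return sensitivity, specificity, (sensitivity + specificity) / 2


atoms = [("GLY1", (0.0, 0.0, 0.0)), ("TRP2", (3.0, 0.0, 0.0)), ("ALA3", (20.0, 0.0, 0.0))]
ligand = [(6.0, 0.0, 0.0)]
print(label_binding_residues(atoms, ligand))
print(score([True, True, False, False, False], [True, False, False, False, False]))
```

In this toy example the first call labels only TRP2, which is 3.0 Å from the ligand atom. The second call returns sensitivity 1.0, specificity 0.75 and net prediction 0.875.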
