Efficiency of the immunome protein interaction network increases during evolution

Methods

Reconstructing the human immunome related protein-protein interaction network

Human immune system related proteins were collected from the Immunome database, a reference set for the human immune system compiled by combining literature analysis and data mining [16]. Interactions between the immunome proteins were taken from the Human Protein Reference Database (HPRD) [19]. Since only interactions between immunome proteins were considered, no new nodes were added, but proteins without interactions were removed from the dataset. The final network contains 584 of the 847 original nodes and 1349 interactions in total. Interactions that appeared more than once were collapsed into single edges.
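The filtering rules above can be sketched in Python as a minimal illustration, not the authors' pipeline. The function names `build_immunome_network` and `edge_count`, the plain adjacency-dictionary representation, the protein identifiers in the toy data, and the decision to skip self-interactions are all hypothetical.

```python
from typing import Iterable


def build_immunome_network(
    immunome: set[str], interactions: Iterable[tuple[str, str]]
) -> dict[str, set[str]]:
    """Restrict a PPI list to immunome proteins as an undirected adjacency map."""
    adjacency: dict[str, set[str]] = {}
    for a, b in interactions:
        # Keep an interaction only if both partners are immunome proteins,
        # so no new (non-immunome) nodes can enter the network.
        # Self-interactions are skipped (an assumption, not stated in the text).
        if a in immunome and b in immunome and a != b:
            # Sets collapse repeated interactions into a single edge.
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    # Immunome proteins without any retained interaction never become keys,
    # which removes them from the network.
    return adjacency


def edge_count(adjacency: dict[str, set[str]]) -> int:
    # Each undirected edge is stored twice, once per endpoint.
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


# Toy data with hypothetical protein identifiers.
immunome = {"P1", "P2", "P3", "P4"}
interactions = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "X9")]
network = build_immunome_network(immunome, interactions)
print(len(network), edge_count(network))  # 3 2
```

In the toy data, the duplicate P1-P2 pair becomes one edge, the P3-X9 interaction is dropped because X9 is not an immunome protein, and P4 disappears because it has no interactions.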

Generating subnetworks at ten levels of evolution

Evolutionary information was assigned to all proteins of the network. The data were taken from the ImmTree database [18], which defines ten evolutionary emergence levels from Eukaryota (level 9) to Homo sapiens (level 0) (Table 1) [16]. The level numbers are the unified evolutionary levels used in the database. The emergence levels were defined by analysing the orthologs of each protein and correspond to the earliest common ancestor of the species in which orthologs of the protein can be identified. For example, for the FYN gene encoding the protein-tyrosine kinase Fyn, the most distant ortholog in the ImmTree database is found in C. elegans; we therefore assume that this gene was already present in the common ancestor of the whole Bilateria group and assign it to level 7.

Subnetworks for all ten levels were generated with Cytoscape [50] by removing from the interaction network all nodes that emerged later than the given level, together with their edges. Thus, the level 0 network is equivalent to the original protein-protein interaction network, whereas networks with higher level numbers represent the network at earlier stages of evolution, with fewer nodes and edges (Table 1). Statistics and other network parameters were calculated for the subnetworks using the igraph R library [51].
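A minimal Python sketch of the level-based filtering, assuming the network is held as an adjacency dictionary rather than in Cytoscape. The function `subnetwork_at_level` and the toy emergence levels are hypothetical, and keeping surviving proteins that lose all their partners as isolated nodes is our reading of the text.

```python
def subnetwork_at_level(
    adjacency: dict[str, set[str]], emergence_level: dict[str, int], level: int
) -> dict[str, set[str]]:
    """Keep proteins that had already emerged at `level`, plus edges among them."""
    # Higher numbers are older (9 = Eukaryota ... 0 = Homo sapiens), so a protein
    # is present at `level` if its emergence level is >= `level`.
    present = {p for p in adjacency if emergence_level[p] >= level}
    # Edges to later-emerging proteins are dropped; surviving proteins that lose
    # all their partners are kept as isolated nodes.
    return {p: adjacency[p] & present for p in present}


# Hypothetical toy network and emergence levels.
network = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}
levels = {"P1": 7, "P2": 4, "P3": 2}
subnetworks = {lvl: subnetwork_at_level(network, levels, lvl) for lvl in range(10)}
```

Level 0 reproduces the full network, and each higher level can only lose nodes and edges, matching the nested structure summarised in Table 1.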

Protein-protein interactions for D. melanogaster and C. elegans

Experimentally derived interaction data for the low level PPI subnetworks were obtained from the IntAct [42] database for the fruit fly and from the PIMRider [43] database for the worm. Orthologs of human immunome proteins in these genomes were identified using data from the ImmTree database. Interactions between the immunome ortholog proteins were then extracted from these datasets and added to the low level networks: fruit fly interactions to the Coelomata ancestor's subnetwork (level 6) and worm interactions to the Bilateria ancestor's subnetwork (level 7). In total, 132 immunome ortholog proteins were identified in the fruit fly and 27 in the worm, and the interaction datasets contained 13 new interactions between these proteins. These interactions were treated in the same way as the interactions derived from the human PPI network: an interaction was kept in an earlier subnetwork only if both interacting proteins were present at that level.

Degree distribution of the subnetworks

A power law distribution was fitted to the degree distribution of each subnetwork. The power law exponent (α) and its standard error were estimated by the maximum likelihood method.
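One widely used way to obtain such an estimate is the approximate discrete maximum likelihood estimator of Clauset, Shalizi and Newman, sketched below in Python. This is a swapped-in illustration: it is not necessarily the routine that igraph applied in the original analysis, and the function name `fit_power_law` and the default lower cutoff `xmin = 1` (which excludes isolated, degree-0 nodes) are assumptions.

```python
import math


def fit_power_law(degrees, xmin=1):
    """Approximate MLE of the power-law exponent alpha for degrees >= xmin.

    Discrete approximation (Clauset, Shalizi & Newman 2009):
        alpha = 1 + n / sum(ln(k / (xmin - 0.5)))
    with standard error (alpha - 1) / sqrt(n).
    """
    if xmin < 1:
        raise ValueError("xmin must be >= 1 for integer degrees")
    tail = [k for k in degrees if k >= xmin]  # degrees below xmin are ignored
    n = len(tail)
    if n == 0:
        raise ValueError("no degrees >= xmin")
    log_sum = sum(math.log(k / (xmin - 0.5)) for k in tail)
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / math.sqrt(n)
    return alpha, std_err


# Hypothetical degree sequence of a small subnetwork.
alpha, std_err = fit_power_law([1, 1, 1, 2, 2, 3, 5, 8])
```

The standard error shrinks as 1/sqrt(n), so exponents fitted to the small, early-level subnetworks are expected to be much less precise than the one for the full network.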

Average entropy of the proteins

Multiple protein sequence alignments were downloaded from the ImmTree database for each protein with an evolutionary level number higher than 0. Entropy values were calculated for each site of the alignments [47] as follows:

$$H = -\sum_{i=1}^{6} p_i \log p_i$$

where $p_i$ is the frequency of residues from class i at the position. Six classes of amino acids were used: aliphatic (A, V, L, I, M, C), aromatic (F, W, Y, H), polar (S, T, N, Q), basic (K, R), acidic (D, E) and special conformation (G, P). The arithmetic mean of the entropy was calculated only over sites where residues were present in more than half of the sequences (50%+1). This avoids overestimating conservation because of long unique sequence segments, which usually appear at the ends of the alignment: such sites contain residues from only a few sequences and therefore have artificially low entropy.
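The per-site entropy and the majority-occupancy filter can be sketched as follows. Assumptions not stated in the text: alignments are given as equal-length strings, gaps and residues outside the six classes (for example `-` or `X`) are ignored when computing class frequencies, and the natural logarithm is used. The function names are hypothetical.

```python
import math
from collections import Counter

# The six physico-chemical residue classes listed in the text.
RESIDUE_CLASSES = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "basic": "KR",
    "acidic": "DE",
    "special": "GP",
}
CLASS_OF = {aa: name for name, members in RESIDUE_CLASSES.items() for aa in members}


def site_entropy(column: str) -> float:
    """Shannon entropy (natural log) of residue-class frequencies in one column."""
    classes = [CLASS_OF[aa] for aa in column.upper() if aa in CLASS_OF]
    n = len(classes)
    counts = Counter(classes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_entropy(alignment: list[str]) -> float:
    """Mean site entropy over columns where more than half of the sequences have a residue."""
    n_seq = len(alignment)
    entropies = []
    for column in zip(*alignment):
        occupied = sum(1 for aa in column if aa.upper() in CLASS_OF)
        if occupied > n_seq / 2:  # the "50%+1" majority rule
            entropies.append(site_entropy("".join(column)))
    return sum(entropies) / len(entropies) if entropies else float("nan")
```

For example, in the toy alignment `["AK", "VK", "F-", "W-"]` the second column is occupied in only two of four sequences and is skipped, so the mean equals the entropy of the first column (two classes at equal frequency, ln 2).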

Efficiency of the network

Global efficiency quantifies how efficiently a network exchanges information between its nodes, assuming that the efficiency of communication between two nodes is inversely proportional to their distance [48]. Global efficiency was calculated as follows:

$$E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}$$

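where $d_{ij}$ is the distance between the i-th and j-th nodes, defined as the number of edges on the shortest path between them, and N is the number of nodes. By definition, pairs of nodes with no connecting path have infinite distance and contribute zero to the sum.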
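A minimal breadth-first-search implementation of this formula for an unweighted, undirected network is sketched below. The adjacency-dictionary representation and the function name are hypothetical; the original calculations used igraph.

```python
from collections import deque


def global_efficiency(adjacency: dict[str, set[str]]) -> float:
    """Latora-Marchiori global efficiency of an unweighted, undirected network."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search gives shortest-path distances (in edges) from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes are absent from dist, i.e. 1/d_ij = 0 for them.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(global_efficiency(path))  # 5/6, about 0.833
```

Because unreachable pairs simply contribute zero, the measure stays finite for subnetworks that contain isolated nodes, unlike the average path length.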

Expected efficiency of the networks

In scale-free networks with a degree exponent between 2 and 3, the average path length is expected to scale as L ~ ln ln N, where N is the number of nodes in the network [36]. Since global efficiency is the reciprocal of the harmonic mean of the shortest path lengths, and thus approximately the reciprocal of the average path length [48], we calculated the expected efficiency of the networks as:

$$E_{\mathrm{exp}} \sim \frac{1}{\ln \ln N}$$

This model of expected efficiency applies if the subnetworks have a power law degree distribution. However, in very small networks the average path length is better estimated as L ~ ln N. Since our network models are small, a second curve for the expected efficiency was calculated as:

$$E_{\mathrm{exp}} \sim \frac{1}{\ln N}$$

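Both expected-efficiency curves are simple functions of the network size. The sketch below evaluates them with the proportionality constants set to 1, which is an assumption; the function name is hypothetical, and N must exceed e so that ln ln N is positive.

```python
import math


def expected_efficiency(n_nodes: int) -> tuple[float, float]:
    """Return (1 / ln ln N, 1 / ln N), assuming unit proportionality constants."""
    if n_nodes <= math.e:
        raise ValueError("N must exceed e so that ln ln N is positive")
    ln_n = math.log(n_nodes)
    return 1.0 / math.log(ln_n), 1.0 / ln_n


# Both curves over a range of hypothetical network sizes.
curves = {n: expected_efficiency(n) for n in range(10, 601, 10)}
```

With unit constants, 1/ln ln N exceeds 1 for N below about e^e (roughly 15 nodes), which is not a valid efficiency, so the ln ln N curve is only informative for the larger subnetworks.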
Maximum vulnerability of the networks

The vulnerability of a network was calculated from its global efficiency [49]. The vulnerability $V_i$ associated with the i-th node is

$$V_i = \frac{E - E_i}{E}$$

where E is the global efficiency of the network and $E_i$ is the global efficiency of the network after removing node i and all of its interactions. The overall vulnerability of the network, $V = \max_i V_i$, is the value for the most vulnerable node, i.e. the largest loss in performance caused by deleting a single node. The smaller the vulnerability, the more robust the network is against the loss of any single node.
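The vulnerability calculation can be sketched by recomputing global efficiency with each node removed in turn. This brute-force version is an illustration, not the authors' code: each $E_i$ is normalized over the remaining N - 1 nodes (our reading of "the network without the node i"), and the function names are hypothetical.

```python
from collections import deque


def global_efficiency(adjacency, removed=None):
    """Global efficiency, optionally ignoring one node and all of its edges."""
    nodes = [v for v in adjacency if v != removed]
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search that never enters the removed node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v != removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def max_vulnerability(adjacency):
    """Return (node, V_i) for the node whose removal costs the most efficiency."""
    e = global_efficiency(adjacency)
    if e == 0:
        raise ValueError("network has no edges")
    vulnerability = {
        i: (e - global_efficiency(adjacency, removed=i)) / e for i in adjacency
    }
    worst = max(vulnerability, key=vulnerability.get)
    return worst, vulnerability[worst]


star = {"C": {"L1", "L2", "L3"}, "L1": {"C"}, "L2": {"C"}, "L3": {"C"}}
print(max_vulnerability(star))  # ('C', 1.0)
```

Removing a leaf of the star slightly increases efficiency (a negative $V_i$), while removing the hub disconnects every remaining pair, which is why the hub defines the maximum vulnerability.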

Visualization of the results

Notched boxplots were used to visualize the distributions in a succinct, comparable way, using the default boxplot settings in R. In these figures, a box was drawn between the lower and upper hinges, and the median of the dataset is marked inside it. Whiskers extend to the most extreme data points that lie no more than 1.5 times the length of the box away from the box. Data points beyond the whiskers are marked with circles. The notches extend to

$$\pm 1.58 \, \frac{\mathrm{IQR}}{\sqrt{n}}$$

from the median on both sides of the box, where IQR is the distance between the hinges and n is the number of observations, and represent roughly a 95% confidence interval for the median. If the notches of two plots do not overlap, this is strong evidence that the two medians differ.
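For reference, the hinge and notch calculation can be reproduced outside R. The Python sketch below is our port of the logic of R's `fivenum` and of the notch formula used by R's notched boxplots; the function names are only illustrative and this is not an R API.

```python
import math


def fivenum(values):
    """Tukey's five-number summary, following the definition used by R's fivenum()."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("empty sample")
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # R positions are 1-based; average the two neighbours for half-integer positions.
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


def notch_interval(values):
    """Median +/- 1.58 * (hinge spread) / sqrt(n), as drawn by notched boxplots."""
    _, lower_hinge, median, upper_hinge, _ = fivenum(values)
    half_width = 1.58 * (upper_hinge - lower_hinge) / math.sqrt(len(values))
    return median - half_width, median + half_width


print(fivenum(range(1, 10)))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

The notch half-width shrinks with sqrt(n), so distributions drawn from larger subnetworks get narrower notches and are easier to distinguish.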

Updated on: 20 Jul 2009