Reconstructing the human immunome-related protein-protein interaction network
Human immune system-related proteins were collected from the
Immunome database, a reference set for the human immune system
compiled through a combination of literature analysis and data mining. Interactions of the immunome proteins were assigned according to the Human Protein Reference Database (HPRD).
Since only interactions between immunome proteins were taken into
account, no new nodes were added, and proteins without interactions
were removed from the dataset. The final network contains 584 of the
847 original nodes, connected by 1349 interactions in total.
Interactions that appeared more than once were merged into single interactions.
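A minimal Python sketch of this filtering step, assuming the immunome is a set of protein identifiers and the HPRD interactions are given as (protein, protein) pairs. The function name `build_immunome_network` and the toy identifiers are hypothetical; the sketch illustrates the rule, not the authors' actual pipeline.

```python
from typing import Iterable


def build_immunome_network(
    immunome: set[str], interactions: Iterable[tuple[str, str]]
) -> tuple[set[str], set[tuple[str, str]]]:
    """Keep only interactions whose two partners are both immunome proteins."""
    edges: set[tuple[str, str]] = set()
    for a, b in interactions:
        if a in immunome and b in immunome:
            # Store each undirected edge in a canonical (sorted) order so that
            # repeated or reversed duplicates collapse into a single edge.
            edges.add((a, b) if a <= b else (b, a))
    # No new nodes are introduced; immunome proteins without any retained
    # interaction simply do not appear in the node set.
    nodes = {p for edge in edges for p in edge}
    return nodes, edges


# Toy usage with hypothetical identifiers: P4 has no interaction, X9 is not
# an immunome protein, and ("P2", "P1") duplicates ("P1", "P2").
immunome = {"P1", "P2", "P3", "P4"}
hprd = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "X9")]
nodes, edges = build_immunome_network(immunome, hprd)
print(sorted(nodes), sorted(edges))
```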
Generating subnetworks at ten levels of evolution
Evolutionary information was assigned to all proteins of the network. The data were taken from the ImmTree database, which defines ten evolutionary emergence levels, from Eukaryota (level 9) to Homo sapiens (level 0) (Table 1).
The numbers are the unified evolutionary levels from the database. The
emergence levels were defined by analysing the orthologs of each
protein: a level represents the earliest common ancestor of the species
in which orthologs of the protein can be identified. For example, for the FYN gene encoding protein-tyrosine kinase Fyn, the most evolutionarily distant ortholog in the ImmTree database is found in C. elegans;
we therefore assume that this gene was already present in the ancestor
of the whole Bilateria group and assign it level 7.
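The level assignment rule can be illustrated as taking the maximum level over all species in which an ortholog exists. The species-to-level table below is hypothetical and deliberately partial: it contains only the levels mentioned in this section (Homo sapiens 0, the Coelomata ancestor 6, the Bilateria ancestor 7); the full mapping comes from ImmTree (Table 1).

```python
# Hypothetical, partial mapping from species to the emergence level of its
# last common ancestor with human (0 = Homo sapiens, higher = more ancient).
SPECIES_LEVEL = {
    "Homo sapiens": 0,
    "Drosophila melanogaster": 6,  # Coelomata ancestor
    "Caenorhabditis elegans": 7,   # Bilateria ancestor
}


def emergence_level(ortholog_species: set[str]) -> int:
    """Return the level of the most distant species carrying an ortholog.

    Species missing from the table are ignored; a gene with no listed
    ortholog outside human gets level 0.
    """
    levels = [SPECIES_LEVEL[s] for s in ortholog_species if s in SPECIES_LEVEL]
    return max(levels, default=0)


# FYN: the most distant ortholog reported is in C. elegans, hence level 7.
print(emergence_level({"Homo sapiens", "Caenorhabditis elegans"}))
```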
Subnetworks for all ten levels were generated with Cytoscape
by removing from the interaction network all nodes that emerged later
than the given level, together with their edges.
Thus, the level 0 network is identical to the original protein-protein
interaction network, whereas networks with higher level numbers
represent earlier stages of evolution, with fewer nodes
and edges (Table 1). Network statistics and parameters were calculated for the subnetworks using the igraph R library.
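The subnetworks themselves were produced in Cytoscape; the sketch below only restates the filtering rule in plain Python, assuming a hypothetical dictionary that maps each protein to its emergence level.

```python
def subnetwork_at_level(
    levels: dict[str, int], edges: set[tuple[str, str]], k: int
) -> tuple[set[str], set[tuple[str, str]]]:
    """Ancestral subnetwork at level k.

    Keeps proteins that had already emerged (level number >= k) and only the
    edges whose two endpoints are both kept.
    """
    nodes = {p for p, lvl in levels.items() if lvl >= k}
    kept = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return nodes, kept


# Hypothetical levels: A is ancient (9), B emerged at 7, C is human-specific.
levels = {"A": 9, "B": 7, "C": 0}
edges = {("A", "B"), ("B", "C")}
for k in (0, 7, 8):
    n, e = subnetwork_at_level(levels, edges, k)
    print(k, sorted(n), sorted(e))
```

Level 0 returns the full network, and each higher level can only lose nodes and edges, matching the description above.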
Protein-protein interactions in D. melanogaster and C. elegans
Experimentally derived interaction data for the low-level PPI subnetworks were acquired from the IntAct and PIMRider
databases for the fruitfly and the worm, respectively. Orthologs of human immunome proteins were
identified in these genomes using data from the ImmTree database. Then,
interactions between the immunome ortholog proteins were extracted
from the datasets and added to the low-level networks: interactions
from the fruitfly data were added to the Coelomata ancestor's
subnetwork (level 6), and those from the worm data to the Bilateria ancestor's
subnetwork (level 7). We identified 132 immunome ortholog proteins
in the fruitfly and 27 in the worm, and the interaction datasets
contained 13 new interactions between these proteins. These
interactions were treated in the same way as those derived from the human PPI
network: an interaction was retained in an earlier subnetwork
only if both interacting proteins were present at that level.
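A sketch of how ortholog interactions could be mapped back onto human proteins and added to the ancestral subnetworks. It assumes a one-to-one ortholog-to-human mapping and represents each level's subnetwork as a (nodes, edges) pair; all names and data are hypothetical. An interaction is added to the target level and to every earlier (higher-numbered) level where both human partners are present; later levels are left untouched, since the text does not specify them.

```python
def add_ortholog_interactions(
    subnetworks: dict[int, tuple[set[str], set[tuple[str, str]]]],
    ortholog_pairs: list[tuple[str, str]],
    to_human: dict[str, str],
    level: int,
) -> None:
    """Map interactions between ortholog proteins onto their human immunome
    counterparts and add them (in place) to the subnetwork at `level` and to
    every earlier, higher-numbered subnetwork where both partners exist."""
    for x, y in ortholog_pairs:
        if x not in to_human or y not in to_human:
            continue  # at least one partner is not an immunome ortholog
        a, b = sorted((to_human[x], to_human[y]))
        for k, (nodes, edges) in subnetworks.items():
            if k >= level and a in nodes and b in nodes:
                edges.add((a, b))


# Hypothetical toy data: fly proteins fa, fb are orthologs of H1, H2.
subnetworks = {
    6: ({"H1", "H2"}, set()),
    7: ({"H1"}, set()),  # H2 had not yet emerged at level 7
}
add_ortholog_interactions(subnetworks, [("fa", "fb")], {"fa": "H1", "fb": "H2"}, level=6)
print(subnetworks[6][1], subnetworks[7][1])
```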
Degree distribution of the subnetworks
A power-law distribution was fitted to the degree distribution of each subnetwork. The power-law exponent (α) and its standard error were estimated by the maximum likelihood method.
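The text does not state which R routine performed the fit. As an illustration only, the sketch below swaps in the closed-form discrete approximation of the power-law maximum likelihood estimator described by Clauset, Shalizi and Newman (2009), with the approximate standard error (α - 1)/sqrt(n). The lower cutoff `k_min` is an assumed parameter and may differ from the authors' settings.

```python
import math


def fit_power_law(degrees: list[int], k_min: int = 1) -> tuple[float, float]:
    """Approximate discrete power-law MLE for degrees k >= k_min.

    alpha = 1 + n / sum(ln(k / (k_min - 1/2)))
    se    = (alpha - 1) / sqrt(n)   (approximate standard error)
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no degrees at or above k_min")
    # Every term is positive because k >= k_min > k_min - 1/2.
    s = sum(math.log(k / (k_min - 0.5)) for k in tail)
    alpha = 1.0 + n / s
    return alpha, (alpha - 1.0) / math.sqrt(n)


alpha, se = fit_power_law([1, 1, 2, 3, 5, 8, 1, 2])
print(f"alpha = {alpha:.3f} +/- {se:.3f}")
```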
Average entropy of the proteins
Multiple protein sequence alignments were downloaded from the
ImmTree database for each protein with an evolutionary level number
higher than 0. The entropy of each alignment site was calculated as

$$S = -\sum_{i=1}^{6} p_i \log p_i$$

where $p_i$ is the frequency of residues from class $i$ at
that position. The following six classes of amino acids were used:
aliphatic (A, V, L, I, M, C), aromatic (F, W, Y, H), polar (S, T, N,
Q), basic (K, R), acidic (D, E) and special conformation (G, P). The
arithmetic mean of the entropy was calculated over those sites where at
least 50% + 1 of the sequences had a residue, in order to avoid
overestimating conservation because of long unique sequence segments in
the alignment, which usually occur at its ends.
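A sketch of the per-protein average entropy under several assumptions not stated in the text: alignments are lists of equal-length strings with `-` for gaps, the logarithm is natural, "at least 50% + 1 of the sequences" is read as a strict majority (floor(n/2) + 1), and residues outside the six classes (e.g. X) are treated like gaps.

```python
import math
from collections import Counter

# The six residue classes listed in the text.
RESIDUE_CLASS = {
    **dict.fromkeys("AVLIMC", "aliphatic"),
    **dict.fromkeys("FWYH", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("KR", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("GP", "special"),
}


def site_entropy(column: str) -> float:
    """Shannon entropy (natural log, assumed) of residue classes in one column;
    gaps and unclassified characters are ignored."""
    counts = Counter(RESIDUE_CLASS[r] for r in column.upper() if r in RESIDUE_CLASS)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_entropy(alignment: list[str]) -> float:
    """Mean site entropy over columns occupied by a strict majority of sequences."""
    threshold = len(alignment) // 2 + 1
    values = []
    for column in zip(*alignment):
        occupied = sum(1 for r in column if r.upper() in RESIDUE_CLASS)
        if occupied >= threshold:
            values.append(site_entropy("".join(column)))
    return sum(values) / len(values) if values else float("nan")


# The third column is occupied by only one of three sequences and is skipped.
print(mean_entropy(["AV-", "LI-", "AKD"]))
```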
Efficiency of the network
Global efficiency quantifies how efficiently the network exchanges
information between nodes, assuming that the efficiency of
communication between two nodes is inversely proportional to their
distance. Global efficiency was calculated as

$$E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}$$

where $N$ is the number of nodes and $d_{ij}$ is the distance between the $i$-th and $j$-th nodes, i.e. the number of edges on the shortest path between them.
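A dependency-free sketch of global efficiency, using breadth-first search for unweighted shortest paths. It follows the usual convention that disconnected node pairs contribute zero; whether the authors' igraph computation treated disconnected pairs the same way is an assumption.

```python
from collections import deque


def global_efficiency(nodes: set[str], edges: set[tuple[str, str]]) -> float:
    """E = 1/(N(N-1)) * sum over ordered pairs i != j of 1/d_ij.

    Pairs with no connecting path contribute 0 to the sum.
    """
    adj: dict[str, set[str]] = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search gives shortest-path lengths in edges.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


# Path A-B-C: four ordered pairs at distance 1, two at distance 2 -> 5/6.
print(global_efficiency({"A", "B", "C"}, {("A", "B"), ("B", "C")}))
```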
Expected efficiency of the networks
In scale-free (ultra-small world) networks the average path length is expected to scale as $L \sim \ln \ln N$, where $N$ is the number of nodes in the network. Since global efficiency is approximately the reciprocal of the average path length (exactly, it is the reciprocal of the harmonic mean of the distances), we calculated the expected efficiency of the networks as

$$E_{\mathrm{exp}} = \frac{1}{\ln \ln N}$$

This model of expected efficiency applies if the subnetworks are assumed
to have a power-law degree distribution.
However, in networks with very few nodes the average path length is better
estimated by $L \sim \ln N$. Since our networks are small, a second curve for the expected efficiency was calculated as

$$E'_{\mathrm{exp}} = \frac{1}{\ln N}$$
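The two reference curves can be evaluated directly for each subnetwork size. The sketch assumes proportionality constants of 1, since the scaling relations only fix the form of the curves, and it rejects N < 3, where ln ln N is not positive.

```python
import math


def expected_efficiency(n_nodes: int) -> tuple[float, float]:
    """Return (1 / ln ln N, 1 / ln N) for a network with N = n_nodes nodes.

    The first curve follows L ~ ln ln N, the second L ~ ln N; both assume a
    proportionality constant of 1.
    """
    if n_nodes < 3:
        raise ValueError("ln ln N is only positive for N >= 3")
    return 1.0 / math.log(math.log(n_nodes)), 1.0 / math.log(n_nodes)


for n in (10, 100, 584):
    e_lnln, e_ln = expected_efficiency(n)
    print(n, round(e_lnln, 3), round(e_ln, 3))
```

Because ln ln N < ln N for every N >= 3, the first curve always lies above the second.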
Maximum vulnerability of the networks
The vulnerability of a network was calculated from its efficiency characteristics. The vulnerability $V_i$ of a network associated with the $i$-th node is

$$V_i = \frac{E - E_i}{E}$$

where $E$ is the global efficiency of the network and $E_i$ is the global efficiency of the network without node $i$ and
all of its interactions. The overall vulnerability of the network is
that of its most vulnerable node, i.e. the largest relative loss in
performance when a single node is deleted from the network. The smaller the
vulnerability, the more stable the network is against random
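A self-contained sketch of the maximum vulnerability: global efficiency is recomputed after deleting each node in turn, with $V_i = (E - E_i)/E$, and the maximum over nodes is reported. This performs O(N^2) breadth-first searches, which is manageable for networks of a few hundred nodes; function names are hypothetical, and a network with E = 0 (no edges) is not handled.

```python
from collections import deque


def global_efficiency(nodes: set[str], edges: set[tuple[str, str]]) -> float:
    """Global efficiency; disconnected pairs contribute 0."""
    adj: dict[str, set[str]] = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def max_vulnerability(nodes: set[str], edges: set[tuple[str, str]]) -> tuple[str, float]:
    """Return (node, V_i) for the node whose deletion gives the largest
    relative efficiency loss V_i = (E - E_i) / E."""
    e = global_efficiency(nodes, edges)
    best: tuple[str, float] | None = None
    for i in nodes:
        # Delete node i together with all of its interactions.
        rest = nodes - {i}
        rest_edges = {(a, b) for a, b in edges if i not in (a, b)}
        v = (e - global_efficiency(rest, rest_edges)) / e
        if best is None or v > best[1]:
            best = (i, v)
    assert best is not None
    return best


# Star network: deleting the hub disconnects everything (V = 1).
star_nodes = {"C", "L1", "L2", "L3"}
star_edges = {("C", "L1"), ("C", "L2"), ("C", "L3")}
print(max_vulnerability(star_nodes, star_edges))
```

Note that $V_i$ can be negative for peripheral nodes, because removing them also shrinks $N$; only the maximum enters the network-level value.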
Visualization of the results
Notched boxplots were used to visualize the distributions in a
succinct, comparable way, using the default boxplot settings in R.
In these figures, each box spans the lower and upper hinges, and a line
marks the median of the data. Whiskers extend to the most extreme data
points that lie within 1.5 times the box length from the box.
Data points beyond the whiskers are plotted as circles. The
notches extend to $\pm 1.58\,\mathrm{IQR}/\sqrt{n}$
from the median on both sides of the box, where IQR is the distance between the hinges and $n$ is the number of data points; they represent roughly a
95% confidence interval for the median. If the notches of two plots do
not overlap, this is strong evidence that the two medians differ.
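The boxplot statistics can be reproduced outside R for checking. The sketch below re-implements, for illustration only, the definitions documented for R's `fivenum()` and `boxplot.stats()`: Tukey hinges, whiskers at 1.5 box lengths, and a notch of ±1.58 IQR/sqrt(n). It is not R's own code, and the function names are hypothetical.

```python
import math


def fivenum(values: list[float]) -> list[float]:
    """Tukey's five-number summary as defined by R's fivenum():
    minimum, lower hinge, median, upper hinge, maximum."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions; a half-integer position averages two neighbours.
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]


def notched_box(values: list[float], coef: float = 1.5) -> dict:
    """Box, whiskers, outliers and notch, following R's boxplot.stats()."""
    _, lower_hinge, median, upper_hinge, _ = fivenum(values)
    spread = upper_hinge - lower_hinge  # box length (hinge spread, "IQR")
    low_limit = lower_hinge - coef * spread
    high_limit = upper_hinge + coef * spread
    inside = [v for v in values if low_limit <= v <= high_limit]
    half_notch = 1.58 * spread / math.sqrt(len(values))
    return {
        "box": (lower_hinge, upper_hinge),
        "median": median,
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in values if not low_limit <= v <= high_limit],
        "notch": (median - half_notch, median + half_notch),
    }


stats = notched_box([1, 2, 3, 4, 100])
print(stats["whiskers"], stats["outliers"])  # (1, 4) [100]
```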