Manually curated sets of 62 S. cerevisiae and 50 R. norvegicus proteins with experimental evidence of peroxisomal location were compiled from the literature [9-11,19] and from the Saccharomyces Genome Database and Swiss-Prot. For the purpose of this paper we consider a protein to be peroxisomal when it permanently resides in the peroxisomal matrix or membrane, or when it is a cytoplasmic protein but has a dedicated function in peroxisomal protein import and/or biogenesis.
Protein sequences encoded by 144 publicly available complete genomes were obtained from Swiss-Prot, except for those of Plasmodium falciparum, Schizosaccharomyces pombe, Candida albicans and Encephalitozoon cuniculi (GenBank), and those of Homo sapiens, Rattus norvegicus and Mus musculus (EBI).
For every yeast and rat peroxisomal protein, homologous sequences (E . Neighbour Joining (NJ) trees were made using Kimura distances as implemented in ClustalW. Positions with gaps were excluded and 1000 bootstrap iterations were performed. Maximum Likelihood (ML) trees were derived using PhyML v2.1b1, with a gamma-distribution model with four rate categories, before and after excluding from the alignment positions with gaps in 10% or more of the sequences. In all cases NJ and ML trees were manually examined to search for consistent patterns indicating the origin of the peroxisomal proteins. Trees in which eukaryotic proteins clustered together within the Bacteria or the Archaea, and with a specific prokaryotic out-group, were classified as having that phylogenetic origin (e.g. Figure 1b). Trees were regarded as resolved only when both the NJ tree and the ML tree agreed at the level of resolution required (e.g. a specific bacterial group as a sister clade of the peroxisomal group of proteins), or when at least the ML tree had the required level of resolution and the NJ tree did not point to a different origin for the protein.
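The gap-filtering step and the rule for calling a tree resolved can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study, where trees were examined manually. The function names (filter_gappy_columns, classify_origin), the representation of an alignment as equal-length strings with '-' for gaps, and the encoding of the origin suggested by each tree as a string (or None when that tree is unresolved) are all hypothetical.

```python
from typing import List, Optional


def filter_gappy_columns(alignment: List[str],
                         max_gap_fraction: float = 0.10) -> List[str]:
    """Drop columns in which the fraction of gapped sequences is
    max_gap_fraction or more (10% by default, as in the text).

    `alignment` is a list of equal-length strings with '-' as the gap
    character; this representation is an assumption of the sketch.
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns whose gap fraction is below the threshold.
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs < max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


def classify_origin(nj_origin: Optional[str],
                    ml_origin: Optional[str]) -> Optional[str]:
    """Return the inferred origin, or None if the tree pair is unresolved.

    A pair is resolved when both trees agree, or when the ML tree is
    resolved and the NJ tree does not point to a different origin.
    """
    if ml_origin is None:
        return None          # the ML tree must reach the required resolution
    if nj_origin is None or nj_origin == ml_origin:
        return ml_origin     # agreement, or NJ simply uninformative
    return None              # the two trees point to different origins
```

For example, classify_origin(None, "alpha-proteobacteria") returns "alpha-proteobacteria", whereas a pair of trees pointing to two different groups returns None and would be left unclassified.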
Reconstruction of yeast and rat peroxisomal metabolism and their ancestral states
Annotated biochemical and cellular functions of the yeast and rat peroxisomal proteins were mapped onto KEGG metabolic maps and are represented in Figure 5, with their phylogenetic origin indicated by a color code. Proteins known or predicted to be membrane-associated are depicted close to the membrane. The minimal ancestral opisthokont peroxisome was reconstructed by combining proteins that are present in both the yeast and rat peroxisomal proteomes with proteins that are present in only one of the two proteomes but have orthologs in plants that either have a peroxisomal location or are described as putative peroxisomal proteins in the AraPerox database. The minimal ancestral eukaryotic proteome is formed by those proteins of the ancestral opisthokont proteome that are also found in the genomes of plants, Trypanosoma brucei and Leishmania major. Catalase and Fox1, which are absent from glycosomes, were nevertheless included for the reasons explained in the Results and Discussion section.
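The two reconstruction rules amount to set operations, sketched below in Python. This is an illustrative reading of the text, not the authors' implementation. It assumes that every protein has already been assigned to an orthologous group, so that proteins from different species can be compared by a shared identifier. It also reads "found in the genomes of plants, Trypanosoma brucei and Leishmania major" as presence in all three. The function and parameter names are hypothetical.

```python
from typing import FrozenSet, Set


def ancestral_opisthokont(yeast: Set[str],
                          rat: Set[str],
                          plant_peroxisomal: Set[str]) -> Set[str]:
    """Minimal ancestral opisthokont peroxisomal proteome.

    `yeast` and `rat` hold the orthologous groups of the two curated
    peroxisomal proteomes; `plant_peroxisomal` holds the groups with a
    peroxisomal or putatively peroxisomal plant ortholog.
    """
    shared = yeast & rat
    # Groups seen in only one proteome count if plants support them.
    one_lineage_only = (yeast ^ rat) & plant_peroxisomal
    return shared | one_lineage_only


def ancestral_eukaryote(opisthokont: Set[str],
                        plant_genome: Set[str],
                        t_brucei_genome: Set[str],
                        l_major_genome: Set[str],
                        manual_additions: FrozenSet[str] = frozenset()) -> Set[str]:
    """Minimal ancestral eukaryotic peroxisomal proteome.

    Requires presence in all three genome sets (an assumption of this
    sketch). `manual_additions` holds groups included despite failing
    the test, as the text describes for catalase and Fox1.
    """
    conserved = opisthokont & plant_genome & t_brucei_genome & l_major_genome
    return conserved | set(manual_additions)
```

Keeping the manually justified inclusions in a separate argument makes the exceptions explicit instead of hiding them in the input sets.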