Materials and methods
Data sources

The genomes of 18 cyanobacteria, including Synechocystis, Synechococcus, Prochlorococcus, Anabaena, Nostoc, Trichodesmium, Gloeobacter and Crocosphaera, were downloaded from the IMG database. Each genome was fed into the program formatdb [18] to create an organism-specific database. A set of crt genes was obtained from IMG v.1.1 (http://img.jgi.doe.gov/v1.1/main.cgi) and the GenBank database. This dataset, comprising well-characterized and putative enzymes encoded by cyanobacterial crt genes, was used to construct a query protein set. Each protein in this query set was used to search for potential novel sequences in all cyanobacterial species with whole genome sequences available, using the BLASTP and TBLASTN programs, with e-value [...] crt genes were manually inspected for each species. Similarity searches of the above databases also led to the identification of copies of crt genes in these species.
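
The database construction and search steps described above can be sketched as follows. This is only an illustrative outline, assuming the legacy NCBI command-line tools (formatdb and blastall) and tabular output (-m 8). The file names, the helper function names, and the e-value cutoff used in the usage note are hypothetical, since the threshold actually applied is not recoverable from the text.

```python
# Sketch: build legacy NCBI BLAST commands and filter tabular hits.
# Helper names, file names, and any cutoff passed in are assumptions
# made for illustration; they are not taken from the original study.

def formatdb_cmd(fasta_path, protein=True):
    """Command that formats one genome FASTA file as a BLAST database."""
    # -p T builds a protein database (for BLASTP),
    # -p F builds a nucleotide database (for TBLASTN).
    return ["formatdb", "-i", fasta_path, "-p", "T" if protein else "F"]


def blastall_cmd(program, query_path, db_path, evalue, out_path):
    """Command that searches a query protein set against one database."""
    return [
        "blastall",
        "-p", program,      # "blastp" or "tblastn"
        "-i", query_path,   # query crt protein set (FASTA)
        "-d", db_path,      # database created by formatdb
        "-e", str(evalue),  # expectation value cutoff
        "-m", "8",          # tabular output, 12 tab-separated columns
        "-o", out_path,
    ]


def filter_hits(tabular_text, max_evalue):
    """Return (query, subject, e-value) for hits at or below max_evalue."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of -m 8 output is the e-value
        if evalue <= max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

Each command list could be passed to `subprocess.run` on a system where the legacy BLAST binaries are installed, for example `blastall_cmd("tblastn", "crt_queries.faa", "genome.fna", 1e-5, "hits.tsv")`, where `1e-5` is a placeholder cutoff. The hits retained by `filter_hits` would still need the manual inspection described in the text.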

Multiple sequence alignment and phylogenetic analysis

Multiple protein sequence alignment was performed for each of the carotenoid biosynthetic pathway genes using the ClustalX program together with BioEdit [19, 20]. Motifs of these enzymes across the domains were determined by NCBI BLAST search or SMART (http://smart.embl-heidelberg.de/) [21]. Phylogenetic trees were reconstructed using the neighbor-joining method [22], as implemented in the program MEGA 2.1 [23]. Bootstrap support was estimated using 1000 replicates for distance analyses.
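
To make the distance-based bootstrap procedure concrete, the following minimal sketch shows how an alignment can be turned into a distance matrix, how neighbor-joining selects the first pair of taxa to join, and how resampling alignment columns gives a support value for that pair. It is not the MEGA 2.1 implementation. The p-distance measure, the function names, and the restriction to the first joining step are assumptions made to keep the example short.

```python
import random

# Sketch only: p-distance and the single-step NJ pair selection are
# simplifying assumptions, not the procedure implemented in MEGA 2.1.

def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise p-distances for a list of aligned sequences."""
    n = len(seqs)
    return [[0.0 if i == j else p_distance(seqs[i], seqs[j])
             for j in range(n)] for i in range(n)]


def nj_first_pair(d):
    """Pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)."""
    n = len(d)
    totals = [sum(row) for row in d]  # r(i): summed distance from taxon i
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def pair_support(seqs, replicates=1000, seed=0):
    """Fraction of replicates in which the same first pair is joined."""
    rng = random.Random(seed)
    reference = nj_first_pair(distance_matrix(seqs))
    agree = sum(
        nj_first_pair(distance_matrix(bootstrap_alignment(seqs, rng))) == reference
        for _ in range(replicates)
    )
    return reference, agree / replicates
```

A full analysis would repeat the joining step until the tree is resolved and would count every clade across replicates rather than only the first pair. The default of 1000 replicates mirrors the number stated in the text.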

Tertiary structure prediction

To better understand the evolution of certain enzymes, protein structure was analyzed by homology modeling. The protein sequences of lycopene cyclase from Prochlorococcus MIT 9312 and Arabidopsis thaliana were submitted to the protein model server, the RCSB Protein Data Bank Web server (http://www.rcsb.org/pdb/Welcome.do), with PDB-1pn0 as the model template. All manipulations were performed using PdbViewer.

