Genome sequences of serotype A and B isolates of C. trachomatis
The genomes of two serotype B ocular isolates of C. trachomatis were
sequenced. These provide the first high quality genome sequences of
ocular isolates: B/TZ1A828/OT and B/Jali20/OT (referred to as CTB and
Jali20 respectively, for ease of nomenclature), summarised in Table 1.
Whole genome comparisons showed that the genomes are highly syntenic,
and that there are no whole gene differences between the strains. The
pairwise mean nucleotide identity between orthologous genes is 99.8%. A
comparison of the genomes against that of C. trachomatis serotype A strain A/HAR-13 (referred to as A/HAR-13) sequenced by microarray [21] showed no insertions or deletions of predicted coding sequences (CDSs).
Previous microarray and sequencing studies have shown that almost all sequence and gene variation distinguishing serotypes of C. trachomatis from each other is restricted to a region at the terminus of replication known as the Plasticity Zone (PZ) [22-24].
It was demonstrated by microarray and PCR analysis that a strain
belonging to serotype B lacked most of the genes found within this
region [22,23].
However, analysis of the serotype B isolates sequenced for this study
showed that they possess many of the CDSs found within the PZ (see
additional file 1).
These include cytotoxin gene fragments, remnants of a much larger
cytotoxin gene which is thought to have been similar to those still
found intact in C. muridarum [25]. In a strain of C. trachomatis serotype D, a remnant of the cytotoxin gene is expressed and is cytotoxic to HeLa cells [26].
Consistent with previous analysis of the PZ, the phospholipase D genes
display high levels of disruption in both serotype B strains [24]. The PZ also encompasses the trpRBA operon, which has been found to be functional only in genital isolates and to carry mutations in ocular strains [27]. This operon is non-functional in both of the strains presented here, with either trpB (B/TZ1A828/OT) or trpA (B/Jali20/OT) disrupted (see additional file 1).
Additional file 1. Comparison of the Plasticity Zone between several strains, visualised by the Artemis Comparison Tool.
The grey lines indicate forward and reverse reading frames of sequenced
genomes, with predicted coding sequences superimposed. The red bars
indicate regions of 97–100% nucleotide identity. Brown CDSs denote
pseudogenes. The cytotoxin locus is reduced in D/UW-3/CX, yet produces
an active cytotoxin. It is further deleted in strain L2/434/BU. The
phospholipase D locus contains pseudogenes in all strains. The trp operon is complete in strains D/UW-3/CX and L2/434/BU, but has pseudogene components in the serotype A and B strains: trpB in B/TZ1A828/OT and A/HAR-13, and trpA in Jali20 and A/HAR-13.
Despite there being no whole gene differences between
the strains, analysis of the genomes identified several pseudogene
differences when they were compared with each other and with the
genomes of other C. trachomatis serotypes and biovariants, namely the serotype A strain A/HAR-13 and the serotype L2 strain L2/434/BU (see additional file 2).
Eleven of the disrupted CDSs are common to both serotype B strains,
whereas only one of these is also common to A/HAR-13 and L2/434/BU
(CTB_3211/JALI_3211/CTA_0350/CTL0578). The equivalent putative membrane
proteins CTB_0571/JALI_0571 contain different inactivating mutations: a
single base deletion causing a frameshift after codon 333 (CTB), or a
single nucleotide polymorphism (SNP) prematurely truncating the protein
after codon 177 (Jali20). Another example of a CDS with distinct
inactivating mutations is the putative exported protein
JALI_1341/CTA_0142. This CDS carries a frameshift mutation in Jali20,
and is truncated by a premature stop codon leading to a loss of 60
amino acids from the C terminus in A/HAR-13.
Additional file 2. Pseudogene differences between strains B/TZ1A828/OT, B/Jali20, A/HAR-13 and L2/434/BU. Pseudogenes (Ψ) are highlighted in brown, and the CDSs contained within the plasticity zone are shown in mauve.
Inclusion proteins are an important family of chlamydial
proteins, associated with virulence, which target the host inclusion
membrane. Consequently, the inactivation of CDSs CTB_2231 and JALI_2231
encoding candidate inclusion membrane proteins may have implications
with regard to how the cell interacts with the host. This CDS has been
disrupted by an identical SNP creating a stop codon after codon 113 in
both strains, with JALI_2231 having undergone a further single base
insertion and deletion at other sites to create two additional
frameshifts. Another notable difference is the variation in the
sequence of the secF gene, which is present as a full length gene in the serotype B strains, but is found as two separate CDSs secD/secF in A/HAR-13 [21].
A further functional loss affects the operon encoding the pyruvoyl-dependent
arginine decarboxylase and the arginine/ornithine antiporter, involved in
pH homeostasis. The antiporter (CTB_3721/JALI_3721/CTA_0406-7) is
disrupted in the serotype A and B strains, the decarboxylase (CTL0627)
is disrupted in L2/434/BU, whereas the operon is intact in the serotype
D strain D/UW-3/CX [24].
Set against a high level of sequence identity (generally in excess
of 99% identity at the nucleotide level), some predicted CDSs display
higher levels of variation (see additional file 3). As has been noted before, these include ompA, which is used to distinguish between serotypes [3], tarp, encoding the translocated actin recruiting phosphoprotein [24], and hctB, encoding histone-like HC2 [28].
Additional file 3. Variable CDSs, comparing strains B/TZ1A828/OT, B/Jali20 and L2/434/BU. CDSs with significant variability between the strains are listed, with brief description of variation.
Phylogenetic analysis of C. trachomatis genomes
The first genome-scale, SNP-based phylogenetic analysis of all six available C. trachomatis genomes was carried out, covering serotypes A, B, D, L2 and L2b (Table 2).
Comparative genome analysis identified 11,500 SNPs, of which the large
majority define splits between the three major groups (Figure 1).
Monophyly of the LGV strains is supported by 6,200 SNPs; 1,477 SNPs unite the
three ocular strains and 1,377 are unique to the genital serotype D
strain. These splits are also strongly supported by the
phylogenetic analysis of the SNPs, which used a general time reversible
model of evolution and four discrete gamma-distributed rate categories
to account for among-site rate variation. Pairwise comparisons of the
genomes by SNP numbers also confirm this clustering (Table 3).
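The two kinds of SNP tally described above (SNPs supporting a particular split, and pairwise SNP numbers) can be illustrated with a minimal sketch. The strain labels and sequences below are invented toy data, and the helper names `pairwise_snp_counts` and `split_support` are hypothetical; this is not the pipeline used in the study, only an illustration of the counting logic on a gap-free alignment.

```python
from itertools import combinations

# Toy alignment: hypothetical labels and made-up sequences, not real genome data.
alignment = {
    "ocular_1": "ACGTACGTAC",
    "ocular_2": "ACGTACGTAC",
    "ocular_3": "ACGTACGTAT",
    "genital_D": "ACGAACGTAC",
    "lgv_1": "TCGTACCTAC",
    "lgv_2": "TCGTACCTAC",
}


def pairwise_snp_counts(aln):
    """Count differing alignment columns for every pair of sequences."""
    return {
        (a, b): sum(x != y for x, y in zip(aln[a], aln[b]))
        for a, b in combinations(aln, 2)
    }


def split_support(aln, group):
    """Count columns where all members of `group` share one base and
    all remaining sequences share a different base."""
    inside = [name for name in aln if name in group]
    outside = [name for name in aln if name not in group]
    length = len(next(iter(aln.values())))
    supported = 0
    for i in range(length):
        in_bases = {aln[name][i] for name in inside}
        out_bases = {aln[name][i] for name in outside}
        if len(in_bases) == 1 and len(out_bases) == 1 and in_bases != out_bases:
            supported += 1
    return supported


counts = pairwise_snp_counts(alignment)
lgv_snps = split_support(alignment, {"lgv_1", "lgv_2"})
genital_snps = split_support(alignment, {"genital_D"})
print(counts[("ocular_1", "lgv_1")], lgv_snps, genital_snps)
```

In this toy case the two LGV-like sequences differ from every other sequence at the same two columns, so those columns support the LGV split, while the pairwise counts within each group are zero or one.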
Within the ocular strains, the two serotype B isolates cluster
together, suggesting that serotypes are identifiable on the basis of
SNP phylogenies. These data are derived from the genomes of only six
isolates (five serotypes). Although the numbers are small, they reflect
the same patterns of genome evolution observed using fragments of the
genome [5],
and our data from complete genomes are strong enough to support these
associations. Further studies would nonetheless be beneficial to confirm
these findings once technology allows rapid, easy purification
of genomes from these obligate intracellular pathogens.
Phylogenetic analysis of C. trachomatis plasmids
The plasmid sequences from the strains CTB and Jali20 were assembled
from the genome shotgun. To investigate the new variant strain which
evaded diagnosis, isolates from epithelial sexually transmitted
infections (STIs) from the city of Malmo in Sweden were included.
Genital tract isolates representing the new variant C. trachomatis (strain
Sweden2, serotype E) and three concurrently isolated strains (Sweden3,
serotype E; Sweden4 and Sweden5, both serotype F) were selected. To
represent plasmids from other chlamydial serotypes, plasmid sequences
were obtained from Genbank covering further trachoma and LGV strains
(Table 2).
Alignment of these 11 plasmid sequences showed that there are 83 SNP
locations, representing approximately 1.1% variation. Six of these
occur in intergenic locations. The SNPs and their effects on coding
sequences are shown in Figure 2. Each plasmid is unique and identifiable by the presence of at least one SNP and/or indel (Figure 1B–D).
Only two SNPs, at positions 5,328 and 7,458 (using pSW3 as the
reference sequence), allow differentiation of the chlamydial plasmids
into LGV, trachoma and genital tract groupings. Most of the SNPs are
located within CDSs and there is no significant clustering of SNPs
within the plasmid. Only one non-synonymous mutation occurs within CDS2,
which may suggest that this gene is under strong selection. Analysis of
the informative SNPs and indels allowed phylogenetic reconstruction of
the relationships between the plasmids (Figure 1).
The resulting phylogenetic tree shows that the chlamydial plasmids
segregate into tight groupings reflecting the phenotypes of their host
bacteria. The LGV plasmids are the most distantly related, with the STI
and ocular strain plasmids having apparently diverged at a later time.
Comparative phylogenetics of complete genomes and plasmids
A comparison of the phylogenies of the genomes and plasmids is given in Figure 1.
For the serotype A, B, L2 and L2b strains, for which both complete
genome and plasmid sequences exist, and for serotype D, for which there
are independent complete genome and plasmid sequences, the phylogenies
mirror each other. The complete genomes and plasmids from the LGV
strains cluster tightly, as do those from the ocular strains, and the
STI strains branch from these and also cluster together.
These data suggest that the chlamydial plasmids have not been freely
exchanged, but have remained closely linked to their cognate host
chromosome (Figure 1).
Analysis of the new variant plasmid
The plasmid with the most variation (pSW2) was found in strain
Sweden2. However, pSW2 still belongs to the genital tract lineage and
therefore has not appeared as the result of a transfer event. The complete
nucleotide sequence of pSW2 comprises 7,169 bp, 333 bp smaller than the
7,502 bp plasmid (pSW3) from strain Sweden3. Sweden3 is
hypothesised to be a potential progenitor of Sweden2 because
the two strains have identical sequences of the chromosomal ompA gene.
The difference in size between pSW2 and pSW3 is accounted for by a
deletion of 377 bp and a duplication of 44 bp at a different locus
(Figure 3).
pSW2 is the smallest chlamydial plasmid described, some 200 bp smaller
than the previously smallest known chlamydial plasmid, pCpnE1 (7,369
bp), which is from an equine strain of C. pneumoniae [29,18].
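The plasmid sizes quoted above can be reconciled with a short arithmetic check. The sketch below uses only the figures given in the text; the function name `expected_length` is a hypothetical helper introduced for illustration.

```python
# Plasmid lengths and sequence changes as quoted in the text (all in bp).
PSW3_LENGTH = 7502    # pSW3, from strain Sweden3
PSW2_LENGTH = 7169    # pSW2, from the new variant strain Sweden2
PCPNE1_LENGTH = 7369  # pCpnE1, from an equine strain of C. pneumoniae
DELETION = 377        # deletion in pSW2 relative to pSW3
DUPLICATION = 44      # tandem duplication in pSW2 relative to pSW3


def expected_length(parent_length, deleted, duplicated):
    """Length of a derived plasmid after one deletion and one duplication."""
    return parent_length - deleted + duplicated


predicted = expected_length(PSW3_LENGTH, DELETION, DUPLICATION)
print(predicted == PSW2_LENGTH)      # the two changes account for the full difference
print(PSW3_LENGTH - PSW2_LENGTH)     # net size difference between pSW3 and pSW2
print(PCPNE1_LENGTH - PSW2_LENGTH)   # size difference between pCpnE1 and pSW2
```

The net difference of 333 bp is simply the 377 bp deletion offset by the 44 bp duplication.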
The 377 bp deletion within pSW2 is situated within CDS1,
creating a frameshift which shortens the predicted protein from 305 to
178 amino acids and removing a primer binding site for the diagnostic
nucleic acid amplification tests (NAATs). This region of the plasmid
was originally selected as the target for several commercial diagnostic
NAATs, and the new variant strain became established because infection
by this strain went undetected and hence untreated [9]. Plasmid pL2 from strain L2/434/BU contains a single nucleotide deletion within CDS1 (position 910) [15,30]
leading to a truncated CDS1 protein (260 amino acids). pCpnE1 also has
a deletion within CDS1, but the location is different to that within
pSW2 and the effect is to create two small putative CDSs, which are
unlikely to be functional.
The observation of CDS1 as a region of the plasmid apparently prone
to inactivation and therefore potentially dispensable may be explained
by the possible functional redundancy between the proteins encoded by
CDS1 and CDS2. These proteins have similar sizes (305 and 332 amino
acids respectively), share 35% amino acid sequence identity, and both
match to the Pfam domain PF00589 (CDS1 e-value of 0.003, CDS2 e-value
of 4.9e-39), suggesting some functional equivalence [29].
A second difference between pSW2 and the other C. trachomatis plasmids
is a 44 bp perfect tandem duplication, located immediately upstream of
both CDS2 and CDS3, which are divergently transcribed. The
transcription start points (tsp) for CDS2 and CDS3 (encoding a
homologue of DnaB, a protein involved in forming the replication
complex) have been mapped previously [31]
and are both located within this 44 bp section. The duplication of the
tsp could potentially boost CDS2 expression. Both the deletion in CDS1
and the 44 bp duplication are unique to pSW2, and no intermediate
plasmid carrying either of the mutations separately has been
identified. This could indicate that these changes are related events,
and that potential up-regulation of CDS2 may compensate for the loss of
a functional product from CDS1.
Candidate plasmid regions for improved diagnostic targets
When all C. trachomatis plasmid CDSs from the eleven
complete nucleotide sequences were compared, CDS2 was found to be the
most highly conserved. Although there are eleven SNPs within the coding
sequence, only one results in an amino acid change (Figure 2),
suggestive of a functional requirement. This SNP (Met-Leu, position
1,147), present in pSW2 and pSW3, is at the extreme carboxy terminus of
the protein. A further constraint on variation within CDS2 is the
presence of two short RNA molecules (225 and 415 nucleotides), which
are complementary to the 3' terminus of the primary transcript encoding
CDS2 [32].
These two short 'antisense' transcripts are differentially expressed
during the developmental cycle. This level of sequence conservation,
possibly tied to an essential function, suggests that the region of the
plasmid encompassing CDS2 would be a good target for future screening.
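The distinction drawn above between SNPs that change the encoded amino acid and those that do not can be sketched as follows. The coding sequence is an invented toy example, not plasmid sequence, and `classify_snp` is a hypothetical helper; the codon table is the standard genetic code, whose codon assignments are assumed here to apply to the CDSs in question.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, position, new_base):
    """Classify a substitution at a 0-based position of an in-frame CDS.

    Returns (kind, old_amino_acid, new_amino_acid).
    """
    offset = position % 3
    start = position - offset
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa = CODON_TABLE[old_codon]
    new_aa = CODON_TABLE[new_codon]
    kind = "synonymous" if old_aa == new_aa else "non-synonymous"
    return kind, old_aa, new_aa


# Toy CDS encoding Met-Leu-Lys (hypothetical, for illustration only).
toy_cds = "ATGCTGAAA"
print(classify_snp(toy_cds, 0, "C"))  # ATG -> CTG
print(classify_snp(toy_cds, 5, "A"))  # CTG -> CTA
```

The first toy substitution replaces a Met codon with a Leu codon and is therefore non-synonymous, whereas the second changes only the third base of a Leu codon and leaves the amino acid unchanged.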
CDS6, CDS7 and CDS8 also show high levels of amino acid
conservation. CDS6 (unknown function) is the smallest plasmid encoded
protein and contains a single SNP. The proteins encoded by CDS7 and
CDS8 display homology to proteins involved in the process of plasmid
partitioning [29],
and have been shown to be active at cell division. These proteins may
play an important role for these relatively low copy number plasmids,
ensuring that each daughter cell acquires an equal number of plasmid
copies.
The protein predicted to be encoded by CDS5, previously designated
ORF 5 (pgp3), has the largest number of non-synonymous SNPs. There are
14 SNPs, evenly spread throughout the CDS, resulting in ten amino acid
changes (Figure 2).
SNP 5,112 differentiates LGV plasmids from the trachoma plasmids and
SNP 5,114 is unique to the blinding trachoma isolates (using pSW3 as
the reference sequence). The protein encoded by CDS5 (pgp3) has been
localised to the cell surface, and it has recently been suggested that the
CDS5 product can be secreted from inside the inclusion to the
cytoplasm of Chlamydia-infected cells [33].
Thus the higher number of non-synonymous changes in this CDS could
result from immune selection giving rise to more variation.
The area of the plasmid from the stop codon of CDS8 to the start
codon of CDS1 has the highest density of intergenic SNPs, as well as
apparent deletions, making the region the most susceptible to mutation
within the C. trachomatis plasmids. Thus the area around the
replication origin is the most variable and is a poor region in which
to design diagnostic PCR primers.
Analysis of plasmid copy number
The sequencing of pSW5 revealed that this plasmid carries one 22 bp
repeat fewer than the others at the putative origin of replication. To
test whether this affects plasmid copy number, DNA from several strains
was subjected to quantitative PCR. The results showed that, where loss
of the repeat sequence had occurred, plasmid copy number was not
adversely affected, with plasmid/genome (P/G) ratios in the range of
2–6 (Figure 4).
Interestingly, it appears that genital strains have a slightly lower
plasmid copy number than the others, and strain Jali20 has the highest
P/G ratio.
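A plasmid/genome ratio of this kind can be derived from quantitative PCR data in several ways, and the calculation used here is not detailed in this section. The following is a minimal sketch under stated assumptions: absolute quantification against a standard curve of the form Ct = slope × log10(copies) + intercept, one plasmid target and one single-copy chromosomal target, and entirely hypothetical Ct values and curve parameters. The function names are illustrative only.

```python
def copies_from_ct(ct, slope, intercept):
    """Convert a Ct value to template copies using a standard curve
    of the form Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def plasmid_genome_ratio(plasmid_ct, genome_ct, plasmid_curve, genome_curve):
    """Ratio of plasmid copies to chromosome copies in one DNA sample.

    Each curve is a (slope, intercept) pair from a dilution series.
    """
    plasmid_copies = copies_from_ct(plasmid_ct, *plasmid_curve)
    genome_copies = copies_from_ct(genome_ct, *genome_curve)
    return plasmid_copies / genome_copies


# Hypothetical standard curve (slope, intercept) shared by both assays,
# and hypothetical Ct values for a single sample.
curve = (-3.32, 38.0)
ratio = plasmid_genome_ratio(20.0, 22.0, curve, curve)
print(round(ratio, 2))
```

With a slope close to -3.32 (near-perfect amplification efficiency), a plasmid signal appearing two cycles earlier than the chromosomal signal corresponds to a ratio of roughly four, which would fall within the range reported above.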