Microbial pathogens show a surprising capacity for adaptation to new hosts, antibiotics, and immune systems. Three principal mechanisms are regarded as important in this adaptive potential: Darwinian (positive) selection, which favors the fixation of advantageous mutations; acquisition of new genetic material by lateral DNA exchange (that is, recombination); and gene regulation. Several studies have suggested that recombination might be the key factor in the adaptation of pathogens and that the recombination rates of bacteria might be higher than their mutation rates [1-4]. At the same time, there is a portion of the genome, the core-genome, that is thought to be representative of bacterial taxa at various taxonomic levels. Recent molecular evolution analyses of Escherichia coli and Salmonella enterica [6,7] have identified genes under positive selection pressure in the core-genome of these enteric bacteria. Genome sequence data are now available for numerous species in several bacterial genera, making it possible to use comparative evolutionary genomic approaches to assess positive selection pressure and the role of horizontal gene transfer in the evolution of the core-genome of a bacterial genus.
One such important bacterial genus is Streptococcus, which includes some of the most important human and agricultural pathogens. These cause a wide range of diseases, inflict significant morbidity and mortality throughout the world, and impose a significant economic burden. Twenty-six Streptococcus genomes are available in public databases, belonging to six different species: S. pneumoniae, S. agalactiae, S. pyogenes, S. thermophilus, S. mutans and S. suis. S. pyogenes (Group A Streptococcus; GAS) is responsible for a wide range of human diseases, including pharyngitis, impetigo, puerperal sepsis, necrotizing fasciitis ('flesh-eating disease'), scarlet fever, and the postinfection sequelae glomerulonephritis and rheumatic fever. In addition, S. pyogenes has recently been associated with Tourette's syndrome and movement and attention deficit disorders. A resurgence of S. pyogenes infections has been observed since the mid-1980s. S. agalactiae is another important human pathogen and is the leading cause of bacterial sepsis, pneumonia, and meningitis in US and European neonates. Although S. agalactiae normally behaves as a commensal organism that colonizes the genital or gastrointestinal tract of healthy adults, it can cause life-threatening invasive infection in susceptible hosts, such as newborns, pregnant women, and nonpregnant adults with chronic illnesses. S. agalactiae was first recognized as a pathogen in bovine mastitis. S. pneumoniae is the leading cause of human bacterial infection worldwide, although, paradoxically, it is primarily carried asymptomatically. It has been an object of medical study and scrutiny for over a century. S. mutans is implicated as the principal causative agent of human dental caries (tooth decay). S. thermophilus is a non-pathogenic food microorganism widely used in the dairy product industry. S. suis is responsible for a variety of diseases in pigs, including meningitis, septicemia, arthritis, and pneumonia. It is also a zoonotic pathogen that causes occasional cases of meningitis and sepsis in humans, and it has recently been implicated in outbreaks of streptococcal toxic shock syndrome.
A recent comparative genomic analysis of five of the above-mentioned streptococcal species (S. suis was not included) focused on understanding the role of lateral gene transfer in shaping the genomes of each of these lineages, and analyzed some of the species-specific genes for potential adaptive evolution. Species- or strain-specific loci are often the focus of attempts to understand adaptive differences in bacteria. However, with the exception of the Chen et al. study on E. coli, adaptive evolution in the core-genome components of bacterial species has not been thoroughly explored. In addition to individual genome sequences for several species of Streptococcus, complete genome sequences are also available for multiple strains of S. agalactiae, S. pyogenes, and S. thermophilus. Genome-wide molecular selection analyses designed to assess selection pressure across the entire core-genome of different species and strains of Streptococcus have not been reported, nor have published reports attempted to address the relative roles of selection and recombination in the diversification of the core-genome of Streptococcus.
Along with the rapid growth in microbial genome sequence data, there has been a concomitant development of sophisticated methods for detecting positive selection in protein-coding genes. These methods can be used to compare orthologous DNA sequence data across the entire genomes of the available species within the genus Streptococcus. Ziheng Yang, Rasmus Nielsen and colleagues [17-21] have developed powerful statistical methods for detecting adaptive molecular evolution. Their methods compare synonymous and nonsynonymous substitution rates in protein-coding genes and regard a nonsynonymous rate elevated above the synonymous rate as evidence for positive, or Darwinian, selection. Positive selection leads to the fixation of advantageous mutations and is the fundamental process behind adaptive changes in genes and genomes, leading to evolutionary innovations and species differences. In a significant advance over many earlier methods, which averaged over sites and time, their methods are designed to detect positive selection at individual sites and lineages. Our study employs these selection methods to assess positive selection pressure across the core-genome components of the genus Streptococcus, as well as of several individual species of Streptococcus, while concomitantly assessing levels of recombination within the core-genome.
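To make the comparison of synonymous and nonsynonymous rates concrete, the following is a minimal sketch of a pairwise calculation in the spirit of simple counting methods. It is not the likelihood-based site and lineage approach of Yang, Nielsen and colleagues used in this study. The function names (`synonymous_sites`, `pn_ps`) and the toy sequences are illustrative assumptions, codons that differ at more than one position are simply skipped, and no correction for multiple substitutions is applied.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3).

    Each of the 9 possible single-base changes contributes 1/3 of a site
    if it leaves the encoded amino acid unchanged.
    """
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1 / 3
    return s


def pn_ps(seq_a, seq_b):
    """Return (pN, pS): nonsynonymous and synonymous differences per site.

    Assumes two aligned, in-frame, gap-free coding sequences (hypothetical
    input format). Codons with ambiguous bases or more than one difference
    are skipped, which is a deliberate simplification.
    """
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        ndiff = sum(x != y for x, y in zip(ca, cb))
        if ndiff > 1:
            continue
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if ndiff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_s = syn_diffs / syn_sites if syn_sites else float("nan")
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else float("nan")
    return p_n, p_s


# Toy example: one synonymous change (GCT -> GCC) and one
# nonsynonymous change (AAA -> AGA, Lys -> Arg).
p_n, p_s = pn_ps("ATGGCTAAA", "ATGGCCAGA")
ratio = p_n / p_s
```

A ratio above 1 would be read as a nonsynonymous rate elevated above the synonymous rate. Averaging over a whole gene in this way is exactly the limitation of earlier methods noted above, since a few positively selected sites can be masked by many conserved ones.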
Concomitant with the identification of bacterial core-genomes, it has become evident that there is an apparently dispensable portion of bacterial genomes, consisting of partially shared and strain-specific genes, that can represent a surprisingly large proportion of the genome even within a particular species. The concept of dispensable portions of genomes implies that genes have been lost and gained since separation from common ancestors, which in turn implies that this loss and gain can be estimated from reconstructed ancestral genome composition. This sort of approach has been undertaken previously, including for a few species of Streptococcus, with one of the resulting conclusions being that gene gain tends to be much greater than gene loss. An additional purpose of this paper is to compare gene gain and loss within and between Streptococcus species, making use of the larger comparative data set of species and strains now available, and to compare that history with the histories of positive selection and recombination in the core-genome.
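The core and dispensable partitions, and the idea of reading gain and loss off a reconstructed ancestor, can be sketched with simple set operations. This is only an illustration under stated assumptions: the strain names, ortholog cluster labels, and the ancestral gene set below are hypothetical, and the ancestral composition is taken as given rather than inferred, which in practice is the difficult step.

```python
def partition_pan_genome(genomes):
    """Split gene content into core (present in every genome) and
    dispensable (partially shared or strain-specific) portions.

    `genomes` maps a genome name to a set of ortholog cluster IDs
    (hypothetical input format).
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


def gain_and_loss(ancestor, descendant):
    """Genes gained and lost along one branch, given the gene content
    of a (reconstructed) ancestor and of its descendant."""
    return descendant - ancestor, ancestor - descendant


# Toy data: three hypothetical strains and made-up cluster labels.
strains = {
    "strain_A": {"c1", "c2", "c3", "c4"},
    "strain_B": {"c1", "c2", "c3", "c5", "c6"},
    "strain_C": {"c1", "c2", "c7"},
}
core, dispensable = partition_pan_genome(strains)

# Assumed ancestral content, for illustration only.
ancestor = {"c1", "c2", "c3"}
gained, lost = gain_and_loss(ancestor, strains["strain_C"])
```

In this toy case the core is small relative to the dispensable set, and the branch leading to `strain_C` shows one gain and one loss.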