Investigation into the genetic relationships between dog breeds is an area of explosive recent growth that holds great promise. Initial studies of breed relationships were highly focused. In a study of five Finnish breeds, Koskinen et al. reported that the phylogenetic distances between breeds were greater than those typically seen between human populations [60]. In addition, individual dogs from these five breeds could be correctly assigned to their breed of origin by analyzing allele patterns associated with small numbers of microsatellites [61]. Irion et al. went a step further using 100 microsatellites and 28 breeds [62]. Their analysis methods, based largely on neighbor-joining trees, revealed the important fact that little higher order structure could be found to describe the relationships between most breeds.
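The idea of assigning a dog to its breed of origin from allele patterns can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the procedure used in [61] or [62]: a dog's multilocus genotype is scored against allele frequencies estimated from each reference breed, assuming Hardy-Weinberg proportions and independent loci, and the breed with the highest log-likelihood is chosen. The function names, the pseudocount, and all genotypes are hypothetical.

```python
import math
from collections import Counter


def collect_alleles(query, reference):
    """Return, per locus, the set of alleles seen in the query or any reference dog."""
    alleles = [set(genotype) for genotype in query]
    for dogs in reference.values():
        for dog in dogs:
            for locus, genotype in enumerate(dog):
                alleles[locus].update(genotype)
    return alleles


def log_likelihood(query, dogs, alleles, pseudo=0.5):
    """Log-likelihood of the query genotype given one breed's reference dogs."""
    total = 0.0
    for locus, (a, b) in enumerate(query):
        counts = Counter()
        for dog in dogs:
            counts.update(dog[locus])  # each dog contributes two allele copies
        # Pseudocounts (an assumption) keep unseen alleles from scoring zero.
        denom = sum(counts.values()) + pseudo * len(alleles[locus])
        p_a = (counts[a] + pseudo) / denom
        p_b = (counts[b] + pseudo) / denom
        # Hardy-Weinberg genotype probability: p^2 or 2pq.
        total += math.log(p_a * p_a if a == b else 2.0 * p_a * p_b)
    return total


def assign_breed(query, reference, pseudo=0.5):
    """Return the best-scoring breed and the per-breed log-likelihoods."""
    alleles = collect_alleles(query, reference)
    scores = {
        breed: log_likelihood(query, dogs, alleles, pseudo)
        for breed, dogs in reference.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical microsatellite genotypes: one (allele, allele) pair per locus,
# with alleles written as fragment lengths.
reference = {
    "breed_A": [((120, 120), (200, 204)), ((120, 124), (200, 200)), ((120, 120), (200, 200))],
    "breed_B": [((128, 128), (208, 208)), ((124, 128), (208, 212)), ((128, 128), (208, 208))],
}
query = ((120, 124), (200, 200))

best_breed, scores = assign_breed(query, reference)
print(best_breed, scores)
```

In practice the query dog would be left out of its own breed's reference set before scoring, so that the test is not biased toward a correct assignment.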
A subsequent and larger study from our own group, undertaken in 2004, genotyped five unrelated dogs from each of 85 breeds with 96 microsatellite markers [63]. Assignment tests using the computer program Doh demonstrated that dogs could be correctly assigned to their breed of origin 99% of the time [63]. A large share of the variation observed in the dog lies in the differences that separate breeds. In fact, 27% of the genetic variation in the dog is found between breeds, whereas only 5%–10% of all human variation is found between populations and races [63]. To determine how best to harness the power of canine population structure for mapping, studies of linkage disequilibrium (LD) in the dog have been undertaken [1,64–66]. Hyun et al. found that LD extended for 33 centimorgans around the copper toxicosis disease locus in Australian Bedlington terriers. The authors suggest, however, that because the study focused on a single disease locus already identified by linkage studies, the conclusions probably cannot be readily generalized to the rest of the genome. The study of Lou et al., based on analysis of a 10 centimorgan microsatellite scan in a single crossbred pedigree, identified LD spanning five to ten centimorgans. A particular strength of the paper is its clear description of the nuances of analyzing LD using multiallelic markers. The authors suggest, however, that because only a single pedigree was analyzed, much larger studies involving more dogs and more breeds are needed to develop a clear picture of LD in the dog.
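The 27% figure quoted above is a statement about how genetic variation is partitioned within and between breeds. As a rough illustration only, and not necessarily the estimator used in [63], the sketch below computes a heterozygosity-based between-group fraction, (H_T - H_S) / H_T, at a single locus. Here H_S is the mean expected heterozygosity within breeds and H_T is the expected heterozygosity computed from breed-averaged allele frequencies. Function names and allele data are hypothetical.

```python
from collections import Counter


def allele_freqs(alleles):
    """Allele frequencies from a flat list of allele copies at one locus."""
    n = len(alleles)
    return {allele: count / n for allele, count in Counter(alleles).items()}


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def between_group_fraction(groups):
    """Fraction of diversity at one locus attributable to differences between groups.

    groups maps a breed name to a list of allele copies sampled from that breed.
    Breeds are weighted equally, which is an assumption of this sketch.
    """
    freqs = [allele_freqs(alleles) for alleles in groups.values()]
    k = len(freqs)
    h_s = sum(heterozygosity(f) for f in freqs) / k
    all_alleles = set().union(*freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in freqs) / k for a in all_alleles}
    h_t = heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere; nothing to partition
    return (h_t - h_s) / h_t


# Hypothetical allele copies at one locus in two breeds.
example = {"breed_A": [1, 1, 1, 2], "breed_B": [1, 2, 2, 2]}
print(between_group_fraction(example))
```

A genome-wide figure would combine many loci, and a published estimate may also correct for sample size, which this sketch does not attempt.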
There are over 150 breeds recognized in the United States by the American Kennel Club. The top ten most popular breeds account for more than half of all registrations, while more than 100 of the more uncommon breeds account for less than 15% of the total (Figure 2). This range in population sizes is representative of a variety of breed histories. The LD studies of Sutter et al. and Lindblad-Toh, Wade, and collaborators were designed to be more widely applicable to the general population of purebred dogs [1,66]. Sutter and colleagues used 189 single nucleotide polymorphisms (SNPs) to examine 20 unrelated dogs from each of five breeds at five loci. They found a 10-fold difference in extent of LD in breeds that range from rare to popular, and whose population histories feature a range of popular sire and bottleneck effects [66]. These results were corroborated and extended in a much larger study by Lindblad-Toh et al. Using ten breeds and nearly 1,300 SNPs, the investigators were able to dissect the underlying haplotype structure of the dog genome in addition to measuring the extent of LD [1]. Both studies conclude that breed choice will have a profound effect on the number of markers required to complete whole genome association studies, and care should be taken when selecting breeds for the initial mapping stage. In addition, because of breed architecture, considerably fewer SNPs will be needed for mapping traits in dogs than in humans [1,66].
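To make the link between extent of LD and marker number concrete, the following sketch computes the common pairwise LD statistic r² for two biallelic SNPs from phased haplotypes, and then applies a deliberately crude rule of one marker per LD-sized interval. Neither function is taken from [1] or [66]; the genome length of roughly 2.4 gigabases and the two LD extents are placeholder values chosen only to show how a 10-fold difference in LD propagates to marker count.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    haplotypes is a list of (snp1, snp2) pairs with alleles coded 0 or 1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return 0.0  # at least one SNP is monomorphic in this sample
    return d * d / denom


def markers_for_scan(genome_bp, ld_extent_bp):
    """Crude marker count: one marker per LD-sized interval, rounded up."""
    return -(-genome_bp // ld_extent_bp)


# Hypothetical phased haplotypes at two SNPs.
print(r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))  # alleles always co-occur
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))  # alleles are independent

# Placeholder values: approximate genome length and two illustrative LD extents.
GENOME_BP = 2_400_000_000
for label, extent in [("long LD", 1_000_000), ("short LD", 100_000)]:
    print(label, markers_for_scan(GENOME_BP, extent))
```

Real study designs also depend on haplotype structure, marker informativeness, and sample size, none of which this rule captures.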
Figure 2. Nearly a million dogs are registered with the American Kennel Club each year. Though the total includes dogs from 154 breeds, most registrations represent a limited number of very popular breeds. The most popular breed, Labrador retriever, accounts for 15.3% of yearly registrations. This is greater than the 118 least popular breeds combined. Each breed on the chart is represented by a colored block. The height on the y-axis indicates the number of dogs registered in 2004. The blocks are divided into six stacks indicating the percent of overall registrations acquired by that breed, as listed on the x-axis. Above each column is the percent of total registrations for all breeds in that category. Registration statistics can be found at http://www.akc.org/reg/dogreg_stats.cfm.
One additional way to improve power for fine mapping is to combine data across breeds. To determine the ancestral relationship between breeds, Parker et al. used the same dataset as described previously to perform an unsupervised clustering analysis with the computer program Structure [63]. The 85 breeds were ordered into four clusters, generating a new canine classification system for dog breeds based on similar patterns of alleles, presumably from a shared ancestral pool (Figure 3) [4]. Cluster one comprised dogs of Asian and African origin, as well as gray wolves. Cluster two was made up of mastiff-type dogs, largely sharing a common theme of big, boxy heads and strong, sturdy bodies. The third and fourth clusters split a group of herding dogs and sight hounds away from the general population of modern hunting dogs including terriers, hounds, and gun dogs.
Figure 3. The dataset includes five unrelated dogs from each of 85 breeds, genotyped using 96 (CA)n repeat-based microsatellites that span the dog genome at an average spacing of 30 megabases. Clusters were obtained using the computer program Structure [69], which implements a Bayesian model-based clustering algorithm that attempts to identify genetically distinct subpopulations based on patterns of allele frequencies. The work is described in detail in [63]. The four distinct clusters described by Parker et al. are depicted as colored circles: cluster one is yellow, cluster two is blue, cluster three is green, and cluster four is red. Breeds associated with each cluster are listed within the appropriate circle, and examples of breeds are shown in the pictures. Some breeds show patterning similar to more than one cluster and are listed in the overlapping space. Analysis is ongoing to expand the number of breeds in the dataset and to refine the clusters.
The Parker clusters offered the first look at relationships between breeds and, in doing so, suggested study designs for trait mapping. For example, Modiano and colleagues have sought to determine the origin of B and T cell lymphomas in dogs [67]. They found that while B cell lymphomas are most common overall, rates of T cell lymphoma are significantly higher in breeds from Parker cluster one, the Asian cluster, than in any other group. This suggests an ancestral cause of T cell lymphoma in Asian dogs, while arguing against a single ancestor for B cell lymphoma in any other group. The optimal mapping study for T cell lymphomas would, therefore, focus on dogs from the Asian group. Also of interest is the work of Neff and colleagues, who describe a single haplotype surrounding the multidrug resistance gene MDR1 in nine breeds [68]. The nine breeds represent a range of herding dogs and sight hounds that presumably share a single common ancestor, a finding that again suggests a strategy for mapping studies involving this set of breeds.
While understanding the relationships between breeds will simplify the task of mapping multigenic diseases, moving from locus to gene remains daunting [66]. Both Sutter et al. and Lindblad-Toh, Wade, and collaborators have undertaken studies to determine how haplotype analysis can facilitate such efforts [1,66]. Using their respective datasets, both studies demonstrate high haplotype sharing between breeds and low haplotype diversity within breeds. Thus, disease alleles will be most easily identified by comparing haplotypes that are identical by descent in affected dogs from two or more breeds. Data from additional breeds can then be used for fine resolution mapping. The recent availability of 2.1 million SNPs (http://www.broad.mit.edu/mammals/dog/snp/) from the canine genome sequencing project will greatly enhance such studies [1].
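The logic of narrowing a locus by comparing breeds can be sketched very simply. Assuming the disease allele is identical by descent across breeds, and that the associated haplotype has been delimited as a single interval in each breed, the candidate region is the overlap of those intervals, and each added breed can only shrink it. The function name and coordinates below are hypothetical and are not drawn from [1] or [66].

```python
def shared_interval(intervals):
    """Overlap of per-breed associated intervals, or None if they do not overlap.

    intervals is a list of (start, end) positions on one chromosome, one per breed.
    """
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


# Hypothetical associated haplotype intervals (in base pairs) in three breeds.
per_breed = {
    "breed_A": (10_000_000, 14_000_000),
    "breed_B": (11_500_000, 16_000_000),
    "breed_C": (9_000_000, 12_200_000),
}

two_breeds = shared_interval([per_breed["breed_A"], per_breed["breed_B"]])
three_breeds = shared_interval(list(per_breed.values()))
print(two_breeds, three_breeds)
```

A result of None would argue against a single shared ancestral haplotype across the breeds compared, at least under the interval boundaries supplied.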