Details on the availability of reagents can be found in the Supplementary Information. All analyses described here were performed on Version 2.0 of the genome sequence. Updates to the sequence and annotation are available at http://dictybase.org and http://www.genedb.org/genedb/dicty/index.jsp. Further details of analyses not explicitly described below can be found in the Supplementary Information.
A short-range (~100 kb), high-resolution (±8.54 kb) mapping panel was prepared as described9. Briefly, 96 aliquots, each containing ~0.52 haploid genome equivalents of sheared AX4 genomic DNA, were pre-amplified by PEP (primer extension pre-amplification89). A total of 4913 STS markers (Table SI 1) were typed by two-phase hemi-nested PCR (multiplexed for up to 1200 markers in the first phase) on aliquots of the diluted PEP products. Maps were assembled from good-quality data essentially as described previously8. A second, longer-range (~150 kb) mapping panel was used to confirm some linkages on chromosomes 2 and 5. HAPPY map analysis and PCR primer design for HAPPY mapping were performed using various custom programmes (PHD and ATB, unpublished).
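To illustrate why aliquots holding roughly half a haploid genome are informative, the following minimal sketch assumes that marker copies are Poisson-distributed among aliquots (an assumption made here for illustration, not stated in the text). It computes the expected fraction of aliquots scoring positive for a marker, and a simple concordance score for two markers. The function names and the toy typing vectors are hypothetical and are not the unpublished custom programmes mentioned above.

```python
import math

def expected_positive_fraction(genome_equivalents):
    """Probability that an aliquot holds at least one copy of a marker,
    assuming Poisson sampling of genome fragments (illustrative assumption)."""
    return 1.0 - math.exp(-genome_equivalents)

def cosegregation_fraction(typing_a, typing_b):
    """Fraction of aliquots in which two markers are both present or both absent.
    Closely linked markers tend to lie on the same fragment and so score alike."""
    if len(typing_a) != len(typing_b):
        raise ValueError("typing vectors must cover the same aliquots")
    concordant = sum(1 for a, b in zip(typing_a, typing_b) if a == b)
    return concordant / len(typing_a)

# About 0.52 haploid genome equivalents per aliquot, as in the panel described above.
p_positive = expected_positive_fraction(0.52)

# Hypothetical presence (1) / absence (0) calls for two markers across 8 aliquots.
marker_1 = [1, 0, 1, 1, 0, 0, 1, 0]
marker_2 = [1, 0, 1, 0, 0, 0, 1, 0]
concordance = cosegregation_fraction(marker_1, marker_2)
```

Under this assumption each marker is expected in roughly 40% of aliquots, so both presence and absence calls carry information about linkage.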
Genomic DNA from D. discoideum strain AX4 was prepared and separated by PFGE essentially as described27, 9, except that gels were run in stacked pairs; one member of each pair was stained with ethidium bromide, and bands were excised from its unstained counterpart by alignment.
WCS and YAC-subclone libraries
For WCS libraries, gel slices (above) were disrupted by several passages through a 30G syringe needle, digested with β-agarase (NEB) and phenol-extracted. DNA was concentrated by ethanol precipitation, sonicated, end-blunted using mung bean nuclease and size-fractionated on 0.8% low melting-point agarose gels. Fractions of 1.4–2 kb and 2–4 kb were excised, and the DNA was extracted as before and ligated into the SmaI site of pUC18 or pUC19. Clone propagation and template preparation followed standard protocols.
For YAC subclone libraries, AX4-derived YACs were identified (and their position and integrity confirmed) by screening the set described by Loomis et al.22 using markers from the HAPPY map. Subclones were prepared from PFG-purified YACs essentially as for the WCS libraries; contaminating yeast-derived sequences were filtered out in silico.
Sequencing and assembly
Details of the sequencing and assembly methods can be found in Supplementary Information. Generally, mapped sequence features were used to nucleate sequence contigs assembled from the WCS data, and extended using read-pair information and iterative searches for overlapping sequences, followed by directed gap-closure using a range of approaches.
Fluorescent in situ hybridisation
Fluorescent in situ hybridization was performed as described in reference 17.
Gene prediction and identification of sequence features
Full details are provided in the Supplementary Information. Briefly, automated gene prediction was performed using a combination of programmes that had been trained on well-characterized D. discoideum genes, and the results were integrated with reference to D. discoideum cDNA sequences and homology to genes in other species. Other features in the predicted proteins, and other sequence features, were identified using a variety of software packages.
Analysis of functional gene clustering
Microarray targets54, 90, 91 (and N. Van Driessche & G. Shaulsky, unpublished) and gene models were mapped onto the genome sequence using BLAST92 and the modified LIS algorithm93. To look for clustering of genes with correlated temporal expression profiles, pairwise correlation coefficients were calculated for genes with known expression profiles on each chromosome91. Blocks of ≥6 consecutive genes were sought for which either (a) all pairwise correlation coefficients were positive and ≥70% were >0.2 (genes with similar developmental trajectories) or (b) each gene had a partner with an absolute correlation coefficient of >0.6 (tightly co-regulated genes); no statistically significant clusters met these criteria.
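The two block criteria can be sketched as a sliding-window scan over a gene-by-gene correlation matrix ordered by chromosomal position. This is an illustrative sketch only: the function names, the fixed window of exactly six genes (longer blocks would need larger windows) and the toy matrix are assumptions, and no test of statistical significance is attempted here.

```python
from itertools import combinations

def block_meets_criteria(corr, genes, min_frac=0.7, weak=0.2, strong=0.6):
    """Return (similar, coregulated) for one block of gene indices.

    similar:     all pairwise coefficients positive and >= min_frac of them > weak.
    coregulated: every gene has at least one partner with |r| > strong.
    """
    pairs = [corr[i][j] for i, j in combinations(genes, 2)]
    similar = (all(r > 0 for r in pairs)
               and sum(r > weak for r in pairs) >= min_frac * len(pairs))
    coregulated = all(
        any(abs(corr[i][j]) > strong for j in genes if j != i) for i in genes
    )
    return similar, coregulated

def find_blocks(corr, block_size=6):
    """Scan consecutive genes and report windows meeting either criterion."""
    hits = []
    for start in range(len(corr) - block_size + 1):
        genes = list(range(start, start + block_size))
        similar, coregulated = block_meets_criteria(corr, genes)
        if similar or coregulated:
            hits.append((start, similar, coregulated))
    return hits

# Hypothetical matrix: genes 0-5 are mutually correlated at r = 0.5, genes 6-7 are not.
n_genes = 8
toy_corr = [
    [1.0 if i == j else (0.5 if i < 6 and j < 6 else 0.0) for j in range(n_genes)]
    for i in range(n_genes)
]
toy_hits = find_blocks(toy_corr)
```

In this toy case only the window starting at gene 0 satisfies criterion (a), and none satisfies criterion (b), since no coefficient exceeds 0.6 in absolute value.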
To look for clustering of genes associated with specific developmental stages94, 95 or cell types90, 96, the genome was scanned with various sized windows97 for regions with significant (p
Analysis of duplicated genes
Predicted protein sequences were clustered using TribeMCL98, with a BLAST-P expectation value of 10^−40 as the cutoff. A χ-squared test rejected the hypothesis that members of a family are randomly distributed in the genome. Within each family, protein divergences (similarity distances computed using the ‘Protdist’ module of PHYLIP; http://evolution.genetics.washington.edu/phylip.html) and physical intergenic distances between all pairs of family members were tabulated, and the correlation coefficient between the two sets of values was calculated. Analysis was performed on the 86 gene families (representing 155 gene pairs) with at least 10 intrachromosomal distance pairings to provide robust statistical confidence.
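As a minimal sketch of the per-family calculation, the code below correlates protein divergence with physical distance across gene pairs. The text does not say which correlation coefficient was used; Pearson's r is assumed here, and the function name and the numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical gene pairs from one family: protein divergence (e.g. a Protdist
# distance) and intergenic distance in base pairs for each pair.
divergences = [0.1, 0.2, 0.3, 0.4]
distances_bp = [1000, 2000, 3000, 4000]
r_family = pearson(divergences, distances_bp)
```

A positive coefficient for a family would indicate that more divergent members tend to lie farther apart on the chromosome.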
Other sequence analysis and graphical representation
Other sequence analyses (nucleotide and dinucleotide composition; identification of simple-sequence repeats in nucleotide and protein sequence; coding density computation; tRNA cluster identification) were performed using a range of custom software (PHD and ATB, unpublished). Graphical representation of chromosomes in Fig. 2 was done primarily using Cinema4D-8.5 (Maxon Computer GmbH) after pre-processing using custom software (PHD).
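For illustration only, nucleotide and dinucleotide composition of the kind listed above can be computed as in the following sketch. It is not the unpublished custom software; the function names are hypothetical, dinucleotides are counted in overlapping windows, and positions containing characters other than A, C, G or T are skipped (both choices are assumptions).

```python
from collections import Counter

BASES = "ACGT"

def nucleotide_composition(seq):
    """Fraction of each unambiguous base in the sequence."""
    counts = Counter(base for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in BASES}

def dinucleotide_composition(seq):
    """Fraction of each overlapping dinucleotide made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dinuc: count / total for dinuc, count in counts.items()}

# Hypothetical A+T-only fragment used purely as a toy input.
mono = nucleotide_composition("AATT")
di = dinucleotide_composition("AATT")
```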