BAC fingerprint-based assembly of the physical map
A BAC library of Vitis vinifera 'Cabernet Sauvignon' was
used for the construction of the physical map. It contained 44,544
clones with a mean insert size of 142 kbp, representing about 12.3
genome equivalents [35,38].
Both ends of the 44,544 BAC clones were sequenced, and 77,237 BAC end
sequences (BES) with a mean length of 671 bp were retained after
quality check [38]. These sequences randomly covered approximately
0.1× of the grape genome. This library has been adopted as one of the
reference genomic resources by the International Grape Genome Program
network [55], and BAC clones are freely available upon request [56].
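As a quick consistency check, the library figures above can be related to one another. The sketch below uses only numbers quoted in the text; rather than assuming a grape genome size, it derives the size implied by the reported 12.3 genome equivalents and then checks the BES coverage against it.

```python
# All inputs are taken from the text above.
N_CLONES = 44_544          # BAC clones in the library
MEAN_INSERT_BP = 142_000   # mean insert size (142 kbp)
GENOME_EQUIVALENTS = 12.3  # reported library coverage
N_BES = 77_237             # BES retained after quality check
MEAN_BES_BP = 671          # mean BES length

# Total cloned DNA divided by the reported coverage gives the genome size it implies.
total_insert_bp = N_CLONES * MEAN_INSERT_BP
implied_genome_bp = total_insert_bp / GENOME_EQUIVALENTS

# Fraction of that genome sampled by the retained BES.
bes_bp = N_BES * MEAN_BES_BP
bes_coverage = bes_bp / implied_genome_bp

# Share of the 2 x 44,544 possible ends that passed quality check.
pass_rate = N_BES / (2 * N_CLONES)

print(f"implied genome size: {implied_genome_bp / 1e6:.0f} Mbp")  # 514 Mbp
print(f"BES coverage: {bes_coverage:.2f}x")                        # 0.10x
print(f"BES pass rate: {pass_rate:.1%}")                           # 86.7%
```

The roughly 0.1× BES coverage quoted in the text is therefore consistent with the insert-based coverage estimate.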
BAC clones were fingerprinted following the protocol published by [13], as adapted to grape and preliminarily reported in [11]. Briefly, DNA was isolated from each BAC clone and digested with four rare-cutter endonucleases (EcoRI, BamHI, NdeI, and XbaI) and one frequent-cutter endonuclease (HaeIII).
Fragments were labelled with the SNaPshot labelling kit (Applied
Biosystems, Foster City, CA), purified using genCLEAN plates (Genetix,
St James, NY, USA) and re-suspended in formamide. Fragments were
separated by capillary electrophoresis on an ABI3730 automated
sequencer (Applied Biosystems, Foster City, CA) and sized using the
Genescan LIZ-500 internal size standard. The electropherograms were
analyzed using GeneMapper 3.5 (Applied Biosystems, Foster City, CA). An
output text file containing the area, height, and size of each peak
was generated and processed with a custom Perl script. Peaks
corresponding to background or vector, as well as peaks shorter than
75 bp or longer than 500 bp, were removed. Intra-plate contaminations
and possible chimaeras were identified and discarded through three
checks. First, a set of suspect clones was identified with the
GenoProfiler package [57].
Second, clones yielding more than 210, 230, or 250 peaks for an
average insert size of 120, 140, or 160 kbp, respectively, were
removed. Third, the BES of neighbouring clones were compared. If two or
more neighbouring clones showed high identity for both BES (> 95%
identity over > 95% of the length of the shorter BES in the pairwise
alignment), the BAC clones harbouring the redundant BES were removed
using a custom Perl script. When only one BES was available for a given
BAC clone and this BES was highly similar to a BES of a neighbouring
clone, the physical map was examined to check whether the two clones
were buried. If so, the clone with only one available BES was removed.
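The peak trimming and the three contamination/chimaera checks above can be sketched as simple filters. This is a minimal Python illustration, not the authors' Perl script. The `Peak` record, the hypothetical list of background/vector peak sizes and its matching tolerance, and the rule of comparing a clone with the nearest listed insert-size class are assumptions made for the example. The numeric thresholds are the ones given in the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_SIZE_BP, MAX_SIZE_BP = 75, 500
MAX_PEAKS_BY_INSERT_KBP = {120: 210, 140: 230, 160: 250}


@dataclass
class Peak:
    size_bp: float
    height: float
    area: float


def trim_peaks(peaks, excluded_sizes, tolerance_bp=0.5):
    """Keep peaks within 75-500 bp that do not match a background/vector size.

    excluded_sizes and tolerance_bp are hypothetical inputs for this sketch.
    """
    kept = []
    for p in peaks:
        if not MIN_SIZE_BP <= p.size_bp <= MAX_SIZE_BP:
            continue
        if any(abs(p.size_bp - s) <= tolerance_bp for s in excluded_sizes):
            continue
        kept.append(p)
    return kept


def too_many_peaks(n_peaks, insert_kbp):
    """Second check: flag clones with more peaks than expected for their insert size.

    Assumption: the clone is compared with the nearest listed insert-size class.
    """
    nearest = min(MAX_PEAKS_BY_INSERT_KBP, key=lambda k: abs(k - insert_kbp))
    return n_peaks > MAX_PEAKS_BY_INSERT_KBP[nearest]


def redundant_bes(identity_pct, aln_len, len_a, len_b):
    """Third check: > 95% identity over > 95% of the length of the shorter BES."""
    return identity_pct > 95 and aln_len > 0.95 * min(len_a, len_b)


def redundant_clone_pair(fwd_hit, rev_hit):
    """Two neighbouring clones are redundant only when BOTH BES pairs are.

    Each hit is a tuple (identity_pct, aln_len, len_a, len_b).
    """
    return redundant_bes(*fwd_hit) and redundant_bes(*rev_hit)
```

For example, `too_many_peaks(235, 140)` flags a clone whose insert is about 140 kbp, since 235 exceeds the 230-peak limit for that class.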
Trimmed data were assembled using FPC 8.2 [58]. The tolerance was set
at 0.4, and the first automatic assembly was performed under highly
stringent conditions using a Sulston score cut-off of 1e-40. Contigs
containing more than 10% Q clones were split using the DQing option,
with three rounds of analysis at progressively more stringent cut-offs
of 1e-45, 1e-50, and 1e-55. The contigs were then end-merged at a
cut-off of 1e-35. Subsequently, the assembly followed an iterative
strategy [50], alternating end-merging and DQing steps and using
progressively less stringent end-merging cut-offs of 1e-30, 1e-25, and
1e-20. After the last merge, singletons were inserted into the contigs
using a cut-off of 1e-30.
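FPC decides whether two fingerprints overlap using the Sulston score, which is the probability that the bands they share would match by chance. A smaller cut-off is therefore more stringent. The sketch below is our Python reading of the binomial formulation described in the FPC documentation, not FPC code. `gel_len` (the range of possible band positions) is a hypothetical parameter whose units must match the tolerance. The cut-off constants and the 10% Q-clone rule encode the values from the text.

```python
from math import comb


def sulston_score(m, n_low, n_high, tolerance, gel_len):
    """Chance probability that >= m of the n_low bands of the clone with fewer
    bands match a band of the clone with n_high bands (binomial tail).

    tolerance and gel_len must be expressed in the same (hypothetical) units.
    """
    # Probability that one band falls outside every +/- tolerance window of the other clone.
    p_a = (1 - 2 * tolerance / gel_len) ** n_high
    p_b = 1 - p_a  # probability that one band matches by chance
    return sum(
        comb(n_low, j) * p_b**j * p_a ** (n_low - j) for j in range(m, n_low + 1)
    )


# Cut-off schedule described in the text (smaller score = more stringent).
INITIAL_BUILD = 1e-40
DQ_ROUNDS = (1e-45, 1e-50, 1e-55)
FIRST_END_MERGE = 1e-35
ITERATIVE_END_MERGES = (1e-30, 1e-25, 1e-20)
SINGLETON_INSERTION = 1e-30


def overlaps(score, cutoff):
    """Two clones are treated as overlapping when their score is below the cut-off."""
    return score < cutoff


def needs_dq(n_q_clones, n_clones, max_q_fraction=0.10):
    """Contigs with more than 10% Q (questionable) clones are split by DQing."""
    return n_q_clones / n_clones > max_q_fraction
```

Sharing more bands gives a smaller score, so `sulston_score(45, 50, 60, 0.4, 425)` is smaller than `sulston_score(30, 50, 60, 0.4, 425)`. The arguments are illustrative only.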
The BAC assembly was validated using the molecular markers placed on
each contig, and edited manually in three steps. First, each contig
that included BAC clones with unlinked genetic markers was tentatively
broken by re-assembling it at a more stringent cut-off. Second, if the
contig remained putatively chimeric at the most stringent cut-off of
1e-30, the primer pairs of the markers were aligned with BLAST against
the 8.4× reference genome assembly of V. vinifera 'PN40024' [1]
to assess the number of possible annealing sites and to confirm the
chromosomal location. When the discrepancy between the genetic and the
physical map was confirmed for single-copy markers, the corresponding
physical contig was considered chimeric. Finally, contigs containing
BAC clones associated with genetically linked markers were tested for
merging at a less stringent cut-off of 1e-15.
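The three manual validation steps can be summarised as a decision procedure. The sketch below is a hypothetical illustration, not the authors' procedure. It makes several assumptions: markers count as "unlinked" when they map to different linkage groups, "genetically linked" contigs are approximated as contigs anchored on the same linkage group, and the returned labels are invented for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ContigMarker:
    name: str
    linkage_group: str
    n_primer_sites: int  # annealing sites of the primer pair in the PN40024 assembly


def validate_contig(markers, chimeric_at_most_stringent):
    """Return a hypothetical label summarising the validation of one contig.

    Simplifying assumption: markers are "unlinked" when their linkage groups differ.
    """
    if not markers:
        return "unanchored"
    if len({m.linkage_group for m in markers}) == 1:
        return "consistent"
    # Step 1: re-assembly at a more stringent cut-off resolved the conflict.
    if not chimeric_at_most_stringent:
        return "split_by_reassembly"
    # Step 2: the conflict counts only when single-copy markers confirm it.
    single_copy_groups = {m.linkage_group for m in markers if m.n_primer_sites == 1}
    return "chimeric" if len(single_copy_groups) > 1 else "not_confirmed"


def merge_candidates(contig_group):
    """Step 3: pairs of contigs anchored on the same linkage group, to be
    tested for merging at the relaxed cut-off of 1e-15.

    contig_group maps a contig id to the single linkage group anchoring it.
    """
    ids = sorted(contig_group)
    return [(a, b) for a, b in combinations(ids, 2) if contig_group[a] == contig_group[b]]
```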
Analysis of BAC-end sequences
Chloroplast contamination of the library was assessed using the complete chloroplast genome of Vitis vinifera (EMBL accession number DQ424856 [59])
as a query for a BlastN search. A BES was considered chloroplastic
when BlastN found > 95% identity over > 100 bp, or when tBlastX found
> 95% identity over 33–133 amino acids or > 80% identity over more
than 133 amino acids.
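The chloroplast classification rule can be written as a small predicate. This sketch assumes each hit has already been reduced to a (percent identity, alignment length) pair. BlastN lengths are in bp and tBlastX lengths in amino acids. The tBlastX criterion is read as two length tiers: a stricter identity threshold for short alignments and a looser one for long alignments.

```python
def is_chloroplastic(blastn_hits=(), tblastx_hits=()):
    """Apply the thresholds from the text to a BES's hits against the chloroplast genome.

    Each hit is (percent_identity, alignment_length); BlastN lengths are in bp,
    tBlastX lengths in amino acids. The two tBlastX tiers are our reading of the rule.
    """
    for identity, length in blastn_hits:
        if identity > 95 and length > 100:
            return True
    for identity, length in tblastx_hits:
        if identity > 95 and 33 <= length <= 133:
            return True  # short protein-level alignment: stricter identity
        if identity > 80 and length > 133:
            return True  # long protein-level alignment: looser identity
    return False
```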
BES were masked for repetitive elements and then aligned to the
PN40024 genome sequence with Blat (90% identity over 80% of the
length, fewer than 5 hits), as reported in [1, Supplementary data].
The results were then filtered using custom Perl scripts, and a BAC
clone was considered aligned to the genome sequence only if both
paired ends matched less than 300 kbp apart and in a consistent
orientation [1, Supplementary data]. Physical contigs were filtered,
and the subset that met the following requirements was used for
validation. First, all contigs made up of fewer than 3 BACs were
discarded. Second, contigs made up of 3 BACs that aligned to 2 or 3
different linkage groups were also discarded. A total of 846 physical
contigs passed these two filtering steps. A contig was then considered
chimeric if at least two of its BACs anchored onto one linkage group
and at least two other BACs anchored onto another linkage group.
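The paired-end filter and the contig rules above can be sketched as follows. This is a minimal illustration, not the authors' scripts. The `EndHit` record is hypothetical. "Consistent orientation" is assumed to mean that the two ends lie on opposite strands and face each other. Linkage groups are represented by the genome sequence identifiers the BACs align to.

```python
from collections import Counter
from dataclasses import dataclass

MAX_END_DISTANCE_BP = 300_000


@dataclass
class EndHit:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def bac_aligned(hit_a, hit_b):
    """Both BES hit the same sequence, on opposite strands, facing each other,
    and less than 300 kbp apart (the "facing" rule is an assumption)."""
    if hit_a.chrom != hit_b.chrom or hit_a.strand == hit_b.strand:
        return False
    fwd, rev = (hit_a, hit_b) if hit_a.strand == "+" else (hit_b, hit_a)
    if fwd.start > rev.end:
        return False  # ends point away from each other
    return rev.end - fwd.start < MAX_END_DISTANCE_BP


def classify_contig(n_bacs, bac_groups):
    """Apply the contig filters, then the chimera rule.

    n_bacs: number of BACs in the contig; bac_groups: linkage group of each
    aligned BAC. Returns None for contigs excluded from the validation.
    """
    counts = Counter(bac_groups)
    if n_bacs < 3:
        return None
    if n_bacs == 3 and len(counts) in (2, 3):
        return None
    # Chimeric: at least two linkage groups, each supported by >= 2 BACs.
    supported = [g for g, n in counts.items() if n >= 2]
    return "chimeric" if len(supported) >= 2 else "consistent"
```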
Physical localisation of markers and genes on the BAC clones
Primer pairs for microsatellite markers present in the genetic map of [37] and for genes relevant to this study were scored on BAC pools according to the protocol described by [38].
Microsatellite markers of the reference genetic map, distributed
randomly across the genome, were used to integrate the physical map
and the genetic map.
The primers used for PCR screening of the BAC library for non-host,
host, and signalling resistance genes are described in [Additional file
5] and were developed as follows. We first selected 30 genes from
Arabidopsis thaliana and Nicotiana benthamiana with proven functions in
the above-mentioned categories, and downloaded the corresponding protein
sequences from NCBI [see Additional file 5]. The amino acid sequences
were used for tBLASTn searches of the grape ESTs in the TIGR and NCBI
databases as of September 1, 2006. Grape ESTs were found for all of
the genes, and for each gene the EST showing the highest identity was
retained. Selected ESTs were then compared by BLASTn against the 6×
genome assembly of PN40024 available at that time to deduce the
corresponding gene model. PCR primers were preferentially designed
within a single exon, on two sequence stretches without SNPs between
the EST and the PN40024 sequence [see Additional file 5].
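Retaining the best EST per query protein amounts to a best-hit reduction over BLAST tabular output. The sketch below assumes the 12-column tabular format (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+) and ranks hits by percent identity, as the text describes. Ranking by bit score would be a possible alternative. The query and EST identifiers in any input are placeholders.

```python
import csv
import io


def best_est_per_query(blast_tabular):
    """For each query protein, keep the subject EST with the highest percent identity.

    Expects 12-column BLAST tabular text (qseqid, sseqid, pident, length, ...).
    Only the first three columns are used; ties keep the first hit seen.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        if query not in best or pident > best[query][1]:
            best[query] = (subject, pident)
    return {query: subject for query, (subject, _) in best.items()}
```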
In addition, a tBlastN analysis was performed on the BES, and the
results were filtered with the following E-value thresholds: E < 1e-04
for all queries except those corresponding to large multigene families
such as EDR1, MPK4, SIPK, and WIPK, for which E < 1e-20, and PBS1, for which E < 1e-50.
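The query-specific E-value filter can be expressed as a lookup with a default. This sketch assumes that each hit is labelled with the gene symbol of its query. `GENE_X` in the usage line is a placeholder for any query outside the listed multigene families.

```python
DEFAULT_MAX_EVALUE = 1e-4
# Stricter thresholds for large multigene families, as listed in the text.
FAMILY_MAX_EVALUE = {
    "EDR1": 1e-20,
    "MPK4": 1e-20,
    "SIPK": 1e-20,
    "WIPK": 1e-20,
    "PBS1": 1e-50,
}


def keep_hit(query, evalue):
    """Keep a tBlastN hit on a BES if its E value is below the query's threshold."""
    return evalue < FAMILY_MAX_EVALUE.get(query, DEFAULT_MAX_EVALUE)


def filter_hits(hits):
    """hits: iterable of (query, bes_id, evalue) tuples."""
    return [h for h in hits if keep_hit(h[0], h[2])]
```

For example, a hit of MPK4 with E = 1e-10 is rejected, whereas the same E value would be accepted for `GENE_X`.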
The primers used for screening the BAC library for the NBS-LRR and RLK genes are described in [34]: 33 primers (series rgVamu and rgVrip) were originally developed from Vitis amurensis and Vitis riparia, 26 primers (series GLP and MHD) were designed on a Muscadinia rotundifolia × Vitis vinifera hybrid, 7 primers (series rgVvin and UDV-) were developed from Vitis vinifera 'Cabernet Sauvignon', and four primers (stkVa008, stkVa036, stkVa043, and stkVr011) were designed on grape sequences with the highest similarity to the Pto (tomato) and Xa21 (rice) resistance genes.
We also used 182 grapevine sequences for NBS-LRR proteins available
at NCBI in October 2004 as queries for tBLASTx searches of the BES. Of
these, 131 were gene fragments spanning the NBS domain, already
presented in [34] (GenBank accession nos. AY427077–AY427135, AY427152–AY427194, AF369813–AF369837, AF365879–AF365881, and AF365851), and the remaining 51 were ESTs that mostly covered the 3' end of the LRR domain. The BES were also queried using 27 RLK gene fragments (AY427136–AY427151 and AY427195–AY427205).
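The query counts can be cross-checked against the accession ranges. The sketch below expands each GenBank range, accepting either a hyphen or an en dash as the separator, and counts the individual accessions. The ranges are copied from the text.

```python
import re


def expand_accessions(spec):
    """Expand 'AY427077-AY427135' (hyphen or en dash) into individual accessions."""
    parts = re.split("[\u2013-]", spec)
    if len(parts) == 1:
        return [spec]
    first, last = parts
    prefix = re.match(r"[A-Z]+", first).group()
    width = len(first) - len(prefix)  # keep the zero-padded numeric width
    lo, hi = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]


def count_accessions(specs):
    return sum(len(expand_accessions(s)) for s in specs)


NBS_FRAGMENTS = ["AY427077-AY427135", "AY427152-AY427194", "AF369813-AF369837",
                 "AF365879-AF365881", "AF365851"]
RLK_FRAGMENTS = ["AY427136-AY427151", "AY427195-AY427205"]

n_nbs = count_accessions(NBS_FRAGMENTS)
n_rlk = count_accessions(RLK_FRAGMENTS)
print(n_nbs, 182 - n_nbs, n_rlk)  # 131 51 27
```

The listed ranges account for exactly 131 NBS fragments, which leaves 51 of the 182 NBS-LRR queries. The two RLK ranges contain 27 fragments.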
Development of additional SSR markers in genomic regions potentially involved in disease resistance
Some contigs containing candidate genes for resistance to pathogens
were not assigned to chromosomes by the SSR markers of the reference
genetic map. Microsatellites were searched for in the BES of all BAC
clones included in these non-assigned contigs with a modified version
of Sputnik [60], and those found were used to design contig-specific
SSR markers. When no useful SSR was found in the neighbouring BES, the
primer pairs previously used to physically localise the gene in the
BAC clones were used for a BlastN search against the 6× assembly of
PN40024. If a unique and perfect match was found, with a distance
between the primer sites compatible with the amplicon size obtained
from the BAC clones, additional SSRs were searched for in the PN40024
sequence contigs within a 40-kbp interval around the gene [see Additional file 5]. The new markers were genetically mapped in the 'Chardonnay' × 'Bianca' and 'Cabernet Sauvignon' × hybrid '20/3' progenies [34]. Segregation data were added to the previous dataset as described in [34], and marker positions were projected onto the Vitis reference map by map alignment using common markers.
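The two routes to new SSR markers can be sketched as follows. `find_ssrs` is a plain regular-expression scan for perfect tandem repeats. It is a swapped-in technique, not Sputnik, and it does not score imperfect repeats. Its minimum repeat numbers are hypothetical. `ssr_search_window` encodes the primer-based fallback. The hit tuples, the amplicon size tolerance, and the choice to centre the 40-kbp window on the amplicon are all assumptions.

```python
import re

# Hypothetical minimum repeat numbers per motif length (di- to penta-nucleotides).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' * 2."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            if not is_compound(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // k))
    return sorted(hits)


def ssr_search_window(fwd_hits, rev_hits, amplicon_bp, tolerance_bp=50, window_bp=40_000):
    """Decide whether a gene's primer pair supports an SSR search in PN40024.

    fwd_hits / rev_hits: BlastN hits of each primer as (contig, position, is_perfect).
    Requires one perfect hit per primer on the same contig, spaced consistently with
    the BAC amplicon size. Returns (contig, start, end) centred on the amplicon, or None.
    """
    if len(fwd_hits) != 1 or len(rev_hits) != 1:
        return None  # not unique
    (f_contig, f_pos, f_ok), (r_contig, r_pos, r_ok) = fwd_hits[0], rev_hits[0]
    if not (f_ok and r_ok) or f_contig != r_contig:
        return None
    if abs(abs(r_pos - f_pos) - amplicon_bp) > tolerance_bp:
        return None  # primer spacing incompatible with the BAC amplicon
    centre, half = (f_pos + r_pos) // 2, window_bp // 2
    return f_contig, max(0, centre - half), centre + half
```

For example, `find_ssrs("GGC" + "AT" * 7 + "GGC" + "CAG" * 5 + "TTG")` reports an (AT)7 repeat at position 3 and a (CAG)5 repeat at position 20.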
Abbreviations
BAC: Bacterial Artificial Chromosome; BES: BAC End Sequence; ETI:
Effector Triggered Immunity; HICF: High Information Content
Fingerprinting; JA: Jasmonic Acid; MAPK: Mitogen Activated Protein
Kinase; NBS/LRR: Nucleotide Binding Site/Leucine Rich Repeat; PAMP:
Pathogen Associated Molecular Patterns; PTI: PAMP Triggered Immunity;
QTL: Quantitative Trait Loci; RLK: Receptor Like Kinase; SA: Salicylic
Acid.
Authors' contributions
MaM physically mapped non-host and defence signalling genes and
developed new SSR markers for contig anchorage, integrated the results
from all contributors, and drafted the manuscript. SP constructed the
physical map with the help of ILC, AC, CC, and VDB, carried out the
preliminary analysis of BES with LF, and participated in drafting the
manuscript. RM screened the BAC library for host resistance genes. AC
and ILC screened the BAC library for reference SSR markers. LF wrote
the Perl scripts for BES analysis and map visualisation. CG and VB
developed databases for tracking experiments and storing the
results. GDG and RT extended the genetic maps with new markers,
participated in the interpretation of the results and in drafting the
manuscript. SS and MiM contributed to map assembly. A–FA–B conceived
this study, coordinated the construction of the physical map and
finalised the manuscript. All authors read and approved the final
manuscript.
Acknowledgements
The authors thank Pascal Audigier for technical work, Philippe Grevet
and Sebastien Reboux for maintaining the computer systems,
Antonella Pfeiffer for genetic mapping of the SSR markers developed to
anchor the physical contigs containing non-host resistance genes not
previously anchored by Doligez's markers, and Courtney Coleman for
proofreading the manuscript.