This manuscript was automatically generated from ch728/cucurbit-usda@506f93d on December 11, 2021.
Christopher Owen Hernandez
0000-0002-1668-7121
Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853 USA; H. Rouse Caffey Rice Research Station, Louisiana State University, Rayne, LA, 70578 USA
Joanne Labate
Plant Genetic Resource Conservation Unit, United States Department of Agriculture, Agricultural Research Service, Geneva, NY, 14456 USA
Bob Jarret
Plant Genetic Resource Conservation Unit, United States Department of Agriculture, Agricultural Research Service, Griffin, GA, 30223 USA
Kathleen Reitsma
North Central Regional Plant Introduction Station, Iowa State University, Ames, IA, 50014 USA
Jack Fabrizio
0000-0002-1552-6924
Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853 USA
Kan Bao
Boyce Thompson Institute, Cornell University, Ithaca, NY, 14853 USA
Zhangjun Fei
0000-0001-9684-1450
Boyce Thompson Institute, Cornell University, Ithaca, NY, 14853 USA
Rebecca Grumet
Department of Horticulture, Michigan State University, East Lansing, MI, 48824 USA
Michael Mazourek
0000-0002-2285-7692
Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853 USA
The Cucurbita genus is home to a number of economically and culturally important species. We present an analysis of genotyping-by-sequencing data generated by sequencing the USDA germplasm collections of Cucurbita pepo, Cucurbita moschata, and Cucurbita maxima. These collections include a mixture of wild, landrace, and cultivated specimens from all over the world. Roughly 4,000 to 40,000 quality SNPs were called in each collection, and the collections ranged in size from 314 to 829 accessions. Genomic analyses were conducted to characterize the diversity in each species and revealed extensive structure corresponding to a combination of geographical origin and morphotype/market class. Genome-wide association studies (GWAS) were conducted for each data set using both historical and contemporary phenotypic data, and signals were detected for several traits, including the bush gene (Bu) in C. pepo. These data represent the largest collection of sequenced Cucurbita accessions and can be used to direct the maintenance of genetic diversity, develop breeding resources, and help prioritize whole-genome re-sequencing for further GWAS and other genomics studies aimed at understanding the phenotypic and genetic diversity present in Cucurbita.
The Cucurbitaceae (cucurbit) family is home to a number of vining species, mostly cultivated for their fruits. This diverse and economically important family includes cucumber (Cucumis sativus), melon (Cucumis melo), watermelon (Citrullus lanatus), and squash (Cucurbita spp.) [1]. Like other cucurbits, squash exhibit diversity in growth habit, fruit morphology, metabolite content, and disease resistance, and have a nuanced domestication history [2,3]. The genomes of Cucurbita spp. are small (roughly 500 Mb) but result from complex interactions between ancient genomes brought together through an allopolyploidization event [4]. These factors make squash an excellent model for understanding genome biology, fruit development, and domestication. Within Cucurbita, five species are recognized as domesticated, three of which are broadly cultivated: Cucurbita maxima, Cucurbita moschata, and Cucurbita pepo [1]. Few genomic resources have been available for these species, although draft genomes and annotations, along with web-based tools and other genomics data, are emerging [5]. These resources have already been used to elucidate the genetics of fruit quality, growth habit, and disease resistance, and to increase the efficiency of cucurbit improvement [6,7,8,9,10,11]. However, there has yet to be a comprehensive survey of the genetic diversity in large, diverse Cucurbita germplasm panels, such as those maintained by the USDA within the Germplasm Resources Information Network (GRIN) system.
Germplasm collections play a vital role in maintaining and preserving genetic variation. These collections can be mined by breeders for valuable alleles and used by geneticists and biologists for mapping studies [12]. As with many other orphan and specialty crops, there has been little effort to develop community genetic resources for squash and other cucurbits. The Cucurbit Coordinated Agricultural Project (CucCap project) was established to help close this knowledge gap in cucurbits. This collaborative project aims to provide genomics resources and tools that can aid both applied breeding and basic research. As part of the CucCap project, the genetic and phenotypic diversity of the USDA watermelon and cucumber collections has already been explored, partly through sequencing of the germplasm collections and development of core collections for whole-genome sequencing [13,14]. The diverse specimens of the USDA squash collections have yet to be well characterized at the genetic level, although an elaborate system has been established for classifying squash by species and various other characteristics.
The classification system used in squash is complex. Squash from each species can be classed as summer or winter squash depending on whether the fruit is consumed at an immature or a mature stage, respectively [15]. Squash are considered ornamental if they are used for decoration, and some irregularly shaped, inedible ornamental squash are called gourds; however, gourds include members of Cucurbita as well as some species of Lagenaria, so not all gourds are squash [16]. Many squash are known as pumpkins; the pumpkin designation is a culture-dependent colloquialism that can refer to jack-o’-lantern types, squash used for desserts, or, in some Latin American countries, eating squash from C. moschata known locally as Calabaza [1]. Cultivars deemed pumpkins can be found in all widely cultivated squash species. Unlike the previous groupings, morphotypes/market classes are defined within species. For example, Zucchini are reliably members of C. pepo, and Buttercups are from C. maxima. Adding to the complexity of their classification, the Cucurbita species are believed to have arisen from independent domestication events, and the relationships between cultivated and wild species remain poorly understood [17].
C. pepo is the most economically important of the Cucurbita species and is divided into two subspecies: C. pepo subsp. pepo and C. pepo subsp. ovifera [10]. Evidence points to Mexico as the center of origin of subsp. pepo and to the southwestern/central United States as the origin of subsp. ovifera. Some consider subsp. ovifera var. texana to be the progenitor of ovifera, whereas subsp. fraterna is a candidate progenitor for pepo [17]. Europe played a crucial role as a secondary center of diversification for pepo, but not for ovifera [18]. Important morphotypes of pepo include Zucchini, Spaghetti squash, Cocozelle, Vegetable marrow, and some ornamental pumpkins. C. pepo subsp. ovifera includes summer squash from the Crookneck, Scallop, and Straightneck groups, and winter squash such as Delicata and Acorn [19].
The origin of C. moschata is more uncertain than that of C. pepo; it is unclear whether C. moschata has a South or North American origin [3]. Where and when domestication occurred for this species is also unknown; however, it is known that C. moschata had a secondary center of origin in India-Myanmar, where the species was further diversified [4]. C. moschata plays an important role in squash breeding because it is cross-fertile to various degrees with C. pepo and C. maxima and can thus be used as a bridge to move genes across species [4]. Popular market classes of C. moschata include Cheese types like Dickinson, which is widely used for canned pumpkin products, Butternut (neck) types, Japonica, and tropical pumpkins known as Calabaza [1].
C. maxima contains many popular winter squash, including Buttercup/Kabocha types, Kuri, Hubbard, and Banana squash [1]. This species also produces the world’s largest fruit, the giant pumpkin, which is grown for competition and can reach well over 1,000 kg [20]. Although this species exhibits a wide range of phenotypic diversity in fruit characteristics, it appears to be the least genetically diverse of the three species described here [17]. C. maxima is believed to have a South American origin and was likely domesticated near Peru, with a secondary center of domestication in Japan/China [nee_domestication_1990; 4].
In this study, we set out to characterize the genetic diversity present in the USDA Cucurbita germplasm collections for C. pepo, C. moschata, and C. maxima. We present genotyping-by-sequencing data from each of these collections, population genomic analyses, results from genome-wide association using historical and contemporary phenotypes, and a core panel developed for re-sequencing.
All available germplasm was requested from USDA cooperators for C. maxima (534 accessions), C. moschata (314), and C. pepo (829). Seeds were planted in 50-cell trays, and two 3/4-inch punches of tissue (approximately 80-150 mg) were sampled from the first true leaf of each seedling. DNA was extracted using Omega Mag-Bind Plant DNA DS kits (M1130, Omega Bio-Tek, Norcross, GA) and quantified using the Quant-iT PicoGreen dsDNA Kit (Invitrogen, Carlsbad, CA). Purified DNA was shipped to Cornell’s Genomic Diversity Facility for GBS library preparation using protocols optimized for each species. Libraries were sequenced at 96-, 192-, or 384-plex on a HiSeq 2500 (Illumina Inc., USA) in single-end mode with a read length of 101 bp.
SNP calling was conducted using the TASSEL-GBS V5 pipeline [21]. Tags produced by this pipeline were aligned using the BWA aligner with default settings [22]. Raw variants were filtered using VCFtools [23]. Before SNP filtering, samples whose total read depth was 2 or more standard deviations below the mean of all samples were removed. SNPs were then filtered as follows: minor allele frequency (MAF) \(\geq 0.01\), missingness \(\leq 0.5\), and biallelic sites only. Three outlier genotypes found in an initial PCA of the C. maxima data were removed, as they were likely not C. maxima. Variants were further filtered for specific uses as described below.
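A minimal sketch of the two filtering steps described above, written in Python with NumPy rather than VCFtools. It assumes genotypes have already been converted to a samples × SNPs matrix of 0/1/2 alternate-allele dosages with `NaN` for missing calls, and that the number of observed alleles per SNP is known; the function names are hypothetical, and taking "standard deviation" as the population standard deviation is an assumption.

```python
import numpy as np

def filter_samples_by_depth(depth, n_sd=2.0):
    """Boolean mask of samples to keep (hypothetical helper).

    Samples whose total read depth lies n_sd or more population standard
    deviations below the mean of all samples are dropped.
    """
    depth = np.asarray(depth, dtype=float)
    cutoff = depth.mean() - n_sd * depth.std()
    return depth > cutoff

def filter_snps(dosage, n_alleles, maf_min=0.01, miss_max=0.5):
    """Boolean mask of SNPs to keep (hypothetical helper).

    dosage: samples x SNPs array of 0/1/2 alternate-allele counts, NaN = missing.
    n_alleles: number of alleles observed at each SNP (only biallelic SNPs pass).
    """
    dosage = np.asarray(dosage, dtype=float)
    missing = np.isnan(dosage).mean(axis=0)        # fraction of missing calls per SNP
    p = np.nanmean(dosage, axis=0) / 2.0           # alternate-allele frequency from called genotypes
    maf = np.minimum(p, 1.0 - p)
    biallelic = np.asarray(n_alleles) == 2
    return (maf >= maf_min) & (missing <= miss_max) & biallelic
```

Sample filtering is applied first, so allele frequencies and missingness in `filter_snps` are computed only on the retained samples.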
ADMIXTURE [24], which uses a model-based approach to infer ancestral populations (\(k\)) and admixture proportions in a given sample, was used to explore population structure in each dataset. Because ADMIXTURE does not model linkage disequilibrium (LD), marker sets were further filtered to obtain SNPs in approximate linkage equilibrium using the “--indep-pairwise” option in PLINK [25], with a window size of 50 SNPs, a step size of 10 SNPs, and an \(r^2\) threshold of 0.1. All samples labeled as cultivars were removed from the data prior to running ADMIXTURE. Cross-validation was used to determine the best \(k\) for each species: ADMIXTURE was run with \(k\) values from 1 to 20, the cross-validation error was recorded for each \(k\), and the \(k\) with the minimal cross-validation error was chosen (Supplemental Figures, Figure 5). Ancestral populations were then assigned to cultivars using the program’s projection feature.
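To illustrate what LD pruning does before ADMIXTURE, the following is a simplified, greedy sliding-window version of pairwise pruning using the window (50 SNPs), step (10 SNPs), and \(r^2\) (0.1) settings given above. It is not a re-implementation of PLINK: PLINK chooses which SNP of a correlated pair to drop using allele frequencies, whereas this sketch always drops the later one. It also assumes a single chromosome with SNPs in physical order and no missing genotypes; the function name is hypothetical.

```python
import numpy as np

def ld_prune(dosage, window=50, step=10, r2_max=0.1):
    """Greedy sliding-window LD pruning (simplified, PLINK-like sketch).

    dosage: samples x SNPs array (0/1/2), no missing values, SNPs ordered
    along one chromosome. Returns sorted indices of retained SNPs.
    """
    x = np.asarray(dosage, dtype=float)
    n_snps = x.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        # SNPs in the current window that have not been pruned yet
        idx = [j for j in range(start, min(start + window, n_snps)) if keep[j]]
        for a_pos, a in enumerate(idx):
            if not keep[a]:
                continue
            for b in idx[a_pos + 1:]:
                if not keep[b]:
                    continue
                r = np.corrcoef(x[:, a], x[:, b])[0, 1]
                if r * r > r2_max:
                    keep[b] = False  # drop the later SNP of the correlated pair
    return np.flatnonzero(keep)
```

Monomorphic SNPs (zero variance) yield an undefined correlation and are never pruned here; in the pipeline above they would already have been removed by the MAF filter.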
Principal components analysis (PCA) was used as a model-free way of assessing population structure. The original filtered marker data (not the LD-pruned data used for ADMIXTURE) were converted to a dosage matrix using VCFtools’ “--012” option. A kinship matrix \(\mathbf{K}\) was created by supplying the dosage matrix to the “A.mat()” function in sommer [26]. PCA was conducted using the R function “princomp()” with \(\mathbf{K}\) supplied as the covariance matrix.
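A sketch of the kinship and PCA steps, assuming a complete (imputed) 0/1/2 dosage matrix. The kinship below follows VanRaden's first method, which we assume is comparable to, but not necessarily identical to, what sommer's A.mat() computes. The PCs are taken as the leading eigenvectors of \(\mathbf{K}\), which is our reading of what supplying \(\mathbf{K}\) as the covariance matrix to princomp() yields. Function names are hypothetical.

```python
import numpy as np

def vanraden_kinship(dosage):
    """Genomic relationship matrix from a samples x SNPs 0/1/2 dosage matrix
    (no missing values), following VanRaden's first method."""
    m = np.asarray(dosage, dtype=float)
    p = m.mean(axis=0) / 2.0                  # alternate-allele frequency per SNP
    w = m - 2.0 * p                           # center each SNP column
    return w @ w.T / (2.0 * np.sum(p * (1.0 - p)))

def kinship_pcs(k, n_pcs=2):
    """Leading eigenvectors of K (one coordinate per sample) and the fraction
    of the total variance (trace of K) each explains."""
    vals, vecs = np.linalg.eigh(k)            # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_pcs]
    return vecs[:, order], vals[order] / vals.sum()
```

Calling `kinship_pcs(vanraden_kinship(d))` returns the first two sample coordinates used as fixed-effect covariates in the association models.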
Historical data for C. maxima, C. pepo, and C. moschata were obtained from the USDA Germplasm Resources Information Network (GRIN; http://www.ars-grin.gov). For qualitative traits, where categories are mutually exclusive, accessions with duplicated entries were removed, leaving only accessions with a single entry for analysis. Contemporary phenotypic data were collected from a subset of the C. pepo collection grown in the summer of 2018 in Ithaca, NY. Field-grown plants were scored for growth habit at three stages during the growing season to classify them as bush, semi-bush, or vining. Plants that had a bush habit early in the season but started to vine by the end of the season were considered semi-bush.
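The bush/semi-bush/vining rule above can be written as a small function. The stage labels and the treatment of patterns the text does not mention (anything other than bush at every stage, or bush early and vining at the final stage, is called vining here) are assumptions for illustration; the function name is hypothetical.

```python
def classify_growth_habit(stages):
    """Classify a plant from growth-habit scores taken at successive stages.

    stages: sequence of "bush" or "vine", in chronological order.
    Returns "bush", "semi-bush", or "vining".
    """
    if all(s == "bush" for s in stages):
        return "bush"
    # bush early in the season, vining by the end of the season
    if stages[0] == "bush" and stages[-1] == "vine":
        return "semi-bush"
    return "vining"  # assumption: every other pattern counts as vining
```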
Genomic heritability [27] (\(h_{g}^2\)) was calculated for all phenotypes. For continuous traits, \(h_{g}^2 = \frac{\sigma_{g}^2}{\sigma_{g}^2 + \sigma_{e}^2}\), where \(\sigma_{g}^2\) and \(\sigma_{e}^2\) are the genetic and error variances estimated from a whole-genome regression of phenotype on marker data using ASReml-R. Multi-class categorical traits were converted to one or more binary traits, depending on the number of entries in each category. For binary traits, a logit model was fit to the binary response, and heritability was estimated as \(h_{g}^2 = \frac{\sigma_{g}^2}{\sigma_{g}^2 + \frac{\pi^2}{3}}\) [28], where \(\pi^2/3\) is the variance of the standard logistic distribution. In addition to heritability, the proportion of phenotypic variance explained by population structure (\(R_{pop}^2\)) was calculated from a multiple regression of phenotype on the structure inferred by ADMIXTURE. The R function lm was used to regress continuous phenotypes on the \(\mathbf{Q}\) matrix obtained from ADMIXTURE, and the R glm function with “family=binomial” was used to regress binary traits on population structure. Because the ordinary \(R^2\) is not defined for logistic models, McFadden’s pseudo-\(R^2\) was used to assess the association between binary traits and population structure [29].
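The heritability formulas and the two \(R_{pop}^2\) statistics can be sketched in Python, with NumPy standing in for ASReml-R, lm, and glm. Variance components are taken as given, since estimating them requires a mixed-model solver. One detail worth noting: because each row of \(\mathbf{Q}\) sums to 1, an intercept plus all \(k\) ancestry columns is collinear, so the sketch drops the last column (lm() would instead report an NA coefficient for it). The logistic fit uses plain Newton-Raphson and assumes the binary trait is not perfectly separated by ancestry. All function names are hypothetical.

```python
import math
import numpy as np

def h2_continuous(var_g, var_e):
    """Genomic heritability for a continuous trait."""
    return var_g / (var_g + var_e)

def h2_logit(var_g):
    """Genomic heritability for a binary trait fit with a logit model; the
    residual variance of the standard logistic distribution is pi^2 / 3."""
    return var_g / (var_g + math.pi ** 2 / 3)

def _design(q):
    # Intercept plus all but the last ancestry column (Q rows sum to 1).
    q = np.asarray(q, dtype=float)
    return np.column_stack([np.ones(len(q)), q[:, :-1]])

def r2_pop_linear(y, q):
    """Ordinary R^2 from regressing a continuous trait on ancestry proportions."""
    x, y = _design(q), np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def _logit_loglik(x, y, n_iter=50):
    # Fit a logistic regression by Newton-Raphson; return its log-likelihood.
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-x @ beta))
        w = mu * (1.0 - mu)
        beta += np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (y - mu))
    mu = np.clip(1.0 / (1.0 + np.exp(-x @ beta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def r2_pop_mcfadden(y, q):
    """McFadden's pseudo-R^2 = 1 - logL(ancestry model) / logL(intercept only)."""
    y = np.asarray(y, dtype=float)
    ll_full = _logit_loglik(_design(q), y)
    ll_null = _logit_loglik(np.ones((len(y), 1)), y)
    return 1.0 - ll_full / ll_null
```

Because the intercept-only model is nested in the ancestry model, McFadden's value lies between 0 and 1, with 0 meaning ancestry adds nothing beyond the trait's overall frequency.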
Data were imputed prior to association analysis. LinkImpute [30], as implemented in the TASSEL [31] “LDKNNiImputatioHetV2Plugin” plugin, was used for imputation with default settings. Any data still missing after this process were mean imputed. The GENESIS [32] R package, which can model both binary and continuous traits, was used for association. All models included the first two PCs of the marker matrix as fixed effects and modeled the genotype effect (\(u\)) as a random effect distributed according to the kinship matrix \(\mathbf{K}\), \(u \sim N(0, \sigma_{u}^2\mathbf{K})\). Binary traits were modeled using the logistic regression feature in GENESIS.
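Written out, the association model described above is, as we read it, the following for SNP \(j\), where \(\mathbf{X}\) holds an intercept and the first two PCs, \(\mathbf{s}_j\) is the vector of imputed dosages at SNP \(j\), \(\alpha_j\) is the SNP effect tested against \(H_0: \alpha_j = 0\), and \(\sigma_{u}^2\) is the genetic variance. The notation is ours, not that of the GENESIS documentation:

\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{s}_j\alpha_j + \mathbf{u} + \boldsymbol{\varepsilon}, \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_{u}^2\mathbf{K}), \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_{e}^2\mathbf{I})
\]

For binary traits, the same linear predictor is placed on the logit scale, \(\operatorname{logit}(\pi_i) = \mathbf{x}_i^\top\boldsymbol{\beta} + s_{ij}\alpha_j + u_i\), where \(\pi_i\) is the probability that accession \(i\) carries the trait, and there is no separate residual term.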
All tools used in this analysis are available on the Cucurbit Genomics website (http://cucurbitgenomics.org/). A candidate gene for dwarfism in C. maxima, Cma_004516, was identified in a previous study [33]. The Cucurbit Genomics Database gene ID of Cma_004516 was identified by using the BLAST tool to align the primer sequences used for RT-qPCR in that study against the C. maxima reference genome. Synteny analysis was performed with the Synteny Viewer tool by comparing C. maxima chromosome 3 with C. pepo chromosome 10 and searching for an ortholog of the candidate gene. The physical position of the C. pepo ortholog was then identified using the Search tool.
Subsets representative of each panel’s genetic diversity were identified by running GenoCore [34] on each of the filtered SNP sets. A subset of the C. pepo panel and key genotypes from the other two species were combined to form a core collection for the cucurbit community. Key genotypes were chosen to represent important market classes and to capture variation in traits of interest. These genotypes will be further purified through two additional rounds of selfing and then resequenced using skim sequencing to produce whole-genome data.
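As an illustration of the idea behind coverage-based core selection, the sketch below greedily adds the accession that contributes the most not-yet-covered (SNP, genotype) states until a target fraction of the states observed in the full panel is covered; 99% is used as the default to match the coverage reported in the results. This is a simplified heuristic in the spirit of GenoCore, not its actual algorithm or scoring. The function name is hypothetical, and missing calls are assumed to be `NaN`.

```python
import numpy as np

def greedy_core(dosage, target=0.99):
    """Greedy allele-coverage core selection (simplified, GenoCore-inspired).

    dosage: samples x SNPs array of 0/1/2, NaN = missing.
    Returns indices of selected accessions in the order they were chosen.
    """
    d = np.asarray(dosage, dtype=float)
    # One boolean indicator per (SNP, genotype class); NaN matches no class.
    states = np.concatenate([(d == g) for g in (0.0, 1.0, 2.0)], axis=1)
    total = states.any(axis=0).sum()          # states observed in the full panel
    covered = np.zeros(states.shape[1], dtype=bool)
    chosen = []
    while covered.sum() < target * total:
        gains = (states & ~covered).sum(axis=1)   # new states each accession would add
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        chosen.append(best)
        covered |= states[best]
    return chosen
```

Duplicate or near-duplicate accessions add few new states and are therefore picked late or not at all, which is what keeps the core small.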
Each Cucurbita spp. collection was genotyped using the Cornell genotyping-by-sequencing (GBS) protocol, yielding 534 accessions for C. maxima, 314 for C. moschata, and 829 for C. pepo. Figure 1 shows the regional distribution of accessions by species. C. maxima and C. moschata constitute the majority of accessions collected from Central and South America, whereas C. pepo accessions are more prevalent in North America and Europe. C. pepo had the highest number of raw SNPs (108,279), followed by C. moschata (85,345) and C. maxima (56,598). After filtering, C. pepo and C. moschata retained a similar number of SNPs (roughly 47,000 each), whereas C. maxima retained an order of magnitude fewer (4,799). This discrepancy may be an artifact of using PstI, a less frequently cutting restriction enzyme previously optimized for C. maxima [33], rather than ApeKI, which was used for C. pepo and C. moschata. The number and distribution of SNPs across chromosomes are shown in Table 1.
Chrom. | C. pepo raw | C. pepo filtered | C. moschata raw | C. moschata filtered | C. maxima raw | C. maxima filtered
---|---|---|---|---|---|---
0 | 16901 | 5656 | 3748 | 1236 | 1501 | 419
1 | 9245 | 4155 | 4575 | 2627 | 4185 | 300
2 | 6160 | 2921 | 4092 | 2535 | 2101 | 169
3 | 5908 | 2668 | 3815 | 2393 | 2201 | 157
4 | 5540 | 2652 | 7868 | 4458 | 5703 | 382
5 | 4813 | 2254 | 3226 | 1804 | 3115 | 154
6 | 4555 | 2100 | 3663 | 2182 | 3035 | 345
7 | 3677 | 1761 | 3300 | 1784 | 2705 | 148
8 | 4551 | 2189 | 2692 | 1577 | 2391 | 191
9 | 4521 | 1995 | 3427 | 1902 | 2750 | 229
10 | 4366 | 2052 | 4219 | 2225 | 2297 | 120
11 | 3839 | 1727 | 5212 | 2962 | 3713 | 309
12 | 3777 | 1614 | 5329 | 2286 | 2026 | 162
13 | 4002 | 1879 | 3888 | 2013 | 2131 | 257
14 | 4275 | 1973 | 5568 | 3198 | 4317 | 297
15 | 3086 | 1427 | 3911 | 2358 | 2662 | 172
16 | 4274 | 1589 | 3407 | 1987 | 2058 | 302
17 | 3519 | 1657 | 3557 | 1888 | 2195 | 251
18 | 3568 | 1723 | 3775 | 2105 | 1826 | 133
19 | 4015 | 1860 | 3278 | 1716 | 1793 | 169
20 | 3687 | 1692 | 3795 | 1623 | 1893 | 133
Total | 108279 | 47544 | 85345 | 46859 | 56598 | 4799
Group | C. pepo | C. moschata | C. maxima
---|---|---|---
1 | Europe/Asia, mostly from Turkey | South America/Latin America | Mixed origin; Kabocha/Turban types
2 | Europe, mostly from Macedonia | South America/Latin America | Europe, mostly from Macedonia
3 | North America, wild and landrace ovifera | Africa | Asia
4 | Mixed origin | India | South America
5 | Latin America, mostly from Mexico | Mixed origin; elongated fruit type | Africa
Filtered SNPs were used for population structure analysis. Available geographical, phenotypic, and other metadata were retrieved from GRIN to help interpret the structure results. Results from the model-based admixture analysis are shown in Figure 2, panel A. These data support five ancestral groups (\(k = 5\)) in each species. Population structure was driven mostly by geography, except in C. pepo, where the presence of different subspecies accounted for some of the structure. Commonalities among structure groups are described in Table 2. The first two principal components (PCs) from PCA of the marker data are shown in Figure 2, panel B. As with the model-based analysis, PCA showed geography to be a main driver of population structure, with accessions derived from Africa, the Arab States, Asia, Europe, North America, and South/Latin America. In C. pepo, PC1 separates subsp. ovifera, which has a North American origin, from subsp. pepo.
Ancestry proportions from the admixture analysis were projected onto cultivars/market types identified among the accessions, which had been excluded from the initial analysis used to infer ancestral groups. Cultivars were grouped by known market class within each species to help identify patterns of ancestry within and between market classes. Key market types identified among the accessions included Acorn, Scallop, Crook, Pumpkin (jack-o’-lantern), Zucchini, Marrow, Gem, and Spaghetti in C. pepo; Neck, Cheese, Japonica, and Calabaza in C. moschata; and Buttercup, Kabocha, Kuri, Hubbard, and Mammoth (show squash) in C. maxima. These groupings are shown in Figure 3. In general, members of each market class exhibited similar ancestry proportions. In C. pepo, market classes from the two subspecies had distinct ancestry patterns. For example, the Acorn, Scallop, and Crook market classes are all from subsp. ovifera, and all of these classes had similar ancestry proportions, with roughly 50% of their ancestry from the wild ovifera group. In contrast, market classes within subsp. pepo had a small percentage of ancestry from wild ovifera and more ancestry in common with European and Asian accessions. In C. moschata, the Neck and Cheese market classes showed very similar ancestry patterns, whereas the Japonica and Calabaza types were more distinct. Relative to C. pepo and C. moschata, the C. maxima cultivars were less distinct from one another.
All available historical data from GRIN were compiled, and only traits with \(\geq\) 100 entries were considered for further analysis. This filtering resulted in 21 traits for C. pepo, 5 for C. moschata, and 16 for C. maxima. Traits spanned fruit and agronomic characteristics, as well as pest resistance. The number of records for a given trait ranged from 108 to 822, with an average of \(\sim\) 270. Fruit traits included fruit width, length, surface color and texture, and flesh color and thickness. Agronomic data included plant vigor, vining habit, and several phenotypes related to maturity. Pest-related traits included susceptibility to cucumber beetle and squash bug in C. pepo, and to watermelon mosaic virus (WMV) and powdery mildew (PM) in C. maxima.
About half of the traits were quantitative/ordinal, and half were categorical and coded as binary traits (Table 3). Most traits measured on a quantitative scale were normally distributed. Marker-based narrow-sense heritability (\(h_{g}^2\)) was calculated for each trait; values ranged from 0.12 to close to 1, and most traits had moderate to high heritabilities (\(\geq\) 0.4). Regression of trait data on the \(\mathbf{Q}\) matrix obtained from structure analysis was used to determine the amount of phenotypic variation explained by population structure (\(R_{pop}^2\)). In C. pepo, traits related to fruit morphology tended to have high correlations with population structure; 100-seed weight and oblong fruit shape had the highest \(R_{pop}^2\) (0.60). In C. moschata, maturity showed the highest correlation with population structure (\(R_{pop}^2\) of 0.52). With the exception of plant growth habit, none of the 16 traits in C. maxima had a high correlation with population structure. Traits related to pest resistance were measured in C. maxima and C. pepo and had among the lowest correlations with population structure.
Trait | Description | Pop. size | \(h_{g}^2\) | \(R_{pop}^2\)
---|---|---|---|---
C. pepo | | | |
Max Fruit Thickness | Maximum fruit thickness in centimeters | 421 | 0.72 | 0.27
Min Fruit Thickness | Minimum fruit thickness in centimeters | 174 | 0.58 | 0.14
Min Fruit Length | Minimum fruit length in centimeters | 413 | 0.82 | 0.37
Max Fruit Length | Maximum fruit length in centimeters | 315 | 0.91 | 0.33
Max Fruit Width | Maximum fruit width in centimeters | 303 | 1.00 | 0.49
Min Fruit Width | Minimum fruit width in centimeters | 413 | 1.00 | 0.49
Fruit Texture | Fruit texture coded as smooth or not smooth | 130 | 0.55 | 0.23
Fruit Skin Pattern | Skin patterning coded as solid color or patterned | 248 | 0.58 | 0.27
Fruit Shape1 | Fruit shape coded as oblong or not oblong | 331 | 0.69 | 0.60
Fruit Shape2 | Fruit shape coded as globe or not globe | 331 | 0.67 | 0.48
Flesh Color | Flesh color coded as either yellow or orange | 377 | 0.53 | 0.19
Fruit Color1 | Color of fruit coded as yellow or not yellow | 181 | 0.55 | 0.19
Fruit Color2 | Color of fruit coded as green or not green | 181 | 0.68 | 0.55
Cucumber Beetle Damage | Severity of beetle damage on a 0-4 scale | 248 | 0.32 | 0.08
Adult Squash Bug | Number of adult squash bugs on plant | 237 | 0.88 | 0.07
Nymph Squash Bug | Number of squash bug nymphs on plant | 166 | 0.46 | 0.02
Plant Type1 | Historical plant architecture data coded as vining or bush | 404 | 0.64 | 0.37
Plant Type2 | Contemporary plant architecture data coded as vining or bush | 293 | 1.00 | 0.36
Plant Vigor1 | Minimum plant vigor on a 1-5 scale | 414 | 0.54 | 0.14
Plant Vigor2 | Maximum plant vigor on a 1-5 scale | 414 | 0.54 | 0.14
100 Seed Wt. | Weight of 100 seeds in grams | 822 | 0.90 | 0.60
C. moschata | | | |
Fruit Color | Fruit color coded as orange or not orange | 140 | 0.43 | 0.13
Fruit Surface Texture | Fruit surface texture coded as smooth or not smooth | 127 | 0.18 | 0.07
Fruit Diameter | Fruit diameter in centimeters | 122 | 0.62 | 0.18
Fruit Length | Fruit length in centimeters | 121 | 1.00 | 0.18
Maturity | Fruit maturity on a scale of early to late (1-8) | 108 | 1.00 | 0.52
C. maxima | | | |
Fruit Color1 | Fruit color coded as gray or not gray | 183 | 0.53 | 0.17
Fruit Color2 | Fruit color coded as orange or not orange | 183 | 0.57 | 0.08
Fruit Color3 | Fruit color coded as green or not green | 183 | 0.46 | 0.15
Flesh Color | Flesh color on a scale of yellow to dark orange (1-5) | 231 | 0.44 | 0.09
Flesh Depth | Flesh thickness in centimeters | 251 | 0.29 | 0.01
Fruit Diameter | Fruit diameter in centimeters | 248 | 0.37 | 0.29
Fruit Length | Fruit length in centimeters | 248 | 0.49 | 0.27
Fruit Spot | Fruit spotting from slight to pronounced (1-9) | 193 | 0.40 | 0.01
Fruit Ribbing | Fruit ribbing from slight to pronounced (1-9) | 243 | 0.64 | 0.14
Powdery Mildew Susceptibility | Susceptibility to PM from slight to severe (0-9) | 211 | 0.33 | 0.06
WMV Susceptibility | Susceptibility to WMV from slight to severe (0-9) | 212 | 0.19 | 0.05
Fruit Set | Fruit set from poor to excellent (1-9) | 251 | 0.36 | 0.15
Uniformity | Fruit uniformity from poor to excellent (1-9) | 244 | 0.35 | 0.07
Vigor | Plant vigor from poor to excellent (1-9) | 251 | 0.12 | 0.00
Plant Type | Plant type as vining or not vining | 251 | 0.74 | 1.19
Days to Pollen | Number of days from field transplanting to date of first pollination | 236 | 0.52 | 0.15
Genome-wide association was conducted for all traits using standard mixed-model analysis. No significant signals were detected in C. moschata. Weak signals were detected in C. maxima for fruit set on chromosome 12 and fruit ribbing on chromosome 17. Three phenotypes were significantly associated with SNPs in C. pepo: bush/vine plant architecture on chromosome 10, fruit flesh color on chromosome 5, and fruit width on chromosome 3. The bush/vine phenotype exhibited the strongest signal; its Manhattan plot and quantile-quantile (Q-Q) plot of p-values are shown in Figure 4.
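For reference, the coordinates of a p-value Q-Q plot like the one in Figure 4 can be computed as sketched below: sorted observed \(-\log_{10} p\) values are plotted against the values expected under a uniform null distribution. The sketch uses the common \((i - 0.5)/n\) plotting positions, which is an assumption (other conventions such as \(i/(n+1)\) exist), and the function name is hypothetical.

```python
import numpy as np

def qq_coordinates(pvalues):
    """Expected and observed -log10(p) for a Q-Q plot under a uniform null.

    Both arrays are ordered from the strongest to the weakest association.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed
```

Points that follow the diagonal indicate well-calibrated tests; a departure only in the upper tail, as expected for a true signal such as the bush/vine locus, indicates associations stronger than expected by chance.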
A candidate gene for dwarfism in C. maxima, Cma_004516 [33], corresponds to gene ID CmaCh03G013600 in the Cucurbit Genomics Database. The C. pepo gene Cp4.1LG10g05740 on chromosome 10 was found to be orthologous to CmaCh03G013600 and coincides with the region significantly associated with bush/vine plant architecture by GWAS in the C. pepo collection.
A core set of accessions capturing over 99% of the total genetic diversity was identified in each panel. Roughly 10 to 20% of the accessions were required to capture the genetic diversity in the panels (see Supplemental Figures). This amounted to 245 accessions in C. pepo, 154 in C. moschata, and 248 in C. maxima. The core subset identified in C. pepo was augmented with accessions that represented key market classes or had traits of interest to breeding programs. Additionally, key accessions were selected from C. maxima, C. moschata, and some wild species. Together, these genotypes were purified through two additional rounds of selfing, and their seed will serve as the basis for a Cucurbita spp. core to be used by breeding programs and researchers in further studies.
Cucurbita pepo, Cucurbita moschata, and Cucurbita maxima exhibit a wide range of phenotypic diversity, which was evident in the GRIN phenotypic records for these species. Through genotyping-by-sequencing and genetic analysis of available specimens from the germplasm collections, we have demonstrated that there is also a wide range of genetic diversity. Thousands to tens of thousands of genome-wide markers were discovered for each species. Clustering of samples and admixture analysis produced results that align closely with known secondary centers of origin in all species. This was especially clear in our analysis of the Cucurbita pepo collection: C. pepo originated in the New World and has a secondary center of diversification in Europe, a pattern that was conspicuous in our PCA. Phylogenetic analysis of C. pepo using the genome-wide markers also supported the known relationships between the subspecies of C. pepo. Together with the mapping of a putative bush gene (Bu) that appears to be syntenic with the bush gene mapped in C. maxima, these results demonstrate that these data constitute a new, high-quality genetic resource for the cucurbit community. These markers and our analysis of available germplasm have a number of uses for breeding and for future experiments aimed at biological insight.
Our data provide many genome-wide markers that could be used to develop marker panels for breeding applications, as has been done in other crops [35]. A low-density panel could support marker-assisted selection, marker-assisted backcrossing, and purity assessment of seed stock, whereas a medium-density panel could be developed for routine genomic selection. Our clustering of samples based on marker data suggests that geography is a key driver of overall population structure. When ancestry proportions were projected onto cultivars of known market classes, they were relatively similar within market classes. Thus, although there is genetic diversity within each species, this diversity is constrained within market classes. This suggests that crosses between market classes would greatly increase the genetic diversity available to breeding efforts. Crossing between market classes would, however, come at the cost of introducing characteristics that are undesirable for the specific morphotype defining a market class. This cost could be mitigated by using markers to recover the morphotype expeditiously during pre-breeding. Ultimately, the judicious infusion of diversity into a breeding program is necessary for sustaining long-term genetic gain.
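One possible way to derive a low-density panel from the filtered SNPs is sketched below. It is a hypothetical approach, not a method used in this study: divide each chromosome into equal-width physical bins and keep the highest-MAF SNP in each bin, which spreads markers across the genome while favoring polymorphic sites. The bin count, the MAF criterion, and the function name are all assumptions, and a real panel would also need to respect assay-design constraints.

```python
import numpy as np

def pick_panel(chrom, pos, maf, bins_per_chrom=10):
    """Pick one SNP per equal-width physical bin on each chromosome,
    choosing the SNP with the highest minor allele frequency in that bin.
    Returns sorted indices of the selected SNPs."""
    chrom = np.asarray(chrom)
    pos = np.asarray(pos, dtype=float)
    maf = np.asarray(maf, dtype=float)
    selected = []
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)
        edges = np.linspace(pos[idx].min(), pos[idx].max(), bins_per_chrom + 1)
        # np.digitize puts the last position past the final edge; clip it back
        bin_id = np.clip(np.digitize(pos[idx], edges) - 1, 0, bins_per_chrom - 1)
        for b in range(bins_per_chrom):
            in_bin = idx[bin_id == b]
            if in_bin.size:
                selected.append(int(in_bin[np.argmax(maf[in_bin])]))
    return sorted(selected)
```

Raising `bins_per_chrom` moves the same procedure from a low-density panel toward the medium density suggested for genomic selection.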
Genomic selection (GS) was proposed over twenty years ago [36] and has since become a standard breeding technique. Yet, to our knowledge, GS is not used to any appreciable degree by applied breeding programs working with cucurbits. Studies specifically examining GS in squash have demonstrated, as in many other crops, that GS is a viable breeding method, although the specific implementation may vary for each program and must take into account the nature of the trait being predicted [9,11,37]. Because cucurbit crops are more space-limited than seed-limited, a predict-part-test-part or sparse testing strategy is potentially even more efficient in cucurbits than it has been shown to be in grain crops [38]. Selective phenotyping of resource-intensive quality traits, guided by marker data to enable prediction, is also low-hanging fruit. Our work lowers the barrier to entry for GS in squash, as it provides a set of markers that can be filtered independently by interested breeding programs, rapidly converted into an amplicon-based assay, and tested in target germplasm. This set can then be used for routine genotyping, which is a necessary first step toward implementing GS [39].
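To make the predict-part-test-part idea concrete, the sketch below fits ridge-regression marker effects (the RR-BLUP form of GS) on a phenotyped subset of lines and predicts genomic estimated breeding values (GEBVs) for unphenotyped lines from their markers alone. The shrinkage parameter `lam` is fixed by hand here; in practice it corresponds to \(\sigma_e^2/\sigma_\beta^2\) and would be estimated, for example by REML or cross-validation. Function names are hypothetical, and this is not a method evaluated in this study.

```python
import numpy as np

def fit_ridge(train_dosage, y, lam=1.0):
    """Ridge-regression (RR-BLUP-style) marker effects with a fixed shrinkage.

    train_dosage: lines x markers array of 0/1/2 dosages for phenotyped lines.
    Returns (intercept, training column means, marker effects).
    """
    x = np.asarray(train_dosage, dtype=float)
    y = np.asarray(y, dtype=float)
    col_means = x.mean(axis=0)               # centering uses training-set means only
    xc = x - col_means
    a = xc.T @ xc + lam * np.eye(xc.shape[1])
    beta = np.linalg.solve(a, xc.T @ (y - y.mean()))
    return y.mean(), col_means, beta

def predict_gebv(dosage, model):
    """Genomic estimated breeding values for (possibly unphenotyped) lines."""
    mu, col_means, beta = model
    return mu + (np.asarray(dosage, dtype=float) - col_means) @ beta
```

In a sparse-testing design, the training lines would be those phenotyped in a given environment, and `predict_gebv` would be applied to the lines that were genotyped but not grown there.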
Heterosis in squash lies at the interface of breeding and biology. Although there is some evidence of heterosis in squash, its basis is not well understood. Unlike many other outcrossing crops such as maize and onion, cultivars of Cucurbita, like those of sunflower, do not suffer from debilitating inbreeding depression. With little inbreeding depression, little better-parent heterosis would be expected under the dominance theory of heterosis. Initial papers suggested that inbreeding in Cucurbita may not simply reduce yield, as inbred varieties have the capacity to compete with commercial check cultivars; however, better-parent heterosis has been observed in C. pepo and C. maxima. Further, interspecific heterosis has been observed at the gene-expression level in C. moschata × C. maxima hybrids [4]. Anecdotally, interspecific crosses have led to the production of commercially successful cultivars [40]. The genetic groups identified in this study could help direct the development of heterotic groups and the study of heterosis in squash. Although there is little evidence that crossing between genetically differentiated groups leads to heterosis, these groupings can nonetheless guide the initial formation of heterotic groups. Reciprocal recurrent selection will likely be necessary to develop true heterotic groups.
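The dominance-theory argument above can be made explicit with classical quantitative-genetics results (as presented, for example, in Falconer and Mackay's textbook; the notation is theirs and is not used elsewhere in this paper). For a locus with allele frequencies \(p\) and \(q = 1 - p\) and dominance deviation \(d\) (the heterozygote's departure from the midpoint of the two homozygotes), a population with inbreeding coefficient \(F\) has mean \(M_F\) relative to the non-inbred mean \(M_0\), and the F1 between two populations whose allele frequencies differ by \(y\) shows mid-parent heterosis \(H_{F_1}\):

\[
M_F = M_0 - 2F\sum_{\text{loci}} d\,p\,q, \qquad H_{F_1} = \sum_{\text{loci}} d\,y^2
\]

Both expressions depend on the dominance deviations \(d\): if \(d\) is near zero, or not consistently in one direction across loci, both inbreeding depression and heterosis are expected to be small. Better-parent heterosis is commonly reported as \(100\,(\bar{F}_1 - \overline{BP})/\overline{BP}\), where \(\overline{BP}\) is the mean of the better parent.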
Our data provide a useful starting point for association studies. Where a trait is common in the panel, the panel can be phenotyped for the trait of interest and analyzed together with the marker data and the insight provided by this study, as we demonstrated in our association analysis of the bush gene. For a rare phenotype, such as a resistance gene, subsets of the germplasm and markers should be used to develop custom populations. Plant introductions (PIs) are frequently used as source parents in mapping studies and for germplasm improvement, as was the case for mapping Phytophthora resistance and developing resistant breeding lines [41,42]. Further, if a trait segregates closely with population structure, as was the case for seed size in C. pepo and maturity in C. moschata, populations should be formed by crossing between the groups identified here to remove the confounding effects of population structure [43]. When higher-density genotyping is necessary or the PIs are not well characterized for a trait of interest, the data generated in this study can be used to prioritize accessions for re-sequencing and phenotyping. Our GenoCore analysis provides a subset of several hundred accessions that would likely be informative for re-sequencing efforts.