Project Details: SOYGEN2: Increasing SB genetic gain for yield & seed composition by developing tools, know-how & community among public breeders in the NC US (2021)

Contributor/Checkoff:

North Central Soybean Research Program

Category:

Sustainable Production

Keywords:

GeneticsGenomics

Parent Project:

Increasing the rate of genetic gain for yield in soybean breeding programs

Lead Principal Investigator:

Leah McHale, The Ohio State University

Co-Principal Investigators:

Asheesh Singh, Iowa State University
William Schapaugh, Kansas State University
Dechun Wang, Michigan State University
Katy M Rainey, Purdue University
Brian Diers, University of Illinois at Urbana-Champaign
Matthew Hudson, University of Illinois at Urbana-Champaign
Nicolas Frederico Martin, University of Illinois at Urbana-Champaign
Aaron Lorenz, University of Minnesota
Pengyin Chen, University of Missouri
Andrew Scaboo, University of Missouri
George Graef, University of Nebraska
David Hyten, University of Nebraska at Lincoln

+11 More

Project Code:

GRT00060503

Contributing Organization (Checkoff):

North Central Soybean Research Program

$531,450

United Soybean Board

$178,550

Institution Funded:

The Ohio State University

$710,000

Information And Results

Project Summary

The soybean research community has generated incredible public resources for soybean breeding, including collaborative yield trials such as the Northern Uniform Soybean Trials (NUST), which date back to 1941, and commodity-board-funded genotypic data and genotyping platforms. However, these tools can be better leveraged to enhance gains for yield and seed composition in soybean. As part of our first objective, we propose to add value and utility to these resources through development of a breeding database that will be housed within SoyBase, the current community-supported USDA-ARS repository for soybean genetics and genomic data. We also propose the addition of environmental data to the NUST and the addition of genotypic data to both the NUST and the SCN Regional Trials, both of which will facilitate breeding objectives for stability of yield and seed composition.

Genomics-assisted breeding entails the use of genome-wide molecular marker data to aid in breeding decisions that make breeding programs more efficient and effective. Such applications range from the use of genomic selection, which can increase selection intensity and allow selection of parents earlier in a program, to the use of genomic data to optimally pair parents for creation of breeding populations containing more superior breeding lines and even possibly more favorable correlations between traits such as seed yield and protein. This latter application has been called “genomic mating”.

Numerous scientific articles have been published on the development and optimization of genomics-assisted plant breeding and, in part through our current NCSRP project, we have learned a lot about the optimal application of genomics-assisted breeding methods applied to soybean. The actual implementation of genomics-assisted breeding in the public plant breeding communities, however, has been minimal. Thus, Objective 2 is focused on the development and use of high-throughput genome-wide genotyping technologies that are of low cost with high-quality repeatable marker data, and making available tools for genomic data management and decisions that integrate genomic data and phenotypic data along with various analysis pipelines in a user-friendly form. The transfer and availability of these technologies to the public sector is critical to our ability to effectively train future soybean breeders, many of whom will be employed by private sector companies using these techniques.

Increases in soybean yield through breeding have been slower than growers expect. A collaborative study led by Diers of a historic set of MG II-IV varieties released from 1923 to 2008 revealed a recent rate of genetic gain of 0.43 bu/ac/yr, whereas reports of genetic gain in corn generally range from 1.0 to 1.2 bu/ac/yr. Moreover, this same study found that protein has decreased between these time periods by 1.7 percentage points, an undesirable outcome. Based on the mathematical formula for change resulting from selection, there are a number of possible targets for improving the rate of genetic gain. Objective 3 of this work focuses on the evaluation of different breeding methods each of which target one or more areas for improvement, such as selection intensity, accuracy, diversity, and the time required for each breeding cycle, and simultaneous improvement of traits that typically show negative correlations, such as yield and seed protein content. Breeders will implement and test the methods in their own breeding programs to determine which methods are most viable to improve genetic gains.
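The "mathematical formula for change resulting from selection" is commonly written as the breeder's equation. As a rough illustration, assuming the standard textbook form (gain per year = selection intensity x selection accuracy x additive genetic standard deviation / cycle length in years), the sketch below shows how one of the listed targets, cycle time, changes the expected gain. The function name and all numeric inputs are hypothetical placeholders, not project data.

```python
def genetic_gain_per_year(intensity, accuracy, genetic_sd, cycle_years):
    """Expected genetic gain per year: (i * r * sigma_A) / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * genetic_sd / cycle_years


# Hypothetical baseline values (placeholders, not project data).
baseline = genetic_gain_per_year(
    intensity=1.4, accuracy=0.5, genetic_sd=3.0, cycle_years=5.0
)
# Same hypothetical program with the breeding cycle cut in half.
faster = genetic_gain_per_year(
    intensity=1.4, accuracy=0.5, genetic_sd=3.0, cycle_years=2.5
)
print(f"baseline: {baseline:.2f} bu/ac/yr, halved cycle: {faster:.2f} bu/ac/yr")
```

Because the terms multiply, a proportional improvement in intensity, accuracy, or genetic variation has the same effect on the predicted gain as the same proportional reduction in cycle time.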

The proposed activities build on the current project funded to this group by NCSRP, “Increasing the rate of genetic gain for yield in soybean breeding programs.” One main objective in that project deals with extensive evaluation of diverse soybean genotypes from the USDA Soybean Germplasm Collection over four years and 30 environments to obtain high-quality phenotype and environment data. Completion of and follow-up on that evaluation are detailed under Objective 4 in this project, and it provides foundational information for tool development and implementation here. Information from that study will be leveraged in this project for Objectives 1, 2, and 3. The entire set of 750 accessions, or various subsets of it (e.g., exotic landraces only, elite germplasm only, or certain geographical regions only), can be used as training sets for prediction of yield, seed composition traits, maturity, and other traits for various objectives and for other programs.

Ultimately, in this project SOYGEN (Science Optimized Yield Gains across Environments) will leverage and build upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US.

Project Objectives

Objective 1: Elevating collaborative field trials
Objective 2: Development of a genomic breeding facilitation suite
Objective 3: Evaluation of soybean breeding methods that increase gain
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

Project Deliverables

1.1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
1.2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
1.3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
1.4) Breeders will better understand how to weight data from different environments of the Uniform Tests to know how well it will predict performance.
2.1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
2.2) Workshop on genomic selection delivered to public soybean breeding community.
3.1) Methods to improve selection of progeny rows based on genomic selection with secondary traits and/or improved spatial statistics.
3.2) Understand the potential to improve the unfavorable correlation between yield and protein in soybean through genomic mating.
3.3) Application and limitations established for rapid cycling genomic selection in soybean.
3.4) Characterization of allelic effect of putative yield alleles and markers for their selection.
4.1) Comparison of sampling methods and effective ways to efficiently sample the genotype collection, particularly for improvement of quantitative traits like yield.
4.2) Means and variances for traits in the different sampling groups (so, effect of sampling method on those estimates).
4.3) Identify loci associated with yield and other traits in this diverse panel of accessions that represents the genetic diversity in the collection, so we may ID new loci and alleles that will be useful for commercial and public breeding programs.
4.4) Provide genomic predictions for yield (done), maturity, seed protein and oil %, and other traits as appropriate, for all untested accessions in the USDA Soybean Germplasm Collection.
4.5) Investigate genotype-environment interaction effects on traits, and evaluate stability of yield and composition traits across environments.
4.6) Use data/results in implementation in Objectives 1, 2, and 3 of this project (FY20-22).
4.7) Preliminary analysis of data from the validation set of 250 entries.

Progress Of Work

Updated April 2, 2021:
Objective 1: Elevating collaborative field trials
1c. Key performance indicators
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping.
(4) Weather data will be collected for the majority of the future NUST field environments.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials with the DAYMET weather data. This information, along with field trial phenotypic information, will be used to compare the year-to-year similarity of trial sites.
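One possible way to compare site-years from weather summaries is sketched below. It standardizes each weather variable across site-years and uses Euclidean distance as a dissimilarity measure. This is an illustrative assumption rather than the project's method; the site names, variables, and values are invented placeholders.

```python
import math
from statistics import mean, pstdev

# Hypothetical growing-season summaries per site-year:
# (total precipitation in mm, mean daily maximum temperature in deg C).
site_years = {
    ("SiteA", 2019): (520.0, 27.5),
    ("SiteA", 2020): (410.0, 29.0),
    ("SiteB", 2020): (430.0, 28.8),
}


def standardized(table):
    """Scale every weather variable to mean 0 and SD 1 across site-years."""
    keys = list(table)
    columns = list(zip(*(table[k] for k in keys)))
    scaled_cols = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return {k: tuple(col[i] for col in scaled_cols) for i, k in enumerate(keys)}


def distance(a, b):
    """Euclidean distance between two standardized weather profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


z = standardized(site_years)
d_same_site = distance(z[("SiteA", 2019)], z[("SiteA", 2020)])
d_same_year = distance(z[("SiteA", 2020)], z[("SiteB", 2020)])
print(f"SiteA 2019 vs 2020: {d_same_site:.2f}")
print(f"SiteA vs SiteB in 2020: {d_same_year:.2f}")
```

Smaller distances indicate more similar growing conditions. Standardizing first keeps a variable with large numeric values, such as precipitation in mm, from dominating the comparison.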
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
1d. Deliverables
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
Database tables and draft query user interfaces have been created. Beta testing of the interface by project participants continues.
(2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
Phenotypic data from collaborative trials from 1989 to the present have been loaded into the data tables and are accessible to project participants. Environmental data will be available through an interface to the DayMet meteorological API.

Objective 2: Development of a genomic breeding facilitation suite
2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have received 7,730 DNA samples to run with the 1k SNP set. We are currently processing these samples. The first 2,592 are in the process of being sequenced.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
The scripts were completed and are being tested in the Lorenz laboratory. We have been working to improve their accuracy and iterating new versions to make the scripts more useful in different use cases.
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
During this past reporting period we were able to install a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/). We have deposited our current genome-wide marker data into this, including all the genotype data collected on the UT as part of this project. A workflow of software tools and scripts was initiated to seamlessly combine data held in this database with phenotypic data and genomic prediction models to ease the use of genomic selection in a practical breeding context. A few steps still need to be developed, such as low-to-high marker density imputation and training population optimization. The current postdoc left for a permanent position, and we are seeking another postdoc to continue this work.
On a related front, co-PI Nelson, with input from Lorenz, is researching the adoption of a platform called BreedBase (breedbase.org). We hope it can be installed at SoyBase and made available to public breeders for depositing phenotypic and genotypic data and for facilitating the use of genome-wide marker data in breeding. This is in the early stages of development right now.
2e. Deliverables
1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
This first batch of 7,000 lines is helping us to streamline our submission process and determine what parts of the genotyping process need to be improved for this summer.

Objective 3: Evaluation of soybean breeding methods that increase gain
3c. Key performance indicators
(1) Preliminary single-site validation of spatial statistics and selection of added growth stage and/or drone-based phenotyping and soil parameter factors (Task 1).
Preliminary yield prediction models have been run on single-location progeny rows from 2019 using elastic net, ridge regression, and lasso. Preliminary results show an RMSE of 7 bu/acre and an R² of 0.69. In these models, relative maturity and pedigree information have the largest effect on yield. Soil parameters and canopy area have also shown some significance. Soil data are extracted using fine-scale soil maps generated in collaboration with soil scientist Dr. Miller and his postdoc Dr. Khaledian. With these soil maps we get soil nutrient data (N, P, K, Ca, Mg, CEC, NO3, OM) as well as soil texture data on a 3 m x 3 m scale. Further machine learning models and selection criteria are being developed with Dr. Sarkar and his graduate student Luis Riera.
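For readers comparing such models, the two reported metrics can be computed as in the minimal sketch below. It assumes that R² refers to the coefficient of determination, which the report does not state explicitly. The observed and predicted yields are invented placeholder values, not the project's data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the trait (e.g. bu/acre)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Placeholder yields in bu/acre for four hypothetical progeny rows.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 77.0]
print(f"RMSE = {rmse(observed, predicted):.2f} bu/acre")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

RMSE is in the trait's own units, so it can be read directly against typical plot yields, whereas R² is unitless and describes the share of yield variation the model accounts for.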
(3) Validation and selection of spatial statistics and added factors based on multi-location data (Task 1).
In collaboration with statistician Dr. Dutta and his graduate student, Dongjin Li, we have prepared a tutorial using the statgenSTA R package. This tutorial includes videos and an HTML notebook showing the steps from data preparation through fitting and running models to outlier analysis. The statgenSTA package allows users to fit traditional non-spatial models, as well as spatial models, by including row and column information as well as replications. Users can use the lme4, SpATS, or asreml packages for fitting the data. This tutorial will be shared with the breeding community prior to the fall season. We used the SpATS engine, which uses a penalized spline for spatial correction. This allows for a more dynamic spatial correction than the traditional moving-means correction. We also used this tool in our spatial adjustments for 2020 yield trials and compared it with the traditional moving-means method that we have used in the past. We have not yet validated which method gives more accurate selection results; this work is ongoing.
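To make the comparison concrete, the following is a simplified sketch of a moving-means adjustment on a row-by-column plot grid. It assumes each plot is corrected by the deviation of its neighbors' mean from the overall trial mean. The function name, window size, and yield values are illustrative assumptions; this is not the project's implementation, and it does not reproduce the penalized-spline model used by SpATS.

```python
def moving_mean_adjust(grid, radius=1):
    """Adjust each plot by (mean of neighboring plots - overall trial mean).

    grid: list of rows, each a list of plot yields (rectangular layout).
    radius: how many rows/columns around a plot count as neighbors.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    overall = sum(sum(row) for row in grid) / (n_rows * n_cols)
    adjusted = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Collect neighbors inside the window, excluding the plot itself.
            neighbors = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            # A plot with no neighbors (1 x 1 grid) is left unadjusted.
            local_effect = sum(neighbors) / len(neighbors) - overall if neighbors else 0.0
            out_row.append(grid[r][c] - local_effect)
        adjusted.append(out_row)
    return adjusted


# Hypothetical yields (bu/acre) with higher values toward the right-hand columns.
field = [
    [40.0, 42.0, 50.0],
    [41.0, 43.0, 52.0],
    [39.0, 44.0, 51.0],
]
for row in moving_mean_adjust(field):
    print([round(v, 1) for v in row])
```

A fixed window like this applies the same neighborhood everywhere in the field, which is one reason a smooth spline surface can be more flexible.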
(4) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
7000 advanced lines have been submitted and are in the process of being genotyped (see Objective 2).
(8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for their consideration when planning 2021 crosses.
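As a hedged illustration of how candidate crosses might be ranked by superior progeny mean, the sketch below assumes a simple additive model in which the cross mean is the mid-parent value and the superior progeny mean is the cross mean plus selection intensity times the predicted progeny standard deviation. The line names, breeding values, and standard deviations are hypothetical. In practice the progeny variance would itself be predicted from marker data, which is not shown here.

```python
from itertools import combinations
from statistics import NormalDist


def selection_intensity(selected_fraction):
    """Standardized selection intensity for truncation selection on a normal trait."""
    z = NormalDist().inv_cdf(1.0 - selected_fraction)
    return NormalDist().pdf(z) / selected_fraction


def superior_progeny_mean(cross_mean, progeny_sd, selected_fraction=0.10):
    """Expected mean of the best `selected_fraction` of progeny from a cross."""
    return cross_mean + selection_intensity(selected_fraction) * progeny_sd


# Hypothetical genomic-estimated breeding values for yield (bu/ac).
gebv = {"LineA": 62.0, "LineB": 60.0, "LineC": 58.0}
# Hypothetical predicted progeny standard deviations for each cross.
progeny_sd = {
    ("LineA", "LineB"): 1.0,
    ("LineA", "LineC"): 3.0,
    ("LineB", "LineC"): 2.0,
}

ranking = sorted(
    (
        (superior_progeny_mean((gebv[p1] + gebv[p2]) / 2, progeny_sd[(p1, p2)]), p1, p2)
        for p1, p2 in combinations(sorted(gebv), 2)
    ),
    reverse=True,
)
for usefulness, p1, p2 in ranking:
    print(f"{p1} x {p2}: {usefulness:.2f}")
```

With these placeholder numbers, the cross with the highest mid-parent value is not the top-ranked one, because a larger predicted progeny variance can outweigh a lower mean.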
(9) Advance generation by single seed descent for generated crosses in (8) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY22 (Task 4).
Because we were unable to obtain a material transfer agreement (MTA) from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. F2 seed is currently being generated.
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
Crosses were made in Nebraska in summer 2020, and F1 seeds were sent to Puerto Rico to grow F1 plants from October ’20 to January ’21. Intermating among F1 plants was attempted, but virus problems in Puerto Rico prevented us from obtaining all of the F1 x F1 crosses. Instead, F2 seeds were harvested from all of the confirmed F1 plants, and we are now crossing among F1:2 lines for the second intermating.

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success
4c. Key performance indicators
(1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
Predictions for crosses are currently being obtained.

Updated November 8, 2021:
Objective 1: Elevating collaborative field trials
1c. Key performance indicators
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping. All materials from the 2021 UT and 2020 SCN UT were sampled, and DNA isolation will commence shortly.
(4) Weather data will be collected for the majority of the future NUST field environments.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials with the DAYMET weather data. This information, along with field trial phenotypic information, will be used to compare the year-to-year similarity of trial sites.
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.

Objective 2: Development of a genomic breeding facilitation suite
2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have received 9620 DNA samples to run with the 1k SNP set. Thus far, 6764 have been genotyped. The remaining samples are in the process of being sequenced.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
The scripts were completed and are being tested in the Lorenz laboratory. We have been working to improve their accuracy and iterating new versions to make the scripts more useful in different use cases.
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
We installed a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/) and deposited our current genome-wide marker data into this, including all the genotype data collected on the UT as part of this project. A workflow of software tools and scripts was initiated to seamlessly combine data held in this database with phenotypic data and genomic prediction models to ease the use of genomic selection in a practical breeding context.
Collaborator Rex Nelson (soybase.org) is working to implement a version of BreedBase for soybean breeders. An overview of the software package was given to all PIs on the project, who unanimously agreed on its utility. The BreedBase team has agreed to allow an instance in their cloud account for our work, which will make installation and implementation significantly simpler.

Objective 3: Evaluation of soybean breeding methods that increase gain
3c. Key performance indicators
(1) Grow single rep progeny row and preliminary yield trials and test two different methods of spatial adjustments (Task 1).
Code and full tutorials for the selection process were shared with the entire research group during the last reporting period.
(3) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
More than 6000 advanced breeding lines have been genotyped for the development of genomic selection models (see Objective 2).
(4) Genotyping of 2500 F4 lines in two years for each participating breeding program (Task 3).
DNA has been submitted for genotyping (+1000; Objective 2) while more are in process.
(8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for their consideration when planning 2021 crosses. Crosses with model-predicted parents and breeder-selected parents were carried out during summer 2021 for 8 breeding programs.
(9) Advance generation by single seed descent for generated crosses in (8) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY22 (Task 4).
Because we were unable to obtain a material transfer agreement (MTA) from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. Markers were developed for four loci predicted to be selected for yield. F2 lines have been genotyped and harvested. Seed will be sent to a winter nursery in Puerto Rico, where F2:3 families will be produced for homozygous alleles and F3 inbred lines will be produced to further the generation of near-isogenic lines derived from heterogeneous inbred families.
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
Crosses were made in Nebraska in summer 2020, and F1 seeds were sent to Puerto Rico to grow F1 plants from October ’20 to January ’21. Intermating among F1 plants was attempted, but virus problems in Puerto Rico prevented us from obtaining all of the F1 x F1 crosses. Instead, F2 seeds were harvested from all of the confirmed F1 plants, and we are now crossing among F1:2 lines for the second intermating.

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success
4c. Key performance indicators
(1) Compile and annotate the data for the validation study.
As we did for yield and agronomic traits previously, we conducted a genome-wide association analysis for each of the seed composition traits using the multi-year, multi-location phenotype data collected as part of this project, along with the existing 50K genotype data from the collection. The association analyses were conducted by sampling group (CLU, RAN, SSD) and over all lines together. Results for some of the seed composition traits are shown in Figures 4 to 6. We are continuing with analysis and interpretation of these results, identifying significant SNPs and underlying genes for each of the traits.

Final Project Results

Benefit To Soybean Farmers

Ultimately, in this project SOYGEN (Science Optimized Yield Gains across Environments) will leverage and build upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US. This work will result in greater genetic gains in soybean for yield, as well as any other targeted trait. This will translate to improved cultivars which achieve higher yields and higher quality.

Under the United Soybean Research Retention policy, final reports will be displayed with the project once completed, but working files will be purged after three years and financial information after seven years. All pertinent information is in the final report; for more information, please contact the project lead at your state soybean organization or the principal investigator listed on the project.