Genomes by Environment Dataset Now Publicly Accessible

"We have this direct connection between environmental data and phenotypic performance... this is a very unique dataset." - GxE project coordinator Natalia de Leon. Image composite by Mariah Wall/CyVerse, with aerial photo of a 2014 GxE trial courtesy of Natalia de Leon/GxE.

A unique dataset that directly links crop genotype and phenotype information with environmental data is now publicly accessible in the CyVerse Data Commons.

The dataset was compiled by collaborators of the Genomes by Environment (GxE) subproject of the Genomes to Fields Initiative (G2F). G2F aims to catalyze and coordinate research linking genomics and predictive phenomics to generate agricultural and social advancements. The goal of GxE is to collect and make available large-scale datasets that connect crop productivity with environmental data. These datasets could improve our understanding of how crop genotypes interact with their environments and lead to the development of crops that are better able to withstand climate change.

“These efforts are focused on identifying mechanisms to improve our ability to predict how plants perform in typical production environments using molecular information,” said Natalia de Leon, an associate professor of agronomy in the College of Agriculture and Life Sciences at the University of Wisconsin-Madison, a co-leader of G2F, and one of the coordinators of GxE.

“GxE is focused on accumulating information to understand how maize diversity responds to different environments,” she said. “We aim to explain how specific signatures in the genome of an individual would affect its ability to respond to a particular environmental cue and what internal resources that plant deployed in the process. Incorporating that level of information will enhance models that predict plant growth.”

Sixteen researchers across North America, from Texas to Ontario, Canada, grew a set of genetically characterized, diverse maize hybrids at 22 sites. They measured productivity and other relevant agronomic and phenological traits, and recorded field-level data such as planting and harvesting times, inputs used, and planting density. Identical weather stations were installed at all sites so that the same environmental parameters were recorded with comparable equipment. Subsets of these sites also deployed different field-based phenotyping tools to test their utility for phenomic prediction.

CyVerse’s Data Store serves as a common data platform that all team members can access regardless of where they are located. The Data Store allows GxE project members to control who can contribute to and view their evolving datasets. As datasets are completed, they can easily be published to the CyVerse Data Commons, where they receive permanent identifiers in the form of Digital Object Identifiers (DOIs) and are publicly available to anyone, with or without a CyVerse account.

Ramona Walls, a CyVerse senior science informatician, was excited to host the GxE data in the Data Commons. “Sharing these types of community-generated, large-scale data was precisely why the Data Commons was created. CyVerse can now support data throughout their whole life cycles, from generation and initial analysis to publication and reuse in new analyses.”

The published dataset contains measurements from 2014, the project’s initial year. The experiment has been ongoing ever since, with new field sites added each year. The team expects data for 2015 to become available by March this year, and to release data annually thereafter.

The dataset should be of particular interest to breeders, quantitative geneticists, modelers, and researchers working on predicting plant performance, de Leon noted. “We have this direct connection between environmental data and phenotypic performance. For those interested in incorporating environmental data into their prediction tools, this is a very unique dataset.” She added that the dataset and infrastructure as a whole represent an invaluable resource for testing high-throughput phenotyping tools and other technologies.

The GxE project has pulled together funds from the Iowa Corn Promotion Board (ICPB) as well as numerous corn growers’ associations in other states, and has received funding for information management from the National Corn Growers Association. Individual investigators also have leveraged broad-ranging resources from their own institutions.

“This has really been a grassroots effort,” said de Leon. “This project initiated out of the motivation of the investigators, who decided this was an important thing to do.”

Corn growers across the country agree. “Ever since the corn genome was published in 2009, corn farmers have continued to push for translation of this sequence information into tangible results in their fields,” said David Ertl of the ICPB. “ICPB got involved in funding the G2F Initiative to assist the process of understanding how genes impact corn performance. The GxE project is a large-scale, multi-year project to understand how genes and environment influence the phenotype of corn, and then to use that knowledge to predict and create new, more productive, more resilient genotypes, resulting in improved phenotypes for farmers.”

The GxE 2014 dataset can be accessed at dx.doi.org/10.7946/P2201Q.