CyVerse community member Ray Enke, an assistant professor of biology at James Madison University, with some of his students. (Image: JMU 4-VA)
In 2008, when a multidisciplinary group of geneticists, botanists, and plant scientists submitted the original proposal for the iPlant Collaborative, CyVerse’s nom de naissance, one of the proposal’s key components was to provide “new ways to access, understand, connect, and evaluate complex data and information... enabling effective remote use of Collaborative resources.”
That mission is now fulfilled by the suite of CyVerse products. The CyVerse Discovery Environment (DE), in particular, gives researchers worldwide an accessible web-based platform for secure data storage, sharing, analysis, and visualization.
The DE also answers the needs of researchers who struggle with data storage options at their home institution. “I have limited server access, so the DE is very helpful for me to do my research because so many of the tools that I need are already installed or can be installed easily,” said Margaret Woodhouse, a genome and plant biologist and scientific illustrator.
Woodhouse uses the DE as a portal to CoGe (Comparative Genomics), an independent platform that leverages CyVerse’s computational infrastructure to store thousands of genomic datasets and provide analytical tools to compare and analyze them.
The easy-to-use interface allows researchers or students with limited computer skills to efficiently complete computational tasks. Woodhouse also finds the DE to be a useful instructional tool for those with no previous programming experience.
Through the DE, anyone with a free user account can, among other things, access scalable tools and data; run existing bioinformatics software apps on CyVerse clusters or supercomputers; use hundreds of command-line tools and scripts without learning the command line (and easily integrate any command-line tool); and access and manage data files, workflows, and results.
Ray Enke, an assistant professor of biology at James Madison University who studies retinal development, diseases, and cell-specific differences between retinal neurons, also finds CyVerse’s free data storage through the DE especially useful. “One of the nice things about CyVerse is you can have it as a personal store for data that you’re still working on, until we’re ready to publish datasets in a public repository.”
Enke was introduced to the CyVerse DE in 2014 at a workshop at Cold Spring Harbor Laboratory, a CyVerse partner site. “The idea was to teach undergraduate faculty to integrate RNA-sequencing analyses into undergraduate curricula.” Enke and others learned to use CyVerse tools, including the DE, and then took what they had learned back to their home institutions to incorporate into undergraduate courses.
“RNA-seq fit well with my research program and also with the undergraduate courses I teach,” Enke said. He now uses the DE to share datasets with students. “It’s handy to have a single project folder that 24 students can access. I can share data easily with my students and with collaborators at other institutions.”
Enke, together with collaborators and students, recently published a Scientific Data article describing RNA-seq data generated and curated for retinal development in chickens. Enke and his team shared datasets and analyses in the DE while they worked on the publication.
“I use CyVerse for pretty much all of my sequencing data storage and sharing,” Enke said. “CyVerse has become indispensable for me.”
“The part that excites me the most is that we are only beginning to see what researchers can do as they think about analyzing all available data rather than just their own,” said Eric Lyons, a CyVerse co-principal investigator and an assistant professor in the University of Arizona’s School of Plant Sciences. “CyVerse has become an essential part of the life science research fabric, connecting researchers and their data to the computing resources they need to collaborate on a scale that has never been possible before.”