This agreement is between CyVerse and the users of the CyVerse Data Commons. Using services or data available at the Data Commons (DC), or submitting data to it, requires agreeing to the following policies. This document covers only policies specific to the DC. The CyVerse Data Policy covers policies relevant to any data hosted by CyVerse, including data in the DC. Acceptance of this document implies acceptance of the CyVerse Data Policy. Please see other CyVerse Policies for general usage of CyVerse cyberinfrastructure.
About the Data Commons
The Data Commons (DC) provides services within the CyVerse cyberinfrastructure to organize, preserve, and publish data derived from scientific research. We strive to aid researchers in creating, managing, publishing, reusing, and discovering research data by:
- Facilitating metadata entry and acquisition;
- Supporting the translation of metadata across existing metadata standards such as DataCite, Dublin Core, MIxS, or MIAPPE;
- Publishing data through the Data Commons Repository or to external repositories;
- Providing access to public data that is in the CyVerse Data Store;
- Providing persistent access to datasets through globally unique, permanent identifiers (DOIs and ARKs);
- Connecting data to analyses conducted on CyVerse platforms to support reproducible science;
- Raising data visibility and discoverability;
- Preserving datasets in secure and reliable large-scale storage systems.
DC development builds on foundational CyVerse infrastructure such as our Data Store, APIs, and user interfaces, while expanding into new areas such as metadata and ontologies, a data repository, and federation with external collaborators and repositories. Key components of the Data Commons are the web data portal at http://datacommons.cyverse.org/ (also http://dc.cyverse.org) and functions within the Discovery Environment such as metadata templates, permanent identifier requests, data submissions to NCBI, and a Projects Interface (under development).
Data Commons Mission and Vision
Infrastructure for open data from scientific research, where data can live as a searchable, discoverable, and reusable resource.
To aid researchers in managing, publishing, reusing, and discovering data.
Data Commons Functions
Data Organization and Curation
Data curation is the set of processes involved in generating and maintaining a sustainable, complete, and accurate dataset across time. In the DC, users are the primary, specialized curators of their own data, because they know their data and how it was produced. DC users are responsible for organizing and describing their data in a way that represents their research. To facilitate these activities, the DC provides functions through the Discovery Environment where users can organize and append standardized metadata to the data that they will publish, including metadata templates and bulk metadata upload. In addition, data curators on the DC team are available for consultation about how to organize data, what metadata standards are recommended for your data, and how to assign identifiers.
A data curator verifies all datasets that are submitted for publication and will contact users if they identify incomplete metadata or any issue that can be improved to present data to the public in a way that is clear for reuse. The curator does not verify the contents or quality of the data files; this is the responsibility of the researcher creating the dataset. Data in the DC is not peer-reviewed, but it may be reviewed outside CyVerse as part of a journal article.
The DC hosts public data (data accessible without a user account) that is stored in Community Released Data under the directory /iplant/home/shared and allows CyVerse users to publish data through the CyVerse Curated Data. The DC also supports publication to selected External Repositories. See the CyVerse Data Policy for a detailed description of the different types of data stored at CyVerse. The key difference between Community Released Data and CyVerse Curated Data is that Community Released Data are controlled by a community member and subject to change, whereas CyVerse Curated Data are unchanging and can only be updated by DC curators.
The following points apply to all data made available through the DC, either in Community Released or CyVerse Curated Data:
- By submitting data to the DC, you give permission to make the data publicly available.
- Data in the Data Commons are visible to anyone via the Data Commons web interface and all methods described in Downloading Data without a User Account.
- CyVerse provides access to research data so that they can be used with CyVerse tools and services. The current focus of the DC is on life sciences data; other data types will be considered on a case-by-case basis.
- Public data that are appropriate for hosting at a long-term public data repository (e.g., NCBI, Dryad, or TreeBASE) should be deposited at that repository. If there is a valid reason to duplicate data from a public repository on the CyVerse Data Store (e.g., the dataset is enhanced with additional data or features or is actively being used for analysis with the CyVerse cyberinfrastructure), the depositor should specify this in their request for Community Released Data or when requesting a permanent identifier.
- Any data, including derived data, shared with the public through CyVerse must comply with any copyright or reuse restrictions placed on the original source data.
- If there is a dispute about data that is uploaded or published to the DC, the data will not be displayed publicly until the dispute is resolved.
- When necessary or preferable for technical reasons, CyVerse may mirror or replicate existing reference databases. For data provided to CyVerse by a reference database, CyVerse will comply with the policies asserted by the specific data source.
- Data published to any external repository via CyVerse services is subject to the terms and conditions of that repository.
- There is no limit to dataset size in the DC. When you request a Community Released folder, you must specify the expected size of the data, and an allocation increase will be considered simultaneously. If you plan to publish a dataset to CyVerse Curated Data that is larger than the default allocation of 100GB, you must first request an allocation increase, so that you can upload and organize your data in your private directory. As part of your allocation increase, you should indicate that you plan to publish the data to CyVerse Curated Data. For methods of uploading data into the CyVerse Data Store, see Downloading and Uploading Data.
Community Released Data
Community Released Data are available for evolving datasets that individuals or communities want to make available as quickly as possible for research and reuse. Community Released Data are intended for datasets that are growing or changing frequently or that may not need long-term preservation. Data can transition from Community Released to CyVerse Curated by requesting a permanent identifier.
In addition to the policies above for all data in the DC, the following apply specifically to data in Community Released Data:
- Community Released Data are required to meet minimal metadata standards, as described in Preparing Community Released Data. The owner of the folder maintains control over data organization.
- Community Released Data are owned by the user who requests the folder and count toward that user's allocation. However, it is understood that users owning Community Released Data will have larger total data allocations, and their personal allocation will not be penalized for hosting a public folder.
- To request a Community Released Data folder, use the Community Released Data Request Form. You can simultaneously request an additional allocation with this form.
- Data in Community Released Data can be published to a repository, but the published data should move out of the Community Released Data folder, unless a formal request is made to continue to house them there, for example, for use in CyVerse analysis tools.
- You may keep part of the data in a Community Released Data folder private (for example, data being prepared for release or supporting content such as data management documents), but Community Released Data folders are intended to hold primarily public data.
CyVerse Curated Data
Data publication to CyVerse Curated Data is a service offered for datasets that are intended to be stable and permanent. For CyVerse Curated Data, the DC provides landing pages and permanent Digital Object Identifiers (DOIs) or Archival Resource Keys (ARKs), and requires that an open data license be included. Permanent identifiers allow data to have a stable location on the web so that other users can always find it, along with the information that makes it understandable, citable, and reusable. An open data license is important to allow others to reuse your data, but it does not exempt users from the obligation to correctly cite your data.
In addition to the policies above for all data in the DC, the following apply specifically to CyVerse Curated Data:
- Before requesting a permanent identifier, you must determine if your data is ready to publish in the DCR.
- CyVerse Curated Data must meet minimum requirements for data organization and metadata, as well as including a ReadMe file and inventory, as specified in the CyVerse Curated Data Organization Guidelines. This includes using the DOI Request - DataCite Metadata template available through the Discovery Environment.
- Additional scientific metadata for both the home data folder and elements contained in the folder are highly recommended and may be required in some cases.
- Metadata are displayed on Data Commons landing pages. The landing page is the best advertisement for a research project, and it is the user's responsibility to provide complete and accurate data and metadata about the project for display on the dataset landing page.
- A citation will be generated automatically from your metadata. If you have special citation requirements, you may include them in the How-to-Cite metadata field and in the ReadMe file.
- CyVerse Curated Data generally does not publish data if a canonical repository for the data type already exists (e.g., NCBI, TreeBASE).
- For data deposited in CyVerse Curated Data, data depositors maintain intellectual ownership and authority over the data, but no longer have the ability to edit the data or metadata. To make changes to the data published in CyVerse Curated Data, contact email@example.com.
The DC provides documented and easy-to-use workflows for users who want to publish data through canonical repositories such as NCBI.
- For more information, see Publishing through the Data Commons.
- Data published to any external repository via CyVerse services is subject to the terms and conditions of that repository.
The DC fully supports reuse of the data it hosts. If you download or reuse any data in the DC, you must:
- follow the conditions that are stated in the data license for the dataset(s) you use.
- follow any conditions for data reuse stated in the associated metadata and ReadMe files.
- cite the dataset by its DOI, using the citation information available on the dataset landing page, if you use data stored in the DC for work that produces a publication.
New data derived from original DC data may be distributed only under terms and conditions established by the creators of the data and stated in the license.
Long-term Preservation and Access to Data in CyVerse Curated Data
CyVerse Curated Data are stored in a high-performance storage resource that has built-in redundancy and is continuously monitored for security and failure, and they are synchronously backed up at both the University of Arizona in Tucson and the Texas Advanced Computing Center in Austin. At ingest into CyVerse Curated Data, data are manually checked for organization, format (to ensure that they are readable by non-proprietary software), completeness of metadata, and inclusion of a ReadMe file. An md5 checksum is generated and displayed as part of the file's metadata so that users can check its authenticity.
Data and metadata in CyVerse Curated Data are visible to anyone via the Data Commons web interface and via all methods described in Downloading Data with a User Account. Through a contract with EZID, CyVerse is committed to the long-term preservation of CyVerse Curated Data. If CyVerse Curated Data services are discontinued for some reason, we will make arrangements to transfer the published data free of charge to another long-term repository that will sustain access to the data and metadata, and the DOIs will be redirected to the new location. All CyVerse users will be notified of the new location of CyVerse Curated Data before the move is completed.
Data and metadata in Community Released Data in the DC are not guaranteed for long-term preservation. Community Released Data that are in active use (have been accessed in the past year) are available via the Data Commons, can be accessed without a CyVerse user account using any of the methods described on Downloading Data without a User Account, and are searchable by anyone with a CyVerse user account through the Discovery Environment.
Data in the Data Commons that are inactive (have not been accessed in over one year) may be moved to a long-term storage archive, where they will be available upon request.
THE SERVICES AND DATA OF THE CYVERSE DATA COMMONS ARE PROVIDED "AS IS." NO WARRANTIES OR REPRESENTATIONS ARE MADE RELATING TO THE DC OR ANY DOCUMENTATION. NO WARRANTY IS PROVIDED THAT THE DATA COMMONS PORTAL OR ANY DATA WILL SATISFY ANY REQUIREMENTS, THAT THE DC OR ANY OF THE DATA THEREIN IS WITHOUT DEFECT OR ERROR, OR THAT OPERATION OF CYVERSE WILL BE UNINTERRUPTED. ALL TERMS AND CONDITIONS OF THE CYVERSE SERVICE LEVEL AGREEMENT, CYVERSE DATA POLICY, AND CYVERSE ACCEPTABLE USE POLICY APPLY TO THE DATA COMMONS, INCLUDING THE FOLLOWING POINTS:
- Prohibited content: Users shall not post, transmit, or store data or content on or through CyVerse servers which, in CyVerse's sole determination, constitutes a violation of any federal, state, local, or international law, regulation, ordinance, court order or other legal process, as detailed in the Acceptable Use Policy.
- Abuse and unacceptable uses: Users are prohibited from engaging in any activities that CyVerse determines, in its sole and absolute discretion, to constitute abuse or unacceptable uses, as detailed in the Acceptable Use Policy.
- Intellectual property infringement: Users may not transmit, distribute, download, copy, cache, host, or store on a CyVerse VM or server any information, data, material, or work that infringes the intellectual property rights of others or violates any trade secret right of any other person/user. See the Acceptable Use Policy.
- Copyright: Any data, including derived data, shared with the public through CyVerse must comply with any copyright or reuse restrictions placed on the original source data (Data Policy).
- Disputes: If there is a dispute about data that is uploaded or published to the DC, we will not display the data publicly until the dispute is resolved (Data Policy).
Agreement and Policy Subject to Change
The functionalities, business model, and characteristics of the DC are continually improving; thus details of this agreement and policy are subject to revision every three months.