In the age of large, distributed, digital data, the CyVerse Data Store offers solutions to many contemporary data storage needs. CyVerse's cloud-based data storage is optimized for large data, free to most scientific researchers, and accessible through multiple interfaces, and it leaves access control in the hands of the data owners. This policy describes the key features of the CyVerse Data Store as well as the obligations between CyVerse and its users.
- This policy covers any data stored on the CyVerse Data Store, whether private, shared, or public.
- The Data Store provides research scientists, research groups, and research organizations with private, shared, or public storage allocations primarily for use within the CyVerse cyberinfrastructure (CI).
- The CyVerse Data Commons houses public data within the Data Store for use by the research community either within or outside CyVerse CI.
- The Data Store offers reliable, secure storage for datasets of any size that are actively being used for research or education.
- Data and metadata in the Data Store are stored in a high-performance storage resource that has built-in redundancy and is continuously monitored for security and failure.
- Data in the Data Store are synchronously backed up at both the University of Arizona in Tucson and at the Texas Advanced Computing Center in Austin.
Data Storage Options
- All domestic or international CyVerse users receive 100 GB of storage at no cost, to be used for research data.
- Additional storage allocations can be requested using the allocation increase request form.
Private and Shared Data
- Private data is stored in a user's home folder (/iplant/home/$user, where $user is a CyVerse username). You may share data you own with other registered users via the Discovery Environment or iCommands, or create public links to the data.
- For methods of moving data in and out of the CyVerse Data Store, see Downloading and Uploading Data.
- You are automatically the owner of any data you upload into your home folder.
- Any data in your home folder counts toward your total data allocation, whether shared with others or not. Data that other users share with you does not affect your allocation, even if they make you an owner.
- For data shared within a lab group or with a few collaborators, we recommend that the project leader or PI create a shared folder within their home folder. The project leader can request a larger allocation for collaborative projects, if needed.
- For community-based projects or collaborations involving many PIs, one of the PIs can request a Community Data folder. Community Data folders are intended for large collaborative projects that will ultimately produce public datasets.
- A Community Data folder is requested by an individual CyVerse user, who becomes the owner, but multiple collaborators must be listed on the application. Community Data folders count toward the owner's allocation, but it is understood that owners of Community Data folders will have larger total data allocations, and their personal allocation will not be penalized for hosting a community folder.
- To request a Community Data folder, use the Community and Public Folder Request form. You can simultaneously request an additional allocation for your Community Data folder.
- Community Data folders are housed in the /iplant/home/shared directory and are visible via the Discovery Environment under "Community Data." Any user who has permission to view the folder can also access its contents using any of the methods available to logged-in CyVerse users. The owner controls who can access the data within a Community Data folder, but it is expected that multiple community members will have read and write access to the folder.
- We recommend that any shared folder have one owner, and at most two, because owners have permission to permanently delete data and remove other users.
- Data in a Community Data folder can be published to a repository, but the published data should move out of the CyVerse Data Store, unless a formal request is made to continue to house them as part of the Data Commons for use with CyVerse analysis tools.
- If all or part of a Community Data folder is made public, it becomes part of the Data Commons (below) and is subject to minimum metadata requirements.
- Note: Community Data folders will be phased out. All public folders shared by a community will be considered Community Released Data in the Data Commons (see below). Private folders shared by communities will be managed as groups, with similar functionality to that now provided.
Public Data on CyVerse: The CyVerse Data Commons
- The CyVerse Data Commons manages all public data (data accessible without a user account) stored in Community Released Data under the directory /iplant/home/shared, and allows CyVerse users to publish data through CyVerse Curated Data. The Data Commons also supports publication to external data repositories. "Public data" on CyVerse is defined as any data that is visible to the public via datacommons.cyverse.org, whether or not the viewer has a CyVerse user account. Public data on CyVerse is also available to registered users via all methods used to access the CyVerse Data Store.
- Read the Data Commons User Agreement for more policies on sharing and reusing data in the Data Commons.
Community Released Data Folders
- Individuals or groups of researchers who want to make data public without a permanent identifier can request a Community Released Data folder. Community Released Data folders are intended for datasets that are growing or changing frequently, or that may not need long-term preservation.
- Read the Data Commons User Agreement for more policies on sharing and reusing data in Community Released Data Folders.
CyVerse Curated Data
- CyVerse supports publication of data to CyVerse Curated Data and to selected external repositories. Published data is considered stable, because it should not change unless a new version is published, and because the publisher makes a commitment to long-term access to the data.
- Datasets published in the DCR have permanent identifiers: Digital Object Identifiers (DOIs) or Archival Resource Keys (ARKs).
- For more information on the DCR, see the Data Commons User Agreement.
Intellectual Property and Responsibilities
- For private storage and shared data in the CyVerse Data Store, including private Community Data folders, data depositors maintain full ownership, authority, and control over metadata and permissions, as described in the Service Level Agreement.
- For Community Released Data folders, data depositors maintain ownership, authority, and control over metadata and permissions, but the bulk of the data in the Community Released Data folder must remain publicly available.
- For data deposited in CyVerse Curated Data, data depositors maintain intellectual ownership and authority over the data, but no longer have the ability to edit the data or metadata. See the Data Commons User Agreement for information on making changes to the data in the DCR.
- Any public data in the Data Commons that is derived from another source must properly attribute the primary sources and authors before being uploaded to the Data Commons. Attribution is the responsibility of the researcher making the submission or transfer.
- Any data, including derived data, shared with the public through CyVerse must comply with any copyright or reuse restrictions placed on the original source data.
- If there is a dispute about data that is uploaded or published to the Data Commons, CyVerse will not display the data publicly until the dispute is resolved.
- CyVerse does not assert ownership or any intellectual property rights to data it makes available. For derived data, the data policy of the data originator will take precedence over the data policy of CyVerse.
- When using data in the CyVerse Data Store that was produced by someone else, users must cite or attribute the data originator and must comply with any additional terms and conditions of use set by the data provider. To cite data stored or used at CyVerse, see Acknowledging and Citing CyVerse.
- Users should be aware that U.S. and international copyright laws differ on the treatment of data collections. Users wishing to redistribute public data (for example, as a merged data product) are required to follow the applicable laws of the country where the data originated. It is the responsibility of the data user to be aware of the terms and conditions of data use.
Security and Privacy