Data-Reuse

Community-curated guidelines to increase dataset reuse potential

The following guidelines, checks, and documentation practices on the part of the data producer can add significant value to datasets intended for reuse:

1. ACCESSING THE DATASET

Include all links to data repositories in the submitted manuscript, double-check that they work, and proofread them again before publication.
Avoid “Contact corresponding author for data” statements.
Make sure the raw and processed datasets are saved in an accessible location (preferably with redundancy) so that the corresponding author can share them if needed.
When possible, include sample names and identifier information in data file names for ease of access (in addition to GEO names).
Clearly state any applicable ownership rights.
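
The file-naming guideline above can be illustrated with a minimal sketch. The function name `build_data_filename`, the naming pattern, and the accession `GSM0000001` are hypothetical placeholders chosen for illustration, not a convention prescribed by these guidelines or by GEO.

```python
import re


def build_data_filename(sample_name, accession, read=None, extension="fastq.gz"):
    """Compose a descriptive file name from a sample name and a repository identifier.

    The pattern <sample>_<accession>[_R<read>].<extension> is an assumption
    made for this example; adapt it to your own lab or repository conventions.
    """
    # Replace runs of characters other than letters, digits, dots, or hyphens
    # with a single underscore so the name is safe on common file systems.
    safe_sample = re.sub(r"[^A-Za-z0-9.-]+", "_", sample_name.strip()).strip("_")
    parts = [safe_sample, accession]
    if read is not None:
        parts.append(f"R{read}")
    return "_".join(parts) + "." + extension


# "GSM0000001" is a made-up placeholder, not a real GEO record.
print(build_data_filename("leaf 14 DAP rep1", "GSM0000001", read=1))
# prints: leaf_14_DAP_rep1_GSM0000001_R1.fastq.gz
```

Keeping both the human-readable sample name and the repository identifier in the file name lets a reuser match files to metadata without opening a separate lookup table.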

2. ASSESSING REUSE SUITABILITY

Clearly state how the data was generated
Document in detail the sample type and collection methods:
* Developmental stage
* Organ/tissue/cell types (can include illustrations)
* Time of sampling
* Sample storage
* Sample preparation
* Other pertinent details
Specify whether the sequence was generated from single-end or paired-end sequencing.
Make sure to distinguish which sample sequences are biological replicates and which are technical replicates run in different sequencing lanes.
Indicate the sequencing platform and chemistry employed, such as Illumina or ONT (Oxford Nanopore Technologies), and include all relevant technology information.
Provide comprehensive information about the bioinformatics software used, including specific steps involved in the analysis.
If possible, provide a link to the code used on GitHub or another platform. Avoid “Contact corresponding author for used custom code” statements.
Include a data key indicating the relationship between the sample metadata and the data location in the public repository.
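
A data key of the kind described above can be as simple as a tab-separated table with one row per sample. The sketch below is one possible layout, not a published standard: the column names in `REQUIRED_FIELDS`, the helper names, and the `SRR0000001` accession are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column set; extend it with whatever metadata your study records.
REQUIRED_FIELDS = [
    "sample_name",
    "tissue",
    "developmental_stage",
    "replicate_type",   # e.g. "biological" or "technical"
    "library_layout",   # e.g. "single-end" or "paired-end"
    "accession",        # location of the data in the public repository
]


def find_incomplete_rows(rows):
    """Return (sample_name, missing_fields) for every row lacking a required value."""
    problems = []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            problems.append((row.get("sample_name", ""), missing))
    return problems


def write_data_key(rows, handle):
    """Write the data key as tab-separated text to an open text handle."""
    writer = csv.DictWriter(
        handle, fieldnames=REQUIRED_FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Placeholder samples; the accession is made up and the second one is left blank
# on purpose to show how a gap in the key is reported.
samples = [
    {"sample_name": "leaf_rep1", "tissue": "leaf", "developmental_stage": "seedling",
     "replicate_type": "biological", "library_layout": "paired-end",
     "accession": "SRR0000001"},
    {"sample_name": "leaf_rep2", "tissue": "leaf", "developmental_stage": "seedling",
     "replicate_type": "biological", "library_layout": "paired-end",
     "accession": ""},
]

print(find_incomplete_rows(samples))
# prints: [('leaf_rep2', ['accession'])]

buffer = io.StringIO()
write_data_key(samples, buffer)
print(buffer.getvalue())
```

Running a completeness check like this before submission is a cheap way to catch samples whose repository location was never recorded.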

3. FORMATTING

Do not upload large amounts of raw or processed data only as supplementary files; make sure the data are deposited in the appropriate repository (e.g., avoid .csv, .tsv, or .pdf files for sequencing data).
Use the most common format for the data type and make different formats available when possible (e.g., FASTA/FASTQ and GFF3).
Consider which formats the dataset will most likely need to be converted into in order to be (re)used, and provide those formats.
Use published metadata standards for that file format, if available.
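
As an illustration of providing a converted format alongside the original, here is a minimal FASTQ-to-FASTA sketch. It assumes simple four-line FASTQ records with no line wrapping, and the function name `fastq_to_fasta` and the example reads are made up for this example. For real datasets, an established library or tool is usually the safer choice.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes each record is exactly four lines: @header, sequence, +, qualities.
    """
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    # Ignore blank lines at the end of the input.
    while lines and not lines[-1]:
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, qualities = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(sequence) != len(qualities):
            raise ValueError(f"Sequence/quality length mismatch at line {i + 1}")
        yield ">" + header[1:]
        yield sequence


# Made-up reads for demonstration only.
example = [
    "@read1 sample=leaf_rep1",
    "ACGT",
    "+",
    "IIII",
    "@read2 sample=leaf_rep1",
    "GGCA",
    "+",
    "IIII",
]

print("\n".join(fastq_to_fasta(example)))
# prints:
# >read1 sample=leaf_rep1
# ACGT
# >read2 sample=leaf_rep1
# GGCA
```

Note that this conversion discards the quality scores, which is one reason to keep the original format available rather than replacing it with the converted one.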

4. SKILLS AND RESOURCES (for data producer and reuser)

Do you have the storage capacity to store the dataset(s)?
Do you have the computational power to reuse/analyze the datasets?
Do you have the computational skills to reuse/analyze the datasets?

If the answer to any of the questions above is no, seek assistance from your institution and colleagues. Universities often have cluster or cloud computing capabilities. Seek education in the form of computational biology and bioinformatics courses; many can be found online.
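
The storage question above can be checked quickly before downloading anything. The sketch below uses Python's standard `shutil.disk_usage`. The function name `has_enough_space` and the `working_factor` of 3 (headroom for decompressed and intermediate files) are assumptions made for illustration, not recommendations from these guidelines.

```python
import shutil


def has_enough_space(path, dataset_bytes, working_factor=3.0):
    """Return True if free space at `path` covers the dataset plus working headroom.

    `working_factor` is an assumed multiplier for intermediate files;
    tune it to the analysis you plan to run.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= dataset_bytes * working_factor


# Example: would a 50 GB dataset fit in the current directory's file system?
dataset_size = 50 * 1024**3
if has_enough_space(".", dataset_size):
    print("Enough local storage for this dataset.")
else:
    print("Not enough local storage; consider institutional cluster or cloud storage.")
```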

AgBioData/Data-Reuse
