Peter Selby edited this page Jun 4, 2024 · 14 revisions

Data federation technology overview

The AgBioData Data Federation Training Working Group was tasked with developing training material for the AgBioData community on different solutions for data sharing and data federation. Below are the compiled results of that 12-month effort. This is not an exhaustive list, but the technologies described here represent the most widely used data sharing techniques within the agricultural, breeding, and scientific communities.

How to use this resource

Each page listed below contains a short description of the technology, a recorded presentation by an expert, a subjective cost estimate, pros, cons, example use cases, and an assessment of how the technology promotes FAIR data.

This web page is intended as a beginner's guide to data sharing technologies. Use it to browse what is available, or as a comparison tool when deciding which technology is best for a new use case.

Technologies to Explore

| Project | Website | Pros | Cons |
|---|---|---|---|
| Faidare | Public Faidare | Increases data findability and accessibility; can connect to existing systems | Supports data discovery only, with no other data management features |
| iRODS | irods.org | Can manage large datasets; robust access management and collaboration | Not suitable for small or highly structured datasets; setup and maintenance may require dedicated staff |
| RDF | RDF 1.1 Primer | Privacy-preserving data sharing; very flexible metadata | Semantic models can be difficult to set up; strict data structure requirements must be shared between stakeholders |
| BrAPI | brapi.org | Represents a domain-specific standard; good as an add-on to existing systems | Requires custom development work; mapping existing data models to the standard can be time-consuming |
| GraphQL | graphql.org | Robust query language for precise results; easy integration from multiple sources | A specific data model must be developed first; compute time and space complexity can be a concern for large or complex datasets |
| Globus | globus.org | Strong emphasis on data sharing and data transfer; well suited for large datasets and large files | Built for file sharing, not database access; not useful for data discovery |
| SOLID | solidproject.org | Open source standards built on existing technology; very strong security, ownership, and access controls | Still a developing technology; no specific data standards or tools for biological or PGR data |