This is a project to model linked data schemas (i.e., lightweight ontologies) for the fields of agriculture, food, agri-business, and plant biology.
The work builds mainly on schema.org and Bioschemas. As in those projects, we aim at very simple and practical modelling that helps share data in an interoperable way, especially by means of APIs and annotated web pages.
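As a rough illustration of what an "annotated web page" can look like, the sketch below builds a schema.org-style JSON-LD description of a gene and wraps it in the script tag that web pages normally use for such annotations. The function names, the gene identifier and the choice of properties are illustrative assumptions, not the schemas defined by this project.

```python
import json


def gene_annotation(gene_id: str, name: str, taxon: str) -> dict:
    """Build a JSON-LD description of a gene using schema.org terms.

    The selection of properties here is an assumption made for illustration.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Gene",
        "identifier": gene_id,
        "name": name,
        "taxonomicRange": {"@type": "Taxon", "name": taxon},
    }


def as_script_tag(doc: dict) -> str:
    """Wrap a JSON-LD document in the script tag used to annotate HTML pages."""
    payload = json.dumps(doc, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # "EXAMPLE-GENE-1" is a placeholder identifier, not a real accession.
    print(as_script_tag(gene_annotation("EXAMPLE-GENE-1", "example gene", "Triticum aestivum")))
```

The same dictionary could be returned as-is by a JSON API, which is one reason this style of modelling suits both APIs and web pages.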
The work originated within the Designing Future Wheat (DFW) project and, for the moment, it focuses on the use cases addressed there. So far, we have been building our schemas starting from well-known use cases in crop improvement research, and most of the work has been done during two DFW hackathons.
The work is mainly based on modelling from concrete use cases.
Based on the models above, we're building a prototype dataset, which includes Knetminer (a biomolecular knowledge graph that, in turn, integrates a number of other data sources and data types) and the Gene Expression Atlas (experimental data about gene expression). The data are published on Knetminer's SPARQL endpoint.
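A minimal sketch of how such an endpoint might be queried from Python, using only the standard library. The endpoint URL is a placeholder, and the `schema:Gene` class, the prefix URI and the function names are assumptions for illustration; check the prototype's actual schemas and endpoint address before use.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"


def build_gene_query(limit: int = 10) -> str:
    """Return a SPARQL query listing genes and their names.

    The class and property used here are assumptions, not the project's schema.
    """
    return f"""
PREFIX schema: <https://schema.org/>
SELECT ?gene ?name WHERE {{
  ?gene a schema:Gene ;
        schema:name ?name .
}}
LIMIT {int(limit)}
""".strip()


def parse_bindings(result: dict) -> list[dict]:
    """Flatten a SPARQL 1.1 JSON result into a list of {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in result["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict]:
    """Send the query over HTTP GET and return the flattened rows."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query(build_gene_query(5))` against a real endpoint would return up to five rows, each a dictionary with `gene` and `name` keys.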
Projects and applications that use our prototype data:
KnetGraphs Gene Traits
A student project led by Menna Shehata, which uses our SPARQL endpoint to find significant traits associated with genes of interest by means of a gene set enrichment analysis (GSEA) approach, leveraging the flexibility and power of standardised knowledge graphs.
The software we are writing to produce the prototype above is being arranged into reusable extraction/transformation/loading (ETL) tools. A few pointers are:
- ETL transformations included in the prototype
- Biotools, utilities for dealing with biological data
- Knetminer Python Utils, generic utilities for the Python environment
A collection of references and links to various similar projects, hackathons, schemas, etc.