Design a sameas 'lite' api #86

Open
ColmMassey opened this issue Jan 30, 2022 · 9 comments
@ColmMassey
Collaborator

To date, most of the uses of sameas we have been considering rely on stores of sameas triples that are consulted when running big sparql queries, so that multiple data sets get merged. However, we should explore a lighter version too, which may be more flexible: only merge data about a single initiative when it is needed. We would be doing realtime, dynamic, multi-database sparql queries from, say, a sea-map instance, but retrieving a much smaller set of information.

@ColmMassey ColmMassey self-assigned this Jan 30, 2022
@ColmMassey
Collaborator Author

We still need a source of sameas triples. Let's put how we generate that to one side, as there are many ways to do this. Let us consider the task of getting all the available data on an initiative that exists in both the dotcoop and oxford data sets. Can we expose an api which takes the uri of the initiative and the uris of the candidate datasets, and returns a merged version of all the data available on that initiative from any of those datasets? The sparql query we would need to call should be straightforward, and the json returned would have the same structure as the data sent to sea-map instances when big data sets are requested. That way, if we used this call when selecting an initiative to display its data in a dialog, we could use the same code to display it.

@ColmMassey
Collaborator Author

ColmMassey commented Jan 30, 2022

The api call would look something like...

https://lod.coop/coops/find?initiative="dotcoop/404939450" &stores="sea-lod/oxford, dotcoop"
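A minimal Python sketch of how a server might read those two parameters, assuming the parameter names `initiative` and `stores` from the example URL, and assuming the double quotes and stray spaces are optional decoration to be stripped. The function name `parse_find_request` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def parse_find_request(url: str) -> dict:
    """Split the proposed find URL into an initiative id and an ordered store list."""
    raw = parse_qs(urlsplit(url).query)
    # Tolerate spaces around '&' by trimming the parameter names.
    query = {name.strip(): values for name, values in raw.items()}
    # Trim whitespace and the optional double quotes used in the example URL.
    initiative = query["initiative"][0].strip().strip('"')
    stores_raw = query["stores"][0].strip().strip('"')
    # The order of the stores is preserved, since it may matter when merging.
    stores = [s.strip() for s in stores_raw.split(",") if s.strip()]
    return {"initiative": initiative, "stores": stores}


request = parse_find_request(
    'https://lod.coop/coops/find?initiative="dotcoop/404939450" &stores="sea-lod/oxford, dotcoop"'
)
```

In a real deployment the values would more likely arrive percent-encoded and unquoted; the same function handles that form too.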

@ColmMassey
Collaborator Author

So let's set up a test environment for designing the sparql query.

@ColmMassey
Collaborator Author

What options are there for handling clashes in the absence of data timestamps?

  • Include duplicates and leave the requester to resolve them (Can this be done with our current JSON serialisation? @wu-lee ?)
  • If there are multiple candidates for a field, choose the one from the store earliest in the list provided in the URL parameter

We could indicate which option we are going for in the url too, like...

https://lod.coop/coops/find?initiative="dotcoop/404939450" & stores="sea-lod/oxford, dotcoop" & mergeOption=ALL_FIELDS|FIRST_FIELD
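The two options could be sketched in Python as below, assuming each store's data for the initiative has already been fetched as a flat dictionary of field names to values, and that the dictionaries arrive in the same order as the `stores` parameter. The function name `merge_records` and the sample field names are hypothetical, not taken from the real sea-map data.

```python
from typing import Dict, List


def merge_records(records: List[Dict[str, str]], merge_option: str = "FIRST_FIELD") -> dict:
    """Merge per-store records, given in the same order as the stores parameter."""
    if merge_option not in ("ALL_FIELDS", "FIRST_FIELD"):
        raise ValueError(f"unknown mergeOption: {merge_option}")
    merged: dict = {}
    for record in records:
        for field, value in record.items():
            if merge_option == "ALL_FIELDS":
                # Keep every distinct candidate; the requester resolves clashes.
                values = merged.setdefault(field, [])
                if value not in values:
                    values.append(value)
            else:
                # FIRST_FIELD: the earliest store that has the field wins.
                merged.setdefault(field, value)
    return merged


# Hypothetical sample data for one initiative held in two stores.
oxford = {"name": "A Coop", "postcode": "OX1 1AA"}
dotcoop = {"name": "A Co-op Ltd", "website": "https://example.org"}

first = merge_records([oxford, dotcoop], "FIRST_FIELD")
everything = merge_records([oxford, dotcoop], "ALL_FIELDS")
```

Note that in this sketch `ALL_FIELDS` turns every field into a list, which changes the shape of the result; whether the current JSON serialisation can carry that is exactly the open question in the first bullet.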

@wu-lee
Contributor

wu-lee commented Feb 11, 2022

For some reason I seem to have got notifications for these posts a bit late.

Not sure if I entirely understand the scenario/motivation here, which means I am not sure how to answer. The above look like heuristics which would need to be layered on top of each other.

As an aside, I recall finding a paper which suggests that sameas has problematic semantics, and does not scale to boot. Possibly this:

https://link.springer.com/chapter/10.1007/978-3-642-17746-0_20

But I'm not sure of alternatives - perhaps synthesising a new graph from two or more others?

@ColmMassey
Collaborator Author

For some reason I seem to have got notifications for these posts a bit late.

As you are not assigned to it, you may be using different notification settings. It's not a scheduled ticket, so no hurry.

@ColmMassey
Collaborator Author

As an aside, I recall finding a paper which suggests that sameas has problematic semantics, and does not scale to boot.

I read the abstract, but not the paper. So it seems there are a few ways of interpreting what sameAs can mean. I wonder how ambiguous it would be for how we would use it?

I think we want to say that the two uris referenced in the triple refer to the same real world SSE initiative. How could this be ambiguous?

We need to look at how sameAs is being used elsewhere when addressing organisations.

Hard to imagine at the moment how it would change how we use it though...
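One thing worth keeping in mind is that owl:sameAs is symmetric and transitive, so a single questionable link can join two otherwise separate groups of uris. A small Python sketch of that behaviour follows, assuming the sameAs statements are available as simple pairs of uris. The function name `same_as_closure` and the example.org uris are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def same_as_closure(uri: str, same_as_pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every uri reachable from `uri` by following sameAs links in either direction."""
    neighbours = defaultdict(set)
    for left, right in same_as_pairs:
        # sameAs is symmetric, so record the link both ways.
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen = {uri}
    queue = deque([uri])
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in seen:
                # sameAs is transitive, so keep following links.
                seen.add(other)
                queue.append(other)
    return seen


# Hypothetical uris: A and B are linked, and B and C are linked.
pairs = [
    ("https://example.org/dotcoop/A", "https://example.org/oxford/B"),
    ("https://example.org/oxford/B", "https://example.org/other/C"),
]
equivalents = same_as_closure("https://example.org/dotcoop/A", pairs)
```

Here A ends up treated as the same initiative as C even though nobody asserted that directly, which is one place where a loosely meant link could cause surprises.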

@ColmMassey
Collaborator Author

Anyway, what would the next steps be?

  • Manually create a triple store of sameAs statements for initiatives in dotcoop data and sea/lod
  • Write the sparql query which would return all data about an initiative from both specified stores, using the sameAs store
  • Write a web query api to do this, returning the json in sea-map compatible format
  • Create a web page to test this manually
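As a starting point for the second step, here is a rough Python sketch that builds such a query as a string. It assumes, purely for illustration, that each store and the sameAs statements are available as named graphs behind one SPARQL 1.1 endpoint; if the stores live behind separate endpoints the shape would differ. The function names and the example.org graph uris are hypothetical, and the query has not been run against a real store.

```python
import re
from typing import List

# Characters that must not appear inside <...> in a SPARQL query.
_UNSAFE = re.compile(r'[<>"{}|^`\\\s]')


def iri(uri: str) -> str:
    """Wrap a uri in angle brackets, rejecting anything that could break the query."""
    if _UNSAFE.search(uri):
        raise ValueError(f"not a plain IRI: {uri!r}")
    return f"<{uri}>"


def build_initiative_query(initiative_uri: str, store_graphs: List[str], same_as_graph: str) -> str:
    """Build a query for all triples about an initiative and its sameAs equivalents."""
    stores = " ".join(iri(graph) for graph in store_graphs)
    return "\n".join([
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
        "SELECT ?store ?subject ?property ?value",
        "WHERE {",
        # Follow sameAs in both directions, zero or more steps, so the
        # initiative itself is included even when it has no sameAs links.
        f"  GRAPH {iri(same_as_graph)} {{",
        f"    {iri(initiative_uri)} (owl:sameAs|^owl:sameAs)* ?subject .",
        "  }",
        # Restrict the lookup to the candidate stores (assumed named graphs).
        f"  VALUES ?store {{ {stores} }}",
        "  GRAPH ?store { ?subject ?property ?value }",
        "}",
    ])


query = build_initiative_query(
    "https://example.org/dotcoop/404939450",
    ["https://example.org/graph/oxford", "https://example.org/graph/dotcoop"],
    "https://example.org/graph/sameas",
)
```

The `?store` column in the results records which dataset each value came from, which is what a later merge step would need in order to apply a store ordering.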

@ColmMassey
Collaborator Author

Not sure if I entirely understand the scenario/motivation here, which means I am not sure how to answer. The above look like heuristics which would need to be layered on top of each other.

Might need a chat at some point to clarify motivation and how I imagine implementation.
