Test presence of a metadata SID in publish_update #77

Open
dmullen17 opened this issue Mar 21, 2018 · 10 comments
Labels: blocker (inhibiting support team from publishing with this feature), bug

Comments

@dmullen17
Member

dmullen17 commented Mar 21, 2018

When a package has an SID attached to a metadata object, publish_update might fail.

This seems to happen whether the user specifies the SID or the version (pid) as the metadata_pid argument. It's possible this isn't actually an issue and was due to slow indexing. But it's worth creating a dummy SID to test this, and determining the correct workflow for updating a package with an SID if there is no bug.
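One way to take the SID-versus-pid question out of the picture while testing is to resolve whatever identifier was supplied to a version pid before calling publish_update. The sketch below is only an illustration: `resolve_version_pid` is a hypothetical helper, not part of any package, and it assumes the member node returns the head version's system metadata when asked for a SID.

```r
# Hypothetical helper: return the version pid for an identifier that may be a SID.
# `get_sysmeta` is injectable so the logic can be exercised without a live node;
# the default (dataone::getSystemMetadata) is only evaluated when no stub is given.
resolve_version_pid <- function(mn, id, get_sysmeta = dataone::getSystemMetadata) {
  sysmeta <- get_sysmeta(mn, id)
  pid <- sysmeta@identifier
  if (!identical(pid, id)) {
    # Assumption: a differing identifier means `id` was a SID resolved to its head version
    message("'", id, "' appears to be a SID; head version is '", pid, "'")
  }
  pid
}
```

The returned pid could then be passed as the metadata_pid argument, so that both test cases exercise the same code path.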

dmullen17 changed the title from "Presence of a metadata SID breaks publish_update" to "Test presence of a metadata SID in publish_update" on Mar 21, 2018
@amoeba
Contributor

amoeba commented Mar 22, 2018

Interesting. SIDs are less well-tested territory.

> publish_update might fail.

How does it fail?

@jagoldstein

That is the question... @dmullen17 will try to reproduce this failure using a UUID as a SID on test.arcticdata.io.

It may fail by not aggregating the EML with the rest of the package.
OR it may appear to fail silently on the front end by simply not updating.
OR it may take longer than usual to update (or index) a package with a SID (4 to 16 hours) and we just think it failed after the first few hours of waiting and refreshing the landing pages.
OR something else...
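To tell slow indexing apart from a real failure, the waiting and refreshing could be replaced by a scripted poll. This is a minimal sketch: `wait_until` is a hypothetical helper, and the commented usage assumes the index can be queried through `dataone::query` with a `resourceMap` field present once the metadata has been aggregated; `new_pid` stands in for the pid of the updated metadata.

```r
# Hypothetical helper: call `check` repeatedly until it returns TRUE
# or the time budget runs out. Returns TRUE on success, FALSE on timeout.
wait_until <- function(check, timeout_secs = 60, interval_secs = 5) {
  deadline <- Sys.time() + timeout_secs
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval_secs)
  }
}

# Assumed usage against the test member node (needs the dataone package,
# a valid token, and network access; `new_pid` is a placeholder):
# is_aggregated <- function() {
#   hits <- dataone::query(mn_test,
#                          solrQuery = list(q = paste0('id:"', new_pid, '"'),
#                                           fl = "id,resourceMap"),
#                          as = "data.frame")
#   !is.null(hits) && nrow(hits) > 0 && "resourceMap" %in% names(hits)
# }
# wait_until(is_aggregated, timeout_secs = 16 * 3600, interval_secs = 600)
```

A FALSE result after the full 16 hours would point away from slow indexing and toward one of the other explanations.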

@dmullen17
Member Author

It disassociates the metadata from the resource map + data objects. The way Sharis and I tried it was calling publish_update on a parent with an SID and using the version pid for the metadata_pid argument. It seems like this failed in both cases, although we haven't ruled out slow indexing (it took 4+ hours for both of ours).

Let me see if I can come up with an MRE on test to rule out slow indexing.

@dmullen17
Member Author

dmullen17 commented Mar 23, 2018

@amoeba @jagoldstein
Here's the first version of the package with a SID: https://test.arcticdata.io/#view/urn:uuid:385710b2-4cbe-43b4-b83a-9d1e6c21d7bd

Second version after publish_update (not working as of this comment - no resource map, data pids, or children present): https://test.arcticdata.io/#view/urn:uuid:8c96da2e-b2c5-42f7-a4ae-a85eefd0b2bb

Hopefully this will resolve by tomorrow and we can chalk it up to an indexing issue.

Here's the MRE:

```r
## MRE publish_update SID Issue

## Load packages
library(dataone)          # CNode, getMNode, getSystemMetadata, ...
library(arcticdatautils)  # get_package, publish_update
library(EML)              # read_eml, write_eml
library(testthat)         # expect_true

## Set test token, e.g. options(dataone_test_token = "...")

## Load nodes
cn <- CNode('PROD')
mn_prod <- getMNode(cn, 'urn:node:ARCTIC')
cnTest <- CNode('STAGING')
mn_test <- getMNode(cnTest, 'urn:node:mnTestARCTIC')

## Load working version of clone_package
devtools::install_github("dmullen17/datamgmt@clone_package")
library(datamgmt)

pkg <- clone_package("resource_map_urn:uuid:8c4cb5f6-9b11-4975-9668-c875dc4bbc2a",
                     from = mn_prod,
                     to = mn_test,
                     clone_child_packages = TRUE)

## Add SID to the metadata object's system metadata
sid <- generateIdentifier(mn_test)
s1 <- getSystemMetadata(mn_test, pkg$metadata)
s1@seriesId <- sid
updateSystemMetadata(mn_test, pkg$metadata, s1)

## Check that no pids changed
pkg_with_sid <- get_package(mn_test, pkg$resource_map)
expect_true(all(unlist(pkg) %in% unlist(pkg_with_sid)))

## Update metadata (new title only)
eml <- read_eml(rawToChar(getObject(mn_test, pkg$metadata)))
eml@dataset@title <- c(new("title", .Data = "Updated package with SID"))
eml_path <- file.path(tempdir(), "science_metadata.xml")
write_eml(eml, eml_path)

## Publish the update, passing the version pid as metadata_pid
update <- publish_update(mn_test,
                         metadata_pid = pkg$metadata,
                         resource_map_pid = pkg$resource_map,
                         metadata_path = eml_path,
                         child_pids = pkg$child_packages)
```
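Once the update has had time to index, the "no resource map, data pids, or children present" symptom could be checked in code rather than on the landing page. A rough sketch follows: `lost_members` is a hypothetical helper, and it assumes package lists shaped like the ones in the MRE, with `data` and `child_packages` elements.

```r
# Hypothetical helper: list pids that were in the package before an update
# but are missing afterwards. Metadata and resource map pids are expected to
# change on update, so only data and child package pids are compared by default.
lost_members <- function(before, after, fields = c("data", "child_packages")) {
  lost <- lapply(fields, function(f) setdiff(unlist(before[[f]]), unlist(after[[f]])))
  names(lost) <- fields
  # Keep only the fields that actually lost something
  lost[lengths(lost) > 0]
}

# Assumed usage after the MRE (assumes the update result exposes the new
# resource map pid as update$resource_map):
# pkg_after <- get_package(mn_test, update$resource_map)
# lost_members(pkg, pkg_after)
```

An empty list would mean the data and child packages survived the update; anything else names the pids that were disassociated.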

@amoeba
Contributor

amoeba commented Apr 23, 2018

Harrumph, I didn't update this issue when I did some research a while back. We do have a bug related to SIDs somewhere in here. I'll look for and link in relevant tickets/notes.

@amoeba
Contributor

amoeba commented Apr 23, 2018

Found some SID-related bugs:

https://redmine.dataone.org/issues/8520: Causes metadata objects with SIDs to not get indexed sometimes. A dev can fix this after-the-fact but it does require an intervention.
https://redmine.dataone.org/issues/8536
https://redmine.dataone.org/issues/8537

It looks like SIDs are pretty touchy right now.

jagoldstein added the "blocker" label (inhibiting support team from publishing with this feature) on Apr 25, 2018
@jagoldstein

Yes, the SIDs have been pretty touchy and we are reluctant to suggest or apply them as is.

We have a publishing workaround for SIDs right now, but using it does not instill a lot of confidence in me, so we are only using SIDs sparingly (when contributors really push us for one or if it seems like the only good solution). That is why I have applied my new label "blocker" here.

@amoeba
Contributor

amoeba commented Apr 25, 2018 via email

@jagoldstein

thanks @amoeba

@amoeba
Contributor

amoeba commented Apr 26, 2018

We talked about this today on our weekly dev call, which some DataONE staff joined for another reason. Dave at DataONE will take a look at this and other SID issues and we'll work from there.


3 participants