Come up with strategy for upgrading Is Part Of field values #43

karenmajewicz · 2022-08-04T18:59:50Z

One of the main incompatibilities between Metadata 1.0 and Aardvark is the Is Part Of field. In 1.0, this was a string value. In Aardvark, it is an ID that the GeoBlacklight application reads to link records together.

To upgrade, users would need to create new collection records for each unique value and replace the strings with the new IDs.
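
As a rough illustration of that upgrade, here is a minimal Ruby sketch. It assumes records are plain Hashes, and the `collection_id` helper and its `institution:` prefix are hypothetical (any stable ID scheme would do). The stub collection records would still need to be filled in by hand.

```ruby
# Sketch only: records are plain Hashes; the ID scheme below is hypothetical.
def collection_id(name, prefix: 'institution')
  slug = name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
  "#{prefix}:#{slug}"
end

def upgrade_is_part_of(records)
  # Every unique string value becomes one collection.
  names = records.flat_map { |r| Array(r['dct_isPartOf_sm']) }.uniq
  id_map = names.to_h { |name| [name, collection_id(name)] }

  # One stub collection record per unique string value.
  collections = id_map.map do |name, id|
    { 'id' => id, 'dct_title_s' => name, 'gbl_resourceClass_sm' => ['Collections'] }
  end

  # Replace each string with the ID of its new collection record.
  updated = records.map do |r|
    next r unless r.key?('dct_isPartOf_sm')

    r.merge('dct_isPartOf_sm' => Array(r['dct_isPartOf_sm']).map { |n| id_map.fetch(n) })
  end

  [collections, updated]
end

records = [
  { 'id' => 'a', 'dct_isPartOf_sm' => ['My Collection 1'] },
  { 'id' => 'b', 'dct_isPartOf_sm' => ['My Collection 1'] },
  { 'id' => 'c' }
]
collections, updated = upgrade_is_part_of(records)
```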

Pros:

Collections can be fully described with their own metadata record.
Provides more stability by using nonliteral URIs instead of strings (if a name changed, only the collection record would need to be updated).

Cons:

The way GBL is set up, a user would have to make more "clicks" to get to other items in the same collection.
Since it is not a direct crosswalk, there is labor involved in creating new collection records and updating existing ones.

karenmajewicz · 2022-08-12T17:05:50Z

kgjenkins · 2022-08-19T17:05:17Z

The metadata converter at https://kgjenkins.github.io/gbl2aardvark/ will now automatically create new "Collections" records, using information from all the existing child records. Some of the fields (subject, keyword, etc.) aggregate all the unique values found in the child records, and the bbox (dcat_bbox, locn_geometry) is automatically expanded to include all the child record bboxes.
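
To make the aggregation concrete, here is a small Ruby sketch of the same idea. It is not the converter's code. It assumes bounding boxes use the `ENVELOPE(west,east,north,south)` syntax and ignores boxes that cross the antimeridian. The helper names are made up for this example.

```ruby
# Sketch only: not the converter's implementation.
# Assumes ENVELOPE(west,east,north,south) strings with plain decimal numbers.
def parse_envelope(str)
  west, east, north, south = str.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
  { west: west, east: east, north: north, south: south }
end

# Expand a bbox so that it contains every child bbox.
def merge_envelopes(strings)
  boxes = strings.map { |s| parse_envelope(s) }
  west  = boxes.map { |b| b[:west] }.min
  east  = boxes.map { |b| b[:east] }.max
  north = boxes.map { |b| b[:north] }.max
  south = boxes.map { |b| b[:south] }.min
  "ENVELOPE(#{west},#{east},#{north},#{south})"
end

# Collect the unique values of a multivalued field across all children.
def aggregate(children, field)
  children.flat_map { |c| Array(c[field]) }.uniq.sort
end

children = [
  { 'dct_subject_sm' => ['Land Use'], 'dcat_bbox' => 'ENVELOPE(-80.0,-75.0,45.0,42.0)' },
  { 'dct_subject_sm' => ['Land Use', 'Tree canopy'], 'dcat_bbox' => 'ENVELOPE(-77.0,-72.0,43.5,40.5)' }
]

collection = {
  'dct_subject_sm' => aggregate(children, 'dct_subject_sm'),
  'dcat_bbox' => merge_envelopes(children.map { |c| c['dcat_bbox'] })
}
```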

I've documented the process a bit in the README.

I think this could be a viable approach, although one would certainly want to review the new collection records -- the descriptions will certainly need editing to better reflect the whole collection. And you may not really want every placename from all the child records to be listed in the collection record.

Date values may also require clean-up -- the script keeps every unique value (which works well for single years in gbl_indexYear_im) but dct_temporal_sm may have things like this:

   "dct_temporal_sm": [
      "1998-2013",
      "1998-2014",
      "1998-2015",
      "1998-2016",
      "1999-2013",
      "1999-2014", etc.
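
One possible clean-up, sketched in Ruby: collapse the list into a single overall range. This assumes every value is a four-digit year or a "YYYY-YYYY" range, and whether one range is the right summary for a collection is an editorial call. The `overall_range` helper is hypothetical.

```ruby
# Sketch only: assumes values are four-digit years or "YYYY-YYYY" ranges.
def overall_range(values)
  years = values.flat_map { |v| v.scan(/\d{4}/) }.map(&:to_i)
  return nil if years.empty?

  years.min == years.max ? years.min.to_s : "#{years.min}-#{years.max}"
end

temporal = ['1998-2013', '1998-2014', '1998-2015', '1998-2016', '1999-2013', '1999-2014']
summary = overall_range(temporal)
```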

The collection records may also reveal spelling or capitalization inconsistencies in the child records. For example:

   "dct_subject_sm": [
      "Land Cover",
      "Land Use",
      "Land cover",
      "Land use",
      "Tree canopy", etc.
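
A quick way to surface these, sketched in Ruby: group the aggregated values by their lowercased form and report any group with more than one spelling. The `case_variants` helper is hypothetical, and it only catches differences in capitalization, not misspellings.

```ruby
# Sketch only: reports values that differ only by capitalization.
def case_variants(values)
  values.group_by(&:downcase)
        .values
        .map(&:uniq)
        .select { |group| group.length > 1 }
end

subjects = ['Land Cover', 'Land Use', 'Land cover', 'Land use', 'Tree canopy']
variants = case_variants(subjects)
```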

Of course, it could be nice to retain a "simple" collection field that just contains a string (similar to subject or keyword), but also have the option of the new relations-based dct_isPartOf_sm field.

karenmajewicz · 2022-08-22T22:08:03Z

In this case, dct_isPartOf_sm probably maps better to pcdm_memberOf_sm.

From the OGM documentation:
Is Part Of: To link items that are a subset of another item (e.g. a page in a book)
Member Of: To link items that are part of a collection
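
If that mapping is adopted, the field move itself is small. A minimal Ruby sketch, assuming records are plain Hashes and the values have already been converted to collection IDs (the method name is made up for this example):

```ruby
# Sketch only: moves collection IDs from dct_isPartOf_sm to pcdm_memberOf_sm.
def is_part_of_to_member_of(record)
  values = Array(record['dct_isPartOf_sm'])
  return record if values.empty?

  rest = record.reject { |key, _| key == 'dct_isPartOf_sm' }
  rest.merge('pcdm_memberOf_sm' => (Array(record['pcdm_memberOf_sm']) + values).uniq)
end

moved = is_part_of_to_member_of(
  'id' => 'a', 'dct_isPartOf_sm' => ['institution:my-collection-1']
)
```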

thatbudakguy · 2023-03-03T18:07:55Z

Another possible strategy that is supported by OpenGeoMetadata/GeoCombine#143 is to assume that it's possible to get a list of all collection records (in v1 format) before attempting the conversion from v1 to Aardvark. In Earthworks, we apparently use a layer_geom_type_s of "Collection" to indicate collections (which might not be valid in v1, but that's another story). You can export all the Collection records this way by querying Solr.

Once you have a list of collection records and their layer_slug_s, you can export it as any kind of structured data (JSON directly from Solr, CSV, etc.), then parse it and pass it into the converter:

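# Map each v1 collection name (the string in dct_isPartOf_sm) to the ID of its collection record
id_map = {
  'My Collection 1' => 'institution:my-collection-1',
  'My Collection 2' => 'institution:my-collection-2'
}

# record is a single v1 record parsed into a Hash
GeoCombine::Migrators::V1AardvarkMigrator.new(v1_hash: record, collection_id_map: id_map).run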

This way, you can convert all records (including collections) at the same time:

Non-collection records will look up their collection IDs using the data and replace the collection name in dct_isPartOf_sm
Collection records will be converted to Aardvark just like the non-collection records
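
For the step of building the name-to-ID map, here is a minimal sketch that parses a Solr JSON response. It assumes the standard `response.docs` layout and that `dc_title_s` holds the same string the child records use in dct_isPartOf_sm. The sample documents are invented.

```ruby
require 'json'

# Sketch only: a hand-written stand-in for a Solr JSON response
# from a query for collection records.
solr_json = <<~JSON
  {"response": {"docs": [
    {"dc_title_s": "My Collection 1", "layer_slug_s": "institution:my-collection-1"},
    {"dc_title_s": "My Collection 2", "layer_slug_s": "institution:my-collection-2"}
  ]}}
JSON

docs = JSON.parse(solr_json).dig('response', 'docs')

# Collection name => collection record ID.
id_map = docs.to_h { |doc| [doc['dc_title_s'], doc['layer_slug_s']] }
```

The resulting Hash has the same shape as the id_map shown earlier in this comment.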

An interesting and debatably useful side-effect of this is that it collapses collections with the same name into a single collection. While testing out this strategy, I discovered that several collections in Earthworks are duplicated, probably accidentally. The "2010 China province population census data with GIS maps" collection has this version, with only one member, and this version with several members. While it's possible to have collections with the same name, it doesn't seem desirable from a user standpoint, so using this strategy is an easy way to consolidate duplicate collections at the same time you convert to Aardvark.
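
To see which collections would be collapsed before converting, one could group the collection records by name first. A Ruby sketch with invented sample records, again assuming `dc_title_s` is the collection name:

```ruby
# Sketch only: sample records are invented for illustration.
docs = [
  { 'dc_title_s' => 'Census maps', 'layer_slug_s' => 'institution:census-1' },
  { 'dc_title_s' => 'Census maps', 'layer_slug_s' => 'institution:census-2' },
  { 'dc_title_s' => 'Soil surveys', 'layer_slug_s' => 'institution:soil' }
]

# Names used by more than one collection record, with the competing IDs.
duplicates = docs.group_by { |d| d['dc_title_s'] }
                 .select { |_, group| group.length > 1 }
                 .transform_values { |group| group.map { |d| d['layer_slug_s'] } }

# A name => ID Hash can hold only one ID per name; built this way,
# the last record seen for a name wins.
id_map = docs.to_h { |d| [d['dc_title_s'], d['layer_slug_s']] }
```

Reviewing `duplicates` before the conversion lets you choose which ID should survive instead of leaving it to record order.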

karenmajewicz · 2024-08-06T19:23:58Z

Do some research on a new field for this that would be a plain text value.

kgjenkins added this to OGM issues Feb 27, 2023

kgjenkins added the enhancement label (New feature or request) Feb 27, 2023

rmseifried moved this to Todo in OGM issues Feb 27, 2023

karenmajewicz mentioned this issue Aug 7, 2024

New proposal: add a plain text field analogous to isPartOf in GBL 1.0 #63 (Open)

karenmajewicz moved this from Priority To Do to In Progress in OGM issues Aug 16, 2024
