Data Analysis Process (Proposed)
- Manually obtain & analyze sample or real data.
- Write a harvester script to programmatically obtain real data, with documentation on method, frequency, status of agreement, etc. Harvester script(s) should fetch and normalize the source data so it can be fed from this repository into a Traject+ pipeline.
- Use analysis to document proposed mapping of harvested data to the DLME IR MAP, including requested normalizations.
- Attempt writing a Traject+ mapping config for the above mapping, using the existing patterns. Ticket any mappings or normalizations that aren't currently supported. Use only the mapping config for this first pass (i.e. don't try to write modules). Consider whether tests need to be written for the mapping.
- Attempt running the DLME application locally with the above Traject+ mappings, and analyze the output to ticket any needed actions or questions on using the system.
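As an illustration of the "get and normalize" part of the harvester step above, here is a minimal Ruby sketch using only the standard library. It is not the project's harvester: the helper names `normalize_record` and `write_records`, the sample record, and the choice of newline-delimited JSON as the hand-off format are all assumptions made for this example.

```ruby
require 'json'

# Hypothetical helper: normalize one harvested record (a Hash) so that every
# value becomes an array of non-empty, whitespace-trimmed strings.
# Fields that end up with no values are dropped.
def normalize_record(record)
  record.each_with_object({}) do |(key, value), normalized|
    values = Array(value).map { |v| v.to_s.strip }.reject(&:empty?)
    normalized[key.to_s] = values unless values.empty?
  end
end

# Hypothetical helper: write normalized records as newline-delimited JSON
# to any IO-like object (a file, $stdout, a StringIO, ...).
def write_records(records, io)
  records.each { |record| io.puts(JSON.generate(normalize_record(record))) }
end

# Made-up sample record, for illustration only.
sample = [
  { title: '  Example title ', creator: ['Example creator', ''], date: nil }
]
write_records(sample, $stdout)
```

Keeping normalization in one small function makes it easier to document which normalizations happen at harvest time and which are requested of the mapping.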
All of the above then feeds into the work cycle, in which the development team can review the documentation (with the intended goals of the harvesting & mapping), the attempted configs, and the tickets, and work on pipeline issues that ease the process for staging and production.
Run the following from wherever you have Traject and Traject+ installed (for ease of use, these examples use the DLME codebase). The first command writes the transformed records as JSON; the second prints debug output:
$ bundle exec traject -w JsonWriter -c config/traject.rb -c lib/traject/fgdc_config.rb -s source='harvard_fgdc' spec/fixtures/fgdc/HARVARD.SDE2.AFRICOVER_EG_RIVERS.fgdc.xml
$ bundle exec traject -w DebugWriter -c config/traject.rb -c lib/traject/fgdc_config.rb -s source='harvard_fgdc' spec/fixtures/fgdc/HARVARD.SDE2.AFRICOVER_EG_RIVERS.fgdc.xml
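To help analyze the JSON output of the first command, the following Ruby sketch tallies how many records contain each field. It assumes the output is newline-delimited JSON (one object per line), which you should confirm against your own output. The function name `field_coverage` and the sample lines are hypothetical, not part of Traject or DLME.

```ruby
require 'json'

# Count how many records contain each field.
# Assumes one JSON object per line; blank lines are skipped.
# Returns [total_record_count, { field_name => record_count }].
def field_coverage(lines)
  counts = Hash.new(0)
  total = 0
  lines.each do |line|
    next if line.strip.empty?

    total += 1
    JSON.parse(line).each_key { |field| counts[field] += 1 }
  end
  [total, counts]
end

# Made-up sample standing in for real transform output.
sample_output = <<~JSONL
  {"id":["rec-1"],"title":["Example title"]}
  {"id":["rec-2"]}
JSONL

total, counts = field_coverage(sample_output.each_line)
counts.sort.each { |field, n| puts "#{field}: #{n}/#{total}" }
```

To run it on real data, redirect the command's output to a file and pass `File.foreach` on that file in place of `sample_output.each_line`. Fields that appear in fewer records than expected are candidates for a ticket.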
In a separate shell, start up the DLME application's Solr (presuming you've already run bundle install & are running this from where you have the DLME codebase locally):
$ bundle exec solr_wrapper
In another separate shell, start up the DLME Rails application (same presumptions as above):
$ bundle exec rails s
Now in a third shell, same presumptions, run the DLME application's Traject+ installation with whatever mapping and a pointer to whatever harvested data you want to transform (note that the only difference from the commands above is the declared Writer):
$ bundle exec traject -w SolrWriter -c config/traject.rb -c lib/traject/fgdc_config.rb -s source='harvard_fgdc' spec/fixtures/fgdc/HARVARD.SDE2.AFRICOVER_EG_RIVERS.fgdc.xml
Alternatively, for that last step, you can run the DLME application's bin script to load all the prototype data:
$ ./bin/fetch_and_import