diff --git a/notes/2024-05-22_design-meeting.md b/notes/2024-05-22_design-meeting.md
new file mode 100644
index 0000000..001310f
--- /dev/null
+++ b/notes/2024-05-22_design-meeting.md
@@ -0,0 +1,106 @@
+---
+title: "OGDC Design Meeting"
+date: "2024-05-22"
+---
+
+## Background
+
+We identified a need for a longer (2+ hours) collaborative design session to flesh out a
+design vision and technical roadmap. We also identified a need to avoid a Big Design Up
+Front (BDUF) approach.
+
+[Previous discussion notes about dev milestones](https://docs.google.com/document/d/1IOWOMqCkb7HLzh2Gq0I0LjreoPm5xymBO67Utly-pks/edit)
+
+
+## Discussion
+
+The sub-headers added here are not intended to be exhaustive. Please add more.
+
+
+### Design goals and requirements
+
+Documentation: https://qgreenland-net.github.io/requirements.html
+
+* Matt J: Be able to do everything in:
+  * https://github.com/PermafrostDiscoveryGateway/viz-staging
+  * https://github.com/PermafrostDiscoveryGateway/viz-raster
+  * https://github.com/PermafrostDiscoveryGateway/viz-3d-tiles
+  * https://github.com/PermafrostDiscoveryGateway/viz-points
+  * Be able to generate a processing pipeline based on examining incoming data
+* Matt J: What is it we’re going to produce?
+  * Envisioning a set of services running and waiting for requests.
+  * Or a workflow platform where you submit steps to be executed.
+  * **Workflow platform seems like where we’re headed.**
+* Matt J: Cluster configuration can drastically affect the design of workflows
+* Matt J: Dynamic workflow generation
+  * Based on input data, generate a workflow DAG
+  * Currently static from config file. Real world example:
+
+* Matt J summary: Deal with transformations for existing PDG visualization
+  challenges. Think we’re on the right track.
+
+
+## Tool selection
+
+https://qgreenland-net.github.io/evaluations/orchestrator/
+
+* Considering:
+  * Argo
+    * How is storage managed? Dynamic PVCs?
+    * Continued evaluation: What does the parallel version of one of our example workflows
+      look like? Run on a tileset of N files
+    * Experiment with drone imagery?
+    * What UX tools are there? Can we generate an SVG graph from a YAML?
+  * Parsl
+  * Ray
+    * Promising for ML-specific stuff
+
+
+
+
+
+## Implementation roadmap
+
+NOTE: Pre-populated by Trey & Matt, but we didn't get to discussing it today.
+
+Let’s break this into milestones! Brainstorm:
+
+* End-to-end data test (simple case): take some data, apply transformations, and publish
+  results as a DataONE dataset
+  * Implemented some workflows in Argo & Parsl; remaining tasks are publishing to DataONE
+    and triggering automatically from GitHub events.
+* Migrate QGreenland workflows to selected orchestrator
+  * One existing workflow (arctic circle) successfully migrated programmatically to Argo
+    YAML, ~20 others still need testing, ~200 more still need implementing.
+* Implement big and complex processing case for Cesium 3D tiles using, e.g., drone imagery
+  data as input
+* Implement community accessibility functionality, e.g. bots, checks, and other
+  automations on GitHub
+* ...?
+* Build QGreenland using data transformed and published to DataONE using OGDC
+* Extract QGreenland’s framework code to a “QAnywhere” library for compiling regional QGIS
+  data projects
+
+
+## What other decisions do we need to finalize in this meeting?
+
+
+## What are the next decision points? Do we need a follow-up meeting?
+
+* Decide on a workflow tech! Depends on action items.
+
+
+## Action items
+
+- [ ] Rushiraj & Matt: Pick 3 datasets (small serial, medium, large parallel), build workflows
+      for them and _publish_ to the same PVC the visualization app is using to read
+      from (see Rushiraj’s PDG branch). Evaluate at the end of each step (small,
+      medium, large). Medium: Hydrology Ice Basins; Large: drone imagery dataset (see
+      notes from last architecture meeting)?
+  - [ ] Rushiraj: Work with ADC k8s admins to install Argo Workflows into the “argo” namespace
+- [x] Matt: Set up a new daily standup meeting (10 minutes) without Trey. Reach out if we need
+      him!
+
+
diff --git a/requirements.md b/requirements.md
index 2a29565..7628e91 100644
--- a/requirements.md
+++ b/requirements.md
@@ -2,7 +2,13 @@
 title: "Requirements"
 ---
 
-[See related GitHub issue](https://github.com/QGreenland-Net/.github/issues/31)
+* [See related GitHub issue](https://github.com/QGreenland-Net/.github/issues/31)
+* PDG transformation workflows
+  * [Staging/Tiling](https://github.com/PermafrostDiscoveryGateway/viz-staging)
+  * [Rasterization](https://github.com/PermafrostDiscoveryGateway/viz-raster)
+  * [3d-tiles](https://github.com/PermafrostDiscoveryGateway/viz-3d-tiles)
+  * [point clouds](https://github.com/PermafrostDiscoveryGateway/viz-points)
+  * [overview](https://github.com/PermafrostDiscoveryGateway/viz-info)
 
 ## Data transformations
 
@@ -15,34 +21,44 @@ title: "Requirements"
 * Subset
 * Resample (down/upsample or re-grid)
 * File-level metadata changes, e.g.:
-    * Assignment or correction of projection
-    * `gdal_edit` operations
+  * Assignment or correction of projection
+  * `gdal_edit` operations
 * Raster math, e.g.:
-    * `gdal_calc.py`
+  * `gdal_calc.py`
 * Compression, e.g.:
-    * Apply `DEFLATE` compression to geotiff
-* Build overviews, e.g.:
-    * `gdaladdo`
+  * Apply `DEFLATE` compression to GeoTIFF
+* Generate overviews / tile pyramids, e.g.:
+  * `gdaladdo`
 * Vector geometry operations
-    * Make valid
+  * Feature deduplication (_expensive_)
+  * Make valid
   * Simplify (less points)
   * Segmentize (more points)
-    * Filtering (e.g. SQL `WHERE`)
-    * Changing / adding attributes (e.g. calculating a `label` attribute from a
-      `value` and `unit` attribute)
+  * Filtering (e.g. SQL `WHERE`)
+  * Changing / adding attributes (e.g. calculating a `label` attribute from a
+    `value` and `unit` attribute)
 * Generating / combining data
-    * Contourize (raster data -> vector contours)
+  * Vector <-> raster transforms
+  * Contourize (raster data -> vector contours)
   * Climatological mean or other data-reductions
-    * Enriching datasets / data fusion / data integration (e.g. combining
-      attributes from at least 2 vector data sources)
+  * Enriching datasets / data fusion / data integration (e.g. combining
+    attributes from at least 2 vector data sources)
   * Tiling (large dataset -> many chunks)
   * Mosaicing (many chunks -> unified dataset)
+* Tiling/mosaicing-specific challenges:
+  * Managing "edge effects": when a feature spans a tile boundary, how is it handled?
+    Options: keep it in the tile containing its centroid, split it, keep the whole
+    polygon in every tile it intersects, or use another algorithm. All involve trade-offs.
 
 
 ## Service-y stuff
 
+* Workflow service for running arbitrary geospatial workflows
+  * libraries of transformation functions
+  * workflow libraries for composition
+  * GDAL and OGR as base building blocks
 * User submitted recipes that trigger a workflow that results in downloadable
-    data file(s) archived as a new DataONE dataset
+  data file(s) archived as a new DataONE dataset
 
 :::{.callout-important}
 We've been making the assumption that we'd be archiving our outputs, even if the
@@ -55,4 +71,4 @@ title: "Requirements"
 :::
 
 * Creation of [3D Tiles](https://www.ogc.org/standard/3dtiles/) for geospatial
-    datasets to enable fast viz in portal cesium app
+  datasets to enable fast viz in portal Cesium app
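
The "edge effects" requirement in the `requirements.md` diff lists several options for a feature that crosses a tile boundary. The minimal, dependency-free Python sketch below covers three of them. It assumes simple polygons given as lists of `(x, y)` vertices and a hypothetical square tile grid of side `tile_size` anchored at the origin. All function names are illustrative, not part of any PDG or OGDC library. The "every tile it intersects" option is approximated by the polygon's bounding box, which can return extra tiles.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Tile = Tuple[int, int]  # (column, row) in a hypothetical square grid anchored at (0, 0)


def signed_area(ring: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise rings."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a = signed_area(ring)
    return cx / (6.0 * a), cy / (6.0 * a)


def tile_of(p: Point, tile_size: float) -> Tile:
    # A point exactly on a boundary falls into the higher-index tile.
    return math.floor(p[0] / tile_size), math.floor(p[1] / tile_size)


def tiles_by_centroid(ring: List[Point], tile_size: float) -> List[Tile]:
    """Option 1: keep the feature only in the tile containing its centroid."""
    return [tile_of(centroid(ring), tile_size)]


def tiles_by_bbox(ring: List[Point], tile_size: float) -> List[Tile]:
    """Option 3 (approximate): every tile the bounding box touches.

    This is a superset of the tiles the polygon itself intersects."""
    c0, r0 = tile_of((min(x for x, _ in ring), min(y for _, y in ring)), tile_size)
    c1, r1 = tile_of((max(x for x, _ in ring), max(y for _, y in ring)), tile_size)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def clip_to_tile(ring: List[Point], tile: Tile, tile_size: float) -> List[Point]:
    """Option 2: split by clipping to one tile (Sutherland-Hodgman)."""
    xmin, ymin = tile[0] * tile_size, tile[1] * tile_size
    xmax, ymax = xmin + tile_size, ymin + tile_size

    def at_x(x: float) -> Callable[[Point, Point], Point]:
        # Intersection of segment p-q with the vertical line at x.
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y: float) -> Callable[[Point, Point], Point]:
        # Intersection of segment p-q with the horizontal line at y.
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    out = ring
    for inside, cross in edges:  # clip against one tile edge at a time
        src, out = out, []
        for i, cur in enumerate(src):
            prev = src[i - 1]  # i == 0 wraps to the last vertex
            if inside(cur):
                if not inside(prev):
                    out.append(cross(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(cross(prev, cur))
    return out


rect = [(6.0, 2.0), (12.0, 2.0), (12.0, 6.0), (6.0, 6.0)]  # 6 x 4, straddles x = 10
print(tiles_by_centroid(rect, 10))  # [(0, 0)]
print(tiles_by_bbox(rect, 10))  # [(0, 0), (1, 0)]
pieces = [clip_to_tile(rect, t, 10) for t in tiles_by_bbox(rect, 10)]
print([abs(signed_area(p)) for p in pieces])  # [16.0, 8.0]
```

In this example the centroid rule puts the whole rectangle in tile `(0, 0)`, even though a third of its area lies in `(1, 0)`. Splitting gives each tile only its own piece (16 + 8 = the original 24 square units). The bounding-box rule copies the full rectangle into both tiles, which is one reason a deduplication step may be needed downstream. Sutherland-Hodgman is used here only because it is short. When a concave polygon enters a tile more than once, it can emit zero-width connecting edges, so a real pipeline would more likely rely on GEOS-backed intersection (for example through OGR or Shapely).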
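
The design notes describe generating the workflow DAG from the incoming data instead of from a static config file. The minimal Python sketch below illustrates that idea. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to work out which steps could run in parallel. The step names (`stage`, `rasterize`, `reproject`, `publish`) and the rule that routes inputs by file suffix are hypothetical, loosely echoing the PDG staging and rasterization repositories. This is not Argo's or Parsl's API.

```python
from graphlib import TopologicalSorter
from pathlib import PurePosixPath
from typing import Dict, List, Set

Dag = Dict[str, Set[str]]  # step name -> names of the steps it depends on

VECTOR_SUFFIXES = {".gpkg", ".shp", ".geojson"}  # hypothetical routing rule


def build_dag(input_files: List[str]) -> Dag:
    """Derive a DAG by inspecting the inputs instead of reading a static config."""
    dag: Dag = {}
    branch_ends: Set[str] = set()
    for path in input_files:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in VECTOR_SUFFIXES:
            # Vector input: stage (tile) it, then rasterize the staged output.
            dag[f"stage:{path}"] = set()
            dag[f"rasterize:{path}"] = {f"stage:{path}"}
            branch_ends.add(f"rasterize:{path}")
        else:
            # Anything else is treated as raster and only reprojected.
            dag[f"reproject:{path}"] = set()
            branch_ends.add(f"reproject:{path}")
    # Fan-in: a single publish step waits for every per-file branch.
    dag["publish"] = branch_ends
    return dag


def parallel_batches(dag: Dag) -> List[List[str]]:
    """Group steps into waves; steps within one wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the generated graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


dag = build_dag(["tiles/a.gpkg", "tiles/b.gpkg", "dem.tif"])
for wave in parallel_batches(dag):
    print(wave)
```

Running it prints three waves. The raster reprojection and the two staging steps come first, then both rasterize steps, then `publish`. In a real deployment the orchestrator under evaluation would schedule the graph. The sketch only shows that the number of branches, and the steps in each branch, can be computed from the inputs before submission.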