This is the uber-epic for the complete evolution of CKAN DataStore load to AirCan.

Acceptance
Integration tests of the DataStore load (in Cypress), e.g. we upload a file to a CKAN staging instance and 5 minutes later the data is in the DataStore. We have the tests, but they assume a local `npm test` run. Right now they point to a temporary CKAN instance (not to DX) and run the entire flow. There is no CI for this on GitHub at the moment.
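The Cypress tests themselves are not shown here; as an illustration of the flow they exercise, here is a minimal Python sketch. The URL, API key and `upload_and_wait` helper are placeholders for this sketch, not part of the actual suite; it uploads a CSV via the CKAN action API and polls the DataStore until records appear:

```python
import time

import requests

CKAN_URL = "https://ckan-staging.example.com"  # placeholder, not the real instance
API_KEY = "xxx"  # an API key with permission to create resources


def upload_and_wait(dataset_id: str, csv_path: str, timeout: int = 300) -> dict:
    """Upload a CSV resource, then poll datastore_search until records appear."""
    with open(csv_path, "rb") as f:
        resp = requests.post(
            f"{CKAN_URL}/api/3/action/resource_create",
            headers={"Authorization": API_KEY},
            data={"package_id": dataset_id, "format": "CSV"},
            files={"upload": f},
        )
    resp.raise_for_status()
    resource_id = resp.json()["result"]["id"]

    deadline = time.time() + timeout  # 300s ~ the "5 minutes later" expectation above
    while time.time() < deadline:
        r = requests.post(
            f"{CKAN_URL}/api/3/action/datastore_search",
            headers={"Authorization": API_KEY},
            json={"resource_id": resource_id, "limit": 1},
        )
        if r.ok and r.json().get("success") and r.json()["result"]["records"]:
            return r.json()["result"]
        time.sleep(10)
    raise TimeoutError(f"no DataStore records for {resource_id} within {timeout}s")
```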
CD of AirCan and the CKAN extension into staging (includes Terraform setup of Google Cloud Composer). We have an automated deployment script (see MVP DX actions including continuous deployment #66), but we don't have CD such that changes to the AirCan DAGs or the CKAN extension are automatically re-deployed.

Tasks
Refactor the DAGs, ckanext-aircan etc. to take a `run_id` which you can pass in to the DAG and which it uses in logging etc. when running, so we can reliably track logs. Also move Airflow status info into the logs (so we don't depend on the Airflow API).
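A minimal sketch of what this could look like, assuming the `run_id` arrives via the trigger payload (`dag_run.conf`); the DAG and task names mirror the ones discussed in this epic, but the code is illustrative, not the actual AirCan implementation:

```python
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x path


def load_csv(**context):
    # ckanext-aircan would trigger the DAG with e.g.
    # conf={"run_id": "ckan-resource-1234-20201104"}.
    conf = context["dag_run"].conf or {}
    run_id = conf.get("run_id", "unknown")
    log = logging.getLogger("aircan")
    # Prefix every message with the run_id so log lines can be filtered
    # per run in Stackdriver, without asking the Airflow API anything.
    log.info("[run_id=%s] datastore load starting", run_id)
    # ... the actual load steps go here ...
    log.info("[run_id=%s] datastore load finished", run_id)


with DAG(
    dag_id="datastore_load_dag",
    start_date=datetime(2020, 11, 1),
    schedule_interval=None,  # triggered externally, not on a schedule
) as dag:
    PythonOperator(
        task_id="load_csv",
        python_callable=load_csv,
        provide_context=True,  # implicit in Airflow 2.x
    )
```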
Research how others solve the problem of getting unique run ids per DAG run in Airflow (and how we could pass this info down into Stackdriver so that we can filter logs). The goal is a reliable `aircan_status(run_id)` function that can be turned into an API in CKAN (or elsewhere).
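For example (an assumption about the eventual design, not a settled API): if every log line carries the `run_id` as in the sketch above, `aircan_status` can be derived purely from a Stackdriver log filter. A rough version using the `google-cloud-logging` client:

```python
from google.cloud import logging as gcl


def aircan_status(run_id: str, project: str = "my-gcp-project") -> dict:
    """Derive a DAG run's status purely from its log lines (no Airflow API)."""
    client = gcl.Client(project=project)
    # Substring-match the run_id prefix used by the DAG's log lines.
    # Structured (json) logging would allow jsonPayload.run_id = "..." instead.
    entries = list(client.list_entries(filter_=f'textPayload:"[run_id={run_id}]"'))
    messages = [str(e.payload) for e in entries]
    if any("finished" in m for m in messages):
        state = "success"
    elif messages:
        state = "running"
    else:
        state = "unknown"
    return {"run_id": run_id, "state": state, "log_lines": len(messages)}
```

CKAN could then expose this as, say, an `aircan_status` action API endpoint once the design settles.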
Instance of Google Cloud Composer and a way to update DAGs there.
Should it be a test instance, or could we use production? We think production is OK, in part because we can create new DAGs if we need to, so we don't interfere with existing ones. E.g. suppose we want to update `datastore_load_dag` and it is being used by production CKAN instances: well, we can create `datastore_load_dag_v2`. ANS: use production.
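To make the "create a new DAG rather than change the existing one" approach concrete, a hypothetical factory can stamp the version into the `dag_id`, letting `datastore_load_dag` and `datastore_load_dag_v2` coexist in the same Composer environment:

```python
from datetime import datetime

from airflow import DAG


def make_datastore_load_dag(version: str = "") -> DAG:
    """Build a datastore load DAG whose id carries an optional version suffix."""
    suffix = f"_{version}" if version else ""
    return DAG(
        dag_id=f"datastore_load_dag{suffix}",
        start_date=datetime(2020, 11, 1),
        schedule_interval=None,
    )


# Registered side by side: production CKAN keeps triggering the old id
# while new work targets the v2 DAG.
dag = make_datastore_load_dag()
dag_v2 = make_datastore_load_dag("v2")
```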
Plan of work (from 4 nov)

```mermaid
graph TD
v1[v0.1 CSV load working, CI/CD setup with rich tests]
v2[v0.2 errors, logging and UI integration]
v3[v0.3 expand the tasks, e.g. xlsx, google sheets loading]
v4[v0.4 harvesting ...]
v1 --> v2
v2 --> v3
v3 --> v4
```

FUTURE after this
Detailed
```mermaid
graph TD
deploytotest[Deploy DAGs to test GCC]
deploydags[Deploy DAGs into this Airflow<br/>starting with CKAN data load]
deploygcc[Deploy Airflow<br/>i.e. Google Cloud Composer]
nhsdag[NHS DAG for loading to bigquery]
nhs[NHS Done: instance updated<br/>with extension and working in production]
logging[Logging]
reporting[Reporting]
othersite["Other Site Done"]
start[Start] --> deploygcc
start --> logging
multinodedag --> deploytotest
subgraph General Dev of AirCan
errors[Error Handling]
aircanlib[AirCan lib refactoring]
multinodedag[Multi Node DAG]
logging --> reporting
end
subgraph Deploy into Datopian Cluster
deploytotest[Deploy DAGs to test GCC] --> deploydags
deploygcc --> deploydags
end
subgraph CKAN Integration
setschema[Set Schema from Resource]
endckan[End CKAN work]
setschema --> endckan
end
deploydags --> nhsdag
deploydags --> othersite
endckan --> nhs
subgraph NHS
nhsdag --> nhs
end
classDef done fill:#21bf73,stroke:#333,stroke-width:1px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:1px;
classDef inprogress fill:orange,stroke:#333,stroke-width:1px;
classDef next fill:lightblue,stroke:#333,stroke-width:1px;
class multinodedag done;
class versioning nearlydone;
class setschema,errors,deploydags,nhsdag,deploygcc inprogress;
```