Daily Check List 09/12/2024 - 13/12/2024 #1141

Open
5 tasks done
rosemaryjoconnor opened this issue Dec 8, 2024 · 5 comments

rosemaryjoconnor commented Dec 8, 2024

Every day:

  • monitor the index switch

    • open stdout in step k. Check index and update collection alias
    • scroll to bottom and record Current#, NEW#, DIFF# values
  • all scheduled jobs for the day completed successfully (raise on Slack if not)

  • assertions sync has run - check user assertion counts are in DQ Profile
    https://biocache.ala.org.au/occurrences/search?q=

  • images processing is progressing and not stalled - take note of the lastUpdated field in a loading batch

  • check there are no long-running (i.e. > 24 hours) clusters

    • In Airflow - select Browse->Dag Runs and filter on running state

  • Raise any issues on the data management internal channel

  • Monday

  • Tuesday

  • Wednesday

  • Thursday

  • Friday
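
The long-running check in the list above (nothing in the running state for more than 24 hours) can be sketched in a few lines of Python. This is only an illustration: the `runs` rows, their keys (`dag_id`, `state`, `start`) and the DAG names are made up, and nothing here calls Airflow. The rows stand in for whatever is copied out of the Dag Runs view.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the checklist: anything running for more than 24 hours.
MAX_RUNTIME = timedelta(hours=24)


def long_running(runs, now, max_runtime=MAX_RUNTIME):
    """Return the runs that are still 'running' and older than max_runtime.

    `runs` is a list of dicts with hypothetical keys 'dag_id', 'state' and
    'start' (a timezone-aware datetime). It is not an Airflow data structure.
    """
    return [
        run for run in runs
        if run["state"] == "running" and now - run["start"] > max_runtime
    ]


# Made-up example rows, as they might be noted down from the Dag Runs view.
now = datetime(2024, 12, 12, 9, 0, tzinfo=timezone.utc)
runs = [
    {"dag_id": "example_ingest", "state": "running",
     "start": datetime(2024, 12, 10, 23, 0, tzinfo=timezone.utc)},
    {"dag_id": "example_index", "state": "running",
     "start": datetime(2024, 12, 12, 3, 0, tzinfo=timezone.utc)},
    {"dag_id": "example_done", "state": "success",
     "start": datetime(2024, 12, 9, 3, 0, tzinfo=timezone.utc)},
]

for run in long_running(runs, now):
    print(f"{run['dag_id']} has been running for {now - run['start']}")
```

Anything the function returns would be a candidate to raise on the data management internal channel.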

Once a week (whenever):

  • sensitive data check (script in GitHub)

Job Schedule

Scheduled jobs - GitHub config

Daily

| Time | Job | Job Name |
| --- | --- | --- |
| 3am | Index to SOLR | |
| 7am | OBIS | dp5183-obis-weekly |

Weekly

| Day | Time | Job | DR | Job Name |
| --- | --- | --- | --- | --- |
| Mon | 12am | BIS Weeds | dr27665 | dr27665-BISFetcher-weekly |
| Mon | 7am | iNaturalist | dr1411 | dr1411-inaturalist-weekly |
| Mon | 7pm | FeralScan | dr19813 | dr19813-feralscan-weekly |
| Tue | 9am | cPlatypus | dr7973 | dr7973-cplatypus-weekly |
| Tue | 9am | APII | dr413 | dr413-apii-weekly |
| Tue | 9am | Butterflies Aust | dr16457 | dr16457-butterflies-weekly |
| Tue | 5pm | Biocollect | dp3903 | dp3903-biocollect-weekly |
| Tue | 8pm | Bionet | dr368 | dr368-bionet-weekly |
| Wed | 12am | Perth AVH | dr15863 | dr15863-perth-avh-weekly |
| Wed | 8am | Questagame | dr1902 | dr1902-questagame-weekly |
| Wed | 9am | NatureMapr | dr19123 | dr19123-naturemapr-weekly |
| Wed | 1pm | QM | dr344 | dr344-QM-weekly |
| Thu | 12am | CANB AVH | dr15860 | dr15860-canb-avh-weekly |
| Thu | 12.25am | CNS AVH | dr15867 | dr15867-cns-avh-weekly |
| Thu | 12.40am | JCT AVH | dr15868 | dr15868-jct-avh-weekly |

Monthly

| Day | Time | Job | DR | Job Name |
| --- | --- | --- | --- | --- |
| First Thu | 5.15pm | MV | dr342 | dr342-MV-monthly |
| First Wed | 9am | TurtleSat | dr26141 | dr26141-turtleSat-monthly |
| First Wed | 9am | UC Genomics | dr25075 | dr25075-UCGenomics-monthly |
| 5th | 12am | UniMelb AVH | dr13282 | dr13282-melu-avh-monthly |
| 6th | 12am | Allan Herbarium | dr27654 | dr27654-chr-avh-monthly |
| 6th | 12.01am | WELT AVH | dr26642 | dr26642-welt-avh-monthly |
| 6th | 12.15am | Auckland Museum Botanical | dr26650 | dr26650-ak-avh-monthly |
| 6th | 12.45am | NZ PDD | dr26651 | dr26651-pdd-avh-monthly |
| 19th | 12am | NSW AVH | dr15861 | dr15861-nsw-avh-monthly |
| 20th | 12am | BRI AVH | dr2287 | dr2287-bri-avh-monthly |
| 21st | 12am | AD AVH | dr15865 | dr15865-ad-avh-monthly |

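
To tick off "all scheduled jobs for the day completed successfully", it helps to know which jobs are expected on a given date. The sketch below is one possible way to derive that from the weekly and monthly schedules above. It only includes a handful of rows, and it assumes that "First Thu" means the first Thursday of the calendar month and that "5th" or "19th" means the day of the month. The dictionary layout and the function name `expected_jobs` are illustrative, not part of the scheduler config.

```python
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# A few rows copied from the schedule above; the other rows follow the same shape.
WEEKLY = {  # weekday -> job names
    "Mon": ["dr27665-BISFetcher-weekly", "dr1411-inaturalist-weekly",
            "dr19813-feralscan-weekly"],
    "Thu": ["dr15860-canb-avh-weekly", "dr15867-cns-avh-weekly",
            "dr15868-jct-avh-weekly"],
}
MONTHLY_FIRST_WEEKDAY = {  # weekday -> jobs run on the first such day of the month
    "Wed": ["dr26141-turtleSat-monthly", "dr25075-UCGenomics-monthly"],
    "Thu": ["dr342-MV-monthly"],
}
MONTHLY_DAY_OF_MONTH = {  # day of month -> job names
    5: ["dr13282-melu-avh-monthly"],
    19: ["dr15861-nsw-avh-monthly"],
}


def expected_jobs(day):
    """Return the job names from the tables above that should run on `day`."""
    weekday = DAYS[day.weekday()]
    jobs = list(WEEKLY.get(weekday, []))
    # The first occurrence of a weekday always falls on day 1 to 7 of the month.
    if day.day <= 7:
        jobs += MONTHLY_FIRST_WEEKDAY.get(weekday, [])
    jobs += MONTHLY_DAY_OF_MONTH.get(day.day, [])
    return jobs


for job in expected_jobs(date(2024, 12, 5)):
    print(job)
```

The daily jobs are left out here because they run every day regardless of the date.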

rosemaryjoconnor self-assigned this Dec 8, 2024
rosemaryjoconnor changed the title from "Daily Check List" to "Daily Check List - 09/12/2024" Dec 8, 2024
rosemaryjoconnor changed the title from "Daily Check List - 09/12/2024" to "Daily Check List 09/12/2024 to 13/12/2024" Dec 8, 2024
rosemaryjoconnor changed the title from "Daily Check List 09/12/2024 to 13/12/2024" to "Daily Check List 09/12/2024 - 13/12/2024" Dec 8, 2024

09/12/2024
All daily jobs successful.


10/12/2024

  • SOLR Index successful: 52,094 new records, 51,098 from iNaturalist
  • dp3903 - BioCollect didn't run


11/12/2024

  • Name matching issue - all ingest dataset DAGs failed
  • Reran successfully later in the day
  • Check index tomorrow


rosemaryjoconnor commented Dec 12, 2024

12/12/2024

  • All daily jobs ran successfully
  • Index failed with species list error:
SEVERE- The check_species_list_uid failed. Please check the logs for more details.
SEVERE- Some checks failed. Please check the logs for more details.
SEVERE- 12% of records don't have speciesListUid="dr26948" . Counts- Current#:8,981,248 New#: 7,860,313 Diff#:-1,120,935
  • Fix: TL - New list had been uploaded to prod causing the problem.
  • Decision made to switch index and fix list issue for tomorrow's load

Record Count
INFO - There are more records in the new index than the current one. CURRENT#: 146,499,269 NEW#: 147,092,540 DIFF#: 593,271
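
Recording the Current#, NEW# and DIFF# values by hand is error-prone, so here is a minimal sketch of pulling them out of a pasted log line and cross-checking that DIFF equals NEW minus CURRENT. The regular expression is an assumption based only on the two line shapes quoted in this issue (label case and spacing differ between them). The names `parse_counts` and `percent_change` are illustrative, not part of the pipeline.

```python
import re

# Matches the count summary in lines such as
#   "... CURRENT#: 146,499,269 NEW#: 147,092,540 DIFF#: 593,271"
#   "... Current#:8,981,248 New#: 7,860,313 Diff#:-1,120,935"
# Case-insensitive, and the space after each colon is optional.
COUNTS = re.compile(
    r"current#:\s*(-?[\d,]+)\s+new#:\s*(-?[\d,]+)\s+diff#:\s*(-?[\d,]+)",
    re.IGNORECASE,
)


def parse_counts(line):
    """Return {'current', 'new', 'diff'} as ints, or None if the line has no counts."""
    match = COUNTS.search(line)
    if match is None:
        return None
    current, new, diff = (int(g.replace(",", "")) for g in match.groups())
    return {"current": current, "new": new, "diff": diff}


def percent_change(counts):
    """Change from the current index to the new one, as a percentage of current."""
    return 100 * (counts["new"] - counts["current"]) / counts["current"]


# The two lines are copied from the 12/12/2024 entry in this issue.
lines = [
    "INFO - There are more records in the new index than the current one. "
    "CURRENT#: 146,499,269 NEW#: 147,092,540 DIFF#: 593,271",
    'SEVERE- 12% of records don\'t have speciesListUid="dr26948" . '
    "Counts- Current#:8,981,248 New#: 7,860,313 Diff#:-1,120,935",
]

for line in lines:
    counts = parse_counts(line)
    consistent = counts["diff"] == counts["new"] - counts["current"]
    print(counts, f"{percent_change(counts):+.2f}%", "consistent" if consistent else "MISMATCH")
```

A negative percentage on a species list line, as in the `dr26948` case, is the kind of drop that makes the index check fail.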

Async/Assertions: Old: 10,259 New: 10,211

New GBIF DRs

  • dr22496 - 126,223 records
  • dr22497 - 177,263 records
  • dr22501 - 122,982 records
  • dr22525 - 159,518 records


13/12/2024

Index failed - manual switch.

INFO- There are same number of records for speciesListUid="dr491" in both indexes. Counts#:473
INFO- There are equal number of records in the new index to the current one. CURRENT#: 147,092,540 NEW#: 147,092,540 DIFF#: 0
SEVERE- The check_min_fields_for_random_records failed. Please check the logs for more details.
