Make structured hours information available to the API #721
A few of our scrapers populate the Data looks like this: [
{
"day": "monday",
"opens": "09:00",
"closes": "15:00"
},
{
"day": "tuesday",
"opens": "09:00",
"closes": "15:00"
},
{
"day": "wednesday",
"opens": "09:00",
"closes": "15:00"
},
{
"day": "thursday",
"opens": "09:00",
"closes": "15:00"
},
{
"day": "friday",
"opens": "09:00",
"closes": "15:00"
}
] |
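A minimal sketch of how entries in this shape could be sanity-checked before they are exposed through the API. The function name `validate_hours` and the rules it applies (lowercase English day names, zero-padded 24-hour `HH:MM` times) are assumptions inferred from the sample above, not something the scrapers are known to guarantee.

```python
import re

# Assumption: days are lowercase English names, as in the sample data.
DAYS = {
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
}
# Assumption: times are zero-padded 24-hour HH:MM strings.
TIME_RE = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def validate_hours(entries):
    """Return a list of problems found; an empty list means the data looks well-formed."""
    if not isinstance(entries, list):
        return ["expected a list of entries"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        day = entry.get("day")
        if not isinstance(day, str) or day not in DAYS:
            problems.append(f"entry {i}: unknown day {day!r}")
        for key in ("opens", "closes"):
            value = entry.get(key)
            if not isinstance(value, str) or not TIME_RE.fullmatch(value):
                problems.append(f"entry {i}: bad {key} value {value!r}")
    return problems
```

Calling `validate_hours` on the sample above would return an empty list; anything else is a list of human-readable complaints that could be logged per source location.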
Running this long query returns source names that have select source_name, count(*)
from source_location
where json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
group by source_name
|
The Lines 255 to 259 in 8ae0065
I'm going to add |
I'll also add |
Also hours_json_provenance_source_location and hours_json_last_updated_at
Back-populating this with a migration would be interesting. Normally this is risky because of the reduced bandwidth allowed between Cloud Build and Cloud SQL, but in this case I may be able to compose a SQL "UPDATE" that does all of the work with minimal bandwidth between the two - using this pattern: https://til.simonwillison.net/django/migration-using-cte |
Some queries I used to figure this out:
with scraped_opening_hours as (select
matched_location_id,
json_extract_path(import_json::json, 'opening_hours') as opening_hours
from
source_location
where
source_name = 'vaccinefinder_org'
and json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
and matched_location_id is not null
order by matched_location_id, last_imported_at)
select count(*) from scraped_opening_hours Returns 41562 What if a location (in this query represented by with scraped_opening_hours as (select distinct on (matched_location_id)
matched_location_id,
json_extract_path(import_json::json, 'opening_hours') as opening_hours
from
source_location
where
source_name = 'vaccinefinder_org'
and json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
and matched_location_id is not null
order by matched_location_id, last_imported_at)
select count(*) from scraped_opening_hours Returns 40931. This confirms that there are indeed some locations with multiple with scraped_opening_hours as (select
matched_location_id,
json_extract_path(import_json::json, 'opening_hours') as opening_hours
from
source_location
where
source_name = 'vaccinefinder_org'
and json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
and matched_location_id is not null
order by matched_location_id, last_imported_at)
select matched_location_id, count(*) from scraped_opening_hours
group by matched_location_id having count(*) > 1
This returned 613 rows. I checked that against a simpler version of a similar query:
select matched_location_id, count(*)
from source_location
where source_name = 'vaccinefinder_org'
group by matched_location_id
having count(*) > 1
That returned 1719 rows. Then I added the filter for just the ones with opening hours:
select matched_location_id, count(*)
from source_location
where source_name = 'vaccinefinder_org'
and json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
group by matched_location_id
having count(*) > 1 Which returned 614 rows. That one row difference is because of the |
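The counts from the CTE queries above are related by simple arithmetic: total rows minus distinct locations gives the number of surplus rows, 41562 - 40931 = 631. Since only 613 locations have more than one row, some of them must have three or more. The sketch below reproduces that bookkeeping on a plain list of `matched_location_id` values; `duplicate_summary` is a hypothetical helper, not part of the codebase.

```python
from collections import Counter


def duplicate_summary(matched_location_ids):
    """Summarize how many rows share a matched_location_id."""
    counts = Counter(matched_location_ids)
    return {
        # Equivalent to count(*) over the whole CTE.
        "total_rows": sum(counts.values()),
        # Equivalent to the count after "select distinct on (matched_location_id)".
        "distinct_locations": len(counts),
        # Equivalent to "group by ... having count(*) > 1".
        "duplicated_locations": sum(1 for n in counts.values() if n > 1),
        # Rows that "distinct on" would discard.
        "surplus_rows": sum(n - 1 for n in counts.values()),
    }
```

For any input, `surplus_rows` equals `total_rows - distinct_locations`, which is a quick cross-check when the numbers from separate queries need to agree.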
Here's the update query:
with scraped_opening_hours as (
select distinct on (matched_location_id)
id as source_location_id,
matched_location_id,
json_extract_path(import_json::json, 'opening_hours') as opening_hours,
last_imported_at
from
source_location
where
source_name = 'vaccinefinder_org'
and json_array_length(json_extract_path(import_json::json, 'opening_hours')) > 0
and matched_location_id is not null
order by matched_location_id, last_imported_at
)
update location
set
hours_json = scraped_opening_hours.opening_hours,
hours_json_last_updated_at=scraped_opening_hours.last_imported_at,
hours_json_provenance_source_location_id=scraped_opening_hours.source_location_id
from
scraped_opening_hours
where
location.id = scraped_opening_hours.matched_location_id |
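To make the effect of the `UPDATE` easier to reason about, here is a rough in-memory Python equivalent. It is only a sketch: `backfill_hours` and the dict-based rows are hypothetical stand-ins for the `location` and `source_location` tables. PostgreSQL's `DISTINCT ON` keeps the first row of each group according to `ORDER BY`, so with the ascending `last_imported_at` sort used above, the earliest imported row per location is the one that wins; sorting descending would pick the most recent instead.

```python
def backfill_hours(locations, source_rows, source_name="vaccinefinder_org"):
    """Copy opening hours from source rows onto matched locations.

    locations: dict mapping location id -> dict of location fields (mutated in place).
    source_rows: iterable of dicts with keys id, source_name,
        matched_location_id, opening_hours, last_imported_at.
    Returns the number of locations updated.
    """
    # Same filters as the WHERE clause of the CTE.
    eligible = [
        row for row in source_rows
        if row["source_name"] == source_name
        and row["opening_hours"]
        and row["matched_location_id"] is not None
    ]
    # Same ordering as "order by matched_location_id, last_imported_at".
    eligible.sort(key=lambda r: (r["matched_location_id"], r["last_imported_at"]))

    seen = set()
    updated = 0
    for row in eligible:
        location_id = row["matched_location_id"]
        if location_id in seen:
            continue  # "distinct on" keeps only the first row per location
        seen.add(location_id)
        location = locations.get(location_id)
        if location is None:
            continue  # no matching location row to update
        location["hours_json"] = row["opening_hours"]
        location["hours_json_last_updated_at"] = row["last_imported_at"]
        location["hours_json_provenance_source_location_id"] = row["id"]
        updated += 1
    return updated
```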
It took 31.4s to run against my local development environment, updating 40,396 locations. I'm going to try this on staging. |
Before running the data migration on staging, this query returns 0 rows:
select id, name, full_address, hours, hours_json, hours_json_last_updated_at, hours_json_provenance_source_location_id
from location
where hours_json_provenance_source_location_id is not null
limit 100 |
Next steps:
|
I'm going to rename the |
https://vial-staging.calltheshots.us/admin/core/location/45185/change/ is an example record that now has this data: From this report: https://vial-staging.calltheshots.us/admin/core/sourcelocation/36015/change/ |
Actually I'll go with |
Here's an example of the new "derive_details" debug tool for a location with hours on staging: https://vial-staging.calltheshots.us/location/ltbpc |
That same page also shows the new APIv0 JSON preview: https://vial-staging.calltheshots.us/location/ltbpc
{
"id": "ltbpc",
"name": "San Bernardino Health Center",
"provider": null,
"state": "CA",
"latitude": 34.0678,
"longitude": -117.28565,
"location_type": "Unknown",
"phone_number": "(714) 922-4100",
"full_address": "1873 Commercenter W;\nSan Bernardino, CA 92408",
"city": "San Bernardino",
"county": "San Bernardino",
"zip_code": "92408",
"hours": {
"unstructured": null,
"structured": [
{
"day": "thursday",
"opens": "07:00",
"closes": "18:00"
},
{
"day": "sunday",
"opens": "07:00",
"closes": "16:30"
}
]
},
"website": "https://myturn.ca.gov/",
"vaccines_offered": null,
"concordances": [
"vaccinefinder_org:04d8dfb6-5c63-4ab4-9a87-8b9d2dc498eb",
"us_carbon_health:a9db417a-f0b7-4e6c-b20e-a81935982974"
],
"last_verified_by_vts": null,
"vts_url": "https://www.vaccinatethestates.com/?lng=-117.28565&lat=34.06780#ltbpc"
} Note that there are no guarantees that the |
The staging API export failed after I deployed this code. There was no error in Sentry, though.
{
"insertId": "hhq9h4g3g8rhqh",
"jsonPayload": {
"targetType": "HTTP",
"url": "https://vial-staging.calltheshots.us/api/exportVaccinateTheStates",
"jobName": "projects/django-vaccinateca/locations/us-west2/jobs/vaccinatethestates-api-export-staging",
"@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
"status": "UNKNOWN"
},
"httpRequest": {},
"resource": {
"type": "cloud_scheduler_job",
"labels": {
"job_id": "vaccinatethestates-api-export-staging",
"project_id": "django-vaccinateca",
"location": "us-west2"
}
},
"timestamp": "2021-07-08T20:33:00.654933771Z",
"severity": "ERROR",
"logName": "projects/django-vaccinateca/logs/cloudscheduler.googleapis.com%2Fexecutions",
"receiveTimestamp": "2021-07-08T20:33:00.654933771Z"
} |
I bet it's because I forgot to add vial/vaccinate/api/serialize.py Lines 68 to 100 in 41f388c
|
The export worked!
{
"id": "rec00SICtL8KJiLim",
"name": "RITE AID PHARMACY 06466",
"provider": {
"name": "Rite-Aid Pharmacy",
"provider_type": "Pharmacy",
"vaccine_info_url": "https://www.riteaid.com/Covid-19"
},
"state": "CA",
"latitude": 32.71998,
"longitude": -117.16902,
"location_type": "Pharmacy",
"phone_number": "619-231-7405",
"full_address": "1411 KETTNER BOULEVARD, SAN DIEGO, CA 92101",
"city": null,
"county": "San Diego",
"zip_code": null,
"hours": {
"unstructured": "Monday - Sunday: 8:00 am-8:00 pm",
"structured": [
{
"day": "monday",
"opens": "08:00",
"closes": "20:00"
},
{
"day": "tuesday",
"opens": "08:00",
"closes": "20:00"
},
{
"day": "wednesday",
"opens": "08:00",
"closes": "20:00"
},
{
"day": "thursday",
"opens": "08:00",
"closes": "20:00"
},
{
"day": "friday",
"opens": "08:00",
"closes": "20:00"
},
{
"day": "saturday",
"opens": "10:00",
"closes": "18:00"
},
{
"day": "sunday",
"opens": "10:00",
"closes": "17:00"
}
]
},
"website": null,
"vaccines_offered": null,
"concordances": [
"google_places:ChIJt9skOKxU2YARERJMhNf4QfA",
"vaccinefinder:f8bd637a-1a6e-4262-b3f0-7c7a6b9b887d",
"vaccinespotter_org:7382057",
"vaccinefinder_org:f8bd637a-1a6e-4262-b3f0-7c7a6b9b887d",
"rite_aid:106466",
"_tag_provider:rite_aid",
"us_carbon_health:48063e1f-820f-4770-a9c5-3c4cb67077c7"
],
"last_verified_by_vts": "2021-04-22T22:28:33.637831+00:00",
"vts_url": "https://www.vaccinatethestates.com/?lng=-117.16902&lat=32.71998#rec00SICtL8KJiLim"
} |
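For API consumers, a small sketch of reading the `hours` object defensively. The two exported records above show that `unstructured` can be null and that `structured` may cover only some days, so the code treats both parts as optional. The helper names `hours_for_day` and `is_open_at` are hypothetical, and the comparison assumes zero-padded 24-hour `HH:MM` strings with `closes` later than `opens` on the same day.

```python
def hours_for_day(location, day):
    """Return a list of (opens, closes) pairs for the given day name."""
    hours = location.get("hours") or {}
    structured = hours.get("structured") or []  # may be null or missing
    wanted = day.lower()
    return [
        (entry["opens"], entry["closes"])
        for entry in structured
        if entry.get("day") == wanted
    ]


def is_open_at(location, day, hhmm):
    """True if hhmm (e.g. "12:30") falls inside any opening interval for that day.

    Zero-padded HH:MM strings sort in time order, so plain string comparison
    is enough. Overnight intervals are not handled.
    """
    return any(opens <= hhmm < closes for opens, closes in hours_for_day(location, day))
```

With the San Bernardino record shown earlier, `is_open_at(record, "thursday", "12:30")` would be true, while any Monday query returns false because that day has no structured entry.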
Deployed to production. Here are some locations on production that now have |
Part of #705 - we have this from our scrapers.