Inconsistent annotations #248

Open
gfr10598 opened this issue May 23, 2019 · 2 comments
@gfr10598
Contributor

A join between NDT tables in staging and prod shows about 650 rows that have inconsistent geo annotations, specifically the city annotation.

https://console.cloud.google.com/bigquery?sq=240028626237:5657b748c8f3448493a06333a5606bf8

Also, manual queries against the legacy annotator interface produce strange annotations that don't match what is in BigQuery.
For example, the query below returns Fremont, CA for 184.105.50.34, but the MaxMind GeoIP2 website places that address in MN.
https://annotator-dot-mlab-sandbox.appspot.com/annotate?since_epoch=40000000&ip_addr=184.105.50.34
-> {"Geo":{"continent_code":"NA","country_code":"US","country_code3":"USA","country_name":"United States","region":"CA","metro_code":807,"city":"Fremont","area_code":510,"postal_code":"94539","latitude":37.515,"longitude":-121.896},"Network":null}
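To compare responses like the one above against BigQuery rows, a minimal sketch that pulls out the fields of interest might look like this. The response body is copied from this issue; the helper name `geo_summary` is only illustrative and is not part of the annotator.

```python
import json

# Response body copied verbatim from the legacy annotator query above.
LEGACY_RESPONSE = (
    '{"Geo":{"continent_code":"NA","country_code":"US","country_code3":"USA",'
    '"country_name":"United States","region":"CA","metro_code":807,'
    '"city":"Fremont","area_code":510,"postal_code":"94539",'
    '"latitude":37.515,"longitude":-121.896},"Network":null}'
)


def geo_summary(body):
    """Return (city, region, latitude, longitude) from a legacy response.

    A missing or null "Geo" field yields a tuple of Nones instead of raising.
    """
    geo = json.loads(body).get("Geo") or {}
    return (geo.get("city"), geo.get("region"),
            geo.get("latitude"), geo.get("longitude"))


print(geo_summary(LEGACY_RESPONSE))
```

The resulting tuple can then be compared directly against the city and region stored for the same IP in BigQuery.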

@autolabel autolabel bot added the review/triage Team should review and assign priority label May 23, 2019
@gfr10598
Contributor Author

gfr10598 commented May 23, 2019

In some cases, the inconsistency may be explained by the use of different timestamps.
For example, "173.17.228.84" apparently switched from Albany, GA to Adel on April 2. Even though both staging and prod re-processed this in May, it appears that staging used an older annotation.

Here are the current results from staging and OTI for different dates, using the v2 API. For each date, the first result is from OTI and the second is from staging.
Note that where the two disagree, the lat/long values differ and are in different states.
***NOTE: Running this multiple times sometimes produces slightly different results!!!

(time.Time) 2019-04-04 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-03 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-04 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-03 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-03 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-03 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-03 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-03 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-02 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-01 00:00:00 +0000 UTC,
City: (string) (len=6) "Albany",
Latitude: (float64) 31.5105,
Longitude: (float64) -84.3087
(time.Time) 2019-04-02 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-01 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-01 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-01 00:00:00 +0000 UTC,
City: (string) (len=4) "Adel",
Latitude: (float64) 41.6221,
Longitude: (float64) -94.038
(time.Time) 2019-04-01 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-04-01 00:00:00 +0000 UTC,
City: (string) (len=6) "Albany",
Latitude: (float64) 31.5105,
Longitude: (float64) -84.3087
(time.Time) 2019-03-31 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-03-25 00:00:00 +0000 UTC,
City: (string) (len=6) "Albany",
Latitude: (float64) 31.5105,
Longitude: (float64) -84.3087
(time.Time) 2019-03-31 00:00:00 -0400 EDT
AnnotatorDate: (time.Time) 2019-03-25 00:00:00 +0000 UTC,
City: (string) (len=6) "Albany",
Latitude: (float64) 31.5105,
Longitude: (float64) -84.3087
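
To make the disagreements in the output above easier to spot, here is a rough sketch that pairs the OTI and staging results per query date and lists the dates where they differ. The rows are transcribed from the output above (first result OTI, second staging); the function name `mismatches` is just for illustration.

```python
# (query date, source, AnnotatorDate, city), transcribed from the output above.
RESULTS = [
    ("2019-04-04", "oti", "2019-04-03", "Adel"),
    ("2019-04-04", "staging", "2019-04-03", "Adel"),
    ("2019-04-03", "oti", "2019-04-03", "Adel"),
    ("2019-04-03", "staging", "2019-04-03", "Adel"),
    ("2019-04-02", "oti", "2019-04-01", "Albany"),
    ("2019-04-02", "staging", "2019-04-01", "Adel"),
    ("2019-04-01", "oti", "2019-04-01", "Adel"),
    ("2019-04-01", "staging", "2019-04-01", "Albany"),
    ("2019-03-31", "oti", "2019-03-25", "Albany"),
    ("2019-03-31", "staging", "2019-03-25", "Albany"),
]


def mismatches(rows):
    """Return (query_date, oti_result, staging_result) where the two differ."""
    by_date = {}
    for query_date, source, annotator_date, city in rows:
        by_date.setdefault(query_date, {})[source] = (annotator_date, city)
    return [
        (query_date, per_source["oti"], per_source["staging"])
        for query_date, per_source in by_date.items()
        if per_source["oti"] != per_source["staging"]
    ]


for query_date, oti, staging in mismatches(RESULTS):
    print(query_date, "oti:", oti, "staging:", staging)
```

In this transcription only the 2019-04-02 and 2019-04-01 queries disagree, and in both cases the two sides report the same AnnotatorDate (2019-04-01) but different cities.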

@gfr10598
Contributor Author

So, it looks like there is an instability at dates where the dataset changes. Perhaps one instance reports one value while another reports a different value? We should probably allow dumping the directory so that we can assess whether there are differences between instances.
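
If such a dump existed, comparing instances could be as simple as the sketch below. Everything here is an assumption: the dump format (a mapping from dataset start date to dataset name), the function name `diff_directories`, and the placeholder dataset names are all hypothetical, since no dump endpoint exists yet.

```python
def diff_directories(dump_a, dump_b):
    """Compare two directory dumps given as {start_date: dataset_name} dicts.

    Returns (only_in_a, only_in_b, changed), where changed maps a date to
    the pair of differing dataset names.
    """
    only_in_a = {d: n for d, n in dump_a.items() if d not in dump_b}
    only_in_b = {d: n for d, n in dump_b.items() if d not in dump_a}
    changed = {
        d: (dump_a[d], dump_b[d])
        for d in dump_a
        if d in dump_b and dump_a[d] != dump_b[d]
    }
    return only_in_a, only_in_b, changed


# Hypothetical dumps from two instances; dataset names are placeholders.
instance_1 = {"2019-03-25": "dataset-A", "2019-04-01": "dataset-B",
              "2019-04-03": "dataset-C"}
instance_2 = {"2019-03-25": "dataset-A", "2019-04-03": "dataset-C"}

print(diff_directories(instance_1, instance_2))
```

An instance missing an entry, as `instance_2` is here, would fall back to a different dataset for dates near the change, which would be one possible explanation for the flip-flopping results.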

@autolabel autolabel bot removed the review/triage Team should review and assign priority label Jun 3, 2019