Facilitating Redeploying this Model in Other Cities #80
Also, is there any documentation for calculate_heat_values.R? It's a bit opaque, and I'm really just curious to see the underlying math driving the calculation. I imagine it's some sort of inverse-distance logic, though I'm curious which metric you used (Euclidean, Manhattan, whatever) and how that choice is justified. The report provided is very impressive, though "local intensity" leaves a lot to the imagination.
From my understanding of the code, there is a function called calculate_heat_values.R in the Functions folder. It appears to use a kernel density estimation over a grid with 0.01 spacing.
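For readers trying to picture what that means, here is a minimal sketch of a KDE computed over a 0.01-degree grid, assuming something in the spirit of MASS::kde2d; this is illustrative only, not the code in calculate_heat_values.R:

```r
# Illustrative only: a KDE over a grid with roughly 0.01-degree spacing,
# using MASS::kde2d rather than the repository's own KDE function.
library(MASS)

set.seed(1)
lon <- rnorm(2000, -87.63, 0.05)   # toy event coordinates standing in for crime points
lat <- rnorm(2000,  41.88, 0.05)

# Number of grid points needed for ~0.01-degree spacing in each direction
n_lon <- ceiling(diff(range(lon)) / 0.01)
n_lat <- ceiling(diff(range(lat)) / 0.01)

dens <- kde2d(lon, lat, n = c(n_lon, n_lat))

# A "local intensity" for a business would then be the density value of the
# grid cell nearest to its coordinates.
str(dens$z)
```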
Anticipating this question, I actually already have a comparison between the […]. Ironically, I was initially annoyed that Allstate had created a new KDE […]. This heatmap function is a great example of a limitation of the […].
Thanks for the detail, and yeah, I know from experience how tough it is to think about generalizing when you're in the thick of just getting it right for the immediate task at hand, i.e. Chicago! Do you know offhand how the KDE bandwidth was selected? The line `h <- if (missing(h))` […] is what I'm mostly curious about. And do you know how much computational efficiency is gained from that code-line improvement? Is it material in generating the data to run the model? We're talking about an upper bound of 337k observations for the crime dataset, so is run time really that much of an issue? More broadly, it does seem that best practice, as we look to apply these sorts of city-analytics projects to more than a single city, would be to use a generic package (ideally off CRAN) so it's easier to redeploy. Cheers, PA
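For what it's worth, that fragment looks like the default-bandwidth logic in MASS::kde2d, which falls back to bandwidth.nrd (a normal-reference rule) when no bandwidth is supplied. A small sketch of how that default behaves, assuming the repository's function mirrors kde2d on this point:

```r
# How MASS::kde2d chooses a bandwidth when none is supplied:
#   h <- if (missing(h)) c(bandwidth.nrd(x), bandwidth.nrd(y)) else ...
# bandwidth.nrd() is a normal-reference rule based on the IQR and standard deviation.
library(MASS)

set.seed(1)
lon <- rnorm(1000, -87.63, 0.05)   # toy coordinates, not the real crime data
lat <- rnorm(1000,  41.88, 0.05)

# The per-axis bandwidths kde2d would use by default
c(bandwidth.nrd(lon), bandwidth.nrd(lat))
```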
Out of all the scripts, the heatmap calculations are by far the most time-intensive. For example, last night it took 668 seconds to run the heat map script, and the next-longest time was the business download, which only took 142 seconds. I have not benchmarked how much time the alternative KDE calculation saves. There is a discussion on the original […]
BTW, I do think it would be good to test the effectiveness of different assumptions in the density estimation, and I'm not sure how much of this was done at Allstate. I was thinking it would be good to test […]
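One cheap way to run that kind of sensitivity check, sketched here with MASS::kde2d and an arbitrary alternative bandwidth (this is not the repository's code), is to recompute the surface under different assumptions and see how much the resulting heat values actually move:

```r
# Illustrative sensitivity check: does the density surface change much when
# the bandwidth assumption changes? (Toy data; not the repository's pipeline.)
library(MASS)

set.seed(1)
lon <- rnorm(5000, -87.63, 0.05)
lat <- rnorm(5000,  41.88, 0.05)

h_default <- c(bandwidth.nrd(lon), bandwidth.nrd(lat))
h_half    <- h_default / 2          # an arbitrary alternative to stress-test

k1 <- kde2d(lon, lat, h = h_default, n = 100)
k2 <- kde2d(lon, lat, h = h_half,    n = 100)

# High correlation would suggest the model's heat features are not very
# sensitive to the exact KDE assumptions; low correlation would mean they are.
cor(as.vector(k1$z), as.vector(k2$z))
```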
Part of the challenge with redeploying this cool model is that Chicago's food inspection data is, as you can imagine, somewhat different from, say, LA's, which is run by the County rather than the city. This list in the readme is a great conceptual starting point for digging into what data is needed, though more granularity is required for redeployment:
Business Licenses
Food Inspections
Crime
Garbage Cart Complaints
Sanitation Complaints
Weather
Sanitarian Information
The report is some help, though really it'd be nice to have simple, clear metadata on what data fields are used, what values they take on, and some quick data provenance about how those values were measured.
I suppose you can get that by digging into and comparing, say, this restaurant inspection field from LA: https://data.lacounty.gov/Public-Health/2014-Restaurants-And-Markets-Violations/kbia-7mpx
with what you used in Chicago, though it'd be nice to know right up front: Food Inspections [First field used] [Second field used], etc. Mostly just food for thought as a few of us play with redeploying this model and, as a field, we develop best practices for redeploying these sorts of pioneering tools in other cities.
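As a concrete (and entirely hypothetical) example of the kind of field-level metadata that would make redeployment easier, even a simple data-dictionary table would go a long way; the field names and provenance notes below are placeholders, not taken from the Chicago or LA datasets:

```r
# Hypothetical data dictionary: dataset, field, type, and provenance for each
# field the model consumes. Field names below are placeholders only.
data_dictionary <- data.frame(
  dataset    = c("Food Inspections", "Food Inspections", "Crime"),
  field      = c("Inspection_Date",  "Results",          "Latitude"),
  type       = c("date",             "categorical",      "numeric"),
  provenance = c("recorded by the inspector at the time of the visit",
                 "outcome assigned by the sanitarian (e.g. pass/fail)",
                 "geocoded from the incident address"),
  stringsAsFactors = FALSE
)

# Publishing something like this alongside the code lets another city map its
# own fields onto the model's expected inputs.
write.csv(data_dictionary, "data_dictionary.csv", row.names = FALSE)
```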
Cheers,
PA
PS Say hi to Tom for me. Gonna see if he's at the MacArthur event in NYC next week.