geocoding_ban/ : French Geocoding with BAN #45

Open
wants to merge 1 commit into
base: master

Conversation

Tristramg

BAN is a French geocoding API: http://adresse.data.gouv.fr/

tainmar commented Feb 7, 2017

It looks like you committed 2 plugins in this PR, @Tristramg.
Otherwise, well done, this looks really useful.

@Tristramg
Author

Oops, yes, indeed, thanks! It's fixed!

@Tristramg Tristramg closed this Feb 9, 2017
@Tristramg Tristramg reopened this Feb 9, 2017
@Tristramg
Author

geocoding_plugin

@cstenac
Member

cstenac commented Feb 19, 2017

Hi,

Thanks for this contribution! Ever since BAN / BANO appeared, I had wanted to add this kind of feature to DSS.

First of all, can you confirm that you wish us to publish this plugin?

A few things that I noticed:

  • Since the plugin dumps the data as returned by Addok, the "latitude" and "longitude" columns are not prefixed by the user-specified prefix, which might be a bit confusing
  • If the first "sampling" query fails (API error for example), only the "result_score" column will be written to the output dataset schema. Further queries that succeed will only have their "latitude" column written out in the result_score (since it's the first returned column). Probably we could just make the whole recipe fail if the sampling query fails, so as to ensure that we have a proper schema
  • It would make debugging easier to add a "result_error" (or something like that) column with failure details when result_score = -1 (HTTP code and response text)
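
The prefix and "result_error" points could be handled along these lines. This is only a minimal sketch, not the plugin's actual code: the helper names `output_columns` and `make_row` and the short column list are assumptions for illustration. It also declares the full column list up front so the output schema does not depend on the first query, which is an alternative to failing the recipe when the sampling query fails.

```python
# Hypothetical subset of the columns returned by the geocoder.
ADDOK_COLUMNS = ["latitude", "longitude", "result_score"]


def output_columns(prefix):
    """Prefix every added column and reserve one column for failure details."""
    return [prefix + name for name in ADDOK_COLUMNS] + [prefix + "result_error"]


def make_row(prefix, result=None, http_status=None, response_text=None):
    """Build one output row with every declared column present.

    A failed lookup gets a score of -1 plus the HTTP code and response text.
    """
    row = {column: None for column in output_columns(prefix)}
    if result is not None:
        for name in ADDOK_COLUMNS:
            row[prefix + name] = result.get(name)
    else:
        row[prefix + "result_score"] = -1
        row[prefix + "result_error"] = "HTTP %s: %s" % (http_status, response_text)
    return row
```

Because every row carries the same keys, a failed query can no longer shift values into the wrong column.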

I'll open a review for minor nitpicks, thanks again!

Member

@cstenac cstenac left a comment

"name": "columns",
"label" : "Address columns",
"type": "COLUMNS",
"description":"Multiple columns will be concatenated",
Member

Add "columnRole": "input" (name of an input role), and you'll have autocompletion on the column names. Same for postcode and citycode.

"name": "post_code",
"label" : "Column of the postcode",
"type": "COLUMN",
"description":"Only results of that postcode will be returned.",
Member

You might want to state explicitly that this is optional

Author

You mean in the text description, not only "mandatory": false?

Member

Yes. Currently, we don't really do a good job at showing visual hints from the "mandatory" field, so we tend to be explicit in descriptions.

"type": "INT",
"defaultValue": 1000,
"mandatory": true,
"description": "Sending multiple requests in one iteration saves time"
Member

Since the name is "Lines per request", we might want to avoid "requests per iteration" and repeat "lines per request" in the description. Since the API has an 8MB limit (and since I guess returns are diminishing), it might be useful to state that you should generally not go above 1000.
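
For illustration, grouping input lines into requests of at most "Lines per request" could look like the sketch below. The function name `batches` is hypothetical and this is not the plugin's actual code; the default of 1000 mirrors the parameter's `defaultValue`.

```python
from itertools import islice


def batches(rows, lines_per_request=1000):
    """Yield lists of at most `lines_per_request` rows, in input order."""
    if lines_per_request < 1:
        raise ValueError("lines_per_request must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, lines_per_request))
        if not chunk:
            return
        yield chunk
```

Each yielded list would then be sent as a single request, so a larger value means fewer requests but a larger payload per request.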

{
// The identifier of the plugin.
// This must be globally unique, and only contain A-Za-z0-9_-
"id" : "geocoding_ban",
Member

DSS plugins generally have names with dashes rather than underscores, so you might want to name that geocoding-ban

// Meta data for display purposes
"meta" : {
// Name of this plugin that appears in the interface.
"label": "BAN Geocoding plugin",
Member

Suggestion: "French geocoding (BAN)"

@Tristramg
Author

Thank you for the review.

Yes, we would like to publish this plugin. I also asked the people behind Addok and BANO and they are happy to know that there is a plugin.

  • I fixed the missing prefix for the coordinates
  • There is an optional column for the error code
  • I included your suggestions

Now a few questions:

  • What would be the best way to abort the job (e.g. for the sampling query)?
  • How should I log, to give info and warnings during the process?
  • Could it be made a processor instead of a recipe? I found no documentation on how to do it
  • How could the plugin appear in the right-hand menu when selecting a database?

@cstenac
Member

cstenac commented Feb 20, 2017

Thanks!

  • To abort the job, simply raise a Python exception (with a clear error message), this will cause the job to fail
  • At the moment, we don't have the ability for plugins to write warnings in the "Warnings" tab, this is something which we plan to add. In the meantime, use the "logging" package and use logging.info / logging.warning (no need to configure it, the framework does it)
  • Very slow processes like this one (or any process that calls an external API) do not work well as processors, because processors are called very often, so it is better as a recipe. In the coming months, we plan to build a set of higher-level APIs for custom recipes that only do "1 for 1" enrichment of lines, like most API-calling recipes. This will make it much easier to write this kind of plugin, handling everything related to errors, parallelism, batching, writing the output, ...
  • To make the recipe creatable from a dataset's "Actions" menu, add the following to your recipe.json file: "selectableFromDataset" : "input". This will make it appear in the menu, and the context dataset will be pre-added to the "input" role.
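
The first two answers can be sketched as follows. The function names `check_sampling_response` and `report_batch` and the message wording are hypothetical; only the standard `logging` calls and the idea of raising an exception come from the discussion above.

```python
import logging


def check_sampling_response(http_status, response_text):
    """Abort the job with a clear message if the sampling query failed."""
    if http_status != 200:
        # An uncaught exception is what makes the job fail.
        raise RuntimeError(
            "Sampling query failed (HTTP %s): %s" % (http_status, response_text)
        )
    logging.info("Sampling query succeeded")


def report_batch(batch_index, failed_lines):
    """Log progress for one batch; the messages end up in the job log."""
    if failed_lines:
        logging.warning(
            "Batch %d: %d lines could not be geocoded", batch_index, failed_lines
        )
    else:
        logging.info("Batch %d geocoded without errors", batch_index)
```

No logging configuration is included on purpose, since the framework is said to handle it.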

@Tristramg
Author

Thank you. I hope I applied all your suggestions, and that I didn’t add some rubbish while doing so.

@cstenac cstenac changed the title from "Geocoding" to "French Geocoding with BAN" Mar 16, 2017
@Tristramg
Author

🆙 ;)

@tdesfont tdesfont changed the title from "French Geocoding with BAN" to "geocoding_ban/ : French Geocoding with BAN" Oct 7, 2020
3 participants