Adding various patterns to training data #222

Open
wants to merge 2 commits into
base: main


@ecatkins commented Apr 5, 2018

Hi,

I've been working with the usaddress library and have added some patterns that I have seen fail in my datasets. This commit includes the XML files for the training set (training/dealstat_addresses_v1.xml) and the test set (measure_performance/test_data/dealstat_tests_v1.xml). The CSV files were excluded by the .gitignore file; let me know if you need them as well.
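For reference, here is a minimal sketch of how one labeled entry in such an XML file could be produced. It assumes the parserator-style layout, in which each address is an `AddressString` element whose children are named after the token labels (with all entries wrapped in an `AddressCollection` root). The helper `labeled_address_xml` is a hypothetical name used only for illustration, and the labeling of the "Hargrove Grade" address from pattern 3 below is my reading of the intended tags, not a copy of the committed file.

```python
import xml.etree.ElementTree as ET


def labeled_address_xml(labeled_tokens):
    """Serialize (token, label) pairs as a single <AddressString> element.

    Hypothetical helper: the element layout is an assumption about the
    training format, not taken from the files in this pull request.
    """
    address = ET.Element("AddressString")
    last = len(labeled_tokens) - 1
    for i, (token, label) in enumerate(labeled_tokens):
        child = ET.SubElement(address, label)
        child.text = token
        if i < last:
            # Tokens are separated by a single space between elements.
            child.tail = " "
    return ET.tostring(address, encoding="unicode")


# Assumed labeling of the "Grade" example (pattern 3).
HARGROVE_GRADE = [
    ("19", "AddressNumber"),
    ("Hargrove", "StreetName"),
    ("Grade,", "StreetNamePostType"),
    ("Palm", "PlaceName"),
    ("Coast", "PlaceName"),
    ("FL", "StateName"),
    ("32137", "ZipCode"),
]

if __name__ == "__main__":
    print(labeled_address_xml(HARGROVE_GRADE))
```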

Patterns

  1. Unknown Illinois pattern PlaceName in StateName #221: see the referenced issue. I'm not sure why this was failing.
  2. No StreetNamePostType: Common streets are sometimes referenced without a StreetNamePostType, e.g. "200 East Main, San Diego California".
  3. StreetNamePostType = "Grade": I have only come across this once and don't think it is very common, but I included the specific example "19 Hargrove Grade, Palm Coast FL 32137" in the training data (without a corresponding test).
  4. Rhode Island: "Rhode Island" is occasionally picked up as a PlaceName rather than a StateName.
  5. Direction in PlaceName: A direction in the PlaceName is sometimes read as a StreetNamePostDirectional, e.g. "5548 Elmer Avenue, N. Hollywood, CA 91601".
  6. Fort Lauderdale: If the address does not have a StreetNamePostType, "Fort" is read as the StreetNamePostType rather than as part of the PlaceName, e.g. "225 West Elm, Fort Lauderdale, FL 33301".
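The patterns above can be spot-checked with a small script. This is a minimal sketch, not the test harness used in this pull request: `label_mismatches` is a hypothetical helper, the expected labels are my assumption about the intended tagging, and the comparison is positional, so it assumes the parser splits the address on whitespace. It relies only on `usaddress.parse`, which returns a list of `(token, label)` pairs.

```python
from typing import Callable, List, Tuple

Parser = Callable[[str], List[Tuple[str, str]]]


def label_mismatches(parse: Parser, address: str, expected: List[str]):
    """Return (token, expected_label, predicted_label) for each wrong label."""
    predicted = parse(address)
    if len(predicted) != len(expected):
        raise ValueError(
            f"expected {len(expected)} tokens, parser returned {len(predicted)}"
        )
    return [
        (token, want, got)
        for (token, got), want in zip(predicted, expected)
        if got != want
    ]


# Assumed intended labeling for pattern 6 (no StreetNamePostType present).
FORT_LAUDERDALE = "225 West Elm, Fort Lauderdale, FL 33301"
FORT_LAUDERDALE_LABELS = [
    "AddressNumber",
    "StreetNamePreDirectional",
    "StreetName",
    "PlaceName",
    "PlaceName",
    "StateName",
    "ZipCode",
]

if __name__ == "__main__":
    try:
        import usaddress  # third-party: pip install usaddress
    except ImportError:
        print("usaddress is not installed; skipping the live check")
    else:
        # An empty list means every token received the expected label.
        print(label_mismatches(usaddress.parse, FORT_LAUDERDALE, FORT_LAUDERDALE_LABELS))
```

Passing the parser in as an argument keeps the helper usable against both the released model and a locally retrained one.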

Both the nose tests and my tests are passing. Let me know how else I can be of assistance. I'm hoping to continue to add new patterns and make pull requests as I work through my datasets.

@ecatkins changed the title from "no street post type pattern & unknown illinois pattern" to "Adding various patterns to training data" on Apr 10, 2018