Trivial "Do Do" false-positives #54

Open
mubaldino opened this issue Jun 30, 2020 · 3 comments

Comments

@mubaldino
Member

Describe the bug
"Do. Do", "do. Do", "in Do", etc. are still common false positives.

To Reproduce
Xponents 3.3

Expected behavior
Better filtering of these. One likely approach: use a spaCy NER model to supply POS tags and eliminate the obvious errors.
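A minimal sketch of the POS-based filtering idea, not Xponents code. It assumes each token already carries a coarse POS tag (for example from spaCy's `token.pos_`) and that candidate place matches are given as token indices. The function name `filter_by_pos` and the tag set `NON_PLACE_POS` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical set of coarse POS tags that suggest a token is not a place name.
NON_PLACE_POS = {"VERB", "AUX", "PRON", "ADP", "DET", "PART"}


def filter_by_pos(tagged_tokens: List[Tuple[str, str]], matches: List[int]) -> List[int]:
    """Keep only the candidate match indices whose POS tag is not in NON_PLACE_POS.

    tagged_tokens: (text, pos) pairs, one per token.
    matches: indices of tokens the tagger flagged as place mentions.
    """
    return [i for i in matches if tagged_tokens[i][1] not in NON_PLACE_POS]


# Hand-tagged example sentence; in practice the tags would come from a POS tagger.
tagged = [
    ("What", "PRON"), ("do", "AUX"), ("you", "PRON"), ("do", "VERB"),
    ("in", "ADP"), ("Paris", "PROPN"), ("?", "PUNCT"),
]
candidate_indices = [1, 3, 5]  # both "do" tokens and "Paris" were tagged as places
kept = filter_by_pos(tagged, candidate_indices)
```

Here only the index of "Paris" survives, since both "do" tokens are tagged as verbs.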

@mubaldino mubaldino self-assigned this Jun 30, 2020
@mubaldino
Member Author

Add "text_norm" to the indexer so we can review the common false positives that still appear.
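A rough sketch of how a normalized-text field could help with that review, assuming "text_norm" means lowercased text with punctuation stripped (the actual indexer definition may differ). The helper `text_norm` and the sample matches are illustrative only.

```python
import re
from collections import Counter


def text_norm(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.findall(r"\w+", text.lower()))


# Sample matched strings; in practice these would be pulled from tagger output.
matched_texts = ["Do. Do", "do. Do", "in Do", "DO DO"]

# Tally matches by normalized form so recurring false positives group together.
norm_counts = Counter(text_norm(t) for t in matched_texts)
top = norm_counts.most_common(1)
```

Grouping on the normalized form puts the "Do. Do" and "do. Do" variants in the same bucket, which makes the most frequent offenders easy to spot.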

@mubaldino
Member Author

Addressed in part by NonSenseFilter -- removing lowercase matches.
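To illustrate why dropping lowercase matches only partly helps, here is a simplified stand-in. It is not the actual NonSenseFilter implementation, and `drop_lowercase_matches` is a hypothetical name.

```python
from typing import List


def drop_lowercase_matches(matches: List[str]) -> List[str]:
    """Discard matched strings that contain no uppercase letters."""
    return [m for m in matches if not m.islower()]


matches = ["do", "in do", "Do. Do", "Doha"]
kept = drop_lowercase_matches(matches)
```

In this sketch "do" and "in do" are removed, but "Do. Do" still passes because it contains capitals, so a case-only rule leaves that variant in place.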

@mubaldino mubaldino added this to the Xponents 3.5 milestone Dec 15, 2021
@mubaldino
Member Author

This seems more like a gazetteer ETL fix than a pattern generalization. If such trivial gazetteer entries should never be tagged, then we mark them search_only=1.
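A sketch of what that ETL step might look like, under stated assumptions: gazetteer rows are plain dicts with a `name` field, and the list of trivial names is maintained by hand. Only `search_only` comes from this thread; `TRIVIAL_NAMES`, `normalize_name`, and `mark_search_only` are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical hand-curated list of normalized names that should never be tagged.
TRIVIAL_NAMES = {"do", "do do"}


def normalize_name(name: str) -> str:
    """Lowercase and strip punctuation so "Do. Do" and "do do" compare equal."""
    return " ".join(re.findall(r"\w+", name.lower()))


def mark_search_only(records: List[Dict]) -> List[Dict]:
    """Return copies of the records, setting search_only=1 on trivial names."""
    out = []
    for rec in records:
        if normalize_name(rec["name"]) in TRIVIAL_NAMES:
            rec = dict(rec, search_only=1)  # copy; the input row is left unchanged
        out.append(rec)
    return out


rows = [
    {"name": "Do", "search_only": 0},
    {"name": "Do. Do", "search_only": 0},
    {"name": "Doha", "search_only": 0},
]
marked = mark_search_only(rows)
```

Entries flagged this way would stay searchable in the gazetteer while being excluded from tagging, which matches the intent described above.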

@mubaldino mubaldino modified the milestones: Xponents 3.5, Xponents 3.6 Feb 7, 2022
@mubaldino mubaldino modified the milestones: Xponents 3.6, Xponents 3.7 Mar 2, 2023