Test Latest TextTagger in other languages/scripts #36

Open
mubaldino opened this issue Jul 25, 2019 · 0 comments

mubaldino commented Jul 25, 2019

Describe the bug
TextTagger fails on text in languages other than English; Arabic is the case reported here.

To Reproduce

  • Java or Python version: Any Java (openjdk 8 and 12)
  • Usage: Arabic text produces a "zero-length token" exception from TextTagger process()
  • Data input:
  • Did you enable logging (level = DEBUG)?
  • Other notes:
15:59:47.288 [main] ERROR org.apache.solr.handler.RequestHandlerBase - java.lang.IllegalArgumentException: term:  analyzed to a zero-length token
	at org.apache.solr.handler.tagger.Tagger.process(Tagger.java:142)
	at org.apache.solr.handler.tagger.TaggerRequestHandler.handleRequestBody(TaggerRequestHandler.java:231)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:191)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
	at org.opensextant.extraction.SolrMatcherSupport.tagTextCallSolrTagger(SolrMatcherSupport.java:181)
	at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:444)
	at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:404)
	at org.opensextant.extractors.geo.PlaceGeocoder.extract(PlaceGeocoder.java:475)
	at org.opensextant.extractors.test.TestPlaceGeocoder.tagFile(TestPlaceGeocoder.java:57)
	at org.opensextant.extractors.test.TestPlaceGeocoder.main(TestPlaceGeocoder.java:164)
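
One way to narrow this down is to check whether any token in the input disappears entirely once it is normalized, since the exception complains about a term that analyzed to nothing. The sketch below is only a diagnostic aid under stated assumptions: the class `ZeroLengthTokenProbe` and its methods are hypothetical, whitespace splitting stands in for the real tokenizer, and the `normalize` step (dropping tatweel and Arabic short-vowel marks) is a simplified guess at what the index's analysis chain might do. The actual field type configuration is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: find input tokens that vanish under a simplified Arabic normalization. */
final class ZeroLengthTokenProbe {

    /**
     * Assumed stand-in for the real analyzer: removes tatweel (U+0640)
     * and the Arabic marks in the range U+064B..U+0652.
     */
    static String normalize(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean tatweel = c == '\u0640';
            boolean mark = c >= '\u064B' && c <= '\u0652';
            if (!tatweel && !mark) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the whitespace-delimited tokens that are non-empty before normalization but empty after it. */
    static List<String> findVanishingTokens(String text) {
        List<String> hits = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && normalize(token).isEmpty()) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The middle token consists only of tatweel characters, so it normalizes to an empty string.
        String sample = "Riyadh \u0640\u0640\u0640 Cairo";
        System.out.println("Vanishing tokens: " + findVanishingTokens(sample).size());
    }
}
```

If a probe like this flags tokens in the failing Arabic file, that would point at the analysis chain rather than the index contents; if it flags nothing, the reindexing theory becomes more likely.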

Expected behavior

TextTagger should behave more reasonably on such input rather than throwing. It's possible that the whole Solr 7.x assembly needs to be replaced with a clean setup and the data fully reindexed.

@mubaldino mubaldino self-assigned this Jul 25, 2019
@mubaldino mubaldino added the bug label Jul 25, 2019
@mubaldino mubaldino added this to the Xponents 3.6 milestone Dec 15, 2021
@mubaldino mubaldino modified the milestones: Xponents 3.6, Xponents 3.7 Mar 2, 2023