Describe the bug
TextTagger usage with languages other than English.
To Reproduce
Java or Python version: Any Java (openjdk 8 and 12)
Usage: Arabic text produces a "zero-length token" exception from TextTagger process()
Data input:
Did you enable logging (level = DEBUG)?
Other notes:
15:59:47.288 [main] ERROR org.apache.solr.handler.RequestHandlerBase - java.lang.IllegalArgumentException: term: analyzed to a zero-length token
at org.apache.solr.handler.tagger.Tagger.process(Tagger.java:142)
at org.apache.solr.handler.tagger.TaggerRequestHandler.handleRequestBody(TaggerRequestHandler.java:231)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:191)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
at org.opensextant.extraction.SolrMatcherSupport.tagTextCallSolrTagger(SolrMatcherSupport.java:181)
at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:444)
at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:404)
at org.opensextant.extractors.geo.PlaceGeocoder.extract(PlaceGeocoder.java:475)
at org.opensextant.extractors.test.TestPlaceGeocoder.tagFile(TestPlaceGeocoder.java:57)
at org.opensextant.extractors.test.TestPlaceGeocoder.main(TestPlaceGeocoder.java:164)
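A possible way to narrow down which input triggers the exception, offered as a sketch and not a diagnosis: if the tagger field's analyzer applies Arabic normalization (an assumption, since the schema is not shown here), a token made up only of tatweel or harakat would be reduced to nothing, which matches the "zero-length token" message. The class and method names below are hypothetical, and the character ranges only approximate what Lucene's Arabic normalization deletes. The check uses plain JDK calls and whitespace splitting, not the real analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroLengthTokenCheck {

    // Assumption: tatweel (U+0640) and harakat (U+064B..U+0652) are the
    // characters the field's analyzer deletes during normalization.
    static boolean isStripped(int cp) {
        return cp == 0x0640 || (cp >= 0x064B && cp <= 0x0652);
    }

    // Remove the characters above from a single token.
    static String stripMarks(String token) {
        StringBuilder sb = new StringBuilder();
        token.codePoints().filter(cp -> !isStripped(cp)).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Return whitespace-separated tokens that become empty after stripping.
    static List<String> findEmptyAfterStrip(String text) {
        List<String> hits = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (!tok.isEmpty() && stripMarks(tok).isEmpty()) {
                hits.add(tok);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two Arabic words separated by a run of tatweel.
        String sample = "\u0645\u062F\u064A\u0646\u0629 \u0640\u0640\u0640 \u0627\u0644\u0642\u0627\u0647\u0631\u0629";
        for (String tok : findEmptyAfterStrip(sample)) {
            System.out.println("Suspect token of length " + tok.length());
        }
    }
}
```

Running this over the failing document before calling the tagger would show whether such tokens are present. If none are found, the cause likely lies elsewhere in the analysis chain.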
Expected behavior
More reasonable behavior is expected from TextTagger. It's possible the whole Solr 7.x assembly needs to be replaced with a clean setup and the data fully reindexed.