[BUG] Prevent word truncation with enabled resultsHighlighting #4091

Open
loeffe1 opened this issue Jun 10, 2024 · 9 comments · May be fixed by #4114

Comments

@loeffe1

loeffe1 commented Jun 10, 2024

When resultsHighlighting is enabled, words at the start and/or end of the result excerpt are truncated.

[Screenshots: result, page]

  • TYPO3 Version: 11.5.26
  • EXT:solr Version: 11.5.2
  • Used Apache Solr Version: 8.11.1
  • PHP Version: 8.1.28
  • MySQL Version: 8.1.28
@dkd-kaehm
Collaborator

Thanks for reporting.
Which field is set in plugin.tx_solr.search.results.resultsHighlighting.highlightFields?

@loeffe1
Author

loeffe1 commented Jun 10, 2024

Thanks for the quick reply!
The default value content is used.

@dkd-kaehm
Collaborator

One more question: what was the search term?

@loeffe1
Author

loeffe1 commented Jun 10, 2024

In this particular case, the search term occurs much later in the result text and is not part of the truncated text.

@dkd-kaehm
Collaborator

Hmm, I don't know how to avoid widows and orphans within EXT:solr, because the cropping comes from Apache Solr in the response; see https://solr.apache.org/guide/solr/latest/query-guide/highlighting.html
Please look into the docs. If you find something I did not, let me know and we'll reopen this issue.
If nothing is possible, please open an issue on the Apache Solr tracker: https://issues.apache.org/jira/projects/SOLR/issues/SOLR-17298?filter=allopenissues

PS: German is funny, too: https://de.wikipedia.org/wiki/Hurenkind_und_Schusterjunge

@loeffe1
Author

loeffe1 commented Jun 14, 2024

In case anybody else is having this issue as well: I was able to achieve much better results with a few modifications in solrconfig.xml. So far I have not experienced any word truncation.

Within <requestHandler name="/select" class="solr.SearchHandler"> I added:

<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>

Within <searchComponent name="highlight" class="solr.HighlightComponent"><highlighting> I made the SimpleBoundaryScanner non-default by setting default="false" on it, i.e. <boundaryScanner name="default" default="false" class="solr.highlight.SimpleBoundaryScanner">, and added this:

<boundaryScanner name="breakIterator" default="true" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <!-- type should be one of CHARACTER, WORD (default), LINE and SENTENCE -->
    <str name="hl.bs.type">SENTENCE</str>
    <!-- language and country are used when constructing the Locale object, -->
    <!-- and the Locale object is used when getting an instance of BreakIterator -->
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</boundaryScanner>
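
For orientation, the fragments above could sit in solrconfig.xml roughly as sketched below. This is a sketch, not the shipped EXT:solr configuration: placing the three hl.* parameters inside the handler's <lst name="defaults"> is an assumption (the comment only says they were added within the request handler), and all unrelated settings are omitted. Note also that the FastVector highlighter only works on fields indexed with termVectors, termPositions and termOffsets enabled, so the highlighted field (content by default) would need those in the schema.

```xml
<!-- Sketch only: unrelated parts of solrconfig.xml are omitted -->
<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- Assumption: the highlighting parameters are set as handler defaults -->
    <lst name="defaults">
      <str name="hl.usePhraseHighlighter">false</str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <!-- Refers to the boundaryScanner named "breakIterator" below -->
      <str name="hl.boundaryScanner">breakIterator</str>
    </lst>
  </requestHandler>

  <searchComponent name="highlight" class="solr.HighlightComponent">
    <highlighting>
      <!-- Existing scanner, no longer the default -->
      <boundaryScanner name="default" default="false" class="solr.highlight.SimpleBoundaryScanner"/>

      <!-- New default scanner: fragments start and end at sentence boundaries -->
      <boundaryScanner name="breakIterator" default="true" class="solr.highlight.BreakIteratorBoundaryScanner">
        <lst name="defaults">
          <str name="hl.bs.type">SENTENCE</str>
          <str name="hl.bs.language">de</str>
          <str name="hl.bs.country">DE</str>
        </lst>
      </boundaryScanner>
    </highlighting>
  </searchComponent>
</config>
```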

@dkd-kaehm
Collaborator

@loeffe1
Would you like to provide a pull request?

@dkd-kaehm dkd-kaehm reopened this Jun 14, 2024
@loeffe1
Author

loeffe1 commented Jun 20, 2024

I can, if you'd like me to. I'm just wondering if this needs more testing.

@loeffe1
Author

loeffe1 commented Jul 24, 2024

I have created a pull request for a basic version supporting English and German cores. The hl.bs.language and hl.bs.country fields are set dynamically via core properties. This should work.
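
One way such a per-core setup could look, as a sketch only: Solr replaces ${property:default} placeholders in solrconfig.xml, and user-defined entries in a core's core.properties file are available as such properties. The property names highlight.language and highlight.country below are hypothetical and are not taken from the pull request.

```xml
<!-- Assumes the core.properties of a German core contains (hypothetical names):
       highlight.language=de
       highlight.country=DE
     If a property is missing, the value after the colon is used as fallback. -->
<boundaryScanner name="breakIterator" default="true" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">SENTENCE</str>
    <str name="hl.bs.language">${highlight.language:en}</str>
    <str name="hl.bs.country">${highlight.country:US}</str>
  </lst>
</boundaryScanner>
```

With this approach a single shared solrconfig.xml can serve several language cores, since only the core.properties values differ per core.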
