#community-help

Customizing Snippets and Highlighting in Document Searches

TLDR bnfd wants to customize snippeting and highlighting, and to show multiple snippets per document in search results. Jason suggests using snippet_threshold and highlight_full_fields, opening a GitHub issue, and breaking long documents into smaller records.

Aug 24, 2021
bnfd
06:42 PM
Is there a way to customize snippeting? Like showing 1 line before and 1 line after the match, 3 lines in total. Or doing the same in terms of words instead of lines.
Jason
07:04 PM
Not at the moment. There's snippet_threshold which might be helpful.

There's also highlight_full_fields to get the full highlighted field, and then you could snippet on the client side.
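
A minimal sketch of the client-side approach mentioned above, assuming the search request sets highlight_full_fields so the response carries the whole field with each matched word wrapped in `<mark>` tags. The field name `body`, the function name `snippets_around_marks`, and the `context_words` parameter are hypothetical and only for illustration. It also assumes a highlighted match never contains whitespace.

```python
# Search parameters that ask for the full highlighted field.
# "body" is a hypothetical field name.
SEARCH_PARAMS = {
    "q": "foo",
    "query_by": "body",
    "highlight_full_fields": "body",
}


def snippets_around_marks(highlighted, context_words=5):
    """Return snippets of `context_words` words on each side of every <mark> match.

    Windows that overlap or touch are merged into a single snippet.
    """
    tokens = highlighted.split()
    windows = []
    for i, tok in enumerate(tokens):
        if "<mark>" not in tok:
            continue
        start = max(0, i - context_words)
        end = min(len(tokens), i + context_words + 1)
        if windows and start <= windows[-1][1]:
            # Extend the previous window instead of emitting a near-duplicate.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [" ".join(tokens[s:e]) for s, e in windows]
```

Calling `snippets_around_marks(full_value, context_words=1)` on a full highlighted field value would give roughly "one word before, one word after" per match. A line-based variant could split on newlines instead of whitespace.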
Jason
07:04 PM
Could you open a GitHub issue for this?
bnfd
07:06 PM
yes
bnfd
07:23 PM
Jason, something relevant to this: I opened an issue regarding multiple snippets. Maybe I misunderstood your reply, but wouldn't the highlights helper just show the whole document with all highlights applied? The use case is that in a long document (let's say 20 pages) there are 10 occurrences of "foo" so I'd like to show 10 snippets for document1 instead of the whole document. Is this possible?
Jason
07:27 PM
> wouldn't the highlights helper just show the whole document with all highlights applied?
That's correct.

> long document (let's say 20 pages) there are 10 occurrences of "foo" so I'd like to show 10 snippets for document1 instead of the whole document. Is this possible?
I guess I misunderstood your original ask. This is not possible at the moment. But for long documents, to keep search relevance and speed good, I'd recommend breaking them up into, say, one paragraph per record.

That will also solve what you're looking to do indirectly, because then you can show 10 different snippets, 1 snippet from each document (paragraph)
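
As a rough illustration of the one-paragraph-per-record idea, here is a sketch that splits a document body on blank lines. The function name `paragraph_records` and the record fields (`parent_id`, `paragraph_index`, `text`) are assumptions for this example, not a schema from the thread.

```python
import re


def paragraph_records(doc_id, title, body):
    """Split `body` on blank lines and return one record per paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    return [
        {
            "id": f"{doc_id}-{n}",   # unique id per paragraph record
            "parent_id": doc_id,     # lets the client tie hits back to the source document
            "paragraph_index": n,
            "title": title,
            "text": p,
        }
        for n, p in enumerate(paragraphs)
    ]
```

Each hit then carries its own snippet, and the client can collect hits that share a `parent_id` to show several snippets under one document.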
Jason
07:29 PM
This is good reading material from Algolia on the topic of indexing long documents: https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/how-to/indexing-long-documents/

The difference in Typesense is that there is no limit on the number of characters per document, but Algolia enforces a hard limit of 1K per document. So with Typesense we let you make the call on how large a document should be.

bnfd
07:34 PM
I was torn between breaking it up and implementing it on the frontend. The problem is that some documents have no paragraphs, so it's not straightforward to break them into parts.
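
For text without paragraph breaks, one possible workaround (not something proposed in the thread) is to cut fixed-size word windows with a small overlap, so a match near a boundary still has context in one of the chunks. The function name `word_chunks` and the default sizes are hypothetical.

```python
def word_chunks(text, size=100, overlap=20):
    """Split `text` into chunks of up to `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk could then be indexed as its own record, in the same way as the paragraph-per-record suggestion.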