Addressing TS Cloud Highlight Issues

TLDR Orion expressed concerns about TS Cloud's highlight handling in large documents. Kishore Nallan suggested a workaround by segmenting long texts into smaller documents.

Apr 24, 2022 (20 months ago)
04:01 PM
Hey all! I've been working with TS Cloud for an internal knowledge management project and it's been great. However, it's important for our use case to show context around multiple (sometimes all) highlights. Currently this means sending the entire highlighted field over the wire and handling snippets client-side, which for large docs is an immense amount of data and is slowing everything down.

There are a couple of issues on GH mentioning this, such as this one. I was wondering if there's been any further discussion, or any robust workarounds found, as the current approach is not sustainable for us.
Apr 25, 2022 (20 months ago)
Kishore Nallan
02:02 AM
Typesense is optimized to find the best matched text segment that contains most/all keywords in the query to show as highlight. It will require significant effort to rewire that to handle multiple snippets within a large document without compromising on performance.

Splitting a long piece of text into smaller documents and then doing group_by on the document_id is the best workaround at the moment. If there are no paragraphs to split on, just using a ballpark of 200 words per document may be sufficient.
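
For anyone trying this, here is a rough sketch of the splitting step in Python. It is only one way to do it, not an official recipe: the helper names (split_into_chunks, to_chunk_documents) and the field names (document_id, chunk_index, content) are made up for illustration, and the 200-word limit is just the ballpark mentioned above. It splits on blank lines first and falls back to fixed word windows for paragraphs that are too long.

```python
import re


def split_into_chunks(text, max_words=200):
    """Split text on blank-line paragraph breaks, then cut any paragraph
    longer than max_words into consecutive windows of max_words words."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        words = paragraph.split()
        # Empty paragraphs yield no words and therefore no chunks.
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def to_chunk_documents(document_id, text, max_words=200):
    """Build one small document per chunk. Field names are hypothetical;
    document_id ties every chunk back to its parent document."""
    return [
        {
            "id": f"{document_id}-{i}",
            "document_id": document_id,
            "chunk_index": i,
            "content": chunk,
        }
        for i, chunk in enumerate(split_into_chunks(text, max_words))
    ]


# Search parameters that would be sent with the query. group_by and
# group_limit are Typesense search parameters; the values and the
# field names are assumptions for this sketch.
search_parameters = {
    "q": "highlight",
    "query_by": "content",
    "group_by": "document_id",
    "group_limit": 5,
}
```

Each chunk then carries its own highlight, and grouping on document_id collects the matching chunks under their parent document, so only the relevant snippets travel over the wire. Note that the field used in group_by has to be declared as a facet in the collection schema.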