Addressing TS Cloud Highlight Issues
TLDR Orion expressed concerns about TS Cloud's highlight handling in large documents. Kishore Nallan suggested a workaround by segmenting long texts into smaller documents.
Apr 24, 2022 (20 months ago)
Orion
04:01 PM There are a couple of issues on GH mentioning this, such as this one. I was wondering if there's been any further discussion, or if any robust workarounds have been found, as the current approach is not sustainable for us.
Apr 25, 2022 (20 months ago)
Kishore Nallan
02:02 AM Splitting a long piece of text into smaller documents and then doing `group_by` on the `document_id` is the best workaround at the moment. If there are no paragraphs to split on, maybe just using a ballpark of 200 words per document will be sufficient.
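The workaround described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names `chunk_document` and `grouped_search_params`, the field names `body` and `chunk_index`, and the `id` format are all assumptions made for the example. It splits on blank lines where paragraphs exist and falls back to a budget of about 200 words per chunk, then builds search parameters that group hits by `document_id`.

```python
import re


def chunk_document(document_id, text, words_per_chunk=200):
    """Split one long text into smaller records that share a document_id.

    Hypothetical helper: field names other than document_id are assumptions.
    """
    # Prefer paragraph boundaries (blank lines) when the text has them.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks = []
    for paragraph in paragraphs:
        words = paragraph.split()
        # Cap each chunk at roughly words_per_chunk words.
        for start in range(0, len(words), words_per_chunk):
            chunks.append(" ".join(words[start:start + words_per_chunk]))

    return [
        {
            "id": f"{document_id}-{index}",  # assumed id scheme, unique per chunk
            "document_id": document_id,      # shared by every chunk of one source text
            "chunk_index": index,
            "body": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


def grouped_search_params(query):
    """Build search parameters that collapse chunk hits back to one per source text."""
    return {
        "q": query,
        "query_by": "body",
        # The grouped field must be declared as a facet in the collection schema.
        "group_by": "document_id",
        "group_limit": 1,
    }


records = chunk_document("doc1", " ".join(["word"] * 450))
print(len(records))
print(grouped_search_params("highlight")["group_by"])
```

The resulting records would then be indexed as ordinary documents and the parameter dictionary passed to a search call. With `group_limit` set to 1, each source text shows up once, represented by its best-matching chunk, so the highlight only has to cover a short piece of text.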
Similar Threads
Discussion on Snippeting Multiple Matches
bnfd inquired about highlighting multiple matches with snippeting. Kishore Nallan stated it was possible by listing the fields in `highlight_full_fields`, but multiple snippets from various parts of a document couldn't be achieved due to Typesense's scoring parameters.
Custom Tokenization and Search Issues in Chinese Text
crapthings inquired about custom tokenizer for Chinese which Kishore Nallan mentioned is unsupported. They discussed tokenization affecting vector search and hybrid search. Testing by crapthings raised issues with certain words not working and problems with larger documents. Kishore Nallan advised splitting larger documents for indexing and suggested `group_by=parent_doc_id` for deduplication.
Issues with Displaying Paragraphs Using Typesense and React-instantsearch
Mark is struggling with displaying only relevant paragraphs of indexed body text in a UI with react-instantsearch and Typesense. Jason suggests modifications to the TypesenseInstantSearchAdapter instantiation, highlighting only relevant fields. The issue remains unresolved.