Ignoring HTML Tags in Typesense Document Search
TLDR: Shouvik asked how to keep HTML tags from being matched in Typesense searches. Kishore Nallan and Ricardo suggested storing the HTML in non-searchable fields. Kishore Nallan then proposed a flag to skip HTML at indexing time; Shouvik agreed, and an issue was opened on GitHub to track it.
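A minimal sketch of the "non-searchable field" suggestion, assuming a hypothetical `articles` collection. The names `articles`, `body_text`, and `body_html` are illustrative and do not come from the thread. The idea is that only fields listed in the schema are indexed, so the raw HTML can ride along in the document while `query_by` points at the plain-text fields only.

```python
# Hypothetical collection schema: only "title" and "body_text" are declared,
# so only they are indexed. All names here are illustrative assumptions.
schema = {
    "name": "articles",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "body_text", "type": "string"},
        # "body_html" is deliberately not declared: per the docs quoted in
        # the thread, undeclared fields are stored on disk but not indexed.
    ],
}

# The document carries both the searchable plain text and the raw HTML.
document = {
    "id": "1",
    "title": "Hello",
    "body_text": "Typesense is fast.",
    "body_html": "<p>Typesense is <strong>fast</strong>.</p>",
}

# query_by decides what gets searched; it only names indexed fields.
search_parameters = {"q": "fast", "query_by": "title,body_text"}


def indexed_field_names(collection_schema):
    """Return the set of field names declared (and so indexed) in a schema."""
    return {field["name"] for field in collection_schema["fields"]}
```

These dictionaries would then be handed to whichever Typesense client is in use; the search result still returns `body_html` for rendering, but a query for `strong` or `p` should not match on the markup.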
May 01, 2021 (31 months ago)
Shouvik
02:05 PM
Kishore Nallan
02:06 PM
Shouvik
02:12 PM
Shouvik
02:13 PM
Shouvik
05:45 PM
May 02, 2021 (31 months ago)
Ricardo
06:16 AM
https://typesense.org/docs/0.20.0/api/collections.html#with-pre-defined-schema
"Your documents can contain other fields not mentioned in the collection's schema - they will be stored on disk but not indexed in memory."
That said, your `query_by` will define what gets searched on.
Kishore Nallan
09:53 AM
Shouvik
01:29 PM
Shouvik
01:30 PM
Kishore Nallan
01:47 PM
Shouvik
01:49 PM
Shouvik
02:03 PM
Similar Threads
Using Highlights in typesense-go
Oliver worried about using highlights involving HTML tags in `typesense-go`, as they mix trusted and untrusted content. Jason advises HTML sanitization before ingesting data and using arbitrary strings as highlighters.
Issue with escapeHTML and Search Highlighting
Digamber is having trouble with the search highlighting not working when escapeHTML is set to false. Kishore Nallan and Jason try to help but the issue remains unresolved.
Discussing Typesense Search Highlighting Capabilities
Jack enquires about getting highlight data to include all fields in an object in Typesense. Jason clarifies that only the specific fields in `query_by` will be returned, which resolves the issue for Jack.
Keyword Highlighting Issue on Docusaurus with Typesense
Inas experienced trouble with keyword highlighting on a Docusaurus doc site using Typesense. After a detailed discussion, Jason clarified that the desired functionality isn't possible to implement in a cross-browser compatible way without modifications to Docusaurus core.
Resolving HTML Content Search Issues
Ramy encountered issues with HTML content search within tags. Jason initially suggested adding special characters to the `token_separators` config but later recommended storing plain text of the HTML content. Ramy appreciated the advice. Ed also weighed in.
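
The "store plain text of the HTML content" advice can be sketched with Python's standard `html.parser`. This is one possible way to derive a plain-text field before indexing, not something prescribed in the threads; the helper name `html_to_text` and the choice of block-level tags are assumptions.

```python
from html.parser import HTMLParser

# Tags treated as word boundaries so adjacent blocks do not fuse together.
# This set is an illustrative assumption, not an exhaustive list.
_BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "td", "th",
               "h1", "h2", "h3", "h4", "h5", "h6"}
_SKIPPED_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring markup and script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in _SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Typesense is <strong>fast</strong>.</p>")` returns `"Typesense is fast."`, which can be indexed in a plain-text field while the original HTML is kept in a separate field for display.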