Using Highlights in typesense-go
TLDR Oliver worried about using highlights involving HTML tags in typesense-go, as they mix trusted and untrusted content. Jason advises HTML sanitization before ingesting data and using arbitrary strings as highlighters.
Sep 28, 2023 (2 months ago)
Oliver
05:22 PM
[...] typesense-go, but I am not sure how much that matters.
I note that in typesense-go, two search parameters, HighlightStartTag and HighlightEndTag, default to <mark> and </mark> respectively. This suggests to me that the expectation is that you should be able to use HTML tags as the highlight markers. However, I also note that the Typesense results are not HTML-escaped. So if we have a document that says something like "did you know that <script>alert(1)</script> is an xss payload", then if I search for the word "know", I end up with
did you <mark>know</mark> that <script>alert(1)</script> is an xss payload
and I don't know what to do with this. If I render it, the script fires, which is obviously not what I want. If I escape it, the highlight doesn't work.
I could run each of the results through an HTML sanitizer, but before going down that route I just want to do a quick sanity check here that I'm not doing something silly, because I have a hard time believing that the expected usage case involves receiving a string that mixes both trusted and untrusted content.
Jason
05:38 PM
Oliver
05:41 PM
Oliver
05:44 PM
Oliver
05:44 PM
[...]
did you know that <script>alert(1)</script> is an xss payload
not like this:
did you know that is an xss payload
Oliver
05:47 PM
[...] HighlightStartTag and HighlightEndTag, HTML-escaping the result, then swapping in the highlight HTML tags for the unlikely-to-appear strings that we used. I just assumed there would be a simpler way, since typesense-go defaults to using HTML tags directly as these values, and I was having a hard time reconciling that with needing to do this.
Jason
05:48 PM
Jason
05:48 PM
Yeah, that's the other thing I was going to suggest
Jason
05:48 PM
Jason
05:49 PM
Oliver
05:51 PM
Oliver
05:51 PM
Jason
05:54 PM
Oliver
06:05 PM