Using Highlights in typesense-go
TLDR Oliver was worried about using highlights involving HTML tags in typesense-go, as the results mix trusted and untrusted content. Jason advised sanitizing HTML before ingesting data, or using arbitrary strings as highlight markers.
Sep 28, 2023 (2 months ago)
typesense-go but I am not sure how much that matters.
I note that in typesense-go, two search parameters default to <mark> and </mark> respectively. This suggests to me that the expectation is that you should be able to use HTML tags as the highlight markers. However, I also note that the Typesense results are not HTML escaped. So if we have a document that says something like "did you know that <script>alert(1)</script> is an xss payload" then if I search for the word "know", I end up with
did you <mark>know</mark> that <script>alert(1)</script> is an xss payload
and I don't know what to do with this. If I render it, the script fires, which is obviously not what I want. If I escape it, the highlight doesn't work.
I could run each of the results through an HTML sanitizer, but before going down that route I just want to do a quick sanity check that I'm not doing something silly, because I have a hard time believing that the expected use case involves receiving a string that mixes both trusted and untrusted content.
know that <script>alert(1)</script> is an xss payload
not like this:
know that is an xss payload
HighlightEndTag, HTML-escaping the result, then swapping in the highlight HTML tags for the unlikely-to-appear strings that we used. I just assumed there would be a simpler way, since typesense-go defaults to using HTML tags directly as these values, and I was having a hard time reconciling that with needing to do this.
Yeah, that's the other thing I was going to suggest