Resolving HTML Content Search Issues
TLDR Ramy encountered issues with HTML content search within tags. Jason initially suggested adding special characters to the
token_separators config but later recommended storing plain text of the HTML content. Ramy appreciated the advice. Ed also weighed in.
Sep 19, 2023
I am seeing a weird behavior (I am sure it can be fixed via some config)
We have some HTML content saved and indexed, but if we search for a word that sits directly against a tag, with no space in between, it is not matched (although it is matched if we include the `>` or the full tag).
The special characters can be added to the `token_separators` config when creating the collection.
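As a rough sketch of that suggestion, a collection schema with `token_separators` might look like the following. The collection name `threads`, the field name `content`, and the exact choice of separator characters are assumptions for illustration, not details from the thread.

```python
# Hypothetical collection schema: "threads" and "content" are made-up names.
# token_separators lists characters on which tokens are split at index time,
# in addition to whitespace.
schema = {
    "name": "threads",
    "fields": [
        {"name": "content", "type": "string"},
    ],
    # Assumed separator set for HTML content; adjust to the markup in use.
    "token_separators": ["<", ">", "/"],
}

# With the typesense Python client, this dict would be passed at creation
# time (not executed here, since it needs a running server):
# client.collections.create(schema)
```

Because the setting is part of the schema given at creation time, an existing collection would likely need to be recreated and reindexed for it to apply.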
The case in question: the search term is `network engineer` and the HTML tag is in between the two words, as in "network <b> engineer".
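The later recommendation, storing a plain-text copy of the HTML and searching that instead, could be sketched as below using only the Python standard library. The helper names and the `content_html` / `content_text` field names are hypothetical; the thread does not say how the plain text should be produced.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so words on either side of a tag do not
        # fuse together, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_plain_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


def build_document(doc_id, html):
    # Hypothetical document shape: keep the HTML for display and add a
    # plain-text copy intended to be the field that is searched.
    return {
        "id": doc_id,
        "content_html": html,
        "content_text": html_to_plain_text(html),
    }


doc = build_document("1", "network <b> engineer")
print(doc["content_text"])
```

Searches would then target the plain-text field rather than the HTML one. One trade-off of joining chunks with a space: a tag placed in the middle of a word (for example `eng<b>ineer</b>`) yields two separate words in the plain text.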
DocSearch Scraper Metadata Configuration and Filter Problem
Marcos faced issues with the DocSearch scraper not adding metadata attributes and filtering out documents without content. Jason helped fix the issue by updating the scraper and providing filtering instructions.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy when using the Typesense Instantsearch Adapter. The problem occurred when searching for phrases in double quotes. The team traced the issue to weights and implemented a fix, which improved the search results.
Ignoring HTML Tags in Typesense Document Search
Shouvik asked how to keep HTML tags out of Typesense searches. Kishore Nallan and Ricardo suggested storing the HTML in non-searchable fields. Kishore Nallan proposed adding an HTML-skip flag at indexing time; Shouvik agreed and opened a tracking issue on GitHub.
Troubleshooting Issues with DocSearch Hits and Scraper Configuration
Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discussed issues with repeated-word queries and hyphen-containing queries in Typesense. Kishore Nallan offered possible solutions. During the discussion, Mr sought advice on `token_separators` and on how to send custom headers. The issues with repeated-word queries remained unresolved.