Resolving HTML Content Search Issues
TLDR: Ramy could not match words that sit directly inside HTML tags when searching indexed HTML content. Jason initially suggested adding special characters to the `token_separators` config but later recommended storing a plain-text version of the HTML content. Ramy appreciated the advice. Ed also weighed in.
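The plain-text approach from the summary could look roughly like the sketch below. It uses only Python's standard `html.parser` module to strip tags before indexing. The field names `content_html` and `content_text` are hypothetical, and the idea of indexing only the text field is an assumption about how the advice would be applied, not something spelled out in the thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so words separated only by a tag do not merge,
    # then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


raw = "<p>network<b>engineer</b></p>"
document = {
    "content_html": raw,                # hypothetical field, kept for display
    "content_text": html_to_text(raw),  # hypothetical field, the one to search
}
print(document["content_text"])
```

One trade-off of joining text nodes with a space: a word that is split by an inline tag, such as `net<b>work</b>`, would come out as two words in this sketch.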
Sep 19, 2023 (2 months ago)
Ramy
12:57 AM
I am seeing a weird behavior (I am sure it can be fixed via some config).
We have some HTML content saved and indexed, but if we search for a word that sits inside tags with no space in between, it is not matched (although it is matched if we include the `>` or the full tag).
Jason
12:59 AM
[Add] `<` and `>` to the `token_separators` config when creating the collection.
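A minimal sketch of this suggestion, assuming a collection named `pages` with a single `content` field (both names are hypothetical). `token_separators` is set in the collection schema at creation time. The `split_tokens` helper is only a simplified model of separator-based splitting for illustration, not Typesense's actual tokenizer.

```python
import re

# Hypothetical collection schema; the collection and field names are assumptions.
schema = {
    "name": "pages",
    "fields": [{"name": "content", "type": "string"}],
    # Treat "<" and ">" as token boundaries in addition to whitespace.
    "token_separators": ["<", ">"],
}


def split_tokens(text, separators):
    """Simplified model: split on whitespace and on each separator character."""
    pattern = "[" + re.escape("".join(separators)) + r"\s]+"
    return [tok for tok in re.split(pattern, text) if tok]


html = "<p>network</p>"
print(split_tokens(html, []))                          # whitespace only
print(split_tokens(html, schema["token_separators"]))  # with "<" and ">"
```

With the official Python client, a schema like this would typically be passed to `client.collections.create(schema)`. In this simplified model the closing tag still produces the token `/p`, so the `/` stays attached unless it is also listed as a separator.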
Jason
12:59 AM
Ramy
12:59 AM
Ramy
01:00 AM
[...] `/` in the `</p>`?
Jason
01:14 AM
Ramy
01:15 AM
Jason
01:52 AM
Jason
01:54 AM
Ramy
01:54 AM
Ed
09:04 AM
Ed
09:11 AM
[...] network engineer [...] and the HTML tag is in between two words: "network <b> engineer"
Typesense
Indexed 3011 threads (79% resolved)
Similar Threads
Docsearch Scrapper Metadata Configuration and Filter Problem
Marcos faced issues with the DocSearch scraper not adding metadata attributes and filtering out documents without content. Jason helped fix the issue by updating the scraper and providing filtering instructions.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.
Ignoring HTML Tags in Typesense Document Search
Shouvik asked how to avoid matching HTML tags in Typesense searches. Kishore Nallan and Ricardo suggested storing HTML in non-searchable fields. Kishore Nallan proposed adding an HTML-skip flag at indexing time; Shouvik agreed and opened a tracking issue on GitHub.
Troubleshooting Issues with DocSearch Hits and Scraper Configuration
Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discussed issues with repeated-word queries and hyphen-containing queries in Typesense. Kishore Nallan offered possible solutions. During the discussion, Mr sought advice on `token_separators` and how to send custom headers. Issues remain with repeated-word queries.