Discussing Typesense's Tokenization Feature
TLDR Roshan seeks to understand Typesense's tokenization feature. Kishore Nallan explains that it tokenizes on spaces and suggests using a special character as a separator.
Feb 18, 2022
Roshan, 11:06 AM
Roshan, 11:07 AM
Kishore Nallan, 11:53 AM
Kishore Nallan, 11:54 AM
Roshan, 12:09 PM
text, then I want `Hello world` to be one and `everyone` to be another token Kishore Nallan
Kishore Nallan, 12:10 PM
Roshan, 12:16 PM
Kishore Nallan, 12:29 PM
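To make the summary concrete, here is a minimal Python sketch of the idea, not Typesense's actual tokenizer. It assumes one possible reading of the "special character" suggestion: join the words of a phrase with a placeholder character before indexing, so that a space-based tokenizer keeps the phrase as a single token. The function names `tokenize_on_spaces` and `protect_phrase` and the `_` joiner are hypothetical and chosen only for illustration.

```python
def tokenize_on_spaces(text: str) -> list[str]:
    """Split text on runs of whitespace, roughly like a space-based tokenizer."""
    return text.split()


def protect_phrase(text: str, phrase: str, joiner: str = "_") -> str:
    """Replace the spaces inside `phrase` with `joiner` (an assumed placeholder)
    so that the phrase survives whitespace tokenization as one token."""
    return text.replace(phrase, phrase.replace(" ", joiner))


raw = "Hello world everyone"

# Plain whitespace tokenization splits every word apart.
plain_tokens = tokenize_on_spaces(raw)

# Pre-processing the phrase first keeps "Hello world" together.
protected_tokens = tokenize_on_spaces(protect_phrase(raw, "Hello world"))

print(plain_tokens)      # ['Hello', 'world', 'everyone']
print(protected_tokens)  # ['Hello_world', 'everyone']
```

How Typesense itself treats the joiner character depends on the collection's configuration, so this pre-processing should be checked against a real collection before relying on it.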
Similar Threads
Resolving Typesense Search Issues
A conversation started by Maximilian about Typesense search behavior led Kishore Nallan and Mike to discuss and suggest a workaround, with Kishore Nallan promising an official solution soon. No final confirmation of a resolution was provided.
Restricting `token_separators` to a Specific Field in Typesense
Loic asked Jason about applying `token_separators` to a specific field in Typesense. Jason suggested opening a GitHub issue to add this feature.
Tokenization and Indexing Fields with Typesense
kam wanted to understand how to control tokenization and indexing for certain fields. Jason explained that tokenization is applied during search queries and not during the indexing phase, and shared how to delete a document using an indexed unique value under `id`.
Handling Two-Word Queries with Custom Separators
Dima proposes adding a parameter to the API for handling two-word queries. Jason suggests opening a GitHub issue for the feature request.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on `token_separators` and how to send custom headers. Issues remain with repeated word queries.