#community-help

Discussing Typesense's Tokenization Feature

TLDR Roshan asks about Typesense's tokenization feature. Kishore Nallan explains that Typesense tokenizes on spaces and suggests using a special character as a separator.

Solved
Feb 18, 2022 (21 months ago)
Roshan
11:06 AM
Kishore Nallan Kishore Akash Ankur Does Typesense have a tokenization feature? I want to split the search query into tokens (for grouping of text, etc.)
Roshan
11:07 AM
If it doesn't have this feature natively, how can we use tokenization while querying and indexing?
Kishore Nallan
11:53 AM
Can you give me an example of tokenization you wish to do?
Kishore Nallan
11:54 AM
Typesense tokenizes on spaces, and if you define custom separators, it considers those as well.
Roshan
12:09 PM
ok, like if I want to tokenize the text "Hello world everyone", then I want "Hello world" to be one token and "everyone" to be another token Kishore Nallan
Kishore Nallan
12:10 PM
We don't have a way to customize that behavior. You can always tokenize before indexing yourself, by combining words with a symbol, like "hello_world", and apply the same transformation to the query.
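The pre-tokenization idea above can be sketched like this. This is a minimal illustration, not Typesense's own API: `PHRASES` and `pre_tokenize` are hypothetical names, and the phrase list is assumed to be maintained by your own tokenizer.

```python
# Hypothetical pre-tokenizer: join each known multi-word phrase with an
# underscore before indexing, and run the same transformation on queries
# so the joined tokens match.

PHRASES = ["hello world"]  # assumed phrase list from your own tokenizer

def pre_tokenize(text: str) -> str:
    """Replace known multi-word phrases with underscore-joined tokens."""
    for phrase in PHRASES:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return text

# The same function is applied to documents and to search queries:
document_text = pre_tokenize("hello world everyone")  # "hello_world everyone"
query_text = pre_tokenize("hello world")              # "hello_world"
```

Because the document and the query go through the same function, "hello_world" is indexed and searched as a single token.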
Roshan
12:16 PM
ok, suppose I have my own tokenizer, then how can I pass that tokenized list of words to Typesense while indexing? And how will it work on search? Kishore Nallan
Kishore Nallan
12:29 PM
You have to use a special character as a separator rather than a space. Then add that character to the symbols_to_index configuration.