Tokenization and Indexing Fields with Typesense
TLDR: kam wanted to understand how to control tokenization and indexing for certain fields. Jason explained that tokenization is applied during search queries and not during the indexing phase, and shared how to delete a document using an indexed unique value under `id`.
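As a rough illustration of the delete-by-`id` approach mentioned in the summary, the sketch below builds (but does not send) a request against Typesense's document endpoint. The host, API key, collection name and document id are all hypothetical placeholders, not values from the thread.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_request(base_url, api_key, collection, doc_id):
    """Build a DELETE request for one document, addressed by its `id`."""
    # Percent-encode the path segments so ids with spaces or slashes stay intact.
    url = (
        f"{base_url}/collections/{quote(collection, safe='')}"
        f"/documents/{quote(doc_id, safe='')}"
    )
    return Request(url, method="DELETE", headers={"X-TYPESENSE-API-KEY": api_key})


# Hypothetical values for illustration only.
req = build_delete_request("http://localhost:8108", "xyz", "products", "sku-123 A")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without a server.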
Aug 28, 2023 (3 months ago)
kam
06:55 PM
[...] `raw_field`
I want to index the value as is without tokenization
Jason
06:56 PM
[...] `pre_segmented_query: true`
documented here: https://typesense.org/docs/0.25.0/api/search.html#query-parameters
kam
06:58 PM
[...] `pre_segmented_query`, is there similar thing for index field that is applied during indexing phase?
Jason
06:59 PM
[...] `id` when you send the document to Typesense, you can then reference that document by ID to do crud operations on it
Jason
06:59 PM
`pre_segmented_query` has historically only been useful for CJK locales - most likely not required for the use-case you mentioned above
kam
07:00 PM
[...] `id` field specific such that it won't tokenize?
Jason
07:02 PM
[...] `filter_by` which doesn't do any tokenization and will use a separate non-tokenized index to filter results
Jason
07:02 PM
[...] `q` (full text search) parameter
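To make the difference between `filter_by` and `q` described above more concrete, here is a minimal sketch that only builds search URLs and does not contact a server. The host, the collection name `products` and the `title` field are assumptions made for illustration; `raw_field` and `pre_segmented_query` come from the thread. The `:=` operator is Typesense's exact-match form for string filters.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8108"  # assumed local Typesense node
COLLECTION = "products"             # hypothetical collection name


def build_search_url(params):
    """Return the search endpoint URL for COLLECTION with the given parameters."""
    query = urlencode(params)
    return f"{BASE_URL}/collections/{COLLECTION}/documents/search?{query}"


# Full-text search: the value in `q` is matched against the `query_by` field.
full_text_url = build_search_url({"q": "ABC-123", "query_by": "title"})

# Exact match: `q=*` matches all documents, `filter_by` narrows on the raw value.
exact_url = build_search_url(
    {"q": "*", "query_by": "title", "filter_by": "raw_field:=ABC-123"}
)

# The flag quoted in the thread, passed as an ordinary search parameter.
pre_segmented_url = build_search_url(
    {"q": "ABC-123", "query_by": "title", "pre_segmented_query": "true"}
)

for url in (full_text_url, exact_url, pre_segmented_url):
    print(url)
```

For the filter variant to work, `raw_field` would need to be declared in the collection schema; that part is not shown here.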