Achieving Stemming Support with Typesense
TLDR Sabyasachi asked how to implement stemming in Typesense, which Kishore Nallan explained must be handled externally. Sabyasachi later shared they created an extra field for storing stemmed content.
Jan 08, 2022 (22 months ago)
Sabyasachi
06:08 AM
Kishore Nallan
01:52 PM
Jan 10, 2022 (22 months ago)
Sabyasachi
03:58 AM
Jan 11, 2022 (22 months ago)
Sabyasachi
04:05 AM
Here is what I did: I added a separate field, text_stemmed, to the schema. I used nltk.PorterStemmer to stem the content and stored the resulting string in text_stemmed. When querying, I use the same method to generate a stemmed query string, then concatenate the original query and the stemmed query. In the search params, I put text_stemmed last in the query_by param, so exact matches are still prioritized higher.
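The approach described above could be sketched roughly as follows. This is a minimal illustration, not Sabyasachi's actual code: the collection name docs, the original field name text, and the helper names are all hypothetical, and splitting on whitespace is an assumed tokenization. Only text_stemmed, query_by, and nltk.PorterStemmer come from the thread.

```python
from typing import Callable, Dict

# Hypothetical schema: "docs" and "text" are assumed names; only
# "text_stemmed" is taken from the thread.
SCHEMA = {
    "name": "docs",
    "fields": [
        {"name": "text", "type": "string"},
        {"name": "text_stemmed", "type": "string"},
    ],
}


def porter_stem() -> Callable[[str], str]:
    """Return the stem function of NLTK's Porter stemmer (requires nltk)."""
    from nltk.stem import PorterStemmer
    return PorterStemmer().stem


def stem_text(text: str, stem: Callable[[str], str]) -> str:
    """Lowercase, split on whitespace (an assumption), stem each token, rejoin."""
    return " ".join(stem(token) for token in text.lower().split())


def build_document(doc_id: str, text: str, stem: Callable[[str], str]) -> Dict[str, str]:
    """Build a document carrying both the original and the stemmed text."""
    return {"id": doc_id, "text": text, "text_stemmed": stem_text(text, stem)}


def build_search_params(query: str, stem: Callable[[str], str]) -> Dict[str, str]:
    """Concatenate the original and stemmed query; list text_stemmed last in query_by."""
    return {
        "q": f"{query} {stem_text(query, stem)}",
        "query_by": "text,text_stemmed",
    }
```

With nltk installed, passing porter_stem() as the stem argument applies the same stemmer at index time and at query time, which is what keeps the two sides consistent. The resulting dictionaries would then be handed to whichever Typesense client is in use for document creation and search.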
Similar Threads
Implementing Stemming, Lemmatization, Stopwords with Typesense
Carlo asked about implementing stemming, lemmatization, stopwords with Typesense. Kishore Nallan suggested the Porter stemmer and mentioned stopwords is under development. Gustavo suggested using GPT-3.5-Turbo.
Typesense Support for Non-English Languages
omega enquired about Typesense's support for non-English languages. Kishore Nallan suggested using separate fields for different languages.
Understanding Query Parsing & Search Algorithm in Typesense
robert inquired about Typesense's query parsing and search algorithm. Kishore Nallan explained that Typesense supports prefix matching and typo tolerance, but does not stem queries.