Achieving Stemming Support with Typesense
TLDR Sabyasachi asked how to implement stemming in Typesense, which Kishore Nallan explained must be handled externally. Sabyasachi later shared they created an extra field for storing stemmed content.
Jan 08, 2022 (24 months ago)
Sabyasachi
06:08 AM
Kishore Nallan
01:52 PM
Jan 10, 2022 (24 months ago)
Sabyasachi
03:58 AM
Jan 11, 2022 (24 months ago)
Sabyasachi
04:05 AM
Here is what I did: I added a separate field to the schema, text_stemmed. I used nltk.PorterStemmer to stem the content and stored the resulting string in text_stemmed. When querying, I use the same method to generate a stemmed query string, then concatenate the original query and the stemmed query. In the search params, I put text_stemmed last in the query_by param, so exact matches are still prioritized higher.
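
The approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not Sabyasachi's actual code: the original field name `text`, the helper names `stem_text`, `build_document`, and `build_search_params`, and the whitespace tokenization are all hypothetical. Only `text_stemmed`, `nltk.PorterStemmer`, and `query_by` come from the thread. If nltk is not installed, the sketch falls back to a crude suffix stripper that is not the Porter algorithm, purely so the example still runs.

```python
try:
    # The stemmer named in the thread.
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    def _stem(word):
        # Crude stand-in so the sketch runs without nltk.
        # This is NOT the Porter algorithm.
        word = word.lower()
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word


def stem_text(text):
    """Stem each whitespace-separated token and rejoin with spaces."""
    return " ".join(_stem(token) for token in text.split())


def build_document(doc_id, text):
    """Build a document carrying both the original and the stemmed text.

    "text" is an assumed name for the original field; "text_stemmed"
    is the extra field described in the thread.
    """
    return {"id": doc_id, "text": text, "text_stemmed": stem_text(text)}


def build_search_params(query):
    """Build search params: original query plus its stemmed form.

    text_stemmed is listed last in query_by so that matches on the
    original field are ranked ahead of matches on the stemmed field.
    """
    return {
        "q": f"{query} {stem_text(query)}",
        "query_by": "text,text_stemmed",
    }


document = build_document("1", "Running shoes for trail runners")
params = build_search_params("running shoes")
```

The resulting `document` would be indexed into a collection whose schema includes a string field `text_stemmed`, and `params` would be passed to the search call of whichever Typesense client is in use.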
Similar Threads
Discussion on Implementing Stemming in TS
Max asked about the possibility of implementing stemming in TS, referencing the Snowball stemmer library. Kishore Nallan acknowledged it could be useful but had not been planned yet, asking to upvote the linked issue.
Implementing Stemming, Lemmatization, Stopwords with Typesense
Carlo asked about implementing stemming, lemmatization, stopwords with Typesense. Kishore Nallan suggested the Porter stemmer and mentioned stopwords is under development. Gustavo suggested using GPT-3.5-Turbo.
Typesense Support for Non-English Languages
omega enquired about Typesense's support for non-English languages. Kishore Nallan suggested using separate fields for different languages.