Achieving Stemming Support with Typesense
TLDR Sabyasachi asked how to implement stemming in Typesense, and Kishore Nallan explained that it must be handled outside Typesense. Sabyasachi later shared that they had created an extra field for storing stemmed content.
Jan 08, 2022
Kishore Nallan 01:52 PM
Jan 10, 2022
Jan 11, 2022
Here is what I did: I added a separate field to the schema, text_stemmed. I used nltk.PorterStemmer to stem the content and stored the resulting string in that field.
While querying, I use the same method to generate a stemmed query string, then concatenate the original query and the stemmed query. In the search params, I added text_stemmed at the end of the query_by param, so exact matches are still prioritized higher.
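A minimal Python sketch of the approach described above, assuming NLTK is installed. The collection name threads, the original-content field text, the regex tokenizer, and the helper names stem_text, prepare_document, and build_search_params are all hypothetical; the post only names text_stemmed, nltk.PorterStemmer, and the query_by param. Since Typesense gives more weight to matches on fields listed earlier in query_by, putting text before text_stemmed keeps exact matches ahead of stem-only matches.

```python
import re
from typing import Any

from nltk.stem import PorterStemmer

# Assumption: a simple word-character tokenizer; the original post does not
# say how content was split into words before stemming.
_TOKEN_RE = re.compile(r"\w+")
_stemmer = PorterStemmer()

# Hypothetical collection schema: "text" holds the original content and
# "text_stemmed" holds its stemmed copy (the extra field from the post).
SCHEMA: dict[str, Any] = {
    "name": "threads",  # hypothetical collection name
    "fields": [
        {"name": "text", "type": "string"},
        {"name": "text_stemmed", "type": "string"},
    ],
}


def stem_text(text: str) -> str:
    """Stem each token with the Porter stemmer and join the stems with spaces."""
    return " ".join(_stemmer.stem(token) for token in _TOKEN_RE.findall(text))


def prepare_document(doc_id: str, text: str) -> dict[str, str]:
    """Build a document whose text_stemmed field is filled at indexing time."""
    return {"id": doc_id, "text": text, "text_stemmed": stem_text(text)}


def build_search_params(query: str) -> dict[str, str]:
    """Append the stemmed query to the original one and search both fields.

    text_stemmed is listed last in query_by so that matches on the original
    text field rank higher than stem-only matches.
    """
    return {
        "q": f"{query} {stem_text(query)}",
        "query_by": "text,text_stemmed",
    }
```

The returned dicts could then be passed to the official typesense Python client, for example client.collections["threads"].documents.create(doc) when indexing and client.collections["threads"].documents.search(params) when searching. Using the same stem_text function on both sides matters: the stems in the query only match if they were produced the same way as the stored ones.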
Discussion on Implementing Stemming in TS
Max asked whether stemming could be implemented in Typesense, referencing the Snowball stemmer library. Kishore Nallan acknowledged it could be useful but said it had not been planned yet, and asked for upvotes on the linked issue.
Implementing Stemming, Lemmatization, Stopwords with Typesense
Carlo asked about implementing stemming, lemmatization, and stopwords with Typesense. Kishore Nallan suggested the Porter stemmer and mentioned that stopwords support was under development. Gustavo suggested using GPT-3.5-Turbo.
Typesense Support for Non-English Languages
omega enquired about Typesense's support for non-English languages. Kishore Nallan suggested using separate fields for different languages.