Implementing Stemming, Lemmatization, Stopwords with Typesense

TLDR Carlo asked about implementing stemming, lemmatization, and stopwords with Typesense. Kishore Nallan suggested the Porter stemmer and mentioned that stopword support is under development. Gustavo suggested using GPT-3.5-Turbo.

Carlo
Thu, 13 Jul 2023 06:10:14 UTC

I've seen a ticket saying that stemming, lemmatization, and stopwords aren't currently supported by Typesense. Has anyone successfully implemented that before the data reaches Typesense, or does anyone know a good workaround?

Kishore Nallan
Thu, 13 Jul 2023 08:13:36 UTC

The Porter stemmer is the most popular stemming library. You have to stem the values during indexing and also stem the queries before sending them to Typesense. However, I suspect that most people don't use stemmers because prefix searching and typo correction are usually enough to handle plurals etc.
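
A minimal sketch of the "stem at indexing, stem at query time" approach described above. It assumes a toy suffix-stripping function (`toy_stem`) as a stand-in for a real Porter stemmer implementation, and a hypothetical extra field named `title_stemmed`; neither comes from the thread. The only point is that documents and queries must pass through the same function so their stems match.

```python
import re

# Toy suffix stripper used as a stand-in. This is NOT the Porter algorithm;
# in practice you would swap in a real Porter stemmer implementation.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word: str) -> str:
    """Lowercase a word and strip one common suffix, keeping at least 3 chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_text(text: str) -> str:
    """Tokenize on alphanumeric runs and stem every token."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return " ".join(toy_stem(token) for token in tokens)


def prepare_document(doc: dict, field: str) -> dict:
    """Return a copy of doc with a hypothetical '<field>_stemmed' field added.

    The original field is kept so it can still be displayed as written.
    """
    prepared = dict(doc)
    prepared[f"{field}_stemmed"] = stem_text(doc[field])
    return prepared


def prepare_query(query: str) -> str:
    """Stem the user's query with the same function used at indexing time."""
    return stem_text(query)


if __name__ == "__main__":
    doc = prepare_document({"id": "1", "title": "Searching indexed books"}, "title")
    print(doc["title_stemmed"])
    print(prepare_query("book searches"))
```

The prepared document would then be indexed as usual, and the stemmed query passed as `q` with `query_by` pointing at the stemmed field, so that both sides of the match use the same stems.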

Kishore Nallan
Thu, 13 Jul 2023 08:13:57 UTC

Stopword support is under development.

Carlo
Thu, 13 Jul 2023 08:31:12 UTC

thnx!

Gustavo
Thu, 13 Jul 2023 12:30:14 UTC

You can also use GPT-3.5-Turbo to do that, and to add synonyms, labels, categories, etc.