#community-help

Searching English Singular/Plural Pairs with Typesense

TL;DR: Adrian was looking for a source of common English singular/plural pairs. Kishore Nallan recommended using synonyms, since that approach stays compatible with highlighting in Typesense.

Apr 26, 2023
Adrian
07:47 PM
Related to this GitHub issue. Does anybody know of a source with lists of common English singular/plural pairs (e.g. mice - mouse)? My application searches general user-uploaded data, so it's hard to create a custom dictionary since I don't know ahead of time what users may upload.
Apr 27, 2023
Kishore Nallan
12:42 AM
For stemming, people use the Porter stemmer. However, for just singular/plural pairs, I'm not aware of any lists.
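Since no ready-made list came up, one option (an assumption, not something suggested in the thread) is to mine candidate pairs from the uploaded vocabulary itself: generate plural forms with a few regular English rules plus a small hand-written table of irregular plurals, and keep only pairs where both forms actually occur. The rules, the `IRREGULAR` table and the function names below are hypothetical and only meant as a sketch.

```python
# Hypothetical, deliberately incomplete table of irregular plurals.
IRREGULAR = {
    "mouse": "mice",
    "child": "children",
    "person": "people",
    "man": "men",
    "woman": "women",
    "foot": "feet",
    "tooth": "teeth",
    "goose": "geese",
}

VOWELS = "aeiou"


def plural_candidates(word):
    """Return plausible plural forms of `word` using simple English rules."""
    w = word.lower()
    candidates = {w + "s"}                      # cat -> cats
    if w.endswith(("s", "x", "z", "ch", "sh")):
        candidates.add(w + "es")                # box -> boxes
    if len(w) > 1 and w.endswith("y") and w[-2] not in VOWELS:
        candidates.add(w[:-1] + "ies")          # berry -> berries
    if w in IRREGULAR:
        candidates.add(IRREGULAR[w])            # mouse -> mice
    return candidates


def mine_pairs(vocabulary):
    """Keep only (singular, plural) pairs whose both forms occur in the vocabulary."""
    vocab = {v.lower() for v in vocabulary}
    pairs = set()
    for word in vocab:
        for plural in plural_candidates(word):
            if plural != word and plural in vocab:
                pairs.add((word, plural))
    return sorted(pairs)
```

For example, `mine_pairs(["mouse", "mice", "box", "boxes", "cat", "cats", "dog"])` returns `[("box", "boxes"), ("cat", "cats"), ("mouse", "mice")]`. Regular rules also produce false pairs such as `new`/`news`, so the output is worth reviewing before use.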
Adrian
03:04 PM
Where in the flow do people apply the Porter stemmer? I assume it would have to be at both query and index time. But if the indexed data is stemmed, then I believe the snippets would be affected, which we don't want.
Kishore Nallan
03:05 PM
Right, so you have to index the stemmed version but also keep the original version. Typesense can highlight any field, even if it's not part of the schema.
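One way to read this suggestion is a document that carries both the original text (for highlighting) and a stemmed copy (for matching), with the query stemmed the same way. The sketch below assumes a toy suffix-stripping stemmer as a stand-in for the Porter stemmer, and the collection name `uploads` and the field names `body` / `body_stemmed` are hypothetical. `query_by` and `highlight_fields` are Typesense search parameters; leaving `body` out of the schema relies on the remark above that Typesense can still highlight such fields.

```python
import re


def toy_stem(word):
    """Toy suffix stripper standing in for a real stemmer (NOT the Porter algorithm)."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"                  # berries -> berry
    if w.endswith(("ses", "xes", "ches", "shes")):
        return w[:-2]                        # boxes -> box
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                        # cats -> cat
    return w


def stem_text(text):
    """Stem every alphabetic token and join the stems with single spaces."""
    return " ".join(toy_stem(t) for t in re.findall(r"[A-Za-z]+", text))


# Only the stemmed field is declared (and therefore indexed); `body` is stored
# on the document but left out of the schema.
SCHEMA = {
    "name": "uploads",
    "fields": [{"name": "body_stemmed", "type": "string"}],
}


def make_document(doc_id, body):
    """Index-time step: keep the original text and add a stemmed copy."""
    return {"id": doc_id, "body": body, "body_stemmed": stem_text(body)}


def search_params(user_query):
    """Query-time step: stem the query the same way and match on the stemmed field."""
    return {
        "q": stem_text(user_query),
        "query_by": "body_stemmed",
        "highlight_fields": "body",
    }
```

Here `make_document("1", "Boxes of berries")` stores `"box of berry"` in `body_stemmed` while keeping the original text in `body`.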
Adrian
03:19 PM
How does Typesense highlight a field that's not in the schema? I don't fully follow how a stemmed query could map to highlighting the original version that's not in the schema.
Kishore Nallan
03:21 PM
Yeah, the stemmed query won't match the words in the original text, so that won't work...
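The problem can be illustrated with a deliberately simplified highlighter that only marks tokens of the original text exactly equal to a query token. This is an assumption for illustration; Typesense's real highlighter is more sophisticated. A search for `berry` finds the document through its stemmed copy, but the original text contains `berries`, not `berry`, so nothing gets highlighted.

```python
import re


def highlight(text, query_tokens, tag="mark"):
    """Wrap tokens of `text` that equal a query token (case-insensitive, exact match)."""
    wanted = {t.lower() for t in query_tokens}

    def wrap(match):
        token = match.group(0)
        return f"<{tag}>{token}</{tag}>" if token.lower() in wanted else token

    return re.sub(r"[A-Za-z]+", wrap, text)


original = "Boxes of berries"
stemmed = "box of berry"    # what a stemmer would store for matching
query = ["berry"]           # the (stemmed) user query

matches = any(t in stemmed.split() for t in query)   # the document is found...
snippet = highlight(original, query)                 # ...but nothing is marked
```

With the unstemmed token, `highlight(original, ["berries"])` does mark the word, which is why highlighting only lines up when the query terms match the original text.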
Adrian
08:51 PM
Do you have any other ideas on how to make this use case work with Typesense? Would setting up synonyms for common singular/plural pairs be the best bet?
Apr 28, 2023
Kishore Nallan
12:53 AM
Synonyms are probably the only approach I can think of that's also compatible with highlighting.
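Applied to this use case, the synonym approach could take a list of singular/plural pairs (from whatever source is available) and turn each pair into a multi-way synonym, so a query for either form also matches the other while the original text stays intact for highlighting. The sketch only builds request paths and JSON bodies for Typesense's synonym upsert endpoint (`PUT /collections/{collection}/synonyms/{id}`, as documented for the Typesense versions current at the time of this thread). The ID scheme and function names are hypothetical, and actually sending the requests, for example with an HTTP client or the official `typesense` Python client, is left out.

```python
import json
import re


def synonym_id(singular, plural):
    """Hypothetical ID scheme; IDs only need to be unique within a collection."""
    return "plural-" + re.sub(r"[^a-z0-9]+", "-", f"{singular}-{plural}".lower())


def build_synonyms(pairs):
    """Map each (singular, plural) pair to a multi-way synonym definition."""
    return {synonym_id(s, p): {"synonyms": [s, p]} for s, p in pairs}


def upsert_requests(collection, pairs):
    """Return (path, JSON body) tuples for PUT /collections/{collection}/synonyms/{id}."""
    return [
        (f"/collections/{collection}/synonyms/{sid}", json.dumps(body))
        for sid, body in build_synonyms(pairs).items()
    ]
```

Multi-way synonyms are symmetric, which suits singular/plural pairs; a one-way synonym (defined with a `root`) would only expand queries for the root term.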