#community-help

Using Typesense for Product Similarity Search

TLDR: em1nos asked about finding similar documents with Typesense. Kishore Nallan suggested a dedicated data deduplication tool such as dedupe (https://github.com/dedupeio/dedupe) instead of relying solely on search.

Apr 06, 2022
em1nos
11:25 AM
Is there a clever way to use Typesense to find "similar documents"? I have a fairly big collection of products from 30-ish web shops, and I would like to find, group and return similarly named ones - or actually, to find the "same products".
I could loop through every document (product) and use its name to run another Typesense search for similarly named ones. But is there a more clever way to do this?
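For reference, the loop-and-search idea described above could be sketched roughly as follows. This is only an illustration under assumptions: `group_similar` and `search_by_name` are hypothetical names, and the search step is passed in as a function so the sketch runs without a server. With the Typesense Python client, that function might wrap `client.collections["products"].documents.search({"q": name, "query_by": "name"})` and collect `hit["document"]["id"]` from the returned `hits`; the collection name `products` and the field name `name` are assumed, not taken from the thread.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def group_similar(
    products: Iterable[dict],
    search_by_name: Callable[[str], list[str]],
) -> list[set[str]]:
    """Group product ids that are linked by name searches.

    `search_by_name` is a hypothetical wrapper (an assumption of this sketch):
    given a product name, it returns the ids of the documents the search
    engine considers matches for that name.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # One search per product: every hit is merged into that product's group.
    for product in products:
        find(product["id"])
        for hit_id in search_by_name(product["name"]):
            union(product["id"], hit_id)

    groups: dict[str, set[str]] = defaultdict(set)
    for pid in list(parent):
        groups[find(pid)].add(pid)

    # Only groups with more than one member are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Note that this issues one search request per document, and that merging is transitive: if A matches B and B matches C, all three end up in one group even when A and C are not very similar.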
Kishore Nallan
11:28 AM
This is broadly known as data deduplication, for which there are tools that are a better fit and solve the entire process end to end. Take a look at: https://github.com/dedupeio/dedupe
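To give a rough feel for what a deduplication pass involves, here is a minimal standard-library sketch of two common steps: blocking (only comparing records that share a cheap key) and pairwise similarity scoring. It is not dedupe's API or algorithm; dedupe learns its comparison weights and blocking rules from labeled examples, whereas this sketch uses a fixed threshold. The function names, the first-token blocking key and the 0.85 threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def candidate_pairs(
    records: dict[str, str], threshold: float = 0.85
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for likely duplicate product names.

    Blocking: records are only compared when their normalized names start
    with the same token, which avoids comparing every pair in the catalog.
    """
    blocks: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for record_id, name in records.items():
        norm = normalize(name)
        if norm:
            blocks[norm.split()[0]].append((record_id, norm))

    pairs = []
    for members in blocks.values():
        for (id_a, name_a), (id_b, name_b) in combinations(members, 2):
            score = SequenceMatcher(None, name_a, name_b).ratio()
            if score >= threshold:
                pairs.append((id_a, id_b, round(score, 3)))
    return pairs
```

A fixed blocking key and threshold like these will miss duplicates whose names start differently and may merge distinct variants of a product, which is part of why a trained tool tends to be a better fit for the full process.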
em1nos
11:35 AM
Yeah, I have that link on my "ToDo" list to take a look at 🙂 - it's a bigger threshold for me to dig into Python and machine learning. I already know and love Typesense, so I hoped I could achieve at least some degree of success with it.
Kishore Nallan
11:39 AM
It's a tricky problem to solve reliably with just search, I think. I did this exact work about 8-10 years ago at a previous job. Back then, we began with Elasticsearch, and while it worked initially, we discovered many limitations of a search-based approach.
em1nos
06:54 PM
Thank you so much, Kishore Nallan! I appreciate your insights.