# community-help
e
Is there a clever way to use Typesense to find "similar documents"? I have a fairly big collection of products from 30-ish web shops, and I would like to find, group and return similarly named ones, or really to find "the same products". I could loop through every document (product) and use its name to run another Typesense search for similarly named ones. But is there a more clever way to do this?
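A minimal sketch of the loop-and-search approach described above. It assumes a hypothetical `search_similar(name)` callable that would, in practice, wrap a Typesense search on the product name field and return matching document IDs. Here a toy in-memory token-overlap (Jaccard) function, `make_toy_search`, stands in so the sketch runs without a server. The names `tokens`, `group_similar`, `make_toy_search` and the 0.5 threshold are illustrative assumptions, not part of Typesense.

```python
import re
from typing import Callable, Dict, List, Set


def tokens(name: str) -> Set[str]:
    # Lowercase the name and split it on anything that is not a letter or digit.
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def group_similar(
    docs: Dict[str, str],
    search_similar: Callable[[str], List[str]],
) -> List[List[str]]:
    """Run one search per document and merge the hits into groups (union-find)."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # One search per product: this is the N-queries loop from the question.
    for doc_id, name in docs.items():
        for hit_id in search_similar(name):
            if hit_id != doc_id and hit_id in parent:
                union(doc_id, hit_id)

    # Collect documents by their group root, preserving first-seen order.
    groups: Dict[str, List[str]] = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return [sorted(g) for g in groups.values()]


def make_toy_search(docs: Dict[str, str], min_jaccard: float = 0.5):
    # Hypothetical stand-in for a Typesense query on the name field:
    # returns IDs whose token sets overlap the query by at least min_jaccard.
    def search(name: str) -> List[str]:
        q = tokens(name)
        hits = []
        for doc_id, other in docs.items():
            t = tokens(other)
            if q and t and len(q & t) / len(q | t) >= min_jaccard:
                hits.append(doc_id)
        return hits

    return search


products = {
    "a1": "Apple iPhone 13 128GB Blue",
    "b7": "iPhone 13 128 GB blue (Apple)",
    "c3": "Samsung Galaxy S21 128GB",
    "d9": "Apple iPhone 13 128GB - Blue",
}

print(group_similar(products, make_toy_search(products)))
# prints [['a1', 'b7', 'd9'], ['c3']]
```

Because union-find merges matches transitively, a loose threshold can chain distinct products into one group (A matches B, B matches C, so A and C end up together even if they are different items). That is one way a purely search-based grouping can go wrong.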
k
This is broadly known as data deduplication, for which there are tools that are a better fit and solve the entire process end to end. Take a look at: https://github.com/dedupeio/dedupe
e
yea, I have that link on my "ToDo" list to take a look at 🙂 - it's a bigger hurdle for me to dig into Python and machine learning. I already know and love Typesense, so I hoped I could achieve at least some degree of success with it.
k
It's a tricky problem to solve reliably with just search, I think. I did this exact work about 8-10 years ago at a previous job. Back then, we began with Elasticsearch, and while it worked initially, we discovered many limitations of a search-based approach.
e
thank you so much @Kishore Nallan! I appreciate your insights.