#community-help

Deduplicating Usernames in a Comments Collection

TLDR Todd had concerns about searching duplicated usernames in comments. Jason suggested facet_query, and SamHendley recommended group_by. Todd appreciated the help.

Dec 02, 2022 (13 months ago)
Todd
07:07 PM
We’re trying to search some data that is heavily duplicated in a collection. Is there any way to deduplicate matches? For context, we have a collection of comments with a username field, and we want to search usernames using this collection. Our concern is that we might get only one user back: because the username field is duplicated across so many comments, a username search could return nothing but comments from the single user with the highest match score.

Is there a way to only get back distinct usernames, or should we just have a separate collection of user information we search on instead?
Jason
07:15 PM
You could do a facet_query for this
Todd
07:17 PM
Right, but won’t that leave my results unranked in this case?
Jason
07:18 PM
It will be ranked by the most popular usernames…
Todd
07:18 PM
Most matching entries?
Jason
07:18 PM
Correct
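For readers following along, here is a minimal sketch of the facet_query approach discussed above. It only builds the search parameters and reads usernames out of a response-shaped dict, so it runs without a Typesense server. The helper names are hypothetical, and it assumes the `username` field is declared with `facet: true` in the comments collection schema.

```python
def build_username_facet_search(prefix, max_values=10):
    """Build search parameters that facet on username and filter the
    facet values by a typed prefix (hypothetical helper)."""
    return {
        "q": "*",                              # match every comment
        "facet_by": "username",                # assumes username is a facet field
        "facet_query": f"username:{prefix}",   # narrow facet values to the prefix
        "max_facet_values": max_values,
    }


def usernames_from_facets(response):
    """Return (username, comment_count) pairs in the order the response
    lists them, i.e. by number of matching comments as described above."""
    for facet in response.get("facet_counts", []):
        if facet.get("field_name") == "username":
            return [(c["value"], c["count"]) for c in facet.get("counts", [])]
    return []
```

The parameters dict would be passed to the search endpoint of the comments collection. Each distinct username appears once in the facet counts, however many comments it has.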
If you need to control ranking beyond that, then you would have to put it in a separate collection
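A rough sketch of the separate-collection option, under stated assumptions: the collection name `users`, the `comment_count` field, and the helper name are all hypothetical and not from the thread. The idea is to collapse the comments into one document per username, which can then be searched and ranked independently of the comments.

```python
from collections import Counter

# Hypothetical schema for a dedicated users collection.
USERS_SCHEMA = {
    "name": "users",
    "fields": [
        {"name": "username", "type": "string"},
        {"name": "comment_count", "type": "int32"},
    ],
    # Ties in text match score fall back to this field.
    "default_sorting_field": "comment_count",
}


def build_user_documents(comments):
    """Collapse comment documents into one document per distinct username."""
    counts = Counter(comment["username"] for comment in comments)
    return [
        {"username": name, "comment_count": count}
        for name, count in sorted(counts.items())
    ]
```

With one document per user, a search with `query_by` set to `username` returns distinct users ranked by text relevance, and any other ranking signal can be stored on the user document.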
Todd
07:19 PM
Very interesting stuff! Thank you very much for helping us find our way around.

SamHendley
07:47 PM
Also, group_by might be appropriate.
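A minimal sketch of what the group_by suggestion could look like, assuming the `username` field is a facet field (Typesense requires this for grouping). The helper names are hypothetical. The code only builds parameters and parses a response-shaped dict, so it runs without a server.

```python
def build_grouped_username_search(query, group_limit=1):
    """Build search parameters that return at most `group_limit` comments
    per distinct username (hypothetical helper)."""
    return {
        "q": query,
        "query_by": "username",
        "group_by": "username",      # one group per distinct username
        "group_limit": group_limit,  # hits to keep inside each group
    }


def usernames_from_groups(response):
    """Extract the distinct usernames from a grouped search response,
    preserving the ranked order of the groups."""
    return [
        group["group_key"][0]
        for group in response.get("grouped_hits", [])
        if group.get("group_key")
    ]
```

Unlike the facet approach, groups are ordered by the relevance of their best-matching comment, so this keeps text-match ranking while still returning each username once.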
