Hi Team we are trying to restrict search results content pro typesense #community-help

Stephen Njau

12/10/2024, 7:32 AM

Hi Team, we are trying to restrict search results content (programs, meditations) to what each user has access to. Below, I’ve outlined our context, the approaches we’ve tried, and the challenges we’re encountering. We would greatly appreciate your guidance, recommendations, and/or experience.

Context

We already . When users search for quests, meditations, or soundscapes, the search should return results only from content the user has access to.

Current Collections and Structures

1. user_accesseses Collection

{
  "name": "user_accesseses",
  "fields": [
    { "name": "user_uid", "type": "string"}, // e.g "auth|2343243434"
    { "name": "content_ids", "type": "string[]"}, // e.g ["media/45", "program/67", "media/567", "channel/45"]
    { "name": "inserted_at", "type": "int64"},
    { "name": "updated_at", "type": "int64"}
  ],
  "default_sorting_field": "inserted_at"
}

2. programs Collection

{
  "name": "programs",
  "fields": [
    { "name": "title", "type": "string" },
    { "name": "content_id", "type": "string"}, // e.g "media/45"
    { "name": "rating", "type": "float"},
    { "name": "duration", "type": "float" }
    // ... other fields
  ],
  "default_sorting_field": "rating"
}

3. meditations Collection

{
  "name": "meditations",
  "fields": [
    { "name": "content_id", "type": "string"},
    { "name": "title", "type": "string" }
    // ... other fields
  ],
  "default_sorting_field": "title"
}

Data Volume
- Users: 1,000,000+
- Content IDs: ~20,000 (combination of programs and meditations)

Environment
- Typesense deployed in production with separate collections for user access and content metadata.
- Utilized Typesense’s API for indexing and searching.
- Two synchronization mechanisms:
  - Real-Time Updates: On every CRUD operation, the specific collection is updated in Typesense.
  - Sync-All Functionality: On demand; clears and re-syncs all records for bulk updates, primarily used during schema changes.

What We Have Tried

1. Using the string Type for content_id in Content Schemas

Approach: We initially defined the content_id field in both the programs and meditations collections as a single string. This was intended to match the entries in the content_ids array of the user_accesses collection, facilitating filter-based searches.

Schema Example: programs Collection

{
  "name": "programs",
  "fields": [
    { "name": "content_id", "type": "string", "reference": "user_accesses.content_ids"}, // reference as a string
    { "name": "rating", "type": "float" },
    { "name": "duration", "type": "float" }
    // ... other fields
  ],
  "default_sorting_field": "rating"
}

Issues We Encountered

a. Type Mismatch: The content_ids in user_accesses are arrays of strings (string[]), while content_id in the content collections was a single string. This mismatch caused indexing errors. I noticed that indexing required an entry with the same ID to already exist in user_accesses: Reference document having `content_ids:= media/155` not found in the collection `user_accesses`

b. Synchronization Order Dependency: Indexing content collections before user_accesses led to scenarios where content was not correctly associated with user access rights. Related to (a) above.

c. Incomplete Joins: When performing filter queries based on content_ids, the single string type in content collections did not align properly with the array type, leading to inaccurate search results.

2. Using the string[] Type for content_id in Content Schemas

Approach: To address the type mismatch, we modified the content_id field in both the programs and meditations collections to be an array of strings (string[]). This was intended to align with the content_ids in user_accesses and facilitate proper filtering.

Schema Example: programs Collection

{
  "name": "programs",
  "fields": [
    { "name": "content_id", "type": "string[]", "reference": "user_accesses.content_ids" }, // reference on string[] field     
    { "name": "rating", "type": "float" },
    { "name": "duration", "type": "float" }
    // ... other fields
  ],
  "default_sorting_field": "rating"
}

Issues We Encountered:

a. Handling New Users: When CRUD operations happen, e.g. for newly onboarded users, the access rights were not reflected in search results without reindexing, as the existing synchronization mechanisms did not account for dynamic updates to content_ids. Every related schema that references the content_ids of user_accesses needs a full resync at the time of reindexing.

Request for Assistance

Given the above challenges, we would greatly appreciate your guidance on the following:

1. Efficient Implementation of JOIN-like Filtering:
- Best practices for implementing user-based filtering with JOIN operations, factoring in that we have CRUD operations that update single records separately.
- Recommendations on schema design or query structuring to achieve our filtering goals effectively.

2. Handling Real-Time Updates and Scalability:
- Strategies to reflect user access changes in real-time without necessitating full reindexing.
- Suggestions for managing large-scale user and content datasets (1M+ users, ~20k content_ids

Kishore Nallan

12/10/2024, 10:01 AM

cc @Harpreet Sangar

Harpreet Sangar

12/13/2024, 6:32 AM

@Stephen Njau In the user_accesseses collection,

{ "name": "content_ids", "type": "string[]"}, // e.g ["media/45", "program/67", "media/567", "channel/45"]

is this field storing all the content_ids of the programs and meditations that a particular user has access to?

Stephen Njau

12/16/2024, 12:32 PM

@Harpreet Sangar Correct

Harpreet Sangar

12/16/2024, 12:36 PM

Okay. You should create separate fields in the user_accesseses collection for each type of content_ids, like program_content_ids, meditation_content_ids, etc. Also, make these fields reference the respective collections, like programs.content_id, meditations.content_id, etc.
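
For readers following along, the suggestion above could be sketched roughly as below. This is only an illustration under assumptions: the field names program_content_ids and meditation_content_ids and the reference targets come from the reply, while the helper access_filter, the sample query "sleep", and the backtick escaping of the user ID are hypothetical choices made for the example. It also assumes a Typesense version whose JOIN feature accepts a reference on a string[] field, and it uses the collection name user_accesseses as spelled in the posted schema (the thread also writes user_accesses).

```python
# Sketch only: names not found in the thread (access_filter, the sample query)
# are hypothetical, and nothing here talks to a Typesense server.

# Access collection with one array field per content type, each referencing
# the content_id field of the matching content collection.
user_accesses_schema = {
    "name": "user_accesseses",
    "fields": [
        {"name": "user_uid", "type": "string"},
        {"name": "program_content_ids", "type": "string[]",
         "reference": "programs.content_id"},
        {"name": "meditation_content_ids", "type": "string[]",
         "reference": "meditations.content_id"},
        {"name": "inserted_at", "type": "int64"},
        {"name": "updated_at", "type": "int64"},
    ],
    "default_sorting_field": "inserted_at",
}


def access_filter(user_uid: str, access_collection: str = "user_accesseses") -> str:
    """Build a join filter that keeps only content referenced by one user's access record."""
    # Assumption: backticks are used to escape special characters such as "|"
    # in the filter value.
    return f"${access_collection}(user_uid:=`{user_uid}`)"


# Search parameters for the programs collection, restricted to one user.
program_search = {
    "q": "sleep",
    "query_by": "title",
    "filter_by": access_filter("auth|2343243434"),
}
```

With the typesense Python client, the schema would typically be passed to client.collections.create(...) and the parameters to client.collections['programs'].documents.search(...). In this layout the references point from the access record to the content, so content documents are expected to exist before an access record that lists them is indexed, and a change to one user's access should only require updating that user's document.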
