Hi guys, I used `tsense.collections(collection_nam...
# community-help
m
Hi guys, I used
`tsense.collections(collection_name).update({ fields: fieldsToAdd });`
to update the schema, and I set `range_index: true`. I saw the fields were added to the schema on the website, but I didn't see anything about `range_index` in the schema. Is this normal?
Is it because of this error? `ObjectUnprocessable: Request failed with HTTP code 422 | Server said: Another collection update operation is in progress.`
But then why were the fields inserted?
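(For context, a minimal sketch of the kind of call being described, using the typesense-js client; the client config, collection name, and field name below are placeholders rather than details from this thread.)

```ts
import Typesense from 'typesense';

// Placeholder client setup (host and API key are not from this thread).
const tsense = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
});

// Hypothetical fields to add, with range_index enabled on a numeric field.
const fieldsToAdd = [
  { name: 'price_usd', type: 'int64' as const, range_index: true },
];

// Schema update (PATCH /collections/:name) that adds the fields above.
await tsense.collections('collection_name').update({ fields: fieldsToAdd });
```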
j
In a few versions of Typesense we didn't return the range_index property in the GET /collections endpoint, even though it was added to the schema behind the scenes. That could be one reason. Another reason is that the change hasn't completed yet. Depending on the size of your dataset, schema changes can take anywhere from a few minutes for tens of thousands of records to hours for tens of millions of records
m
Is there a way to confirm `range_index` was added to our field?
j
Queries with > and < operators should be noticeably faster. Otherwise, the only other way would be to upgrade to v27.1 or above.
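(As a rough illustration of that second option: on v27.1 and above you could retrieve the collection and inspect the field definition; the collection and field names below are placeholders.)

```ts
// Fetch the collection schema (GET /collections/:name) and inspect the field.
const collection = await tsense.collections('collection_name').retrieve();
const field = collection.fields?.find((f) => f.name === 'price_usd');

// On v27.1 and above, this should include range_index: true once the
// schema change has been applied.
console.log(field);
```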
m
Gotcha, thanks
d
Hey Jason. Wanted to reopen this thread for a related question. You mentioned that "depending on the size of the dataset it could take minutes to hours". Does this cause all writes that happen at the same time to be in a pending buffer while the change is happening?
j
No. Writes to a collection will be rejected while a schema change for that collection is in progress, but reads will still be serviced (as of v26.0 and above)
👍 1
d
Will writes still be added to the DB, or will they be in a pending state?
We're currently running in HA mode with ~5 million docs in our collection. We added an indexed numerical field to our collection and it caused all our writes to be pending for around 20 minutes. We do have 190 fields though, and I'm not sure if it's reindexing everything after we add a field
j
Even in an HA cluster, schema changes are applied to all the nodes in parallel (since a schema change is just like any other write), so other writes to that collection will be blocked on all the nodes
d
Gotcha. Is this for all types of fields added, or primarily for indexed fields? Numerical/non-numerical?
^ sorry to clarify
j
This applies to any type of field added to the schema (so indexed fields)
d
so we'll experience the same amount of downtime right?
ahh
got it
j
Writes downtime - yes, but reads will continue
d
Yep, our reads were working fine 🔥
Is the downtime a function of just document # in the collection? or will # of fields affect this as well
(we have tons of deprecated fields)
j
It's both a function of the number of documents, and also the number of net new fields added in that schema change operation. The existing fields already in the collection won't affect the schema change speed
d
Ahh I see, so I'm assuming removing old fields won't really help us out then, right? Do you have any suggestions on how we can minimize write downtime?
j
so I'm assuming removing old fields won't really help us out then, right?
Correct, that won't help with speeding up schema changes. But if you're able to drop unused fields, then in general you'll conserve RAM and will speed up indexing.
Do you have any suggestions on how we can minimize write downtime?
The more CPU cores you have, the faster the operation will be. You also want to combine multiple field changes into a single schema change operation instead of doing one field at a time.
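(A hedged sketch of what combining changes into one operation might look like; the field names here are made up.)

```ts
// One schema change that drops deprecated fields and adds a new one together,
// instead of issuing a separate update per field.
await tsense.collections('collection_name').update({
  fields: [
    { name: 'deprecated_field_a', drop: true },
    { name: 'deprecated_field_b', drop: true },
    { name: 'new_numeric_field', type: 'int64', range_index: true },
  ],
});
```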
d
Ah I see. This is very helpful. Thank you!!! party parrot
👍 1
Hey @Jason Bosco. We're currently trying to figure out how we can do a schema update to our collection while experiencing zero downtime (so no pending writes). The strategy we're hovering over is having an alias that points to a collection. If a schema were to be updated, we just make a new collection with the schema, export all docs from the old collection, import into the new collection, then switch the alias. The only downside to this is that our collection is pretty large, so exports and imports take a really long time. Was wondering if there was an efficient way to stream data or duplicate a collection with data in a way that wouldn't cause any downtime
j
If a schema were to be updated, we just make a new collection with the schema, ~~export all docs from the old collection, import into the new collection~~, then switch the alias.
This would be the recommended way to do it, except for the strikethrough. Instead of that, you want to just resync the data from your primary database into the new collection. That way you don't have to touch the existing collection that's already serving traffic.
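(A rough sketch of that alias-swap flow with a resync from the primary database; the collection names, the alias name, and the fetchAllDocsFromPrimaryDb helper are hypothetical.)

```ts
// Hypothetical helper that reads all documents from the primary database.
declare function fetchAllDocsFromPrimaryDb(): Promise<Record<string, unknown>[]>;

// 1. Create a new collection with the updated schema.
await tsense.collections().create({
  name: 'products_v2',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'price_usd', type: 'int64', range_index: true },
  ],
});

// 2. Resync documents from the primary database into the new collection,
//    so the existing collection serving traffic is never touched.
const docs = await fetchAllDocsFromPrimaryDb();
await tsense
  .collections('products_v2')
  .documents()
  .import(docs, { action: 'upsert' });

// 3. Point the alias that the application queries at the new collection.
await tsense.aliases().upsert('products', { collection_name: 'products_v2' });
```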