Hi guys, I used `tsense.collections(collection_nam...
# community-help
m
Hi guys, I used
`tsense.collections(collection_name).update({ fields: fieldsToAdd });`
to update the schema, and I set `range_index: true`. I saw the fields were added to the schema on the website, but I didn't see anything about `range_index` in the schema. Is this normal?
Is it because of this error? `ObjectUnprocessable: Request failed with HTTP code 422 | Server said: Another collection update operation is in progress.`
But then why were the fields inserted?
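(For context, a minimal sketch of the kind of call being described, using the typesense-js client; the client config, collection name, and field name below are placeholders rather than details from this thread.)

```ts
import Typesense from 'typesense';

// Placeholder client setup (host and API key are not from this thread).
const tsense = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
});

// Hypothetical fields to add, with range_index enabled on a numeric field.
const fieldsToAdd = [
  { name: 'price_usd', type: 'int64' as const, range_index: true },
];

// Schema update (PATCH /collections/:name) that adds the fields above.
await tsense.collections('collection_name').update({ fields: fieldsToAdd });
```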
j
In a few versions of Typesense we didn't return the range_index property in the GET /collections endpoint, even though it was added to the schema behind the scenes. That could be one reason. Another reason is that the change hasn't completed yet. Depending on the size of your dataset, schema changes can take anywhere from a few minutes for tens of thousands of records to hours for tens of millions of records
m
Is there a way to confirm `range_index` was added to our field?
j
Queries with > and < operators should be noticeably faster. Otherwise, the only other way would be to upgrade to v27.1 or above.
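(As a rough illustration of that second option: on v27.1 and above you could retrieve the collection and inspect the field definition; the collection and field names below are placeholders.)

```ts
// Fetch the collection schema (GET /collections/:name) and inspect the field.
const collection = await tsense.collections('collection_name').retrieve();
const field = collection.fields?.find((f) => f.name === 'price_usd');

// On v27.1 and above, this should include range_index: true once the
// schema change has been applied.
console.log(field);
```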
m
Gotcha, thanks
d
Hey Jason. Wanted to reopen this thread for a related question. You mentioned that "depending on the size of the dataset it could take minutes to hours". Does this cause all writes that happen at the same time to be in a pending buffer while the change is happening?
j
No. Writes to a collection will be rejected while a schema change for that collection is in progress, but reads will still be serviced (as of v26.0 and above)
👍 1
d
Will writes still be added to the DB, or will they be in a pending state?
We're currently running in HA mode with ~5 million docs in our collection. We added an indexed numerical field to our collection and it caused all our writes to be pending for around 20 minutes. We do have 190 fields though, and I'm not sure if it's reindexing everything after we add a field
j
Even in an HA cluster, schema changes are applied to all the nodes in parallel (since a schema change is just like any other write), so other writes to that collection will be blocked on all the nodes
d
Gotcha. Is this for all types of fields added, or primarily for indexed fields? Numerical/non-numerical?
^ sorry to clarify
j
This applies to any type of field added to the schema (so indexed fields)
d
so we'll experience the same amount of downtime right?
ahh
got it
j
Writes downtime - yes, but reads will continue
d
Yep, our reads were working fine 🔥
Is the downtime a function of just document # in the collection? or will # of fields affect this as well
(we have tons of deprecated fields)
j
It's both a function of the number of documents, and also the number of net new fields added in that schema change operation. The existing fields already in the collection won't affect the schema change speed
d
Ahh I see, so I'm assuming removing old fields won't really help us out then, right? Do you have any suggestions on how we can minimize write downtime?
j
so I'm assuming removing old fields won't really help us out then, right?
Correct, that won't help with speeding up schema changes. But if you're able to drop unused fields, then in general you'll conserve RAM and will speed up indexing.
Do you have any suggestions on how we can minimize write downtime?
The more CPU cores you have, the faster the operation will be. You also want to combine multiple field changes into a single schema change operation instead of doing one field at a time.
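(A hedged sketch of what combining changes into one operation might look like; the field names here are made up.)

```ts
// One schema change that drops deprecated fields and adds a new one together,
// instead of issuing a separate update per field.
await tsense.collections('collection_name').update({
  fields: [
    { name: 'deprecated_field_a', drop: true },
    { name: 'deprecated_field_b', drop: true },
    { name: 'new_numeric_field', type: 'int64', range_index: true },
  ],
});
```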
d
Ah I see. This is very helpful. Thank you!!! party parrot
👍 1
Hey @Jason Bosco. We're currently trying to figure out how we can do a schema update to our collection while experiencing zero downtime (so no pending writes). The strategy we're hovering over is having an alias that points to a collection. If a schema were to be updated, we just make a new collection with the schema, export all docs from the old collection, import into the new collection, then switch the alias. The only downside to this is that our collection is pretty large, so exports and imports take a really long time. Was wondering if there was an efficient way to stream data or duplicate a collection with data in a way that wouldn't cause any downtime
j
If a schema were to be updated, we just make a new collection with the schema, ~~export all docs from the old collection, import into the new collection~~, then switch the alias.
This would be the recommended way to do it, except for the strikethrough. Instead of that, you want to just resync the data from your primary database into the new collection. That way you don't have to touch the existing collection that's already serving traffic.
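(A rough sketch of that alias-swap flow with a resync from the primary database; the collection names, the alias name, and the fetchAllDocsFromPrimaryDb helper are hypothetical.)

```ts
// Hypothetical helper that reads all documents from the primary database.
declare function fetchAllDocsFromPrimaryDb(): Promise<Record<string, unknown>[]>;

// 1. Create a new collection with the updated schema.
await tsense.collections().create({
  name: 'products_v2',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'price_usd', type: 'int64', range_index: true },
  ],
});

// 2. Resync documents from the primary database into the new collection,
//    so the existing collection serving traffic is never touched.
const docs = await fetchAllDocsFromPrimaryDb();
await tsense
  .collections('products_v2')
  .documents()
  .import(docs, { action: 'upsert' });

// 3. Point the alias that the application queries at the new collection.
await tsense.aliases().upsert('products', { collection_name: 'products_v2' });
```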