#community-help

Implementing Typesense Updates with JSONL Import and Aliases

TLDR: Ken is building a search solution for a website using Typesense. Ken asked Kishore Nallan how to implement updates using JSONL import and aliases, and how to know when the new collection is indexed and ready. Kishore explained that the collection is indexed once the import call returns, and suggested splitting large imports into smaller batches (starting at about 2000-3000 documents) to avoid "not ready" responses.

Feb 15, 2022 (21 months ago)
Ken
03:31 PM
👋 Hi – new Typesense user here. I am building the search solution for tradingstrategy.ai and have identified Typesense as the likely b/end. I have a question about how to best implement updates using JSONL import and aliases ----->

Ken
03:31 PM
My plan is to follow the pattern described here:
https://typesense.org/docs/0.22.2/api/collection-alias.html
Ken
03:32 PM
(populate new collection daily using documents/import, toggle collection alias to point to new collection)
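The alias toggle in this plan can be sketched as follows. This is only an illustration under assumptions: the host `http://localhost:8108`, the API key, the alias name `pairs`, the dated naming scheme, and both helper names are hypothetical. The sketch builds the `PUT /aliases/{alias}` request described on the linked docs page but does not send it.

```python
import json
import urllib.request
from datetime import date

BASE_URL = "http://localhost:8108"  # assumption: a local Typesense node
API_KEY = "xyz"                     # placeholder API key


def dated_collection_name(alias: str, day: date) -> str:
    """Hypothetical naming scheme: one fresh collection per daily refresh."""
    return f"{alias}_{day:%Y%m%d}"


def alias_request(alias: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) the request that points `alias` at `collection`."""
    body = json.dumps({"collection_name": collection}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/aliases/{alias}",
        data=body,
        method="PUT",
        headers={
            "X-TYPESENSE-API-KEY": API_KEY,
            "Content-Type": "application/json",
        },
    )


new_collection = dated_collection_name("pairs", date(2022, 2, 15))
req = alias_request("pairs", new_collection)
print(req.get_method(), req.full_url, req.data.decode())
```

Search clients would keep querying the alias name, so the switch needs no client-side change; the previous day's collection can be dropped once the alias has moved.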
Kishore Nallan
03:33 PM
👋 Very interesting site!

The alias-based approach works well if your content refresh happens only periodically, so you can just do a full refresh of the index.

Kishore Nallan
03:34 PM
For streaming or ad hoc updates, you can use the upsert or update action of the import endpoint.
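As a rough illustration of the action parameter mentioned here, the import URL might be built like this. The host and the helper name `import_url` are assumptions made for the example; the `action` query parameter and its `create`, `upsert`, and `update` values come from the import endpoint itself.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8108"  # assumption: a local Typesense node

# create: insert new documents only
# upsert: insert, or replace the whole document if the id already exists
# update: partially update an existing document, matched by id
ACTIONS = {"create", "upsert", "update"}


def import_url(collection: str, action: str = "create") -> str:
    """Return the documents/import URL for `collection` with the given action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported import action: {action!r}")
    path = f"/collections/{quote(collection, safe='')}/documents/import"
    return f"{BASE_URL}{path}?{urlencode({'action': action})}"


print(import_url("pairs", "upsert"))
```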
Ken
03:35 PM
My question is… how do I know when the new collection is indexed and ready to use? Do I need to poll with GET /collections/foo or is there a callback that can notify me when the new collection is ready?
Ken
03:37 PM
My concern with using incremental updates – it adds some complexity (need to account for adds, updates, deletes) and you almost always get some “downstream drift” of your data… requiring periodic full updates anyway… so starting with full updates seems simpler.
Kishore Nallan
03:52 PM
If that works for your update frequency then certainly the easiest approach 👍
Ken
03:54 PM
Thanks. Any feedback on my question regarding how to know when new collection index is ready (when to toggle alias)?
Kishore Nallan
03:59 PM
When your import call ends, that indicates that the collection has been indexed.
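Since the import call returning is the signal, one cautious approach is to inspect the response body before toggling the alias. The sketch below assumes the import endpoint's response is JSONL with one result object per document carrying a `success` flag; the helper name `failed_imports` and the sample response are made up for illustration.

```python
import json


def failed_imports(response_text: str) -> list:
    """Return the per-document results that did not report success."""
    results = [
        json.loads(line)
        for line in response_text.splitlines()
        if line.strip()
    ]
    return [r for r in results if not r.get("success")]


# Made-up sample response: one result line per imported document.
sample = "\n".join([
    '{"success": true}',
    '{"success": false, "error": "Bad JSON.", "document": "[bad doc]"}',
    '{"success": true}',
])

failures = failed_imports(sample)
ready_to_switch = not failures  # only toggle the alias when nothing failed
print(len(failures), ready_to_switch)
```

Whether a single failed document should block the alias switch is a policy choice; a tolerance threshold would work just as well as the strict check used here.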
Ken
04:03 PM
OK… so I can synchronously toggle the alias as soon as the import request is complete? I noticed previously when importing a large collection (1.6M records) that I was getting some type of “not ready” response to GET /collections/foo requests as well as search requests.
Kishore Nallan
04:16 PM
When you import a large batch in one go, your writes can lag, and this can trigger the max-read-lag and max-write-lag configuration values. At that point the system concludes that it is lagging heavily behind and returns "not ready" to prevent stale results from being returned.

To prevent this from happening, split your imports into batches that are not too large. We have some work planned to make the import endpoint automatically slow down for large uploads which should make this easier.
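Splitting an import into batches could look roughly like this. The helper names `batches` and `to_jsonl`, the sample documents, and the tiny batch size are illustrative assumptions; each resulting payload would be sent as the body of a separate import request.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batches(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_jsonl(chunk: list) -> str:
    """Serialize one batch as JSONL: one JSON document per line."""
    return "\n".join(json.dumps(doc) for doc in chunk)


# Made-up documents; a generator keeps memory use flat for large exports.
docs = ({"id": str(i), "symbol": f"PAIR{i}"} for i in range(7))
payloads = [to_jsonl(chunk) for chunk in batches(docs, 3)]
print(len(payloads))
```

Sending the batches one after another, rather than in parallel, gives the server time to catch up between requests.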
Ken
04:35 PM
👍 thanks – that helps! Any suggestion on max batch size?
Feb 16, 2022 (21 months ago)
Kishore Nallan
03:31 AM
It will depend on how many fields you are indexing and whether the fields are large text etc.
Kishore Nallan
03:32 AM
I would recommend starting with about 2000-3000 documents per batch and then revising based on observations.
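To put that starting range in perspective, here is a quick calculation of how many import requests the 1.6M-record collection mentioned earlier would need. The three batch sizes are simply sample points in the suggested range.

```python
import math

TOTAL_DOCS = 1_600_000  # collection size mentioned earlier in the thread

# Number of import requests needed at each candidate batch size.
requests_needed = {
    batch_size: math.ceil(TOTAL_DOCS / batch_size)
    for batch_size in (2000, 2500, 3000)
}
print(requests_needed)
```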
Ken
09:19 PM
👍 thanks – appreciate the support!