#community-help

Implementing Typesense Updates with JSONL Import and Aliases

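TLDR Ken is building the search solution for tradingstrategy.ai using Typesense. Ken asked Kishore Nallan how to implement updates with JSONL import and aliases, and how to know when a newly imported collection is indexed and ready for the alias switch. Kishore explained that the collection is indexed once the import call returns, and that "not ready" responses during very large imports are caused by write lag. To avoid them, he suggested splitting imports into batches of about 2000-3000 documents and adjusting from there.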

Powered by Struct AI

Feb 15, 2022
Ken
03:31 PM
👋 Hi – new Typesense user here. I am building the search solution for tradingstrategy.ai and have identified Typesense as the likely backend. I have a question about how best to implement updates using JSONL import and aliases ----->
Ken
03:31 PM
My plan is to follow the pattern described here:
https://typesense.org/docs/0.22.2/api/collection-alias.html
Ken
03:32 PM
(populate new collection daily using documents/import, toggle collection alias to point to new collection)
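For readers unfamiliar with the alias pattern Ken links to, here is a minimal sketch of the daily full refresh using only the Python standard library and the Typesense HTTP API (POST /collections, POST /collections/<name>/documents/import, PUT /aliases/<alias>). The server URL, API key, alias name "pairs" and the date-stamped naming scheme are hypothetical placeholders, not details from the thread. The transport function is injectable so the request sequence can be inspected without a running server.

```python
import json
import urllib.request
from datetime import date

# Hypothetical settings (not from the thread): adjust for your deployment.
TYPESENSE_URL = "http://localhost:8108"
API_KEY = "xyz"
ALIAS = "pairs"


def make_request(method, path, body, content_type="application/json"):
    """Build an authenticated Typesense HTTP request without sending it."""
    return urllib.request.Request(
        TYPESENSE_URL + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"X-TYPESENSE-API-KEY": API_KEY, "Content-Type": content_type},
    )


def send(req):
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def daily_refresh(schema, docs, day, transport=send):
    """Create a date-stamped collection, import all docs into it, then repoint ALIAS."""
    name = f"{ALIAS}_{day:%Y_%m_%d}"  # e.g. pairs_2022_02_15
    # 1. Create the new collection, reusing the schema's fields under a fresh name.
    transport(make_request("POST", "/collections", json.dumps({**schema, "name": name})))
    # 2. Bulk-load every document as JSON Lines (one JSON object per line).
    jsonl = "\n".join(json.dumps(doc) for doc in docs)
    transport(make_request("POST", f"/collections/{name}/documents/import?action=create",
                           jsonl, content_type="text/plain"))
    # 3. Point the alias at the new collection; queries against ALIAS now hit it.
    transport(make_request("PUT", f"/aliases/{ALIAS}",
                           json.dumps({"collection_name": name})))
    return name
```

Searches always target the alias name, so clients never see the swap. This sketch neither inspects the import response nor deletes the previous day's collection; a real job would want to do both.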
Kishore Nallan
03:33 PM
👋 Very interesting site!

The alias-based approach works well if your content refresh happens only periodically, so you can just do a full refresh of the index.
Kishore Nallan
03:34 PM
For streaming or ad hoc updates, you can use the upsert or update action of the import endpoint.
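For incremental updates, only the action query parameter on the same import endpoint changes. With upsert, a document is created if its id is new and fully replaced if it already exists; with update, only the supplied fields of an existing document are changed, and ids that do not exist fail. The default, create, fails for ids that already exist. A minimal standard-library sketch of building the request path and JSONL body follows; the collection name "pairs" in the test is a hypothetical example.

```python
import json
from urllib.parse import urlencode

# The default "create" plus the two actions mentioned in the thread.
IMPORT_ACTIONS = ("create", "upsert", "update")


def import_path(collection, action="create"):
    """Path for POST-ing a JSONL batch with the given import action."""
    if action not in IMPORT_ACTIONS:
        raise ValueError(f"unsupported import action: {action!r}")
    return f"/collections/{collection}/documents/import?" + urlencode({"action": action})


def to_jsonl(docs):
    """Serialise documents as compact JSON Lines, one object per line."""
    return "\n".join(json.dumps(doc, separators=(",", ":")) for doc in docs)
```

For example, a nightly delta of changed records could go through import_path("pairs", "upsert"). A stream of price ticks that touches only one field per document could use "update" with bodies such as {"id": "...", "price": ...}.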
Ken
03:35 PM
My question is… how do I know when the new collection is indexed and ready to use? Do I need to poll with GET /collections/foo or is there a callback that can notify me when the new collection is ready?
Ken
03:37 PM
My concern with using incremental updates – it adds some complexity (need to account for adds, updates, deletes) and you almost always get some “downstream drift” of your data… requiring periodic full updates anyway… so starting with full updates seems simpler.
Kishore Nallan
03:52 PM
If that works for your update frequency then certainly the easiest approach 👍
Ken
03:54 PM
Thanks. Any feedback on my question about how to know when the new collection's index is ready (when to toggle the alias)?
Kishore Nallan
03:59 PM
When your import call ends, that indicates that the collection has been indexed.
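Because the import call is synchronous, the alias can be toggled right after it returns. However, a completed call does not by itself mean every document was accepted. The import endpoint replies with one JSON object per line, such as {"success": true} or {"success": false, "error": "..."}. A minimal sketch of a gate that allows the alias swap only when every submitted document succeeded might look like this. The function names are illustrative, not part of any Typesense client.

```python
import json


def parse_import_response(text):
    """Parse the import endpoint's JSONL reply into one dict per document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def ready_for_alias_swap(text, expected_count):
    """True only if every submitted document got a {"success": true} line."""
    results = parse_import_response(text)
    return len(results) == expected_count and all(r.get("success") is True for r in results)
```

Comparing against expected_count also catches a truncated response, which would otherwise look like a clean import.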
Ken
04:03 PM
OK… so I can synchronously toggle the alias as soon as the import request is complete? I noticed previously, when importing a large collection (1.6M records), that I was getting some type of “not ready” response to GET /collections/foo requests as well as to search requests.
Kishore Nallan
04:16 PM
When you import a large batch in one go, your writes can lag, and this can exceed the healthy-read-lag and healthy-write-lag configuration thresholds. At that point the system considers itself heavily behind and returns “not ready” to prevent stale results from being returned.

To prevent this from happening, split your imports into batches that are not too large. We have some work planned to make the import endpoint automatically slow down for large uploads, which should make this easier.
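On the client side, one way to follow this advice is to stream documents through fixed-size batches and make one import call per batch. In this sketch, import_batch is a hypothetical callable that POSTs one JSONL body to /collections/<name>/documents/import and returns the response text. between_batches is an optional hypothetical hook where you could sleep or wait on the node's /health endpoint. Neither is a Typesense client API.

```python
import json
from itertools import islice


def batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def import_in_batches(docs, batch_size, import_batch, between_batches=lambda: None):
    """Import docs one batch at a time and return the per-document failures.

    import_batch(jsonl_text) -> response_text is a hypothetical transport
    supplied by the caller (e.g. a urllib POST to the import endpoint).
    """
    failures = []
    for i, chunk in enumerate(batches(docs, batch_size)):
        if i:
            between_batches()  # e.g. time.sleep(1) to give the node room to catch up
        body = "\n".join(json.dumps(doc) for doc in chunk)
        for line in import_batch(body).splitlines():
            if not line.strip():
                continue
            result = json.loads(line)
            if result.get("success") is not True:
                failures.append(result)
    return failures
```

Because batches() consumes an iterator lazily, the full 1.6M-record export never has to sit in memory at once.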
Ken
04:35 PM
👍 thanks – that helps! Any suggestion on a maximum batch size?
Feb 16, 2022
Kishore Nallan
03:31 AM
It will depend on how many fields you are indexing and whether the fields are large text etc.
Kishore Nallan
03:32 AM
I would recommend starting with about 2000-3000 documents per batch and then adjusting based on what you observe.
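To put the suggested range in perspective, here is the number of import calls needed for the 1.6M-record collection Ken mentioned earlier, using ceiling division. This is a simple worked calculation, not a benchmark.

```python
def import_calls(total_docs, batch_size):
    """Number of import requests needed, i.e. ceil(total_docs / batch_size)."""
    return -(-total_docs // batch_size)


# Ken's earlier example: a collection of about 1.6M records.
for size in (2000, 2500, 3000):
    print(f"{size} docs/batch -> {import_calls(1_600_000, size)} import calls")
```

This prints 800, 640 and 534 calls respectively, so a full daily rebuild of that size means several hundred sequential import requests.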
Ken
09:19 PM
👍 thanks – appreciate the support!

Typesense

Lightning-fast, open source search engine for everyone | Knowledge Base powered by Struct.AI

Similar Threads

Revisiting Typesense for Efficient DB Indexing and Querying

kopach experienced slow indexing and crashes with Typesense. The community suggested to use batch import and check the server's resources. Improvements were made but additional support was needed for special characters and multi-search queries.

Typesense Server Bulk Import/Upsert Issue Resolved

Adam was confused about the discrepancy between the successful responses and the actual indexed data while working with a custom WP plugin integrating with Typesense. The issue was a bug related to fetching documents in the wrong order, not a Typesense problem.

Troubleshooting Indexing Duration in Typesense Import

Alan asked about lengthy indexing times for importing documents to Typesense. Jason suggested various potential causes, including network connectivity and system resources. They later identified the problem to be an error in Alan's code.

Implementing Semantic Search with Typesense

Erik sought advice for semantic search implementation in Typesense and raised issues around slow document import and excessive latency. Upon implementing advice from Kishore Nallan to try different models, Erik reported faster times, ultimately deciding to rate-limit imports.

Troubleshooting Typesense Document Import Error

Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason clarified the system requirements, explaining the correlation between RAM and dataset size, and ways to tackle the issue. They both also discussed database-like query options.
