#community-help

Discussing Dataset Indexing and Instance Reboots

TLDR Thomas asked about dataset indexing and instance reboots. Kishore Nallan clarified that indexing endpoints are synchronous, that the index is rebuilt in memory on every instance restart, and that upgrades shouldn't cause issues. CPU speed was identified as the bottleneck during re-indexing. Kishore Nallan suggested trying CRIU for periodic RAM dumps to avoid re-indexing on reboot.

Solved
Mar 02, 2022 (22 months ago)
Thomas
11:44 AM
How can I know that indexing completed on a dataset?
Kishore Nallan
11:45 AM
When the endpoint response arrives, indexing is done. All endpoints are synchronous.
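Because the call is synchronous, checking the response is enough to know whether indexing finished. A minimal sketch, assuming the bulk import endpoint answers with one JSON object per line carrying a `success` flag (and an `error` string on failure); the helper name `summarize_import_response` and the sample body are made up for illustration:

```python
import json


def summarize_import_response(body: str) -> dict:
    """Count per-document results in a JSONL-style import response.

    Assumes one JSON object per line with a boolean "success" key and,
    for failed documents, an "error" message.
    """
    succeeded = 0
    failures = []
    for line_no, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        result = json.loads(line)
        if result.get("success"):
            succeeded += 1
        else:
            failures.append((line_no, result.get("error", "unknown error")))
    return {"succeeded": succeeded, "failed": len(failures), "failures": failures}


# Hypothetical response body for a three-document import.
sample = (
    '{"success": true}\n'
    '{"success": false, "error": "Bad JSON."}\n'
    '{"success": true}'
)
summary = summarize_import_response(sample)
print(summary)
```

If `failed` is zero once the response has arrived, every document in that request has been indexed.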
Thomas
11:46 AM
If we restart the instance, it takes a long time before it's usable, like 30 minutes. Does it re-index on reboot?
Kishore Nallan
11:47 AM
Yes, only raw documents are stored on disk and indexing happens in memory on restart.
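Since a restarted node has to rebuild its index before it can serve traffic, a deploy script can poll the node instead of guessing how long to wait. A rough sketch, assuming the server's `GET /health` endpoint returns `{"ok": true}` only once the node is ready; the function names, timeout values, and the `http://localhost:8108` address are illustrative, not something stated in this thread:

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers with {"ok": true}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return bool(json.loads(resp.read()).get("ok"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error status, or an unparsable body:
        # treat all of these as "not ready yet".
        return False


def wait_until_healthy(check, timeout_s=3600.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example (not executed here; address is an assumption):
# ready = wait_until_healthy(lambda: fetch_health("http://localhost:8108"))
```

The `check` callable is injected so the polling loop can be exercised without a running server.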
Thomas
11:48 AM
Alright, what's the bottleneck on that? CPU or Disk speed?
Kishore Nallan
11:48 AM
CPU. The latest 0.23 RC builds are faster in this respect.
Thomas
11:51 AM
How much faster is 0.23 RC?
Kishore Nallan
11:51 AM
Depends on dataset. Primary work is around numerical fields.
Thomas
11:52 AM
Dataset is majority text
Kishore Nallan
11:52 AM
We recommend running a 3-node configuration so rotations can be done without a single point of failure.
Kishore Nallan
11:53 AM
There might be a few other things we can still do to optimize text fields.
Thomas
11:59 AM
Yeah we're starting with 3 nodes in a cluster
Thomas
11:59 AM
What's the optimal size in terms of keys?
Thomas
12:00 PM
We have 60 datapoints that we need to filter on
Thomas
12:11 PM
Kishore Nallan, how often does the API break? Could rolling upgrades on a cluster be a problem?
Kishore Nallan
Kishore Nallan
12:11 PM
Every node stores all the data, so adding nodes helps increase throughput across many users.
Kishore Nallan
12:12 PM
We've successfully rolled out 5 version upgrades so far on Typesense Cloud across hundreds of deployments. We take care to maintain backward compatibility.
Kishore Nallan
12:14 PM
We store nothing but documents on disk, so upgrades are not much of a problem.
Thomas
Thomas
12:18 PM
Superb 🙂
Thomas
12:18 PM
Thank you
Kishore Nallan
12:18 PM
👍
Thomas
12:47 PM
Kishore Nallan, isn't it possible to periodically dump the RAM that holds the index to disk, so it doesn't need to be re-indexed on reboot?

Kishore Nallan
02:37 PM
You could try doing this via CRIU: https://criu.org/Main_Page
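As a rough illustration of the CRIU idea, the sketch below only builds the `criu dump` and `criu restore` command lines for a given process ID. It is untested against a real search server: the choice of flags (`--leave-running` to keep the process alive after the snapshot, `--tcp-established` for open connections), the helper names, and the example paths are all assumptions, and a real setup may need further options such as `--shell-job`.

```python
from typing import List


def criu_dump_cmd(pid: int, images_dir: str) -> List[str]:
    """Build a `criu dump` command that snapshots a process tree to images_dir."""
    return [
        "criu", "dump",
        "--tree", str(pid),          # root PID of the process tree to checkpoint
        "--images-dir", images_dir,  # where the memory images are written
        "--leave-running",           # assumption: keep serving after the dump
        "--tcp-established",         # assumption: process holds open TCP sockets
    ]


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Build the matching `criu restore` command for a previous dump."""
    return ["criu", "restore", "--images-dir", images_dir, "--tcp-established"]


# Hypothetical PID and path, printed rather than executed.
dump_cmd = criu_dump_cmd(1234, "/var/lib/criu/search-node")
restore_cmd = criu_restore_cmd("/var/lib/criu/search-node")
print(" ".join(dump_cmd))
print(" ".join(restore_cmd))
```

To run these periodically, the lists could be passed to `subprocess.run(cmd, check=True)` from a scheduled job (CRIU normally requires root). Images kept on the same machine do not survive a hardware failure, so they would need to be copied elsewhere to help in that case.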

Thomas
02:48 PM
Yeah that's how we currently do it with KVM, but this doesn't help if there's a hardware issue
