Flushing Queue for Schema Altering in Typesense
TLDR Dima saw the same schema alteration being processed over and over: the long-running alter exceeded the client's read timeout, and the client and the deploy job kept re-sending it. Kishore Nallan recommended increasing the timeout and adjusting max_retries to resolve the issue.
Mar 28, 2023 (6 months ago)
Dima
10:29 AM Processing field additions and deletions first... (every few minutes)
Kishore Nallan
10:31 AM
Dima
10:31 AM
Dima
10:31 AM Running GC for aborted requests, req map size: 38
Kishore Nallan
10:31 AM
Dima
10:32 AM
Kishore Nallan
10:32 AM This is different, not related.
Kishore Nallan
10:32 AM
Kishore Nallan
10:33 AM
Dima
10:35 AM
Kishore Nallan
10:37 AM AFAIK this should not happen at all because there is no inherent looping in the update schema code. So one of two things is happening:
a) Server is restarting after a crash and replaying recent writes from the write ahead log
b) The client is getting timed out and the official Typesense client retries the API call if there is a timeout
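For context on (b), here is a minimal sketch of the relevant knobs, assuming the official typesense-python client; the host, port, and API key are placeholders, not values from the thread. If a schema alter runs longer than the configured read timeout, the client abandons the connection while the server keeps processing, and the client's built-in retry then re-sends the identical alter request.

```python
import typesense

# Placeholder connection details; only the parameter names matter here.
client = typesense.Client({
    'nodes': [{'host': 'typesense', 'port': '1234', 'protocol': 'http'}],
    'api_key': 'xyz',
    'connection_timeout_seconds': 20,  # a long-running alter will exceed this read timeout
    'num_retries': 3,                  # the timed-out request is then sent again
    'retry_interval_seconds': 1,
})
```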
Kishore Nallan
10:37 AM
Kishore Nallan
10:38 AM
Dima
10:39 AM Are you sure? It decreases every time after successful processing:
Alter payload validation is successful...
Processing field additions and deletions first...
Running GC for aborted requests, req map size: 34
Finished altering 105713 document(s).
Processing field modifications now...
Finished altering 105713 document(s).
Alter payload validation is successful...
Processing field additions and deletions first...
Running GC for aborted requests, req map size: 33
Finished altering 105713 document(s).
Processing field modifications now...
Finished altering 105713 document(s).
Kishore Nallan
10:39 AM So indeed the client seems to have sent a schema alter request repeatedly.
Kishore Nallan
10:40 AM --skip-writes flag.
Kishore Nallan
10:41 AM
Dima
10:41 AM This is exactly what happened
Kishore Nallan
10:42 AM
Kishore Nallan
10:42 AM
Dima
10:48 AM
• I have a StatefulSet with 1 node of Typesense plus a post-update configurator job
• I ran a heavy schema alteration by mistake (dropped two fields and indexed them again; sketched below)
• It caused a deploy timeout, so the deploy was reverted (Typesense restarted, and the post-update configurator job ran again)
• I retried this process a few times until I found out what was going on
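A rough sketch of the kind of alter the configurator job could have sent, again using the typesense-python client; the collection and field names are invented for illustration. Dropping a field and re-adding it in the same request forces Typesense to re-index that field across every document, which is what makes the operation run long enough to trip the 20-second read timeout.

```python
import typesense

# Placeholder connection details, not taken from the thread.
client = typesense.Client({
    'nodes': [{'host': 'typesense', 'port': '1234', 'protocol': 'http'}],
    'api_key': 'xyz',
    'connection_timeout_seconds': 20,
})

# Hypothetical "heavy" alter: dropping two fields and re-adding them triggers a
# re-index of those fields over all ~105k documents in the collection.
client.collections['articles'].update({
    'fields': [
        {'name': 'title', 'drop': True},
        {'name': 'title', 'type': 'string'},
        {'name': 'price', 'drop': True},
        {'name': 'price', 'type': 'float'},
    ]
})
```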
Kishore Nallan
10:51 AM
Dima
10:53 AM > HTTPConnectionPool(host='typesense', port=1234): Read timed out. (read timeout=20)
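The remedy the TLDR summarizes, sketched with the same client: give schema alters a read timeout comfortably longer than the re-index takes, and keep automatic retries from re-sending the request. The parameter names are from the typesense-python client; the values, and the exact retry semantics of your client version, are assumptions to verify.

```python
import typesense

# A dedicated client for long-running schema alters (placeholder connection details).
admin_client = typesense.Client({
    'nodes': [{'host': 'typesense', 'port': '1234', 'protocol': 'http'}],
    'api_key': 'xyz',
    'connection_timeout_seconds': 60 * 60,  # well above the expected re-index time
    'num_retries': 1,  # keep retries minimal so a slow alter is not silently re-sent
})

# The alter itself; collection and fields are illustrative.
admin_client.collections['articles'].update({
    'fields': [{'name': 'price', 'drop': True}, {'name': 'price', 'type': 'float'}],
})
```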
Dima
10:53 AM
Kishore Nallan
10:54 AM curl, which does not timeout by default.
Dima
10:58 AM
Kishore Nallan
10:59 AM
Dima
10:59 AM