# community-help
l
Hi everyone! I stumbled upon an issue while importing data into my Typesense Cloud collection: some documents got duplicated. I used the batch insert with client-side batching, and no errors were reported while inserting. My `connection_timeout_seconds` is set to 5000, so I wonder if that could be a source of problems? I searched the GH issues and found #1261, which mentions duplicated documents, but it’s from an old version of the Typesense server. Has anyone encountered the same problem recently?
f
Hey, could you provide a small reproducible example with some data?
l
Sorry, I can’t publicly share the dataset I’m using, unfortunately, and this issue doesn’t happen every time. It only happens for 5-10% of batches out of ~100 (batch size was set to 1000).
Is there a way for me to access logs when using Typesense Cloud? All endpoints (health, metrics, stats) seemed OK while doing the inserts.
f
This sounds like a timeout issue with your larger batch operations. I'd recommend increasing the timeout to ~10 minutes (600,000ms)
l
Thank you for the suggestion. We tried to increase the timeout a bit and reduced our batch size to 500 but still got duplicates. Is the timeout parameter of the client connection in milliseconds or in seconds as the name suggests?
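
For anyone landing on this thread later: as far as I know, `connection_timeout_seconds` is in seconds, as the name suggests, so 5000 is already well over an hour. One possible cause of duplicates (an assumption, not confirmed in this thread) is a batch being sent twice, for example by a client-side retry, while the documents carry no explicit `id`. Typesense then auto-generates a new `id` each time, so the second attempt creates new documents instead of overwriting. Below is a minimal sketch of one way to make imports idempotent: derive a deterministic `id` per document and import with `action: upsert`. The helper names (`with_stable_id`, `batches`, `import_all`, `typesense_import_batch`) and the `key_fields` idea are hypothetical and only for illustration. The sketch also checks the per-document results, since the import endpoint reports failures per document rather than as a request-level error.

```python
import hashlib
import json
from typing import Callable, Iterator


def with_stable_id(doc: dict, key_fields: list) -> dict:
    """Return a copy of doc with a deterministic `id` built from key_fields.

    Assumption: key_fields together identify a document uniquely in your data.
    """
    if "id" in doc:
        return dict(doc)
    raw = json.dumps([doc[f] for f in key_fields], default=str)
    return {**doc, "id": hashlib.sha1(raw.encode("utf-8")).hexdigest()}


def batches(docs: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def import_all(docs: list, import_batch: Callable, key_fields: list,
               batch_size: int = 500) -> list:
    """Import docs batch by batch and return (id, error) for every failed document."""
    prepared = [with_stable_id(d, key_fields) for d in docs]
    failures = []
    for batch in batches(prepared, batch_size):
        # One result dict per document, in the same order as the batch.
        results = import_batch(batch)
        for doc, res in zip(batch, results):
            if not res.get("success"):
                failures.append((doc["id"], res.get("error")))
    return failures


def typesense_import_batch(collection) -> Callable:
    """Adapter for the official Python client.

    Assumption: `collection` is `client.collections["<name>"]` and
    `documents.import_` returns a list of per-document result dicts
    when given a list of documents.
    """
    def _import(batch: list) -> list:
        # "upsert" overwrites a document with the same id instead of duplicating it.
        return collection.documents.import_(batch, {"action": "upsert"})
    return _import
```

Usage would look roughly like `import_all(docs, typesense_import_batch(client.collections["products"]), key_fields=["sku"])`, where the collection name and `sku` are placeholders. With stable ids, re-sending a batch overwrites the same documents, so a retried batch should no longer show up as duplicates.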