#community-help

Querying Issues with Newly Created Collection

TL;DR: Dima reported search hits going missing in a newly created collection after updates with action=emplace. Kishore Nallan suggested several troubleshooting steps, but the problem persisted. Dima could not share the data, which made the issue harder to diagnose.

Jul 27, 2023 (2 months ago)
Dima
12:51 PM
Hi team! Are there any known bugs with missing hits in search? Details in the thread:
Dima
12:51 PM
I’ve created a new collection with one indexed field, searchable: string[]. I filled the collection, and after one or more updates with action=emplace, some results are missing from the search results.

The problem looks like this:
• One or more hits are missing for a known, simple search query consisting of two words
• Adding "exhaustive_search": true helps, but only partially: for some reason the hit then has a very low text match score
• A restart returns things to normal
• The next import with action=emplace causes the same problem again
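
To compare the two cases described above, the same query can be issued twice with only exhaustive_search changed. The following is a minimal sketch, not taken from the thread: the base URL, API key and collection name are placeholders, and it assumes the standard Typesense search endpoint and X-TYPESENSE-API-KEY header. Only the field name searchable comes from the message above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, api_key, collection, query, exhaustive):
    """Build a search request against the `searchable` field.

    base_url, api_key and collection are placeholders for illustration.
    """
    params = {
        "q": query,
        "query_by": "searchable",
        # The only parameter that differs between the two runs.
        "exhaustive_search": "true" if exhaustive else "false",
    }
    url = f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"
    return Request(url, headers={"X-TYPESENSE-API-KEY": api_key})
```

Each request can then be sent with urllib.request.urlopen and the two JSON responses saved for a side-by-side comparison.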
Dima
12:52 PM
Before restart: search without exhaustive search vs. search with exhaustive search
Kishore Nallan
12:52 PM
What version are you on?
Dima
12:53 PM
0.25.0.rc53
Dima
12:55 PM
After restart
Dima
12:58 PM
The main difference between results before and after:
• "tokens_matched": 1 vs "tokens_matched": 2
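
A difference like this is easier to spot across many hits with a small diff over two saved responses. The sketch below assumes each hit carries a document.id and a text_match_info object containing tokens_matched; that response shape is an assumption here, since the thread only shows the tokens_matched key itself.

```python
def tokens_matched_by_id(response):
    """Map document id -> tokens_matched for every hit in a search response."""
    result = {}
    for hit in response.get("hits", []):
        doc_id = hit["document"]["id"]
        # Assumed location of the field; None if it is absent.
        result[doc_id] = hit.get("text_match_info", {}).get("tokens_matched")
    return result


def diff_hits(before, after):
    """Return (ids missing from `before`, ids whose tokens_matched changed)."""
    b = tokens_matched_by_id(before)
    a = tokens_matched_by_id(after)
    missing = sorted(set(a) - set(b))
    changed = {
        doc_id: (b[doc_id], a[doc_id])
        for doc_id in sorted(set(a) & set(b))
        if b[doc_id] != a[doc_id]
    }
    return missing, changed
```

Here before would be the response captured ahead of the restart and after the one captured once the server is back up.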
Kishore Nallan
12:59 PM
Can you tell me something about the update?
Kishore Nallan
12:59 PM
Meaning, is it removing a value, adding one, etc.?
Dima
01:00 PM
Actually, nothing changes in the object during the update. It’s our regular cron job, which re-uploads every object from remote storage with action=emplace.
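
For reference, a re-upload like the one described could be prepared roughly as follows. This is a sketch under assumptions, not the actual cron job: the base URL and collection name are placeholders, and it assumes the documents import endpoint accepts a JSONL body (one JSON document per line) with action=emplace in the query string.

```python
import json
from urllib.parse import urlencode


def build_emplace_import(base_url, collection, documents):
    """Return (url, body) for a bulk import with action=emplace.

    base_url and collection are placeholders for illustration.
    """
    query = urlencode({"action": "emplace"})
    url = f"{base_url}/collections/{collection}/documents/import?{query}"
    # JSONL: one serialized document per line.
    body = "\n".join(json.dumps(doc) for doc in documents)
    return url, body.encode("utf-8")
```

Sending the same unchanged documents through this request twice mirrors the "nothing changes during the update" case.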
Kishore Nallan
01:23 PM
Are you able to reproduce on a smaller set?
Kishore Nallan
01:24 PM
Basically:

1. Create X records
2. Query, find N results
3. Update X records
4. Restart
5. Query, find M results
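
The steps above can be sketched as a small harness. The index, search and restart callables are hypothetical hooks to be supplied by whoever runs it (for example an emplace import, a search returning the set of hit ids, and a server restart). The extra query between the update and the restart is an addition to the listed steps, included because the missing hits were reported to appear before the restart.

```python
def run_repro(index, search, restart, records, query):
    """Run the create / query / update / restart / query sequence.

    index(records) uploads the records, search(query) returns a set of
    document ids, restart() restarts the server. All three are
    caller-supplied, hypothetical hooks.
    """
    index(records)                    # 1. create X records
    initial = search(query)           # 2. query, find N results
    index(records)                    # 3. update the same X records
    after_update = search(query)      # extra check, not in the listed steps
    restart()                         # 4. restart
    after_restart = search(query)     # 5. query, find M results
    return {
        "initial": initial,
        "after_update": after_update,
        "after_restart": after_restart,
        "lost_after_update": initial - after_update,
        "lost_after_restart": initial - after_restart,
    }
```

A non-empty lost_after_update with an empty lost_after_restart would match the behaviour described earlier in the thread.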
Kishore Nallan
01:55 PM
Worst case, if you are able to take a copy of the data directory right after a cron (and you notice the data changing) and share it with me, I can see if I can find something going wrong.
Dima
02:18 PM
I tried to create a reproducible example with a simple bash script, but had no luck. I will try again.
Kishore Nallan
02:21 PM
If you don't have restrictions around sharing your data then that would be easiest.
Dima
02:41 PM
Unfortunately I cannot share this data 🫣
Kishore Nallan
02:43 PM
Can you try on rc55?
Kishore Nallan
02:43 PM
We did fix one weird memory regression. Not related but memory issues tend to end in surprising behaviour.
Dima
03:17 PM
Can you share it 🙏
Kishore Nallan
03:31 PM
How do you deploy?
Dima
03:31 PM
raw binary, but I can extract it from deb
Kishore Nallan
03:34 PM
Kishore Nallan
03:34 PM
That's the latest.
Dima
03:35 PM
Will check it in an hour
Dima
04:00 PM
Still the same 🙃
Jul 28, 2023 (2 months ago)
Kishore Nallan
12:47 AM
Hmm, is there some way for you to set up a parallel environment where you can send censored values as input into the cluster? This way we can reproduce the issue, but on dummy data.

Just saw your DM.
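
One possible way to produce censored values is to replace every word with a deterministic hash, so that identical words still map to identical tokens and word counts are preserved. This is only a sketch: the salt, the whitespace split and the lowercasing are assumptions that approximate, but do not replicate, the engine's own tokenization, and prefix or typo matching will not carry over. Only the field name searchable comes from the thread.

```python
import hashlib


def censor_value(value, salt):
    """Replace each whitespace-separated word with a salted hash token."""
    def censor_token(token):
        digest = hashlib.sha256((salt + token.lower()).encode("utf-8")).hexdigest()
        # Prefix with a letter so every token starts alphabetically.
        return "t" + digest[:10]
    return " ".join(censor_token(token) for token in value.split())


def censor_document(doc, salt, field="searchable"):
    """Return a copy of doc with every string in `field` censored."""
    censored = dict(doc)
    censored[field] = [censor_value(value, salt) for value in doc[field]]
    return censored
```

Search queries would need to pass through censor_value with the same salt so that they still match the censored documents.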