#community-help

Production Typesense Issue with Unexpected Filter Behavior

TLDR: Ankit flagged a problem with filtering on one specific field on their production Typesense server. After several exchanges about facet optimizations and version choices, Kishore Nallan provided newer builds to troubleshoot. The facet_query regression was fixed in 0.26.0.rc12, but filtering within facets is still slow on a 4M-record dataset, and potential edge cases are being investigated.

Aug 02, 2023
Ankit
12:04 PM
Hey, this is a recent issue we faced on our production Typesense server (version 0.26.0.rc1).

The index has 600K records with 30+ facets, and takes up about 4 GB of RAM after a restart/reindex.
There is a field of type string[] with optional, facet and index all set to true. Suddenly, filtering on this one field stopped working as expected. The field only ever holds 3 possible values, and filtering on those 3 values gave inconsistent responses. All other array-type filters worked as expected. The only thing that fixed it was restarting the Typesense service, which led to a reindex.
We also tried to replicate it on a smaller dataset but couldn't. The production server had been live for 2-3 months without a restart.

Have there been other cases like this, where some kind of data/index corruption led to filters not working? Are regular restarts recommended?
Let me know if you need any other information to debug this, as it's been hard to recreate on our end.
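One way to make "inconsistent" concrete is to record the found count when filtering on each of the three values alone and on all three together, then check those numbers against each other. The sketch below is a hypothetical helper, not part of Typesense: it assumes the exact-match filter_by syntax field:=value and field:=[a,b,c] for a string[] field, the field name and values are placeholders, and values with special characters may need escaping, which it does not handle.

```python
from typing import Dict, List


def exact_filter(field: str, values: List[str]) -> str:
    """Build a filter_by clause matching documents that contain any of the values exactly."""
    if len(values) == 1:
        return f"{field}:={values[0]}"
    return f"{field}:=[{','.join(values)}]"


def counts_consistent(per_value_found: Dict[str, int], union_found: int) -> bool:
    """Check the 'found' counts of single-value filters against the combined filter.

    A string[] document can hold several of the values at once, so the combined
    count must be at least the largest single count and at most their sum.
    """
    singles = per_value_found.values()
    return max(singles) <= union_found <= sum(singles)


# Hypothetical field and values; substitute your own and the found counts you observe.
field = "status_tags"
values = ["a", "b", "c"]
print(exact_filter(field, values[:1]))  # status_tags:=a
print(exact_filter(field, values))      # status_tags:=[a,b,c]
```

If the same data gives different counts before and after a restart, or the check fails, that is concrete evidence worth attaching to a report.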
Kishore Nallan
12:16 PM
Is there a reason for using 0.26.0.rc1?
Ankit
12:17 PM
Yeah, Jason recommended that one for its facet optimizations when we first started using it.

Have the facet optimizations made it into any 0.25 RCs?
Kishore Nallan
12:18 PM
Got it. Yes, the 0.26 branch has the facet improvements, but this build is now fairly old. Let me merge some recent fixes from 0.25 and share an updated 0.26 build, which will be better.
Ankit
12:20 PM
Thanks! Let me know and I'll update it on our end.

But have you seen this kind of issue before, or cases where a restart was needed to fix it?
Kishore Nallan
12:21 PM
One issue was reported about an array value going missing when filtering. It has been fixed on 0.25 but not yet on 0.26.
Ankit
12:22 PM
Alright, we will upgrade and keep an eye out. Thanks!

Also, any other updates on facet optimizations for a large number of facets, as in our use case?
Aug 03, 2023
Kishore Nallan
06:10 AM
Please try 0.26.0.rc9, which also contains the facet optimizations.
Ankit
10:40 AM
Thanks!
Will try it out.
Ankit
01:01 PM
Hey, is facet_query working in this version? Seems to be broken for me after upgrading.
Kishore Nallan
01:02 PM
Can you give me a query that reproduces the issue?
Ankit
01:06 PM
curl 'URL' \
  -H 'authority: URL' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en-IN;q=0.9,en;q=0.8' \
  -H 'content-type: text/plain' \
  -H 'origin: ' \
  -H 'referer: ' \
  -H 'sec-ch-ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"full_name","min_len_2typo":8,"split_join_tokens":"always","sort_by":"update_time:asc","per_page":0,"highlight_full_fields":"full_name","collection":"people_v2","q":"*","facet_by":"location_state","max_facet_values":20,"page":1,"facet_query":"location_state:new"}]}' \
  --compressed

Response:
[
    {
        "facet_counts": [
            {
                "counts": [],
                "field_name": "location_state",
                "sampled": false,
                "stats": {
                    "total_values": 0
                }
            }
        ],
        "found": 4517283,
        "hits": [],
        "out_of": 4517283,
        "page": 1,
        "request_params": {
            "collection_name": "people_v2",
            "per_page": 0,
            "q": "*"
        },
        "search_cutoff": false,
        "search_time_ms": 1772
    }
]

This searches for "New" in location_state and should return values.
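Most of the headers in the curl above are browser noise. A stripped-down equivalent, sketched in Python with only the standard library, keeps just the search parameters that matter for the facet_query repro. It assumes the standard POST /multi_search endpoint with the API key in the X-TYPESENSE-API-KEY header; the host and key are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_multi_search(host: str, api_key: str, search: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /multi_search request carrying one search."""
    body = json.dumps({"searches": [search]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/multi_search",
        data=body,
        headers={"Content-Type": "application/json", "X-TYPESENSE-API-KEY": api_key},
        method="POST",
    )


# Only the parameters relevant to the facet_query repro are kept.
search = {
    "collection": "people_v2",
    "q": "*",
    "query_by": "full_name",
    "facet_by": "location_state",
    "facet_query": "location_state:new",
    "max_facet_values": 20,
    "per_page": 0,
}
# Placeholder host and key; urllib.request.urlopen(req) would send the request.
req = build_multi_search("http://localhost:8108", "SEARCH_ONLY_KEY", search)
```

A request this small is easier to share and rules out the browser headers as a factor.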
Kishore Nallan
01:12 PM
Does the same query work without the facet query?
Ankit
01:12 PM
Yeah
Kishore Nallan
01:13 PM
Ok let me look into this and get back to you.
Ankit
01:13 PM
Thanks!
Ankit
02:04 PM
Just FYI, when I downgrade to 0.26.0.rc1, the facet_query works as expected.
Kishore Nallan
02:16 PM
Yes, 0.26.0.rc1 was a very early build, so it probably doesn't contain many of the facet changes.
Ankit
02:19 PM
Got it. Let us know which build we can try that has both the fixes from 0.25 and the facet optimizations.
Kishore Nallan
02:22 PM
Yes, I'll let you know once I've had a chance to troubleshoot.
Ankit
02:23 PM
In the meantime, do you recommend we continue using 0.26.0.rc1 or should we switch to a 0.25 rc version?
Kishore Nallan
02:24 PM
How long have you been using 0.26.0.rc1?
Ankit
02:24 PM
2-3 months. Didn't notice any issues with that.
Ankit
02:24 PM
Except the one from the start of the thread.
Kishore Nallan
02:25 PM
Ah, got it. If you can wait until early next week for a fix, I suggest staying on 0.26.0.rc1.
Kishore Nallan
02:25 PM
Hope to get to it earlier if possible.
Ankit
02:26 PM
Yeah, sounds fine. We'll stick to that for now. Thanks!
Kishore Nallan
02:26 PM
👍 thanks
Aug 04, 2023
Kishore Nallan
01:32 PM
I have a new build, 0.26.0.rc10, which fixes some general faceting issues. Would you be able to try it?
Ankit
01:47 PM
Thanks! Will try it out shortly and report back.
Ankit
03:11 PM
Thanks for the quick turnaround! The facet optimizations look good and fast!
But the facet_query still doesn't work. It's the same behavior as rc9.
Kishore Nallan
03:12 PM
Got it, then it must be a different issue. Is it trivially reproducible? If so, would you be able to share a small dataset and a query that reproduces the issue?
Ankit
03:15 PM
Similar to the curl request and response shared above: I use a searchable RefinementList from InstantSearch on any facet field, and typing anything into the facet search box returns no facet results.
This should be reproducible on any dataset. Do you still need me to share specific data?
Kishore Nallan
03:20 PM
I'm not able to reproduce that on the dataset I usually use for testing facet_query, and we also have unit tests around this. So it must be some edge case that the tests don't cover; I'd appreciate it if you could share it.
Ankit
03:23 PM
Oh, interesting. Okay, happy to share something.
What would be the best way to do that?
We also have a Typesense Cloud cluster. Do you want to upgrade that so I can create the scenario there?
Or I can write out my schema, sample data and the curl request here.
Kishore Nallan
03:25 PM
Sample data + schema + curl is fine. Or, if that data already exists in a Typesense Cloud cluster, I can just load it up on 0.26 if you give me permission.
Ankit
03:43 PM
INDEX_SCHEMA = {
    "enable_nested_fields": True,
    "fields": [{
        "name": "location_state",
        "type": "string",
        "optional": True,
        "index": True,
        "facet": True
    }]
}
// DATA
[
  {
    "location_state": "New Hampshire"
  },
  {
    "location_state": "New Jersey"
  },
  {
    "location_state": "New Mexico"
  },
  {
    "location_state": "New York"
  }
]
//REQUEST
curl 'XXX' \
  -H 'authority: XXX' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en-IN;q=0.9,en;q=0.8' \
  -H 'content-type: text/plain' \
  -H 'origin: XXX' \
  -H 'referer: XXX' \
  -H 'sec-ch-ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"exhaustive_search":true,"query_by":"location_state","highlight_full_fields":"location_state","collection":"facet_state_test","q":"*","facet_by":"location_state","max_facet_values":10,"page":1,"per_page":0,"facet_query":"location_state:New"}]}' \
  --compressed
//RESPONSE
{
    "results": [
        {
            "facet_counts": [
                {
                    "counts": [],
                    "field_name": "location_state",
                    "sampled": false,
                    "stats": {
                        "total_values": 0
                    }
                }
            ],
            "found": 4,
            "hits": [],
            "out_of": 4,
            "page": 1,
            "request_params": {
                "collection_name": "facet_state_test",
                "per_page": 0,
                "q": "*"
            },
            "search_cutoff": false,
            "search_time_ms": 0
        }
    ]
}

Ankit
03:43 PM
Created a new schema locally and tried it out. Hope it helps.
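To turn this repro into a pass/fail check, one can compare the facet counts that come back against what the four documents should produce. The sketch below uses a deliberately simplified matching rule (case-insensitive word-prefix match), which is an assumption for illustration rather than Typesense's actual facet_query algorithm; it only makes the expected answer for this tiny dataset explicit. It reads the documented facet_counts / counts / value / count fields of a multi_search response.

```python
from collections import Counter
from typing import Dict, List


def expected_facet_counts(docs: List[dict], facet_query: str) -> Dict[str, int]:
    """Rough model of what a facet_query like "location_state:New" should return.

    Simplification: a facet value matches if any of its words starts with the
    query text, ignoring case. Real Typesense matching is more involved.
    """
    field, _, text = facet_query.partition(":")
    prefix = text.strip().lower()
    counts: Counter = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None and any(w.lower().startswith(prefix) for w in value.split()):
            counts[value] += 1
    return dict(counts)


def returned_facet_counts(response: dict, field: str) -> Dict[str, int]:
    """Extract {value: count} for one field from the first result of a multi_search response."""
    for facet in response["results"][0]["facet_counts"]:
        if facet["field_name"] == field:
            return {c["value"]: c["count"] for c in facet["counts"]}
    return {}


docs = [
    {"location_state": "New Hampshire"},
    {"location_state": "New Jersey"},
    {"location_state": "New Mexico"},
    {"location_state": "New York"},
]
# Trimmed copy of the response posted above: the counts list came back empty.
response = {"results": [{"facet_counts": [{"field_name": "location_state", "counts": []}]}]}

expected = expected_facet_counts(docs, "location_state:New")
actual = returned_facet_counts(response, "location_state")
print(expected)  # {'New Hampshire': 1, 'New Jersey': 1, 'New Mexico': 1, 'New York': 1}
print(actual)    # {}
```

An empty result where the model expects four values is exactly the mismatch reported here.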
Kishore Nallan
03:52 PM
Thanks, able to reproduce. I will get back to you.
Aug 11, 2023
Ankit
11:42 AM
Hey Kishore Nallan, any update on the 0.26.0 RCs?
Kishore Nallan
11:43 AM
Hi Ankit, yes, I'll share a build by tonight.
Ankit
11:43 AM
Sounds good, thanks! Let me know here and I'll test it out on my end.
Kishore Nallan
12:03 PM
Done: 0.26.0.rc12. Please try this build.
Ankit
01:34 PM
Thanks, trying this out.
Ankit
02:53 PM
Update: the facet_query that wasn't working is working again, and functionally this version looks good. Performance-wise, though, faceting is still slow for us with about 20 facets over 4 million records, which makes filtering within the facets very slow: up to 2 seconds to load results without filters, and up to 30 seconds for a facet_query.
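When reporting numbers like these, it helps to separate the server's own search_time_ms (returned in each search result) from end-to-end wall-clock time, and to take several runs rather than one. A minimal sketch of such a harness; run_search is a hypothetical zero-argument callable you would wire to your own client (for example, one that sends the facet_query request and parses the multi_search response), and nothing here is Typesense-specific beyond reading search_time_ms.

```python
import statistics
import time
from typing import Callable, Dict


def time_search(run_search: Callable[[], dict], repeats: int = 5) -> Dict[str, float]:
    """Run a search several times and summarise client and server latency in ms.

    run_search must return a parsed multi_search response; search_time_ms is
    read from the first search in "results".
    """
    wall_ms, server_ms = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        response = run_search()
        wall_ms.append((time.perf_counter() - start) * 1000)
        server_ms.append(response["results"][0]["search_time_ms"])
    return {
        "wall_ms_median": statistics.median(wall_ms),
        "server_ms_median": statistics.median(server_ms),
        "server_ms_max": max(server_ms),
    }
```

A large gap between wall-clock and server time points at the network or client; a large server_ms_max relative to the median points at individual slow queries.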
Ankit
08:10 PM
There's something strange with this version. We did a full index rebuild and the results without filters load fast, but as soon as any facet_query runs, all CPUs on the server hit 100% and everything is very slow after that.
Aug 12, 2023
Kishore Nallan
12:30 AM
I'll be happy to debug this further if you can share the dataset and query with us. I'm sure there are some hidden data-oriented edge cases that need fine-tuning, especially for filtering within facets. Is pure faceting fine otherwise?
Aug 14, 2023
Kishore Nallan
11:12 AM
Ankit, would you be able to share a sample dataset and a query exhibiting the issue?
Ankit
11:30 AM
Hey, yes, I'll do so later today. I need to check whether the volume of data is causing this or whether a small dataset also triggers it. If it's the volume, I was thinking of giving you access via a temporary admin key on my staging server. If it happens on a smaller dataset, I can share a sample dataset as well.
Kishore Nallan
11:30 AM
Noted, thanks
Ankit
12:46 PM
Hey, so I checked on a small dataset of 1,000 records and the issue doesn't occur there. It only comes up on our server with 4M+ records.
I can share the host and an admin key with you in a DM. What other information would you need to help debug?
Kishore Nallan
12:55 PM
To begin with, just a search-only API key and the query are fine. That should give me a good idea of what's happening. If I need further access, I'll ask.
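For sharing access like this, a key restricted to searching a single collection, with an expiry, limits what the recipient can do with it. A sketch of the key-creation request, assuming the Typesense POST /keys endpoint with its description, actions, collections and expires_at fields; the host, admin key, collection and 7-day TTL are placeholders, and the request is built but not sent.

```python
import json
import time
import urllib.request


def search_only_key_request(host: str, admin_key: str, collection: str,
                            ttl_days: int = 7) -> urllib.request.Request:
    """Build a POST /keys request for a key that can only search one collection."""
    payload = {
        "description": f"Temporary search-only key for {collection}",
        "actions": ["documents:search"],
        "collections": [collection],
        # expires_at is a Unix timestamp in seconds
        "expires_at": int(time.time()) + ttl_days * 24 * 60 * 60,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/keys",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-TYPESENSE-API-KEY": admin_key},
        method="POST",
    )


# Placeholder host and admin key; urllib.request.urlopen(req) would actually create the key.
req = search_only_key_request("http://localhost:8108", "ADMIN_KEY", "people_v2")
```

The key value returned by the server is what gets shared; the admin key itself never leaves your side.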
Ankit
12:59 PM
Okay. DM'ing you details.