I can't reproduce it at all with a smaller dataset. Here's what I've done:
I tested the following scenarios on my local machine with both versions:
• A collection where only the fields used in the filter_by are indexed, but the documents still contain all their other fields, even though those are not used. This is a 46 GB JSONL file.
• A collection where only those fields are indexed, and the documents contain only those fields plus id. This is a 2 GB JSONL file.
Both collections have 3M+ documents. The issue only happens with the first one, but there it is fully reproducible, and both collections use roughly the same amount of memory.
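For anyone wanting to build a comparable pair of files, here is a minimal sketch of how a trimmed JSONL like the second one could be derived from the full export. The field names in `KEEP_FIELDS` and the file paths are hypothetical placeholders, not the real schema.

```python
import json

# Hypothetical field names: replace with the fields actually used in filter_by.
KEEP_FIELDS = {"id", "category", "price"}


def trim_document(doc, keep=KEEP_FIELDS):
    """Return a copy of doc containing only the keys listed in keep."""
    return {key: value for key, value in doc.items() if key in keep}


def trim_jsonl(src_path, dst_path, keep=KEEP_FIELDS):
    """Stream src_path line by line and write trimmed documents to dst_path.

    Returns the number of documents written. Streaming keeps memory usage
    flat, which matters for a source file in the tens of gigabytes.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            trimmed = trim_document(json.loads(line), keep)
            dst.write(json.dumps(trimmed) + "\n")
            count += 1
    return count
```

Calling `trim_jsonl("full.jsonl", "trimmed.jsonl")` (both paths are placeholders) would produce the reduced file while leaving the original untouched.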
How can I share this with you?