I'm having trouble with <query_by_weights>. I hav...
# community-help
s
I'm having trouble with query_by_weights. I have 3 code fields (barcode, SKU, manufacturer_number) and I want to search all 3 equally. But even if I use query_by_weights with equal values, I only get results for exact matches on the earlier fields. Product 1: • SKU: 12345 • MFR#: [some string] • Barcode: [some string] Product 2: • SKU: [some string] • MFR#: 12345 • Barcode: [some string] In that example, if I search for "12345" and have "query_by=sku, mfr#, barcode" and "query_by_weights=100,100,100", I still get back just 1 hit, for Product 1. If I make the weights "100,120,100", then I get 1 hit, for Product 2. Based on the documentation, I expected that equal weights would return both items, since they're both an exact match. Is there some other nuance I need to account for, or is this a bug?
a
Hi @Scott Nei! What Typesense version are you using?
w
I work with Scott. Here is your answer
1
a
@Willie and @Scott Nei, I'm trying to replicate the issue you are having. The following code worked as expected in the version given. Could you please review it and confirm whether it matches your exact situation? Also, can you give me your cluster name? I will check the schema configuration to see if some other field or setting could be interfering. Lastly, if you'd like, you can give me a cURL that reproduces the issue so we can look at it closely.
Sem título.sh
s
@Alan Martini Here are the cURLs, sanitized for security: • cURL 1 returns 4 items that matched on a value from the SKU array. • cURL 2 returns 9 items that matched on a value from the mfrNumber array. • They both include query_by_weights set to equal values, and only the order of the fields in query_by is different. • I can make cURL 1 return the same 9 documents if I change the weights to 100,100,120,100,100 and emphasize the mfrNumber over the sku. cURL 1:
curl --location '<https://yuke>...-1.a1.typesense.net/collections/products_prod/documents/search?q=42750&query_by=barcodes%2Cskus%2Cmanufacturer.mfrNumbers%2Cdescription%2CsupplierDescriptions&query_by_weights=100%2C100%2C100%2C100%2C100' \
--header 'accept: application/json, text/plain, */*' \
--header 'accept-language: en-US,en;q=0.9' \
--header 'origin: <https://app.abc.com>' \
--header 'priority: u=1, i' \
--header 'referer: <https://app.abc.com/productSearch=42750>' \
--header 'sec-ch-ua: "Google Chrome";v="135", "Not-A.Brand";v="8", "Chromium";v="135"' \
--header 'sec-ch-ua-mobile: ?0' \
--header 'sec-ch-ua-platform: "Windows"' \
--header 'sec-fetch-dest: empty' \
--header 'sec-fetch-mode: cors' \
--header 'sec-fetch-site: cross-site' \
--header 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36' \
--header 'x-typesense-api-key: abc'
cURL 2:
curl --location '<https://yuke>...-1.a1.typesense.net/collections/products_prod/documents/search?q=42750&query_by=barcodes%2Cmanufacturer.mfrNumbers%2Cskus%2Cdescription%2CsupplierDescriptions&query_by_weights=100%2C100%2C100%2C100%2C100' \
--header 'accept: application/json, text/plain, */*' \
--header 'accept-language: en-US,en;q=0.9' \
--header 'origin: <https://app.abc.com>' \
--header 'priority: u=1, i' \
--header 'referer: <https://app.abc.com/productSearch=42750>' \
--header 'sec-ch-ua: "Google Chrome";v="135", "Not-A.Brand";v="8", "Chromium";v="135"' \
--header 'sec-ch-ua-mobile: ?0' \
--header 'sec-ch-ua-platform: "Windows"' \
--header 'sec-fetch-dest: empty' \
--header 'sec-fetch-mode: cors' \
--header 'sec-fetch-site: cross-site' \
--header 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36' \
--header 'x-typesense-api-key: abc'
a
Thank you @Scott Nei
s
@Alan Martini Could you advise on what you've found today? We are in a Beta launch phase for our production users now, and we plan to fully roll out this new search engine next week. But I don't think we can release with this inconsistency. If there is a discrete bug, we could temporarily get around it by creating a new concatenated field of "codes" so they all get equal weight. Or if there is a parameter we need to adjust, or something we need to do differently to get the behavior we want, we'll adjust that. But as it is, I don't know if there is some broader issue that might stop us from releasing next week.
a
Hi @Scott Nei, It looks like the problem isn’t in query_by_weights itself, but rather in query_by. Changing the order of the items in it seems to change the results. The team is actively investigating the root cause. The concatenated approach you mentioned is a good workaround for now!
s
@Alan Martini My only concern with the concatenated workaround now, is not knowing the root cause. It would resolve this immediate issue, but if the root cause has other side effects we just haven't stumbled on yet, I'm hesitant to add a quick fix for this scenario and launch with the risk of other scenarios revealing themselves. Could you let me know as soon as you have some progress on the root cause, or an expected time frame for it?
Hi @Alan Martini , just checking in again. Is there progress on root cause, to confirm if a simple concatenation fully works around the issue?
a
Hey @Scott Nei, We’ve been trying to reproduce the issue on our test dataset for several hours but haven’t had any luck so far. It seems like it might be something specific to your data, as we haven’t seen similar reports from other users. To get to the bottom of this, we’re now cloning your cluster into a debug environment on our side with enhanced tracing enabled. This will help us narrow down the root cause. Timing-wise, we probably won't be able to get to an actual fix until later next week.
👍 1
Hi @Scott Nei, We’ve identified a better fix for the issue you’re running into. Setting the max_candidates parameter to 100 should yield the expected results. Here’s more info on the max_candidates parameter. It looks like the order of the query_by values is affecting the default value (4) of max_candidates in some way, which we're looking into separately.
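For anyone following along, here is a minimal sketch of how cURL 1 from earlier in the thread might look with max_candidates set to 100. The host and API key are placeholders (the real cluster name is elided above), print_search_command is just a helper name made up for this sketch, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only. TYPESENSE_HOST and TYPESENSE_API_KEY are placeholders,
# not values from the thread (the real cluster name is elided there).
TYPESENSE_HOST="YOUR-CLUSTER.a1.typesense.net"
TYPESENSE_API_KEY="YOUR_SEARCH_ONLY_KEY"

# Same fields and equal weights as cURL 1, plus max_candidates.
QUERY_BY="barcodes,skus,manufacturer.mfrNumbers,description,supplierDescriptions"
QUERY_BY_WEIGHTS="100,100,100,100,100"
MAX_CANDIDATES=100

# Hypothetical helper: prints the curl command instead of running it,
# so it can be reviewed first. With -G, curl sends the --data-urlencode
# pairs as URL-encoded query parameters on a GET request.
print_search_command() {
  cat <<EOF
curl -G 'https://${TYPESENSE_HOST}/collections/products_prod/documents/search' \\
  --header 'x-typesense-api-key: ${TYPESENSE_API_KEY}' \\
  --data-urlencode 'q=42750' \\
  --data-urlencode 'query_by=${QUERY_BY}' \\
  --data-urlencode 'query_by_weights=${QUERY_BY_WEIGHTS}' \\
  --data-urlencode 'max_candidates=${MAX_CANDIDATES}'
EOF
}

print_search_command
```

Swapping the order of the fields in QUERY_BY and comparing the hit counts of the two requests is one way to check whether the field order still changes the results.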
s
@Alan Martini did that technique work for you? I tried it last night in our app by including it in our preset, and there was no difference. I’ll try again with the simplified cURL I shared above.
a
Hey @Scott Nei, It did! I will share the test we ran with you, one moment
Output:
--------------------------------
Querying '42750' skus with max_candidates
Amount of hits: 77
--------------------------------
Querying '42750' description with max_candidates
Amount of hits: 15
--------------------------------
Querying '42750' skus,description with max_candidates
Amount of hits: 92
--------------------------------
Querying '42750' description,skus with max_candidates
Amount of hits: 92
If you can share the code snippet your app is using, I can help debug it with you.
s
I tried again this morning and it is working with max_candidates. Maybe I misspelled something, or hit a cache somewhere. But I think we're unblocked for now. Is there a bug issue for this? Should query_by + query_by_weights normally not need this max_candidates parameter to do what we want?
a
Hi @Scott Nei, After a deeper look, it turns out that what you observed is actually the expected behavior in Typesense. By default, max_candidates is set to 4, and this value applies across all fields listed in query_by. That means once 4 candidates are found in total, the search stops, and priority is given based on the order of fields in query_by. So when you set the weights to 100,120,100, you were explicitly boosting the second field, which caused Typesense to prioritize it, leading to the results you saw. Given the structure of your dataset, adjusting max_candidates is the right move. The way it's currently working aligns with how the system is designed.
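To make the explanation above concrete, here is a toy model of a candidate budget shared across fields. It only mimics the behavior as described in this message and is not Typesense's actual implementation. The term lists and the function names (candidates_for, pick_candidates) are invented for illustration.

```shell
#!/bin/sh
# Toy model only: mirrors the description in the thread, not Typesense internals.
# The indexed terms below are made up for illustration.

# candidates_for FIELD prints the invented terms that match "42750" in that field.
candidates_for() {
  case "$1" in
    skus)       echo "42750 427501 427502 42750A" ;;
    mfrNumbers) echo "42750" ;;
    *)          echo "" ;;
  esac
}

# pick_candidates BUDGET FIELD... walks the fields in query_by order and
# stops collecting once BUDGET candidates have been taken in total.
pick_candidates() {
  budget=$1
  shift
  picked=""
  for field in "$@"; do
    for term in $(candidates_for "$field"); do
      [ "$budget" -gt 0 ] || break 2
      picked="${picked}${picked:+ }${field}:${term}"
      budget=$((budget - 1))
    done
  done
  echo "$picked"
}

pick_candidates 4 skus mfrNumbers     # budget is used up before mfrNumbers is reached
pick_candidates 4 mfrNumbers skus     # mfrNumbers goes first, so its match is kept
pick_candidates 100 skus mfrNumbers   # a larger budget covers both fields
```

In this model, a budget of 4 with skus listed first never reaches mfrNumbers, which is the same shape as the order-dependent results reported earlier in the thread. Raising the budget to 100 picks up the matches from both fields regardless of order.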
👍 1
thankyou 1