I am trying to get documents with exact matches. H...
# community-help
r
I am trying to get documents with exact matches. Here is my code:
self.typesense_client.collections[self.default_collection_name].documents.search({
    'q': '*',
    'filter_by': 'url_without_anchor:=`' + url +'`'
})
Here is the schema:
{
  "name": "url_without_anchor",
  "type": "string",
  "facet": true,
  "index": true,
  "store": true
}
The problem is that when I search for the URL
/en/products/transducers/inertial-sensors/inertial-measurement-units--imu-/3dm-cv5-imu
it returns:
• /en/products/transducers/inertial-sensors/inertial-measurement-units--imu-/3dm-cv5-imu/p-xxx
• /en/products/transducers/inertial-sensors/inertial-measurement-units--imu-/3dm-cv5-imu/p-yyy
etc. But the search works fine for the following URLs:
• /en/products/transducers/force/c10
• /en/products/instruments/sound-vibration-daq/microphone--calibration/9721-b
For those it returns the exact matching document, not /en/products/transducers/force/c10/p-.... Please help me identify the problem, or suggest a better way to query. If you require more information about my setup, let me know. Thanks in advance.
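For readers following along, the expectation in the question can be sketched in plain Python, without a Typesense server. The helper name exact_match_filter is hypothetical, and the three sample values simply reuse the URLs quoted above (with the "p-xxx" and "p-yyy" suffixes as posted). The sketch only contrasts an exact comparison with a prefix comparison; it does not reproduce how Typesense evaluates the filter.

```python
def exact_match_filter(field: str, value: str) -> str:
    """Build a filter_by clause using := with a backtick-quoted value,
    the same shape as the query in the question."""
    return f"{field}:=`{value}`"


# Sample values taken from the question; the suffixes are as posted there.
base = ("/en/products/transducers/inertial-sensors/"
        "inertial-measurement-units--imu-/3dm-cv5-imu")
indexed_urls = [base, base + "/p-xxx", base + "/p-yyy"]

# What an exact match is expected to return: only the identical string.
expected = [u for u in indexed_urls if u == base]

# The reported results look more like a prefix comparison, which also
# lets the longer product URLs through.
prefix_like = [u for u in indexed_urls if u.startswith(base)]

search_parameters = {
    "q": "*",
    "filter_by": exact_match_filter("url_without_anchor", base),
}

print(search_parameters["filter_by"])
print(len(expected), len(prefix_like))
```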
j
Could you try setting
"token_separators": ["/", ".", "-"]
in the collection schema?
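As a rough sketch of where that setting lives: token_separators is a collection-level key, a sibling of the fields list rather than a property of a single field. The collection name "pages" below is a placeholder, not the asker's real collection, and the field definition is copied from the question.

```python
import json

# Hypothetical collection schema; only the field definition comes from the thread.
schema = {
    "name": "pages",  # placeholder collection name (assumption)
    "fields": [
        {
            "name": "url_without_anchor",
            "type": "string",
            "facet": True,
            "index": True,
            "store": True,
        },
    ],
    # Collection-level setting suggested above.
    "token_separators": ["/", ".", "-"],
}

print(json.dumps(schema, indent=2))
```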
r
@Jason Bosco token separators are already added.
j
Could you share a set of curl commands like this that replicates the issue with a minimal collection and a few sample documents?
r
Please find the schema, documents, curl, response in this zip
Please let me know anything you find. I am counting on you.
f
Hey Rohan, The documents file you shared seems to be from a search response. Could you use the export API to export the documents as they are indexed in the collection?
r
hbkworld_documents.jsonl
this contains full collection, nothing changed
f
Could you share how many documents it should be returning?
You need to add ":" to symbols_to_index. That's what's causing the issue.
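A minimal sketch of that suggestion, assuming the schema is held as a Python dict before the collection is created. The helper add_symbols_to_index and the collection name "pages" are hypothetical; symbols_to_index itself is a collection-level setting, like token_separators.

```python
import copy


def add_symbols_to_index(schema: dict, symbols: list) -> dict:
    """Return a copy of the schema with the given symbols merged into
    its collection-level symbols_to_index list (no duplicates)."""
    updated = copy.deepcopy(schema)
    existing = list(updated.get("symbols_to_index", []))
    for symbol in symbols:
        if symbol not in existing:
            existing.append(symbol)
    updated["symbols_to_index"] = existing
    return updated


# Placeholder schema (assumption); the separators are the ones from the thread.
schema = {
    "name": "pages",
    "fields": [{"name": "url_without_anchor", "type": "string", "facet": True}],
    "token_separators": ["/", ".", "-"],
}

patched = add_symbols_to_index(schema, [":"])
print(patched["symbols_to_index"])
```

The original dict is left untouched, so the two versions can be compared or used to create a fresh test collection.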
r
Okay, Let me try this. and I'll let you know.
Still not working 🙁
f
Could you provide a script that indexes the data, creates the collection and searches? It worked on my testing.
r
Can we have a short call?
f
Due to our bandwidth, we can't provide calls to users in the public Slack community. If you need prioritized support, you can sign up for a support plan on Typesense Cloud: https://cloud.typesense.org/support-plans
r
I tried all the solutions you mentioned earlier. It is still not working. Can you suggest a workaround for this?
f
Like Jason mentioned, we'd need a reproducible example like this. On my testing, adding ":" to your collection's symbols_to_index parameter fixed the issue. There may be other factors in your current setup that affect this.
r
Reproduced the issue using the following file. Note: Please check the last two curls
f
Hey Rohan, thank you for this. I've identified the issue as being related to the truncation logic that kicks in when a word token is longer than 100 characters during indexing and filtering. We'll mention you when we have a fix.
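To illustrate the general idea only: this is a toy model, not Typesense's implementation. If tokens are cut at a fixed length before comparison, two different values that share their first 100 characters become indistinguishable. The 100-character limit is taken from the message above; the function name and sample values are made up for the example.

```python
MAX_TOKEN_LEN = 100  # limit quoted in the message above


def truncate_token(token: str, limit: int = MAX_TOKEN_LEN) -> str:
    """Keep only the first `limit` characters of a token (toy model)."""
    return token[:limit]


# Two made-up values that differ only after the 100th character.
shared_prefix = "x" * MAX_TOKEN_LEN
first_value = shared_prefix + "/p-xxx"
second_value = shared_prefix + "/p-yyy"

# The full values differ...
values_differ = first_value != second_value

# ...but after truncation they compare equal, so an "exact" comparison
# on truncated tokens can no longer tell them apart.
collide_after_truncation = truncate_token(first_value) == truncate_token(second_value)

print(values_differ, collide_after_truncation)
```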
r
@Fanis Tharropoulos Thank you so much, this really comes as a relief.
Hello @Fanis Tharropoulos, do you have an estimate of when I can expect the fix to be available? Our client is dying for this 😵‍💫
f
It's this PR: https://github.com/typesense/typesense/pull/2549. You can set notifications by subscribing to the updates of the PR.
👍 1