Hello! I have an issue with the new `enable_typos_...
# community-help
j
Hello! I have an issue with the new `enable_typos_for_alphanumeric_tokens` parameter. I search for '_Something (e290)_' and get a hit on Propane refrigerant (R290), which I want to avoid. For alphanumeric identifiers such as R290 I want to disable typo correction, but I cannot get the setting to work. The only way I can get this query not to match is to turn up the `min_len_1typo` parameter.
Some debug info: I have hybrid search configured, but turned it off with parameters for this example. Version 27.1.
options
{'q': 'Something  (e290)', 'query_by': 'display_name,embedding_openai_3', 'prefix': 'false,false', 'filter_by': 'version_id:=24.0.0 && path_normid:=01010000000000 && hidden:=false', 'include_fields': 'normid,display_name', 'vector_query': 'embedding_openai_3:([], distance_threshold:0.0, alpha:0.0)', 'num_typos': 1, 'enable_typos_for_numerical_tokens': 'false', 'enable_typos_for_alpha_numerical_tokens': 'false', 'min_len_1typo': 2, 'min_len_2typo': 7, 'limit': 5, 'prioritize_token_position': 'true'}

response
'{"facet_counts": [], "found": 1, "hits": [{"document": {"display_name": "Propane refrigerant (R290)", "normid": "01010305000000"}, "highlight": {"display_name": {"matched_tokens": ["R290"], "snippet": "Propane refrigerant (<mark>R290</mark>)"}}, "highlights": [{"field": "display_name", "matched_tokens": ["R290"], "snippet": "Propane refrigerant (<mark>R290</mark>)"}], "hybrid_search_info": {"rank_fusion_score": 1.0}, "text_match": 578730054646227065, "text_match_info": {"best_field_score": "1108057784572", "best_field_weight": 15, "fields_matched": 1, "num_tokens_dropped": 1, "score": "578730054646227065", "tokens_matched": 1, "typo_prefix_score": 2}}], "out_of": 38225, "page": 1, "request_params": {"collection_name": "taxonomy:2024-11-06_09-55-35", "first_q": "", "per_page": 5, "q": "Something  (e290)"}, "search_cutoff": false, "search_time_ms": 841}'
j
Could you help reproduce this issue on a minimal dataset using a set of curl commands like this: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
j
### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
    
mkdir "$(pwd)"/typesense-data

docker run -p 8108:8108 \
            -v"$(pwd)"/typesense-data:/data typesense/typesense:27.1 \
            --data-dir /data \
            --api-key=$TYPESENSE_API_KEY \
            --enable-cors

### Reproduction Steps ###############################################
export TYPESENSE_API_KEY=xyz

curl "http://localhost:8108/debug" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"


curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "companies",
         "fields": [
           {"name": "company_name", "type": "string" },
           {"name": "num_employees", "type": "int32" },
           {"name": "country", "type": "string", "facet": true }
         ],
         "default_sorting_field": "num_employees"
       }'
       
curl "http://localhost:8108/collections/companies/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"id": "124","company_name": "Stark Industries (a123)","num_employees": 5215,"country": "USA"}
            {"id": "125","company_name": "Acme Corp","num_employees": 2133,"country": "CA"}'
            
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "companies",
              "q": "something b123",
              "query_by": "company_name",
              "enable_typos_for_alphanumeric_tokens": false

            }
          ]
        }'

### Documentation ######################################################################################
# Visit the API reference section: https://typesense.org/docs/27.1/api/collections.html
# Click on the "Shell" tab under each API resource's docs, to get shell commands for other API endpoints
I could not confirm whether or not this reproduces the issue. I do not see this info when running the code.
j
I do not see this info
May I know which "info" you're referring to?
j
Yes, I was running the code, but I could not tell if the query was returning anything.
OK, running this more carefully, I can confirm that the query returns the document, which reproduces the issue:
{"results":[{"facet_counts":[],"found":1,"hits":[{"document":{"company_name":"Stark Industries (a123)","country":"USA","id":"124","num_employees":5215},"highlight":{"company_name":{"matched_tokens":["a123"],"snippet":"Stark Industries (<mark>a123</mark>)"}},"highlights":[{"field":"company_name","matched_tokens":["a123"],"snippet":"Stark Industries (<mark>a123</mark>)"}],"text_match":578730054645710969,"text_match_info":{"best_field_score":"1108057784320","best_field_weight":15,"fields_matched":1,"num_tokens_dropped":1,"score":"578730054645710969","tokens_matched":1,"typo_prefix_score":2}}],"out_of":2,"page":1,"request_params":{"collection_name":"companies","first_q":"something b123","per_page":10,"q":"something b123"},"search_cutoff":false,"search_time_ms":1}]}
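Since it was initially hard to tell whether the query returned anything, one way to make that visible is to print only the `found` count from the response. This is a minimal sketch, assuming a POSIX shell with `grep` and `head` available; `found_count` is a hypothetical helper name and not part of Typesense.

```shell
# Hypothetical helper: reads a Typesense JSON response on stdin and
# prints the first "found" count it contains (nothing if there is none).
found_count() {
  grep -o '"found": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}
```

To use it, pipe the output of the `multi_search` curl command (run with `-s` to suppress the progress meter) into `found_count`.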
@Jason Bosco did you have the chance to look at this? As far as I can tell, it is a bug.
k
I will be looking into this and getting back to you.
There is a typo in the flag name you've used. The correct name of the flag is:
enable_typos_for_alpha_numerical_tokens
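For reference, the search step of the reproduction script with the flag spelled as given above might look like the sketch below. It assumes the same local server, `companies` collection, and `TYPESENSE_API_KEY` as the script earlier in the thread; `SEARCH_PAYLOAD` and `run_search` are hypothetical names introduced only for illustration. The thread does not confirm what this request returns, so treat it as something to try rather than a verified fix.

```shell
# Request body from the reproduction script, with only the flag name changed
# to the spelling given in the reply above.
SEARCH_PAYLOAD='{
  "searches": [
    {
      "collection": "companies",
      "q": "something b123",
      "query_by": "company_name",
      "enable_typos_for_alpha_numerical_tokens": false
    }
  ]
}'

# Hypothetical helper: posts the payload to the local Typesense server
# started by the reproduction script (assumed to listen on port 8108).
run_search() {
  curl -s "http://localhost:8108/multi_search" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "${SEARCH_PAYLOAD}"
}
```

With the server running and the documents imported, call `run_search` and check the `found` value in the response.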