Manav Kothari
11/05/2024, 1:27 PM{
highlight_fields: 'none',
collection: 'people',
include_fields: 'id,organization_id,linkedin_url,title,seniority,job_start_date,first_name,last_name,name,city,state,country,country_code,country_region,$organizations(id,name,website_url, primary_domain, founded_year, linkedin_url, phone, industry_name) as organizations,$development(stage) AS "stage"',
filter_by: "title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200']) && (id:* || $development(workspace_id : ws30ngrb03luqxcl27))",
page: 2,
per_page: 25,
q: '*'
}
Kishore Nallan
11/05/2024, 1:29 PM['51 to 200'] is not a valid Typesense range filter syntax. It must be [51..200]
Manav Kothari
11/05/2024, 1:30 PM1 to 10, 11 to 100, etc.
Kishore Nallan
11/05/2024, 3:20 PMid:* condition. We have an idea to optimize that. I will get back to you in a few days with a new build where we address that.
Manav Kothari
11/07/2024, 4:59 AM
Harpreet Sangar
11/07/2024, 9:51 AM"(title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200'])) || $development(workspace_id : ws30ngrb03luqxcl27)"
Harpreet Sangar
11/07/2024, 9:58 AM"could you share relevant pr, so I can keep an eye on this update."
This PR does improve the evaluation of the id: * filter when enable_lazy_filter is true, but it won't work with your query since you're doing a wildcard search (q: '*').
The change in filter_by should be more relevant to you.
Manav Kothari
11/07/2024, 10:10 AM"(title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200'])) || $development(workspace_id : ws30ngrb03luqxcl27)"
@Harpreet Sangar the meaning of this filter is different, right? I want to apply the given conditions to the records and also fetch the details from the development table if a matching record exists.
Harpreet Sangar
11/07/2024, 10:34 AMtitle:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200'])
is going to only match 1 document out of 100M, so this filter:
title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200']) && (id:* || $development(workspace_id : ws30ngrb03luqxcl27))
is effectively doing:
1 doc && (100M docs || ...)
which is wasteful.
The docs mention using id: * || $JoinCollectionName(...) for a left join in case you don't have any filter to apply on the collection you're searching.
Manav Kothari
11/07/2024, 10:45 AM
Manav Kothari
11/07/2024, 1:18 PM"(title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200'])) || $development(workspace_id : ws30ngrb03luqxcl27)"
I have tried this filter but had no luck; it actually times out.
Harpreet Sangar
11/07/2024, 2:02 PM
Harpreet Sangar
11/07/2024, 2:04 PMfound count is with:
"(title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200']))"
Manav Kothari
11/07/2024, 2:08 PM99497
This is the number of records found with the above query.
Harpreet Sangar
11/08/2024, 2:46 AMfound count with just
$development(workspace_id : ws30ngrb03luqxcl27)
Kishore Nallan
11/08/2024, 3:04 AM
Manav Kothari
11/08/2024, 3:40 AM$development(workspace_id : ws30ngrb03luqxcl27) is approx. 100
Harpreet Sangar
11/08/2024, 3:44 AM"(title:['Sales Engineer'] && city:['mumbai'] && $organizations(employee_count_range:['51 to 200'])) || $development(workspace_id : ws30ngrb03luqxcl27)"
filter_by times out.
Can you check if sending enable_lazy_filter: true makes any difference?
Manav Kothari
11/08/2024, 3:53 AM
Manav Kothari
11/08/2024, 3:53 AMenable_lazy_filter: true
No, there is no improvement in latency with this.
Harpreet Sangar
11/08/2024, 4:08 AM"any idea why? i have make index: true, infix: true, facet: true in field"
filter_by only requires the field to have index: true. Is it possible for you to share the data so I can analyse it and figure out any possible improvement?
Documents having only the title field from people and the country field from the organizations collection will do.
Manav Kothari
11/08/2024, 4:11 AM
Manav Kothari
11/08/2024, 4:12 AM{
"city": "",
"country": "China",
"country_code": "CN",
"country_region": "APAC",
"headquarters.country_name": "China",
"headquarters.state": "",
"id": "tu_oacxnseb3",
"industry": "Import and Export",
"industry_details.naics_code": "444180;423320",
"industry_details.naics_description": "Other Building Material Dealers;Brick; Stone; and Related Construction Material Merchant Wholesalers",
"industry_details.sic_code": "",
"industry_details.sic_description": "",
"is_linkedin_url_claimed": "true",
"monthly_google_adspend": "0",
"name": "Foshan Hotaqi Bath Ware Co.,ltd",
"primary_domain": "hotaqibath.com",
"specialties": "",
"state": "",
"state_code": "",
"street": "",
"total_reviews": "0",
"valid_email_count": "",
"website_traffic.monthly_organic": "0",
"website_traffic.monthly_paid": "0",
"website_traffic.total_monthly": "0",
"website_url": "http://www.hotaqibath.com",
...other
}
Manav Kothari
11/08/2024, 4:13 AM{
"city": "São Paulo",
"country": "Brazil",
"country_code": "BR",
"country_region": "LATAM",
"email": "",
"email_status": "Not Available",
"first_name": "Luciana",
"id": "tu_pacfr6hnj",
"job_start_date": "2023-07-01",
"last_name": "Franco",
"mobile_number": "",
"name": "Luciana Ferreira Franco",
"organization_id": "tu_oactbh3ln",
"other_mobile_number": "",
"seniority": "Staff",
"state": "",
"state_code": "",
"title": "Coordenadora Técnica Laboratório Central Hsp"
}
Harpreet Sangar
11/08/2024, 4:20 AMcurl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X GET \
"http://localhost:8108/collections/people/documents/export?include_fields=title"
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X GET \
"http://localhost:8108/collections/organizations/documents/export?include_fields=country"
Manav Kothari
11/08/2024, 8:04 AM
Manav Kothari
11/08/2024, 8:10 AM
Kishore Nallan
11/08/2024, 8:11 AM
Manav Kothari
11/11/2024, 5:36 AMtitle:['Manager'] && organizations.country:['United States'] && (id: * || $tuesday_development(workspace_id : ws30ngrb03luqxcl27))
Any suggestions to improve this? I have already tried the following, but it fetches the wrong results:
(title:['Manager'] && organizations.country:['United States']) || ($tuesday_development(workspace_id : ws30ngrb03luqxcl27))
Harpreet Sangar
11/11/2024, 5:39 AM
Manav Kothari
11/11/2024, 5:47 AM[
{
"document": {
"city": "San Diego",
"country": "United States",
"country_code": "US",
"country_region": "NORAM",
"email": "kiana.west@envedabio.com",
"email_status": "Verified",
"first_name": "Kiana",
"id": "tu_padk6um2w",
"job_start_date": "2022-11-01",
"organizations.founded_year": "2019",
"organizations.name": "Enveda Biosciences",
"organizations.phone": "",
"organizations.primary_domain": "envedabio.com",
"organizations.website_url": "https://envedabio.com",
"seniority": "Manager",
"state": "California",
"title": "Product Manager"
},
"highlight": {},
"highlights": []
},
{
"document": {
"city": "Longmont",
"country": "United States",
"country_code": "US",
"country_region": "NORAM",
"email": "",
"email_status": "Not Available",
"first_name": "Jackson",
"id": "tu_pacw23pjg",
"job_start_date": "2022-08-01",
"last_name": "Starkey",
"organizations.name": "Amazon",
"organizations.phone": "",
"organizations.primary_domain": "amazon.com",
"organizations.website_url": "https://www.amazon.com",
"seniority": "Staff",
"state": "Colorado",
"title": "Software Engineer",
"development": {
"stage": "saved"
}
},
"highlight": {},
"highlights": []
}
]
If you look at the second result, its title is "Software Engineer", which is wrong.
Harpreet Sangar
11/11/2024, 5:52 AM
Manav Kothari
11/11/2024, 5:55 AM
Manav Kothari
11/11/2024, 6:13 AM### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
mkdir "$(pwd)"/typesense-data
docker run -p 8108:8108 \
-v"$(pwd)"/typesense-data:/data typesense/typesense:27.1 \
--data-dir /data \
--api-key=$TYPESENSE_API_KEY \
--enable-cors
### Reproduction Steps ###############################################
export TYPESENSE_API_KEY=xyz
curl "http://localhost:8108/debug" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
curl "http://localhost:8108/collections" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"name": "companies",
"fields": [
{"name": "company_name", "type": "string" },
{"name": "num_employees", "type": "int32" },
{"name": "country", "type": "string", "facet": true }
],
"default_sorting_field": "num_employees"
}'
curl "http://localhost:8108/collections/companies/documents/import?action=create" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: text/plain" \
-X POST \
-d '{"id": "124","company_name": "Stark Industries","num_employees": 5215,"country": "USA"}
{"id": "125","company_name": "Acme Corp 1","num_employees": 2133,"country": "CA"}
{"id": "126","company_name": "Acme Corp 2","num_employees": 2133,"country": "USA"}
{"id": "127","company_name": "Acme Corp 3","num_employees": 2133,"country": "INDIA"}
{"id": "128","company_name": "Stark Industries 2","num_employees": 5215,"country": "USA"}
{"id": "129","company_name": "Acme Corp 4","num_employees": 2133,"country": "CA"}
{"id": "120","company_name": "Acme Corp 5","num_employees": 2133,"country": "USA"}
{"id": "122","company_name": "Acme Corp 6","num_employees": 2133,"country": "INDIA"}'
curl "http://localhost:8108/collections" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"name": "development",
"fields": [
{
"index": true,
"name": "workspace_id",
"type": "string"
},
{
"name": "user_id",
"type": "string"
},
{
"name": "company_id",
"reference": "companies.id",
"optional": true,
"index": true,
"type": "string"
},
{
"index": true,
"name": "stage",
"type": "string"
}
]
}'
curl "http://localhost:8108/collections/development/documents/import?action=create" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: text/plain" \
-X POST \
-d '{"workspace_id": "a","user_id": "test","company_id": "124","stage": "saved"}
'
curl "http://localhost:8108/multi_search" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "companies",
"q": "*",
"filter_by":"(country:[`India`] && company_name:[`Acme Corp 6`]) || $development(workspace_id: 'a')"
}
]
}'
### Documentation ######################################################################################
# Visit the API reference section: https://typesense.org/docs/27.1/api/collections.html
# Click on the "Shell" tab under each API resource's docs, to get shell commands for other API endpoints
Manav Kothari
11/11/2024, 6:15 AM(country:[`India`] && company_name:[`Acme Corp 6`]) || $development(workspace_id: 'a')
{
"facet_counts": [],
"found": 2,
"hits": [
{
"document": {
"company_name": "Stark Industries",
"country": "USA",
"development": {
"company_id": "124",
"id": "1",
"stage": "saved",
"user_id": "test",
"workspace_id": "a"
},
"id": "124",
"num_employees": 5215
},
"highlight": {},
"highlights": []
},
{
"document": {
"company_name": "Acme Corp 6",
"country": "INDIA",
"id": "122",
"num_employees": 2133
},
"highlight": {},
"highlights": []
}
],
"out_of": 8,
"page": 1,
"request_params": {
"collection_name": "companies",
"first_q": "*",
"per_page": 10,
"q": "*"
},
"search_cutoff": false,
"search_time_ms": 0
}
Manav Kothari
11/11/2024, 6:15 AM(country:[`India`] && company_name:[`Acme Corp 6`]) && (id:* || $development(workspace_id: 'a'))
{
"facet_counts": [],
"found": 1,
"hits": [
{
"document": {
"company_name": "Acme Corp 6",
"country": "INDIA",
"id": "122",
"num_employees": 2133
},
"highlight": {},
"highlights": []
}
],
"out_of": 8,
"page": 1,
"request_params": {
"collection_name": "companies",
"first_q": "*",
"per_page": 10,
"q": "*"
},
"search_cutoff": false,
"search_time_ms": 0
}
Manav Kothari
11/11/2024, 6:34 AM
Harpreet Sangar
11/11/2024, 9:49 AMcountry:[`India`] && company_name:[`Acme Corp 6`] && $development(workspace_id: 'a')
Now to understand how this is equivalent to the filter_by that produces the correct result:
(country:[`India`] && company_name:[`Acme Corp 6`]) && (id:* || $development(workspace_id: 'a'))
Let's suppose we have 3 documents:
• id: 0 that matches the country:[`India`] && company_name:[`Acme Corp 6`] filter and references workspace_id: 'a'
• id: 1 that matches the country:[`India`] && company_name:[`Acme Corp 6`] filter but references workspace_id: 'b'
• id: 2 that does not match the country:[`India`] && company_name:[`Acme Corp 6`] filter and references workspace_id: 'a'
This is how the filter_by is evaluated:
(0, 1) && ((0, 1, 2) || (0, 2))
That is further evaluated as:
(0, 1) && (0, 1, 2)
That finally returns only the following documents:
0, 1
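The evaluation above can be replayed with plain Python sets. This is only an illustrative sketch: the document ids 0, 1 and 2 come from the example, the variable names are made up here, and nothing in it calls Typesense.

```python
# Document ids taken from the three-document example above.
all_docs = {0, 1, 2}      # what id:* matches (every document)
base_match = {0, 1}       # docs matching the country/company_name filter
workspace_a = {0, 2}      # docs that reference workspace_id: 'a'

# (base) && (id:* || $development(...))
left_join_form = base_match & (all_docs | workspace_a)

# (base) || $development(...)
or_form = base_match | workspace_a

print(sorted(left_join_form))  # [0, 1]
print(sorted(or_form))         # [0, 1, 2]
```

Because the union with id:* is always the full collection, the first form reduces to the base filter's matches, while the || form also pulls in document 2, which never matched the base filter.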
Harpreet Sangar
11/11/2024, 9:52 AM&& (id:* || ...) is wasted evaluation.
Manav Kothari
11/11/2024, 9:55 AMcountry:[`India`] && company_name:[`Acme Corp 6`] && $development(workspace_id: 'a')
Harpreet Sangar
11/11/2024, 10:16 AMdevelopment
like:
{
q: '*',
filter_by: 'workspace_id: x',
per_page: 0
}
and then check if the found count is 0 or not. If 0, your query will be
country:[`India`] && company_name:[`Acme Corp 6`]
otherwise:
country:[`India`] && company_name:[`Acme Corp 6`] && $development(workspace_id: 'a')
• If you wish to achieve the result with a single query, you'll have to send:
(country:[`India`] && company_name:[`Acme Corp 6`]) || (country:[`India`] && company_name:[`Acme Corp 6`] && $development(workspace_id: 'a'))
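The two options above could be sketched as small helpers that only build the filter_by string. The function names and the `found` argument are hypothetical; `found` stands for the found count returned by the per_page: 0 search on development, which is not performed here.

```python
def build_two_step_filter(base_filter: str, workspace_id: str, found: int) -> str:
    """Pick the filter_by for the main query from the development found count."""
    if found == 0:
        # No development rows for this workspace: skip the join entirely.
        return base_filter
    return f"{base_filter} && $development(workspace_id: '{workspace_id}')"


def build_single_query_filter(base_filter: str, workspace_id: str) -> str:
    """Single-query form: base filter OR (base filter AND join)."""
    join = f"$development(workspace_id: '{workspace_id}')"
    return f"({base_filter}) || ({base_filter} && {join})"


base = "country:[`India`] && company_name:[`Acme Corp 6`]"
print(build_two_step_filter(base, "a", found=0))
print(build_two_step_filter(base, "a", found=1))
print(build_single_query_filter(base, "a"))
```

The two-step variant costs an extra request but keeps the main filter short; the single-query variant repeats the base filter on both sides of the ||.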
Manav Kothari
11/14/2024, 4:52 AM"title:['Software Engineer'] && job_start_date_epoch :< 1638247099140"
This is on the people collection of 100M records. Any idea why it's happening, or is there any way to optimize this? By the way, this doesn't happen with other filters like:
"title:['Software Engineer'] && country :['india']"
Harpreet Sangar
11/14/2024, 4:57 AMrange_index on your job_start_date_epoch field so a range filter like job_start_date_epoch :< 1638247099140 can evaluate faster.
Harpreet Sangar
11/14/2024, 4:57 AM
Manav Kothari
11/14/2024, 5:02 AM
Harpreet Sangar
11/15/2024, 1:06 PM"title:['Software Engineer'] && job_start_date_epoch :< 1638247099140"
now with range_index?
Manav Kothari
11/15/2024, 1:24 PMrange_index
I need to re-migrate the data. Since we have gone to production, we are doing this carefully. I will update you on this. Thanks.
Harpreet Sangar
11/15/2024, 1:29 PMrange_index will increase the RAM usage.
Manav Kothari
11/15/2024, 1:30 PM
Harpreet Sangar
11/15/2024, 1:39 PM
Manav Kothari
11/20/2024, 6:56 AM
Harpreet Sangar
11/20/2024, 7:02 AM
Manav Kothari
11/27/2024, 5:25 AMtitle:['Software Engineer','Assistant Manager'] && $development_v2(workspace_id : '1' && stage:!['saved','blocked'])
Harpreet Sangar
11/27/2024, 5:35 AM"i want filter to still work if workspace_id: 'x' doesn't exist in development table."
Is this still a requirement?
Manav Kothari
11/27/2024, 5:37 AM
Harpreet Sangar
11/27/2024, 5:40 AM
Manav Kothari
11/27/2024, 5:42 AM$development_v2(workspace_id : '1' && stage:['saved','blocked'])
Harpreet Sangar
11/27/2024, 5:54 AMtitle:['Software Engineer','Assistant Manager'] && $development_v2(workspace_id :!= '1' || stage:!=['saved','blocked'])
following De Morgan's laws.
Manav Kothari
11/27/2024, 9:43 AM
Harpreet Sangar
11/27/2024, 10:34 AM
Manav Kothari
11/27/2024, 11:14 AM
Harpreet Sangar
11/27/2024, 11:17 AM
Manav Kothari
11/28/2024, 4:47 AM
Harpreet Sangar
11/28/2024, 4:58 AMproduct_c in the CustomerProductPrices collection. Is the requirement to get product_c in the result?
Manav Kothari
11/28/2024, 5:02 AM
Harpreet Sangar
11/28/2024, 5:06 AM
Harpreet Sangar
11/28/2024, 5:07 AM
Manav Kothari
11/28/2024, 5:08 AM
Manav Kothari
11/28/2024, 5:17 AM
Harpreet Sangar
11/28/2024, 5:20 AMfilter_by: $development_v2(workspace_id : '1' && stage:['saved','blocked'])
and then send another query like:
filter_by: id:!=[...]
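A minimal sketch of the second step, assuming the ids from the first query's hits have already been collected into a list. The helper name is hypothetical and the sample ids are borrowed from the results shared earlier in the thread purely for illustration.

```python
from typing import Iterable


def build_exclusion_filter(excluded_ids: Iterable[str]) -> str:
    """Build an id:!=[...] clause from ids returned by the first query."""
    ids = list(excluded_ids)
    if not ids:
        # Nothing to exclude: the second query can be sent without this clause.
        return ""
    return "id:!=[" + ",".join(ids) + "]"


# Ids as they might come back from the $development_v2(...) query (illustrative).
first_query_ids = ["tu_padk6um2w", "tu_pacw23pjg"]
print(build_exclusion_filter(first_query_ids))
```

If the first query matches more documents than one page holds, its hits would need to be paged through before the exclusion list is complete.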
Harpreet Sangar
11/28/2024, 5:25 AM
Manav Kothari
11/28/2024, 6:05 AM