Ashutosh Thakur
09/03/2025, 10:45 AM
city (e.g. Bangalore vs Bengaluru).
◦ Our current plan is to normalize city into a canonical field (e.g. city_norm) and also add synonyms for query tolerance.
◦ This way, grouping/faceting happens on city_norm and search queries still match across variants.
◦ → Is this the right approach, or does Typesense provide any built-in support for grouping on synonym sets?
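A minimal sketch of that normalize-plus-synonyms plan, assuming the typesense-python client (the collection name `suppliers`, the field `city_norm`, and the small canonical-city table are all illustrative assumptions):

```python
# Sketch: normalize city names client-side before indexing, and register a
# synonym set so queries still match across variants. All names here are
# illustrative, not from the original thread.
CITY_CANONICAL = {
    "bangalore": "Bengaluru",
    "bengaluru": "Bengaluru",
    "bombay": "Mumbai",
    "mumbai": "Mumbai",
}

def normalize_city(raw: str) -> str:
    """Map a raw city string to its canonical form (fallback: title case)."""
    return CITY_CANONICAL.get(raw.strip().lower(), raw.strip().title())

def setup_synonyms(client):
    # Register a multi-way synonym so "bangalore" queries match "Bengaluru" docs.
    client.collections["suppliers"].synonyms.upsert(
        "bengaluru-variants", {"synonyms": ["bangalore", "bengaluru"]}
    )

def grouped_search(client, q):
    # Group on the canonical field; the synonym set keeps matching tolerant.
    return client.collections["suppliers"].documents.search(
        {"q": q, "query_by": "city,name", "group_by": "city_norm"}
    )
```

With this split, grouping/faceting only ever sees canonical values, while query tolerance is handled entirely by the synonym set.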
2. Sub-grouping (nested groups)
◦ We would like to do hierarchical grouping in one query (e.g. group by city → then subgroup by supplier_id).
◦ From what we read, Typesense only supports single-level grouping.
◦ → Is there any way to achieve nested grouping in one query, or is the only option to run multiple queries or use composite keys (like city|supplier)?
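The composite-key option can be sketched like this (the field and separator are assumptions; `group_key` does come back as a list in Typesense grouped responses):

```python
# Sketch: emulate two-level grouping with a composite key stored on each
# document at index time, then rebuild the hierarchy client-side.
SEP = "|"

def composite_key(city: str, supplier_id: str) -> str:
    # e.g. composite_key("Bengaluru", "S42") -> "Bengaluru|S42"
    return f"{city}{SEP}{supplier_id}"

def split_groups(grouped_hits):
    """Rebuild the city -> supplier_id hierarchy from flat composite groups."""
    tree = {}
    for group in grouped_hits:
        city, supplier = group["group_key"][0].split(SEP, 1)
        tree.setdefault(city, {})[supplier] = group["hits"]
    return tree
```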
3. Sorting groups by another metric
◦ With group_by, we understand groups can be sorted by _group_found (size of the group).
◦ What we want is to sort the groups by an aggregated metric other than the group key. For example:
▪︎ Sum/avg of a numeric field inside the group
▪︎ A pre-computed ranking field stored on the documents
◦ Similarly, we saw facets can be sorted by sibling fields in newer versions, but stats (min/max/sum/avg) only apply to the same field being faceted.
◦ → Can groups or facets be sorted by an aggregation of a different field? Or is the recommended pattern to maintain a roll-up/summary collection externally?
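As a stopgap, the groups on a fetched page can be re-sorted client-side by an aggregate of another field. A sketch (the response shape follows Typesense's `grouped_hits`; `price` is a made-up field):

```python
# Sketch: order groups by the sum of a numeric field inside each group,
# computed client-side from a grouped search response.
def sort_groups_by_sum(grouped_hits, field):
    def group_sum(group):
        return sum(h["document"].get(field, 0) for h in group["hits"])
    return sorted(grouped_hits, key=group_sum, reverse=True)
```

Note this only orders the groups that came back in the current page; for a globally correct ordering across all groups, an externally maintained roll-up/summary collection (as mentioned above) is the more robust pattern.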
We’d love to know if our understanding is correct and if there are better approaches with Typesense that we might have missed.
Thanks a lot!
cc. @Sahil Rally @Atishay Jain
Charlie Francis
09/03/2025, 9:38 PM
Patrick Gray
09/03/2025, 10:22 PM
/health, /debug and /status under heavy load
Harsh Barsaiyan
09/04/2025, 10:48 AM
gemini-embedding-001 model for auto embeddings? I am on v28 and tried with the OpenAI-compatible API config
"model_config": {
    "model_name": "openai/gemini-embedding-001",
    "api_key": "api_key",
    "url": "https://generativelanguage.googleapis.com/v1beta/openai/embeddings"
}
but this is throwing typesense.exceptions.RequestMalformed: [Errno 400] OpenAI API error: when I am trying to create the collection.
Harsh Barsaiyan
09/04/2025, 12:59 PM
/v1beta/openai instead of /v1.
cc: @Kishore Nallan @Fanis Tharropoulos
Wahid Bawa
09/04/2025, 1:14 PM
Nikola Stojisavljević
09/04/2025, 2:15 PM
Andrew Powell
09/04/2025, 2:43 PM
{
  "status": "error",
  "error": {
    "name": "t",
    "httpBody": {
      "created_by": "fb2de4fa",
      "description": null,
      "height": 297,
      "id": "f8fd6a81",
      "name": "Screenshot 2025-07-09 at 10.51.07 AM",
      "organization_id": "9d21c4b1",
      "type": "IMAGE",
      "updated_at": "2025-07-11 13:57:14.179059+00",
      "width": 221
    },
    "httpStatus": 400
  }
}
Todd Tarsi
09/04/2025, 4:15 PM
{
facet_counts: [],
found: 4390173,
hits: [],
out_of: 17446,
page: 1,
request_params: { collection_name: 'calls', first_q: '*', per_page: 0, q: '*' },
search_cutoff: false,
search_time_ms: 141
}
The mystery here is: how did we find 4 million out of 17 thousand?
Daniel Martel
09/04/2025, 7:33 PM
Ryan Bubinski
09/04/2025, 8:24 PM
Jeremiah Ajayi
09/05/2025, 1:35 PM
num-documents-parallel-load=500000
db-write-buffer-size=536870912
db-max-write-buffer-number=10
max-indexing-concurrency=24
db-max-log-file-size=536870912
snapshot-interval-seconds=3600
db-compaction-interval=86400
Import details:
• Endpoint: /documents/import?action=upsert (JSONL)
• Client flushes every 2 seconds, batch size ≈ 1000 (sometimes as low as 100, sometimes a few thousand)
• Example log:
[chats-indexing] Indexed 1117 messages in 101689.79ms
Observed behavior:
• `mpstat`: CPUs ~99% idle during import
• `iostat`: disk nearly idle, very low util% and low latency
• Performance was slow even with default server configs
• Adjusting db-max-write-buffer-number, db-write-buffer-size, max-indexing-concurrency didn’t change throughput
Questions:
• Am I misconfiguring server-side parameters, or is the main bottleneck the way I’m feeding data (small batches, single worker)?
• What’s the recommended import pattern for a cluster this size (batch size, concurrency, flush strategy) to saturate available cores and memory?
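For reference, a sketch of a batched, parallel feed (collection name and sizes are illustrative; `import_` is the typesense-python bulk-import call, and actual sweet spots for batch size and worker count need to be measured against the cluster):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def batches(docs, size=5000):
    """Yield JSONL payloads of up to `size` docs; larger batches amortize request overhead."""
    for i in range(0, len(docs), size):
        yield "\n".join(json.dumps(d) for d in docs[i:i + size])

def parallel_import(client, docs, workers=8):
    # Several concurrent import requests help keep indexing threads busy;
    # the collection name "chats" mirrors the log above and is an assumption.
    coll = client.collections["chats"].documents
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: coll.import_(b, {"action": "upsert"}),
                             batches(docs)))
```

The general idea: ~1000-doc flushes every 2 seconds from a single worker leave most of the cluster idle, so fewer, larger, concurrent requests are the usual first lever to try.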
TLDR:
Despite big hardware (3×32 cores, 128 GB RAM), indexing is crawling. The system looks under-utilized: CPU, disk, and memory are all idle.
Gauthier PLM
09/05/2025, 2:33 PM
{
"name": "companies-search-preset",
"value": {
"collection": "companies",
"nl_model_id": "gemini-flash",
"nl_query": true,
"query_by": "name,thematics.label,segments.label"
}
}
Query that works:
curl --location 'https://my-cluster.typesense.net/collections/companies/documents/search?q=workday&preset=companies-search-preset&nl_query=true&nl_model_id=gemini-flash'
Having these settings supported in presets would make it much easier to enable/disable/tweak them without having to push a dedicated release.
Jonathan Zylberberg
09/05/2025, 3:37 PM
Mike Karikas
09/06/2025, 6:47 AM
Jesper Møjbæk
09/08/2025, 7:41 AM--enable-search-analytics
is default false
, but I can't set it to true in the configuration page?Gauthier PLM
09/09/2025, 8:01 AM
Michael Keegan
09/09/2025, 9:57 AM
Stephane Demotte
09/09/2025, 2:15 PM
http://localhost:5200/en/construction?project%5BrefinementList%5D%5BfilterRegion%5D%5B0%5D=Montreal&project%5BrefinementList%5D%5BfilterConstructionStatus%5D%5B0%5D=Complete&project%5Bpage%5D=2
//
URLSearchParams {
"project[refinementList][filterRegion][0]": "Greater Toronto Area / Golden Horseshoe",
"project[refinementList][filterRegion][1]": "Montreal",
"project[refinementList][filterConstructionStatus][0]": "Complete",
"project[page]": "2",
}
How can I easily make a search with the current searchParams without re-creating an instantsearch client (with all the widgets)?
Thank you for any ideas!
Hung-wei Chuang
09/09/2025, 3:49 PM
intfloat/e5-base, they recommend appending the query: prefix in front of the search query for best results. Does Typesense autoembed automatically do this, or do we have to prefix the query ourselves before sending to Typesense?
Denny Vuong
09/10/2025, 8:02 AM
Georgi Nachev
09/10/2025, 9:35 AM
public function typesenseSearchParameters()
{
$itemCollection = (new Product())->searchableAs();
return [
'group_by' => 'item_id',
'group_limit' => 1,
'include_fields' => '$' . $itemCollection . '(*)',
'filter_by' => '$' . $itemCollection . '(*)',
'query_by' => "property_1,property_2,property_3,sku,barcode,\${$itemCollection}(name),\${$itemCollection}(description)"
];
}
but it returns the error message: Query by reference is not yet supported. Is there any way to search with a join on a nested collection?
Vikas Chawla
09/11/2025, 11:04 AM
Ivan Wolf
09/11/2025, 11:16 AM
{
"name": "search_queries",
"fields": [
{
"name": "q",
"type": "string"
},
{
"name": "filter_by",
"type": "string"
},
{
"name": "count",
"type": "int32"
}
]
}
And then created a rule:
{
"rules": [
{
"name": "popular_queries",
"params": {
"source": {
"collections": [
"inquiries",
"orders"
]
},
"destination": {
"collection": "search_queries"
},
"expand_query": false,
"limit": 1000
},
"type": "popular_queries"
}
]
}
No documents are added to the collection search_queries.
analytics-flush-interval: 300 is set.
Any help would be greatly appreciated.
Urvis
09/11/2025, 12:50 PM
Lukas Matejka
09/11/2025, 1:09 PM
Alan Buxton
09/11/2025, 2:33 PM
E20250911 07:31:34.140895 1174056 raft_server.cpp:783] 622 queued writes > healthy write lag of 500
I20250911 07:31:37.149612 1174056 raft_server.cpp:692] Term: 15, pending_queue: 0, last_index: 143407, committed: 143407, known_applied: 143407, applying: 0, pending_writes: 0, queued_writes: 622, local_sequence: 48889904
I20250911 07:31:37.149675 1174153 raft_server.h:60] Peer refresh succeeded!
E20250911 07:31:43.174098 1174056 raft_server.cpp:783] 622 queued writes > healthy write lag of 500
I20250911 07:31:47.192770 1174056 raft_server.cpp:692] Term: 15, pending_queue: 0, last_index: 143407, committed: 143407, known_applied: 143407, applying: 0, pending_writes: 0, queued_writes: 622, local_sequence: 48889904
I20250911 07:31:47.192821 1174143 raft_server.h:60] Peer refresh succeeded!
The 622 is not going down. And if I now try to post any more updates to typesense (even with a much smaller batch size than before), I get a typesense.exceptions.ServiceUnavailable: [Errno 503] Not Ready or Lagging
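A generic client-side mitigation while the queue drains is to back off and retry on 503 instead of continuing to post. A sketch (not a Typesense feature; `send_batch` is a hypothetical wrapper around the import call):

```python
import time

def send_with_backoff(send_batch, payload, tries=6, base=1.0):
    """Retry a write while the node returns 503 ("Not Ready or Lagging").

    Exponential backoff gives the queued writes time to drain before the
    next attempt; the last failure is re-raised.
    """
    for attempt in range(tries):
        try:
            return send_batch(payload)
        except Exception:  # e.g. typesense.exceptions.ServiceUnavailable
            if attempt == tries - 1:
                raise
            time.sleep(base * (2 ** attempt))
```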
Any guidance on what to do in this situation?
Paul Wallnöfer
09/12/2025, 8:47 AM
770MB and the memory usage of the server is about 5.5GB. What am I missing here? The collection schema is appended at the end.
Now onto the paginated query. I have a paginated, filterable table in my application, and if I just fetch the first page of the table, the query looks like this:
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": "*, $in_use(*), $manufacturers(*), $distributors(*), $product_bans(*)",
  "filter_by": "(id:* || $in_use(id:*) || $tenant_product_distributors(id:*) || $product_bans(id:*))",
  "sort_by": "eid:ASC"
}
This query takes about 1.4s to finish, and I was wondering if I am doing the left joins wrong, because in SQL this takes a few milliseconds.
Now if I am trying to filter by a distributor like so:
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": "*, $in_use(*), $manufacturers(*), $distributors(*), $product_bans(*)",
  "filter_by": "$tenant_product_distributors(distributor_id:=2) && (id:* || $in_use(id:*) || $product_bans(id:*))",
  "sort_by": "eid:ASC"
}
The query now takes only 600ms. How is a query faster with a filter on a joined collection than without any filters at all?
I get that there will be fewer documents to join, but does this really add up to cutting the query time in half?
Thank you in advance.
Here is the collection schema I am using:
[
{
"name": "products",
"fields":
[
{ "name": "eid", "type": "int32", "sort": true },
{ "name": "name", "type": "string", "sort": true },
{ "name": "number", "type": "string", "sort": true },
{
"name": "manufacturer_id",
"type": "string",
"reference": "manufacturers.id"
},
{ "name": "gs1", "type": "string" },
{ "name": "hibc", "type": "string" },
{ "name": "ean8", "type": "string" },
{ "name": "ean13", "type": "string" }
],
"default_sorting_field": "eid"
},
{
"name": "tenant_product_distributors",
"fields":
[
{ "name": "tenant_id", "type": "int32", "index": false },
{
"name": "distributor_id",
"type": "string",
"reference": "distributors.id"
},
{ "name": "product_id", "type": "string", "reference": "products.id" }
]
},
{
"name": "in_use",
"fields":
[
{ "name": "tenant_id", "type": "int32", "index": false },
{ "name": "product_id", "type": "string", "reference": "products.id" },
{ "name": "in_use", "type": "bool", "sort": true }
],
"default_sorting_field": "in_use"
},
{
"name": "product_bans",
"fields":
[
{ "name": "tenant_id", "type": "int32", "index": false },
{ "name": "product_id", "type": "string", "reference": "products.id" },
{ "name": "ban", "type": "bool", "sort": true }
],
"default_sorting_field": "ban"
},
{ "name": "manufacturers", "fields": [{ "name": "name", "type": "string" }] },
{
"name": "manufacturer_prefixes",
"fields":
[
{
"name": "manufacturer_id",
"type": "string",
"reference": "manufacturers.id"
},
{ "name": "prefix", "type": "string" },
{ "name": "prefix_type", "type": "int32" }
]
},
{ "name": "distributors", "fields": [{ "name": "name", "type": "string" }] }
]
Hugo Catarino
09/12/2025, 9:33 AM
gemini.geek
09/12/2025, 10:18 AM