# community-help
• Ashutosh Thakur

    09/03/2025, 10:45 AM
Hi Team, I wanted to confirm if we are using Typesense correctly, and whether some of our use cases are supported natively or if you suggest a better approach.
    1. Grouping by city with synonyms
    ◦ We need to group documents by `city` (e.g. Bangalore vs Bengaluru).
    ◦ Our current plan is to normalize `city` into a canonical field (e.g. `city_norm`) and also add synonyms for query tolerance. This way, grouping/faceting happens on `city_norm` and search queries still match across variants.
    ◦ → Is this the right approach, or does Typesense provide any built-in support for grouping on synonym sets?
    2. Sub-grouping (nested groups)
    ◦ We would like to do hierarchical grouping in one query (e.g. group by `city`, then subgroup by `supplier_id`). From what we read, Typesense only supports single-level grouping.
    ◦ → Is there any way to achieve nested grouping in one query, or is the only option to run multiple queries or use composite keys (like `city|supplier`)?
    3. Sorting groups by another metric
    ◦ With `group_by`, we understand groups can be sorted by `_group_found` (the size of the group).
    ◦ What we want is to sort the groups by an aggregated metric other than the group key, for example a sum/avg of a numeric field inside the group, or a pre-computed ranking field stored on the documents.
    ◦ Similarly, we saw facets can be sorted by sibling fields in newer versions, but stats (min/max/sum/avg) only apply to the same field being faceted.
    ◦ → Can groups or facets be sorted by an aggregation of a different field? Or is the recommended pattern to maintain a roll-up/summary collection externally?
    We'd love to know if our understanding is correct and if there are better approaches with Typesense that we might have missed. Thanks a lot! cc. @Sahil Rally @Atishay Jain
• Charlie Francis

    09/03/2025, 9:38 PM
Hey, is there a GUI when running Typesense locally?
• Patrick Gray

    09/03/2025, 10:22 PM
    how "heavy" are the different status endpoints? I'm seeing different behavior between
    /health
    /debug
    and
    /status
    under heavy load
• Harsh Barsaiyan

    09/04/2025, 10:48 AM
Hi guys, how can I use the `gemini-embedding-001` model for auto embeddings? I am on v28 and tried with the OpenAI-compatible API config
    Copy code
    "model_config": {
                            "model_name": "openai/gemini-embedding-001",
                            "api_key": "api_key",
                            "url": "<https://generativelanguage.googleapis.com/v1beta/openai/embeddings>",
                        },
but this throws `typesense.exceptions.RequestMalformed: [Errno 400] OpenAI API error:` when I try to create the collection.
• Harsh Barsaiyan

    09/04/2025, 12:59 PM
So apparently this issue was to be fixed in https://github.com/typesense/typesense/pull/2522, but the way the checks are written here, it will not work for the Gemini OpenAI-compatible endpoints, which use `/v1beta/openai` instead of `/v1`. cc: @Kishore Nallan @Fanis Tharropoulos
• Wahid Bawa

    09/04/2025, 1:14 PM
Hey all, we recently upgraded from version 26 of typesense to version 29 (big jump I know). Everything was going smoothly until we noticed an issue today: our bulk imports into typesense now take way longer. For example, one of our processes went from 1m 54s to complete to almost 5 and a half minutes. This isn't a problem on our smaller collections, but it's causing us issues on our larger collections. We did increase our write buffer size and buffer number to 32 MB and 6 respectively - could this be the cause? Any help would be appreciated.
• Nikola Stojisavljević

    09/04/2025, 2:15 PM
Hello. Is there a chance to do tokenization based on alphanumeric values? For instance, a string "SC3" to be split into "SC" and "3" as two tokens when text gets mixed with numbers?
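    One possible workaround, sketched under assumptions: since the schema-level `token_separators` option takes literal characters rather than patterns, you could pre-split letter/digit runs client-side into a companion field at index time (field names here are hypothetical):
    Copy code
    import re

    def split_alnum(text: str) -> str:
        # "SC3" -> "SC 3": split runs of letters and runs of digits into tokens
        return " ".join(re.findall(r"[A-Za-z]+|\d+", text))

    # Index the pre-split form in a companion field and query_by both fields:
    doc = {"code": "SC3", "code_tokens": split_alnum("SC3")}
    print(doc)  # {'code': 'SC3', 'code_tokens': 'SC 3'}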
• Andrew Powell

    09/04/2025, 2:43 PM
    Can we get some better error messages for 400s? This just isn’t helpful at all:
    Copy code
    {
      "status": "error",
      "error": {
        "name": "t",
        "httpBody": {
          "created_by": "fb2de4fa",
          "description": null,
          "height": 297,
          "id": "f8fd6a81",
          "name": "Screenshot 2025-07-09 at 10.51.07 AM",
          "organization_id": "9d21c4b1",
          "type": "IMAGE",
          "updated_at": "2025-07-11 13:57:14.179059+00",
          "width": 221
        },
        "httpStatus": 400
      }
    }
• Todd Tarsi

    09/04/2025, 4:15 PM
Hey all, I have kind of a weird collection. Basically, we consistently delete records older than X months from this collection, because it's got very large records and we can't afford to keep all history in there. However, I'm pretty stumped by this response shape:
    Copy code
    {
      facet_counts: [],
      found: 4390173,
      hits: [],
      out_of: 17446,
      page: 1,
      request_params: { collection_name: 'calls', first_q: '*', per_page: 0, q: '*' },
      search_cutoff: false,
      search_time_ms: 141
    }
The mystery here is: how did we find 4 million out of 17 thousand?
• Daniel Martel

    09/04/2025, 7:33 PM
    I tried to auto embed 1 million documents and I hit rate limits on the embedding service. It seems that Typesense swallowed the errors (not seeing it in my Docker logs) and was only able to generate embeddings for a small % of the documents and failed/gave up on the rest. Is there any way for me to tune this better? Or is my only option to drop the collection and index at a rate the embedding service can handle?
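    In the meantime, one client-side mitigation sketch (not an official knob; collection name and pacing below are guesses to tune): re-upsert the documents in small, throttled batches so the embedding provider's rate limit is never exceeded.
    Copy code
    import time

    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    docs = [{"id": str(i), "text": f"doc {i}"} for i in range(1_000_000)]

    # Small batches with a pause between them, sized to the provider's limits.
    for i in range(0, len(docs), 100):
        client.collections["articles"].documents.import_(
            docs[i : i + 100], {"action": "upsert"}
        )
        time.sleep(2)  # pacing is an assumption; tune to the embedding service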
• Ryan Bubinski

    09/04/2025, 8:24 PM
I saw an update ~2 weeks ago about the v30 code freeze and the possibility of querying on related fields landing in v30, but it looks like that hasn't happened yet? Message about it possibly landing in v30: https://typesense-community.slack.com/archives/C01P749MET0/p1755838501405959?thread_ts=1755779902.292019&cid=C01P749MET0 What I believe is the relevant PR: https://github.com/typesense/typesense/pull/2342 Any update at this point? 🙏 It sounds like this is a highly anticipated feature 😅
• Jeremiah Ajayi

    09/05/2025, 1:35 PM
Hey team 👋 I'm running into very slow indexing performance on our Typesense cluster and could use some guidance.
    Cluster setup:
    • 3 nodes, each with 32 vCPUs and 128 GB RAM (Ubuntu 22.04)
    • Typesense 29.0
    Config highlights:
    Copy code
    num-documents-parallel-load=500000
    db-write-buffer-size=536870912
    db-max-write-buffer-number=10
    max-indexing-concurrency=24
    db-max-log-file-size=536870912
    snapshot-interval-seconds=3600
    db-compaction-interval=86400
Import details:
    • Endpoint: `/documents/import?action=upsert` (JSONL)
    • Client flushes every 2 seconds, batch size ≈ 1000 (sometimes as low as 100, sometimes a few thousand)
    • Example log:
    Copy code
    [chats-indexing] Indexed 1117 messages in 101689.79ms
Observed behavior:
    • `mpstat`: CPUs ~99% idle during import
    • `iostat`: disk nearly idle, very low util% and low latency
    • Performance was slow even with default server configs
    • Adjusting `db-max-write-buffer-number`, `db-write-buffer-size`, `max-indexing-concurrency` didn't change throughput
    Questions:
    • Am I misconfiguring server-side parameters, or is the main bottleneck the way I'm feeding data (small batches, single worker)?
    • What's the recommended import pattern for a cluster this size (batch size, concurrency, flush strategy) to saturate the available cores and memory?
    TLDR: Despite big hardware (3×32 cores, 128 GB RAM), indexing is crawling, and the system looks under-utilized: CPU, disk, and memory are all idle.
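    For reference, a hedged sketch of the import pattern that tends to saturate cores better: fewer, larger batches sent from several concurrent workers. Batch size, worker count, and the `chats` collection name are assumptions to tune, not recommendations from the docs:
    Copy code
    from concurrent.futures import ThreadPoolExecutor

    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
        "connection_timeout_seconds": 300,  # large imports can take a while
    })

    docs = [{"id": str(i), "message": f"chat {i}"} for i in range(1_000_000)]
    chunks = [docs[i : i + 10_000] for i in range(0, len(docs), 10_000)]

    def import_chunk(chunk):
        # One big batch per request instead of a small flush every 2 seconds.
        return client.collections["chats"].documents.import_(
            chunk, {"action": "upsert"}
        )

    # A handful of parallel writers is usually enough to stop the CPUs idling.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(import_chunk, chunks))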
• Gauthier PLM

    09/05/2025, 2:33 PM
    Hello 🙂 Is there any plan to support nl_query / nl_model_id parameters in presets? On Typesense cloud, v29, I tried to define the following preset but nl does not apply. Meanwhile, setting the parameter in my query works.
    Copy code
    {
      "name": "companies-search-preset",
      "value": {
        "collection": "companies",
        "nl_model_id": "gemini-flash",
        "nl_query": true,
        "query_by": "name,thematics.label,segments.label"
      }
    }
    Query that works:
    Copy code
curl --location 'https://my-cluster.typesense.net/collections/companies/documents/search?q=workday&preset=companies-search-preset&nl_query=true&nl_model_id=gemini-flash'
    Having these settings supported in presets would make it much easier to enable / disable / tweak without having to push a dedicated release
• Jonathan Zylberberg

    09/05/2025, 3:37 PM
Hey guys, a question I was hoping someone could help me with. I am self-hosting my typesense cluster and have swap space enabled, but after some time my swap usage gets too high and my server starts having major latency issues / eventually crashes due to the large swap usage. When I initially load my collections, the memory and swap usage are negligible, but over time queries increase the swap usage. How can I ensure this doesn't happen? What parts of a query may cause the swap usage to increase?
• Mike Karikas

    09/06/2025, 6:47 AM
Hi, is there anyone who can help me with a production typesense server that keeps getting overwhelmed? We just deployed a large real estate website into production, but typesense performance has bounced between really fast and great and unbelievably slow. We are self-hosted. Have tried:
    • Minimizing the original data set of 2M records to 200K
    • Changing various settings on the server, max-indexing-concurrency and db-write-buffer-size
    • Increasing server size from 16vcpu/32GB RAM to 32vcpu/64GB RAM
    • Typesense was primarily updated via Laravel Scout, which writes updates for individual records as they change. We have greatly reduced this and now do regular bulk upserts. We have thousands of listings being updated at times and it was overwhelming the server - we have dialed this back a lot, but have even tested with updates completely off and still see slowness.
    I had thought we could add additional servers and cluster them, but learned this is only for stability, not extra power. It seems that simply being online long enough causes it to become overwhelmed like this. Everything worked so great before going to production, and we completely underestimated the live load, or are missing something about the configuration. Admittedly we have big indexes and queries, but we had assumed we could work through it with some configuration changes, or that the steps above would have some effect. I cannot believe that cutting our data set down to 10% and doubling the server hardware has not had a noticeable effect. I'm kind of desperate for help from anyone with experience running production typesense through these kinds of problems - glad to pay for someone's time. Thank you.
• Jesper Møjbæk

    09/08/2025, 7:41 AM
It seems we are not getting our analytics collections updated - not sure since when, though. The setting `--enable-search-analytics` defaults to `false`, but I can't set it to true on the configuration page?
• Gauthier PLM

    09/09/2025, 8:01 AM
Hello 🙂 I am encountering an issue with my Typesense cluster, which does not answer multi-search requests, and I don't understand the issue. Basically, the request goes out but no answer is returned from the cluster and the call times out. The multi-search aggregates 5 requests, and the individual requests are returned without trouble. What's weird is that when I run the exact same setup (same env variables, same version) locally, it works without trouble. The other envs that use the same code also work, and it seems to be user-dependent (not all users have the same scoped search keys). Is it possible that the length of a search key could be causing issues? How may I debug this? I've run out of ideas.
• Michael Keegan

    09/09/2025, 9:57 AM
Just wondering about cluster configuration. Looking at our cluster configuration (it's an HA cluster), I can see that we're set to 4GB - 2xCPU (4hr burst). Our memory usage is fine, but CPU spikes slow everything down from time to time. Would we see much of a performance increase upgrading to 4GB - 2xCPU (i.e. not the burstable option), vs upgrading to 8GB - 2xCPU (7hr burst)? I ask because costs are similar between the two options.
• Stephane Demotte

    09/09/2025, 2:15 PM
Hello everyone! Two days spent trying to find the best way to manage SSR with typesense and instantsearch - maybe someone can help me? I'm on sveltekit, but it should work on any node server; I can make a simple request to typesense using the typesense client. But I can't find the right way to convert the URL generated by instantsearch into a typesense query - I've searched in the instantsearch code and the typesense-adapter but don't find any function to convert. The URL can look like
    Copy code
http://localhost:5200/en/construction?project%5BrefinementList%5D%5BfilterRegion%5D%5B0%5D=Montreal&project%5BrefinementList%5D%5BfilterConstructionStatus%5D%5B0%5D=Complete&project%5Bpage%5D=2
    
    //
    
    URLSearchParams {
      "project[refinementList][filterRegion][0]": "Greater Toronto Area / Golden Horseshoe",
      "project[refinementList][filterRegion][1]": "Montreal",
      "project[refinementList][filterConstructionStatus][0]": "Complete",
      "project[page]": "2",
    }
How can I easily make a search with the current searchParams without re-creating an instantsearch client (with all the widgets)? Thank you for any idea!
• Hung-wei Chuang

    09/09/2025, 3:49 PM
when using open source embedding models like `intfloat/e5-base`, they recommend prepending the `query:` prefix to the search query for best results. Does typesense auto-embedding automatically do this, or do we have to prefix the query ourselves before sending it to typesense?
• Denny Vuong

    09/10/2025, 8:02 AM
Hi - I've set up my collection and it now has 72M documents. However, I didn't realise I needed to set facet to true on the price field. What is the best way to do this now?
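    If it helps, a sketch using the in-place schema alteration API (field name and type here are assumptions): the field is dropped and re-added with facet enabled, and Typesense rebuilds that field's index in the background, which will take a while on 72M docs.
    Copy code
    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    # Drop the existing field, then re-add it with facet enabled.
    client.collections["products"].update({
        "fields": [
            {"name": "price", "drop": True},
            {"name": "price", "type": "float", "facet": True},
        ]
    })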
• Georgi Nachev

    09/10/2025, 9:35 AM
Hello, I'm using Laravel Scout and want to search in a collection with a join to another collection (for storage and index optimization). Here is my code:
    Copy code
public function typesenseSearchParameters()
{
    $itemCollection = (new Product())->searchableAs();
    return [
        'group_by' => 'item_id',
        'group_limit' => 1,
        'include_fields' => '$' . $itemCollection . '(*)',
        'filter_by' => '$' . $itemCollection . '(*)',
        'query_by' => "property_1,property_2,property_3,sku,barcode,\${$itemCollection}(name),\${$itemCollection}(description)"
    ];
}
but it returns the error message: Query by reference is not yet supported. Is there any way to make a search with a join on the referenced collection?
• Vikas Chawla

    09/11/2025, 11:04 AM
Hi, I have a question regarding the default sorting field in Typesense. How can I set a default sort column on an existing collection?
• Ivan Wolf

    09/11/2025, 11:16 AM
Hello, we are using Typesense Cloud and are having trouble getting Analytics Rules to work. We've created a collection as follows:
    Copy code
    {
      "name": "search_queries",
      "fields": [
        {
          "name": "q",
          "type": "string"
        },
        {
          "name": "filter_by",
          "type": "string"
        },
        {
          "name": "count",
          "type": "int32"
        }
      ]
    }
And then created a rule:
    Copy code
    {
      "rules": [
        {
          "name": "popular_queries",
          "params": {
            "source": {
              "collections": [
                "inquiries",
                "orders"
              ]
            },
            "destination": {
              "collection": "search_queries"
            },
            "expand_query": false,
            "limit": 1000
          },
          "type": "popular_queries"
        }
      ]
    }
No documents are added to the `search_queries` collection. `analytics-flush-interval: 300` is set. Any help would be greatly appreciated.
• Urvis

    09/11/2025, 12:50 PM
Hello @Kishore Nallan @Jason Bosco @Fanis Tharropoulos I have this document in my collection, text: "43--YOKE,COMPRESSOR", but I don't get it when I search with the query "COMPRESSOR". The text doesn't have any blank spaces in between. What might be the rationale for this, and what should I do to get the document returned?
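    If it's the `-`/`,` characters keeping the string as one token, one option (sketched with hypothetical collection/field names) is declaring them as `token_separators` in the collection schema:
    Copy code
    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    # "-" and "," become token boundaries at both index and query time, so
    # "43--YOKE,COMPRESSOR" indexes as the tokens 43, YOKE, COMPRESSOR.
    client.collections.create({
        "name": "parts",
        "fields": [{"name": "text", "type": "string"}],
        "token_separators": ["-", ","],
    })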
• Lukas Matejka

    09/11/2025, 1:09 PM
Hi, I would like to ask about the parameter exhaustive_search=false -> the doc says "stopping early when enough results are found" -> what exactly is "enough" and how is it driven? I could not find details on that. I'm debugging a real-world scenario where I get only 3 results with this setting, which seems quite low to me (I know there are more results with typos and prefixes...). Can you share a little bit about this logic and how to influence it?
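    For what it's worth, a hedged sketch of the knobs that appear to drive "enough": `typo_tokens_threshold` and `drop_tokens_threshold` stop typo/dropped-token expansion once that many results exist, and `exhaustive_search=true` disables the early exit; the values and collection name below are illustrative.
    Copy code
    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    results = client.collections["items"].documents.search({
        "q": "projct",
        "query_by": "name",
        # Keep exploring typo variants and dropped tokens until ~100 results
        # exist, rather than stopping at the low defaults:
        "typo_tokens_threshold": 100,
        "drop_tokens_threshold": 100,
        # or disable early termination altogether:
        # "exhaustive_search": True,
    })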
• Alan Buxton

    09/11/2025, 2:33 PM
    Hi everyone - typesense newbie here. I've got a python app and I've just tried to load approx 1 million nodes into my test typesense env. Doing so apparently overloaded something and now in my logs I see this sort of pattern:
    Copy code
    E20250911 07:31:34.140895 1174056 raft_server.cpp:783] 622 queued writes > healthy write lag of 500
    I20250911 07:31:37.149612 1174056 raft_server.cpp:692] Term: 15, pending_queue: 0, last_index: 143407, committed: 143407, known_applied: 143407, applying: 0, pending_writes: 0, queued_writes: 622, local_sequence: 48889904
    I20250911 07:31:37.149675 1174153 raft_server.h:60] Peer refresh succeeded!
    E20250911 07:31:43.174098 1174056 raft_server.cpp:783] 622 queued writes > healthy write lag of 500
    I20250911 07:31:47.192770 1174056 raft_server.cpp:692] Term: 15, pending_queue: 0, last_index: 143407, committed: 143407, known_applied: 143407, applying: 0, pending_writes: 0, queued_writes: 622, local_sequence: 48889904
    I20250911 07:31:47.192821 1174143 raft_server.h:60] Peer refresh succeeded!
The 622 is not going down. And if I now try to post any more updates to typesense (even with a much smaller batch size than before), I get a `typesense.exceptions.ServiceUnavailable: [Errno 503] Not Ready or Lagging`. Any guidance on what to do in this situation?
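    Until someone weighs in, a sketch of the usual client-side mitigation, assuming smaller batches plus exponential backoff on the 503 so the queued writes can drain (collection name is hypothetical):
    Copy code
    import time

    import typesense
    from typesense.exceptions import ServiceUnavailable

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    def import_with_backoff(batch, retries=8):
        for attempt in range(retries):
            try:
                return client.collections["nodes"].documents.import_(
                    batch, {"action": "upsert"}
                )
            except ServiceUnavailable:
                # Wait for queued_writes to drop below the healthy write lag.
                time.sleep(2 ** attempt)
        raise RuntimeError("cluster still lagging after retries")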
• Paul Wallnöfer

    09/12/2025, 8:47 AM
Hello everyone, I have a question regarding memory usage and loading times on a paginated query. First, the memory usage. I created a raw JSON file with data equal to the data I sent to the server, which means the server should have about 3x that amount in memory usage. However, the JSON file is about 770MB and the memory usage of the server is about 5.5GB. What am I missing here? (The collection schema is appended at the end.) Now onto the paginated query. I have a paginated, filterable table in my application, and if I just fetch the first page of the table, the query looks like this:
    Copy code
    {
      "q": "*",
      "page": 1,
      "per_page": 10,
      "limit_hits": 20,
      "include_fields": ", $in_use(), $manufacturers(), $distributors(), $product_bans(*)",
      "filter_by": "(id:* || $in_use(id:) || $tenant_product_distributors(id:) || $product_bans(id:*))",
      "sort_by": "eid:ASC"
    }
This query takes about 1.4s to finish, and I was wondering if I am doing the left joins wrong, because in SQL this takes a few milliseconds. Now if I try to filter by a distributor like so:
    Copy code
    {
      "q": "*",
      "page": 1,
      "per_page": 10,
      "limit_hits": 20,
      "include_fields": ", $in_use(), $manufacturers(), $distributors(), $product_bans(*)",
      "filter_by": "$tenant_product_distributors(distributor_id:=2) && (id:* || $in_use(id:) || $product_bans(id:))",
      "sort_by": "eid:ASC"
    }
The query now takes only 600ms. How is a query faster with a filter on a joined collection than without any filters at all? I get that there will be fewer documents to join, but does that really add up to cutting the query time in half? Thank you in advance. Here is the collection schema I am using:
    Copy code
    [
      {
        "name": "products",
        "fields":
          [
            { "name": "eid", "type": "int32", "sort": true },
            { "name": "name", "type": "string", "sort": true },
            { "name": "number", "type": "string", "sort": true },
            {
              "name": "manufacturer_id",
              "type": "string",
              "reference": "manufacturers.id"
            },
            { "name": "gs1", "type": "string" },
            { "name": "hibc", "type": "string" },
            { "name": "ean8", "type": "string" },
            { "name": "ean13", "type": "string" }
          ],
        "default_sorting_field": "eid"
      },
      {
        "name": "tenant_product_distributors",
        "fields":
          [
            { "name": "tenant_id", "type": "int32", "index": false },
            {
              "name": "distributor_id",
              "type": "string",
              "reference": "distributors.id"
            },
            { "name": "product_id", "type": "string", "reference": "products.id" }
          ]
      },
      {
        "name": "in_use",
        "fields":
          [
            { "name": "tenant_id", "type": "int32", "index": false },
            { "name": "product_id", "type": "string", "reference": "products.id" },
            { "name": "in_use", "type": "bool", "sort": true }
          ],
        "default_sorting_field": "in_use"
      },
      {
        "name": "product_bans",
        "fields":
          [
            { "name": "tenant_id", "type": "int32", "index": false },
            { "name": "product_id", "type": "string", "reference": "products.id" },
            { "name": "ban", "type": "bool", "sort": true }
          ],
        "default_sorting_field": "ban"
      },
      { "name": "manufacturers", "fields": [{ "name": "name", "type": "string" }] },
      {
        "name": "manufacturer_prefixes",
        "fields":
          [
            {
              "name": "manufacturer_id",
              "type": "string",
              "reference": "manufacturers.id"
            },
            { "name": "prefix", "type": "string" },
            { "name": "prefix_type", "type": "int32" }
          ]
      },
      { "name": "distributors", "fields": [{ "name": "name", "type": "string" }] }
    ]
• Hugo Catarino

    09/12/2025, 9:33 AM
hi everyone! Just realized that Typesense collections only support ISO 639-1 language codes as "locale" values. This is a MAJOR limitation for enterprise and is not best practice for locales. For example, pt-PT (European Portuguese, applying to Portugal) has totally different content from pt-BR (Brazilian Portuguese, applying to Brazil). The proper definition of a locale follows the format <language>-<region>, where:
    • Language: a two-letter ISO 639-1 code representing the primary language (e.g., pt for Portuguese, es for Spanish).
    • Region: a two-letter ISO 3166-1 alpha-2 country code specifying the regional variant (e.g., PT for Portugal, BR for Brazil).
    ❓ My question: is there any simple known way to force proper locales on a typesense server so I can store embeddings per locale and enable semantic search on different LOCALES (not just languages)?
• gemini.geek

    09/12/2025, 10:18 AM
how to check if the indexing is complete after sending a few million records in batches of 50,000? I could not find anything related to this in the docs.
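    One common approach, sketched below: each import call only returns once its batch has been applied (the response carries a per-document result), so after all batches complete you can compare the collection's `num_documents` against the expected total. The collection name here is hypothetical.
    Copy code
    import typesense

    client = typesense.Client({
        "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
        "api_key": "xyz",
    })

    coll = client.collections["my_collection"].retrieve()
    print(coll["num_documents"])  # compare against the number of records sent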