# community-help
a
Hello! I am Alexey from Songsterr. I recently deployed search on Typesense. Thank you for this excellent search engine. It is very fast and searches songs in our catalog very well. We are using a self-hosted setup with three servers. I encountered a problem with searching for songs by the artist's name. On one server, everything is fine, but on the other two servers, there are no results when searching by the artist's name. In which direction should I proceed with debugging? How can I check the state of the search index on each of the three servers?
Songs for an artist, fetched from Typesense by artist ID: https://www.songsterr.com/api/artist/56001/songs?size=20
Songs fetched from Typesense by artist name: https://www.songsterr.com/api/songs?pattern=knocked%20loose&size=20
This query works differently on different servers:
{
  "q": "knocked loose",
  "query_by": "title,artist",
  "query_weight": "1,1",
  "filter_by": "hasPlayer:=true && restriction:!=[AllCountries, US]",
  "sort_by": "totalViews:desc",
  "page": 1,
  "per_page": 20,
  "exhaustive_search": false
}
curl "http://${TYPESENSE_HOST}/debug/" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
The /debug endpoint returns the correct state for both the leader and the followers. Are there any other endpoints I can use to validate the state?
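One way to narrow this down is to send the same search to each node directly and compare the answers. The sketch below is only an illustration, assuming each node is reachable on its own address; the TYPESENSE_NODES and TYPESENSE_COLLECTION environment variables and all function names are hypothetical, and only the standard search endpoint and the X-TYPESENSE-API-KEY header are taken from Typesense itself.

```python
import json
import os
import urllib.parse
import urllib.request


def encode_params(params):
    """URL-encode search parameters, writing booleans as lowercase true/false."""
    normalized = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    return urllib.parse.urlencode(normalized)


def search_node(base_url, api_key, collection, params, timeout=10):
    """Run one search against a single node and return the parsed JSON response."""
    url = f"{base_url}/collections/{collection}/documents/search?{encode_params(params)}"
    request = urllib.request.Request(url, headers={"X-TYPESENSE-API-KEY": api_key})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(result):
    """Keep only the parts of a search response that are worth comparing."""
    return {
        "found": result.get("found"),
        "ids": [hit["document"]["id"] for hit in result.get("hits", [])],
    }


def find_disagreements(summaries):
    """Return the nodes whose summary differs from the first node's summary."""
    nodes = list(summaries)
    if not nodes:
        return {}
    baseline = summaries[nodes[0]]
    return {node: s for node, s in summaries.items() if s != baseline}


def main():
    # Hypothetical configuration: comma-separated node URLs such as
    # "http://c1:8108,http://c2:8108,http://c3:8108".
    nodes = [n for n in os.environ.get("TYPESENSE_NODES", "").split(",") if n]
    api_key = os.environ.get("TYPESENSE_API_KEY", "")
    collection = os.environ.get("TYPESENSE_COLLECTION", "")
    if not (nodes and api_key and collection):
        print("Set TYPESENSE_NODES, TYPESENSE_API_KEY and TYPESENSE_COLLECTION to run.")
        return
    params = {"q": "knocked loose", "query_by": "title,artist", "per_page": 20}
    summaries = {n: summarize(search_node(n, api_key, collection, params)) for n in nodes}
    for node, summary in summaries.items():
        print(node, summary["found"])
    print("Nodes that differ from the first one:", list(find_disagreements(summaries)))


if __name__ == "__main__":
    main()
```

If one node reports a different "found" count for the same query, the difference is on that node rather than in the load balancer or the application layer.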
e
Sounds like you're sending documents to one node directly, and it's not syncing to the others?
j
Could you share the last 100 log lines from each of the nodes?
a
"Sounds like you're sending documents to one node directly, and it's not syncing to the others?"
I don't think so. The query that filters by artistId returns the same results on all three nodes.
{
  "q": "*",
  "query_by": "title",
  "filter_by": "artistId:=56001 && hasPlayer:=true && restriction:!=[AllCountries, US]",
  "sort_by": "totalViews:desc",
  "page": 1,
  "per_page": 100,
  "exhaustive_search": false
}
And here are the logs:
And here is how it looks when I query the different servers from typesense-dashboard.
Last log lines, excluding the raft_server.cpp:693 and "Running GC for aborted requests, req map size: 0" entries.
j
The indices look in sync based on the logs, but just to be sure, could you do a GET /collections on each node and make sure the num_documents field in each collection is the same?
a
it is the same in all three files:
"name": "production_search_2024-07-11_03-00-00",
"num_documents": 301041,
I noticed that on c1 (the server with full results) the definition of the artist field is slightly different.
c1:
fields: [{
  "name": "artist",
  "stem": false,
  ...
}...],
c2/c3:
fields: [{
  "name": "artist",
  "stem": true,
  ...
}...],
and the collection was created with stem: true for this field:
{ name: 'artist', type: 'string', stem: true },
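A schema mismatch like this can be found automatically by fetching GET /collections from every node and diffing the field definitions. The following is a rough sketch under that assumption; the node names, the TYPESENSE_NODES and TYPESENSE_COLLECTION variables and the helper names are hypothetical.

```python
import json
import os
import urllib.request


def fetch_collections(base_url, api_key, timeout=10):
    """Return the list of collection objects reported by one node."""
    request = urllib.request.Request(
        f"{base_url}/collections", headers={"X-TYPESENSE-API-KEY": api_key}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fields_by_name(collections, collection_name):
    """Map field name to field definition for one collection of a node's response."""
    for collection in collections:
        if collection.get("name") == collection_name:
            return {field["name"]: field for field in collection.get("fields", [])}
    return {}


def diff_fields(fields_per_node):
    """Return {field_name: {node: definition}} for fields that differ across nodes."""
    names = set()
    for fields in fields_per_node.values():
        names.update(fields)
    diffs = {}
    for name in sorted(names):
        per_node = {node: fields.get(name) for node, fields in fields_per_node.items()}
        values = list(per_node.values())
        if any(value != values[0] for value in values[1:]):
            diffs[name] = per_node
    return diffs


def main():
    # Hypothetical configuration, e.g. "http://c1:8108,http://c2:8108,http://c3:8108".
    nodes = [n for n in os.environ.get("TYPESENSE_NODES", "").split(",") if n]
    api_key = os.environ.get("TYPESENSE_API_KEY", "")
    collection = os.environ.get("TYPESENSE_COLLECTION", "")
    if not (nodes and api_key and collection):
        print("Set TYPESENSE_NODES, TYPESENSE_API_KEY and TYPESENSE_COLLECTION to run.")
        return
    fields_per_node = {
        node: fields_by_name(fetch_collections(node, api_key), collection)
        for node in nodes
    }
    for name, per_node in diff_fields(fields_per_node).items():
        print(f"field {name!r} differs:")
        for node, definition in per_node.items():
            print("  ", node, json.dumps(definition, sort_keys=True))


if __name__ == "__main__":
    main()
```

With the definitions shown above, only the artist field would be reported, with stem false on c1 and stem true on c2 and c3.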
k
All servers should have the same type. Can you please try creating a new collection with a field with stemming enabled and tell me if the same problem occurs?
a
Today all servers have identical collections with stemming enabled, and the results are consistent: none of the three can find the artist knocked loose by its exact name, even with the simplest possible query:
{
  "q": "Knocked Loose",
  "query_by": "artist"
}
Not sure what I'm doing wrong.
I think I need to disable stemming completely and rely only on typos.
k
Can you produce a small reproducible example, with a few records, that shows the issue? Something like this: https://gist.github.com/kishorenc/b2d8a01644dd78a815dbae7882ed74c5
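For reference, such a reproduction could look roughly like the sketch below. It is not the content of the gist: the collection name stem_repro, the sample records and the helper names are made up for illustration, and it assumes a disposable test server addressed through the TYPESENSE_HOST and TYPESENSE_API_KEY variables used earlier in the thread.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

COLLECTION = "stem_repro"  # hypothetical throwaway collection name

# Made-up sample records; only the artist name comes from the thread.
SAMPLE_DOCS = [
    {"id": "1", "title": "Song A", "artist": "Knocked Loose"},
    {"id": "2", "title": "Song B", "artist": "Knocked Loose"},
    {"id": "3", "title": "Song C", "artist": "Some Other Band"},
]


def build_schema(stem):
    """Build a two-field schema with stemming switched on or off for artist."""
    return {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "artist", "type": "string", "stem": stem},
        ],
    }


def call(method, base_url, api_key, path, body=None):
    """Send one JSON request to the server and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def run_repro(base_url, api_key, stem):
    """Recreate the collection, index the samples, and return the hit count."""
    try:
        call("DELETE", base_url, api_key, f"/collections/{COLLECTION}")
    except urllib.error.HTTPError:
        pass  # the collection did not exist yet
    call("POST", base_url, api_key, "/collections", build_schema(stem))
    for doc in SAMPLE_DOCS:
        call("POST", base_url, api_key, f"/collections/{COLLECTION}/documents", doc)
    query = urllib.parse.urlencode({"q": "Knocked Loose", "query_by": "artist"})
    result = call(
        "GET", base_url, api_key, f"/collections/{COLLECTION}/documents/search?{query}"
    )
    return result["found"]


if __name__ == "__main__":
    host = os.environ.get("TYPESENSE_HOST", "")
    key = os.environ.get("TYPESENSE_API_KEY", "")
    if host and key:
        for stem in (True, False):
            print(f"stem={stem}: found", run_repro(f"http://{host}", key, stem))
    else:
        print("Set TYPESENSE_HOST and TYPESENSE_API_KEY to run against a test server.")
```

Running it once with stemming on and once with it off makes it easy to see whether the stem setting alone changes the number of hits for the exact artist name.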
a
will try
k
I found a bug with the persistence of the stem property which might explain how the servers had different values for the field. I will update this thread with a new build with the fix tomorrow.
This is fixed in 27.0.rc26
a
That is awesome, thank you!