#community-help

Resolving Multilingual Search Function in Typesense Software

TLDR Bill is having difficulty with multilingual search functionality in Typesense software. Developer Kishore Nallan suggested setting a language locale and provided a demo build. The build solution had some issues, and after multiple rounds of software updates and troubleshooting, the problem still persists.

Nov 29, 2021 (26 months ago)
Bill
07:05 PM
Hello, please check the issue with Cyrillic (case sensitivity). It's a big issue for non-Latin languages.
Nov 30, 2021 (26 months ago)
Kishore Nallan
12:41 AM
Hi Bill, I will have a preview build with a fix by end of this week.
Bill
07:38 PM
Perfect! :thumbsup:
Dec 02, 2021 (25 months ago)
Kishore Nallan
11:14 AM
Kishore Nallan
11:34 AM
One thing you do need to do is define a language locale on the fields so that Typesense can handle that language. For example, for Greek, use the el locale like this:

 curl -k "" -X POST -H "Content-Type: application/json" \
      -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
        "name": "titles", "num_memory_shards": 4,
        "fields": [
          {"name": "title", "type": "string", "locale": "el" },
          {"name": "points", "type": "int32" }
        ]
      }' 
Bill
01:58 PM
Hello Kishore Nallan, perfect, I'll test it soon! One question: what if I set the title's locale to Greek (el) and the content is in another language (English, Russian, etc.)? The title field in our project gets its values from user input, so it should be multilingual rather than tied to a specific language. In other words, we cannot control the language of the title field because users create the records.
Kishore Nallan
02:00 PM
Do you have a rough idea of what languages these could be? I.e. could they be Chinese, Korean etc. as well? Or only European languages?
Bill
02:03 PM
Yes, it could be any language because the user controls the title input. We can't limit it to specific languages.
Bill
02:04 PM
If we can have multilingual support for European languages for now, that's OK. In a future version you can look at Chinese, Korean, etc.
Kishore Nallan
02:05 PM
Every language has different tokenization rules. The CJK family doesn't even use spaces to separate words/tokens. It's not possible to index without knowing the locale because the same Chinese characters are also used in Japanese as Kanji characters.
Kishore Nallan
02:06 PM
For now if you use "sr" (Serbian) it should support both Cyrillic and Latin families.
Bill
02:07 PM
Okay, we can continue with European languages for now. So is it better to use sr instead of el in order to support both Latin and Cyrillic? Does "el" support only Greek?
Kishore Nallan
02:08 PM
I'm not sure if el will support Latin/ASCII but Serbian can use either system so it should definitely work.
Bill
02:10 PM
Okay, I'll try it with "sr" and I'll inform you
Kishore Nallan
02:10 PM
👍
Dec 06, 2021 (25 months ago)
Bill
12:10 PM
Hello Kishore, is the build that you sent me for Ubuntu Linux?
Kishore Nallan
12:10 PM
Yup, any issues?
Kishore Nallan
12:11 PM
It's a Linux binary, so it can be used on any Linux distribution.
Bill
12:11 PM
We are using Ubuntu Linux. Is there a DEB package available for this version?
Bill
12:12 PM
Our previous installation of Typesense was with a DEB package.
Kishore Nallan
12:12 PM
I can get you a DEB package but for now if you can replace the binary and give it a spin, then I can get you a proper package soon.
Bill
12:13 PM
Will we have any issues if we install this package instead of the DEB?
Kishore Nallan
12:13 PM
Nope, the DEB also basically overwrites the Linux binary.
Kishore Nallan
12:14 PM
The location to replace is /usr/bin/typesense-server
Kishore Nallan
12:14 PM
Stop the Typesense service, overwrite the binary at /usr/bin/typesense-server, and then start Typesense back up. This is what the DEB also does under the hood.
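The stop, overwrite, start sequence described here could be scripted roughly as follows. This is only a sketch under assumptions: the systemd unit name `typesense-server`, the `.bak` backup copy, and the `./typesense-server` path of the downloaded binary are illustrative and not from the thread; only /usr/bin/typesense-server is. By default the helper just prints the commands (DRY_RUN=1) so they can be reviewed first.

```shell
# Assumed names: adjust for your install.
SERVICE="${SERVICE:-typesense-server}"          # systemd unit name (assumption)
TARGET="${TARGET:-/usr/bin/typesense-server}"   # path given in the thread

# With DRY_RUN=1 (the default) only print each command; with DRY_RUN=0 run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the service, keep a backup, overwrite the binary, start the service.
replace_binary() {
  new_binary="$1"
  run sudo systemctl stop "$SERVICE" &&
  run sudo cp "$TARGET" "$TARGET.bak" &&
  run sudo install -m 0755 "$new_binary" "$TARGET" &&
  run sudo systemctl start "$SERVICE"
}

# Example (prints the four commands without running them):
# replace_binary ./typesense-server
```

Running `DRY_RUN=0 replace_binary ./typesense-server` would execute the steps for real; the backup step is an extra precaution, not something the DEB package is known to do.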
Bill
12:14 PM
Okay, so for now we will follow the "Updating Linux binary" section of the guide instead of the DEB one, am I right?
Kishore Nallan
12:15 PM
Okay actually, hold on, let me see if I can quickly build the DEB for you.
Bill
12:15 PM
Okay
Bill
12:26 PM
Perfect Kishore. Thank you very much!
Bill
12:26 PM
I'll install it now on a 3-node cluster
Bill
12:27 PM
and I'll inform you about the issue that we had with the language in a moment
Kishore Nallan
12:27 PM
Ok
Bill
12:35 PM
I installed it on the first node but I get -> Multi-node with no leader: refusing to reset peers. Connection refused
Bill
12:36 PM
This is happening because the other 2 nodes are stopped, but when I check the health status on the first node I get -> {"ok":false}
Kishore Nallan
12:36 PM
Did you stop all the nodes first?
Bill
12:36 PM
yes
Kishore Nallan
12:36 PM
Okay then it's natural that this error will be logged.
Kishore Nallan
12:36 PM
Start the other nodes in the same way
Bill
12:36 PM
The status on the first node is {"ok":false}
Bill
12:37 PM
Maybe because it cannot connect to the other nodes
Kishore Nallan
12:37 PM
Yes, because the cluster went down and now only 1 node is available.
Bill
12:37 PM
Okay I'll continue with the other nodes
Kishore Nallan
12:37 PM
2/3 nodes needed for quorum for a 3-node setup.
Bill
12:38 PM
Yes, I just read in the docs -> "If you are running Typesense in clustered mode for high availability, make sure you update the nodes one at a time. Wait until the /health endpoint responds with the status code 200 before updating the next node."
Kishore Nallan
12:38 PM
Yup, one by one is best for retaining uptime.
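A rough sketch of the "one node at a time, wait for /health to return 200" advice, plus the quorum arithmetic mentioned above (2 of 3). Assumptions: quorum follows the usual strict-majority rule, and the node URLs in the commented example are hypothetical; the helper names are made up for illustration.

```shell
# Nodes that must be up for an N-node cluster to have quorum (strict majority).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Poll <base_url>/health until it answers HTTP 200, or give up after N tries.
wait_healthy() {
  base_url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # curl prints only the HTTP status code; "000" means no connection.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$base_url/health" || true)"
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example rolling update (hypothetical node URLs):
# for node in http://node1:8108 http://node2:8108 http://node3:8108; do
#   ... upgrade that node here ...
#   wait_healthy "$node" || { echo "$node did not become healthy"; break; }
# done
```

Stopping all three nodes at once, as happened here, drops the cluster below `quorum 3`, which is why the first restarted node reported {"ok":false} until a second node came back.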
Bill
12:41 PM
Perfect, all nodes are online!
Kishore Nallan
12:41 PM
👍
Bill
12:46 PM
I get this error -> "The locale value of the field search_terms is not valid". The search_terms field is "type": "string[]"
Bill
12:47 PM
{
  "facet": true,
  "index": true,
  "locale": "sr",
  "name": "search_terms",
  "optional": false,
  "type": "string[]"
}
Kishore Nallan
12:47 PM
:thinking_face: One second.
Kishore Nallan
12:49 PM
Can you post the output of the /debug endpoint?
Bill
12:50 PM
How can I reach the debug endpoint?
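For reference, /debug is an ordinary authenticated GET. A minimal sketch, assuming the server listens on http://localhost:8108 and the API key is in TYPESENSE_API_KEY (both assumptions; adjust for your setup). The `debug_version` helper is a hypothetical convenience for pulling out the version field.

```shell
# Assumptions: host, port, and API key variable; adjust for your setup.
TYPESENSE_URL="${TYPESENSE_URL:-http://localhost:8108}"

# Fetch /debug, which reports the running server version.
typesense_debug() {
  curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY:-}" "$TYPESENSE_URL/debug"
}

# Pull the "version" value out of a /debug response read from stdin.
debug_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example: typesense_debug | debug_version
```

Checking this after every upgrade confirms which binary is actually running.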
Bill
12:52 PM
got it
Bill
12:52 PM
{
  "state": 4,
  "version": "0.21.0"
}
Bill
12:52 PM
It's version 0.21 again
Kishore Nallan
12:52 PM
That's not running 0.22
Bill
12:52 PM
yes
Kishore Nallan
12:53 PM
Can you check why DEB did not upgrade? Be careful to check the output during install.
Bill
12:53 PM
I followed the "Updating DEB package" section of the updating guide.
Bill
12:55 PM
Did you build the DEB package from version 0.21 by mistake, or is this an installation issue?
Kishore Nallan
12:56 PM
No, the DEB is fine. Can you do the update again and see if any errors are produced in the output? Maybe share the log of the output as well so I can look.
Bill
12:56 PM
yes hold on
Kishore Nallan
12:58 PM
Can you try this:

apt-get -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" \
        -o Dpkg::Options::="--force-unsafe-io" \
        -y install /path/to/deb
Kishore Nallan
12:58 PM
Those additional configurations might be needed. If it goes well, check /debug again to confirm.
Bill
01:04 PM
My mistake, I had just copied the configuration from the guide. Now it's okay.
Bill
01:04 PM
{
  "state": 4,
  "version": "0.23.0.rc4"
}
Kishore Nallan
01:05 PM
👍
Bill
01:05 PM
The collection was created successfully.
Kishore Nallan
01:05 PM
Cool, I hope the searches work fine too.
Bill
01:09 PM
I'll test in a moment and I'll inform you
Bill
01:09 PM
Does this version include more features than v0.22?
Bill
01:14 PM
I set the locale to "sr" but I have a strange issue. If I type "οδηγός" I get no results. If I type odigos I get results in Greek. :thinking_face:
Bill
01:21 PM
I also tried setting the locale to "el", but again I get no results if I type the term in Greek.
Kishore Nallan
01:33 PM
Can you give me a sample document and query?
Bill
01:35 PM
Yes, can we continue our conversation on Typesense's support chat because it's a private project?
Kishore Nallan
01:37 PM
Yes, please email me a sample document and query that reproduces the issue.
Bill
01:38 PM
I sent it to you in the support chat.
Kishore Nallan
01:40 PM
Ok I will check and respond.
Bill
01:41 PM
:thumbsup:
Kishore Nallan
02:07 PM
It works for me, I've emailed you a snippet. An end-to-end reproducible sample like that will help.
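The emailed snippet is not part of this thread. As an illustration only, an end-to-end reproduction of this kind might look like the sketch below: create a collection with a locale on the field, index one document, and search for the Greek term. Assumptions: the server URL, the TYPESENSE_API_KEY variable, the `titles_repro` collection name, and the one-field sample document are all hypothetical; only the term "οδηγός" and the locale values come from the conversation.

```shell
# Assumptions: server URL, API key variable, collection name, and sample data.
TYPESENSE_URL="${TYPESENSE_URL:-http://localhost:8108}"
COLLECTION="${COLLECTION:-titles_repro}"

# Schema with a locale on the string field, as suggested earlier in the thread.
schema_json() {
  locale="$1"
  printf '{"name": "%s", "fields": [{"name": "title", "type": "string", "locale": "%s"}]}' \
    "$COLLECTION" "$locale"
}

# Thin curl wrapper that adds the API key header.
ts() {
  curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY:-}" "$@"
}

# Create the collection, index one document, then run the search.
repro() {
  locale="$1"
  query="$2"
  ts -X POST "$TYPESENSE_URL/collections" \
     -H "Content-Type: application/json" -d "$(schema_json "$locale")"
  ts -X POST "$TYPESENSE_URL/collections/$COLLECTION/documents" \
     -H "Content-Type: application/json" -d '{"title": "οδηγός"}'
  # -G with --data-urlencode percent-encodes the non-ASCII query term.
  ts -G "$TYPESENSE_URL/collections/$COLLECTION/documents/search" \
     --data-urlencode "q=$query" --data-urlencode "query_by=title"
}

# Example: repro el "οδηγός"
```

A script like this lets both sides run the same schema, document, and query, which makes differences such as a faceted string[] field easier to spot.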
Bill
02:22 PM
Kishore Nallan, I responded in the support chat.
Kishore Nallan
02:23 PM
Ah got it. I missed implementing the fix for facet search. I will have to patch that and get back to you.
Bill
02:25 PM
Okay
Dec 10, 2021 (25 months ago)
Kishore Nallan
Dec 12, 2021 (25 months ago)
Bill
02:59 PM
Kishore Nallan, the server doesn't start with this build. I sent you the logs in the support chat.
Kishore Nallan
03:02 PM
I've just replied.
