#community-help

Issues with Importing Typesense Collection to Different Server

TLDR Kevin had problems migrating a Typesense collection between Docusaurus sites on different machines. Jason advised them on JSONL format, handling server hosting, and creating a collection schema before importing documents, leading to successful import.

Solved
Aug 15, 2023 (1 month ago)
Kevin
12:44 PM
Hello everyone! Typesense is awesome.

We have successfully integrated Typesense with Docusaurus on localhost: Docusaurus and the Typesense server run on the same machine, and the typesense/docsearch-scraper Docker job was run there to scrape the localhost Docusaurus site. We would like to move the collection created by typesense/docsearch-scraper to a Typesense server running in the test environment, but we are having problems. Details in thread.
Kevin
01:51 PM
We did the following -

1.) exported the localhost typesense collection

2.) changed the Docusaurus URLs in the collection JSON file to those of the Docusaurus site in the test environment

3.) created a collection in the Typesense server in the test environment

4.) converted the collection JSON file to a JSONL file and

5.) attempted to import the JSONL file to the newly created collection on the Typesense server on the test platform.

Unfortunately, nothing was imported. Here is a sample of the error messages displayed:

{"code":400,"document":"\"symbology\"","error":"Bad JSON: not a properly formed document.","success":false}
{"code":400,"document":"\"etc\"","error":"Bad JSON: not a properly formed document.","success":false}
{"code":400,"document":"\"etc\"","error":"Bad JSON: not a properly formed document.","success":false}
{"code":400,"document":"\"docs-default-current\"","error":"Bad JSON: not a properly formed document.","success":false}

Would anyone know if this is the correct approach? Typesense is a great tool, but it does not appear to be possible to import a collection by itself, at least via curl.

Or maybe this is a problem with scraped docusaurus sites? Maybe the exported JSON collection file needs to be modified in some way prior to conversion to JSONL?

Typesense meets our security needs, but we do need to test it thoroughly first.

Thank you all!

NOTE: If possible, we would have scraped the test Docusaurus site directly, but it is behind a login and password, and it does not use Cloudflare Zero Trust (CF), Google Identity-Aware Proxy (IAP), or Keycloak (KC).
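The error sample quoted above shows that the import response contains one JSON result per line. A minimal sketch for summarizing such a response, assuming that per-line shape; the function name `summarize_import_response` and the sample text are hypothetical:

```python
import json
from collections import Counter


def summarize_import_response(response_text):
    """Count successful lines and group failed lines by error message.

    Assumes one JSON result per line, each with a "success" flag and,
    on failure, an "error" string (as in the sample quoted in the thread).
    """
    ok = 0
    errors = Counter()
    for line in response_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        result = json.loads(line)
        if result.get("success"):
            ok += 1
        else:
            errors[result.get("error", "unknown error")] += 1
    return ok, errors


# Hypothetical response text modeled on the errors in the thread.
sample = "\n".join([
    '{"success":true}',
    '{"code":400,"document":"\\"etc\\"","error":"Bad JSON: not a properly formed document.","success":false}',
    '{"code":400,"document":"\\"etc\\"","error":"Bad JSON: not a properly formed document.","success":false}',
])
ok_count, error_counts = summarize_import_response(sample)
print(ok_count, dict(error_counts))
```

Grouping by message makes it easier to see whether every document fails for the same reason, which usually points at the file format rather than at individual documents.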
Jason
02:59 PM
The format of the content exported by the documents/export endpoint is already in JSONL, so you wouldn’t need to change the format in any way.

Could you share the first two lines from the JSONL file you’re trying to import into the new collection?

head -2 your-exported-documents.jsonl
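Alongside `head`, a short script can check that every line of the file is a JSON object before importing. This is a sketch only; `find_bad_jsonl_lines` and the sample lines are hypothetical names and data, not part of Typesense:

```python
import json


def find_bad_jsonl_lines(lines):
    """Return (line_number, reason) for each line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append(
                (number, f"expected an object, got {type(value).__name__}")
            )
    return problems


# Hypothetical lines: a bare string, a valid object, an array,
# and a fragment of a pretty-printed object.
sample_lines = [
    '"6.5"',
    '{"lvl0": null, "lvl1": null}',
    '[{"lvl0": null}]',
    '  "weight": {',
]
for number, reason in find_bad_jsonl_lines(sample_lines):
    print(number, reason)
```

To check a real file, pass an open file handle, for example `find_bad_jsonl_lines(open("your-exported-documents.jsonl", encoding="utf-8"))`.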
Aug 16, 2023 (1 month ago)
Kevin
07:06 AM
Here are the first 7:

"6.5"
"6.5"
"default"
{"lvl0":null,"lvl1":null,"lvl2":null,"lvl3":null,"lvl4":null,"lvl5":null,"lvl6":null}
[{"lvl0":null,"lvl1":null,"lvl2":null,"lvl3":null,"lvl4":null,"lvl5":null,"lvl6":null}]
{"lvl0":null,"lvl1":null,"lvl2":null,"lvl3":null,"lvl4":null,"lvl5":null}
{"lvl0":null,"lvl1":null,"lvl2":null,"lvl3":null,"lvl4":null,"lvl5":null}
Kevin
07:23 AM
In retrospect, I suppose the structure of this file does seem illogical.
Kevin
08:00 AM
After giving the exported JSON file the extension '.jsonl', I was able to initiate an import. Everything processed OK for 30 seconds or so, but then the message 502 Bad Gateway appeared.
Jason
01:37 PM
Are you using curl to import the JSONL file? Or did you use the Cloud web interface? For large files, you want to use the API.
Kevin
01:42 PM
I am using curl. The file is 12 MB.
Kevin
02:06 PM
The 502 problem has gone away after we re-installed the Typesense server.

Now when I attempt to create a collection through an import, the screen returns the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.4M    0    24  100 11.4M      1   817k  0:00:14  0:00:14 --:--:--  253k{"message": "Not Found"}
Kevin
02:16 PM
The 11.4M in the screen output corresponds to the size of the JSONL file.
Kevin
02:17 PM
Note that I ran curl on bash, if that matters.
Jason
02:21 PM
Oh wait I forgot that you’re self hosting. So the 502 happens if the gateway / reverse-proxy you have in front of Typesense terminates the connection before the import is fully done. So you want to increase that timeout to as high as say 30 minutes.
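Raising the proxy timeout is the fix suggested above. A different technique, not mentioned in the thread, is to split the JSONL file into smaller request bodies so that each import request finishes sooner. A minimal sketch, with the hypothetical helper `batch_lines` and made-up documents:

```python
def batch_lines(lines, batch_size):
    """Yield JSONL request bodies holding at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(line)
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield "\n".join(batch)


# Hypothetical documents, one JSON object per line.
docs = ['{"id": "1"}', '{"id": "2"}', '{"id": "3"}']
bodies = list(batch_lines(docs, batch_size=2))
print(len(bodies))
```

Each yielded string can be sent as the body of a separate import request. Whether this is preferable to a longer timeout depends on the proxy in front of the server.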
Jason
02:21 PM
The not found issue is separate - you need to first create the collection before importing documents into it
Kevin
02:22 PM
Ok to create the collection, does that mean I need to specify a schema for it as well?
Kevin
02:23 PM
Thanks!
Jason
02:47 PM
Correct
Kevin
02:57 PM
OK, is it possible to export the schema from an existing collection? Thanks!
Kevin
03:05 PM
With, of course, the intention of using the exported schema to create a new collection on a different server.
Jason
04:59 PM
You can then pass the output JSON of that endpoint to the create collection endpoint: https://typesense.org/docs/0.25.0/api/collections.html#create-a-collection
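A minimal sketch of that step: it takes schema JSON exported from one server and builds the create-collection request for another. The function name, base URL, API key, and sample schema are hypothetical. Dropping `created_at` and `num_documents` is an assumption made here for tidiness and may not be required:

```python
import json
import urllib.request

# Values that describe server state rather than the schema itself.
# Assumption: removing them is harmless; the server may ignore them anyway.
SERVER_REPORTED_FIELDS = ("created_at", "num_documents")


def build_create_collection_request(base_url, api_key, exported_schema):
    """Build (but do not send) a POST /collections request."""
    schema = {
        key: value
        for key, value in exported_schema.items()
        if key not in SERVER_REPORTED_FIELDS
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections",
        data=json.dumps(schema).encode("utf-8"),
        headers={
            "X-TYPESENSE-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical schema, as it might be returned by the source server.
exported = {
    "name": "docs",
    "num_documents": 1200,
    "created_at": 1692100000,
    "fields": [{"name": "content", "type": "string"}],
}
request = build_create_collection_request("http://localhost:8108", "xyz", exported)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. The collection must exist before any documents are imported into it.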
Aug 17, 2023 (1 month ago)
Kevin
02:16 PM
Hi! I did as you said. I was able to export the schema, and then create a collection using the schema.
Kevin
02:20 PM
But when I attempted an import, I received lots of messages such as these:

{"code":400,"document":"    \"current\"","error":"Bad JSON: not a properly formed document.","success":false}
{"code":400,"document":"  ],","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 3: syntax error while parsing value - unexpected ']'; expected '[', '{', or a literal","success":false}
{"code":400,"document":"  \"weight\": {","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 11: syntax error while parsing value - unexpected ':'; expected end of input","success":false}
{"code":400,"document":"    \"level\": 0,","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 12: syntax error while parsing value - unexpected ':'; expected end of input","success":false}
{"code":400,"document":"    \"page_rank\": 0,","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 16: syntax error while parsing value - unexpected ':'; expected end of input","success":false}
{"code":400,"document":"    \"position\": 57,","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 15: syntax error while parsing value - unexpected ':'; expected end of input","success":false}
{"code":400,"document":"    \"position_descending\": 1","error":"Bad JSON: [json.exception.parse_error.101] parse error at line 1, column 26: syntax error while parsing value - unexpected ':'; expected end of input","success":
Kevin
02:20 PM
Any idea why?
Kevin
02:21 PM
Do I need to remove EOL characters in the JSONL file?
Jason
04:13 PM
Could you share the first few lines of the JSONL file again?
Aug 18, 2023 (1 month ago)
Kevin
07:35 AM
{
  "content": "6.5",
  "content_camel": "6.5",
  "docusaurus_tag": "default",
  "hierarchy": {
    "lvl0": null,
    "lvl1": null,
    "lvl2": null,
    "lvl3": null,
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
Kevin
08:02 AM
Or rather this is the file exported from the other Typesense server.
Jason
04:53 PM
A JSONL file needs to have one JSON object per line. For example:

{"id": "124", "company_name": "Stark Industries", "num_employees": 5215, "country": "US"}
{"id": "125", "company_name": "Future Technology", "num_employees": 1232, "country": "UK"}
{"id": "126", "company_name": "Random Corp.", "num_employees": 531, "country": "AU"}
Jason
04:53 PM
This is how the Typesense export endpoint exports docs as well
Jason
04:54 PM
It seems like there’s some additional processing you might be doing that’s outputting a formatted JSON object with line-breaks between key values
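If a file has already been pretty-printed, it can be collapsed back to one object per line. A minimal sketch, assuming the file holds either a JSON array or objects written back to back; `to_jsonl` and the sample text are hypothetical:

```python
import json


def to_jsonl(text):
    """Convert pretty-printed JSON (an array, or back-to-back objects) to JSONL."""
    decoder = json.JSONDecoder()
    documents = []
    position = 0
    while True:
        # Skip whitespace and any stray commas between top-level values.
        while position < len(text) and text[position] in " \t\r\n,":
            position += 1
        if position >= len(text):
            break
        # raw_decode parses one JSON value and reports where it ended.
        value, position = decoder.raw_decode(text, position)
        documents.extend(value if isinstance(value, list) else [value])
    return "\n".join(
        json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
        for doc in documents
    )


# Hypothetical input modeled on the pretty-printed sample in the thread.
pretty = """{
  "content": "6.5",
  "hierarchy": {
    "lvl0": null
  }
}
{
  "content": "default",
  "hierarchy": {
    "lvl0": null
  }
}"""
print(to_jsonl(pretty))
```

Each output line is a complete JSON object, which is the shape shown in the example above.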
Kevin
05:48 PM
OK thanks
Kevin
06:11 PM
You are right. It works. I was able to import. Thanks much!
Aug 21, 2023 (1 month ago)
Kevin
07:51 AM
Just one follow-up question: do most of the people who use Typesense NOT use curl? In other words, do more people use one of the client libraries rather than curl to interact with the Typesense server?
Jason
05:15 PM
Typesense Server exposes an API, and curl just calls that same API
Jason
05:15 PM
So whether you use curl / browser / client library - it’s the same API that gets called
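To illustrate that point, the same import call curl makes can be built with Python's standard library. This is a sketch only: the function name, base URL, API key, and collection name are hypothetical, and the path and `action` parameter follow the documents import endpoint discussed in the thread:

```python
import urllib.parse
import urllib.request


def build_import_request(base_url, api_key, collection, jsonl_body, action="create"):
    """Build (but do not send) a request to the documents import endpoint."""
    query = urllib.parse.urlencode({"action": action})
    name = urllib.parse.quote(collection, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{name}/documents/import?{query}",
        data=jsonl_body.encode("utf-8"),  # one JSON object per line
        headers={
            "X-TYPESENSE-API-KEY": api_key,
            "Content-Type": "text/plain",
        },
        method="POST",
    )


# Hypothetical server, key, collection, and documents.
body = '{"id": "124", "company_name": "Stark Industries"}\n{"id": "125", "company_name": "Future Technology"}'
request = build_import_request("http://localhost:8108", "xyz", "docs", body)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. The server receives the same HTTP request whether it comes from curl, this script, or a client library.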