Resolving a Node.js String-Length Limitation When Loading Data into a Cloud Cluster
TLDR Ethan was having trouble loading data into a cloud cluster because Node.js hit its maximum string length while reading his data file. Jason identified the issue and suggested reading the file in a streaming fashion, in chunks.
Dec 19, 2022
Ethan 02:17 PM
Error: Cannot create a string longer than 0x1fffffe8 characters
I assume it's referring to a particular value in my jsonl set, but I know for certain none of the values are even close to that long. Is it referring to the entire jsonl file? A bit lost here.
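For reference, 0x1fffffe8 is V8's maximum string length, which Node.js exposes as buffer.constants.MAX_STRING_LENGTH. Because require('./data/merged.json') decodes the entire file into one string before parsing it, the limit applies to the whole file, not to any single value inside it. A minimal check, assuming a 64-bit Node.js build:

// Illustrative: print the V8 string-length cap the error message refers to.
import { constants } from 'node:buffer';

console.log(constants.MAX_STRING_LENGTH.toString(16)); // '1fffffe8' (536,870,888 UTF-16 code units)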
Jason 02:27 PM
Jason 02:28 PM
Jason 02:28 PM
Ethan 02:32 PM
Ethan 02:35 PM
const schema: CollectionCreateSchema = {
  name: 'transcripts',
  fields: [
    { name: 'text', type: 'string' },
    { name: 'start', type: 'float' },
  ],
  default_sorting_field: 'start',
};

console.log('Populating index in Typesense');

// Drop any existing collection so the import starts from a clean slate.
try {
  await client.collections('transcripts').delete();
  console.log('Deleting existing collection: transcripts');
} catch (error) {
  // Do nothing
}

console.log('Creating schema: ');
console.log(JSON.stringify(schema, null, 2));
await client.collections().create(schema);

console.log('Adding records: ');
const transcripts = require('./data/merged.json');
try {
  const returnData = await client
    .collections('transcripts')
    .documents()
    .import(transcripts, { action: 'create' });
  console.log(returnData);
  console.log('Done indexing.');
} catch (error) {
  console.error(error);
}
Ethan 02:35 PM
Ethan 02:35 PM
Error: Cannot create a string longer than 0x1fffffe8 characters
at Object.slice (node:buffer:599:37)
at Buffer.toString (node:buffer:818:14)
at Object.readFileSync (node:fs:512:41)
at Object.Module._extensions..json (node:internal/modules/cjs/loader:1219:22)
at Module.load (node:internal/modules/cjs/loader:1037:32)
at Function.Module._load (node:internal/modules/cjs/loader:878:12)
at Module.require (node:internal/modules/cjs/loader:1061:19)
at require (node:internal/modules/cjs/helpers:103:18)
at /Users/ethan/testing/typesense-instantsearch-demo/populateTypesenseIndex.ts:53:23
at processTicksAndRejections (node:internal/process/task_queues:95:5) {
code: 'ERR_STRING_TOO_LONG'
}
Ethan 02:36 PM
Jason 02:36 PM
Ethan 02:52 PM
Ethan 02:52 PM
const transcripts = require('./data/merged.json');
Ethan 02:52 PM
Jason 03:44 PM
Ethan 06:09 PM
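A minimal sketch of the chunked, streaming approach the TLDR describes, assuming merged.json is newline-delimited JSON with one record per line; the client connection settings, the batch size, and the importInChunks helper are placeholders rather than anything taken from the thread:

// Sketch: stream the file line by line and import in batches, so the whole
// file is never materialized as a single string.
import * as fs from 'node:fs';
import * as readline from 'node:readline';
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }], // placeholder connection details
  apiKey: 'xyz', // placeholder key
});

async function importInChunks(path: string, batchSize = 1000): Promise<void> {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });

  let batch: object[] = [];
  for await (const line of rl) {
    if (!line.trim()) continue;   // skip blank lines
    batch.push(JSON.parse(line)); // parse one record at a time
    if (batch.length >= batchSize) {
      await client.collections('transcripts').documents().import(batch, { action: 'create' });
      batch = [];
    }
  }
  if (batch.length > 0) {
    await client.collections('transcripts').documents().import(batch, { action: 'create' });
  }
}

importInChunks('./data/merged.json')
  .then(() => console.log('Done indexing.'))
  .catch(console.error);

Since the typesense-js import() call also accepts a raw newline-delimited JSON string, the JSON.parse step could alternatively be skipped by joining each chunk of lines and sending it as-is.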
Similar Threads
Large JSONL Documents Import Issue & Resolution
Suraj was having trouble loading large JSONL documents into the Typesense server. After several discussions and attempts, it was discovered that the issue was due to data quality. Once the team extracted the data again, the upload process worked smoothly.

Resolving JSON Parsing Error in Import Function Implementation
Harpreet experienced a JSON parsing failure while implementing an import function. Kishore Nallan suggested double-escaping characters in the JSON objects. After testing and discussion, both agreed that the error resulted from not handling JSON-encoded strings manually. Harpreet decided to update the test cases accordingly.

Issues with Importing Typesense Collection to Different Server
Kevin had problems migrating a Typesense collection between Docusaurus sites on different machines. Jason advised them on JSONL format, handling server hosting, and creating a collection schema before importing documents, leading to a successful import.

Troubleshooting Indexing Duration in Typesense Import
Alan asked about lengthy indexing times for importing documents to Typesense. Jason suggested various potential causes, including network connectivity and system resources. They later identified the problem to be an error in Alan's code.

Querying with Not-in in Typesense
Masahiro inquired about using not-in queries in Typesense. Kishore Nallan explained how to conduct such queries by using the "-" operator in the query string, and assisted Masahiro with issues stemming from a high number of exclusion tokens. The problem was eventually resolved by switching to the `multi_search` endpoint.


