#community-help

Resolving a Node.js Limitation When Loading Data into a Cloud Cluster

TLDR Ethan was having trouble loading data into a cloud cluster because Node.js cannot load his large JSON file as a single string. Jason identified the issue and suggested reading the file in a streaming fashion, in chunks.

Dec 19, 2022
Ethan
02:17 PM
Hello again! I purchased a small cloud cluster and am trying to load my data into it, but I'm getting an error:
Error: Cannot create a string longer than 0x1fffffe8 characters

I assume it's referring to a particular value in my jsonl set, but I know for certain none of the values are even close to that long. Is it referring to the entire jsonl file? A bit lost here.
Jason
02:28 PM
Could you share the exact code snippet that throws this error?
Jason
02:28 PM
and the stack trace?
Ethan
02:32 PM
Yeah, doing some research, it appears there are some workarounds that people have found
Ethan
02:35 PM
import { CollectionCreateSchema } from 'typesense/lib/Typesense/Collections';

// Excerpt from an async function; `client` is a configured Typesense client.
const schema: CollectionCreateSchema = {
  name: 'transcripts',
  fields: [
    { name: 'text', type: 'string' },
    { name: 'start', type: 'float' },
  ],
  default_sorting_field: 'start',
};

console.log('Populating index in Typesense');

try {
  await client.collections('transcripts').delete();
  console.log('Deleting existing collection: transcripts');
} catch (error) {
  // Do nothing: the collection may not exist yet
}

console.log('Creating schema: ');
console.log(JSON.stringify(schema, null, 2));
await client.collections().create(schema);

console.log('Adding records: ');
const transcripts = require('./data/merged.json');
try {
  const returnData = await client
    .collections('transcripts')
    .documents()
    .import(transcripts, { action: 'create' });
  console.log(returnData);
  console.log('Done indexing.');
} catch (error) {
  console.error(error);
}
Ethan
02:35 PM
There's the snippet, if you'd like
Ethan
02:35 PM
Error: Cannot create a string longer than 0x1fffffe8 characters
    at Object.slice (node:buffer:599:37)
    at Buffer.toString (node:buffer:818:14)
    at Object.readFileSync (node:fs:512:41)
    at Object.Module._extensions..json (node:internal/modules/cjs/loader:1219:22)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Function.Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at /Users/ethan/testing/typesense-instantsearch-demo/populateTypesenseIndex.ts:53:23
    at processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 'ERR_STRING_TOO_LONG'
}
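
Aside: the stack trace itself points at the cause. require() on a .json file reads the entire file with fs.readFileSync and converts the resulting Buffer to a single string, and V8 caps strings at 0x1fffffe8 (536,870,888) characters, roughly 512 MiB. So the error is about the size of the whole file, not any individual value. A minimal sketch of the same failure path, assuming merged.json exceeds that cap:

import { readFileSync } from 'fs';

// require('./data/merged.json') effectively does the following: read the
// whole file into a Buffer, then materialize it as one V8 string. Strings
// are capped at 0x1fffffe8 (~512 MiB) characters, so this throws
// ERR_STRING_TOO_LONG for any file past that limit.
const text = readFileSync('./data/merged.json').toString('utf8');
const transcripts = JSON.parse(text);
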
Ethan
02:36 PM
But yes, I believe it's just an issue with the size
Jason
02:36 PM
May I know which line is line 53 in the snippet you shared? It seems like this is part of a larger file
Ethan
02:52 PM
Yes, sorry, I was working on breaking the reading into smaller chunks
Ethan
02:52 PM
const transcripts = require('./data/merged.json');
Ethan
02:52 PM
That is line 53
Jason
03:44 PM
Ah ok, yeah, you want to read that file in a streaming fashion, in chunks, like in this script: https://github.com/typesense/showcase-songs-search/blob/e7ad97ce4e09191743abd727c2dfc949811bbcd6/scripts/indexer/index.js#L154
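
For reference, a minimal sketch of that streaming approach, assuming the data is newline-delimited JSON with one document per line. The importInBatches name, file path, and batch size of 1,000 are illustrative, and client is the same configured Typesense client as in the snippet above:

import { createReadStream } from 'fs';
import { createInterface } from 'readline';

// Stream the file line by line and import in fixed-size batches, so the
// whole file is never held in memory as one giant string.
async function importInBatches(client: any, path: string, batchSize = 1000) {
  const lines = createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let batch: object[] = [];
  for await (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    batch.push(JSON.parse(line));
    if (batch.length >= batchSize) {
      await client.collections('transcripts').documents().import(batch, { action: 'create' });
      batch = [];
    }
  }
  if (batch.length > 0) {
    // Flush the final partial batch
    await client.collections('transcripts').documents().import(batch, { action: 'create' });
  }
}

Memory then stays bounded by the batch size regardless of file size, which is what the linked indexer script does as well.
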
Ethan
06:09 PM
Perfect!