#community-help

Creating a Local Index with DocSearchScraper and Distributing It

TLDR Rotfisch needed help creating a local index with DocSearchScraper and distributing it to other systems. Jason suggested using the documents/export endpoint to export a JSONL file from Typesense.

Solved
Mar 13, 2023 (7 months ago)
Rotfisch
11:02 AM
I am trying to create an index locally with DocSearchScraper and distribute it to different systems. As I understand the scraper, a collection is created first and then handed to the Typesense service at the end. Is there a way to interrupt at this point and create a JSON file with the index that can be imported later on other machines, or is it necessary to do an export at the end of the indexing?
Kishore Nallan
12:49 PM
cc Jason
Jason
06:20 PM
A collection is first created in Typesense, and then, as the scraping happens, each page is imported into Typesense as documents.
Jason
06:20 PM
Here’s the file that manages all communication with Typesense: https://github.com/typesense/typesense-docsearch-scraper/blob/master/scraper/src/typesense_helper.py

You could edit this as needed to write the documents to a local JSONL file instead.
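
For illustration only, here is a minimal Python sketch of what "write to a local JSONL file" could look like. The thread does not show the internals of typesense_helper.py, so the function names `to_jsonl` and `append_jsonl` are hypothetical helpers, not part of the scraper; the assumption is that each scraped record is available as a plain dict at the point where it would otherwise be sent to Typesense.

```python
import json


def to_jsonl(documents):
    """Serialize an iterable of dicts as JSONL: one JSON object per line."""
    return "".join(
        json.dumps(doc, ensure_ascii=False) + "\n" for doc in documents
    )


def append_jsonl(documents, path):
    """Append a batch of documents to a local JSONL file (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl(documents))
```

Appending rather than overwriting lets such a helper be called once per scraped batch, so the file grows as the crawl proceeds.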
Jason
06:21 PM
Or if you’re also using Typesense for search, you could let the scraper do its thing and then use the documents/export endpoint to export the JSONL file from Typesense: https://typesense.org/docs/0.24.0/api/documents.html#export-documents
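
A rough sketch of the export route in Python, using only the standard library. The base URL, collection name, and API key are placeholders to replace with your own values. The import half goes beyond what the thread discusses: it assumes Typesense's documents/import endpoint and a collection with a matching schema that already exists on the target machine.

```python
import shutil
import urllib.parse
import urllib.request

API_KEY_HEADER = "X-TYPESENSE-API-KEY"


def _documents_url(base_url, collection, suffix):
    # URL-encode the collection name in case it contains special characters.
    name = urllib.parse.quote(collection, safe="")
    return f"{base_url.rstrip('/')}/collections/{name}/documents/{suffix}"


def export_request(base_url, collection, api_key):
    """Build the GET request for the documents/export endpoint."""
    url = _documents_url(base_url, collection, "export")
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key})


def import_request(base_url, collection, api_key, jsonl_bytes):
    """Build the POST request for documents/import (assumed target-side step)."""
    url = _documents_url(base_url, collection, "import?action=create")
    return urllib.request.Request(
        url, data=jsonl_bytes, method="POST", headers={API_KEY_HEADER: api_key}
    )


def export_to_file(base_url, collection, api_key, path):
    """Stream the exported JSONL from a running Typesense server to a file."""
    request = export_request(base_url, collection, api_key)
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

On the machine that ran the scraper, something like `export_to_file("http://localhost:8108", "docs", "xyz", "docs.jsonl")` would produce the file to distribute; `"docs"` and `"xyz"` are made-up values here.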
Mar 14, 2023 (7 months ago)
Rotfisch
01:40 PM
Thank you, Jason. I think the export option will be my solution.
