#community-help

Issue Integrating Typesense with Docusaurus Documentation Website

TLDR: Oz had trouble integrating Typesense with their Docusaurus documentation website; Jason suggested running the docsearch-scraper and adjusting the start URLs, which resolved the issue.


May 31, 2022 (19 months ago)
Oz
03:20 PM
Hi everyone,
I’m trying to add Typesense to my Docusaurus documentation website. I’m using Typesense Cloud. I couldn’t figure how to make the cluster index my website, and all of the requests coming from my website are returning an error {code: 404, error: "Could not find a field named 'hierarchy.lvl0' in the schema."}
Anyone can help?
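For context, the DocSearch frontend queries heading-hierarchy fields (hierarchy.lvl0, hierarchy.lvl1, …) that only exist once the scraper has created and populated the collection. A rough sketch of what the scraper-created schema contains — the field list here is illustrative, not the exact schema the scraper generates:

```json
{
  "name": "docusaurus-2",
  "fields": [
    { "name": "hierarchy.lvl0", "type": "string", "optional": true },
    { "name": "hierarchy.lvl1", "type": "string", "optional": true },
    { "name": "hierarchy.lvl2", "type": "string", "optional": true },
    { "name": "content", "type": "string", "optional": true },
    { "name": "url", "type": "string" }
  ]
}
```

The 404 above is what the search widget returns when it queries a cluster where no scraper run has created these fields yet.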
Jason
03:21 PM
Oz Did you already run the docsearch-scraper against your site? The scraper is what indexes your site into Typesense and creates the schema for you
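The scraper is pointed at a Typesense cluster via an env file, which is then passed to the Docker container with --env-file. A minimal sketch for a Typesense Cloud cluster, assuming the standard TYPESENSE_* variable names from the docsearch-scraper docs (the host value is a placeholder):

```
TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=xxx.a1.typesense.net
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
```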
Jun 01, 2022 (19 months ago)
Oz
11:53 AM
For some reason I thought that with Typesense Cloud this happens automatically. Will do it now, thank you


Oz
02:45 PM
Jason I’ve tried running the scraper now, using the recommended config for Docusaurus exactly, except for the start_urls and sitemap_urls. This is the result, and I’m not sure why it indexed nothing:
docker run -it --env-file=./.env -e "CONFIG=$(cat configs/public/config.json | jq -r tostring)" typesense/docsearch-scraper
INFO:scrapy.utils.log:Scrapy 2.2.1 started (bot: scrapybot)
INFO:scrapy.utils.log:Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.9 (default, Dec  8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform Linux-5.10.47-linuxkit-x86_64-with-Ubuntu-18.04-bionic
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.epollreactor.EPollReactor
INFO:scrapy.crawler:Overridden settings:
{'DUPEFILTER_CLASS': 'src.custom_dupefilter.CustomDupeFilter',
 'LOG_ENABLED': '1',
 'LOG_LEVEL': 'ERROR',
 'TELNETCONSOLE_ENABLED': False,
 'USER_AGENT': 'Algolia DocSearch Crawler'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'src.custom_downloader_middleware.CustomDownloaderMiddleware']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com> (referer: None)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com/sitemap.xml> (referer: None)
> DocSearch: https://docs.blinkops.com (0 records)
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 427,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2873,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 0.513301,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 674386),
 'memusage/max': 64651264,
 'memusage/startup': 64651264,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 161085)}
INFO:scrapy.core.engine:Spider closed (finished)

Crawling issue: nbHits 0 for docusaurus-2
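For reference, the recommended Docusaurus config Oz mentions has roughly this shape — the selectors below are simplified placeholders, not the exact XPath selectors from the official Docusaurus config:

```json
{
  "index_name": "docusaurus-2",
  "start_urls": ["https://docs.blinkops.com/"],
  "sitemap_urls": ["https://docs.blinkops.com/sitemap.xml"],
  "selectors": {
    "lvl0": "header h1",
    "lvl1": "article h2",
    "lvl2": "article h3",
    "text": "article p, article li"
  }
}
```

With 0 records scraped, the crawl reached the start URL and sitemap (two 200 responses above) but extracted nothing from them — which points at the start URLs, not the selectors.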
Jason
02:47 PM
I see a 404 when I visit https://docs.blinkops.com
Jason
02:47 PM
Is that expected?
Oz
02:49 PM
Yep, it was up when I tried it - I removed the deployment because the website is still a work in progress
But I’ve re-deployed it for testing purposes
Oz
02:49 PM
It’s up now - same output
Jason
03:14 PM
Oz Looks like there's a client-side redirect when you visit https://docs.blinkops.com/.... Could you try setting your start URLs to:

https://docs.blinkops.com/docs/documentation
https://docs.blinkops.com/docs/Integrations/
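In config.json terms, Jason's suggestion amounts to pointing start_urls past the client-side redirect, at pages the crawler can fetch directly:

```json
"start_urls": [
  "https://docs.blinkops.com/docs/documentation",
  "https://docs.blinkops.com/docs/Integrations/"
]
```

The scraper follows HTTP redirects, but a redirect performed in JavaScript on the landing page leaves it with an effectively empty document, hence the 0 records.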
