#community-help

Issue Integrating Typesense with Docusaurus Documentation Website

TLDR: Oz had trouble integrating Typesense with their Docusaurus documentation website; Jason suggested running the docsearch-scraper and adjusting the start URLs to resolve the issue.

Solved
May 31, 2022 (16 months ago)
Oz
03:20 PM
Hi everyone,
I’m trying to add Typesense to my Docusaurus documentation website. I’m using Typesense Cloud. I couldn’t figure out how to make the cluster index my website, and all of the requests coming from my website are returning an error: {code: 404, error: "Could not find a field named 'hierarchy.lvl0' in the schema."}
Can anyone help?
Jason
03:21 PM
Oz Did you already run the docsearch-scraper against your site? The scraper is what indexes your site into Typesense and creates the schema for you
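
For context, the 404 above means the collection the search widget queries doesn’t yet have the DocSearch-style fields. The scraper auto-creates a collection whose schema includes nested hierarchy fields; a rough, illustrative sketch (abbreviated, not the scraper’s exact output) looks something like:

{
  "name": "docusaurus-2",
  "fields": [
    { "name": "hierarchy.lvl0", "type": "string", "optional": true },
    { "name": "hierarchy.lvl1", "type": "string", "optional": true },
    { "name": "hierarchy.lvl2", "type": "string", "optional": true },
    { "name": "content", "type": "string", "optional": true },
    { "name": "url", "type": "string" }
  ]
}

Until a crawl has run and created this collection, any query against hierarchy.lvl0 fails with the error Oz saw.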
Jun 01, 2022 (16 months ago)
Oz
11:53 AM
For some reason I thought that with Typesense Cloud this happens automatically. Will do it now, thank you
Oz
02:45 PM
Jason I’ve tried running the scraper now, using the recommended config for Docusaurus exactly, except for the start_urls and sitemap_urls. This is the result, and I’m not sure why:
docker run -it --env-file=./.env -e "CONFIG=$(cat configs/public/config.json | jq -r tostring)" typesense/docsearch-scraper
INFO:scrapy.utils.log:Scrapy 2.2.1 started (bot: scrapybot)
INFO:scrapy.utils.log:Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.9 (default, Dec  8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform Linux-5.10.47-linuxkit-x86_64-with-Ubuntu-18.04-bionic
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.epollreactor.EPollReactor
INFO:scrapy.crawler:Overridden settings:
{'DUPEFILTER_CLASS': 'src.custom_dupefilter.CustomDupeFilter',
 'LOG_ENABLED': '1',
 'LOG_LEVEL': 'ERROR',
 'TELNETCONSOLE_ENABLED': False,
 'USER_AGENT': 'Algolia DocSearch Crawler'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'src.custom_downloader_middleware.CustomDownloaderMiddleware']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com> (referer: None)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com/sitemap.xml> (referer: None)
> DocSearch: https://docs.blinkops.com 0 records)
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 427,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2873,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 0.513301,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 674386),
 'memusage/max': 64651264,
 'memusage/startup': 64651264,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 161085)}
INFO:scrapy.core.engine:Spider closed (finished)

Crawling issue: nbHits 0 for docusaurus-2
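
For reference, the “recommended config” here is the scraper’s config.json; per the Typesense docs for Docusaurus it is shaped roughly like the sketch below (URLs swapped in for Oz’s site; selectors abbreviated, the full recommended config adds more levels and a global lvl0 selector). A run ending in “0 records” generally means the crawler fetched the start URL but found no content matching the selectors:

{
  "index_name": "docusaurus-2",
  "start_urls": ["https://docs.blinkops.com/"],
  "sitemap_urls": ["https://docs.blinkops.com/sitemap.xml"],
  "selectors": {
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "text": "article p, article li"
  }
}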
Jason
02:47 PM
I see a 404 when I visit https://docs.blinkops.com
Jason
02:47 PM
Is that expected?
Oz
02:49 PM
Yep, it was up when I tried it - I removed the deployment because the website is still a work in progress
But I’ve re-deployed it for testing purposes
Oz
02:49 PM
It’s up now - same output
Jason
03:14 PM
Oz Looks like there's a client-side redirect when you visit https://docs.blinkops.com/.... Could you try setting your start URLs to:

https://docs.blinkops.com/docs/documentation
https://docs.blinkops.com/docs/Integrations/
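
In config.json terms, that’s a change to just the start_urls array (everything else stays the same):

{
  "start_urls": [
    "https://docs.blinkops.com/docs/documentation",
    "https://docs.blinkops.com/docs/Integrations/"
  ]
}

Pointing the crawler directly at rendered doc pages sidesteps the client-side redirect at the site root, which the scraper (it doesn’t execute JavaScript) won’t follow - the likely reason the crawl returned 0 records.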