Oz Heymann — 05/31/2022, 3:20 PM
{code: 404, error: "Could not find a field named 'hierarchy.lvl0' in the schema."}
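The 404 above says the live collection has no `hierarchy.lvl0` field. As a rough illustration (the field list is assumed from the error message and typical DocSearch hierarchy levels, not taken from the scraper's source), a schema that would satisfy the scraper looks something like:

```python
# Hypothetical Typesense collection schema for DocSearch-style records.
# Field names are assumptions based on the error message and the usual
# lvl0-lvl6 hierarchy; this is not the scraper's exact schema.
schema = {
    "name": "docs",
    "fields": (
        [{"name": f"hierarchy.lvl{i}", "type": "string", "optional": True}
         for i in range(7)]
        + [
            {"name": "content", "type": "string", "optional": True},
            {"name": "url", "type": "string"},
        ]
    ),
}

# The thread's 404 ("Could not find a field named 'hierarchy.lvl0'")
# indicates the live collection lacks an entry like the first one here.
field_names = {f["name"] for f in schema["fields"]}
print("hierarchy.lvl0" in field_names)  # True
```

If the collection was created by hand (or by an older run) without these fields, recreating it so the schema includes them is one plausible fix.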
Can anyone help?

Jason Bosco — 05/31/2022, 3:21 PM

Oz Heymann — 06/01/2022, 11:53 AM

Oz Heymann — 06/01/2022, 2:45 PM
start_urls and sitemap_urls. This is the result, and I'm not sure why:
docker run -it --env-file=./.env -e "CONFIG=$(cat configs/public/config.json | jq -r tostring)" typesense/docsearch-scraper
INFO:scrapy.utils.log:Scrapy 2.2.1 started (bot: scrapybot)
INFO:scrapy.utils.log:Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.9 (default, Dec 8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Linux-5.10.47-linuxkit-x86_64-with-Ubuntu-18.04-bionic
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.epollreactor.EPollReactor
INFO:scrapy.crawler:Overridden settings:
{'DUPEFILTER_CLASS': 'src.custom_dupefilter.CustomDupeFilter',
'LOG_ENABLED': '1',
'LOG_LEVEL': 'ERROR',
'TELNETCONSOLE_ENABLED': False,
'USER_AGENT': 'Algolia DocSearch Crawler'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'src.custom_downloader_middleware.CustomDownloaderMiddleware']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com> (referer: None)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com/sitemap.xml> (referer: None)
> DocSearch: https://docs.blinkops.com 0 records
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 427,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 2873,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'elapsed_time_seconds': 0.513301,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 674386),
'memusage/max': 64651264,
'memusage/startup': 64651264,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 161085)}
INFO:scrapy.core.engine:Spider closed (finished)
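The `jq -r tostring` in the `docker run` command above just re-serializes config.json onto a single line so it fits in the CONFIG environment variable. The same transformation in Python, for reference (the sample config dict is a stand-in, not the thread's actual config):

```python
import json

# Stand-in for configs/public/config.json (the thread's real config is
# not shown); any JSON-serializable structure behaves the same way.
config = {"index_name": "docs", "start_urls": ["https://docs.blinkops.com"]}

# Equivalent of `cat config.json | jq -r tostring`: re-serialize the
# parsed JSON as one compact line, safe to pass as an env-var value.
compact = json.dumps(config, separators=(",", ":"))
print(compact)
```

A side effect of this pattern is that a config.json with a syntax error fails at the `jq` step rather than inside the container, so the command reaching Scrapy at all means the JSON itself parsed.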
Crawling issue: nbHits 0 for docusaurus-2
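The log shows the crawler fetched only the homepage and sitemap.xml, then closed with 0 records. One sanity check for that symptom is to parse the sitemap and confirm it actually lists pages under the configured start_urls prefix; a sketch (the sample sitemap body is invented for illustration; substitute the real https://docs.blinkops.com/sitemap.xml contents):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def locs_from_sitemap(xml_text: str) -> list:
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(SITEMAP_NS + "loc")]

# Invented sample body; substitute the real sitemap.xml response.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.blinkops.com/docs/getting-started</loc></url>
</urlset>"""

# If none of the listed URLs fall under the configured start_urls prefix,
# the scraper has nothing to index, matching the "0 records" output above.
urls = locs_from_sitemap(sample)
print(urls)
```

If the sitemap URLs do match, the next thing to check would be whether the configured CSS selectors match anything on the crawled pages.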
Jason Bosco — 06/01/2022, 2:47 PM

Jason Bosco — 06/01/2022, 2:47 PM

Oz Heymann — 06/01/2022, 2:49 PM

Oz Heymann — 06/01/2022, 2:49 PM

Jason Bosco — 06/01/2022, 3:14 PM