TLDR Vinicius experienced issues setting up typesense-docsearch-scraper locally. Jason identified a misconfiguration with the Typesense server after checking the .env file, and recommended using ngrok or port forwarding for development purposes. Vinicius successfully resolved the issue with port forwarding.
error msg:
Traceback (most recent call last):
  File "Desktop/Projects/typesense-docsearch-scraper/./docsearch", line 5, in <module>
    run()
  File "Desktop/Projects/typesense-docsearch-scraper/cli/src/index.py", line 147, in run
    exit(command.run(sys.argv[2:]))
  File "Desktop/Projects/typesense-docsearch-scraper/cli/src/commands/run_config.py", line 21, in run
    return run_config(args[0])
  File "Desktop/Projects/typesense-docsearch-scraper/cli/../scraper/src/index.py", line 44, in run_config
    typesense_helper.create_tmp_collection()
  File "Desktop/Projects/typesense-docsearch-scraper/cli/../scraper/src/typesense_helper.py", line 32, in create_tmp_collection
    print(self.typesense_client.collections.retrieve())
  File "/home/vinicius/.local/share/virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/typesense/collections.py", line 21, in retrieve
    return self.api_call.get('{0}'.format(Collections.RESOURCE_PATH))
  File "/home/vinicius/.local/share/virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/typesense/api_call.py", line 138, in get
    return self.make_request(requests.get, endpoint, as_json,
  File "/home/vinicius/.local/share/virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/typesense/api_call.py", line 130, in make_request
    raise last_exception
  File "/home/vinicius/.local/share/virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/typesense/api_call.py", line 114, in make_request
    error_message = r.json().get('message', 'API error.')
  File "/home/vinicius/.local/share/virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting ',' delimiter: line 1 column 7 (char 6)
Could you share the contents of your .env file?
I suspect the scraper is unable to connect to the Typesense server due to some misconfiguration
TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=172.18.182.239
TYPESENSE_PORT=8107
TYPESENSE_PROTOCOL=http
TYPESENSE_PATH=
# WARNING! Please be aware that the scraper sends auth headers to every scraped site, so use `allowed_domains` to adjust the scope accordingly!
# If the scraped site is behind the CloudFlare Access.
CF_ACCESS_CLIENT_ID=
CF_ACCESS_CLIENT_SECRET=
# WARNING! Please be aware that the scraper sends auth headers to every scraped site, so use `allowed_domains` to adjust the scope accordingly!
# If the scraped site is behind the Google Cloud Identity-Aware Proxy
IAP_AUTH_CLIENT_ID=
IAP_AUTH_SERVICE_ACCOUNT_JSON=
CHROMEDRIVER_PATH=./chrome-driver/chromedriver
Typesense’s default API port is `8108`. Did you specifically intend to change it to `8107`?
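For anyone hitting the same error, the port check suggested here can be sketched as a small pre-flight script. This is only an illustration: `parse_env` and `check_typesense_port` are hypothetical helpers, not part of typesense-docsearch-scraper, and the sketch assumes the server runs with its default ports (8108 for the API, 8107 for peering).

```python
# Hypothetical pre-flight check; not part of typesense-docsearch-scraper.
DEFAULT_API_PORT = 8108      # Typesense's default HTTP API port
DEFAULT_PEERING_PORT = 8107  # Typesense's default peering (clustering) port


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_typesense_port(settings):
    """Return a list of warnings about the TYPESENSE_PORT setting."""
    warnings = []
    port = settings.get("TYPESENSE_PORT", "")
    if not port.isdigit():
        warnings.append("TYPESENSE_PORT is missing or not a number")
    elif int(port) == DEFAULT_PEERING_PORT:
        warnings.append(
            "TYPESENSE_PORT=8107 is the default peering port; "
            "the API usually listens on 8108"
        )
    return warnings
```

Running `check_typesense_port(parse_env(open(".env").read()))` against the file above would return a single warning about port 8107. A server started with custom ports would need different constants.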
So you were right. I changed my port to 8108 and the crawler is now running. But I'm getting only 1 nb. Not sure why. These are my configurations.
```
{
"index_name": "sigma-calibration",
"start_urls": [
"
```
And this is the debug I got
```
DEBUG:typesense.api_call:Making get /aliases/sigma-calibration
DEBUG:typesense.api_call:Try 1 to node 172.18.182.239:8108 -- healthy? True
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 172.18.182.239:8108
DEBUG:urllib3.connectionpool:
```
The scraper doesn’t support scraping sites running on non-standard ports unfortunately.
I would recommend running something like ngrok to proxy your local port 3000 to port 443, and then pointing the scraper at the ngrok URL.
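To spot this limitation before starting a crawl, one could check whether a start URL carries an explicit non-default port. The sketch below uses only the standard library; `uses_non_standard_port` is a hypothetical helper and the URLs in the usage note are placeholders, not taken from the thread's config.

```python
from urllib.parse import urlsplit

# Default ports implied by each scheme when the URL has no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def uses_non_standard_port(url):
    """Return True if the URL names a port other than its scheme's default."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    # parts.port is None when the URL carries no explicit port.
    return parts.port is not None and parts.port != default
```

With this, `uses_non_standard_port("http://localhost:3000/docs")` is true, while a proxied address such as `https://example.com/docs` is not flagged.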
Since I'm only running on dev, I got it working with port forwarding. Thanks a lot for your help! Everything seems to be working now.
Vinicius
Thu, 22 Jun 2023 20:10:53 UTC
Hi team, thank you all for this amazing development. Due to network restrictions, I'm currently trying to run the typesense-docsearch-scraper locally. So far I've installed typesense-server with WSL Ubuntu and set it up. The server is running and seems to be fine, as curl returns {"ok":true}, but I can't get the scraper to work.
I keep getting the same requests.exceptions.JSONDecodeError, raised by virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/requests/models.py.
From my debugging, I know that this is triggered by the create_tmp_collection function in typesense_helper.py.
When it tries to delete with self.typesense_client.collections[self.collection_name_tmp].delete(), the client can't find a collection with that name. Then, inside /virtualenvs/typesense-docsearch-scraper-RhF6cRUK/lib/python3.10/site-packages/typesense/api_call.py,
r.text is [127.0.1.1:8107][E1002]Fail to find method on `/collections' and r.status_code is 404.
When it then runs error_message = r.json().get('message', 'API error.') on line 114 against that response, it raises the requests.exceptions.JSONDecodeError in models.py.
From my understanding, it should raise the ObjectNotFound exception, which the try block in the helper would then handle. Right?
What am I doing wrong here?
Thank you in advance for your attention
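The failure described in this message comes from the response body not being JSON, so parsing it fails before the 404 can be mapped to an exception such as ObjectNotFound. A minimal sketch of a more tolerant message extraction follows, using only the standard library. `extract_error_message` is a hypothetical helper and is not how the typesense client is implemented.

```python
import json


def extract_error_message(body, default="API error."):
    """Return the 'message' field of a JSON error body.

    Falls back to the raw text when the body is not valid JSON, and to
    `default` when the body is empty or is JSON without a 'message' field.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return body.strip() or default
    if isinstance(payload, dict):
        return payload.get("message", default)
    return default


# Body reported above when the client was pointed at port 8107.
non_json_body = "[127.0.1.1:8107][E1002]Fail to find method on `/collections'"
```

Here `extract_error_message(non_json_body)` returns the raw text unchanged, which would have surfaced the port 8107 hint directly instead of a JSON parsing error.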