#community-help

Troubleshooting Issues with DocSearch Hits and Scraper Configuration

TLDR Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.

Mar 20, 2023 (7 months ago)
Rubai
07:03 PM
here is the result
(image attached)
Jason
07:03 PM
Awesome 🙌
Jason
07:04 PM
CC: Abhishek Thank you for that PR! ^ I’ll publish this change for you in the docusaurus theme as well by EOD

Rubai
07:05 PM
thanks a lot, convert the blank space to lvl0
Rubai
07:13 PM
One more thing: is it behaving as expected? I searched for a key and got the result at the top, but I also got example text that isn't required. You can see the document under "backdrop".

Also, can we add (...) at the start of a hit when the match is inside a long text? That would make it clear that there is some text before the match.
(image attached)
Jason
07:40 PM
> because I searched a key & got the result on the top but I also got the example or text which is not required
The scraper just shows all content that is on the page, as specified by the CSS selectors. If you don’t want examples to show up, you need to exclude them via CSS selectors.
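A minimal sketch of that exclusion, assuming the scraper config accepts the `selectors_exclude` key it inherits from the original DocSearch scraper format. The start URL, the selector values, and the excluded class names are hypothetical placeholders; only `Developer_Docs` comes from this thread.

```javascript
// Sketch of a scraper config that skips example blocks.
// Key names follow the DocSearch scraper config format; all values are placeholders.
const scraperConfig = {
  index_name: "Developer_Docs", // collection name seen in this thread
  start_urls: ["https://docs.example.com/"], // hypothetical URL
  selectors: {
    lvl0: "h1",
    lvl1: "h2",
    lvl2: "h3",
    text: "p, li",
  },
  // Elements matching these selectors are dropped before content is extracted.
  selectors_exclude: [".code-example", ".sample-response"], // hypothetical class names
};

// The scraper reads its config as JSON, so serialize it before saving it to a file.
const scraperConfigJson = JSON.stringify(scraperConfig, null, 2);
console.log(scraperConfigJson);
```

After changing the config, the site has to be scraped again for the excluded content to disappear from the hits.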
Jason
07:42 PM
> and can we add (...) at the start of hits if the match result are on a long text . so it’s easy to understand such that there have some text before that
should be shown at the end of the hits technically… looks like that’s hidden in the UI.

In your docsearch initialization code, could you try adding this:

typesenseSearchParameters: {
  filter_by: '...',
  highlight_affix_num_tokens: 3,
},
Rubai
07:46 PM
nothing changed after adding this
Jason
07:47 PM
Could you share the response from Typesense?
Rubai
07:50 PM
(image attached)
Jason
07:51 PM
Hmm that doesn’t seem like the same API response for the search query in your screenshot
Jason
07:52 PM
Could you open the network inspector first, then type in that search query and then send me the api response of the last call to multi_search?
Rubai
07:52 PM
sure
Jason
08:42 PM
Ah, could you also set snippet_threshold: 5?
Rubai
08:47 PM
in configJS?
Jason
08:47 PM
typesenseSearchParameters: {
  filter_by: '...',
  snippet_threshold: 5,
},
Jason
08:48 PM
in the docsearch initialization code
Rubai
08:49 PM
Nothing changed. I can't see anything like ... on the hits.
Jason
08:49 PM
Hmm, could you share the curl request to Typesense and the response once again?
Jason
08:50 PM
Did anything change in the UI at all or does it still look the same?
Rubai
08:50 PM
yes
Rubai
08:51 PM
I added snippet_threshold: 5, but I'm still getting the same result as before
Rubai
08:52 PM
curl '' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: text/plain' \
  -H 'Origin: ' \
  -H 'Pragma: no-cache' \
  -H 'Referer: ' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"searches":[{"collection":"Developer_Docs","q":"to be present","query_by":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content","include_fields":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content,anchor,url,type,id","highlight_full_fields":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content","group_by":"url","group_limit":3,"sort_by":"item_priority:desc","filter_by":"product_tag:=payment-page_android","snippet_threshold":5}]}' \
  --compressed
Jason
08:53 PM
Could you run ngrok for port 8108 temporarily? I’d like to be able to reach that Typesense server
Rubai
08:54 PM
I will run it for 8108
Jason
08:54 PM
ok, could you share that url with me?
Rubai
08:55 PM
sure give me a moment
Jason
09:09 PM
Let’s try this:

      "snippet_threshold": 5,
      "highlight_affix_num_tokens": 3
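For context, a rough sketch of where these two parameters sit in the docsearch initialization code, assuming the typesense-docsearch.js v3 option names. The container selector, host, port, and API key are placeholders, and the filter value is copied from the curl request earlier in the thread. Roughly, `snippet_threshold` controls how short a field must be to be highlighted in full instead of snippeted, and `highlight_affix_num_tokens` controls how many tokens surround the match.

```javascript
// Builds the options object passed to docsearch() from typesense-docsearch.js.
// Container, host, port and apiKey are placeholders, not values from this thread.
function buildDocsearchOptions(filterBy) {
  return {
    container: "#searchbar",
    typesenseCollectionName: "Developer_Docs",
    typesenseServerConfig: {
      nodes: [{ host: "localhost", port: "8108", protocol: "http" }],
      apiKey: "<SEARCH_ONLY_API_KEY>",
    },
    typesenseSearchParameters: {
      filter_by: filterBy,
      // Fields longer than this are returned as snippets rather than in full.
      snippet_threshold: 5,
      // Number of tokens kept on each side of the highlighted match.
      highlight_affix_num_tokens: 3,
    },
  };
}

const docsearchOptions = buildDocsearchOptions("product_tag:=payment-page_android");
// In the browser, with the library loaded: docsearch(docsearchOptions);
console.log(docsearchOptions.typesenseSearchParameters);
```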
Rubai
09:12 PM
getting same results
Jason
09:12 PM
Hmm the response is definitely different from the API
Jason
09:12 PM
Could you share a screenshot?
Jason
09:12 PM
of the UI
Rubai
09:14 PM
(image attached)
Rubai
09:15 PM
yes
Jason
09:16 PM
So it’s working now yeah?
Jason
09:16 PM
Or did I misunderstand the issue
Rubai
09:17 PM
No. You can see there is no ... for the text. What I'm trying to say is that the screenshots were taken at different times but show the same result, which is why they look the same.
Rubai
09:19 PM
See the 2nd hit for "to be present". It's a long text, which is why I want to add ... at the start.
Rubai
09:21 PM
Also, the 2nd one is a closer match than the 1st one, but it's not at the top.
Jason
09:36 PM
Ahh got it, you’re talking about the ellipsis specifically…
Jason
09:36 PM
Looking into it
Jason
09:39 PM
To debug the order of the results, could you upgrade your Typesense server to 0.24.1.rc10 and let me know?
Rubai
09:48 PM
As of now the order of the results is fine.
What I mean is that text hits like this should start or end with ... . I got this from the https://docusaurus.io/ site.
(image attached)
Jason
09:55 PM
Could you upgrade to 3.4.0-1 and check now?
Rubai
10:04 PM
I'm getting an error on the given version
(image attached)
Mar 21, 2023 (7 months ago)
Jason
03:17 AM
Could you try with 3.4.0-8
Rubai
08:54 AM
yeah, it's working
Rubai
09:12 AM
It was working fine a while ago, but now I'm facing this error
(image attached)
Rubai
09:13 AM
The search bar breaks when the search popup opens
Jason
01:39 PM
Could you share curl request and response for that screenshot?
Rubai
02:10 PM
curl '' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: text/plain' \
  -H 'Origin: ' \
  -H 'Pragma: no-cache' \
  -H 'Referer: ' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"searches":[{"collection":"Developer_Docs","q":"session api","query_by":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content","include_fields":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content,anchor,url,type,id","highlight_full_fields":"hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content","group_by":"url","group_limit":3,"sort_by":"item_priority:desc","snippet_threshold":5,"highlight_affix_num_tokens":3,"filter_by":"product_tag:=payment-page_android"}]}' \
  --compressed

https://gist.github.com/rubai99/8f21fb34638fd68f0683137d4e6ee810
Sometimes it works, sometimes it breaks. Right now it's breaking.
Jason
04:07 PM
Could you update your script tag to this:

<script src=""></script>

And then try replicating the same error in that screenshot and post a stack trace?

(This hopefully pulls in the source map and shows a proper stack trace)
Rubai
04:28 PM
I get this error when I click on the search bar. It happens with this version only.
When I first added the version, it worked fine.
After 5-6 minutes I got an error.
Then after 1 hour I added this version again and saw the same thing: it worked for some time, then suddenly threw this error. I'm getting the error now as well.
(image attached)
Jason
04:30 PM
Could you click on “Snippet.js:14:52” in that stack trace and post a screenshot?
Jason
04:31 PM
Looks like the Typesense URL is pointing to localhost, so I can’t see any search results because of that
Rubai
04:32 PM
(image attached)
Rubai
04:34 PM
I think the localhost URL is not the issue, because it works for some time and then suddenly runs into this.
Jason
04:34 PM
Yup yup, that’s a different issue, just with the ngrok URL you shared earlier
Jason
04:34 PM
I’m pushing a potential fix for the root cause.. let’s see
Rubai
04:34 PM
sure
Jason
04:37 PM
Could you try with 3.4.0-9?
Rubai
04:37 PM
sure
Rubai
04:38 PM
Now it's working fine. You can check with my ngrok URL.
Jason
04:39 PM
The ngrok URL still doesn’t work for me, because it’s trying to connect to localhost:8108 to talk to Typesense
(image attached)
Jason
04:40 PM
To get it to work, you would have to start a separate ngrok tunnel for port 8108, then use that ngrok URL in the docsearch init code as the typesense hostname… But that’s too much effort, so that’s fine.

Happy to hear that it works now!
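For anyone who does want to go that route, a small sketch of turning a tunnel URL into the node entry used in the docsearch init code. It assumes the `typesenseServerConfig.nodes` shape from typesense-docsearch.js; the tunnel URL and the helper name `nodeFromTunnelUrl` are hypothetical.

```javascript
// Converts a public tunnel URL (for the tunnel that forwards to port 8108)
// into a node entry for typesenseServerConfig.nodes.
function nodeFromTunnelUrl(tunnelUrl) {
  const url = new URL(tunnelUrl);
  const protocol = url.protocol.replace(":", ""); // "https:" becomes "https"
  // URL.port is empty when the default port for the protocol is used.
  const port = url.port || (protocol === "https" ? "443" : "80");
  return { host: url.hostname, port, protocol };
}

// Hypothetical tunnel URL; a real one is printed by ngrok when the tunnel starts.
const tunnelNode = nodeFromTunnelUrl("https://example-tunnel.ngrok.io");
console.log(tunnelNode);
```

The resulting object would replace the `localhost` node in the init code, so that a remote browser talks to Typesense through the tunnel instead of its own localhost.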
Rubai
04:44 PM
(image attached)
Rubai
04:47 PM
Thanks Jason. I've asked about a lot of issues so far, sorry for that.
Now it's working fine. Great work 👏
Jason
04:48 PM
That’s great to hear! Thank you for helping catch all these issues!

Mar 22, 2023 (6 months ago)
Rubai
12:37 PM
Hi Jason, is it possible to run multiple collections? We have two products, payment-page and upi-inapp, in our documentation.
Suppose we first run the scraper for the collection Developer_Docs_upi-inapp and then run it again for the other collection, Developer_Docs_payment-page. Can we then access both collections in a single documentation site?

The benefit is that when something changes for one product, we can re-scrape only that product's collection, so we don't need to run the scraper for every product's collection.
For reference, you can check our documentation: https://docs.juspay.in/
(image attached)
Jason
03:40 PM
Unfortunately this is not possible to do with the scraper - it creates a whole new collection each time.

So you would have to fork the scraper and update it appropriately, if you want to do partial scraping into the same collection

Mar 23, 2023 (6 months ago)
Rubai
07:31 AM
Jason, is there any API to run the scraper in production?
Also, what changes are needed in .env for production?
TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=host.docker.internal
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http
Rubai
07:05 PM
Hey Jason, any update on this?
Jason
07:29 PM
You would have to use something like say AWS Fargate (or any docker-based runtime even in your CI pipeline) to run the scraper using the docker image
Jason
07:30 PM
On Typesense Cloud, we only host the Typesense cluster itself - you still need to run the scraper in your infrastructure
Jason
07:30 PM
If you’re using Typesense Cloud, the .env file would look something like this:

TYPESENSE_API_KEY=<GENERATED_FROM_DASHBOARD>
TYPESENSE_HOST=
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
Jason
07:31 PM
The host and api key will be generated once you provision a cluster
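To make the difference between the two setups concrete, here is a small sketch that renders the scraper's .env contents from a settings object. The helper name `renderScraperEnv` is hypothetical, and the API key and cluster hostname are placeholders that only exist once a cluster is provisioned.

```javascript
// Renders the four TYPESENSE_* variables the scraper reads from its .env file.
function renderScraperEnv({ apiKey, host, port, protocol }) {
  return [
    `TYPESENSE_API_KEY=${apiKey}`,
    `TYPESENSE_HOST=${host}`,
    `TYPESENSE_PORT=${port}`,
    `TYPESENSE_PROTOCOL=${protocol}`,
  ].join("\n") + "\n";
}

// Local setup, as posted earlier in the thread.
const localEnv = renderScraperEnv({
  apiKey: "xyz",
  host: "host.docker.internal",
  port: 8108,
  protocol: "http",
});

// Hosted cluster: HTTPS on port 443; key and hostname are placeholders.
const cloudEnv = renderScraperEnv({
  apiKey: "<GENERATED_FROM_DASHBOARD>",
  host: "<CLUSTER_HOSTNAME>",
  port: 443,
  protocol: "https",
});

console.log(localEnv);
console.log(cloudEnv);
```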
Rubai
07:36 PM
And what about the chromedriver path?
Jason
07:37 PM
You can leave that as the default, there’s a chrome executable inside the docker image we publish
Rubai
07:38 PM
So do we have to build an API to run the scraper in production?
Jason
07:49 PM
You can just run the docker command directly
Jason
07:49 PM
For eg, here’s how we call AWS Fargate that runs the docker command for us for the Typesense docs website: https://github.com/typesense/typesense-website/blob/cca16595a480dc880145bf8b01b8464476ba051e/docs-site/package.json#L12
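As a rough illustration of running the docker command directly from a script or CI job, the sketch below assembles the arguments for `docker run`. It assumes the scraper image takes its config through a `CONFIG` environment variable and its Typesense settings through an env file; the helper name, the file path, and the `<version>` tag are placeholders.

```javascript
// Assembles the argument list for `docker run` without executing it.
// envFilePath points at the .env file with the TYPESENSE_* variables.
function buildScraperDockerArgs(envFilePath, scraperConfig, image) {
  return [
    "run",
    "--rm",
    `--env-file=${envFilePath}`,
    "-e",
    `CONFIG=${JSON.stringify(scraperConfig)}`, // config passed as a JSON string
    image,
  ];
}

const dockerArgs = buildScraperDockerArgs(
  "./.env", // placeholder path
  { index_name: "Developer_Docs", start_urls: ["https://docs.example.com/"] }, // placeholder config
  "typesense/docsearch-scraper:<version>" // pin to a real published tag
);

// To actually run it from Node:
// require("child_process").spawn("docker", dockerArgs, { stdio: "inherit" });
console.log(["docker", ...dockerArgs].join(" "));
```

Passing the arguments as an array to `spawn` avoids shell-quoting problems with the JSON config.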
Mar 25, 2023 (6 months ago)
Rubai
10:21 PM
Can you please take a look at why I'm getting this error?
It occurs while building dockerfile:base from typesense-docsearch-scraper.
Can I change the version to 111.0.5563.110-1? Would changing the version affect anything?
(image attached)
Rubai
10:37 PM
I'm also getting this error while running the scraper. What is the issue here?
(image attached)
Mar 26, 2023 (6 months ago)
Jason
01:52 AM
You need to build the scraper on a Linux machine with an Intel CPU. Building it on a Mac, especially an M1, doesn’t seem to work.
Jason
01:53 AM
Any reason you’re not using the prebuilt Docker image we’ve published?
Rubai
11:11 AM
We made some changes, which is why we're not using the prebuilt Docker image. We use a dynamic config.
Rubai
06:51 PM
Then do I need to change anything in the code to build the scraper for pod deployment?
Jason
07:59 PM
I haven’t been able to get the scraper to build on M1. So you have to spin up a Linux VM and build it from there
Rubai
11:32 PM
That covers the first time. After that, whenever the docs change we have to scrape them again. What should we do in that case?
Mar 27, 2023 (6 months ago)
Jason
12:45 AM
You have to push the docker image you build to a docker registry, and then pull the pre-built image from there anytime you want to scrape
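A sketch of that build-once, pull-many workflow as plain docker commands, assembled as argument lists. The registry reference, the Dockerfile path, and the helper name are hypothetical; the build step is meant to run on the Linux machine mentioned above.

```javascript
// Returns the docker commands for building a custom scraper image once,
// pushing it to a registry, and pulling it again before each scrape.
function imageLifecycleCommands(imageRef, dockerfilePath) {
  return {
    build: ["docker", "build", "-f", dockerfilePath, "-t", imageRef, "."],
    push: ["docker", "push", imageRef],
    pull: ["docker", "pull", imageRef],
  };
}

const lifecycle = imageLifecycleCommands(
  "registry.example.com/docs/custom-docsearch-scraper:1", // hypothetical registry and tag
  "./Dockerfile" // hypothetical path to the customized scraper Dockerfile
);

// Print the commands in the order they would be run.
for (const step of ["build", "push", "pull"]) {
  console.log(lifecycle[step].join(" "));
}
```

Only the pull (followed by a `docker run`) is needed for routine re-scrapes; build and push are repeated only when the scraper code itself changes.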
Rubai
09:45 AM
For Typesense, can we change host='host.docker.internal' to host='localhost'? We don't use Docker to run the scraper as of now. We run it from VS Code via an API and are getting this error.
(image attached)
Jason
03:54 PM
Yes, you can change that to any hostname, including localhost in the .env file you’re using
Mar 28, 2023 (6 months ago)
Rubai
07:44 AM
sitemap not working for me