Crawler Deleting Old Collection and Creating New Name
TLDR James faced issues with Typesense because the crawler kept changing collection names, which broke their production website. Jason explained that the scraper writes each run to a timestamped collection behind an alias named after "index_name", and suggested setting "index_name" in the scraper config file to the desired name.
Feb 08, 2023 (8 months ago)
James
08:45 PM I read the entire documentation, but it doesn't mention anything about managing collection names or specifying the intended collection name to the scraper:
https://typesense.org/docs/guide/docsearch.html#run-the-scraper
Can someone link me to the correct documentation for this? I feel like I must be missing something basic.
Jason
08:50 PM Every time the scraper runs it does the following:
1. Look at the index_name field in your docsearch-scraper config file (let’s say it’s defined as index_name: docs).
2. Create a new collection called docs_<current_unix_timestamp>.
3. Create/update an alias called docs to point to docs_<current_unix_timestamp>.
4. Delete the previously scraped version of the docs, stored in docs_<previous_timestamp>.
Think of the docs alias as a symlink that points to the latest scraped version of the docs. The scraper handles this update automatically.
In the docsearch config on the frontend, you want to use docs as the index name, instead of the timestamped collection name.
Jason
08:53 PM IsaacScript. ~Actually that’s not the scraper configuration. That’s the FE configuration.
Could you share your docsearch-scraper configuration?
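The four steps Jason lists can be modeled in a few lines. This is only an in-memory sketch of the alias rotation, not the scraper's actual code: FakeTypesense and run_scraper are hypothetical names invented for illustration, and the timestamps are made up.

```python
import time


class FakeTypesense:
    """In-memory stand-in for a Typesense server (hypothetical, illustration only)."""

    def __init__(self):
        self.collections = set()  # names of existing collections
        self.aliases = {}         # alias name -> collection name


def run_scraper(server, index_name, now=None):
    """Model one scraper run as described in the steps above."""
    # Step 1: index_name comes from the docsearch-scraper config file.
    ts = int(time.time()) if now is None else now

    # Step 2: create a new collection called <index_name>_<unix_timestamp>.
    new_collection = f"{index_name}_{ts}"
    server.collections.add(new_collection)

    # Step 3: create/update the alias so it points at the new collection.
    previous = server.aliases.get(index_name)
    server.aliases[index_name] = new_collection

    # Step 4: delete the previously scraped collection, if there was one.
    if previous is not None and previous != new_collection:
        server.collections.discard(previous)
    return new_collection


server = FakeTypesense()
run_scraper(server, "docs", now=1675889100)  # first run
run_scraper(server, "docs", now=1675975500)  # second run, a day later

print(server.aliases)      # {'docs': 'docs_1675975500'}
print(server.collections)  # {'docs_1675975500'}
```

After two runs the timestamped collection name has changed, but the alias docs still resolves, which is why the frontend should be pointed at the alias.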
James
08:55 PM
James
08:55 PM
{
  "index_name": "docusaurus-2",
  "start_urls": [
    "https://isaacscript.github.io/"
  ],
  "sitemap_urls": [
    "https://docusaurus.io/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [
    "/tests"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}
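A quick way to catch the mismatch discussed in this thread is to compare the scraper config's index_name with the collection name used on the frontend. The sketch below is an illustration only: check_frontend_name is a hypothetical helper, and it assumes the timestamp suffix is a 10-digit Unix time, which the thread does not state explicitly.

```python
import json
import re

# Assumption: the scraper's suffix is "_" followed by a 10-digit Unix timestamp.
TIMESTAMP_SUFFIX = re.compile(r"_\d{10}$")


def check_frontend_name(scraper_config_text, frontend_collection_name):
    """Compare the frontend collection name with the scraper's index_name."""
    index_name = json.loads(scraper_config_text)["index_name"]
    if frontend_collection_name == index_name:
        return "ok"
    if TIMESTAMP_SUFFIX.search(frontend_collection_name):
        # The frontend is pinned to one scrape, which the next run deletes.
        return f"use the alias '{index_name}' instead of the timestamped collection"
    return f"name mismatch: scraper writes to '{index_name}'"


config = '{"index_name": "docusaurus-2"}'
print(check_frontend_name(config, "docusaurus-2"))
print(check_frontend_name(config, "docusaurus-2_1675889100"))
print(check_frontend_name(config, "IsaacScript"))
```

The last call corresponds to the situation here: the frontend asks for IsaacScript while the scraper config still says docusaurus-2.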
James
08:55 PM
Jason
08:55 PM "index_name": "docusaurus-2"
Jason
08:55 PM
Jason
08:56 PM "index_name": "IsaacScript"
and rerun the scraper
James
08:56 PM
Jason
08:57 PM
Jason
08:57 PM
James
09:12 PM https://github.com/typesense/typesense-website/pull/161/files
Jason
09:43 PM
James
09:44 PM https://github.com/typesense/typesense-website/pull/160/files
Jason
09:46 PM This part feels a little confusing as to whether that includes https:// or not…
James
09:46 PM
Jason
10:04 PM