James Nesta
02/08/2023, 8:45 PM
Jason Bosco
02/08/2023, 8:50 PM
index_name field in your docsearch-scraper config file (let’s say it’s defined as index_name: docs).
2. Create a new collection called docs_<current_unix_timestamp>
3. Create/update an alias called docs to point to docs_<current_unix_timestamp>.
4. Delete the previously scraped version of the docs, stored in docs_<previous_timestamp>
Think of the docs alias as a symlink that points to the latest scraped version of the docs. The scraper handles this update automatically.
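To make the symlink analogy concrete, here is a rough in-memory sketch of steps 2 to 4. It is not the scraper's actual code and does not call any real search server API: FakeSearchServer and reindex are hypothetical names made up for illustration, and the timestamps in the usage lines are arbitrary.

```python
import time


class FakeSearchServer:
    """Hypothetical in-memory stand-in for a server with collections and aliases."""

    def __init__(self):
        self.collections = {}  # collection name -> list of documents
        self.aliases = {}      # alias name -> collection name

    def resolve(self, name):
        # An alias behaves like a symlink: follow it to the real collection name.
        return self.aliases.get(name, name)


def reindex(server, index_name, documents, now=None):
    """Model of steps 2-4: new timestamped collection, repoint alias, drop the old one."""
    timestamp = int(time.time()) if now is None else now
    new_collection = f"{index_name}_{timestamp}"
    previous = server.aliases.get(index_name)

    # Step 2: create a new collection named <index_name>_<unix_timestamp>.
    server.collections[new_collection] = list(documents)
    # Step 3: create or update the alias so it points at the new collection.
    server.aliases[index_name] = new_collection
    # Step 4: delete the previously scraped collection, if there was one.
    if previous is not None and previous != new_collection:
        server.collections.pop(previous, None)
    return new_collection


server = FakeSearchServer()
reindex(server, "docs", [{"url": "/a"}], now=1000)
reindex(server, "docs", [{"url": "/a"}, {"url": "/b"}], now=2000)
print(server.resolve("docs"))
```

Anything that searches through the name docs keeps working across scrapes, because only the alias target changes.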
In the docsearch config on the frontend, you want to use docs as the index name, instead of the timestamped collection name.
Jason Bosco
02/08/2023, 8:53 PM
IsaacScript.
Actually, that’s not the scraper configuration. That’s the FE configuration.
Could you share your docsearch-scraper configuration?
James Nesta
02/08/2023, 8:55 PM
James Nesta
02/08/2023, 8:55 PM
{
  "index_name": "docusaurus-2",
  "start_urls": [
    "https://isaacscript.github.io/"
  ],
  "sitemap_urls": [
    "https://docusaurus.io/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [
    "/tests"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}
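Since the frontend is supposed to search the name given in index_name, a quick sanity check on a config like this one might look like the sketch below. The helper name index_name_mismatch is hypothetical, the config is trimmed to the one field that matters here, and the frontend name IsaacScript is the one mentioned elsewhere in this thread.

```python
import json


def index_name_mismatch(scraper_config_text, frontend_index_name):
    """Return a description of the mismatch, or None if both names agree."""
    scraper_index_name = json.loads(scraper_config_text).get("index_name")
    if scraper_index_name == frontend_index_name:
        return None
    return (
        f"scraper writes to {scraper_index_name!r} "
        f"but the frontend searches {frontend_index_name!r}"
    )


# Trimmed-down stand-in for the scraper config (only index_name is kept).
scraper_config = '{"index_name": "docusaurus-2"}'
print(index_name_mismatch(scraper_config, "IsaacScript"))
```

A non-None result means the scraper is indexing under one name while the search bar queries another.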
James Nesta
02/08/2023, 8:55 PM
Jason Bosco
02/08/2023, 8:55 PM
"index_name": "docusaurus-2"
Jason Bosco
02/08/2023, 8:55 PM
Jason Bosco
02/08/2023, 8:56 PM
"index_name": "IsaacScript"
and rerun the scraper
James Nesta
02/08/2023, 8:56 PM
Jason Bosco
02/08/2023, 8:57 PM
Jason Bosco
02/08/2023, 8:57 PM
James Nesta
02/08/2023, 9:12 PM
Jason Bosco
02/08/2023, 9:43 PM
James Nesta
02/08/2023, 9:44 PM
Jason Bosco
02/08/2023, 9:46 PM
(i.e. the first part of the public URL that end-users will connect to).
This part feels a little confusing as to whether that includes https:// or not…
James Nesta
02/08/2023, 9:46 PM
Jason Bosco
02/08/2023, 10:04 PM