Speeding up Index Creation in WordPress with Typesense Client
TLDR Jennifer needed help with index creation for large datasets in WordPress. Jason suggested using a timestamped collection with Typesense's PHP client and collection aliases to speed up the process, as well as the batch import endpoint for more efficient indexing.
Nov 23, 2021
Jennifer
07:15 PM
Can somebody give me a hand with index creation for large datasets in WordPress?
Jason
07:20 PM
Jennifer
07:21 PM
I am trying to follow the tutorial based on their book collection.
I have a custom post type with over 50,000 entries that I need to index along with its ACF/custom fields.
I was thinking of creating a WP-CLI command for that to speed the process up, but I am fairly new to this, so I am a little bit stuck on that.
Jennifer
07:23 PM
function index_lawyers($client) {
    $paged = 1;
    $count = 0;
    do {
        $posts = new WP_Query([
            'posts_per_page' => 100,
            'paged' => $paged,
            'post_type' => 'rechtsanwalt',
            'post_status' => 'publish'
        ]);
        if (!$posts->have_posts()) {
            break;
        }
        $records = [];
        foreach ($posts->posts as $post) {
            $record = (array) apply_filters('post_to_record', $post);
            if (!isset($record['objectID'])) {
                $record['objectID'] = implode('#', [$post->post_type, $post->ID]);
            }
            // Send the filtered record, not the raw WP_Post object.
            $client->collections['lawyers']->documents->create($record);
            $records[] = $record;
            $count++;
        }
        $paged++;
    } while (true);
}
This is basically a snippet that Algolia provides for setting up the WP-CLI command. I stripped out the original CLI parts because I didn't know how to pass my client over to WP-CLI.
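The loop above issues one HTTP request per post, which adds up over 50,000 entries. The thread summary mentions Typesense's batch import endpoint as the faster route; a minimal sketch of that idea follows. The helper names `to_typesense_document()` and `import_batch()` are hypothetical, and the sketch assumes the `import()` method of the Typesense PHP client, which accepts an array of documents and returns one result per document.

```php
<?php
// Sketch only: both function names below are hypothetical helpers.

/**
 * Convert a record shaped for Algolia into a Typesense document.
 * Typesense identifies documents by a string `id` rather than `objectID`.
 */
function to_typesense_document(array $record): array
{
    if (isset($record['objectID'])) {
        $record['id'] = (string) $record['objectID'];
        unset($record['objectID']);
    }
    return $record;
}

/**
 * Send one page of documents in a single import request.
 * Returns the per-document results that did not succeed.
 */
function import_batch($client, string $collection, array $documents): array
{
    if ($documents === []) {
        return [];
    }
    // 'upsert' lets the command be re-run without duplicate-id errors.
    $results = $client->collections[$collection]->documents->import(
        $documents,
        ['action' => 'upsert']
    );
    return array_values(array_filter(
        $results,
        static fn ($result) => empty($result['success'])
    ));
}
```

Inside the paging loop, this would replace the per-post `create()` call with a single `import_batch($client, 'lawyers', array_map('to_typesense_document', $records))` after the `foreach`.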
Jennifer
07:23 PM
<?php
if (!(defined('WP_CLI') && WP_CLI)) {
    return;
}
class Typesearch_Command {
    public function reindex_rae($args, $assoc_args) {
        global $algolia;
        $index = $algolia->initIndex('racom_rae');
        $index->clearObjects()->wait();
        $paged = 1;
        $count = 0;
        do {
            $posts = new WP_Query([
                'posts_per_page' => 100,
                'paged' => $paged,
                'post_type' => 'rechtsanwalt'
            ]);
            if (!$posts->have_posts()) {
                break;
            }
            $records = [];
            foreach ($posts->posts as $post) {
                // !empty() avoids an undefined-index notice when --verbose is not passed.
                if (!empty($assoc_args['verbose'])) {
                    WP_CLI::line('Serializing ['.$post->post_title.']');
                }
                $record = (array) apply_filters('post_to_record', $post);
                if (!isset($record['objectID'])) {
                    $record['objectID'] = implode('#', [$post->post_type, $post->ID]);
                }
                $records[] = $record;
                $count++;
            }
            if (!empty($assoc_args['verbose'])) {
                WP_CLI::line('Sending batch');
            }
            $index->saveObjects($records);
            $paged++;
        } while (true);
        WP_CLI::success("$count lawyers indexed in Algolia");
    }
}
WP_CLI::add_command('typesearch', 'Typesearch_Command');
Jennifer
07:23 PM
Jennifer
07:24 PM
$client = new Client xyz
I don't know how the CLI command would know about it
Jason
07:28 PM
global $algolia;
$index = $algolia->initIndex('racom_rae');
Jason
07:29 PM
$algolia is instantiated? You'd want to have the equivalent and instantiate the Typesense PHP client the same way
Jason
07:31 PM
Jennifer
07:49 PM
my main-plugin file and my wp-cli file
main
/**
 * Plugin Name: Racom: Typesearch Custom Integration
 * Description: Add Typesearch Search feature
 * Version: 1.0.0
 *
 */
require_once __DIR__ . '/vendor/autoload.php';
// require_once __DIR__ . '/wp-cli.php';
use Typesense\Client;
$client = new Client(
    [
        'api_key' => '7Tpl7VzcAXOTJofvTqj4u6dDgnRDXGHZWEuhPFA6Sex1yuUM',
        'nodes' => [
            [
                'host' => 'localhost', // For Typesense Cloud use
                'port' => '8108', // For Typesense Cloud use 443
                'protocol' => 'http', // For Typesense Cloud use https
            ],
        ],
        'connection_timeout_seconds' => 2,
    ]
);
$lawyerSchema = [
    'name' => 'lawyers',
    'fields' => [
        ['name' => 'title', 'type' => 'string'],
        ['name' => 'rechtsgebiete', 'type' => 'string[]', 'facet' => true],
        ['name' => 'rechtsgebiete_spezialisierungen', 'type' => 'string[]', 'facet' => true],
        ['name' => 'fachanwaltschaften', 'type' => 'string[]', 'facet' => true],
        ['name' => 'sprachen', 'type' => 'string[]', 'facet' => true],
        ['name' => 'zip', 'type' => 'int32', 'facet' => true],
        ['name' => 'street', 'type' => 'string', 'facet' => true],
        ['name' => 'city', 'type' => 'string', 'facet' => true],
        ['name' => 'country', 'type' => 'string', 'facet' => true],
        ['name' => 'sortkey', 'type' => 'float'],
    ],
    'default_sorting_field' => 'sortkey'
];
$client->collections->create($lawyerSchema);
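Every field in the schema above is required, and Typesense rejects documents whose values do not match the declared types, so the `post_to_record` filter has to produce exactly this shape. As a rough sketch of that mapping, the hypothetical helper `lawyer_to_document()` below takes the post ID, the title, and an array of custom field values (however they were loaded, for example via ACF) and coerces them to the schema's types. The default values used for missing fields are assumptions, not something the thread specifies.

```php
<?php
// Sketch only: lawyer_to_document() is a hypothetical helper, and the
// fallback values for missing fields are assumptions.

function lawyer_to_document(int $postId, string $title, array $fields): array
{
    // Normalise a custom field to a clean list of strings (string[] in the schema).
    $list = static function ($value): array {
        $items = array_filter(
            (array) $value,
            static fn ($item) => (is_string($item) && $item !== '') || is_int($item) || is_float($item)
        );
        return array_values(array_map('strval', $items));
    };

    $doc = [
        'id' => (string) $postId, // Typesense document ids are strings
        'title' => $title,
    ];
    foreach (['rechtsgebiete', 'rechtsgebiete_spezialisierungen', 'fachanwaltschaften', 'sprachen'] as $name) {
        $doc[$name] = $list($fields[$name] ?? []);
    }
    // int32 in the schema; note that an integer cast drops leading zeros.
    $doc['zip'] = (int) ($fields['zip'] ?? 0);
    foreach (['street', 'city', 'country'] as $name) {
        $doc[$name] = (string) ($fields[$name] ?? '');
    }
    $doc['sortkey'] = (float) ($fields['sortkey'] ?? 0);

    return $doc;
}
```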
Jennifer
07:49 PM
Jennifer
07:50 PM
global $algolia to use global $client
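A rough sketch of what that swap could look like is below. The names `racom_typesense_settings()`, `Racom_Typesense_Command`, and the `RACOM_TYPESENSE_API_KEY` constant are hypothetical. One detail worth checking: WP-CLI loads WordPress inside a function, so a variable assigned at the top of a plugin file is not automatically global there; declaring it with `global $client;` before the assignment is the usual workaround.

```php
<?php
// Sketch only: racom_typesense_settings(), Racom_Typesense_Command and the
// RACOM_TYPESENSE_API_KEY constant are hypothetical names.

function racom_typesense_settings(string $apiKey): array
{
    return [
        'api_key' => $apiKey,
        'nodes' => [
            ['host' => 'localhost', 'port' => '8108', 'protocol' => 'http'],
        ],
        'connection_timeout_seconds' => 2,
    ];
}

// Main plugin file: declare the variable global before assigning it, so the
// command below can see it even when WordPress is loaded by WP-CLI.
if (class_exists(\Typesense\Client::class) && defined('RACOM_TYPESENSE_API_KEY')) {
    global $client;
    $client = new \Typesense\Client(racom_typesense_settings(RACOM_TYPESENSE_API_KEY));
}

class Racom_Typesense_Command
{
    public function reindex($args, $assoc_args)
    {
        global $client; // takes the place of `global $algolia;`
        if (!($client instanceof \Typesense\Client)) {
            WP_CLI::error('Typesense client is not available.');
        }
        // Cheap connectivity check before any indexing work starts.
        $collections = $client->collections->retrieve();
        WP_CLI::success(count($collections) . ' collections reachable.');
    }
}

if (defined('WP_CLI') && WP_CLI) {
    WP_CLI::add_command('typesearch', 'Racom_Typesense_Command');
}
```

With this in place the command would be invoked as `wp typesearch reindex`.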
Jason
07:51 PM
$client->collections->create($lawyerSchema) to the other file, just before creating documents
Jennifer
07:54 PM
Should I maybe put it into something like a plugin activation hook instead?
Jason
07:56 PM
Jennifer
07:56 PM
Jason
07:56 PM
So create a new timestamped collection every time, and at the end swap the alias to point to the collection
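The timestamped-collection approach can be sketched roughly as follows. `timestamped_collection_name()` and `reindex_with_alias()` are hypothetical helpers, `$indexer` stands for whatever callable fills the new collection, and the sketch assumes the alias name is not also in use as a real collection name. Searches would then target the alias (`lawyers`) instead of a concrete collection, so readers keep hitting the old data until the swap.

```php
<?php
// Sketch only: both function names are hypothetical helpers.

function timestamped_collection_name(string $alias, ?int $now = null): string
{
    return $alias . '_' . ($now ?? time());
}

function reindex_with_alias($client, array $schema, string $alias, callable $indexer, ?int $now = null): string
{
    $newName = timestamped_collection_name($alias, $now);

    // Remember which collection the alias points to right now, if any.
    try {
        $old = $client->aliases[$alias]->retrieve()['collection_name'] ?? null;
    } catch (\Typesense\Exceptions\ObjectNotFound $e) {
        $old = null; // first run: the alias does not exist yet
    }

    // Build the new collection while the old one keeps serving searches.
    $schema['name'] = $newName;
    $client->collections->create($schema);
    $indexer($client, $newName);

    // Point the alias at the freshly built collection.
    $client->aliases->upsert($alias, ['collection_name' => $newName]);

    // Drop the previous collection once nothing references it.
    if ($old !== null && $old !== $newName) {
        $client->collections[$old]->delete();
    }
    return $newName;
}
```

Because each run creates a collection under a new name, this also sidesteps the error the client raises when `create()` is called for a collection that already exists.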
Jennifer
07:56 PM
Jason
07:57 PM
Jason
07:57 PM
Jennifer
07:58 PM
Jennifer
07:58 PM
Jason
07:58 PM
Jason
07:59 PM