#community-help

Speeding up Index Creation in WordPress with Typesense Client

TLDR Jennifer needed help with index creation for large datasets in WordPress. Jason suggested using a timestamped collection with Typesense's PHP client and collection aliases to speed up the process, as well as the batch import endpoint for more efficient indexing.

Nov 23, 2021 (24 months ago)
Jennifer
07:15 PM
👋 Hi everyone!

can somebody give me a hand on index creation of large datasets in wordpress?
Jason
07:20 PM
Hi Jennifer! I can help. Are you running into any specific issues?
Jennifer
07:21 PM
Yes and no...lol

I am trying to follow the tutorial based on their book collection.

I have a custom post type with over 50,000 entries that I need to index with ACF/custom fields.

I was thinking of creating a wp-cli command for that to speed the process up, but I am fairly new to this so I am a little bit stuck on that
Jennifer
07:23 PM
function index_lawyers($client) {
    $paged = 1;
    $count = 0;

    do {
        $posts = new WP_Query([
            'posts_per_page' => 100,
            'paged' => $paged,
            'post_type' => 'rechtsanwalt',
            'post_status' => 'publish'
        ]);

        if (!$posts->have_posts()) {
            break;
        }

        $records = [];

        foreach ($posts->posts as $post) {
            $record = (array) apply_filters('post_to_record', $post);

            if (!isset($record['objectID'])) {
                $record['objectID'] = implode('#', [$post->post_type, $post->ID]);
            }
            $client->collections['lawyers']->documents->create($post);

            $records[] = $record;
            $count++;
        }

        $paged++;

    } while (true);
}

This is basically a snippet that was provided by Algolia to set up the wp-cli command. I stripped out its original CLI parts because I didn't know how to pass my client over to wp-cli.
Jennifer
07:23 PM
<?php

if (!(defined('WP_CLI') && WP_CLI)) {
    return;
}

class Typesearch_Command {
    public function reindex_rae($args, $assoc_args) {
        global $algolia;
        $index = $algolia->initIndex('racom_rae');

        $index->clearObjects()->wait();

        $paged = 1;
        $count = 0;

        do {
            $posts = new WP_Query([
                'posts_per_page' => 100,
                'paged' => $paged,
                'post_type' => 'rechtsanwalt'
            ]);

            if (!$posts->have_posts()) {
                break;
            }

            $records = [];

            foreach ($posts->posts as $post) {
                if ($assoc_args['verbose']) {
                    WP_CLI::line('Serializing ['.$post->post_title.']');
                }
                $record = (array) apply_filters('post_to_record', $post);

                if (!isset($record['objectID'])) {
                    $record['objectID'] = implode('#', [$post->post_type, $post->ID]);
                }

                $records[] = $record;
                $count++;
            }

            if (isset($assoc_args) && $assoc_args['verbose']) {
                WP_CLI::line('Sending batch');
            }

            $index->saveObjects($records);

            $paged++;

        } while (true);

        WP_CLI::success("$count lawyers indexed in Algolia");
    }
}


WP_CLI::add_command('typesearch', 'Typesearch_Command');
Jennifer
07:23 PM
this is the original class
Jennifer
07:24 PM
But as I said, when I do $client = new Client xyz, I don't know how the CLI command would know about it.
Jason
07:28 PM
This line is interesting in the original Class:

global $algolia;
$index = $algolia->initIndex('racom_rae');
Jason
07:29 PM
Could you check where $algolia is instantiated? You'd want to have the equivalent and instantiate the Typesense PHP client the same way
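
As a rough illustration of that suggestion, the sketch below exposes the client through a small accessor function instead of a bare variable. The function name `racom_typesense_client()` and the `TYPESENSE_API_KEY` environment variable are hypothetical; the node settings are the ones from the plugin file in this thread. An accessor is used because WP-CLI loads WordPress inside a function, so a top-level `$client` in a plugin file is not necessarily a true global there.

```php
<?php
// Minimal sketch, not the Typesense-recommended setup.
// racom_typesense_client() and TYPESENSE_API_KEY are hypothetical names.
// Assumes vendor/autoload.php has been required when the default factory runs.

/**
 * Return one shared client per PHP process, building it on first use.
 * $factory is optional and only exists to allow swapping the client out.
 */
function racom_typesense_client(?callable $factory = null)
{
    static $client = null;

    if ($client === null) {
        $factory = $factory ?? function () {
            return new \Typesense\Client([
                'api_key' => getenv('TYPESENSE_API_KEY') ?: '',
                'nodes'   => [
                    ['host' => 'localhost', 'port' => '8108', 'protocol' => 'http'],
                ],
                'connection_timeout_seconds' => 2,
            ]);
        };
        $client = $factory();
    }

    return $client;
}
```

Inside `reindex_rae()`, `$client = racom_typesense_client();` would then take the place of `global $algolia;`, provided the file that defines the function is loaded before the command runs.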
Jason
07:31 PM
On a side note, in Typesense you'd have to create a new collection (equivalent of initIndex in Algolia) first before you add documents to it: https://typesense.org/docs/0.21.0/api/collections.html#create-a-collection
Jennifer
07:49 PM
I have 2 files: my main plugin file and my wp-cli file.
Main plugin file:


/**
 * Plugin Name:     Racom: Typesearch Custom Integration
 * Description:     Add Typesearch Search feature
 * Version:         1.0.0
 *
 */

require_once __DIR__ . '/vendor/autoload.php';
// require_once __DIR__ . '/wp-cli.php';

use Typesense\Client;

$client = new Client(
    [
        'api_key'         => '7Tpl7VzcAXOTJofvTqj4u6dDgnRDXGHZWEuhPFA6Sex1yuUM',
        'nodes'           => [
            [
                'host'     => 'localhost', // For Typesense Cloud use 
                'port'     => '8108',      // For Typesense Cloud use 443
                'protocol' => 'http',      // For Typesense Cloud use https
            ],
        ],
        'connection_timeout_seconds' => 2,
    ]
);

$lawyerSchema = [
    'name' => 'lawyers',
    'fields' => [
        ['name' => 'title', 'type' => 'string'],
        ['name' => 'rechtsgebiete', 'type' => 'string[]', 'facet' => true],
        ['name' => 'rechtsgebiete_spezialisierungen', 'type' => 'string[]', 'facet' => true],
        ['name' => 'fachanwaltschaften', 'type' => 'string[]', 'facet' => true],
        ['name' => 'sprachen', 'type' => 'string[]', 'facet' => true],
        ['name' => 'zip', 'type' => 'int32', 'facet' => true],
        ['name' => 'street', 'type' => 'string', 'facet' => true],
        ['name' => 'city', 'type' => 'string', 'facet' => true],
        ['name' => 'country', 'type' => 'string', 'facet' => true],

        ['name' => 'sortkey', 'type' => 'float'],
    ],
    'default_sorting_field' => 'sortkey'
];

$client->collections->create($lawyerSchema);

Jennifer
07:49 PM
and the other code was from the wp-cli file
Jennifer
07:50 PM
I guess what I could try is to use global $client instead of global $algolia

Jason
07:51 PM
Does the main-plugin file get called each time the plugin is loaded? If so, we don't want to create the collection in Typesense each time the plugin is loaded. Instead you'd want to move $client->collections->create($lawyerSchema) to the other file, just before creating documents
Jennifer
07:54 PM
That would mean that if I wanted to update the documents, it would create the schema again too, right?

Should I maybe put it into something like a plugin activation hook instead?
Jason
07:56 PM
It would... I'd imagine every time you run the CLI command you're essentially trying to reindex the entire dataset
Jennifer
07:56 PM
that is the idea
Jason
07:56 PM
One thing to do is use collection aliases: https://typesense.org/docs/0.21.0/api/collection-alias.html#create-or-update-an-alias

So create a new timestamped collection every time, and at the end swap the alias to point to the new collection
Jennifer
07:56 PM
Since this is all development right now, I will probably have to add fields as I go if I missed some, and then I would have to update it
Jason
07:57 PM
And then when querying the collection, you want to use the alias name (which won't change across re-indexes)
Jason
07:57 PM
This way you can also change the schema as needed before you run the CLI command, and it will just create a new collection with the updated schema
Jennifer
07:58 PM
I see
Jennifer
07:58 PM
And if I don't need them anymore I can delete them, I suppose
Jason
07:58 PM
Oh yes, after you swap the alias to point to the latest collection, you can delete the old collection
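
The alias workflow described above could be sketched roughly as follows. The function name `reindex_behind_alias()` and the `<alias>_<timestamp>` naming scheme are illustrative assumptions, and `$client` is assumed to be a `Typesense\Client` from the typesense-php package. Treat it as a sketch of the idea rather than a finished command.

```php
<?php
// Sketch only: reindex_behind_alias() and the "<alias>_<timestamp>" naming
// are assumptions for illustration, not part of the Typesense client.

/**
 * Build a fresh collection, point $alias at it, then drop the previous one.
 * Returns the name of the newly created collection.
 */
function reindex_behind_alias($client, array $schema, string $alias, array $documents): string
{
    // 1. Create a new collection with a timestamped name and the current schema.
    $newName = $alias . '_' . time();
    $schema['name'] = $newName;
    $client->collections->create($schema);

    // 2. Fill it using the batch import endpoint.
    if ($documents !== []) {
        $client->collections[$newName]->documents->import($documents, ['action' => 'create']);
    }

    // 3. Look up where the alias points today (nothing on the very first run).
    //    Catching \Exception is deliberately broad here; a real command may
    //    want to catch only the client's "not found" error.
    try {
        $oldName = $client->aliases[$alias]->retrieve()['collection_name'];
    } catch (\Exception $e) {
        $oldName = null;
    }

    // 4. Swap the alias over to the new collection.
    $client->aliases->upsert($alias, ['collection_name' => $newName]);

    // 5. Delete the collection the alias used to point to.
    if ($oldName !== null && $oldName !== $newName) {
        $client->collections[$oldName]->delete();
    }

    return $newName;
}
```

Searches would then go through the alias name (for example `lawyers`), so a schema change only needs an edit to `$schema` and another run of the command.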
Jason
07:59 PM
On a side note, I'd recommend using the batch import endpoint instead of calling the single-document create endpoint, especially when indexing a large number of documents. Batch import is much more performant: https://typesense.org/docs/0.21.0/api/documents.html#import-documents
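
For the paged loop posted earlier in this thread, that could mean sending each page of records in one import call rather than one `create()` call per post. The helper below is a minimal sketch: `import_records()` is a hypothetical name, `$client` is assumed to be a `Typesense\Client`, and it assumes the client returns one decoded result per document when given an array, which is how the typesense-php client is understood to behave.

```php
<?php
// Sketch only: import_records() is a hypothetical helper name.

/**
 * Send one batch of records to a collection via the import endpoint.
 * Returns the error message for each record that failed, keyed by its
 * position in the batch; an empty array means every record succeeded.
 */
function import_records($client, string $collection, array $records): array
{
    if ($records === []) {
        return []; // nothing to send, skip the request
    }

    // 'upsert' creates documents or replaces ones with the same id.
    $results = $client->collections[$collection]
        ->documents
        ->import(array_values($records), ['action' => 'upsert']);

    $failures = [];
    foreach ($results as $position => $result) {
        if (empty($result['success'])) {
            $failures[$position] = $result['error'] ?? 'unknown error';
        }
    }

    return $failures;
}
```

In the loop this would be called once per page, after the `foreach` has filled `$records`, for example `$failures = import_records($client, 'lawyers', $records);`. Checking the returned failures matters because the import call can succeed as a whole while individual documents are rejected.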