@John Sokol could you check your logs while the model is being loaded into memory? If your server is not using the GPU, you will see this log line:
> ONNX shared libs: off
It appears just before this line:
> Loading model from disk: PATH
If you don't see the log line about the shared libs, that means Typesense was able to load the model onto your GPU successfully.
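If scanning the logs by eye is tedious, here is a minimal Python sketch of the same check. The two marker strings are copied from the log lines quoted above; the function name `check_gpu_usage` and the sample lines are made up for illustration, and the exact log format may differ between Typesense versions, so treat the result as a hint rather than a definitive answer.

```python
from typing import Iterable

# Marker strings copied from the log lines quoted in this message.
ONNX_OFF_MARKER = "ONNX shared libs: off"
MODEL_LOAD_MARKER = "Loading model from disk:"


def check_gpu_usage(log_lines: Iterable[str]) -> str:
    """Apply the rule described above to a sequence of log lines.

    Returns:
        "cpu"     if the "ONNX shared libs: off" line is present,
        "gpu"     if a model load was logged without that line,
        "unknown" if no model load was logged at all.
    """
    saw_onnx_off = False
    saw_model_load = False
    for line in log_lines:
        if ONNX_OFF_MARKER in line:
            saw_onnx_off = True
        if MODEL_LOAD_MARKER in line:
            saw_model_load = True

    if saw_onnx_off:
        return "cpu"
    if saw_model_load:
        return "gpu"
    return "unknown"


# Hypothetical sample lines, only to show how the function is called.
sample = [
    "ONNX shared libs: off",
    "Loading model from disk: PATH",
]
print(check_gpu_usage(sample))  # prints "cpu"
```

To run it against a real server, pass the lines of your own Typesense log file (for example, an open file object) instead of `sample`.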