I'm seeing high latency per request when using OpenAI embeddings. Is this because each search query is forwarded to OpenAI to generate a vector, which is then used to perform the search? Or is it because OpenAI's embeddings are much larger?
When using `ts/all-MiniLM-L12-v2`, processingTimeMS is ~20 ms; when using `openai/text-embedding-3-small`, it's ~450-500 ms.
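To help isolate where the time goes, here's a minimal sketch (assuming the official Python `openai` client and an `OPENAI_API_KEY` in the environment) that times the embedding call by itself, outside the search engine. If this alone takes ~400+ ms, the extra latency is likely the network round-trip to OpenAI for each query vector rather than the search over the larger vectors:

```python
import time

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Time a single embedding request in isolation to see how much of the
# per-request latency is just the OpenAI API round-trip.
start = time.perf_counter()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="example search query",  # hypothetical query text
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"embedding round-trip: {elapsed_ms:.0f} ms")
print(f"vector dimensions: {len(resp.data[0].embedding)}")
```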
I'm also open to suggestions for a good local model to use!