> number of returned results from retrieval matters as context window can overflow
Yes, conversation histories and retrieved documents can overflow the context window. Conversation models have a field called `max_bytes`, which sets the maximum number of bytes sent to the LLM in a single prompt. We include as many documents and conversation history messages as fit within that limit, starting from the most recent messages.
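
To make the idea concrete, here is a rough sketch of that kind of budget fitting. It is not the actual implementation: the function names `take_while_fits` and `select_context` are made up for illustration, and it assumes that sizes are measured as UTF-8 bytes and that documents get first claim on the budget before history, neither of which is stated above.

```python
def take_while_fits(items, budget):
    """Keep items in order until the next one would exceed the byte budget."""
    kept = []
    for text in items:
        size = len(text.encode("utf-8"))  # assumption: size is counted in UTF-8 bytes
        if size > budget:
            break
        kept.append(text)
        budget -= size
    return kept, budget


def select_context(documents, history, max_bytes):
    """Pick the documents and history messages that fit into max_bytes."""
    # Assumption: documents are ranked best-first and are fitted before history.
    docs, remaining = take_while_fits(documents, max_bytes)
    # Walk the history newest-first, then restore chronological order.
    recent, _ = take_while_fits(reversed(history), remaining)
    return docs, list(reversed(recent))


docs, messages = select_context(
    documents=["aaaa", "bbbb"],
    history=["old message", "hi", "ok"],
    max_bytes=12,
)
```

With a 12-byte limit, both documents fit and only the two most recent messages are kept, so the oldest message is the one dropped.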