“you absolutely have to view LLM benchmarks from a position of default-distrust” @ouguoc describes how easily answers to benchmark problems can leak into the training set. https://seinmastudios.com/posts/llm-benchmarks-are-not-trustworthy/