“you absolutely have to view LLM benchmarks from a position of default-distrust.” @ouguoc.mastodon.online.ap.brid.gy describes how easily answers to benchmark problems can leak into the training set. https://seinmastudios.com/posts/llm-benchmarks-are-not-trustworthy/
why is the link not a link? let's try that again: seinmastudios.com/posts/llm-be...