
June 5, 2026

I Built a Local LLM Benchmark Harness, and It Mostly Started as an Argument With My GPU

For a while now I have had the same nagging question that I suspect a lot of people in security and IT have been quietly circling. Which local model is actually good enough for the work I do, and what does "good enough" even mean once you stop hand-waving? Not the leaderboard scores, not the demos where someone asks a model to write a haiku about Kubernetes, but the actual workloads. Reading a log. Spotting a brute-force attack that ends in a successful login. Writing an incident report without quietly inventing a threat actor that never existed.