Claude vs GPT-5: We Ran 300 Real-World Tasks to Find the Best AI
I have spent the past month using Claude and GPT-5 as my primary work tools. Not in controlled benchmark conditions. In actual work: writing, research, coding, analysis, planning.
I ran 300 tasks, documented the outputs, and rated each on usefulness. In many cases I did not know which model had produced an output until the session ended.
<h2>Where GPT-5 Wins</h2>
GPT-5 is the better coding assistant. Across 60 coding tasks, GPT-5 produced the higher-rated output in 38 cases, Claude in 16, and six were ties. The advantage isn’t about raw code quality; it’s about maintaining context across a long coding session and fixing its own errors once they are pointed out.
GPT-5 is also better at research synthesis tasks where the output is a structured document: executive summaries, competitive analyses, briefing documents.
<h2>Where Claude Wins</h2>
Claude is the better writing collaborator. Across 80 writing tasks, Claude produced higher-rated outputs in 51 cases. GPT-5 in 21. The difference is consistent: Claude produces prose that sounds more like a writer and less like a text generator.
Claude is also significantly better at tasks requiring epistemic care: reasoning about uncertain evidence, flagging when a claim is contested, and distinguishing what is known from what is inferred.
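For readers who want the tallies as shares rather than raw counts, here is a minimal sketch of the arithmetic in Python. The function name `tally` and the dictionary keys are illustrative choices, not part of any tool used in the testing. The writing counts above account for 72 of the 80 tasks, so the sketch reports the remaining eight as "other" rather than assuming they were ties.

```python
def tally(total, gpt5_wins, claude_wins):
    """Convert raw win counts into shares of all tasks in a category.

    'other' is whatever the two win counts leave over: ties in the
    coding set, and tasks with no stated outcome in the writing set.
    """
    other = total - gpt5_wins - claude_wins
    return {
        "gpt5": gpt5_wins / total,
        "claude": claude_wins / total,
        "other": other / total,
        "other_count": other,
    }


# Counts taken directly from the article.
coding = tally(total=60, gpt5_wins=38, claude_wins=16)
writing = tally(total=80, gpt5_wins=21, claude_wins=51)

for name, result in (("coding", coding), ("writing", writing)):
    print(
        f"{name}: GPT-5 {result['gpt5']:.0%}, "
        f"Claude {result['claude']:.0%}, "
        f"other {result['other']:.0%} ({result['other_count']} tasks)"
    )
```

Roughly speaking, each model wins a little under two thirds of the tasks in its stronger category, which is a clear lead but far from a sweep.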
<h2>The Honest Summary</h2>
If you write code every day: GPT-5. If you write prose every day: Claude. If you do both: you’re going to end up with both tools open anyway. The benchmark tables will tell you one is definitively better. They are wrong.
David has reviewed over 400 consumer tech products across a decade of journalism. He is suspicious of spec sheets and distrusts benchmarks. Based in Toronto.