Claude Code Leaderboard??? #81
Hi @hesreallyhim! It's funny that you thought of this idea, because we've actually been working on a leaderboard for Claude Code usage (as well as Gemini CLI and Codex usage) called Splitrail Leaderboard. It's a bit raw still and can't be used just yet, but here's a screenshot: [screenshot] It's open source on GitHub. It's more oriented around tokens/cost/usage, but your suggestions sound interesting!
@bl-ue @nikshepsvn so there are in fact two leaderboards for CC usage? that is... very remarkable!
I want to set up a "leaderboard" for Claude Code (see e.g. the HuggingFace leaderboards) - more in the spirit of friendly competition than anything scientific or scholarly - but it's hard to come up with a good approach/design, given the variety of resources featured here and the constant fluctuation in models, API vs. subscription plans, etc.
My first thought is to focus on domain areas, like "TDD" or "UI", where submissions would consist of repositories with specialized configurations - slash commands, CLAUDE.md, hooks, sub-agents, etc. Then, rather than trying to design specific challenges, just use Claude as an LLM-as-judge to review the submissions and decide which one is the "winner". So the prompt would be something like the sketch below.
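As a very rough sketch (the judging criteria, the model alias, and the way submissions get pasted in are all placeholder choices, not a settled design), the judge prompt and API call might look like:

```python
import anthropic

# Hypothetical LLM-as-judge step; nothing here is a fixed design.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

JUDGE_PROMPT = """You are judging a friendly "TDD" leaderboard for Claude Code setups.
Each submission is a repository with a specialized configuration: CLAUDE.md,
slash commands, hooks, and/or sub-agents, all aimed at test-driven development.

For each submission below, assess (1) how well the configuration enforces a
red-green-refactor workflow, (2) how clear and reusable the setup is, and
(3) originality. Then declare one winner and justify the choice.

{submissions}"""

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; any capable model works
    max_tokens=1024,
    messages=[{"role": "user", "content": JUDGE_PROMPT.format(submissions="...")}],
)
print(response.content[0].text)
```

Rotating the judge model or averaging over a few runs would presumably dampen some of the single-shot noise.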
Obviously this would be rudimentary and not very "objective", but the idea is really to stimulate people to experiment with specialized Claude Code designs and, hopefully, learn from each other rather than compete (there is no prize anyway). I notice a lot of people seem to like TDD, which is why I brought it up, but I'd love some feedback if anyone has thoughts on what a Claude Code Leaderboard could look like.
There are also a lot of different ways to evaluate submissions - e.g. set up a prepared container with a fixed prompt and a pre-determined goal/endpoint, then launch Claude Code in each submission with that common prompt and see how each framework performs (see the sketch below) - but again it's really tricky, and I'm inclined to do something less rigid at the moment because of all the independent variables.
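To make the container idea concrete, a minimal harness might look roughly like this - the image name, mount layout, and task prompt are all made up, and it assumes an image with Claude Code installed and credentials already provisioned (`claude -p` is Claude Code's non-interactive print mode):

```python
import subprocess

# Hypothetical harness: run one common task against each submission in a
# throwaway container. Image, paths, and prompt are placeholders.
SUBMISSIONS = ["./submissions/repo-a", "./submissions/repo-b"]
TASK = "Add a /health endpoint with tests, following this repo's conventions."

for repo in SUBMISSIONS:
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{repo}:/work", "-w", "/work",
            "cc-eval:latest",      # placeholder image with Claude Code installed
            "claude", "-p", TASK,  # print mode: run the prompt and exit
        ],
        capture_output=True,
        text=True,
        timeout=1800,  # cap each run; a real harness would catch TimeoutExpired
    )
    print(f"{repo}: exit {result.returncode}")
```

The transcripts (and the resulting diffs) could then be fed into the same LLM-as-judge prompt as above.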