Trying to View Claude Code's Contributions on Github Will Crash Their Servers

Earlier today I merged a contributed PR on one of my open-source packages, consistent-classifier. (Thank you Gabe!)

It is a Go package for generating more consistent classification labels from LLMs over an open/unconstrained label set. I wrote about the project in full detail here for those who are curious.

Anyways. Upon merging the PR, the list of Contributors changed to the following:

Contributors
Contributor’s List

This made me extremely curious.

“Huh. I wonder how many other people’s repos list @claude as a contributor, and how many commits a day are attributable to it?”

I eagerly clicked on Claude’s profile, but the outcome surprised me! Now, this might be fixed in the future, but as of Oct 24, 2025, you will get this empty state:

Error
The error on Claude’s Github profile

I cracked open my Network tab and saw this request:

GET https://github.com/claude?action=show&controller=profiles&tab=contributions&user_id=claude

502 Bad Gateway

Response Body:
This page is taking too long to load.
Sorry about that. Please try refreshing and contact us if the problem persists.

🤔 Interesting. After a few attempts I noticed it consistently timed out at exactly 10 seconds. This behavior tells us something about Github’s systems:

Why 10 seconds? Likely because they measured and found it’s ample time for the p99 query latency when fetching a user’s contribution graph.

Grokking Github’s Systems

I think Github may be running into their own version of The Celebrity Problem that broke Instagram in the early days. Think about it for a second. What does the typical Github user look like?

A human engineer active on Github will, on average, have fewer than 20 repos they’ve pushed code into that year. This can go up to 50 for those spread too thin across microservices galore.

In addition, more than 90% of commits are concentrated in their top 5 most active repos. This looks dramatically different from Claude, who has commits across millions of customers’ repos daily. With that said, what is the best way for Github to quickly pull a normal user’s contribution history? They have two basic choices here:

1. Serve the contribution graph from a cache.
2. Query the database directly on every page load.

Which are they most likely to do? Well it depends.

Evaluating Theory #1

My intuition tells me #1 is the most likely. Caching gets beat into your brain when designing systems for scale after all. Right?

Reasoning through it: most users’ profiles don’t get viewed often outside of the periodic bot crawler. Repos are what get most of the traffic. For the sake of those high-traffic nodes, you will still want to soften the hit on your critical resources (like the DB), especially if you’re operating at Github’s global scale.

I imagine they have a caching layer in their cloud infra that they use for repos. If it’s available, it could also be used to cache users’ profile data.
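That pattern is usually a cache-aside read: check the cache, fall through to the DB on a miss, write the result back. A toy Go sketch, assuming an in-memory map in place of whatever Github actually runs (Memcached, Redis, etc. — that part is my assumption, not their architecture):

```go
package main

import (
	"fmt"
	"sync"
)

// profileCache is a toy in-memory stand-in for a real cache tier.
type profileCache struct {
	mu    sync.Mutex
	data  map[string]string
	hits  int // served from cache
	loads int // fell through to the database
}

// get implements cache-aside: return the cached value if present,
// otherwise load from the database and remember the result.
func (c *profileCache) get(user string, loadFromDB func(string) string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[user]; ok {
		c.hits++
		return v
	}
	v := loadFromDB(user) // cache miss: hit the DB once
	c.data[user] = v
	c.loads++
	return v
}

func main() {
	cache := &profileCache{data: map[string]string{}}
	db := func(user string) string { return "contributions for " + user }

	cache.get("claude", db) // miss -> DB
	cache.get("claude", db) // hit -> cache
	fmt.Printf("hits=%d loads=%d\n", cache.hits, cache.loads)
}
```

The second read never touches the database, which is exactly the “soften the hit” effect you want at scale.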

If that was true however, why would the operation take so long and eventually time out?

Theory 1.1: It could be that they do selective caching: cache only for users who get lots of traffic, and skip it for everyone else.

This makes sense. There’s little value in caching a page that gets < 1,000 views per month compared to the ones that get 50M+/month. This becomes even more true if there is no such thing as a high-traffic Github user page, only high-traffic repos.
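The selective part is just a traffic threshold. A sketch in Go, with the 1,000 views/month cutoff taken from the post’s own back-of-envelope numbers (the function name and threshold are illustrative, not Github’s):

```go
package main

import "fmt"

// shouldCache sketches theory 1.1: only cache profiles that see real traffic.
// The 1,000 views/month cutoff mirrors the estimate in the text above.
func shouldCache(monthlyViews int) bool {
	const threshold = 1000
	return monthlyViews >= threshold
}

func main() {
	fmt.Println(shouldCache(120))        // a typical user page: not worth caching
	fmt.Println(shouldCache(50_000_000)) // a high-traffic page: cache it
}
```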

Interestingly enough, the Github engineering team has said themselves that their primary database for anything non-git is MySQL. A contribution history is git-adjacent, but it doesn’t sound like pure git to me. So maybe theory #2 has some legs?

Evaluating Theory #2

I tried to find information in this other blog post, where the Github Infra Team adds more color to their read patterns, but I still saw no mention of a caching strategy. They do work hard around “hot” data, but they do that at the DB level with partitioning.

Furthermore, given the low complexity (up to 50 repos per user) and low scale (the average user page gets < 1,000 views a month), a direct DB read feels reasonable.

I’m still highly skeptical that there is no caching between server ←→ DB, but if that is indeed true, it very plausibly explains why it fails. My working theory is: Github computes the contribution graph with a live query against the primary database, and for an account like Claude’s, whose contributions span millions of repos, that query can’t finish inside the 10-second budget, so the request dies with a 502.
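As a back-of-envelope sanity check, suppose the query cost scales roughly with the number of repos a user has touched. The per-repo cost below is a made-up illustrative number, and the repo counts are the post’s estimates, not measurements:

```go
package main

import "fmt"

// queryMillis is a crude cost model: assume the contribution query pays a
// fixed cost per (user, repo) pair it has to aggregate over.
func queryMillis(reposTouched int, millisPerRepo float64) float64 {
	return float64(reposTouched) * millisPerRepo
}

func main() {
	const budget = 10_000.0 // the apparent 10s timeout, in ms

	human := queryMillis(20, 0.05)        // ~20 repos/year for a typical engineer
	agent := queryMillis(2_000_000, 0.05) // millions of repos (assumed figure)

	fmt.Printf("human: %.0fms (within budget: %v)\n", human, human < budget)
	fmt.Printf("agent: %.0fms (within budget: %v)\n", agent, agent < budget)
}
```

Even with a generous per-repo cost, the typical user finishes instantly while the millions-of-repos case blows straight past any sane timeout.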

Conclusion

How should Github solve this? Well first they should obviously hire me. 🤓

Trump Hands
I’m the best in the business, folks. No one does System Design better than me. Very efficient. You’ll save so much compute you won’t believe it.

But all jokes aside, I think a simple solution is to do selective denormalization for highly horizontal users. Take a look at this visualization.

Distribution of Commits
Distribution of commits, by repos

Your average human’s commits are highly concentrated in a few repos, whereas your average AI agent’s are highly distributed across many repos.

Thus, my logic would be to mark users who have committed to, say, more than 500 repos this year as highly_horizontal, and to compress their contribution data more frequently and efficiently so that the query doesn’t go belly-up as it does now.
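In code, the idea is just a flag plus a precomputed rollup. A Go sketch of the proposal; the 500-repo threshold comes from the paragraph above, while the type and field names (and Claude’s numbers) are mine for illustration, not anything Github actually has:

```go
package main

import "fmt"

// Threshold from the proposal: more than 500 repos committed to this year.
const highlyHorizontalThreshold = 500

// ContributionSummary is a hypothetical denormalized row, rolled up offline
// (e.g. by a nightly job) so the profile page never runs a live aggregation.
type ContributionSummary struct {
	User         string
	ReposTouched int
	TotalCommits int
}

// isHighlyHorizontal decides whether a user gets the denormalized path.
func isHighlyHorizontal(reposTouched int) bool {
	return reposTouched > highlyHorizontalThreshold
}

func main() {
	// Illustrative figures only -- not real stats for the claude account.
	claude := ContributionSummary{User: "claude", ReposTouched: 2_000_000, TotalCommits: 9_500_000}

	if isHighlyHorizontal(claude.ReposTouched) {
		// Read the precomputed row; no million-repo aggregation at request time.
		fmt.Printf("%s: %d commits across %d repos (precomputed)\n",
			claude.User, claude.TotalCommits, claude.ReposTouched)
	}
}
```

The request path then becomes a single-row read for these accounts, while normal users keep whatever path they have today.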

Of course there’s more nuance than that in the real world, and I’d be happy to have someone more knowledgeable than me on the inner workings share some thoughts in the comments.

All in, I thought this was a funny gap I discovered and worth writing about. This might get named “The Autonomous Agent Problem” and be taught in System Design classes a few years from now to the next generation of computer scientists.

Thanks for reading! If you found this entertaining, consider subscribing.