The first time you cloned a repo at a new job, you probably felt smart. You'd been hired. You knew your stack. You'd shipped real things.
Then you opened the file tree. A hundred folders. A legacy/ directory that contained something called legacy-v3/. A README that said "see internal docs." There were no internal docs.
You opened a file at random. It imported six things from places you hadn't found yet. There was a function called process() that was four hundred lines long.
You stared at it for an hour. You didn't understand it. You felt dumb.
This article is to tell you: you weren't dumb. You were reading.
You're Not Stupid. You're Reading.
Reading code is genuinely, measurably harder than writing it.
"Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code." — Robert C. Martin, Clean Code
A 2024 New Stack analysis of developer time found developers spend less than 32% of their time writing new code. The remaining 68% or more goes to everything else, and a large chunk of that is reading.
Why is reading harder than writing?
When you write code, you have the context. You know what you're trying to do. You're choosing the variable names. You're laying out the structure. The mental model is in your head before the code exists.
When you read code, the mental model isn't given to you. You have to reconstruct it from the artifact. You're a detective in a house someone else built, where every door might lead to another house.
This is asymmetric, and the asymmetry compounds with every layer.
The Cognitive Load Tax
There's actual research on this.
Cognitive Load Theory, originally formulated by educational psychologist John Sweller in 1988, distinguishes three kinds of mental load: intrinsic (the inherent complexity of what you're learning), extraneous (load imposed by how the material is presented), and germane (productive mental work of building schema).
Applied to code reading, that maps to:
- Intrinsic load: the algorithm itself. If a function implements RSA, you have to understand RSA. There's a floor.
- Extraneous load: naming, formatting, dead code, abandoned abstractions, files in the wrong folder, comments that lie. This is the load that should be zero — but isn't.
- Germane load: the "ah, I see" moment when your brain builds the right mental model and stops fighting the artifact.
Recent research published in IEEE and ACM venues — for example, "Estimating Developers' Cognitive Load at a Fine-grained Level Using Eye-Tracking Measures" (ICPC 2022) and the ScienceDirect systematic mapping study on developer cognitive load (2021) — uses eye tracking to measure load while developers read real source.
The findings are consistent: when extraneous load is high, comprehension is slow. When the codebase has accumulated layers of half-finished refactors, abandoned patterns, and inconsistent naming, your brain spends most of its energy on the wrapper, not the meat.
You're not stupid. You're paying tax.
Even Maintainers Can't Read Their Own Code
The most reassuring evidence comes from the people who actually wrote the code.
On Hacker News, an open-source maintainer wrote, in a thread about AI documentation tools:
Read that again. The person who wrote and maintains the codebase calls their own code "fairly convoluted" and points new contributors at a third-party tool to read it.
The reason isn't laziness. It's that the maintainer has the mental model in their head. Loading that model into someone else's head — that's the hard part. The codebase is the artifact, but the artifact alone doesn't transfer the model.
You see this everywhere:
- LLVM contributors are routinely told to read research papers before the code, in a specific order, because the code without the papers is impenetrable
- The Linux kernel has a curated Documentation/ directory that's nearly as long as some kernels in their entirety
- Major frameworks like React publish explanation videos because watching someone walk through the source is faster than reading it cold
The codebase alone doesn't teach you. Even the people who wrote it know that.
"Just Read the Code" Is Bad Advice
The advice you'll get from senior engineers is some variant of "just read the code." This is bad advice — not because they're wrong about reading being valuable, but because the advice is incomplete in three specific ways:
- You can't read all of it. As one engineer put it on Hacker News, in a now-classic 2022 thread titled "It's Harder to Read Code Than Write It": "It takes a lot of time, and there's no way you can dig through more than a fraction of a large codebase." A 200K-LOC project at 100 LOC/minute (a generous reading pace) works out to 2,000 minutes, or just over 33 hours. That's most of a work-week spent doing nothing but reading. You don't have that week.
- You don't know where to start. A large codebase has thousands of plausible entry points. Most are wrong. Without a guide, you'll pick the wrong door three times before finding the right one.
- You don't know what you're looking for. Reading without a question is like reading a dictionary. You finish more confused than when you started.
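The reading-time figure in the first point is plain arithmetic, and it's worth rerunning with your own numbers. A minimal sketch: the function name hours_to_read and the 8-hour working day are assumptions made for illustration, not anything from the thread.

```python
def hours_to_read(total_loc: int, loc_per_minute: float) -> float:
    """Hours needed to read total_loc lines at a steady loc_per_minute."""
    if loc_per_minute <= 0:
        raise ValueError("loc_per_minute must be positive")
    return total_loc / loc_per_minute / 60


hours = hours_to_read(200_000, 100)
print(f"{hours:.1f} hours")             # 33.3 hours
print(f"{hours / 8:.1f} working days")  # 4.2 working days, assuming 8-hour days
```

Halve the pace to something closer to careful reading and the estimate doubles, which is the point: the whole repo was never on the table.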
The actual technique senior engineers use isn't "just read the code." It's chunking, scaffolding, and asking. The GitHub Engineers' guide to learning new codebases — written by people who routinely onboard onto repos with millions of LOC — lists tactics like:
- Read the tests first
- Pair with someone for the first week
- Master one module before touching another
- "Understand what code does without necessarily knowing exactly how it does it"
Notice what's missing from that list: "read the code start to finish." Nobody says that. Nobody does it.
What Actually Helps
If "just read the code" is bad advice, here's better advice. Most of it works without any new tools.
Read tests, not implementation files
Tests document the contract. Implementation details live under the contract. Start with tests; they're written for clarity in a way the implementation usually isn't. They tell you what the code is supposed to do and what edge cases the original author cared about.
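Here's a rough illustration of what "tests document the contract" looks like in practice. Everything in it is hypothetical: normalize_email is an invented function, and the tests are written pytest-style as plain functions with bare asserts. Read the two tests first and notice how much you learn before looking at the body.

```python
def normalize_email(raw: str) -> str:
    """Hypothetical implementation. Pretend you haven't read this yet."""
    cleaned = raw.strip().lower()
    if cleaned.count("@") != 1:
        raise ValueError(f"not an email address: {raw!r}")
    return cleaned


def test_ignores_case_and_surrounding_whitespace():
    # Contract: callers can pass messy user input.
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"


def test_rejects_input_without_exactly_one_at_sign():
    # Contract: bad input raises instead of returning garbage.
    for bad in ("ada.example.com", "a@b@c"):
        try:
            normalize_email(bad)
        except ValueError:
            continue
        raise AssertionError(f"{bad!r} should have been rejected")


# Runs under pytest, or directly like this:
test_ignores_case_and_surrounding_whitespace()
test_rejects_input_without_exactly_one_at_sign()
```

The test names alone tell you the two things the author cared about: forgiving input handling and loud failure on malformed addresses.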
Trace one path end-to-end, not the whole tree
Don't try to read every file. Pick a real user action — "what happens when someone clicks login?" — and follow only the code that runs in that path. You'll touch maybe 1% of the codebase, and that 1% will teach you more about the architecture than reading 50%. Repeat for two or three different user actions and you'll have a working mental model of the whole system.
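If the codebase is in Python, you can let the interpreter tell you which functions a single action touches. The sketch below uses the standard sys.setprofile hook; the helper trace_calls and the tiny login functions are made up for illustration, and in a real project you'd more likely reach for a debugger or your framework's request logging.

```python
import sys


def trace_calls(entry_point):
    """Run entry_point() and return the names of Python functions called, in order."""
    called = []

    def profiler(frame, event, arg):
        # "call" fires for Python-level functions only; C calls are ignored.
        if event == "call":
            called.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)
    return called


# Hypothetical stand-in for a real application.
def hash_password(password):
    return password[::-1]  # toy "hash", for the demo only


def check_credentials(user, password):
    return hash_password(password) == "terces"


def handle_login():
    return check_credentials("ada", "secret")


def handle_signup():
    return "never runs on the login path"


path = trace_calls(handle_login)
print(path)  # ['handle_login', 'check_credentials', 'hash_password']
```

handle_signup never shows up, and that's the useful part: the trace tells you what you can safely ignore for now.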
Pair with someone for the first week
The mental model you're reconstructing already exists, fully formed, in another engineer's head. Borrowing it costs you a 30-minute conversation. Reconstructing it from scratch costs you a week. Senior engineers undervalue how much context they carry; if you ask, most are willing to spend an hour walking you through the bones.
Use chunking — ask "what does this do" not "how does it do it"
This is the technique GitHub engineers explicitly recommend. You don't need to know how a function implements its logic before you trust it. Just trust that it does what its name implies, treat it as a black box, and move on. Save the deep dive for the parts that surprise you, where the name and the behavior diverge.
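One way to practice the black-box habit is to look only at names, signatures, and the first line of each docstring, and skip the bodies entirely. A small sketch using Python's standard inspect module; the summarize helper and the Checkout class are hypothetical examples, not part of any real codebase.

```python
import inspect


def summarize(namespace):
    """Return 'name(signature): first docstring line' for each function in namespace."""
    lines = []
    for name, fn in inspect.getmembers(namespace, inspect.isfunction):
        doc = inspect.getdoc(fn)
        first_line = doc.splitlines()[0] if doc else "(no docstring)"
        lines.append(f"{name}{inspect.signature(fn)}: {first_line}")
    return lines


class Checkout:
    def apply_discount(self, total, code):
        """Return the total after applying a discount code."""

    def charge(self, total):
        """Charge the customer's saved card.

        Retries, idempotency keys, and logging live in here,
        and none of that matters on a first pass.
        """


for line in summarize(Checkout):
    print(line)
# apply_discount(self, total, code): Return the total after applying a discount code.
# charge(self, total): Charge the customer's saved card.
```

Two lines of "what" stand in for however many lines of "how" sit underneath. Come back to a body only when its behavior contradicts its summary.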
Find what already exists before you write anything new
If you're about to add a feature, search the codebase for half-built versions of it first. Codebases over a year old usually contain two or three abandoned attempts at the thing you're trying to build. Knowing those attempts exist — and why they were abandoned — saves you a sprint and prevents you from being the fourth person to abandon it.
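A plain text search is usually enough to surface earlier attempts. The sketch below is one hedged way to do it with the standard library; find_prior_attempts, the keywords, and the throwaway file tree are all invented for the demo, and on a real repo a tool like grep or your editor's search does the same job.

```python
import tempfile
from pathlib import Path


def find_prior_attempts(root, keywords, suffixes=(".py",)):
    """Map relative file path -> line numbers mentioning any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(k in line.lower() for k in lowered):
                hits.setdefault(str(path.relative_to(root)), []).append(lineno)
    return hits


# Demo on a throwaway tree that mimics an abandoned attempt.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "legacy").mkdir()
    (repo / "legacy" / "export_v2.py").write_text(
        "# TODO: finish csv export\ndef export_csv():\n    pass\n"
    )
    (repo / "app.py").write_text("def main():\n    pass\n")
    hits = find_prior_attempts(tmp, ["csv export", "export_csv"])

print(hits)  # on POSIX: {'legacy/export_v2.py': [1, 2]}
```

Once you have the file names, the commit history for those files is where the "why was it abandoned" part tends to live.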
These are the moves engineers actually use. They're not glamorous. They're not fast. But they work.
Or Have an Engineer Read It For You
Here's the thing nobody admits about that list above: nearly every tactic works best when there's a person (a pair, a mentor, a senior engineer) who already has the model.
Most engineers, most of the time, don't. The senior who could pair with you is busy. The maintainer is in a different time zone. The team that wrote the abandoned attempts left two years ago. You're alone with a 200K-LOC repo and a Friday deadline.
That's why we built AI Code Research. Not because we think you can't read code. You obviously can. You did exactly that in the opening scene, when you spent an hour trying to reconstruct what process() was doing.
We built it because the most useful engineer in the world is one that reads what you don't have time to read. Point AI Code Research at any GitHub repo and ask — "what does this do?" / "how should I migrate it?" / "what's the architecture?" — and the agent opens the source, reads what's there, and returns the answer an engineer would, in plain English, in roughly 60 seconds.
It's not a replacement for a senior engineer. It's the senior engineer you didn't have.
For the longer version of what AI Code Research is and how it differs from ChatGPT or DeepWiki, see "What Is AI Code Research?"
You Were Always Going to Feel This
Reading other people's code feels bad because the work is hard, not because you're bad at it. You're paying cognitive load tax on every layer of abstraction someone before you didn't bother to clean up. The maintainers feel it too. The senior engineers feel it too. They've just been paying the tax long enough to develop scar tissue and tactics.
The next time you open a repo and feel dumb, remember: you're not. You're reading.
That's the hardest mode software engineering has.
If you'd rather not read all of it yourself: we built a tool for that.