How Much Of A Library Do You Actually Use?

Armin Ronacher’s “Build It Yourself” got reposted after the latest wave of npm supply-chain attacks, and it landed again. The two-line version: most of the small libraries in your dependency graph aren’t pulling their weight, and the maintenance treadmill they put you on costs more than writing the function yourself, especially now that an LLM can write that function while you’re reading this sentence. Go read his post first. This one is about the part I keep thinking about after closing his tab.

Screenshot of Armin Ronacher's "Build It Yourself" post, with the byline "this article is a year old" — the piece that keeps getting reposted after every new supply-chain incident.

I’ve been importing and shipping other people’s code for twenty years, and I’ve maintained libraries on the other side of the dependency arrow too. Armin’s instinct is one I ignored for most of that time, because the alternative used to look harder than the maintenance. Two things changed in the last eighteen months that made the advice impossible to ignore.

The supply chain stopped being theoretical. TanStack last week. The fsnotify maintainer drama. The 170-package npm wave that hit Mistral and friends. Upstream attacks used to be a tail risk you could ignore. Now they show up monthly. Every package in your Cargo.lock or package-lock.json, transitive ones included, comes with a side bet: am I going to read a postmortem about this one?

And AI changed the math on the other side. Libraries used to earn their keep by absorbing maintenance you didn’t want to do. Now an LLM can absorb much of that maintenance: bug fixes, edge cases, the boring 60% of any small utility. The per-line cost of “build it yourself” collapsed, and the breakeven shifted quietly while nobody was looking. The caveat is that LLM-written code carries its own error tax, so the trade only works when the linter, type checker, and tests are pointed at the new code too. Assume they will be. Under that assumption, for anything small enough to fit on one screen, the math tips toward the local version.
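For a sense of what “small enough to fit on one screen” means, here is a hypothetical in-house batching helper, the kind of thing people often pull in more-itertools for (Python 3.12 also ships itertools.batched, which yields tuples). It is a sketch, not a claim about that library: the point is that the function and its test fit on one screen together, and the test runs with the rest of your suite, which is the part that keeps a generated version honest.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from `items` (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    # islice pulls up to `size` items; an empty chunk means the input is exhausted.
    while chunk := list(islice(it, size)):
        yield chunk


# The test lives next to the function and runs with the rest of the suite.
def test_chunked() -> None:
    assert list(chunked(range(5), 2)) == [[0, 1], [2, 3], [4]]
    assert list(chunked([], 3)) == []
    try:
        list(chunked([1], 0))  # the generator raises on first iteration
    except ValueError:
        pass
    else:
        raise AssertionError("size=0 should raise")


if __name__ == "__main__":
    test_chunked()
    print(list(chunked("abcde", 2)))
```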

OK. So build it yourself. The next question is the one Armin’s post doesn’t quite reach: build which?

The first answer is a linter

When I say this out loud to other engineers, the first tool they reach for is some flavor of unused-dependency check. deptry in Python, with vulture sweeping up the dead code around it. depcheck and knip in the JS ecosystem. cargo-udeps in Rust. All useful, all roughly the same shape: scan your imports, scan your declared dependencies, flag the ones that don’t line up. They catch the obvious wins. The imports that died with a refactor. The declared deps nobody actually pulls in. The package you added once to try something out and then forgot. (I’ve written about the day-to-day version of this for Python.)

But they tell you exactly one thing: did I import this package or not? That’s not measurement. That’s a binary check at the package boundary. It can’t see what’s inside.

I imported requests and called .get(). The linter sees import requests and goes home happy. It can’t tell you that, to call a single function, I shipped the entire requests package (streaming, sessions, adapters) plus the urllib3 connection pools and retry logic underneath it. All of that ships with my service. All of it counts when someone compromises the maintainer.
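One rough way to see what the import boundary hides, using only the standard library, is to import a package in a fresh interpreter and diff sys.modules before and after. This is a sketch of the “what did I ship” side only: it counts modules loaded, not code executed, and modules_pulled_in is a hypothetical helper, not part of any tool. Pointed at requests (if it is installed), the result should include urllib3 and the package’s other dependencies alongside requests’ own submodules.

```python
import subprocess
import sys


def modules_pulled_in(package: str) -> set[str]:
    """Modules that appear in sys.modules after importing `package` in a fresh interpreter."""
    probe = (
        "import sys, importlib\n"
        "before = set(sys.modules)\n"
        f"importlib.import_module({package!r})\n"
        "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
    )
    # -I isolates the child from PYTHON* environment variables and the user site directory.
    out = subprocess.run(
        [sys.executable, "-I", "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return set(out.split())


if __name__ == "__main__":
    pulled = modules_pulled_in("json")  # swap in "requests" if it is installed
    print(len(pulled), sorted(pulled))
```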

The deeper answer is coverage into the dependency

If you actually want to know how much of a library you’re using, you have to run coverage through the dependency, not around it. coverage.py will happily measure */site-packages/requests/* if you tell it to. cargo-llvm-cov has flags for including specific external crates in the report. nyc can be pointed at node_modules. The defaults exclude all of this on purpose, because most of the time you don’t care. The point of this exercise is to care once.
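With coverage.py this is roughly `coverage run --source=requests your_workload.py` followed by `coverage report`, where your_workload.py is a placeholder for whatever exercises your real code path. For a dependency-free illustration at function rather than line granularity, here is a stdlib sketch built on sys.setprofile. It is a swapped-in technique, not what coverage.py does internally, and function_usage is a hypothetical helper: it only sees top-level pure-Python functions of a single module, skips methods and C extensions, and only watches the current thread. textwrap stands in for a real dependency.

```python
import inspect
import sys
import types
from collections.abc import Callable


def function_usage(
    module: types.ModuleType, workload: Callable[[], object]
) -> tuple[set[str], set[str]]:
    """Run `workload` and split `module`'s top-level functions into (called, never called)."""
    # Only plain Python functions defined in this module; classes and C functions are skipped.
    functions = {
        name: fn
        for name, fn in inspect.getmembers(module, inspect.isfunction)
        if fn.__module__ == module.__name__
    }
    wanted = {fn.__code__: name for name, fn in functions.items()}
    called: set[str] = set()

    def profiler(frame, event, arg):
        # A "call" event fires when a Python frame starts; match it back to a module function.
        if event == "call" and frame.f_code in wanted:
            called.add(wanted[frame.f_code])

    previous = sys.getprofile()
    sys.setprofile(profiler)  # affects the current thread only
    try:
        workload()
    finally:
        sys.setprofile(previous)
    return called, set(functions) - called


if __name__ == "__main__":
    import textwrap  # stand-in for a real dependency

    called, unused = function_usage(textwrap, lambda: textwrap.dedent("    hello"))
    total = len(called) + len(unused)
    print(f"{len(called)}/{total} top-level functions exercised: {sorted(called)}")
```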

The output is uncomfortable. Run it against a real workload and most dependencies come back well under twenty percent exercised. You imported a fifty-function library and your hot path touches three of them. The other forty-seven were maintained, security-scanned, pulled in transitively, and shipped to production, all to give you the comfortable feeling that someone else owned the problem.

A few will surprise you the other way. The library you suspected was bloated turns out to be eighty percent used because edge cases you forgot about are quietly hitting it. Those stay. The discovery is genuine in both directions.

And here I get stuck

Because even the deep number is a metric, not a verdict.

The library I use one percent of might be exactly the one I should keep. That one percent is JWT verification, or X.509 parsing, or the part of dateutil that handles timezone transitions on the days the IANA database changes. If I roll that myself I’m guaranteeing a future incident.

The library I use eighty percent of might be the one I should rip out today. The maintainer stopped responding eight months ago. The fork I’d need is twelve commits behind. The version I’m pinned to has an open CVE nobody is going to patch.

The metric tells me where to look. It does not tell me what to do.

What we’d actually need (and don’t have)

The tool I keep wanting doesn’t exist. The pieces of it do. OpenSSF Scorecard scores project health and security practices. Socket.dev and Snyk score security surface. coverage.py and cargo-llvm-cov give the usage number. Each piece lives in its own silo, owned by a different vendor, behind a different login. Nobody composes them next to the coverage number in your editor, sorted by “most worth rethinking.”

What I want is the composite. Usage: the coverage number, how much of the library my workload actually exercises. Maintainer health: last release date, bus factor, weekly download trend. Security surface: does this library parse untrusted input, touch crypto, talk to the network. Replaceability cost: a rough estimate of how many lines I’d write to reproduce the slice I actually use, ideally generated by an LLM that already read both sides.

Glue those four numbers together and a senior engineer could spend an afternoon making real decisions instead of wandering through their lockfile. The numbers wouldn’t decide anything. They would just put the judgment on the right scale.
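To make the composite concrete, here is one hypothetical shape for it. Every field, threshold, and dependency name below is an assumption made up for illustration; none of it comes from Scorecard, Socket, or Snyk. By design, it lists reasons side by side instead of collapsing them into a single score, and it only orders the list. The sensitive-surface flag is a caution, not a reason to rip something out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepSignals:
    name: str
    usage_pct: float          # share of the library your workload exercises
    days_since_release: int   # maintainer health, part one
    maintainers: int          # maintainer health, part two: a rough bus factor
    sensitive: bool           # parses untrusted input, touches crypto, or talks to the network
    replacement_loc: int      # estimated lines to rewrite the slice you actually use


# Hypothetical thresholds, chosen only for illustration.
LOW_USAGE_PCT = 20
STALE_DAYS = 365
CHEAP_REWRITE_LOC = 200


def rethink_reasons(d: DepSignals) -> list[str]:
    """Human-readable reasons this dependency might be worth rethinking."""
    reasons = []
    if d.usage_pct < LOW_USAGE_PCT:
        reasons.append(f"only {d.usage_pct:.0f}% exercised")
    if d.days_since_release > STALE_DAYS:
        reasons.append(f"no release in {d.days_since_release} days")
    if d.maintainers <= 1:
        reasons.append("bus factor of one")
    if d.replacement_loc <= CHEAP_REWRITE_LOC:
        reasons.append(f"about {d.replacement_loc} lines to replace")
    return reasons


def triage(deps: list[DepSignals]) -> list[tuple[str, list[str], str]]:
    """Order for review: most reasons first, cheaper rewrites breaking ties. Decides nothing."""
    ordered = sorted(deps, key=lambda d: (-len(rethink_reasons(d)), d.replacement_loc))
    return [
        (d.name, rethink_reasons(d), "caution: sensitive surface" if d.sensitive else "")
        for d in ordered
    ]


if __name__ == "__main__":
    deps = [
        DepSignals("jwt-lib", 3.0, 30, 5, True, 800),
        DepSignals("tiny-pad", 5.0, 900, 1, False, 20),
        DepSignals("http-client", 80.0, 700, 1, True, 5000),
    ]
    for name, reasons, caution in triage(deps):
        print(name, reasons, caution)
```

In this made-up example the one-percent JWT library lands last despite its low usage: one reason against it, and a caution flag that asks for judgment rather than a rewrite.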

I haven’t built this. I haven’t found anyone who has, at least not in a way I’d put on my screen every morning. Maybe that’s because it would commercialize a judgment call that doesn’t want to be commercialized: the moment you put a score on “should I keep this dependency,” half the value is gone, because the score is the new thing people optimize. The act of looking at four numbers in your editor and feeling something is the senior judgment that resists automation. Removing the friction might be removing the practice.

The receipt

I’ve been writing this against the Mergify CLI, a Rust codebase that’s been accumulating crates the way Rust codebases do. The cargo-only update sweep last week made me look at the lockfile longer than usual. Some entries will obviously be gone by next week. Others are obviously staying.

The long tail in the middle is the work I haven’t done yet, because the deep coverage run is a half-day’s setup and I keep postponing it. That postponement is what most engineers actually do. We run the linter, we feel responsible, we move on. The whole point of writing this down is to admit that the next step is the one nobody schedules.

The tools will keep getting better. The judgment will not get cheaper.

“Build it yourself” is good advice. The unglamorous truth is that most of us can’t act on it until we know which “it” the sentence is talking about, and the tools that answer that question with anything more than “did you import the package” are still mostly homemade. The next decade of platform engineering is probably right there, sitting in the gap between cargo-udeps and what you’d actually want to know. Someone is going to build the thing. I hope they leave room for the call.