The Equilibrium Has Shifted

1. Introduction

The existence of AI tools has caused a jump in the ever-shifting speed-rigor equilibrium we all use to evaluate projects, and our intuition needs to follow that jump. In this article we'll argue that the purpose of project documentation has subtly changed: it is no longer just tedious overhead, but the primary channel, and often the bottleneck, for communicating with modern tools.

We won't be concerned with AI systems whose primary value lies in open-ended conversation, creativity, or social participation. We'll focus on what we'll call coding agents: agents that are part of a technical workflow or build infrastructure. These systems operate entirely on authoritative inputs, typically written documents, and succeed or fail based on the precision and completeness of those inputs.

Every project lives on a spectrum between speed and rigor. At one extreme are quick fixes, MVPs, and "just ship it" decisions; at the other are detailed specifications, formal processes, and exhaustive documentation. Historically, most projects rationally leaned toward speed, accepting technical debt and relying on humans to fill in missing information. Tacit knowledge, shared experience, and 'obvious' unspoken assumptions allow teams to operate effectively with informal or incomplete documentation. Often this is a useful feature rather than a flaw, especially if the cost can be easily absorbed.

New LLM-based AI tools break this system. Unlike humans, they have no social intuition and no memory of hallway conversations. They know only what they were trained on and what is written in their context. That means many of the disciplinary slips we've always relied on, like putting off documentation and letting knowledge live in people's heads, are no longer worth the vig.

In the following sections we'll explore specifically what coding agents need in tech documents, suggest some structure, and note some immediately useful implications. Let's start with a deeper look at the historical balance.

2. Fast or Easy versus Slow or Correct

So, do you cut corners to get something running today, or do you take the time to build it properly so you won't pay for it later? Modern tools that can seemingly build entire applications from a single sentence have only exacerbated this abiding tension.

The financial world has a neat framing for this tradeoff in net present value [1]. A shortcut pays off immediately, while the costs of technical debt (that is, rewriting or re-architecting later), along with the opportunity cost of not building something else, are discounted into the future. If the shortcut gets you a customer demo, a market lead, or simply lets your team ship this quarter, the math looks obvious: do it fast, fix it later.

This logic shaped entire eras of software and product development from the 1980s through the dot-com boom under the "worse is better" motto [2]. The MVP (minimum viable product) and vertical-slice goals are still common today [3]. "Adequate now" is worth more than "perfect someday" is a common refrain because it is genuinely true that even if an entire system has to be thrown away and rebuilt, an early advantage often more than justifies the rebuild.

But both ends of the speed-rigor spectrum have well-known pathologies. Speed-first cultures reward short-term progress while externalizing future costs, producing a cycle of quick fixes bandaged over quick fixes. At the opposite extreme, rigor becomes doctrinal: formality, traceability, and process infrastructure grow heavy enough to slow work significantly, while delivering diminishing returns that never eliminate error completely. In practice this discipline is frequently experienced not as a commitment to quality or appropriate risk mitigation but as a mechanism for blame management. Neither extreme scales, and both are examples of mispricing discipline.

For most situations, fast iteration and informal knowledge transfer keep a project moving precisely because they rely on human adaptability. Team members absorb institutional knowledge through context and trial-and-error (e.g. [4]). This is not sloppiness but a rational tactic: when asking questions and filling gaps on the fly is quick and easy, writing everything down is unnecessary friction.

It is the reliance on human communication, memory, and cooperation that distinguishes these two cultural extremes. Between them lies a rational equilibrium: one that is constantly fought over but, ironically, rarely reasoned about formally.

AI tools do not participate in any culture, do not learn like newcomers, and have no intrinsic way to discover common knowledge (e.g. [5] [6]). This is the new problem we need to address.

3. The Machine Can't Read the Room

Unlike humans, LLM-based AI tools can't patch gaps through lunchtime discussions, a glance at half-written notes, or an overheard hallway comment. They lack the back-channels humans unconsciously rely on. At inference time, everything an LLM can know must be stored in its trained weights, which are highly abstracted, difficult-to-interpret functions, or in the text of its immediate working context, which suffers from all the laxness of human language.

If crucial information is absent from both places then it simply doesn't exist for the LLM, and whatever is missing must be extrapolated to form a best guess, often in the form of a hallucination.

Worse, conversational corrections, now common in many applications, often make things unstable. The LLM can see both the mistake and the correction and will often revert to the wrong one, especially if the error fits its training bias. What a human would internalize as a "lesson learned," an LLM treats as an interesting suggestion for a temporary correction. To actually internalize it, that is, change it persistently, the model generally needs weight updates, such as fine-tuning, with all the time and cost that entails.

For these tools to produce good results, and especially for coding agents, we must give them the right information, and only the right information. It's garbage in, garbage out.

However, the same is true for newly hired humans. If you give them vague specs, missing assumptions, and obscure tribal shorthand, and then fail to answer their questions or provide any feedback, the new hire will produce slop. Switch to another new hire and they'll produce a variation of the same slop. That is, remove the back channels and similar slop-inducing pathologies arise in humans, unless you are very lucky.

This "luck" leads to a second anti-pattern: keep hitting that enticing redo button and hope for a "good-enough" jackpot. It's a classic human behavior that taps into our gaming or gambling reflex; think school sci-tech expos, hackathons, or game jams, where a few "winners" emerge from a pile of under-specified but enthusiastic slop. Only now, with an AI, we can indulge this pattern dozens of times in minutes, and without HR inquiring about new hire or intern mistreatment.

Success stories for generative AI generally fall into two categories: amazing results from almost no information, or good results when given clear rules, instructions, and goals. The former is the "luck" or "vibe" pattern. The latter has spawned "prompt engineering" and "specification coding". The formality implied by these latter names and the frequent recommended use of "plain language principles" or "simplified technical English" is what this article explores.

The thesis here, then, is that "slop" isn't a problem with coding agents per se, but an externalization of the lack of access to the back channels and common knowledge that keep human productivity functional. Without enshrining this informal tacit common knowledge into something a coding agent can ingest, its effectiveness is down to luck at best.

The purpose of project documentation, then, has subtly shifted. It's no longer just a message to our future selves about what was created, how to use it, and perhaps some forensics about decisions along the way. It is now an active part of the forward-looking engineering, and includes sufficient information for any human or coding agent to recreate the work in isolation.

By active I mean that it must be kept up to date and plays a continuous role in the engineering build process. It is never just a historical starting point that gets left behind, but a current source of information. In a software project, we'd call that "source code."

This is why we call them coding agents: they treat all documentation the way a compiler, linter, or debugger treats source code.

The idea of documentation as code, or specification as code, isn't new; it's been around for decades [7] [8]. But this is a new twist, with new rules, new tricks, and new shortcuts. Coding agents now allow us to "compile" documentation into other artifacts like code, tests, evaluations, or other documentation.

4. What is the new good enough?

We can take ideas from history and develop more formal methodologies that give documents some of the precision of code without losing the human-language benefits of using an LLM.

We need to be able to give a coding agent the content a human would normally interpolate or "just know." Once that is done, the agent behaves more like a compiler: it transforms the input documents according to the given specifications, resolving decidable statements (in the formal, computability-theoretic sense) while passing along those statements that can't be resolved.
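The compiler analogy above can be made concrete with a small sketch. This is a hypothetical model, not any real agent's internals: each statement either carries a decidable check or it doesn't, and a "compilation" pass resolves what it can while propagating the rest.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Statement:
    text: str
    # A predicate makes the statement decidable; None means it cannot be
    # resolved here and must be passed along for a human or a later pass.
    check: Optional[Callable[[dict], bool]] = None

def compile_spec(statements, context):
    """Partition a spec the way a compiler would: decide what it can,
    propagate what it cannot."""
    resolved, propagated = [], []
    for s in statements:
        if s.check is None:
            propagated.append(s)                     # undecidable: pass along
        else:
            resolved.append((s, s.check(context)))   # decidable: resolve now
    return resolved, propagated

spec = [
    Statement("Response time must be under 200 ms",
              check=lambda ctx: ctx["response_ms"] < 200),
    Statement("The UI should feel responsive"),       # vague: not decidable
]
resolved, propagated = compile_spec(spec, {"response_ms": 150})
```

The vague statement survives the pass untouched, which is exactly the behavior we want: nothing is silently guessed.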

This framing isn't arbitrary. Source code is itself a specification: one that can be compiled. However, a compiler doesn't read a source file as instructions or macros; instead it interprets it as what must be true after compilation and emits a program that satisfies that truth. This dual code-versus-specification view, sometimes described as proscriptive versus descriptive, or imperative versus declarative, is explored in depth in [9], but the essential idea for us is that the high-level properties that make code compilable are the same properties that make documentation usable by a coding agent.

For brevity, we'll start with determinism: that each statement must have one stable interpretation, and its close relative, decidability: that statements must be testable, provable, or at least checkable. Compilers rely on these constraints, and so do coding agents. Without them, compilers fail, humans guess, and agents hallucinate.

From this, we can derive some practical criteria for "good enough" documentation. Terms and boundaries must be explicit; vague modifiers should collapse to measurable conditions. Definitions need to be clear and unambiguous so that their meaning is fixed and names don't drift. Functional content should appear in structured form, like enumerations or tables, so the model can treat them as resolvable units rather than prose to be inferred. Lastly, obsolete requirements or unused definitions must be removed to reduce ambiguity, which we refer to as context hygiene.
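To illustrate "vague modifiers collapse to measurable conditions," here is a minimal sketch. The requirement IDs, thresholds, and metric names are invented for illustration; the point is that each rewritten statement carries an explicit, machine-checkable condition.

```python
# Vague originals a human would "just know" how to interpret.
VAGUE = [
    "The import should be fast.",
    "Large files are handled gracefully.",
]

# Each vague modifier collapses to an explicit, measurable condition.
MEASURABLE = {
    "IMP-1": {"statement": "Importing a 10 MB CSV completes in under 5 seconds.",
              "check": lambda m: m["import_seconds_10mb"] < 5.0},
    "IMP-2": {"statement": "Files up to 1 GB import without exhausting memory.",
              "check": lambda m: m["max_file_gb_ok"] >= 1.0},
}

def evaluate(requirements, metrics):
    """Return requirement-id -> pass/fail; every statement is decidable."""
    return {rid: req["check"](metrics) for rid, req in requirements.items()}

results = evaluate(MEASURABLE, {"import_seconds_10mb": 3.2, "max_file_gb_ok": 2.0})
```

Note that the vague forms cannot be evaluated at all, while the measurable forms reduce to booleans: exactly the determinism and decidability properties described above.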

With these properties in place, documentation stops being commentary and more closely resembles a partially executable input: something both humans and AI tools can use, verify, and extend within a shared workflow.

This shift toward specifications creates a need for a systematic document structure: a small set of components that make the input resolvable, testable, and stable. We need a practical taxonomy that makes maintaining discipline tractable and cheap. To make that concrete, we've drafted a set of modular technical documents, mostly specifications [10], which we'll discuss in the next section.

5. Toward a Taxonomy for Human-/AI-Readable Technical Documents

Our practical aim now is to create documents that can be

  • checked by a non-technical stakeholder to confirm intent,
  • used by a developer (human or AI) to generate an artifact, and
  • used by a tester (human or AI) to verify that artifact.

A single source of truth for all stages of development and verification that also follows the philosophy of section 4.

This allows coding agents to participate in the iterative loop we already use: build something, verify it, inspect the failures, refine, rinse, and repeat. The errors in each loop shrink, and a consistent, desirable result emerges. Making this stable and predictable for an LLM requires understanding how the components of each document influence model behavior; that understanding informed the taxonomy choices for our documents. We go into depth on this approach in [11], but here we focus on the high-level principles of determinism and decidability.

Technical documents and source code are expressions of the same idea: a description of what is or must be true about some objective entity. Using this lens, we can borrow familiar compiler workflow concepts and language and apply them to how an AI should process documents, particularly in how they handle static analysis and verification.

To make this workable, we use a high-level template that separates narrative statements from assertions and splits assertions into facts (indicative "what is" statements) and constraints (normative "what ought to be" statements). This allows verifications to be explicit.
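The narrative/fact/constraint split can be sketched as a simple document model. The class and field names here are illustrative, not the template from [10]; the structural point is that only normative statements generate verification obligations.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:           # indicative: a "what is" statement
    id: str
    text: str

@dataclass
class Constraint:     # normative: a "what ought to be" statement
    id: str
    text: str

@dataclass
class Document:
    narrative: list = field(default_factory=list)    # context; never checked
    facts: list = field(default_factory=list)        # fix the interpretation
    constraints: list = field(default_factory=list)  # targets of verification

doc = Document(
    narrative=["The billing service replaces the legacy invoicer."],
    facts=[Fact("F-1", "Invoices are stored in the `invoices` table.")],
    constraints=[Constraint("C-1",
        "Every invoice total MUST equal the sum of its line items.")],
)

# Only constraints produce verification obligations.
obligations = [c.id for c in doc.constraints]
```

Because the three kinds of statement are structurally separated, a coding agent never has to guess whether a sentence is background color or a testable claim.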

With this structure, the coding agent can treat a document the way a compiler treats source: resolve what can be deterministically decided, propagate what cannot be decided, and generate verification logic to confirm compliance.

A useful idea arose from a failure mode familiar to anyone who has reviewed AI-generated tests: meaningless checks like asserting that 3.14159 == 3.14159. Our fix was to attach verification hint tags to each normative statement, indicating what is statically provable, what requires execution testing, and what requires human or heuristic validation. Those tags became their own specification because they exert such strong influence on model behavior.
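A minimal sketch of such tags, assuming three hypothetical categories (the tag names and requirements below are invented, not the tag specification itself): the tag routes each requirement to the kind of check an agent should generate, instead of letting it emit a vacuous assertion.

```python
from enum import Enum

class Verify(Enum):
    STATIC = "static"    # provable by inspection or static analysis
    EXECUTE = "execute"  # requires generating and running a test
    HUMAN = "human"      # needs human or heuristic validation

# Hypothetical tagged requirements; the tag tells the agent which kind
# of verification to produce, preventing checks like 3.14159 == 3.14159.
REQUIREMENTS = [
    ("R-1", "All public functions have type annotations.", Verify.STATIC),
    ("R-2", "parse_config() rejects files larger than 1 MB.", Verify.EXECUTE),
    ("R-3", "Error messages are polite and actionable.", Verify.HUMAN),
]

def route(requirements):
    """Group requirement ids by how they must be verified."""
    buckets = {v: [] for v in Verify}
    for rid, text, tag in requirements:
        buckets[tag].append(rid)
    return buckets

buckets = route(REQUIREMENTS)
```

Routing by tag is the design choice that matters: an agent asked to "write tests" for R-3 would otherwise invent an unfalsifiable assertion, whereas the tag explicitly diverts it to human review.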

At this point a practical taxonomy emerges, not as rigid sections, but as a small set of components that must exist somewhere in every document:

  • narrative context (why the thing exists), as narrative statements;
  • definitions and terminology (the environment in which statements are interpreted), as indicative statements;
  • atomic requirements with verification tags, as normative statements;
  • a short list of prohibited or ambiguous constructions;
  • optional examples that act as additional constraints.

Again, this isn't bureaucratic layering. Templates give predictable defaults, reduce friction, and remove accidental ambiguity, lowering cost while allowing both humans and coding AIs to reliably transform documents, even when unaware they are doing so.

Over time, this pushes toward a more interesting outcome: documents become partially executable specifications. They don't merely describe systems; they participate in building them, generating code, tests, evaluations, and downstream artifacts. As tools mature, writing such documents becomes a form of programming at a higher level of abstraction, much as C is to assembly.

6. Conclusion - Why It Matters

The shift toward documentation isn't cosmetic. Tacit knowledge was always a fragile foundation, but as long as humans were the only interpreters, it worked well enough. As coding agents become first-class actors in our build pipelines, that fragility becomes visible. These systems can only act on what is written down. When documentation is underspecified, ambiguous, or inconsistent, they do exactly what the specification permits: generate slop at scale.

Recasting documentation as a compilable, testable specification is not process theater or formal ceremony. It is about making system constraints explicit so the end-to-end system behaves predictably. Coding agents remove our ability to be sloppy or undisciplined about this, because documentation is becoming an intrinsic part of the project infrastructure: it must be maintained, validated, and evolved alongside the artifacts it describes.

This matters because the 'good enough' equilibrium has moved. Where some discipline was once optional or deferrable, it now determines whether many AI tools will provide leverage or friction. As coding agents become embedded in the build process, documentation must evolve toward what we've called executable specification rather than narrative explanation. That shift is what separates amplified productivity from yet more mass production of slop. Thankfully we don't need cultural upheaval, just to price discipline correctly: it's infrastructure cost, not overhead.

References


  1. R. A. Brealey, S. C. Myers, F. Allen, and A. Edmans, Principles of corporate finance, 14th ed. New York: McGraw Hill, 2022.

  2. R. P. Gabriel, “Lisp: Good news, bad news, how to win big.” Accessed: Apr. 01, 2026. [Online]. Available: https://www.dreamsongs.com/WIB.html

  3. E. Ries, The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business, 2011.

  4. I. Nonaka and H. Takeuchi, The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford University Press, 1995.

  5. S. Carroll and S. Pinker, “Steven Pinker on rationality and common knowledge.” Mindscape Podcast, 2025. Accessed: Sep. 22, 2025. [Online]. Available: https://www.preposterousuniverse.com/podcast/2025/09/22/329-steven-pinker-on-rationality-and-common-knowledge/

  6. S. Pinker, Rationality: What it is, why it seems scarce, why it matters. Viking, 2021.

  7. D. E. Knuth, “Literate programming,” The Computer Journal, vol. 27, no. 2, pp. 97–111, 1984, doi: 10.1093/comjnl/27.2.97.

  8. A. Gentle, Docs like code, 2nd ed. Lulu Press, 2017.

  9. I. McEwan, “Proscriptive versus descriptive statements in specifications.” Forthcoming, 2025.

  10. I. McEwan, “Specsmith repository.” https://github.com/ijm/specsmith, 2025.

  11. I. McEwan, “Verification versus decidability in specification documents.” Forthcoming, 2025.