Your harness is too small
Designing an Integrated Business Environment
There is a scene in Succession where Matsson asks Roman what he is worst at. Roman deflects — “who, me?” — and Matsson answers the question himself. “Success doesn’t really interest me anymore,” he says. “Analysis plus capital plus execution — anyone can do that.” Then he leans in: “Failure — that’s a secret. Just as much failure as possible, as fast as possible. Burn that shit out. That’s interesting.” He says it like a man who has already solved the easy problem and moved on.
LLMs just made success cheaper. Given contextualized data, a model can research a problem, analyze it, execute on it, and iterate until the outcome lands. The formula Matsson found trivial is now available to anyone who can state a goal clearly. Which means Matsson was right about what becomes interesting next. The failure mode. The thing that still burns. The constraint that did not disappear when the inputs got cheaper. That constraint is context rot.
Context rot is not a metaphor. Humans set ambitious goals. They know the best practices. And then time passes, attention moves, the context decays. Not because they stopped caring. Because context does not persist reliably in humans. We forget. We drift. We apply yesterday’s understanding to today’s problem and call it experience.
Here is what I find both ironic and almost serendipitous: we built our artificial intelligence with a version of the same flaw. Not identical. Human context rot is organizational, cultural, the slow drift of priorities over months. AI context decay is technical, session-scoped, attention fading across a long prompt. Different failure modes. But the same symptom: the model loses the thread. We replicated our own cognitive limitation in the tools we made to augment us. The interesting part is that the fix for the AI problem (injecting context deliberately at the start of every session) also forces you to confront the human problem. You cannot write program.md without deciding what you are actually trying to do. The act of building the injection mechanism disciplines the goal.
But unlike humans, you can deterministically re-inject context into an AI. Every session. Mechanically. Karpathy’s AutoResearch project, released this month, makes the mechanism concrete. There is a file called program.md. It contains the goal, the constraints, the research direction. The agent reads it at the start of every run. You do not touch the Python. You program the program.md. The agent re-grounds itself to your intent on every single loop, not because it remembers, but because the context is injected fresh each time. That is the fix. Not memory. Injection. The Socratic question: are we still moving toward what we said we were trying to do? It gets asked mechanically, before anything executes. Humans cannot do this reliably. A well-designed system can. That is what the harness needs to be built around.
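The injection mechanism is small enough to sketch. A minimal version in Python, assuming the program.md pattern from AutoResearch; the helper names and the agent_step callable are mine, invented for illustration, not the project's actual API:

```python
from pathlib import Path

PROGRAM_FILE = Path("program.md")  # goal, constraints, direction -- human-owned

def inject_context(task: str) -> str:
    """Build the prompt for one run. The goal file is re-read every time,
    so the agent re-grounds to current intent, not a remembered one."""
    program = PROGRAM_FILE.read_text()
    return f"{program}\n\n## Current task\n{task}"

def run_loop(agent_step, tasks):
    """agent_step is any callable that takes a prompt and returns a result.
    Context is injected fresh on every iteration -- no reliance on memory."""
    return [agent_step(inject_context(task)) for task in tasks]
```

The point of the sketch is the shape, not the code: the Socratic question lives in a file, and the loop asks it mechanically before anything executes.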
The funnel is not the only option
What changed is not that context became more available. Organizations have always produced enormous amounts of context – metrics, feedback, goals, decisions, postmortems. What changed is the cost of routing it to the moment it matters.
That cost is now near zero.
An agent can pull last week’s conversion data and attach it to a feature spec before a line of code is written. It can surface the three support tickets that describe this exact problem in user language, not product language. It can check whether the goal that originally motivated this work still exists – or got quietly deprioritized while everyone was looking at their tickets.
This is not a vision. This is plumbing. Every piece of that context already exists somewhere in the stack. The question is whether your harness routes it, or lets it stay siloed.
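The plumbing can be sketched in a few lines: pull context from each silo and attach it to the spec before work begins. Everything here — the source names, the fetchers — is invented for illustration:

```python
def enrich_spec(spec: str, sources: dict) -> dict:
    """Attach context from each silo to a spec before any work starts.
    sources maps a silo name to a zero-arg callable that fetches from it --
    analytics, support tickets, goal tracker, whatever exists in the stack."""
    context = {name: fetch() for name, fetch in sources.items()}
    return {"spec": spec, "context": context}

# Hypothetical silos; in practice these would be API calls into real tools.
enriched = enrich_spec("faster checkout", {
    "analytics": lambda: {"conversion_last_week": 0.021},
    "support": lambda: ["ticket-1: 'I gave up at the payment screen'"],
    "goals": lambda: {"active": True, "metric": "conversion"},
})
```

The routing is trivially cheap once the fetchers exist. The harness question is whether anything calls them at the right moment.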
Most harnesses let it stay siloed.
A harness that only covers the development lifecycle is not a small gap. It is a fundamental mismatch between what is now possible and what most builders have built.
How organizations will run
The organizations that figure this out first will not look like they have better tools. They will look like they think differently – like they make fewer decisions based on stale context, like they ship things that actually move the numbers they were trying to move, like they notice problems before the metrics force them to.
What they will actually have is a harness that extends beyond code.
I use the word builder deliberately. Not developer. Because the person making product decisions in a two-person startup, or a solo founder shipping an app between Slack messages, or a non-technical operator wiring together AI agents – that person is building. Their harness should not assume they write code. Their harness should assume they have goals.
That distinction changes everything about what the harness needs to know.
A developer needs autocomplete and debugging.
A builder needs that – plus: Is this the right thing to build? What happens to the number if we do not ship it? Who asked for it, in what words, and how recently?
The sidecar that watches everything
I am building this for myself with Clueless Clothing as a guinea pig. Not from theory. From watching how much context disappears between the moment I understand what the business needs and the moment I sit down to build something.
The architecture I landed on is not just a wider harness. It is a harness with a governing layer above it – an agent that observes what is actually happening, analyzes it against what I said I was trying to do, and surfaces improvements for me to review and act on.
The observe-analyze-action loop is not new. Every retrospective, every postmortem, every weekly metrics review is some version of it. What is new is that it does not have to be periodic anymore. It does not have to wait for the meeting.
But here is the part that took me a while to see clearly.
The sidecar does not improve processes. It improves the processes that move your specific goal.
That sounds like a minor distinction. It is not. If your goal is to grow to $50k MRR, the sidecar optimizes toward revenue-moving leverage points – shipping frequency, conversion bottlenecks, activation rate. If your goal is to build the most efficient engineering team, the sidecar optimizes toward a completely different set of processes. The same codebase, the same activity, the same signals – and a different set of improvements surfaces.
The goal is the lens. Without it, you are just generating observations. But this is also where the thesis gets uncomfortable. The sidecar only fights context rot on the goal you gave it. If the goal is wrong, if you are faithfully optimizing toward $50k MRR when the actual opportunity is a different product entirely, the system will not tell you. It will optimize diligently toward the wrong destination. AutoResearch is instructive here, but not in the way I originally thought. The agents own train.py entirely. They modify it, iterate it, discard what does not work. But program.md, the file that contains the research direction, the goal, the constraints, stays human. Deliberately. The design enforces it. Which means the question of whether you are optimizing toward the right thing never gets delegated. The next version of the sidecar is not one that updates your goal automatically. It is one that makes the cost of ignoring a wrong goal high enough that you actually revisit it.
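The goal-as-lens idea can be made concrete with a toy sketch: the same observations flow through, and swapping the lens changes what surfaces. All signals and scores here are illustrative, not a real implementation:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    source: str   # e.g. "analytics", "repo", "support"
    signal: str
    value: float

def surface_improvements(observations, goal_lens):
    """Score each observation through the goal lens; surface only what the
    lens considers a leverage point, highest score first. Change the goal,
    change the output -- the observations themselves never change."""
    scored = [(goal_lens(o), o) for o in observations]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [o for score, o in scored if score > 0]

obs = [
    Observation("analytics", "checkout_conversion", 0.021),
    Observation("repo", "ci_duration_minutes", 38.0),
]

def mrr_lens(o):   # $50k MRR goal: revenue-moving signals score high
    return 1.0 if o.signal == "checkout_conversion" else 0.0

def eng_lens(o):   # engineering-efficiency goal: a different set surfaces
    return 1.0 if o.signal == "ci_duration_minutes" else 0.0
```

Note what the sketch cannot do: nothing inside it can tell you the lens itself is wrong. That judgment stays outside the loop, with whoever owns program.md.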
The IDE moment, one level up
This has happened before in a smaller domain.
Before the IDE existed, developers used separate tools for editing, compiling, and debugging. Each tool did its job. The cost was not in any single tool – it was in the constant switching, the context that leaked at every seam, the mental overhead of holding state across systems that did not talk to each other.
Then someone merged them. The productivity gain was not from better features. It was from eliminating the seams.
We are at that moment again, one level up.
The separate systems now are not editor, compiler, and debugger. They are product management, development, analytics, marketing, support, and documentation. Each one has its own tool, its own data model, its own workflow. The seams between them leak context every day – the same context that used to require a meeting to route, that now could be routed automatically.
This is what an Integrated Business Environment actually is. Not a better IDE. Not an IDE with more integrations bolted on. A different assumption about what the harness is for – not just executing work, but continuously orienting work toward the goal, based on what is actually happening.
The IDE helped developers write better code.
The IBE helps builders make better decisions about what to build next.
The architecture
I am building this in layers. Not because I love architecture diagrams, but because the tooling landscape is moving too fast to couple everything together. The best AI coding tool right now will not be the best in four months. If your business rules are tangled with your adapter logic, every tool switch means starting over.
Five layers:
Core: the rules that should still make sense in a different repo, a different agent, a different stack entirely. Think of it as the constitution. It does not care what language you write in.
Adapter: environment-specific translations. This is the disposable layer. When the tool changes, rewrite the adapter. Leave everything else alone.
Stack: technology-specific defaults. React Native looks different from Rails. This layer knows that.
Overlay: the layer most harnesses do not have. Product rules. Business context. Metrics. Goals. User segments. The actual reason any of this work is happening.
Split: not a layer but a discipline. Any file that mixes concerns gets split over time. This is what keeps the architecture honest as it evolves.
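Under the assumption that each layer only adds to or overrides the defaults of the layers beneath it, resolution can be sketched as an ordered merge. This is a toy model of the layering, not the actual implementation:

```python
def resolve(core: dict, stack: dict, adapter: dict, overlay: dict) -> dict:
    """Merge layers in order of increasing specificity. Swapping a tool means
    replacing only the adapter dict; core rules and overlay context survive
    untouched. (Precedence order here is an assumption for illustration.)"""
    merged = {}
    for layer in (core, stack, adapter, overlay):
        merged.update(layer)
    return merged

cfg = resolve(
    core={"review_required": True},          # the constitution
    stack={"framework": "react-native"},     # technology defaults
    adapter={"tool": "cursor"},              # disposable, per-environment
    overlay={"goal": "grow MRR"},            # business context and metrics
)
```

The value of the split shows up at tool-switch time: only the adapter dict gets rewritten, and the diff stays small.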
The overlay layer is where the IBE lives. It is also where I spend most of my time now. That is the right place to spend it.
I am open-sourcing this structure. Not because it is finished, but because the pattern is clear enough to share before it is polished.
One thing worth saying plainly before you go build it: this is not a software idea.
Look at what AutoResearch actually does to understand why. The primitive Karpathy’s agents operate on is not software. It is experiments. Train for five minutes. Check if val_bpb improved. Keep or discard. Repeat. The code is almost incidental. The agent rewrites it, yes, but that is not the work. The work is the experiment loop oriented toward a measurable goal. Software is just what happens to be the medium. A restaurant optimizing table turns has a different medium. A consulting firm improving proposal win rates has a different medium. The loop is the same. The medium changes. The harness needs to know the difference.
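The experiment loop itself is small enough to sketch: a generic keep-or-discard loop oriented toward a metric where lower is better, as with val_bpb. The function names are mine, not AutoResearch's, and propose/evaluate stand in for whatever the medium is — a code change, a menu change, a proposal template:

```python
def experiment_loop(propose, evaluate, baseline, steps=10):
    """The primitive is the experiment, not the code: propose a change,
    measure the goal metric, keep the change only if the metric improved.
    Assumes lower scores are better (e.g. val_bpb)."""
    best, best_score = baseline, evaluate(baseline)
    for _ in range(steps):
        candidate = propose(best)
        score = evaluate(candidate)
        if score < best_score:           # improvement: keep
            best, best_score = candidate, score
        # otherwise: discard, try again from the current best
    return best, best_score
```

Nothing in the loop mentions software. The medium lives entirely inside propose and evaluate, which is the point.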
The IDE framing is a useful on-ramp. It is not the destination. The destination is any operation, any goal, any set of processes, with a governing layer that fights context rot and keeps everything oriented toward what you said you were trying to do.
There is one more place where the harness is too small. Not the tooling. Not the lifecycle. The goal itself.
I have been running $50k MRR as a target because it feels concrete. Measurable. Safe to write in a file. But $50k MRR is not a goal. It is a metric that might serve a goal, if I chose the right one. The actual goal, the one worth writing in program.md, is something like: enough margin to be present. To travel when it matters. To read more. To not make decisions from fear at 11pm. To be with the people who matter. The harness does not require the goal to be economic. It requires the goal to be real. But if you give it the metric instead of the goal, you will hit the number and wonder why the thing you were actually trying to do did not happen. The sidecar optimizes toward what you said. Not toward what you meant. That gap is yours to close, before you write anything down.
Matsson was right. Success got cheap. The failure mode is what stayed interesting. Context rot is the failure mode. The only question is whether you build a harness that fights it, or one that lets it win quietly while you keep shipping.
Next: The Athenians did not need productivity software. They had something better: a society that treated economic output as infrastructure, not identity. The citizen class was freed from subsistence not so they could optimize, but so they could think, argue, govern, and live. We are building the infrastructure that could do the same thing. We just have not figured out what we are supposed to do once it works. The Greeks knew. Next piece is about what they got right, and what it means for how we should be spending the hours the agents are buying us.
What is actually in your program.md right now, and is it a goal, or a metric?

