What Actually Breaks When LLMs Write Code?

Thu, 11 Jun 2026 00:00:00 +0000

Most conversations about the safety of coding agents revolve around adversarial scenarios: prompt injection, jailbreaks, malicious instructions hidden in a README. Those threats are real. But after watching these tools work — and occasionally watching them wreck a working environment while “fixing” a unit test — we kept returning to a more uncomfortable question: what goes wrong when nobody is attacking, and the agent is simply trying to help?

Our new preprint, What Breaks When LLMs Code?, led by our Ph.D. student Alif Al Hasan, is an attempt to answer that question with evidence rather than anecdotes. We call this operational safety: the safety of an agent during benign, goal-directed, everyday use.

Teaching LLMs to Plan Before They Act

Wed, 10 Jun 2026 00:00:00 +0000

If you have ever watched a language model reason its way through a hard math problem, you have probably seen it wander. The chain of thought starts off promising, circles back on itself, re-derives something it already knew, and occasionally talks itself out of a correct intermediate result. The final answer may still be right, but the path there is long, redundant, and hard to trust.

Our ICML 2026 paper, Plan Then Action, starts from a simple diagnosis of why this happens: autoregressive generation is local. At every step the model decides only what token comes next, so the reasoning process is essentially a sequence of small, greedy decisions. There is no global plan — nothing that commits the model to a strategy before it starts executing one. Tree search and reinforcement learning can partially compensate, but they are expensive and still operate over the same token-level process.
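To make the "local decisions" point concrete, here is a minimal sketch, not the method from the paper. It assumes a hypothetical toy model, `NEXT`, whose next-token probabilities depend only on the previous token; the table and both function names are invented for illustration. Greedy decoding picks the most likely token at each step, while an exhaustive search scores whole sequences, and the two can disagree.

```python
# Hypothetical toy "model": next-token probabilities given the previous token.
# All numbers are made up for illustration only.
NEXT = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
}


def greedy_decode(start="<s>", steps=2):
    """Pick the single most likely next token at every step (local choice)."""
    seq, prob, prev = [], 1.0, start
    for _ in range(steps):
        token, p = max(NEXT[prev].items(), key=lambda kv: kv[1])
        seq.append(token)
        prob *= p
        prev = token
    return seq, prob


def best_sequence(start="<s>", steps=2):
    """Enumerate every sequence of the given length and keep the most likely one."""
    best = ([], 0.0)

    def walk(prev, seq, prob, remaining):
        nonlocal best
        if remaining == 0:
            if prob > best[1]:
                best = (seq, prob)
            return
        for token, p in NEXT[prev].items():
            walk(token, seq + [token], prob * p, remaining - 1)

    walk(start, [], 1.0, steps)
    return best


if __name__ == "__main__":
    print("greedy:", greedy_decode())
    print("global:", best_sequence())
```

In this toy table the greedy path commits to "A" because it looks best at the first step, yet the sequence that starts with "B" has the higher overall probability. Exhaustive enumeration is only feasible here because the example is tiny, which is the cost problem the paragraph above attributes to search-based fixes.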

Posts on reSAID Lab
