AI-assisted Coding (21 blogmarks)

← Blogmarks

Code Review Is Not About Catching Bugs

https://www.davidpoll.com/2026/02/code-review-is-not-about-catching-bugs/

When a human writes code, the code is an artifact of their reasoning process. You can review the code and infer the thinking behind it. When AI generates code, you lose that direct connection. The code might be perfectly functional but reflect no coherent design intent – or worse, reflect a design intent that’s subtly different from what the developer actually wanted.

the risk of AI side quests

https://bsky.app/profile/carnage4life.bsky.social/post/3mfytrcpux22q

I've been in meetings where I've been asked to imagine a near-future in which members of the team who aren't currently producing code (e.g. project managers, C-suite, support team) are able to prompt an LLM blackbox with feature requests. Each of those requests will be processed in the background and eventually produce a preview environment for the prompter to look at and a PR to hand off to a software developer.

One of the assumptions baked into this workflow is that there are all these well-defined, high-priority issues just sitting in the project management software waiting to be worked on. Maybe that is the case in some orgs, but in my experience, the majority of teams and projects I've worked on don't have this. The backlog is a place where "nice-to-have" improvements and half-baked ideas collect dust and lose proximity to the state of the software system.

there’s a risk of AI side quests distracting from doubling down on the main one.

My hard-earned intuition for what works well with various LLMs and coding agents, combined with my general software engineering expertise and my knowledge of the specific codebase(s), is what allows me to get strong results from LLM tooling.

I suspect in a lot of cases I'd be tossing out the initial "engineer-out-of-the-loop" attempt and re-prompting from scratch, all while trying to keep tabs on the main quest work I have in progress.

Andrej Karpathy wrote a post recently that gets at why I think "asking an LLM to build that feature from the backlog" is not as straightforward as it seems:

It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges.


I'd like to coin the term "AI-assisted Backlog Resurrection", but I don't think it's going to catch on.

Your codebase is NOT ready for AI

https://www.youtube.com/watch?v=uC44zFz7JSM

Two things:
- Coding agents operating on your codebase are much more powerful than a disconnected LLM chat session where you try to manually give whatever context you think is important.
- Many of our decades old (and newer too) software engineering practices have directly useful benefits to AI-assisted coding.

Your codebase, way more than the prompt that you use, way more than your AGENTS.md file, is the biggest influence on AI's output.

Use DDD's Bounded Context or A Philosophy of Software Design's Deep Modules to create clear boxes around functionality with interfaces on top.

Design your codebase for progressive disclosure of complexity.
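The Deep Modules idea can be sketched with a toy example (hypothetical names, not taken from the video): a class whose public surface is tiny while the complexity it manages stays hidden behind it.

```python
# A "deep module": a narrow public interface (add/search) hiding the messy
# details (tokenization, normalization, the inverted index itself).
import re
from collections import defaultdict

class TextIndex:
    """Full-text lookup. Callers only ever see add() and search()."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of doc ids

    def add(self, doc_id, text):
        """Index a document under doc_id."""
        for token in self._tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every token in the query."""
        tokens = self._tokenize(query)
        if not tokens:
            return set()
        result = self._postings[tokens[0]].copy()
        for token in tokens[1:]:
            result &= self._postings[token]
        return result

    def _tokenize(self, text):
        # Internal detail: lowercase word extraction. Swappable without
        # touching any caller -- which is the point of the deep interface.
        return re.findall(r"[a-z0-9]+", text.lower())
```

A coding agent (or a teammate) can work with `TextIndex` from its two public methods alone; the clear box keeps the context it needs small.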

Writing code is cheap now

https://simonwillison.net/guides/agentic-engineering-patterns/code-is-cheap/

The math has shifted, in some cases, significantly.

Coding agents dramatically drop the cost of typing code into the computer, which disrupts so many of our existing personal and organizational intuitions about which trade-offs make sense.

That prototype, that bug fix, that nice-to-have -- those things that tend to get pushed off over and over because more urgent things are taking your time -- are suddenly viable in a lot of cases with an LLM coding agent. Equipped with a paragraph of detail and using something like Cursor or Claude Code, we can hand off these back-burner tasks, iterate on the idea in minutes instead of hours, and have at the very least an MVP if not a ready-to-ship implementation. All with a relatively minimal interruption to the main task at hand.

While we as individuals are still catching up to what is possible here, the processes our organizations and engineering teams have built around going from feature idea to shipped, supported implementation have some catching up to do as well. In some cases, these processes are meant to slow things down as a way of managing risk, which puts them in direct opposition to the mandate to use AI agents to ship faster. What are organizations going to do with this contradiction? That's an open question.


I also like that as part of this series Simon is putting forth the term Agentic Engineering as the other side of the spectrum from vibe coding when it comes to using LLMs to write code.

Ladybird adopts Rust, with help from AI

https://ladybird.org/posts/adopting-rust/

I like hearing about other people's AI coding workflows, especially if they are demonstrating concrete success on a complex task. Porting a browser engine from C++ to Rust in two weeks with no regressions is impressive. This is a great example of a human heavily in the loop -- using knowledge of the system to pick which parts of the codebase to port and when.

I used Claude Code and Codex for the translation. This was human-directed, not autonomous code generation. I decided what to port, in what order, and what the Rust code should look like. It was hundreds of small prompts, steering the agents where things needed to go. After the initial translation, I ran multiple passes of adversarial review, asking different models to analyze the code for mistakes and bad patterns.

The Claude C Compiler: What It Reveals About the Future of Software

https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software

CCC shows that AI systems can internalize the textbook knowledge of a field and apply it coherently at scale. AI can now reliably operate within established engineering practice. This is a genuine milestone that removes much of the drudgery of repetition and allows engineers to start closer to the state of the art.

Right now, LLM coding agents do their best work in codebases where a solid foundation, clear conventions, and robust abstractions and patterns have all been well established.

As writing code is becoming easier, designing software becomes more important than ever. As custom software becomes cheaper to create, the real challenge becomes choosing the right problems and managing the resulting complexity. I also see big open questions about who is going to maintain all this software.

Managing the complexity of large software systems has always been what makes software engineering challenging. While LLMs handle the “drudgery” of actually writing the code, are they simultaneously exacerbating the challenge of managing the complexity? If so, how do we, as “conductors” of coding agents, rein in that expanding complexity?

Agentic anxiety

https://jerodsanto.net/2026/02/agentic-anxiety/

Something’s different this time, and I can say confidently this is the most unsure I’ve ever been about software’s future.

As Jerod puts it, it's not FOMO (fear of missing out) so much as FOBLB (fear of being left behind).

One fascinating part of our conversation with Steve Ruiz from tldraw started when he confessed that he feels bad going to bed without his Claudes working on something.

I think part of what is behind this is the same thing discussed in the HBR article, AI Doesn't Reduce Work--It Intensifies It, which points to this feeling that if we can do more we should be doing more.

How AI assistance impacts the formation of coding skills

https://www.anthropic.com/research/AI-assistance-coding-skills

Anthropic released the results of a recent study warning about the impact that using LLM agents for software development tasks can have on learning and skill development. This affects everyone, but it is especially crucial for earlier-career developers who haven't yet had the chance to build those skills without LLMs in the picture.

None of this means we shouldn't be using LLMs for software development tasks, but rather we have to be intentional about how we use them.

Importantly, using AI assistance didn’t guarantee a lower score. How someone used AI influenced how much information they retained. The participants who showed stronger mastery used AI assistance not just to produce code but to build comprehension while doing so—whether by asking follow-up questions, requesting explanations, or posing conceptual questions while coding independently.

And here is another excerpt from the end of the article:

Our study can be viewed as a small piece of evidence toward the value of intentional skill development with AI tools. Cognitive effort—and even getting painfully stuck—is likely important for fostering mastery. This is also a lesson that applies to how individuals choose to work with AI, and which tools they use.

Prompt for a scathing code review

https://www.reddit.com/r/ClaudeAI/comments/1q5a90l/so_i_stumbled_across_this_prompt_hack_a_couple/

I ran a variation of the following prompt because the OP sounds pretty hyped about the results they are getting from it.

Do a git diff and pretend you're a senior dev doing a code review and you HATE this implementation. What would you criticize? What edge cases am I missing?

I expanded on it a little to guide it toward overly verbose or overengineered code. And I added some structure by asking it to include a confidence score with each item.

Based on the latest commit (see git show) and the untracked files that go along with it, pretend you're a senior dev doing a code review and you HATE this implementation. What would you criticize? What edge cases am I missing? What is overengineered, too verbose, or overcomplicated? Provide a confidence score with each issue and order your results with highest confidence issues at the top.

These instructions also work better with my workflow, where I have in-progress changes that I plan to amend into the most recent commit.

It picked up a couple of things that I completely missed. It didn't find much in terms of refactoring away verbose or over-engineered code, unfortunately. I'm going to keep trying that though. This was quick to do and gave me some actionable feedback. Worth the squeeze, in my opinion.
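To make the tweaked prompt repeatable, here is a small hypothetical helper that bundles it with the output of `git show`. The prompt text is from the post; everything else (names, the suggested CLI wiring) is an assumption about your setup.

```python
# Hypothetical helper: combine the "scathing review" prompt with the diff
# of the most recent commit so the whole request can be piped into an
# agent CLI in one shot.
import subprocess

REVIEW_PROMPT = (
    "Pretend you're a senior dev doing a code review and you HATE this "
    "implementation. What would you criticize? What edge cases am I missing? "
    "What is overengineered, too verbose, or overcomplicated? Provide a "
    "confidence score with each issue and order your results with highest "
    "confidence issues at the top."
)

def build_review_request():
    """Return the prompt followed by the diff of the latest commit."""
    diff = subprocess.run(
        ["git", "show"], capture_output=True, text=True
    ).stdout
    return f"{REVIEW_PROMPT}\n\n{diff}"
```

From a shell you could then do something like `python review_prompt.py | claude -p "$(cat)"` -- the `-p` (print/headless) flag is Claude Code's non-interactive mode, but adjust for whatever agent CLI you use.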

Your job is to deliver code you have proven to work

https://simonwillison.net/2025/Dec/18/code-proven-to-work/

To master these tools you need to learn how to get them to prove their changes work as well.

I’ve been using Claude Code to automate a series of dependency upgrades on a side project. I was surprised by the number of times it has confidently told me it fixed such-and-such issue and that it can now see that it works, when in fact it doesn’t work. I had to give it additional tests and tooling to verify the changes it was making had the intended effect.

I was surprised because it had deftly accomplished the previous several upgrade steps.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.

The step beyond this that I also view as table stakes is ensuring the code meets team standards, follows existing patterns and conventions, and doesn’t unreasonably accrue technical debt.

How to write a great agents.md: Lessons from over 2,500 repositories

https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/

Put commands early: Put relevant executable commands in an early section: npm test, npm run build, pytest -v. Include flags and options, not just tool names. Your agent will reference these often.

I’ve seen big improvements in how agents do with my codebase when I include exactly what command and flags should be used for things like running the test suite.

Cover six core areas: Hitting these areas puts you in the top tier: commands, testing, project structure, code style, git workflow, and boundaries.

Covering these areas is a good starting point. I’m usually not sure where to start and only add things as needed.
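Pulling those two tips together, a minimal AGENTS.md that hits all six areas might look like this sketch (the commands and paths are placeholders for your project's own, not from the GitHub post):

```markdown
# AGENTS.md

## Commands
- Install: `npm install`
- Test: `npm test -- --watch=false`
- Build: `npm run build`

## Testing
- Run the full suite before committing; new code needs unit tests.

## Project structure
- `src/` application code, `src/lib/` shared utilities, `tests/` specs.

## Code style
- TypeScript strict mode; format with `npm run lint -- --fix`.

## Git workflow
- Small commits; branch names like `feature/<short-description>`.

## Boundaries
- Never edit files under `vendor/` or commit secrets from `.env`.
```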

Anthropic CEO: AI Will Be Writing 90% of Code in 3 to 6 Months

https://www.businessinsider.com/anthropic-ceo-ai-90-percent-code-3-to-6-months-2025-3

Around March 14th, 2025:

"I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code," Amodei said at a Council of Foreign Relations event on Monday.

Amodei said software developers would still have a role to play in the near term. This is because humans will have to feed the AI models with design features and conditions, he said.

Via this Armin Ronacher article, via this David Crespo Bluesky post.

I don’t want AI agents controlling my laptop – Sophie Alpert

https://sophiebits.com/2025/09/09/ai-agents-security

There’s no good way to say “allow access to everything on my computer, except for my password manager, my bank, my ~/.aws/credentials file, and the API keys I left in my environment variables”. Especially with Simon Willison’s lethal trifecta, you don’t really want to be giving access to these things, even if most of the time, nothing bad happens.

Context engineering

https://simonwillison.net/2025/Jun/27/context-engineering/

As Simon argues here, “prompt engineering” as a term didn’t really catch on. Perhaps “context engineering” better captures the task of providing just the right context to an LLM to get it to most effectively solve a problem.

It turns out that inferred definitions are the ones that stick. I think the inferred definition of "context engineering" is likely to be much closer to the intended meaning.

Why agents are bad pair programmers

https://justin.searls.co/posts/why-agents-are-bad-pair-programmers/

Justin describes some of the ways that using an LLM coding agent as a pair programmer can lead to bad pairing patterns. In particular, one person takes over while the other disengages or becomes a rubber stamp for their changes.

He then throws out some ideas for ways that LLM coding assistants can be improved with what we’ve learned over the years from person-to-person pairing.

My preference is what he describes here:

throttle down from the semi-autonomous "Agent" mode to the turn-based "Edit" or "Ask" modes. You'll go slower, and that's the point.

One suggestion is something we can do now (out-of-band or at least front-loaded).

Design agents to act with less self-confidence and more self-doubt. They should frequently stop to converse: validate why we're building this, solicit advice on the best approach, and express concern when we're going in the wrong direction

Tell the LLM you want to approach things this way, that you want to break the problem down and tease out any issues with the proposed approach. Engage in some back and forth, have it generate a plan, and then go from there.

AI-assisted Coding and the Jevons Paradox

https://lobste.rs/s/42qb2p/i_am_disappointed_ai_discourse#c_6sp5oi

Simon Willison on whether LLMs are going to replace software developers and what these tools mean for our careers:

I do think that AI-assisted development will drop the cost of producing custom software - because smaller teams will be able to achieve more than they could before. My strong intuition is that this will result in massively more demand for custom software - companies that never would have considered going custom in the past will now be able to afford exactly the software they need, and will hire people to build that for them. Pretty much the Jevons paradox applied to software development.

Claude and I write a utility program

https://blog.plover.com/tech/gpt/claude-xar.html

Just In Time manual reading

The world is full of manuals, how can I decide which ones I should read? One way is to read them all, which used to work back when I was younger, but now I have more responsibilities and I don't have time to read the entire Python library reference including all the useless bits I will never use. But here's Claude pointing out to me that this is something I need to know about, now, today, and I should read this one. That is valuable knowledge.

Vibe Coding the Right Way

https://wycats.substack.com/p/vibe-coding-the-right-way

TL;DR I use coding assistants to build out a piece of code. Once I feel ok with the initial output, I switch back into normal coding mode. I refactor the code, carefully document it and introduce abstractions that capture common patterns. I also write good tests for the code. For testing, I use my “normal coding brain” to come up with the test cases but lean on AI code accelerators to speed up the process.

I like this term AI code accelerators. It suggests that it helps do what you’d already decided to do. It helps with the boilerplate or repetition for instance.

I’m seeing more and more of this genre of article where a very experienced software engineer describes their AI coding workflow.

Horseless intelligence

https://nedbatchelder.com/blog/202503/horseless_intelligence.html

My advice about using AI is simple: use AI as an assistant, not an expert, and use it judiciously. Some people will object, “but AI can be wrong!” Yes, and so can the internet in general, but no one now recommends avoiding online resources because they can be wrong. They recommend taking it all with a grain of salt and being careful. That’s what you should do with AI help as well.

Skeptics of the uses of LLMs typically point to strawman arguments and gotchas to wholesale discredit LLMs*. They're either clinging too tightly to their bias against these tools or completely missing the point. These tools are immensely useful. They aren't magic boxes though, despite what the hypemen might want you to believe. If you take a little extra effort to use them well, they are a versatile Swiss Army knife to add to your software tool belt. "Use [these LLMs] as an assistant, not an expert."

*There are soooo many criticisms and concerns we can and should raise about LLMs. Let's have those conversations. But if all you're doing is saying, "Look, I asked it an obvious question that I know the answer to and it got it WRONG! Ha, see, this AI stuff is a joke," then the only thing you're adding to the dialogue is a concern about your critical thinking skills.

If you approach AI thinking that it will hallucinate and be wrong, and then discard it as soon as it does, you are falling victim to confirmation bias. Yes, AI will be wrong sometimes. That doesn’t mean it is useless. It means you have to use it carefully.

Use it carefully, and use it for what it is good at.

For instance, it can scaffold a 98% solution to a bash script that does something really useful for me. Something that I might have taken an hour to write myself or something I would have spent just as much time doing manually.

Another instance: I'm about to write a couple of lines of code to do X. I've done something like X before and have an idea of how to approach it. A habit I've developed that enriches my programming life is to prompt Claude or Cursor for a couple of approaches to X and see how those compare to what I was about to do.

There are typically a few valuable things I get from doing this:

  1. The effort of putting thoughts into words to make the prompt clarifies my thinking about the problem itself. Maybe I notice an aspect of it I hadn't before. Maybe I have a few sentences that I can repurpose as part of a PR description later.
  2. The LLM suggests approaches I hadn't considered. Maybe it suggests a command, function, or language feature I don't know much about. I go look those things up and learn something I wouldn't have otherwise encountered.
  3. The LLM often draws my attention to edge cases and other considerations I hadn't thought of when initially thinking through a solution. This leads me to develop more complete and robust solutions.

I’ve used AI to help me write code when I didn’t know how to get started because it needed more research than I could afford at the moment. The AI didn’t produce finished code, but it got me going in the right direction, and iterating with it got me to working code.

Sometimes you're staring at a blank page. The LLM can be the first one to toss out an idea to get things moving. It can get you to that first prototype that you throw away once you've wrapped your head around the problem space.

Your workflow probably has steps where AI can help you. It’s not a magic bullet, it’s a tool that you have to learn how to use.

This reiterates points I made above: approach LLMs with an open and critical mind, and give them a chance to see where they can fit into your workflow.

There is no Vibe Engineering

https://serce.me/posts/2025-31-03-there-is-no-vibe-engineering

Software engineering is programming integrated over time. The integrated over time part is crucial. It highlights that software engineering isn't simply writing a functioning program but building a system that successfully serves the needs, can scale to the demand, and is able to evolve over its complete lifespan.

LLMs generate point-in-time code artifacts. This is only a small part of engineering software systems over time.

Vibe Coding as a practice is here to stay. It works, and it solves real-world problems – getting you from zero to a working prototype in hours. Yet, at the moment, it isn’t suitable for building production-grade software.

The 70% problem: Hard truths about AI-assisted coding

https://addyo.substack.com/p/the-70-problem-hard-truths-about

LLMs are no substitute for the hard-won expertise of years of building software, working within software teams, and evolving systems. You can squeeze the most out of iterations with a coding LLM by bringing that experience to every step of the conversation.

In other words, they're applying years of hard-won engineering wisdom to shape and constrain the AI's output. The AI is accelerating their implementation, but their expertise is what keeps the code maintainable.

The 70% problem

A tweet that recently caught my eye perfectly captures what I've been observing in the field: Non-engineers using AI for coding find themselves hitting a frustrating wall. They can get 70% of the way there surprisingly quickly, but that final 30% becomes an exercise in diminishing returns.

Addy goes on to describe this "two steps back pattern" where a developer using an LLM encounters an error, they ask the LLM to suggest a fix, the fix sorta works but two other issues crop up, and repeat.

This cycle is particularly painful for non-engineers because they lack the mental models to understand what's actually going wrong. When an experienced developer encounters a bug, they can reason about potential causes and solutions based on years of pattern recognition.

Beyond having the general programming and debugging experience to expedite this cycle, there is also an LLM intuition to be developed. I remember John Lindquist describing that he notices certain "smells" when working with LLMs. For instance, often when you're a couple steps into a debugging cycle with an LLM and it starts wanting to go make changes to config files, that is a smell. It's a "smell" because it should catch your attention and scrutiny. A lot of times this means the LLM is way off course and it is now throwing generative spaghetti at the wall. I learned two useful things from John through this:

  1. You have to spend a lot of time using different models and LLM tools to build up your intuition for these "smells".
  2. When you notice one of these smells, it's likely that the LLM doesn't have enough or the right context. Abort the conversation, refine the context and prompt, and try again. Or feed what you've tried into another model (perhaps a more powerful reasoning one) and see where that gets you.

Being able to do any of that generally hinges on having already spent many, many years debugging software and having already developed some intuitions for what is a good next step and what is likely heading toward a dead end.

These LLM tools have proven super impressive at specific tasks, so it is tempting to generalize their utility to all of software engineering. However, at least for now, we should recognize the specific things they are good at and use them for those:

This "70% problem" suggests that current AI coding tools are best viewed as:
- Prototyping accelerators for experienced developers
- Learning aids for those committed to understanding development
- MVP generators for validating ideas quickly

I'd add to this list:

- Apply context-aware boilerplate autocomplete — establish a pattern in a file/codebase or rely on existing library conventions and a tool like Cursor will often suggest an autocompletion that saves a bunch of tedious typing.
- Scaffold narrow feature slices in a high-convention framework or library — Rails codebases are a great example of this where the ecosystem has developed strong conventions that span files and directories. The LLM can generate 90% of what is needed, following those conventions. By providing specific rules about how you develop in that ecosystem and a tightly defined feature prompt, the LLM will produce a small diff of changes that you can quickly assess and test for correctness. To me this is distinct from the prototyping item suggested by Addy because it is a pattern for working in an existing codebase.