A sober guide to using LLMs for code: the pitfalls, the issues, and how to avoid becoming the “Reverse Centaur”.

Note that this blog post was itself written with the help of AI, to format my thoughts in an easy-to-read way. The content is all more or less written by me, though; the AI merely helped me give it a better shape.

The Disclaimer

Let’s address the elephant in the room immediately. I know my audience. I know many of you on the Fediverse and elsewhere are rightfully skeptical of the current AI boom. There are valid concerns about copyright, energy consumption, and the sheer volume of hype-bro noise drowning out actual discourse.
This post isn’t here to convert you or hype the technology. It is here because LLMs are increasingly part of the developer toolbelt, and if used incorrectly, they generate bad code, create security vulnerabilities, and waste massive amounts of time and energy.
If you are going to use these tools, or if you are forced to use them, you need to remain the pilot. The moment you stop thinking, you become the accessory to the machine. Here is how to avoid that.

1. The “Excited Puppy” Problem

The most dangerous misconception about Agentic coding (where an AI creates and edits files autonomously) is that the AI is a senior engineer. It is not.
LLMs are like excited puppies. They are extremely fast, eager to please, and operate at a confident “intermediate” level across a wide range of domains. But they do not “reason.” They correlate.
If you ask a puppy to fetch a stick, it might run straight into a glass door to get it. If you ask an LLM to fix a dependency issue, it might hallucinate a solution that uproots your entire tech stack, because it doesn’t see the glass door in front of it either. They need a leash, and they need to be steered away from dangerous paths.

2. The Explain-First Protocol

Because LLMs are correlation engines, they tend to rush toward the most probable solution, which isn’t always the correct one.
Unless a bug is trivial and you know exactly what is wrong (e.g., “fix this typo on line 40”), do not start with a request for code. Start with a request for analysis.
Force the model to “think” before it acts. Ask it: “Analyze this error, explain why it is happening, and propose 3 potential fixes. Do not write code yet.”
Modern editors like Cursor often have specific “Plan” or “Chat” modes designed exactly for this. Use them to build context and agree on a strategy before you switch to “Composer” or “Agent” mode to execute the changes. This pre-computation step ensures the agent has the full picture before it starts typing.
This prevents the model from hallucinating a fix based on a misunderstanding. It saves you from applying a “fix” that breaks something else. However, be pragmatic: if you know the issue is a missing semicolon, don’t waste tokens and time on a philosophical essay about syntax. Just add the semicolon yourself, if possible.

3. The Spectator Rule: Don’t Look Away

It is tempting to give an agent a prompt and tab-switch to something else while it works.
You can do this—it is your time, after all—but you need to understand the risk.
You must watch the terminal. You must watch the diffs as they happen. This is boring, but it is the difference between a working feature and a debugging nightmare.
I have watched agents decide that a standard websocket error required rewriting the entire server architecture, rather than realizing the two websockets were fighting due to a lack of coordination. If you aren’t watching, you will miss the moment the agent goes off the rails. You will return to code that technically “works” but is filled with obscure, Rube Goldberg-esque logic that you don’t understand and can’t maintain in the long term.

4. The Git Save Point (Boss Fight Strategy)

Treat every significant LLM interaction like a boss fight in a video game: Save before you enter.
Before you prompt the agent to implement a feature or fix a bug, commit your current state. It is infinitely better to have a cluttered commit history than to lose working code.
If the “excited puppy” tears up the sofa (hallucinating a refactor that breaks your build, or deleting a critical config file), you want to be able to git reset --hard instantly. Do not rely on the editor’s “undo” stack, which can get messy with multi-file edits. If an incremental change wasn’t caught as the culprit immediately, a granular commit history allows you to bisect and find exactly where the LLM introduced the weirdness.
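In practice the save-point loop is only a handful of commands; a minimal sketch (the commit message and the known-good commit are placeholders):

```bash
# Save point before letting the agent loose.
git add -A
git commit -m "checkpoint: working state before agent refactor"

# The puppy tore up the sofa? Discard its uncommitted changes and
# return to the checkpoint instantly.
git reset --hard HEAD

# With a granular commit history, bisect later to find where the weirdness crept in.
git bisect start
git bisect bad                       # the current commit is broken
git bisect good <known-good-commit>  # the last commit you trusted
```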

5. The Hammer and The Screwdriver

You cannot use LLMs to skip the learning phase. If you don’t understand your stack, you cannot effectively judge the code the LLM produces. A few framework-specific examples of the kind of thing you need to know:

  • Express: Error-handling middleware must have exactly 4 arguments: (err, req, res, next). If an LLM writes it with 3, Express treats it as regular middleware and fails to catch errors (see the sketch after this list).
  • Go: Go hates “magic.” It despises implicit control flow like Exceptions or “Convention over Configuration.” If an LLM tries to be clever with decorators or hidden logic, it is fighting the language.
  • Django: Prefers its built-in ORM over raw SQL.
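To make the Express point concrete, here is a minimal TypeScript sketch (assuming express and its type definitions are installed; the route and port are invented for illustration). Express decides whether middleware is an error handler purely by its arity:

```ts
import express, { Request, Response, NextFunction } from "express";

const app = express();

app.get("/boom", () => {
  // Synchronous errors in a handler are forwarded to error middleware.
  throw new Error("something broke");
});

// Correct: exactly four parameters (err, req, res, next), so Express
// registers this as error-handling middleware and routes errors here.
app.use((err: Error, _req: Request, res: Response, _next: NextFunction) => {
  console.error(err);
  res.status(500).json({ error: "internal server error" });
});

// Wrong: with only three parameters this is ordinary middleware,
// and the error thrown above would never reach it.
// app.use((err, req, res) => { ... });

app.listen(3000);
```

Nothing crashes with the three-argument version; errors simply fall through to Express’s generic default handler instead of your own, which is easy to miss in review if you don’t know the arity convention.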

If you don’t spend the 5 minutes reading the introduction to the frameworks you are using, you won’t notice when the LLM starts nailing things to the wall with a screwdriver. You need to know enough to intervene and say, “Hey, that’s dumb. Just use a single websocket channel for that.”

6. Documentation First (The Rulebook)

Never dive straight into code generation. Before I let an agent touch a file, I write guidance documents. I recommend maintaining a few specific Markdown files in your repository that the LLM is forced to read.
Crucially, these documents should contain a Rulebook—a list of explicit DOs and DON’Ts.

  • Original_Idea.md: The “Why”. The research, the goal, the philosophy of the project.
  • TODO.md: The Roadmap.
    • If your toolkit lacks a built-in task tracker (though some like Cursor have versions of this), maintain a plain Markdown checklist.
    • Crucial: Explicitly instruct the LLM in your guidance doc to read this file before starting and to check off items only after verification. This prevents the “I thought I did that already” hallucination loop.
  • LLM_Guidance.md: The laws of the land.
    • DO: “ALWAYS create unit tests for new features.”
    • DO: “Prefer functional programming patterns.”
    • DO: “Update TODO.md when a task is verifiably complete.”
    • DON’T: “Do NOT change core project aspects or update dependencies without explicit consent.”
    • DON’T: “Do NOT delete comments.”
    • etc.
    • Keep adding to this list of DOs and DON’Ts as you go; you will run into more patterns and issues later on that are worth codifying.

Tools like Cursor have an explicit “Plan” mode. Use it. Discuss the architecture with the chatbot first, agree on a path, then let it code.

7. Testing is Your Guard Rail

Testing is non-negotiable. As projects grow complex, an LLM (or a human) implementing Feature B will frequently knock Feature A off the table in counterintuitive ways.
Both LLMs and humans will assume their changes are correct. You will not know something is wrong until measurable science tells you it is wrong. You must assume they are wrong, and you must assume you will be wrong too sometimes.
Automated tests are the only way to verify reality. If you don’t have tests, you are trusting a stochastic parrot with your production environment.
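As a minimal sketch of what that guard rail looks like (using Node’s built-in test runner; the invoice module and its API are hypothetical), here is a regression test for Feature A that keeps running while the agent works on Feature B:

```ts
import test from "node:test";
import assert from "node:assert/strict";

// Hypothetical "Feature A" module, named purely for illustration.
import { formatInvoice } from "../src/invoice.js";

// If the agent's work on "Feature B" quietly changes invoice formatting,
// this fails in CI before the change ever reaches production.
test("invoice totals are formatted with two decimals", () => {
  const result = formatInvoice({ items: [{ price: 10 }, { price: 2.5 }] });
  assert.equal(result.total, "12.50");
});
```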

8. Modularity and MVP Scaffolding

Keep your files small (under 600-800 lines). The more context an LLM has to juggle in a single file, the higher the probability of it hallucinating.
However, be careful with scaffolding. Don’t ask an LLM to “build the entire backend structure” in one go. You will end up with a lot of generic boilerplate that you have to spend hours cleaning up.
Use an MVP Approach:

  1. Ask for the minimum viable skeleton (interfaces, function signatures) without too much scaffolding or example content, otherwise you will have to clean it up later (the sketch below shows what this looks like).
  2. Verify that structure.
  3. Build it out incrementally.

Don’t go for the final form immediately. It usually takes a few cycles to get right, and iterating on a small, working MVP is easier than debugging a massive, hallucinated monolith.
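As an illustration of what a “minimum viable skeleton” means in practice, here is a hedged TypeScript sketch (the task-tracker domain and every name in it are invented): interfaces and signatures you can review and agree on before any real implementation exists.

```ts
// Step 1 of the MVP approach: shapes and signatures only, no speculative logic.
export interface Task {
  id: string;
  title: string;
  done: boolean;
}

export interface TaskStore {
  add(title: string): Promise<Task>;
  complete(id: string): Promise<void>;
  list(): Promise<Task[]>;
}

// Deliberately unimplemented: verify this structure first (step 2),
// then build it out incrementally (step 3), adding tests as you go.
export function createInMemoryTaskStore(): TaskStore {
  throw new Error("not implemented yet");
}
```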

9. Pick the Right Tool (Context Matters)

Not all LLMs are created equal.

  • Massive Context (e.g., Gemini): Excellent for understanding large codebases, analyzing logs, or finding bugs that span multiple modules.
  • High Reasoning (e.g., Claude 4.5 Opus / GPT5.2 “High Max” at the time of writing): Best for complex architectural logic and generation.
  • Smaller Models (e.g., Claude 4.5 Sonnet, or even smaller/local models like Deepseek-R1): Good for grunt work, but can be “penny wise, pound foolish” if they generate bugs you have to spend money fixing later.

Be mindful of your usage. Using a premium model to format JSON is a waste of money. Using a cheap model to design a database schema is a recipe for disaster.

10. Radical Ownership

This might be the hardest mindset shift. When you use an LLM, you are not outsourcing responsibility.
Treat the generated code as your own.
Give it the same scrutiny you would give a junior developer’s PR, or code you wrote yourself at 3 AM. If you ship it, you own it. If it creates a security vulnerability, that is on you, not the model.
Scrutinizing the output line-by-line doesn’t just catch bugs; it drastically improves your understanding of the codebase. If you can’t explain what the generated code does, do not commit it.

11. The Pragmatism of Politeness

It actually pays to be nice to the LLM.
This isn’t because the AI has feelings or cares about your manners. It is because the model is a prediction engine trained on real human conversations.
In the training data, high-quality, helpful, and professional answers usually follow polite, professional prompts. Conversely, swearing, aggression, or SHOUTING IN ALL CAPS is statistically correlated with flame wars, low-quality forum arguments, and unhelpful responses.
If you roleplay a situation where you are swearing at the machine, the model may unknowingly “roleplay” back a less helpful or more chaotic persona. Being polite is just good prompt engineering.

12. Don’t Be a Reverse Centaur

Cory Doctorow famously distinguished between the Centaur and the Reverse Centaur:
“A centaur is a human being who is assisted by a machine… A reverse-centaur is a machine that is assisted by a human being, who is expected to work at the machine’s pace.”
The biggest mistake developers make with LLMs is becoming the Reverse Centaur.
When you start blindly copy-pasting code back and forth, hoping the error message goes away, you have lost control. You have become the “accountability sink” for the machine’s bad output. You are the meat-servo for the machine.
Stop. Go back to planning mode. Think.
You are the controller. The LLM is the engine. If the engine starts making weird noises, don’t just turn up the radio; pull over and look under the hood.

Summary: Operational Good Practices

  • Read the Manual: Spend 5 minutes reading the docs of any tool you use.
  • Watch the Terminal: Do not tab away while the agent is coding; you will miss issues when they happen.
  • Maintain the Rulebook: Keep an LLM_Guidance.md file updated with what works and what doesn’t for your specific project.
  • Trust Nothing: Verify every output with tests or manual review.
  • Incremental Builds: Don’t ask for the whole cathedral; ask for the foundation, then the walls, then the roof.

TL;DR: The Pilot’s Checklist

Before you hit ‘Enter’ on that prompt:

  • State Check: Did I commit my working code? (Git Save Point)
  • Context Check: Does the LLM have the relevant files (and only the relevant files) in context?
  • Knowledge Check: Do I understand what I am asking it to do?
  • Strategy Check: Am I asking for code, or should I be asking for a plan/explanation first?
  • Safety Check: Do I have tests in place to catch if it breaks existing features?
