Musings and misadventures of an expat entrepreneur

With, By, or For - Three ways I use LLMs as a software engineer

anelson May 18, 2025 #chatgpt #llm #ai #tech

Like most everyone in my company (and, if you’re reading this, probably yours as well), my colleagues and I have been enthusiastically adopting AI tech, particularly LLMs, since ChatGPT 3.5 first showed us what was possible with next-token prediction and insane amounts of compute and training. In the 2+ years since then, I’ve used quite a few tools and numerous models, trying to find how to get the most productive value out of LLMs as a software engineer and engineering leader. In that time I’ve learned quite a bit about how to do that without succumbing to either of the most common failure modes: outright dismissal of modern LLMs as “stochastic parrots”, and breathless idiotic “AI cloned AirBnb in one shot devs are NGMI!!!” clickbaiting. This article is a distillation of my thinking on using LLMs as a software engineer, as of May 2025.

Disclaimer: AI technology is changing very rapidly. I expect I’ll read this article five years from now and scoff at how primitive our tools were and how clueless I was. Consider this article a snapshot of the state of my use of LLMs as of this moment in time. I’m neither an AI doomer nor an AI hypebeast; I’m a working SWE and engineering leader who has to deal with these technologies whether I like it or not, and who is trying to find ways to get the most value from LLMs without drowning in AI slop.

tl;dr - Three ways I use LLMs as an SWE

I break down the ways I use LLMs as an SWE into three categories, in the order in which I discovered them:

  • With - interactive chat sessions in which the LLM acts as a pairing partner and a (stochastic) reference
  • By - agentic coding tools that write code on my behalf
  • For - shaping the codebase itself so that code written by AI is more likely to come out correct

None of these is better or worse than the others; you can do stupid things and hurt yourself with all three of these techniques, and you can get value out of LLMs without using any of them. This is simply the combination that works for me in May 2025. The rest of this article is an elaboration on the three techniques.

With

This was the original use case for LLMs in the ChatGPT 3.5 era, when all we had was the chat interface. You’d prompt it to do something, and it would spit out some code or a command, which you’d copy-paste or modify or just run; if you got an error, you’d copy-paste that back into the session, ad infinitum. Or you’d use it as a stochastic Google, asking it questions and hoping it produces the right answers. As far as I can tell, for the vast majority of people who use AI, this continues to be the primary workflow.

This can go badly wrong when you don’t understand the limitations of the LLM, but it can also be immensely helpful and save a ton of time if you know what you’re doing.

For my day-to-day coding with LLMs, I use Claude Desktop and the latest thinking model, which as of now is Sonnet 3.7 (update: now Sonnet 4 and Opus 4 are out). If I don’t like what I’m getting, or if Claude is down or I’m throttled, I will use the ChatGPT Desktop or Mobile app. Both of these are $20/mo and are well worth the price for the value I get.

If the task involves up-to-date information, or searching the Internet is for some other reason key to success, I also have a Perplexity Pro subscription which I use constantly.

Here are a few real-world examples pulled from my recent history in the aforementioned applications:

By

Starting in Q4 last year, “agentic AI” became all the rage, and agentic features started to appear in Cursor and similar tools. AI influencers fell over each other to be the first to state the obvious: that 2025 was to be the “year of agentic AI”. In February 2025, Andrej Karpathy coined the term “vibe coding” to refer to the low-effort generation of AI slop code that was by then already rampant. AI grifters on YouTube and X performatively gaped at the ease with which primitive agentic coding tools turned a screenshot of some web app into a React application, heralding the end of software engineering and the urgency with which one must join their Patreon or perish.

If you’ve experimented with these tools, you’ve likely noticed how quickly and confidently they produce garbage code that you wouldn’t accept from the greenest junior developer. You can be forgiven for dismissing agentic coding tools as gimmicks hyped by charlatans who need to somehow justify their absurd VC investments, for indeed in many cases they are exactly that. However, I have been able to get some valuable output from them, and, whether you like it or not, your laziest and least-capable colleagues are churning out AI-written code anyway, so you may as well come to terms with it now.

As of right now my go-to agentic coding tool is Claude Code, although I still pay $20/mo for Cursor and occasionally use it.

Claude Code uses the Anthropic APIs, which are billed per token, so comparing it to the $20/mo Cursor subscription isn’t really fair, but I’m doing it anyway. My company has some generous Google GCP credits, and Claude Sonnet 3.7 is available via Vertex AI, so for us in particular Claude Code is “free”, in the sense that it doesn’t use up any of our runway. Paying Cursor per token, plus their 10% markup, would cost actual dollars, and since Claude Code works very well for me I haven’t put any effort into exploring other options yet.

Cursor’s agent mode has come a long way since I first mentioned it in my year-end GenAI tooling review. There’s even (finally) a background mode, so you can have multiple agents churning on tasks. However, I have grown very tired of the throttling on Cursor when I use up all of my “fast” credits, which usually happened within a few days of the start of the billing cycle. I also get the sense that Cursor is motivated to minimize the number of tokens they pay for on the $20/mo plans, which may explain the poor performance I experienced. But Cursor is a VS Code fork, and sometimes I prefer those ergonomics to those of the terminal, which is when I still find myself reaching for Cursor. And since I do most of my day-to-day work with Claude Code, the Cursor throttling is less of a problem for me.

As for how to get decent code written by LLMs, the guidance in the Anthropic Claude Code best practices doc is what finally helped me get decent results. That doc is very specific to Claude Code, but the section “3. Try common workflows” seems like broadly applicable guidance that will improve results with other agentic coding tools that work similarly to Claude Code.

Thanks to that Anthropic guidance, I’ve been able to get several useful results out of LLMs, delivered faster than I could have done the work on my own, even taking into account rework and the time spent reviewing and correcting the code. Here are a few examples:

If you do nothing more than follow the Anthropic best practices with Claude Code and make a good-faith effort to learn the nuances of how the various coding models work, I think you’ll get good results at least some of the time. This is especially true if you use agents to do tasks that otherwise would not be done at all, whether for reasons of mental energy or of unfamiliarity with a codebase or tech stack.

However, I would also urge you to augment the Anthropic best practices by investing heavily in the technique described in the next section. Your codebase needs to be written for AI, since at this point it’s probably inevitable that at least parts of it are going to be written by AI.

For

Already in last December’s year-end GenAI tooling review, the kernels of what became “coding for AI” were present:

  • The corollary of the previous bullet is that generating docs optimized for LLM consumption will be much more important, particularly for new tools and languages. I think it’s inevitable that software development agents will need to get much better at looking up documentation, and when they do the extent to which that documentation is easily consumed by whatever mechanism they use will be important. Right now it seems like dumping all documentation content into a big Markdown file is a pretty good approach, but I bet this will be refined over time. This applies not just to developer docs but also end-user docs as well. On the plus side, perhaps this will finally be the death of product docs locked away behind a login?

In the intervening 5 months of working with agentic systems, it’s become abundantly clear that the prediction was right but insufficient: it’s not just docs that help LLMs, but anything that can be invoked as a tool to provide actionable feedback on their output.
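As an aside, the “one big Markdown file” approach from that quote is trivially automatable. Here is a minimal sketch, assuming a conventional docs/ tree of Markdown files; the docs/ root and the llm-docs.md output name are illustrative choices of mine, not any standard:

    #!/usr/bin/env python3
    """Concatenate a docs tree into one big Markdown file for LLM consumption.

    A minimal sketch: the docs/ root and llm-docs.md output are assumed
    names, not a standard layout.
    """
    from pathlib import Path

    DOCS_ROOT = Path("docs")      # where the Markdown docs are assumed to live
    OUTPUT = Path("llm-docs.md")  # the single file an LLM or its tools can ingest

    def main() -> None:
        sections = []
        # Sort for a stable order, so regenerating the file produces clean diffs.
        for md in sorted(DOCS_ROOT.rglob("*.md")):
            # Label each section with its source path so a model can cite it.
            sections.append(f"<!-- source: {md} -->\n\n{md.read_text(encoding='utf-8')}")
        OUTPUT.write_text("\n\n---\n\n".join(sections) + "\n", encoding="utf-8")
        print(f"wrote {OUTPUT} from {len(sections)} files")

    if __name__ == "__main__":
        main()

Run it from the repo root as part of the docs build, and point the agent (or the curious human) at the single generated file.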

In fact, it turns out that the things I’ve been doing throughout my career to harden a code base against the predations of eager juniors and incompetent offshore “seniors” brought in for the latest “this-time-it’s-different” management cost-cutting scheme also go a long way toward making agents more useful. Every programmer I know who has done anything with LLMs in the last two years has inevitably characterized the experience as that of working with an eager junior, tireless and overconfident. If you, like me, enjoy the experience of mentoring a promising and eager junior as he or she grows into a more capable programmer, then you will probably protest that a stochastic parrot wrapped in an agentic framework is something entirely different and qualitatively inferior. I won’t argue that point, but just like an eager junior (or the latest outsourcing scammer), LLMs have no actual understanding of anything they write, and they lack any judgement by which to evaluate what they have built. If you force them to get their code through a type check or compilation step, a linter, a beautifier, unit tests, integration tests, and maybe some dynamic analysis, you automate much of the tedious and error-prone verification work, so that by the time the code gets to you for review you at least know you won’t see any mistakes those earlier steps can catch.
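To make that concrete, here is a minimal sketch of the kind of single verification gate I have in mind: one command an agent can be told to run, and re-run, until it passes. The specific checks are illustrative (a Rust toolchain here); substitute your own formatter, linter, compiler, and test suites:

    #!/usr/bin/env python3
    """Run every automated check in order, stopping at the first failure.

    A sketch of a single verification gate for a coding agent; the commands
    below assume a Rust project and should be swapped for your own stack.
    """
    import subprocess
    import sys

    CHECKS = [
        ("format", ["cargo", "fmt", "--check"]),
        ("lint", ["cargo", "clippy", "--", "-D", "warnings"]),
        ("build", ["cargo", "build", "--all-targets"]),
        ("unit tests", ["cargo", "test"]),
    ]

    def main() -> int:
        for name, cmd in CHECKS:
            print(f"==> {name}: {' '.join(cmd)}")
            # Let stdout/stderr stream through untouched; the error text is
            # exactly the actionable feedback the agent needs to self-correct.
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print(f"FAILED at {name}; fix the errors above and re-run.")
                return result.returncode
        print("all checks passed")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

The script itself is nothing special; the point is that there is exactly one deterministic entry point whose failure output tells the agent (or the junior) what to fix next.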

In the case of coding agents, this isn’t just a way to spare yourself the brunt of their vibe-coded stupidity. In many cases it seems that this feedback cycle somehow guides the agent along a random walk, making it more likely to arrive at an acceptable answer. I suppose if you think of the underlying LLM as a stochastic parrot, then it makes sense that the more guardrails you put in place, the more samples the stochastic parrot gets to draw and have checked, thus increasing the odds that it eventually produces something that’s at least acceptable.
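To put crude, made-up numbers on it: if a single unguided attempt has a 20% chance of being acceptable, then ten attempts, each checked against the guardrails and retried on failure, have about an 89% chance (1 - 0.8^10) of producing at least one acceptable result. That model is naive, since the attempts aren’t independent and the feedback actively steers them, but it captures why sampling behind guardrails beats a single unchecked shot.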

Here are some of the actual things I’ve put in place in codebases where I want to enable (or in some cases, lack the power to prohibit) productive use of coding agents:

I’m sure over time we’ll discover more techniques for building guardrails to keep the agents on the road. I personally would love to find a way to detect the zero-value comments that LLMs are so fond of injecting into the code, as well as the deletion of valuable comments that for whatever reason they seem to deem unnecessary.
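I don’t have a real answer to the comment problem yet, but to illustrate the sort of guardrail I imagine, here is a naive heuristic sketch of my own (not an existing tool): flag any line comment whose substantive words all reappear as identifiers on the following line, e.g. “# parse the config” sitting directly above parse_config(path):

    #!/usr/bin/env python3
    """Naively flag comments that merely restate the line of code below them.

    My own heuristic sketch, not an existing tool. A comment like
    "# parse the config" directly above "parse_config(path)" adds nothing:
    every one of its substantive words already appears in the code.
    """
    import re
    import sys

    IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def words(text: str) -> set[str]:
        # Split snake_case and camelCase identifiers into lowercase parts,
        # dropping short filler words like "the" and "a".
        parts = []
        for token in IDENTIFIER.findall(text):
            parts += re.split(r"_|(?<=[a-z])(?=[A-Z])", token)
        return {p.lower() for p in parts if len(p) > 3}

    def main(path: str) -> None:
        lines = open(path, encoding="utf-8").read().splitlines()
        for i, line in enumerate(lines[:-1]):
            m = re.match(r"\s*(?://|#)\s*(.+)", line)
            if not m:
                continue
            comment, code = words(m.group(1)), words(lines[i + 1])
            # Flag when every substantive comment word is already in the code.
            if comment and comment <= code:
                print(f"{path}:{i + 1}: possibly zero-value comment: {line.strip()}")

    if __name__ == "__main__":
        main(sys.argv[1])

Detecting the opposite failure, where the agent silently deletes a valuable comment, is probably better handled in review tooling, perhaps by diffing the set of comments before and after the agent’s changes.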

Conclusion

The tone of this text may suggest that I’m an AI skeptic, possibly even a curmudgeonly graybeard gatekeeper nostalgic for the days of punch cards and walking to school uphill both ways in the snow. That is very much not the case. I am bullish on AI tools in general, I already get a lot of value from the tools as they exist today, and it seems certain that they will continue to increase in capability.

But I am also a jaded and cynical SWE who can clearly see the lazy and careless use of AI by people whose unaugmented abilities are low enough that they are not capable of evaluating the slop that their GenAI tooling is producing in their name. This is already wasting my time by making me read AI-generated slop docs and look at vibe-coded PRs from idiots who don’t know, or possibly don’t care, how obvious it is that their incompetence at their actual job is matched only by their inability to prompt LLMs to do that job for them.

I’ve written this article for others like me who feel the same way. You cannot stop this AI revolution, and you cannot hide from the slop, but I urge you to keep an open mind and try to regard this new technology with the wonderment I remember from my youth, when I first discovered programming and then the Internet. There is real value to be had here, and it’s worth your time to figure out how to take advantage of it.

Statement of Provenance

I wrote the text of this article entirely myself, by thinking thoughts and translating those thoughts into words which I typed with my hands on a keyboard. Any emdashes, proper grammar and spelling, or use of the words “underscore” and “delve” is entirely coincidental.

OpenAI’s GPT 4.5 model was used to copyedit a draft of this text for typos and sloppy or lazy writing. Its feedback was read by me with my eyeballs, the proposed changes were considered by me with my own brain, and changes that I agreed with were again made with my own hands.

All thoughts and clever turns of phrase are my own, and do not necessarily reflect the opinions of Elastio, nor did they emerge from a high-dimensional latent space on NVIDIA silicon.