Agentic Coding

This lecture builds on the AI-powered development material from the Development Environment and Tools lecture.

Coding agents are conversational AI models with access to tools such as reading/writing files, web search, and invoking shell commands. They live either in the IDE or in standalone command-line or GUI tools. Coding agents are highly autonomous and powerful tools, enabling a wide variety of use cases.

Continuing the example from the Development Environment and Tools lecture, we can try prompting a coding agent with the following task:

Turn this into a proper command-line program, with argparse for argument parsing. Add type annotations, and make sure the program passes type checking.

The agent will read the file to understand it, then make some edits, and finally invoke the type checker to make sure the type annotations are correct. If its edits fail type checking, it will likely iterate, though a task this simple rarely requires it. Because coding agents have access to tools that may be harmful, by default, agent harnesses prompt the user to confirm tool calls.
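As a concrete (hypothetical) sketch of the kind of result the agent might produce, here is a small argparse CLI with type annotations that passes mypy. The original script from the earlier lecture isn't reproduced here, so the task itself (computing a Collatz sequence length) is an illustrative stand-in:

```python
import argparse


def collatz_len(n: int) -> int:
    """Number of Collatz steps needed to reach 1 (illustrative example task)."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


def main() -> None:
    parser = argparse.ArgumentParser(description="Compute Collatz sequence length.")
    # nargs="?" with a default so the program also runs without arguments
    parser.add_argument("n", type=int, nargs="?", default=6,
                        help="starting integer (must be >= 1)")
    args = parser.parse_args()
    print(collatz_len(args.n))


if __name__ == "__main__":
    main()
```

This is roughly the shape of change the prompt asks for: argument parsing via argparse, annotations on every function, and code that type-checks cleanly.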

Coding agents support multi-turn interaction, so you can iterate on work over a back-and-forth conversation with the agent. You can even interrupt the agent if it’s going down the wrong track. One helpful mental model might be that of a manager of an intern: the intern will do the nitty gritty work, but will require guidance, and will occasionally do the wrong thing and need to be corrected.

How AI models and agents work

Fully explaining the inner workings of modern large language models (LLMs) and infrastructure such as agent harnesses is beyond the scope of this course. However, having a high-level understanding of some of the key ideas is helpful for effectively using this bleeding edge technology and understanding its limitations.

LLMs can be viewed as modeling the probability distribution of completion strings (outputs) given prompt strings (inputs). LLM inference (what happens when you, e.g., supply a query to a conversational chat app) samples from this probability distribution. LLMs have a fixed context window, the maximum length of the input and output strings.
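As a toy illustration of this view (not how real LLMs represent anything internally), suppose the "model" is a hand-written table of next-token probabilities. Inference then just samples from the conditional distribution, token by token, until the context window fills up or there is no known continuation:

```python
import random

# A toy "language model": a fixed conditional distribution over the next
# token given the context so far. Real LLMs learn such probabilities from
# data and represent them with a neural network, not a lookup table.
MODEL: dict[tuple[str, ...], dict[str, float]] = {
    ("the",): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
    ("the", "dog"): {"ran": 1.0},
}

CONTEXT_WINDOW = 4  # max total tokens (prompt + completion) the model sees


def sample_completion(prompt: list[str], rng: random.Random) -> list[str]:
    tokens = list(prompt)
    while len(tokens) < CONTEXT_WINDOW:
        dist = MODEL.get(tuple(tokens))
        if dist is None:  # no known continuation: stop generating
            break
        # Sample the next token from the conditional distribution.
        choices, weights = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens[len(prompt):]
```

Running `sample_completion(["the"], ...)` twice with different random seeds can give different completions; that randomness is why the same prompt to a chat app doesn't always produce the same answer.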

AI tools such as conversational chat and coding agents build on top of this primitive. For multi-turn interactions, chat apps and agents use turn markers and supply the entire conversation history as the prompt string every time there is a new user prompt, invoking LLM inference once per user prompt. For tool-calling agents, the harness interprets certain LLM outputs as requests to invoke a tool, and the harness supplies the results of the tool call back to the model as part of the prompt string (so LLM inference runs again every time there is a tool call/response).
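To make the tool-calling loop concrete, here is a minimal sketch of a harness. The LLM is replaced by a hard-coded stub (`fake_llm`) and the tools are stubbed out; the JSON tool-call convention and the `TOOL_RESULT` marker are illustrative assumptions, not the protocol of any particular harness:

```python
import json
from typing import Callable

# Hypothetical tools the harness exposes to the model (stubbed out here).
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "shell": lambda cmd: f"<output of `{cmd}`>",
}


def fake_llm(prompt: str) -> str:
    """Stand-in for LLM inference. A real harness would call a model API."""
    if "TOOL_RESULT" not in prompt:
        # The model "decides" to call a tool, encoded as JSON by convention.
        return json.dumps({"tool": "read_file", "args": "main.py"})
    return "Done: the file looks fine."


def agent_loop(user_prompt: str) -> str:
    prompt = user_prompt
    while True:
        output = fake_llm(prompt)  # one LLM inference per tool call
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            return output  # plain text: final answer for the user
        result = TOOLS[call["tool"]](call["args"])
        # Feed the tool result back to the model as part of the prompt,
        # then run inference again.
        prompt += f"\nTOOL_RESULT({call['tool']}): {result}"
```

The key point the sketch captures: the "agent" is an ordinary loop around LLM inference, where certain outputs are interpreted as tool requests and the tool results are appended to the growing prompt. This is also where a real harness would pause to ask the user to confirm a tool call.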

Most AI coding tools in their standard configurations send a lot of your data to the cloud. Sometimes the harness runs locally while LLM inference runs in the cloud, other times even more of the software is running in the cloud (and, e.g., the service provider might effectively get a copy of your entire repository as well as all interactions you have with the AI tool).

There are open-source AI coding tools and open-source LLMs that are pretty good (though not quite as good as the proprietary models), but at present, running bleeding-edge open LLMs locally is infeasible for most users due to hardware limitations.

Use cases

Coding agents can be helpful for a wide variety of tasks. Some examples:

  - Implementing small features or fixing bugs in an existing project.
  - Navigating and understanding an unfamiliar codebase.
  - Prototyping a small app from scratch.
  - Mechanical chores like refactoring, writing tests, or fixing type errors and lint warnings.

Advanced agents

Here, we give a brief overview of some more advanced usage patterns and capabilities of coding agents.

For many of the advanced features that require writing prompts (e.g., skills or subagents), you can use LLMs to get you started. Some coding agents even have built-in support for doing this. For example, Claude Code can generate a subagent from a short prompt (invoke /agents and create a new agent). Try creating a subagent with this prompt:

A Python code checking agent that uses `mypy` and `ruff` to type-check, lint, and format *check* any files that have been modified from the last git commit.

Then, you can use the top-level agent to explicitly invoke the subagent with a message like “use the code checker subagent”. You might also be able to get the top-level agent to automatically invoke the subagent when appropriate, for example, after modifying any Python files.
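To see concretely what such a subagent does under the hood, here is a hedged sketch of its core commands. It sets up a throwaway git repo so the sketch is safe to run anywhere, and the `mypy`/`ruff` invocations are echoed rather than executed so the sketch doesn't require those tools to be installed; the actual subagent would run them directly:

```shell
set -e
cd "$(mktemp -d)"                     # throwaway repo for demonstration
git init -q
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init
echo 'x: int = 1' > example.py        # simulate a file modified after the commit
git add example.py

# Python files modified since the last commit:
for f in $(git diff --name-only HEAD -- '*.py'); do
    echo "would run: mypy $f && ruff check $f && ruff format --check $f"
done
```

Note the division of labor: `git diff --name-only HEAD` identifies the changed files, `mypy` type-checks, `ruff check` lints, and `ruff format --check` verifies formatting without modifying anything.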

What to watch out for

AI tools can make mistakes. They are built on LLMs, which are just probabilistic next-token-prediction models; they are not "intelligent" in the same way as humans. Keep the following in mind:

  - Review AI output for correctness and security bugs. Verifying code can sometimes be harder than writing it yourself; for critical code, consider writing it by hand.
  - AI can go down rabbit holes and try to gaslight you; be aware of debugging spirals.
  - Don't use AI as a crutch, and be wary of overreliance or ending up with only a shallow understanding of your own code.
  - There is still a huge class of programming tasks that AI is incapable of doing. Computational thinking is still valuable.

Recommended software

Many IDEs / AI coding extensions include coding agents (see recommendations from the Development Environment and Tools lecture). Other popular coding agents include Anthropic’s Claude Code, OpenAI’s Codex, and open-source agents like opencode.

Exercises

  1. Compare the experience of coding by hand, using AI autocomplete, inline chat, and agents by doing the same programming task four times. The best candidate is a small-sized feature from a project you’re already working on. If you’re looking for other ideas, you could consider completing “good first issue” style tasks in open-source projects on GitHub, or Advent of Code or LeetCode problems.
  2. Use an AI coding agent to navigate an unfamiliar codebase. This is best done in the context of wanting to debug or add a new feature to a project you actually care about. If you don’t have any that come to mind, try using an AI agent to understand how security-related features work in the opencode agent.
  3. Vibe code a small app from scratch. Do not write a single line of code by hand.
  4. For your coding agent of choice, create and test an AGENTS.md (or its analogue, such as CLAUDE.md), a reusable prompt (e.g., a custom slash command in Claude Code or a custom prompt in Codex), a skill (supported in, e.g., Claude Code and Codex), and a subagent (e.g., a subagent in Claude Code). Think about when you’d want to use one of these versus another. Note that your agent might not support some of these features; you can either skip them or try a different coding agent that does.
  5. Use a coding agent to accomplish the same goal as in the Markdown bullet points regex exercise from the Code Quality lecture. Does it complete the tasks via direct file edits? What are the downsides and limitations of an agent editing the file directly to complete such a task? Figure out how to prompt the agent such that it doesn’t complete the task via direct file edits. Hint: ask the agent to use one of the command-line tools mentioned in the first lecture.


Licensed under CC BY-NC-SA.