Code Quality + Q&A

This page is under construction for the IAP 2026 offering of Missing Semester. This lecture will cover topics similar to the Metaprogramming lecture from the 2020 offering.

In this lecture, we’ll cover:

Regular expressions for pattern matching

Regular expressions, commonly abbreviated as “regex”, are a language for describing sets of strings. IDEs support regex for pattern-based search and search-and-replace. Regex patterns are also commonly used for pattern matching in other contexts, such as command-line tools. For example, ag supports regex patterns for codebase-wide search (e.g., ag "import .* as .*" will find all renamed imports in Python), and go test supports a -run [regexp] option for selecting a subset of tests. Furthermore, most programming languages have built-in support or third-party libraries for regular expression matching, so you can use regexes for tasks such as pattern matching, validation, and parsing.
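As a rough illustration of those three uses in Python's built-in re module, the sketch below reuses the ag pattern from above on a few made-up import lines (searching), checks whether a whole string looks like a YYYY-MM-DD date (validation), and pulls all numbers out of a string (a very simple form of parsing). The sample strings and the helper name is_iso_date are our own, chosen for illustration.

```python
import re

# Searching: the same pattern as the ag example, applied line by line.
lines = [
    "import numpy as np",
    "import os",
    "from collections import defaultdict as dd",
]
renamed = [line for line in lines if re.search(r"import .* as .*", line)]
print(renamed)  # ['import numpy as np', 'from collections import defaultdict as dd']

# Validation: re.fullmatch succeeds only if the *entire* string matches.
def is_iso_date(s: str) -> bool:  # hypothetical helper for illustration
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

print(is_iso_date("2026-01-14"))  # True
print(is_iso_date("2026-1-14"))   # False

# Parsing: re.findall returns every non-overlapping match.
print(re.findall(r"\d+", "200 OK, 2995 bytes"))  # ['200', '2995']
```

Note that the pattern also matches "from ... import ... as ..." lines, which are renamed imports too. Also note that this check only looks at the shape of the string: is_iso_date("2026-13-99") would return True.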

To help build intuition, below are some examples of regex patterns. In this lecture, we use Python regex syntax. There are many flavors of regex, with slight variations between them, especially in more advanced features. You can use an online regex tester like regex101 to develop and debug regular expressions.

Regex syntax

You can find a comprehensive guide to regex syntax in this documentation (or one of many other resources available online). Here are some of the basic building blocks:
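One quick way to get a feel for common building blocks is to try them directly with Python's re.search, which returns the first match anywhere in the string (or None). The pattern/string pairs below are illustrative examples of our own choosing, not an exhaustive list.

```python
import re

# (pattern, string) pairs; the comment names the building block being shown.
examples = [
    (r"c.t", "cut"),         # .      any single character except newline
    (r"ab*c", "ac"),         # *      zero or more of the preceding item
    (r"ab+c", "abbbc"),      # +      one or more of the preceding item
    (r"colou?r", "color"),   # ?      zero or one of the preceding item
    (r"[aeiou]", "rhythm"),  # [...]  any one character from the set (no match here)
    (r"^\d{3}$", "123"),     # ^ $    anchor to start/end; \d{3} = exactly three digits
    (r"cat|dog", "hotdog"),  # |      alternation: either side may match
]

results = {}
for pattern, text in examples:
    m = re.search(pattern, text)
    results[pattern] = m.group(0) if m else None
    print(f"{pattern:10} {text:8} -> {results[pattern]}")
```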

Capture groups and references

If you use regex groups (...), you can refer to sub-parts of the match for extraction or search-and-replace purposes. For example, to extract just the month from a YYYY-MM-DD-style date, you can use the following Python code:

>>> import re
>>> re.match(r"\d{4}-(\d{2})-\d{2}", "2026-01-14").group(1)
'01'

In your text editor, you can reference capture groups in replace patterns. The syntax varies between editors: for example, in VS Code you use variables like $1, $2, etc., and in Vim you use \1, \2, etc., to reference groups.
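The same idea works programmatically with Python's re.sub. The sketch below reorders made-up YYYY-MM-DD dates, first with numbered references (\1, \2, ..., the same style as Vim) and then with named groups, which Python's replacement syntax references as \g<name>.

```python
import re

text = "Released 2026-01-14, patched 2026-02-03."

# Numbered groups: \1, \2, ... in the replacement refer to capture groups.
# Raw strings keep the backslashes intact.
us_style = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", text)
print(us_style)  # Released 01/14/2026, patched 02/03/2026.

# Named groups (?P<name>...) are referenced as \g<name> in the replacement.
eu_style = re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})", r"\g<d>.\g<m>.\g<y>", text)
print(eu_style)  # Released 14.01.2026, patched 03.02.2026.
```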

Limitations

Regular expressions are powerful but limited: there are sets of strings that no standard regular expression can describe. For example, it is not possible to write a regular expression that matches the set of strings {a^n b^n | n ≥ 0}, that is, some number of “a”s followed by the same number of “b”s. More practically, languages like HTML are not regular languages. Modern regex engines add features like lookahead and backreferences, and backreferences in particular let a pattern match some languages that are not regular. These features are extremely useful in practice, but it is important to know that even extended regexes are still limited in their expressive power. For more sophisticated languages, you might need to reach for a more capable type of parser (for one example, see pyparsing, a PEG parser).
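For a concrete feel for the two extensions mentioned above, here is a small sketch in Python's re: a backreference that finds a doubled word, and a lookahead that requires a digit somewhere in the string. The sample strings and the 8-character rule are arbitrary choices for illustration.

```python
import re

# Backreference: \1 must repeat exactly the text that group 1 matched.
# Classic regular expressions have no way to refer back to earlier matched text.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("this is is a typo").group(1))  # is

# Lookahead: (?=...) checks what follows without consuming characters.
# Here it requires at least one digit anywhere in a run of 8+ word characters.
has_digit = re.compile(r"^(?=.*\d)\w{8,}$")
print(bool(has_digit.match("password")))  # False
print(bool(has_digit.match("passw0rd")))  # True
```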

Learning regex

We recommend learning the fundamentals (what we have covered in this lecture), and then looking at regex references as you need them, rather than memorizing the entirety of the language.

Conversational AI tools can be effective at helping you generate regex patterns. For example, try prompting your favorite LLM with the following query:

Write a Python-style regex pattern that matches the requested path from log lines from Nginx. Here is an example log line:

169.254.1.1 - - [09/Jan/2026:21:28:51 +0000] "GET /feed.xml HTTP/2.0" 200 2995 "-" "python-requests/2.32.3"
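For comparison, here is one possible answer. It assumes that the request line is the first quoted field, of the form "METHOD PATH PROTOCOL", as in the example line. An LLM may well suggest something different. Whatever you get, it is worth checking it against a few real log lines, including malformed requests, before relying on it.

```python
import re

line = ('169.254.1.1 - - [09/Jan/2026:21:28:51 +0000] '
        '"GET /feed.xml HTTP/2.0" 200 2995 "-" "python-requests/2.32.3"')

# Quoted request line: an uppercase method, a space, the path (group 1,
# a run of non-space characters), a space, then HTTP/<version>.
request_path = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

m = request_path.search(line)
print(m.group(1))  # /feed.xml
```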

Q&A

In the second half, we will cover student questions. Please submit your questions in advance of the lecture:

https://forms.gle/4jnhiok72KQUD3Tn7

See here for the Q&A from the previous offering of this course.

Exercises

  1. Practice regex search-and-replace by replacing the - Markdown bullet markers with * bullet markers in the lecture notes for today. Note that just replacing all the “-” characters in the file would be incorrect, as the character has many uses other than bullet markers.
  2. Write a regex that captures the name (e.g., Alyssa P. Hacker) from JSON structures of the form {"name": "Alyssa P. Hacker", "college": "MIT"}. Hint: your first attempt might extract Alyssa P. Hacker", "college": "MIT instead; read about greedy quantifiers in the Python regex docs to figure out how to fix it.
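To see the greedy-versus-lazy behavior that the hint in exercise 2 points at, here is a sketch on a different toy string, so the exercise itself is left for you. By default, * is greedy and matches as much as it can. Adding ? (as in *?) makes it lazy, so it matches as little as it can.

```python
import re

text = "f(a) + g(b)"

# Greedy: .* matches as much as possible, so the group extends to the LAST ")".
greedy = re.search(r"\((.*)\)", text).group(1)
print(greedy)  # a) + g(b

# Lazy: .*? matches as little as possible, so the group stops at the FIRST ")".
lazy = re.search(r"\((.*?)\)", text).group(1)
print(lazy)  # a
```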


Licensed under CC BY-NC-SA.