(Optional) Getting Started with Regular Expressions

by Elliott Hauser

24 Nov 2022

The Wild Wonderful World of Wegexes

We didn’t get to regexes in this class, but you might consider using them in your final.

Resources:

  • Pythex is a nice way to build out regular expressions while testing them visually
  • Regex Crossword can be a fun way to learn regexes by producing text that will match multiple regexes, like crossword puzzles have overlapping words.

Regexes are super powerful. It takes a lot of practice to be able to parse the complicated ones. But it’s also OK to start simple:

  • this is a regex that matches these exact characters: this
  • (t)?his is a regex that matches his and, optionally, this. The parentheses plus the ? indicate that the t is optional.
  • [th]is matches his OR tis. The square brackets match exactly one of the characters listed inside them, so they work as an either/or.

Here’s this example on Pythex
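If you'd rather try these three patterns in Python than in Pythex, here is a minimal sketch using the standard library's re module. The word list and variable names are made up for illustration, and re.fullmatch is used so that a pattern only counts when it matches the whole word.

```python
import re

# Hypothetical test words, chosen to exercise each pattern.
words = ["this", "his", "tis", "is"]

# Literal pattern: matches only the exact characters "this".
literal = [w for w in words if re.fullmatch(r"this", w)]

# Optional group: the t may be present or absent.
optional_t = [w for w in words if re.fullmatch(r"(t)?his", w)]

# Character set: exactly one of t or h, followed by "is".
either = [w for w in words if re.fullmatch(r"[th]is", w)]

print(literal)     # ['this']
print(optional_t)  # ['this', 'his']
print(either)      # ['his', 'tis']
```

Note that [th]is does not match this as a whole word, because the brackets stand for a single character.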

Character Classes

Note: Regex character classes are different from the Classes used to make objects in Python.

The above examples add in some regex syntax but otherwise use literal characters. Regex also provides special symbols that stand for a whole set of characters. \d indicates a digit, for instance.

So, the pattern of a typical US phone number would be written

  • \d\d\d-\d\d\d-\d\d\d\d

This would match a string like 555-867-5309, since there are digits and hyphens that follow the pattern above.
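As a rough sketch of how that might look in Python, the snippet below searches a sentence for the phone number pattern. The sample sentence and the names phone_pattern, text, and found are assumptions for illustration only.

```python
import re

# Raw string (r"...") so the backslashes reach the regex engine unchanged.
phone_pattern = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

# Hypothetical sample text.
text = "Call Jenny at 555-867-5309 before 5 pm."

# search() scans the string and returns the first match, or None.
match = phone_pattern.search(text)
found = match.group() if match else None

print(found)  # 555-867-5309
```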

  • \d indicates digit characters, 0-9
  • \w indicates word characters: a-z, A-Z, 0-9, and _ (in Python 3, it also matches non-ASCII letters and digits by default)
  • \s indicates space characters, like a plain space, \t (tab), and \n (newline)

The capitalized versions of each of these (\D, \W, and \S) match everything not in the corresponding class.
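One way to get a feel for these classes is to run them over a short string with re.findall. This is only a sketch; the sample string and variable names are invented for the example.

```python
import re

# Hypothetical sample containing a space, a tab, and a newline.
sample = "Room 42\tis_open\n"

digits = re.findall(r"\d", sample)       # each single digit
spaces = re.findall(r"\s", sample)       # each whitespace character
words = re.findall(r"\w+", sample)       # runs of word characters
non_digits = re.findall(r"\D+", sample)  # runs of anything that is not a digit

print(digits)      # ['4', '2']
print(spaces)      # [' ', '\t', '\n']
print(words)       # ['Room', '42', 'is_open']
print(non_digits)  # ['Room ', '\tis_open\n']
```

The + after a class means "one or more in a row", which is why \w+ returns whole words rather than single characters.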

Selecting

Let’s say we only wanted the area code of a phone number. We can use parentheses to select just that portion of a match:

  • (\d\d\d)-\d\d\d-\d\d\d\d

Why not just use \d\d\d? Well, that would also find other instances of three digits in a row. We want three digits in a row only when they’re followed by the rest of a phone number. So we write the pattern of a phone number and select what we want out of it.
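To see the difference in Python, here is a small sketch comparing the bare \d\d\d pattern with the parenthesized version. In the re module, the part selected by the first pair of parentheses is available as group(1). The sample text and variable names are assumptions for illustration.

```python
import re

# Hypothetical text with an unrelated three-digit number in it.
text = "Order 123 shipped; call 555-867-5309 with questions."

# Bare \d\d\d picks up every run of three digits, wanted or not.
loose = re.findall(r"\d\d\d", text)

# The full phone pattern only matches a real phone number;
# the parentheses select the area code out of that match.
match = re.search(r"(\d\d\d)-\d\d\d-\d\d\d\d", text)
area_code = match.group(1) if match else None

print(loose)      # ['123', '555', '867', '530']
print(area_code)  # 555
```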

Elliott Hauser is an Assistant Professor at the UT Austin iSchool. He's hacking education as one of the cofounders of Trinket.io. Find Elliott Hauser on Twitter, Github, and on the web.