The Wild Wonderful World of Wegexes
We didn’t get to regexes in this class, but you might consider using them in your final.
Resources:
- Pythex is a nice way to build out regular expressions while testing them visually
- Regex Crossword can be a fun way to learn regexes: you fill in text that has to match several regexes at once, much like the overlapping words in a crossword puzzle.
Regexes are super powerful. It takes a lot of practice to be able to parse the complicated ones. But it’s also OK to start simple:
- `this` is a regex that matches these exact characters: `this`
- `(t)?his` is a regex that matches `his` and, optionally, `this`. The parentheses plus the `?` indicate that the `t` is optional.
- `[th]is` matches `his` OR `tis`. The square brackets indicate either/or.
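To see the three patterns above in action, here is a minimal sketch using Python's built-in `re` module. The candidate strings and the variable names are made up for illustration; `re.fullmatch` is used so that a string only counts when the entire string fits the pattern.

```python
import re

# The three patterns discussed above
patterns = ["this", "(t)?his", "[th]is"]

# Hypothetical strings to test each pattern against
candidates = ["this", "his", "tis"]

results = {}
for pattern in patterns:
    # re.fullmatch succeeds only when the whole string fits the pattern
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(pattern, "matches", results[pattern])
```

Note that `[th]is` does not match `this`: the brackets stand for exactly one character, either `t` or `h`, not both.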
Character Classes
Note: Regex character classes are different from the Classes used to make objects in Python.
The above examples add in some regex syntax but otherwise use literal characters. Regex also provides special symbols that stand for a whole set of characters. `\d` indicates a digit, for instance.
So, the pattern of a typical US phone number would be written
\d\d\d-\d\d\d-\d\d\d\d
This would match a string like `555-867-5309`, since its digits and hyphens follow the pattern above.
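As a rough sketch of how the phone number pattern could be used in Python, the snippet below searches a sentence for it with `re.search`. The sample sentences and variable names are invented for this example. The pattern is written as a raw string (the `r` prefix) so Python leaves the backslashes alone.

```python
import re

# The phone number pattern from above, as a raw string
phone_pattern = r"\d\d\d-\d\d\d-\d\d\d\d"

# Hypothetical text to search
text = "Call Jenny at 555-867-5309 before 5 pm."

# re.search scans the string and returns the first match, or None
match = re.search(phone_pattern, text)
found = match.group() if match else None
print(found)

# A number missing its second hyphen does not fit the pattern
no_match = re.search(phone_pattern, "Call 555-8675309")
print(no_match)
```

The first search finds `555-867-5309`; the second returns `None` because the hyphens are part of the pattern too.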
- `\d` indicates digit characters, 0-9
- `\w` indicates word characters, a-z, A-Z, 0-9, and _
- `\s` indicates space characters, like ` ` (space), `\t` (tab), and `\n` (newline)
The capitalized versions of each of these (`\D`, `\W`, `\S`) match everything not in the corresponding class.
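One way to get a feel for these classes is to pull matching characters out of a small string with `re.findall`. This is only a sketch: the sample string is made up, and each pattern here matches a single character at a time.

```python
import re

# Hypothetical sample: letters, a space, digits, a tab, an underscore,
# an exclamation mark, and a newline
sample = "ab 12\t_!\n"

digits = re.findall(r"\d", sample)       # digit characters
word_chars = re.findall(r"\w", sample)   # letters, digits, and underscore
spaces = re.findall(r"\s", sample)       # space, tab, newline
non_spaces = re.findall(r"\S", sample)   # capitalized: everything \s does not match

print(digits)
print(word_chars)
print(spaces)
print(non_spaces)
```

Every character in the sample lands in either `spaces` or `non_spaces`, never both, which is what "everything not in the class" means for the capitalized versions.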
Selecting
Let’s say we only wanted the area code of a phone number. We can use parentheses to select just that portion of a match:
(\d\d\d)-\d\d\d-\d\d\d\d
Why not just use `\d\d\d`? Well, that would also find other instances of three digits in a row. We want three digits in a row only when they’re followed by the rest of a phone number. So we write the pattern of a phone number and select what we want out of it.
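The difference can be sketched in Python as follows. The sample text and variable names are hypothetical. In the `re` module, the parenthesized part of a match is retrieved with `.group(1)`, while `.group(0)` returns the whole match.

```python
import re

# Hypothetical text containing an unrelated number and a phone number
text = "Order 123 shipped; questions? Call 555-867-5309."

# Three digits alone also picks up unrelated runs of digits
loose = re.findall(r"\d\d\d", text)
print(loose)

# The full phone pattern, with parentheses around the area code
match = re.search(r"(\d\d\d)-\d\d\d-\d\d\d\d", text)
area_code = match.group(1) if match else None     # just the selected part
whole_number = match.group(0) if match else None  # the entire match
print(area_code, whole_number)
```

The loose pattern returns the order number and chunks of the phone number, while the full pattern with parentheses returns only `555`.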