Where are regular expressions used?
Better question: where aren't they used?
Webpages are written with HTML tags, where each tag specifies an element on the page.
The input tag renders a text input field:
<label> Zip code
<input name="zip" type="text" pattern="\d\d\d\d\d">
</label>
The pattern attribute uses a regular expression to describe what is valid for that field.
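To try the same check outside a browser, here is a minimal Python sketch. The function name is_valid_zip is made up for illustration, and re.fullmatch is used on the assumption that the browser tests the whole field value against the pattern rather than searching inside it.

```python
import re

# Same regular expression as the pattern attribute in the zip code field.
ZIP_PATTERN = r"\d\d\d\d\d"

def is_valid_zip(value):
    """Return True if the whole value is exactly five ASCII digits."""
    # fullmatch requires the entire string to match the pattern.
    # re.ASCII keeps \d limited to 0-9 (a choice made for this sketch).
    return re.fullmatch(ZIP_PATTERN, value, flags=re.ASCII) is not None

print(is_valid_zip("94720"))    # True
print(is_valid_zip("9472"))     # False
print(is_valid_zip("94720-1"))  # False
```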
Use {} to specify how many instances to match.
{n} matches exactly n instances
{n,} matches n or more instances
{n,m} matches from n to m instances
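A quick way to see the three forms side by side is to run them against strings of increasing length. This is only a sketch: the helper whole_match and the letter "a" are arbitrary choices for illustration.

```python
import re

def whole_match(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Compare {n}, {n,} and {n,m} on "a", "aa", ..., "aaaaa".
for count in range(1, 6):
    text = "a" * count
    print(
        count,
        whole_match(r"a{3}", text),    # exactly 3
        whole_match(r"a{3,}", text),   # 3 or more
        whole_match(r"a{2,4}", text),  # from 2 to 4
    )
```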
<label> Zip code
<input name="zip" type="text" pattern="\d{5}">
</label>
<label>TBD
<input name="tbd" type="text" pattern="[A-Za-z]{3}">
</label>
<label>TBD
<input name="tbd" type="text" pattern="\d{4}-\d{2}-\d{2}">
</label>
<label>TBD
<input name="tbd" type="text" pattern="[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$">
</label>
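The last three patterns can be exercised the same way in Python. In this sketch the dictionary keys ("three letters", "date shape", "email shape") are labels invented here, since the fields above are still marked TBD, and the sample values are made up. Note that these patterns check shape only: the date pattern happily accepts a month of 99.

```python
import re

# Patterns copied from the input fields above; the keys are hypothetical labels.
PATTERNS = {
    "three letters": r"[A-Za-z]{3}",
    "date shape": r"\d{4}-\d{2}-\d{2}",
    "email shape": r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$",
}

def is_valid(name, value):
    """Return True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[name], value, flags=re.ASCII) is not None

print(is_valid("three letters", "SFO"))          # True
print(is_valid("date shape", "2021-04-30"))      # True
print(is_valid("date shape", "2021-99-99"))      # True: right shape, not a real date
print(is_valid("email shape", "pat@example.com"))  # True
print(is_valid("email shape", "Pat@example.com"))  # False: pattern is lowercase only
```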
Let's make a regular expression to match 24-hour times of the format HH:MM.
First draft: [0-2]\d:\d\d
Second draft (minutes limited to 00-59): [0-2]\d:[0-5]\d
Third draft (hours limited to 00-23): ((2[0-3])|([0-1]\d)):[0-5]\d
Try in regexr.com!
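To see what each draft lets through, the sketch below runs all three against a few hand-picked times. The helper name accepted_by and the sample inputs are illustrative choices, not part of the original exercise.

```python
import re

# The three drafts from above, in order.
DRAFTS = [
    r"[0-2]\d:\d\d",
    r"[0-2]\d:[0-5]\d",
    r"((2[0-3])|([0-1]\d)):[0-5]\d",
]

def accepted_by(text):
    """Return one True/False per draft, telling whether it matches the whole text."""
    return [re.fullmatch(draft, text) is not None for draft in DRAFTS]

print(accepted_by("23:59"))  # [True, True, True]
print(accepted_by("12:75"))  # [True, False, False]
print(accepted_by("29:30"))  # [True, True, False]
```

Only the last draft rejects both out-of-range minutes and out-of-range hours.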
Let's make a regular expression to match any tweet talking about GME stock.
First draft: GME
Second draft (whole word only): \bGME\b
Try in regexr.com!
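The difference between the two drafts shows up when GME appears inside a longer word. In this sketch the tweets are invented for illustration and mentions is a hypothetical helper; re.search is used because the ticker can appear anywhere in a tweet.

```python
import re

# Made-up sample tweets.
tweets = [
    "GME to the moon",
    "Holding $GME!",
    "Just bought GMED shares",
    "gme is trending",
]

def mentions(pattern, tweet):
    """Return True if the pattern is found anywhere in the tweet."""
    return re.search(pattern, tweet) is not None

for tweet in tweets:
    # First column: plain GME. Second column: GME as a whole word.
    print(mentions(r"GME", tweet), mentions(r"\bGME\b", tweet), tweet)
```

Neither draft matches the lowercase tweet; passing flags=re.IGNORECASE to re.search is one way to relax that.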
start: sentence
sentence: describe_wants | describe_feeling
describe_wants: TODDLER "wants" noun_phrase "!"
noun_phrase: ARTICLE? NOUN
describe_feeling: TODDLER "is" EMOTION "!"
TODDLER: "beverly" | "baggy" | "you"
ARTICLE: "the" | "a" | "an" | "un" | "una"
NOUN: "ball" | "elmo" | "chalk" | "gusano"
EMOTION: "sad" | "mad" | "tired"
%ignore /\s+/
What sentences can that parse?
Try in code.cs61a.org!
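One way to explore that question offline: because this grammar has no recursion, it can be hand-translated into a single regular expression. The sketch below does that in plain Python rather than running the grammar itself. The name can_parse is made up, and treating whitespace as optional between tokens is an assumption about how %ignore behaves.

```python
import re

# Terminals, copied from the grammar.
TODDLER = r"(?:beverly|baggy|you)"
ARTICLE = r"(?:the|a|an|un|una)"
NOUN = r"(?:ball|elmo|chalk|gusano)"
EMOTION = r"(?:sad|mad|tired)"
WS = r"\s*"  # stands in for %ignore /\s+/ (assumed: whitespace is optional)

# Rules, built bottom-up from the terminals.
NOUN_PHRASE = rf"(?:{ARTICLE}{WS})?{NOUN}"
DESCRIBE_WANTS = rf"{TODDLER}{WS}wants{WS}{NOUN_PHRASE}{WS}!"
DESCRIBE_FEELING = rf"{TODDLER}{WS}is{WS}{EMOTION}{WS}!"
SENTENCE = re.compile(rf"{WS}(?:{DESCRIBE_WANTS}|{DESCRIBE_FEELING}){WS}")

def can_parse(text):
    """Return True if text is a sentence under this hand translation."""
    return SENTENCE.fullmatch(text) is not None

print(can_parse("beverly wants the ball!"))  # True
print(can_parse("you wants elmo!"))          # True: the article is optional
print(can_parse("baggy is happy!"))          # False: "happy" is not an EMOTION
```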
Where is BNF used?
You will likely use your BNF reading skills more than your BNF writing skills.
start: calc_expr
?calc_expr: NUMBER | calc_op
calc_op: "(" OPERATOR calc_expr* ")"
OPERATOR: "+" | "-" | "*" | "/"
%ignore /\s+/
%import common.NUMBER
What expressions can that parse?
Try in code.cs61a.org!
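Unlike the toddler grammar, this one is recursive (a calc_op can contain calc_exprs), so a regular expression is not enough. Below is a minimal hand-written recursive descent sketch with one function per rule. It is not how code.cs61a.org runs the grammar, and it assumes NUMBER means unsigned digits with an optional fractional part, which may be narrower than the imported common.NUMBER.

```python
import re

OPERATORS = {"+", "-", "*", "/"}
# Assumption: NUMBER is digits with an optional fractional part.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[()+\-*/])")

def tokenize(text):
    """Split text into numbers, parentheses and operators, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_calc_expr(tokens, i):
    """calc_expr: NUMBER | calc_op. Return (tree, index of next token)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] == "(":
        return parse_calc_op(tokens, i)
    if tokens[i][0].isdigit():
        return float(tokens[i]), i + 1
    raise SyntaxError(f"unexpected token {tokens[i]!r}")

def parse_calc_op(tokens, i):
    """calc_op: "(" OPERATOR calc_expr* ")". Return (tree, index of next token)."""
    i += 1  # skip "("
    if i >= len(tokens) or tokens[i] not in OPERATORS:
        raise SyntaxError("expected an operator after '('")
    operator = tokens[i]
    i += 1
    operands = []
    while i < len(tokens) and tokens[i] != ")":
        operand, i = parse_calc_expr(tokens, i)
        operands.append(operand)
    if i >= len(tokens):
        raise SyntaxError("missing ')'")
    return [operator] + operands, i + 1  # skip ")"

def parse(text):
    """Parse a whole calculator expression into nested lists."""
    tokens = tokenize(text)
    tree, i = parse_calc_expr(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("extra input after expression")
    return tree

print(parse("(* (+ 1 2) 3)"))  # ['*', ['+', 1.0, 2.0], 3.0]
```

The grammar is prefix notation, so "(1 + 2)" is rejected, while "(+)" is accepted because calc_expr* allows zero operands.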
A syntax diagram, also known as a railroad diagram, is a common way to represent BNF and other context-free grammars.
calc_expr: NUMBER | calc_op
calc_op: '(' OPERATOR calc_expr* ')'
OPERATOR: '+' | '-' | '*' | '/'
Adapted from the Python docs:
?start: integer
integer: decinteger | bininteger | octinteger | hexinteger
decinteger: nonzerodigit digit*
bininteger: "0" ("b" | "B") bindigit+
octinteger: "0" ("o" | "O") octdigit+
hexinteger: "0" ("x" | "X") hexdigit+
nonzerodigit: /[1-9]/
digit: /[0-9]/
bindigit: /[01]/
octdigit: /[0-7]/
hexdigit: digit | /[a-f]/ | /[A-F]/
What number formats can that parse?
Try in code.cs61a.org!
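As a cross-check on your answer, the integer grammar can be hand-translated into one regular expression, since none of its rules are recursive. This is a sketch of that translation, not the grammar itself; the name parses is made up here.

```python
import re

# One alternative per rule: decinteger | bininteger | octinteger | hexinteger.
INTEGER = re.compile(
    r"[1-9][0-9]*"         # decinteger: nonzerodigit digit*
    r"|0[bB][01]+"         # bininteger
    r"|0[oO][0-7]+"        # octinteger
    r"|0[xX][0-9a-fA-F]+"  # hexinteger
)

def parses(text):
    """Return True if the whole text is an integer under this translation."""
    return INTEGER.fullmatch(text) is not None

for text in ["42", "0b1011", "0O17", "0xFf", "0", "007", "0b2"]:
    print(text, parses(text))
```

Note that "0" on its own is rejected: in this adapted grammar a decinteger must start with a nonzerodigit.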
decinteger: nonzerodigit digit*
hexinteger: "0" ("x" | "X") hexdigit+
hexdigit: digit | /[a-f]/ | /[A-F]/
digit: /[0-9]/
Adapted from the Scheme docs:
?start: expression
expression: constant | variable | "(if " expression expression expression? ")" | application
constant: BOOLEAN | NUMBER
variable: identifier
application: "(" expression expression* ")"
identifier: initial subsequent* | "+" | "-" | "..."
initial: LETTER | "!" | "$" | "%" | "&" | "*" | "/" | ":" | "<" | "=" | ">" | "?" | "~" | "_" | "^"
subsequent: initial | DIGIT | "." | "+" | "-"
LETTER: /[a-zA-Z]/
DIGIT: /[0-9]/
BOOLEAN: "#t" | "#f"
%import common.NUMBER
%ignore /\s+/
*This BNF does not include many of the special forms, for simplicity.
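The identifier rule is the easiest piece to test in isolation. The sketch below hand-translates identifier, initial and subsequent into a Python regular expression; it assumes LETTER is meant to be the ASCII letters a-z and A-Z, and the name is_identifier is made up here.

```python
import re

# initial: LETTER or one of the listed special characters.
INITIAL = r"[a-zA-Z!$%&*/:<=>?~_^]"
# subsequent: initial | DIGIT | "." | "+" | "-"
SUBSEQUENT = r"[a-zA-Z!$%&*/:<=>?~_^0-9.+\-]"
# identifier: initial subsequent* | "+" | "-" | "..."
IDENTIFIER = re.compile(rf"{INITIAL}{SUBSEQUENT}*|\+|-|\.\.\.")

def is_identifier(text):
    """Return True if the whole text is an identifier under this translation."""
    return IDENTIFIER.fullmatch(text) is not None

for text in ["list->vector", "null?", "+", "...", "+1", "1+"]:
    print(text, is_identifier(text))
```

"+" is an identifier on its own, but "+1" is not, because "+" is not an initial character.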
expression: constant | variable | "(if " expression expression expression? ")" | application
application: "(" expression expression* ")"
identifier: initial subsequent* | "+" | "-" | "..."