- Presentation of your programming language
- Theoretical Background: Lexical Analysis
- Alphabets, Languages, Regular Expressions
- Deterministic Finite Automata (DFA), Non-deterministic Finite Automata (NFA), $\epsilon$-NFA
- RegEx exercises
- Intro: ANTLR
To warm up a bit, here are some exercises to get used to writing regular expressions. Use https://regex101.com/ to debug your regexes.
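Besides regex101, a pattern can also be checked locally. The following is a minimal sketch using `java.util.regex`; the class name `RegexHarness`, the method names, and the sample pattern `ab+c` are made up for illustration and are not part of the exercises. It mainly shows a distinction that matters for several exercises below: "contains a match" (`find()`) versus "the whole string matches" (`matches()`).

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying out regexes locally (not part of the course material).
class RegexHarness {
    // True if the regex matches somewhere inside the input (substring search).
    static boolean foundAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the regex matches the entire input from start to end.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "ab+c"; // sample pattern: 'a', one or more 'b', then 'c'
        String[] inputs = {"abbc", "xabcx", "ac"};
        for (String s : inputs) {
            System.out.println(s + ": find=" + foundAnywhere(regex, s)
                    + " matches=" + matchesWhole(regex, s));
        }
        // prints:
        // abbc: find=true matches=true
        // xabcx: find=true matches=false
        // ac: find=false matches=false
    }
}
```

On regex101 the same distinction is usually expressed with the anchors `^` and `$` (together with the multiline flag when several test lines are pasted at once).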
- Write a regex to match all the strings containing the sequence "cat".
Test input:
concatenate
catching
category
caterpillar
- Write a regex to match strings that start with a digit.
Test input:
123abc
456xyz
789def
0abc
- Write a regex to match any string that contains only uppercase letters.
Test input:
HELLO
World
Rust
zig
JAVA
- Write a regex to match any string that contains exactly 3 digits.
Test input:
69 # no match
123 # match
hello123 # match
4444 # no match
12hei3 # match
12heiho45 # no match
- Write a regex to match any string that contains at least one digit.
Test input:
world1 # match
world # no match
1hallllooooo0 # match
- Extract dates in "YYYY-MM-DD" format from the given text.
Test input:
The event is scheduled on 2022-11-15.
The deadline is 2023-05-20.
Birthday: 2021-01-01.
Expected: 2024-12-25.
- Write a regex to match valid email addresses.
Test input:
[email protected]
[email protected]
[email protected]
[email protected]
@google.com # invalid
hvl@ # invalid
mail@google # invalid
[email protected] # invalid
- Write a regex to match valid IPv4 addresses.
Test input:
192.168.1.1
255.255.255.0
127.0.0.1
10.0.0.1
0.0.0.0
388.123.0.1 # invalid
127.0.1 # invalid
192..1 # invalid
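As a worked example for the last exercise, here is one possible approach, sketched in Java. The idea is to describe a single octet (0 to 255) by splitting the range on its leading digits, and then to require exactly four octets separated by dots. The class name `Ipv4Check` and the decision to reject leading zeros such as `01` are assumptions of this sketch; the exercise itself does not say how leading zeros should be treated.

```java
import java.util.regex.Pattern;

// One possible solution sketch for the IPv4 exercise (names are illustrative).
class Ipv4Check {
    // A single octet in the range 0..255, split by leading digits:
    //   25[0-5]      -> 250..255
    //   2[0-4][0-9]  -> 200..249
    //   1[0-9][0-9]  -> 100..199
    //   [1-9]?[0-9]  -> 0..99 (no leading zero; an assumption of this sketch)
    static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";

    // Four octets separated by literal dots, anchored at both ends.
    static final Pattern IPV4 =
            Pattern.compile("^" + OCTET + "(?:\\." + OCTET + "){3}$");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "192.168.1.1", "255.255.255.0", "127.0.0.1", "10.0.0.1", "0.0.0.0",
            "388.123.0.1", "127.0.1", "192..1"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (isIpv4(s) ? "valid" : "invalid"));
        }
    }
}
```

A plain `\d{1,3}` per octet would be shorter, but it would also accept `388.123.0.1`, which the test input marks as invalid.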
- Install ANTLR, see Instructions.
- Create a `<YourLanguageName>Tokens.g4` file and then write your lexer grammar rules in it (UPPERCASE names!), see Notes for further references.
- For testing your lexer rules, you can use the `TestRig`, i.e.:
  - after each iteration, compile the Lexer class with `javac *.java` (make sure that the ANTLR jar is on your classpath),
  - then run the `TestRig` in token mode: `antdbg «YourLanguageName»Tokens tokens -tokens «your language file»`. The last parameter ("your language file") is optional. If it is not provided, the TestRig reads from standard input, i.e. you can type your input directly into the terminal (terminated by `CTRL-D` on POSIX systems and `CTRL-Z` on Windows).
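To get a feeling for what the token mode prints, the following sketch imitates a tiny lexer with plain `java.util.regex`. It is not ANTLR and does not use the generated Lexer class; the token names `INT`, `ID` and `WS` are hypothetical examples of UPPERCASE lexer rule names, each expressed as one named group. Note one difference: regex alternation picks the first alternative that matches, whereas an ANTLR lexer prefers the longest match and only uses rule order to break ties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative mini lexer; NOT ANTLR, only a sketch of the underlying idea.
class MiniLexer {
    // One named group per hypothetical "lexer rule".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<INT>[0-9]+)|(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|(?<WS>\\s+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Try to match a token starting exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            if (m.group("INT") != null) {
                tokens.add("INT(" + m.group() + ")");
            } else if (m.group("ID") != null) {
                tokens.add("ID(" + m.group() + ")");
            }
            // WS is recognized but skipped, so no token is emitted for it.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("count 42 x1"));
        // prints: [ID(count), INT(42), ID(x1)]
    }
}
```

The real lexer grammar expresses the same kind of information as rules in the `.g4` file, and the TestRig then reports the token type and text for each piece of input.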