Homework 4: Scanners and Patterns

## A. Introduction

In this homework, you'll learn some things that we won't talk about in lecture: classes and methods dedicated to searching strings for selected patterns and to reading formatted input.

Warning: There's a lot of jargon for this homework. Sorry! Focus on the experiments and writing code and it should all come together. We always suggest you read the spec, but you will be more confused than normal if you skip this reading!

As usual, you can obtain the skeleton with

$ git fetch shared
$ git merge shared/hw4
• The sequence (?m) always matches the empty string, but has a side effect of causing ^ and $ to match the beginnings and ends of lines as well as of entire strings.
• The two-character escape sequences \?, \*, \., \+, etc., match the character after the backslash, ignoring their special significance. Thus, the pattern who\? matches the string "who?", and would be written in a Java program as the string literal "who\\?".

### Experiment #2: Matching

Compile and run the Matching class. This class allows you to type in strings and patterns and see if the entire string matches the pattern. If you include any groups (read ahead if you're curious), it will also print those. For example:

$ java Matching
Alternately type strings to match and patterns to match against
them. Use \ at the end of line to enter multi-line strings or
patterns (\s are removed, leaving newlines).  The program
will indicate whether each pattern matches the ENTIRE
preceding string.  Enter QUIT to end the program.
String: 123456
Pattern: [0-9]{6}
Matches.
String: 123456
Pattern: [0-9]{5}
No match.
String: 12345
Pattern: [0-9]{6}
No match.
String: abdeffff
Pattern: ab(c|de)f+
Matches.
Group 1: 'de'
String: abbbbdefefgg*h
Pattern: a(b+)d(ef)+gg\*h
Matches.
Group 1: 'bbbb'
Group 2: 'ef'
String: QUIT
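
The session above can be approximated in a few lines of Java. This is only a sketch of the idea, not the source of the provided Matching class: the class name MatchingSketch and the helper describe are hypothetical names chosen for illustration. It relies on Matcher.matches(), which succeeds only when the pattern matches the entire string, and on Matcher.group(n) to retrieve each group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the Matching experiment (not the skeleton's code). */
public class MatchingSketch {

    /** Reports whether PATTERN matches all of STR, listing any groups. */
    static String describe(String str, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(str);
        // matches() requires the ENTIRE string to match, like the session above.
        if (!m.matches()) {
            return "No match.";
        }
        StringBuilder out = new StringBuilder("Matches.");
        // Group 0 is the whole match; numbered groups start at 1.
        for (int g = 1; g <= m.groupCount(); g += 1) {
            out.append("\nGroup ").append(g).append(": '")
               .append(m.group(g)).append("'");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("123456", "[0-9]{6}"));
        System.out.println(describe("123456", "[0-9]{5}"));
        System.out.println(describe("abdeffff", "ab(c|de)f+"));
        // Inside a Java string literal, the pattern gg\*h is written gg\\*h.
        System.out.println(describe("abbbbdefefgg*h", "a(b+)d(ef)+gg\\*h"));
    }
}
```

Note that a repeated group such as (ef)+ reports only its last repetition, which is why the session prints 'ef' rather than 'efef' for Group 2.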

Use this class to experiment with how patterns work. Try writing patterns that match the following. Sample answers are given for each problem.

• A single digit between 5 and 8. Answer: [5-8]
• Sequences of lower case letters. Answer: [a-z]+
• Sequences of lower case letters except the letter j. Answer: [a-ik-z]+
• Sequences of characters that start with the uppercase letter A and end with the letter f. Answer: A.*f
• Sequences of three words separated by spaces, where a word is defined as a sequence of lower case letters. Answer: [a-z]+ +[a-z]+ +[a-z]+
• Sequences of three words separated by spaces, and where group 1 corresponds to the second word. Answer: [a-z]+ +([a-z]+) +[a-z]+

To get more practice writing regular expressions, check out RegExr or regular expressions 101. These sites use plain regular expressions rather than Java patterns, which differ slightly, as discussed above. They are still a great way to build familiarity with regular expressions, which, as mentioned, have many string-matching applications across programming languages.

In P2Pattern.java, you are given five String variables named P1, P2, P3, P4, and P5. Write a regular expression for each one according to the directions below. Don't forget to write the escape character twice (\\) wherever you need a backslash (\) in a regular expression.
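
To see why the backslash must be doubled, here is a small sketch using the who\? pattern as an example. The class name EscapeSketch and its constants are hypothetical and are not part of the skeleton. The Java compiler turns "\\" into a single backslash, so the regular expression engine receives who\?, in which the ? is an ordinary character.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of writing regex escapes inside Java string literals. */
public class EscapeSketch {
    // The regular expression who\? written as a Java string literal.
    static final String WHO = "who\\?";
    // Without the backslash, ? makes the preceding 'o' optional.
    static final String WHO_UNESCAPED = "who?";

    public static void main(String[] args) {
        // The literal holds 5 characters: w, h, o, backslash, ?
        System.out.println(WHO.length());
        // Escaped: matches only the literal text who?
        System.out.println(Pattern.matches(WHO, "who?"));            // true
        System.out.println(Pattern.matches(WHO, "who"));             // false
        // Unescaped: matches "wh" or "who", but not the text who?
        System.out.println(Pattern.matches(WHO_UNESCAPED, "wh"));    // true
        System.out.println(Pattern.matches(WHO_UNESCAPED, "who?"));  // false
    }
}
```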

1. For P1:

• Define a pattern that matches valid dates of the form MM/DD/YYYY
• For example, 12/25/2019 is a valid date but 25/12/2019 is not.
• Assume that MM ranges from 01-12, DD ranges from 01-31 and YYYY ranges from 1900 onwards.

2. For P2:

• Define a pattern that matches lists of non-negative numerals in the notation we used in homework 2 (e.g. (1, 2, 33, 1, 63)).
• Each numeral but the last should be followed by a comma and one or more spaces.

3. For P3:

• Define a pattern that matches a valid domain name.
• For example, www.support.ucb-login.com is a valid domain name (even if it doesn't really exist!)
• A valid domain name contains letters (i.e., a-z, A-Z), digits (i.e., 0-9), and dashes (-), or a combination of all of these.
• It does not begin or end with a dash (-) or a period (.).
• It does not contain whitespace ( ) or underscores (_).
• Assume that the top-level domain (the last part after a period) is between 2 and 6 characters in length.

4. For P4:

• Define a pattern that matches a valid Java variable name.
• For example, _myVariable$1 is a valid variable name in Java while 1stVariable is not.
• A variable name cannot start with a digit. It can consist of alphanumeric characters as well as _ and $.

5. For P5:

• Define a pattern that matches a valid IPv4 address.
• For example, 127.0.0.1 is a valid IP address whereas 299.10.10.1 is not.
• A valid IPv4 address consists of four non-negative integer parts separated by periods (.). Each integer part can range from 0 to 255.

Run TestP2Pattern.java to verify that your patterns are correct.
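
Before running the tests, it can help to try a pattern against a few strings it should accept and a few it should reject. The sketch below shows one way to do that. The class PatternCheckSketch, the helper check, and the HOUR pattern are all hypothetical and are not part of the skeleton. HOUR (two-digit hours from 00 to 23) is deliberately not one of P1 through P5; it only illustrates how alternation can restrict a numeric range.

```java
import java.util.regex.Pattern;

/** Hypothetical accept/reject checker for experimenting with patterns. */
public class PatternCheckSketch {
    // Illustrative pattern only (not a homework answer): hours 00 through 23.
    // Either a 0 or 1 followed by any digit, or a 2 followed by 0-3.
    static final String HOUR = "([01][0-9]|2[0-3])";

    /** Returns true iff PATTERN matches every string in ACCEPT in its
     *  entirety and matches none of the strings in REJECT. */
    static boolean check(String pattern, String[] accept, String[] reject) {
        Pattern p = Pattern.compile(pattern);
        for (String s : accept) {
            if (!p.matcher(s).matches()) {
                return false;
            }
        }
        for (String s : reject) {
            if (p.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] good = {"00", "09", "19", "23"};
        String[] bad = {"24", "7", "230", "-1"};
        System.out.println(check(HOUR, good, bad));        // true
        // A looser pattern wrongly accepts "24", so the check fails.
        System.out.println(check("[0-9]{2}", good, bad));  // false
    }
}
```

The same helper could be pointed at your own P1 through P5 strings with the valid and invalid examples from the directions above.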

## D. Submission

The files you will be turning in are:

• ReadInts.java
• P2Pattern.java

Make sure that your regular expressions accept valid strings and reject invalid ones. To receive full credit, confirm that you pass all the tests defined in TestReadInts.java and TestP2Pattern.java before submitting.

Don't forget to push both your commits and tags for your final submission. As a reminder, you can push your tags by running:

$ git push --tags