Homework 4: Scanners and Patterns

A. Introduction

In this homework, you'll learn some things that we won't talk about in lecture: classes and methods dedicated to searching strings for selected patterns and for reading formatted input.

Warning: There's a lot of jargon for this homework. Sorry! Focus on the experiments and writing code and it should all come together. We always suggest you read the spec, but you will be more confused than normal if you skip this reading! For this homework, we've also created an introductory video. It will be mostly parallel with the spec, and we strongly recommend you watch it. It even gives a generous "hint" for the second half of the homework...

As usual, you can obtain the skeleton with

$ git fetch shared
$ git merge shared/hw4
$ git push

and submit it by adding and committing all work, tagging, pushing, and pushing tags.

B. Scanners

Intro to Scanners

If you're already familiar with scanners, you can skip straight to the tasks below.

The class java.util.Scanner gives you a way to read substrings of text, also called tokens, sequentially from a stream of text that is furnished to the Scanner by its constructor. Typically, the stream of text comes from a file or from a terminal, but there are ways to convert any source of characters into a stream that a Scanner can process.

One of Scanner's constructors accepts an InputStream--a stream of bytes (8-bit characters). Since System.in, which is normally the standard input stream to your program, is a kind of InputStream (that is, its type is a subtype of InputStream), you can write

java.util.Scanner inp = new java.util.Scanner(System.in);

to get something that scans the input from your terminal. (Normally, of course, you'd put import java.util.Scanner; at the beginning of your source file and just write Scanner instead of java.util.Scanner.)

One can also create Scanners from Strings (e.g. Scanner s = new Scanner("hello");), input files, and more.

The simplest way to use a Scanner is to treat the input stream as a sequence of tokens separated by text that matches a delimiter pattern. By default, the delimiter pattern matches stretches of whitespace (blanks, tabs, newlines, carriage returns). For example, consider the input below:

hello i am a half human half

horse

In this case, the tokens separated by our delimiter pattern are "hello", "i", "am", "a", "half", "human", "half", and "horse".
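
If you'd like to see this for yourself, here is a minimal sketch that collects the tokens of that input using the default delimiter. The class name DefaultTokens and the helper tokens are made up for illustration and are not part of the skeleton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DefaultTokens {
    /** Returns every token of TEXT, using Scanner's default
     *  whitespace delimiter. (Hypothetical helper, not skeleton code.) */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner s = new Scanner(text);
        while (s.hasNext()) {
            result.add(s.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // The blank line between "half" and "horse" is just more whitespace.
        System.out.println(tokens("hello i am a half human half\n\nhorse"));
        // prints [hello, i, am, a, half, human, half, horse]
    }
}
```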

Delimiter patterns can be arbitrary. For example, if our delimiter pattern were stretches of ";" and "," characters then the string:

hello; i am just a, horse

would yield the tokens "hello", " i am just a", and " horse". Note that the spaces stay in the tokens, since whitespace is no longer part of the delimiter.
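
One way to try the ";" and "," example is Scanner's useDelimiter method, which takes a pattern string. The sketch below assumes the pattern "[;,]+" for "stretches of ; and , characters"; the class and helper names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CustomDelimiter {
    /** Returns the tokens of TEXT when tokens are separated by text
     *  matching DELIMITERPATTERN. (Hypothetical helper.) */
    static List<String> tokens(String text, String delimiterPattern) {
        List<String> result = new ArrayList<>();
        Scanner s = new Scanner(text).useDelimiter(delimiterPattern);
        while (s.hasNext()) {
            result.add(s.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each token in quotes so the leading spaces are visible.
        for (String token : tokens("hello; i am just a, horse", "[;,]+")) {
            System.out.println("'" + token + "'");
        }
    }
}
```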

Scanners may be reminding you of Readers, which you saw last homework. The main difference is that a Reader simply reads a stream of characters without any sort of processing, while a Scanner is reading in a stream and parsing it into tokens.
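
To make the contrast concrete, here is a rough sketch that runs the same text through a StringReader (one kind of Reader) and through a Scanner. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class ReaderVsScanner {
    /** Counts the characters a Reader hands back, one read() at a time.
     *  Spaces and newlines count: the Reader does no processing. */
    static int countChars(String text) {
        try (Reader r = new StringReader(text)) {
            int count = 0;
            while (r.read() != -1) {
                count += 1;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts the whitespace-separated tokens a Scanner finds in TEXT. */
    static int countTokens(String text) {
        Scanner s = new Scanner(text);
        int count = 0;
        while (s.hasNext()) {
            s.next();
            count += 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars("hi there"));   // prints 8
        System.out.println(countTokens("hi there"));  // prints 2
    }
}
```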

Look at ReadInts.java and TestReadInts.java. ReadInts provides one complete method printInts and two incomplete methods readInts and smartReadInts. TestReadInts calls all three methods, testing as appropriate.

Experiment #1: TestReadInts

Try running TestReadInts. You should see that testReadInts and testSmartReadInts fail, but testPrintInts will print out the contents of the input String inp as you would expect. Yay, the provided code works.

Next take a look at ReadInts.java and look at the source of the printInts method. You'll see that it uses the hasNext() and nextInt() methods from the Scanner s object created.

These methods, along with their most common brethren, are described below.
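
As a small illustration of how a "has" method and a "next" method work together, the sketch below sums the integers at the front of a string. It is not taken from ReadInts.java; the class SumInts and its method are made up, and it uses hasNextInt(), the integer-specific cousin of hasNext().

```java
import java.util.Scanner;

class SumInts {
    /** Sums the leading integer tokens of TEXT, stopping at the first
     *  token that is not an integer (or at the end of the input). */
    static int sumLeadingInts(String text) {
        Scanner s = new Scanner(text);
        int total = 0;
        // hasNextInt() peeks at the next token without consuming it;
        // nextInt() consumes the token and converts it to an int.
        while (s.hasNextInt()) {
            total += s.nextInt();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLeadingInts("1 2 3 four 5"));  // prints 6
    }
}
```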

Programming Tasks

Task 1: readInts()

As above, when you run TestReadInts, you'll see that your code fails in readInts. Head to the readInts method, and you'll see that one line of code is missing.

Using the documentation for the ArrayList class, figure out how to modify the code so that the readInts method works correctly.

The testReadInts test feeds your method a bad input and actually expects an exception. For those of you who are particularly keen on the idea of testing exceptions, JUnit supports a less unwieldy syntax based on the @Rule annotation, which you're free to use.
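
If you're curious what a "bad input" can do to a Scanner, here is a plain-Java sketch (no JUnit) showing that nextInt() throws java.util.InputMismatchException when the next token is not an integer. This assumes the bad input is a non-integer token; check TestReadInts.java for what the test really feeds in and which exception it expects. The class name is hypothetical.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class BadIntInput {
    /** Returns true iff calling nextInt() on the first token of TEXT
     *  throws InputMismatchException. TEXT must contain at least one
     *  token; an empty input raises NoSuchElementException instead. */
    static boolean firstTokenRejected(String text) {
        Scanner s = new Scanner(text);
        try {
            s.nextInt();
            return false;
        } catch (InputMismatchException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTokenRejected("12"));      // prints false
        System.out.println(firstTokenRejected("twelve"));  // prints true
    }
}
```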

Task 2: smartReadInts()

Complete the smartReadInts method so that it works as described in the comments and TestReadInts. Use the readInts method as a guide.

C. Patterns

Your code in the previous part looked through input token by token, accepting tokens that were integers. How does Java know if a string represents an integer? As you might imagine, it looks for sequences that contain only the digits 0-9, possibly preceded by a - symbol.

What if we want to match, for example, numerals that only contain digits less than 5? Or five-digit numerals only?

Java provides a facility for this known as Pattern Matching, and supports a rich syntax for specifying patterns. For example, one-digit numerals less than 5 could be expressed by the pattern "[01234]". Five-digit numerals could be expressed by "[0-9][0-9][0-9][0-9][0-9]" or "[0-9]{5}".
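
Here is a quick sketch of those example patterns in action, using Pattern.matches, which checks whether the entire string matches. The class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

class DigitPatterns {
    /** True iff S is a single digit less than 5. */
    static boolean isDigitBelowFive(String s) {
        return Pattern.matches("[01234]", s);
    }

    /** True iff S is exactly five digits, using the {5} repetition form. */
    static boolean isFiveDigits(String s) {
        return Pattern.matches("[0-9]{5}", s);
    }

    /** Same check as isFiveDigits, with the digit class written out five times. */
    static boolean isFiveDigitsLongForm(String s) {
        return Pattern.matches("[0-9][0-9][0-9][0-9][0-9]", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigitBelowFive("3"));   // prints true
        System.out.println(isDigitBelowFive("7"));   // prints false
        System.out.println(isFiveDigits("94720"));   // prints true
        System.out.println(isFiveDigits("1234"));    // prints false
    }
}
```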

These patterns are often called regular expressions (or RegEx for short), though strictly speaking, they're more general than the formal notion of a regular expression that might be discussed, for example, in an upper-division CS class in your future.
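
One feature that goes beyond formal regular expressions is the backreference, which this spec does not otherwise cover. As an illustrative sketch only, the pattern below uses \1 to demand that the text captured by the first group appears again, something a formal regular expression cannot do for runs of arbitrary length. The class name is hypothetical.

```java
class BeyondRegular {
    /** True iff S is a run of one or more a's, then one b, then
     *  exactly the same run of a's again. */
    static boolean sameRunBothSides(String s) {
        // \1 (written "\\1" in a Java string literal) must match exactly
        // the text that group 1, the (a+), captured.
        return s.matches("(a+)b\\1");
    }

    public static void main(String[] args) {
        System.out.println(sameRunBothSides("aabaa"));  // prints true
        System.out.println(sameRunBothSides("aaba"));   // prints false
    }
}
```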

The full pattern language is quite rich, and is documented under java.util.regex.Pattern in the on-line Java library documentation. Here are just a few constructs that you might find particularly useful. You are not expected to read this entire list for this homework. However, you may find it useful moving forward in 61B.

Experiment #2: Matching

Compile and run the Matching class. This class allows you to type in strings and patterns and see if the entire string matches the pattern. If you include any groups (read ahead if you're curious), it will also print those. You can run it like this:

$ java Matching
  Alternately type strings to match and patterns to match against
  them. Use \ at the end of line to enter multi-line strings or
  patterns (\s are removed, leaving newlines).  The program
  will indicate whether each pattern matches the ENTIRE
  preceding string.  Enter QUIT to end the program.
String: 123456
Pattern: [0-9]{6}
Matches.
String: 123456
Pattern: [0-9]{5}
No match.
String: 12345
Pattern: [0-9]{6}
No match.
String: abdeffff
Pattern: ab(c|de)f+
Matches.
  Group 1: 'de'
String: abbbbdefefgg*h
Pattern: a(b+)d(ef)+gg\*h
Matches.
  Group 1: 'bbbb'
  Group 2: 'ef'
String: QUIT
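
The "Matches." and "Group" lines in the session above can be reproduced directly with java.util.regex.Pattern and java.util.regex.Matcher. The sketch below is not the source of the Matching class, just one way such a check might be written; GroupDemo and groups are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    /** Returns the captured groups if PATTERN matches ALL of TEXT,
     *  or null if it does not match. (Hypothetical helper.) */
    static List<String> groups(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        if (!m.matches()) {
            return null;
        }
        List<String> result = new ArrayList<>();
        // Group 0 is the whole match; numbered groups start at 1.
        for (int i = 1; i <= m.groupCount(); i += 1) {
            result.add(m.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("abdeffff", "ab(c|de)f+"));
        // prints [de]
        // In Java source, the \* of the session is written "\\*".
        System.out.println(groups("abbbbdefefgg*h", "a(b+)d(ef)+gg\\*h"));
        // prints [bbbb, ef]
        System.out.println(groups("12345", "[0-9]{6}"));
        // prints null
    }
}
```

Notice that (ef)+ matched "efef", but group 2 holds only the last repetition, 'ef', just as in the session.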

Use this class to experiment with how patterns work. Try writing patterns that match the following. Sample answers are given for each problem (drag the mouse over the white area after "Answer:" to see it).

To get more practice with writing regular expressions, check out RegExr or regular expressions 101. On the regex101 site, note that you should switch the "flavor" on the left to Java 8. Additionally, this site doesn't require you to double escape your backslashes, so be careful there. Overall, both sites differ slightly from the type of Java patterns we will be writing. They are still a great way to build more familiarity with regular expressions, which, as we have mentioned, have many applications involving string matching across many different programming languages.

Programming task

In P2Pattern.java, you are given 5 String variables named P1, P2, P3, P4, and P5. For each one, write a regular expression that follows the directions. You must complete 3 of the 5 patterns for full credit (though we recommend trying them all for practice). Don't forget to use the escape character twice (\\) wherever you need a backslash (\) in a regular expression.
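
Here is a short sketch of why the backslash gets doubled: the Java compiler turns "\\" in a string literal into a single backslash character, and that single backslash is what the pattern engine sees. The constants below are illustrative only and are unrelated to P1 through P5.

```java
class EscapeDemo {
    // The regex \d{3} (three digits) written as a Java string literal.
    static final String THREE_DIGITS = "\\d{3}";

    // The regex a\*b: an 'a', a literal '*', then a 'b'.
    static final String LITERAL_STAR = "a\\*b";

    public static void main(String[] args) {
        // The string itself holds only ONE backslash.
        System.out.println(THREE_DIGITS);                 // prints \d{3}
        System.out.println(THREE_DIGITS.length());        // prints 5
        System.out.println("123".matches(THREE_DIGITS));  // prints true
        System.out.println("a*b".matches(LITERAL_STAR));  // prints true
        System.out.println("aab".matches(LITERAL_STAR));  // prints false
    }
}
```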

For all of the below, we STRONGLY recommend looking at TestP2Pattern.java to see what potential edge cases there are! Using a tool like an online regex tester can be helpful while building your patterns.

1. For P1:

2. For P2:

3. For P3:

4. For P4:

5. For P5:

Run TestP2Pattern.java to verify that your patterns are correct.

D. Submission

The files you will be turning in are:

Make sure that your regular expressions accept valid strings and reject invalid ones. Confirm that you pass all the tests defined in TestReadInts.java and TestP2Pattern.java before submission for full credit.

Don't forget to push both your commits and tags for your final submission. As a reminder, you can push your tags by running:

$ git push --tags