Homework 4: Scanners and Patterns


In this homework, you'll learn some things that we won't talk about in lecture: classes and methods for searching strings for selected patterns and for reading formatted input.

Warning: There's a lot of jargon for this homework. Sorry! Focus on the experiments and writing code and it should all come together.

As usual, you can obtain the skeleton with

$ git fetch shared
$ git merge shared/hw4
$ git push

and submit it by committing all work, tagging, pushing, and pushing tags.

B. Scanners

Intro to Scanners

If you're already familiar with scanners, you can skip straight to the tasks below.

The class java.util.Scanner gives you a way to read substrings of text, also called tokens, sequentially from a stream of text that is furnished to the Scanner by its constructor. Typically, the stream of text comes from a file or from a terminal, but there are ways to convert any source of characters into a stream that a Scanner can process.

One of Scanner's constructors accepts an InputStream--a stream of bytes (8-bit characters). Since System.in, which is normally the standard input stream to your program, is a kind of InputStream (that is, its type is a subtype of InputStream), you can write

java.util.Scanner inp = new java.util.Scanner(System.in);

to get something that scans the input from your terminal. (Normally, of course, you'd put import java.util.Scanner; at the beginning of your source file and just write Scanner instead of java.util.Scanner.)

One can also create Scanners from Strings (e.g. Scanner s = new Scanner("hello");), input files, and more.

The simplest way to use a Scanner is to treat the input stream as a sequence of tokens separated by text that matches a delimiter pattern. By default, the delimiter pattern matches stretches of whitespace (blanks, tabs, newlines, carriage returns). For example, consider the input below:

hello i am a half human half
      horse


In this case, the tokens separated by our delimiter pattern are "hello", "i", "am", "a", "half", "human", "half", and "horse".
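To see the default delimiter in action, here is a minimal sketch that collects every token from a String. The class name WhitespaceTokens and the helper tokens are made up for illustration; they are not part of the skeleton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Illustration only: splits text on Scanner's default whitespace delimiter. */
class WhitespaceTokens {

    /** Returns the tokens of TEXT, in order, using the default delimiter. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(text);
        // hasNext() is true as long as at least one more token remains.
        while (sc.hasNext()) {
            result.add(sc.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Blanks and the newline all count as one stretch of whitespace.
        System.out.println(tokens("hello i am a half human half\n      horse"));
        // prints [hello, i, am, a, half, human, half, horse]
    }
}
```

Notice that the newline and the run of blanks before "horse" are swallowed together as a single delimiter.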

Delimiter patterns can be arbitrary. For example, if our delimiter pattern were stretches of ";" and "," characters then the string:

hello; i am just a, horse

would yield the tokens "hello", " i am just a", and " horse".
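One way to set up such a delimiter is Scanner's useDelimiter method, which takes a pattern string. The sketch below assumes the pattern "[;,]+" for "stretches of ; and , characters"; the class name DelimiterTokens and the helper tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Illustration only: tokens separated by runs of ';' and ',' characters. */
class DelimiterTokens {

    /** Returns the tokens of TEXT, using stretches of ';' and ',' as the delimiter. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(text);
        sc.useDelimiter("[;,]+");   // one or more ';' or ',' characters
        while (sc.hasNext()) {
            result.add(sc.next());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String t : tokens("hello; i am just a, horse")) {
            // Quotes make the leading blanks visible.
            System.out.println("'" + t + "'");
        }
        // prints:
        // 'hello'
        // ' i am just a'
        // ' horse'
    }
}
```

Since blanks are no longer part of the delimiter, they stay inside the tokens, which is why two of the tokens start with a space.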

Look at ReadInts.java and TestReadInts.java. ReadInts provides one complete method printInts and two incomplete methods readInts and smartReadInts. TestReadInts calls all three methods, testing as appropriate.

Experiment #1: TestReadInts

Try running TestReadInts. Look at TestReadInts.java, and you'll see the contents of the input String inp are printed out as you'd expect by the test. Yay, the provided code works.

Next take a look at ReadInts.java and look at the source of the printInts method. You'll see that it uses the hasNext() and nextInt() methods.

These methods, along with their most common brethren, are described below.
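As a rough illustration of how hasNext() and nextInt() cooperate, here is a small sketch that adds up the integers in a Scanner. It is not the source of printInts; the class SumInts and its sum method are invented for this example.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

/** Illustration only: reading integers with hasNext() and nextInt(). */
class SumInts {

    /** Returns the sum of all tokens in SC, which must all be integers. */
    static int sum(Scanner sc) {
        int total = 0;
        // hasNext() asks "is there another token of any kind?"
        while (sc.hasNext()) {
            // nextInt() throws InputMismatchException if the next
            // token is not an integer numeral.
            total += sc.nextInt();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Scanner("1 2 33 -4")));   // prints 32
        try {
            sum(new Scanner("1 two 3"));
        } catch (InputMismatchException e) {
            System.out.println("not an integer");            // prints not an integer
        }
    }
}
```

The second call shows what happens on bad input: hasNext() happily reports that "two" is a token, and nextInt() then complains because that token is not an integer.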

Programming Tasks

Task 1: readInts()

When you run make check, you'll see that your code fails in readInts. Head to the readInts method, and you'll see that one line of code is missing.

Using the documentation for the ArrayList class, figure out how to modify the code so that the readInts method works correctly.

The TestReadInts test feeds your method a bad input, and actually expects an exception. For those of you who are particularly keen on the idea of testing exceptions, JUnit supports a less unwieldy syntax based on the @Rule annotation, which you're free to use.

Task 2: smartReadInts()

Complete the smartReadInts method so that it works as described in the comments and TestReadInts. Use the readInts method as a guide.

C. Patterns

Your code for part B looked through input token by token, accepting tokens that were integers. How does Java know if a string represents an integer? As you might imagine, it looks for sequences containing only the digits 0123456789, and possibly starting with a -.

What if we want to match, for example, numerals that only contain digits less than 5? Or five-digit numerals only?

Java provides a facility for this known as Pattern Matching, and supports a rich syntax for specifying patterns. For example, one-digit numerals less than 5 could be expressed by the pattern "[01234]". Five-digit numerals could be expressed by "[0-9][0-9][0-9][0-9][0-9]" or "[0-9]{5}".
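If you'd like to try these patterns from code, one option is the static method Pattern.matches, which reports whether an entire string matches a pattern. The sketch below is only an illustration; the class DigitPatterns and its two helper methods are hypothetical names.

```java
import java.util.regex.Pattern;

/** Illustration only: checking whole strings against simple digit patterns. */
class DigitPatterns {

    /** True iff S is a single digit less than 5. */
    static boolean isSmallDigit(String s) {
        return Pattern.matches("[01234]", s);
    }

    /** True iff S consists of exactly five digits. */
    static boolean isFiveDigits(String s) {
        return Pattern.matches("[0-9]{5}", s);
    }

    public static void main(String[] args) {
        System.out.println(isSmallDigit("3"));       // prints true
        System.out.println(isSmallDigit("7"));       // prints false
        System.out.println(isFiveDigits("12345"));   // prints true
        System.out.println(isFiveDigits("123456"));  // prints false
    }
}
```

The last call is false because Pattern.matches requires the whole string to match, and "123456" has one digit too many.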

These patterns are sometimes referred to as regular expressions, though strictly speaking, they're more general than the formal notion of a regular expression that might be discussed, for example, in an upper-division CS class in your future.

The full pattern language is quite rich, and is documented under java.util.regex.Pattern in the on-line Java library documentation. Here are just a few constructs that you might find particularly useful. You are not expected to read this entire list for this homework. However, you may find it useful moving forward in 61B.

Experiment #2: Matching

Compile and run the Matching class. This class allows you to type in strings and patterns and see if the entire string matches the pattern. If you include any groups (read ahead if you're curious), it will also print those. For example:

$ java Matching
  Alternately type strings to match and patterns to
  match against them.  Use \ at the end of line
  to enter multi-line strings or patterns (\
  are removed, leaving newlines).
  The program will indicate whether each pattern matches
  the ENTIRE preceding string.
  Enter QUIT to end the program.
String: 123456
Pattern: [0-9]{6}
String: 123456
Pattern: [0-9]{5}
No match.
String: 12345
Pattern: [0-9]{6}
No match.
String: abdeffff
Pattern: ab(c|de)f+
  Group 1: 'de'
String: abbbbdefefgg*h
Pattern: a(b+)d(ef)+gg\*h
  Group 1: 'bbbb'
  Group 2: 'ef'
String: QUIT

Use this class to experiment with how patterns work. Try writing patterns that match the following. Sample answers are given for each problem.
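For the curious, the same kind of whole-string matching with groups can be done directly with java.util.regex.Pattern and Matcher. This is a sketch under our own assumptions, not the source of the Matching class; GroupDemo and its group helper are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: whole-string matching and parenthesized groups. */
class GroupDemo {

    /** Returns group N of PATTERN matched against all of TEXT,
     *  or null if TEXT as a whole does not match. */
    static String group(String pattern, String text, int n) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        // matches() succeeds only if the ENTIRE string fits the pattern.
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        System.out.println(group("ab(c|de)f+", "abdeffff", 1));               // prints de
        // In Java source, the backslash in \* must itself be escaped.
        System.out.println(group("a(b+)d(ef)+gg\\*h", "abbbbdefefgg*h", 1));  // prints bbbb
        System.out.println(group("a(b+)d(ef)+gg\\*h", "abbbbdefefgg*h", 2));  // prints ef
        System.out.println(group("[0-9]{5}", "123456", 0));                   // prints null
    }
}
```

Group 2 comes out as "ef" rather than "efef" because a repeated group such as (ef)+ remembers only the text from its last repetition.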

Programming task

In P2Pattern.java, fill in the string with a pattern that matches lists of non-negative numerals in the notation we used in homework 2 (e.g. (1, 2, 33, 1, 63)). Each numeral but the last should be followed by a comma and one or more spaces.

Run TestP2Pattern to verify that your pattern is correct.