Homework 4: Scanners and Patterns

Introduction

In this homework, you'll learn some things that we won't talk about in lecture: classes and methods dedicated to searching strings for selected patterns and for reading formatted input.

Warning: There's a lot of jargon for this homework. Sorry! Focus on the experiments and writing code and it should all come together.

B. Scanners

Intro to Scanners

If you're already familiar with scanners, you can skip straight to the tasks below.

The class java.util.Scanner gives you a way to read substrings of text, also called tokens, sequentially from a stream of text that is furnished to the Scanner by its constructor. Typically, the stream of text comes from a file or from a terminal, but there are ways to convert any source of characters into a stream that a Scanner can process.

One of Scanner's constructors accepts an InputStream—a stream of bytes (8-bit characters). Since System.in, which is normally the standard input stream to your program, is a kind of InputStream (that is, its type is a subtype of InputStream), you can write

java.util.Scanner inp = new java.util.Scanner(System.in);

to get something that scans the input from your terminal. (Normally, of course, you'd put import java.util.Scanner; at the beginning of your source file and just write Scanner instead of java.util.Scanner).

One can also create Scanners from Strings (e.g. Scanner s = new Scanner("hello");), input files, and more.

The simplest way to use a Scanner is treat the input stream as a sequence of tokens separated by text that matches a delimiter pattern. By default, the delimiter pattern matches stretches of whitespace (blanks, tabs, newlines, carriage returns). For example, consider the input below:

hello i am a half human half

horse

In this case, the tokens separated by our delimiter pattern are "hello", "i", "am", "a", "half", "human", "half", and "horse".

Delimiter patterns can be arbitrary, for example if our delimiter pattern were stretches of ; and , then the string:

hello; i am just a, horse

Would yield the tokens "hello", " i am just a", and " horse".

Open up ReadInts.java and TestReadInts.java. ReadInts provides one complete method printInts and two incomplete methods testReadInts and testSmartReadInts. TestReadInts calls all three methods, testing as appropriate.

Experiment #1: TestReadInts

Try running TestReadInts. Look at TestReadInts.java, and you'll see the contents of the input String inp are printed out as you'd expect by the test. Yay, the provided code works.

Next open up ReadInts.java and look at the source of the printInts method. You'll see that it uses the hasNext() and nextInt() methods.

These methods, along with their most common brethren are described below.

Programming Tasks

Task 1: readInts()

When you run make check, you'll see that your code fails in readInts. Head to the readInts method, and you'll see that one line of code is missing.

Using the documentation for the ArrayList class, figure out how to modify the code so that the readInts method works correctly.

Before you move on, observe that the TestReadInts test feeds your method a bad input, and actually expects an exception. It is possible that you'll find this idea useful for Project 1 (though you are not expected or required to use exception catching tests). For those of you who are particularly keen on the idea of testing exceptions, Junit supports a less unwieldy syntax based on the @Rule tag that you're free to use.

Task 2: smartReadInts()

Complete the smartReadInts method so that it works as described in the comments and TestReadInts. Use the readInts method as a guide.

C. Patterns

Your code for part 1 looked through input token by token, accepting tokens that were integers. How does Java know if a string represents an integer? As you might imagine, it looks for sequences containing only the digits 0123456789.

What if we want to match, for example, numbers that only contain numbers less than 5? Or five digit numbers only?

Java provides a faculty for this known as Pattern Matching, and supports a rich syntax for specifying patterns. For example, numbers less than 5 could be expressed by the pattern "[01234]". Five digit numbers could be expressed by "[0-9][0-9][0-9][0-9][0-9]" or "[0-9]{5}".

These patterns are sometimes referred to as regular expressions, though strictly speaking, they're a somewhat different animal than the formal notion of a regular expression that might be discussed, for example, in an upper division CS class in your future.

The full pattern language is quite rich, and is documented under java.util.regex.Pattern in the on-line Java library documentation. Here are just a few that you might find particularly useful. You are not expected to read this entire list for this homework. However, you may find it useful moving forward in 61B.

Experiment #2: Matching

Compile and run the Matching class. This class allows you to type in strings and patterns and see if the entire string matches the pattern. If you include any groups (read ahead if you're curious), it will also print those. For example:

$ java Matching
  Alternately strings to match and patterns to
  match against them.  Use \ at the end of line
  to enter multi-line strings or patterns (\
  are removed, leaving newlines).
  The program will indicate whether each pattern matches
  the ENTIRE preceding string.
  Enter QUIT to end the program.
String: 123456
Pattern: [0-9]{6}
Matches.
String: 123456
Pattern: [0-9]{5}
No match.
String: 12345
Pattern: [0-9]{6}
No match.
String: abdeffff
Pattern: ab(c|de)f+
Matches.
  Group 1: 'de'
String: abbbbdefefgg*h
Pattern: a(b+)d(ef)+gg\*h
Matches.
  Group 1: 'bbbb'
  Group 2: 'ef'
String: QUIT

Use this class to experiment with how patterns work. Try writing patterns that match the following. Sample answers are given for each problem (drag the mouse over the white area after "Answer:" to see it).

Programming task

Open P2Pattern.java, and fill in the string with a pattern that matches lists of non-negative (>= 0) integers in the notation we used in homework 2 (e.g. (1, 2, 33, 1, 63)). Each number should be followed by a comma then any number of spaces.

Run P2PatternTester to verify that your pattern is correct.