Homework 4: Scanners and Patterns

Introduction

In this homework, you'll learn some things that we won't talk about in lecture: classes and methods dedicated to searching strings for selected patterns and for reading formatted input.

Warning: There's a lot of jargon for this homework. Sorry! Focus on the experiments and writing code and it should all come together.

As usual, you can obtain the skeleton with

$ git fetch shared
$ git merge shared/hw4
$ git push

and submit it by committing all work, tagging, pushing, and pushing tags.

B. Scanners

Intro to Scanners

If you're already familiar with scanners, you can skip straight to the tasks below.

The class java.util.Scanner gives you a way to read substrings of text, also called tokens, sequentially from a stream of text that is furnished to the Scanner by its constructor. Typically, the stream of text comes from a file or from a terminal, but there are ways to convert any source of characters into a stream that a Scanner can process.

One of Scanner's constructors accepts an InputStream--a stream of bytes (8-bit characters). Since System.in, which is normally the standard input stream to your program, is a kind of InputStream (that is, its type is a subtype of InputStream), you can write

java.util.Scanner inp = new java.util.Scanner(System.in);

to get something that scans the input from your terminal. (Normally, of course, you'd put import java.util.Scanner; at the beginning of your source file and just write Scanner instead of java.util.Scanner).

One can also create Scanners from Strings (e.g. Scanner s = new Scanner("hello");), input files, and more.

The simplest way to use a Scanner is to treat the input stream as a sequence of tokens separated by text that matches a delimiter pattern. By default, the delimiter pattern matches stretches of whitespace (blanks, tabs, newlines, carriage returns). For example, consider the input below:

hello i am a half human half

horse

In this case, the tokens separated by our delimiter pattern are "hello", "i", "am", "a", "half", "human", "half", and "horse".

Delimiter patterns can be arbitrary. For example, if our delimiter pattern were stretches of ";" and "," characters then the string:

hello; i am just a, horse

Would yield the tokens "hello", " i am just a", and " horse".

Look at ReadInts.java and TestReadInts.java. ReadInts provides one complete method printInts and two incomplete methods readInts and smartReadInts. TestReadInts calls all three methods, testing as appropriate.

Experiment #1: TestReadInts

Try running TestReadInts. You should see that testReadInts and testSmartReadInts fail, but testPrintInts will print out the contents of the input String inp as you would expect. Yay, the provided code works.

Next take a look at ReadInts.java and look at the source of the printInts method. You'll see that it uses the hasNext() and nextInt() methods from the Scanner s object created.

These methods, along with their most common brethren are described below.

Programming Tasks

Task 1: readInts()

As above, when you run TestReadInts, you'll see that your code fails in readInts. Head to the readInts method, and you'll see that one line of code is missing.

Using the documentation for the ArrayList class, figure out how to modify the code so that the readInts method works correctly.

The testReadInts test feeds your method a bad input, and actually expects an exception. For those of you who are particularly keen on the idea of testing exceptions, Junit supports a less unwieldy syntax based on the @Rule tag, which you're free to use.

Task 2: smartReadInts()

Complete the smartReadInts method so that it works as described in the comments and TestReadInts. Use the readInts method as a guide.

C. Patterns

Your code in the previous part looked through input token by token, accepting tokens that were integers. How does Java know if a string represents an integer? As you might imagine, it looks for sequences that contain only the digits 0-9, possibly preceded by a - symbol.

What if we want to match, for example, numerals that only contain digits less than 5? Or five-digit numerals only?

Java provides a faculty for this known as Pattern Matching, and supports a rich syntax for specifying patterns. For example, one-digit numerals less than 5 could be expressed by the pattern "[01234]". Five-digit numberals could be expressed by "[0-9][0-9][0-9][0-9][0-9]" or "[0-9]{5}".

These patterns are sometimes referred to as regular expressions, though strictly speaking, they're more general than the formal notion of a regular expression that might be discussed, for example, in an upper-division CS class in your future.

The full pattern language is quite rich, and is documented under java.util.regex.Pattern in the on-line Java library documentation. Here are just a few constructs that you might find particularly useful. You are not expected to read this entire list for this homework. However, you may find it useful moving forward in 61B.

Experiment #2: Matching

Compile and run the Matching class. This class allows you to type in strings and patterns and see if the entire string matches the pattern. If you include any groups (read ahead if you're curious), it will also print those. For example:

$ java Matching
  Alternately type strings to match and patterns to match against
  them. Use \ at the end of line to enter multi-line strings or
  patterns (\s are removed, leaving newlines).  The program
  will indicate whether each pattern matches the ENTIRE
  preceding string.  Enter QUIT to end the program.
String: 123456
Pattern: [0-9]{6}
Matches.
String: 123456
Pattern: [0-9]{5}
No match.
String: 12345
Pattern: [0-9]{6}
No match.
String: abdeffff
Pattern: ab(c|de)f+
Matches.
  Group 1: 'de'
String: abbbbdefefgg*h
Pattern: a(b+)d(ef)+gg\*h
Matches.
  Group 1: 'bbbb'
  Group 2: 'ef'
String: QUIT

Use this class to experiment with how patterns work. Try writing patterns that match the following. Sample answers are given for each problem (drag the mouse over the white area after "Answer:" to see it).

To get more practice with writing regular expressions check out RegExr or regular expressions 101. These sites use plain regular expressions rather than Java patterns. which differ slightly as we have discussed above. They are still a great way to build more familiarity with regular expressions, which as we have mentioned, have many different applications involving string matching across multiple different programming languages.

Programming task

In P2Pattern.java, fill in the string with a pattern that matches lists of non-negative numerals in the notation we used in homework 2 (e.g. (1, 2, 33, 1, 63)). Each numeral but the last should be followed by a comma and one or more spaces.

Run TestP2Pattern to verify that your pattern is correct.