CompSci-61B Lecture Topic 7
Hash Tables

Instructor: Prof. Robert Burns

Ref: Carrano ch. 19 and 20

Data Structures That Are O(1)
size-independent (almost) add, getValue, contains, remove, and replace
hash tables as key-value dictionary implementations

How Can We Get O(1)?
by (almost) eliminating iteration to locate an entry
  and using an array -- no linked structures
each entry is assigned a unique location in an array
  based on value of its key
"hashing" -- converting a key value into an array index
  its “desired index”

Practical Problems Of Hashing
there has to be lots of empty space in the array
  to accommodate the possible range of key values
"collisions" are possible -- different keys whose
  “desired index” is the same!
the #of unique hash codes may exceed the #of
  elements in the array (length)
"holes" in the array
  no used/unused separation

Wrapping The Index
determination of desired index should have no
  knowledge of array’s length
use the “wrapped index”:
wrapped index = desired index % array length
always in the range 0 to length-1?
  not for negative hash codes -- Java’s % keeps the sign of the desired index
so if (wrapped index < 0)
  wrapped index += array length
a handy private method: getWrappedIndex(K key)
  get desired index from key’s hashcode
  modulus with array length
  if negative, add array length
  return the int result
increases possibility of collisions...
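
The wrapping steps above can be sketched as follows. This is a minimal illustration, not the course's implementation: the class name WrappedIndexDemo is hypothetical, and the array length is passed as a parameter (an assumption) instead of being read from a private data array.

```java
// Sketch only: WrappedIndexDemo and the length parameter are assumptions.
class WrappedIndexDemo
{
  // Maps any hash code, including a negative one, into the range 0..length-1.
  static int getWrappedIndex(Object key, int length)
  {
    int index = key.hashCode() % length; // desired index, wrapped; may be negative
    if (index < 0) index += length;      // shift a negative remainder into range
    return index;
  }
}
```

With an array length of 5, the Integer keys 12 and 7 both wrap to index 2 (a collision caused by wrapping), and -7 wraps to 3.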

Adding -- No Duplicates Allowed
first get “wrapped index”
if array location is not “occupied”, add
else if stored entry’s key matches,
  replace the value: revert to “replace”
else collision! -- add fails...
  “perfect” hash tables do not accommodate collisions
operation is O(1)

remove, getValue, and contains
first get “wrapped index”
if array location is not “occupied”, fail
else if stored entry’s key matches,
  remove: set location to null & return value
  getValue: return value
  contains: return true
else not found -- operation fails
  remove: return null
  getValue: return null
  contains: return false
operations are O(1)
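
A "perfect" table with the add, remove, getValue, and contains logic described above might look like the sketch below. The class name PerfectHashTable, the nested Entry class, and the boolean return from add are assumptions made for illustration; the notes do not specify them.

```java
// Sketch of a collision-intolerant table; all names here are assumptions.
class PerfectHashTable<K, V>
{
  private static class Entry<K, V>
  {
    final K key;
    V value;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private Object[] data; // array of Entry<K, V>

  PerfectHashTable(int length) { data = new Object[length]; }

  private int getWrappedIndex(K key)
  {
    int index = key.hashCode() % data.length;
    if (index < 0) index += data.length;
    return index;
  }

  @SuppressWarnings("unchecked")
  private Entry<K, V> entryAt(int index) { return (Entry<K, V>)data[index]; }

  // Returns false on a collision; a matching key has its value replaced.
  boolean add(K key, V value)
  {
    int index = getWrappedIndex(key);
    Entry<K, V> e = entryAt(index);
    if (e == null) data[index] = new Entry<K, V>(key, value); // not occupied
    else if (e.key.equals(key)) e.value = value;              // revert to replace
    else return false;                                        // collision: add fails
    return true;
  }

  V getValue(K key)
  {
    Entry<K, V> e = entryAt(getWrappedIndex(key));
    return (e != null && e.key.equals(key)) ? e.value : null;
  }

  boolean contains(K key)
  {
    Entry<K, V> e = entryAt(getWrappedIndex(key));
    return e != null && e.key.equals(key);
  }

  V remove(K key)
  {
    int index = getWrappedIndex(key);
    Entry<K, V> e = entryAt(index);
    if (e == null || !e.key.equals(key)) return null; // not found
    data[index] = null;                               // free the location
    return e.value;
  }
}
```

No operation contains a loop, which is why each one is O(1). With an array length of 7, adding key 3 and then key 10 fails on the second add, because both keys wrap to index 3.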

clear
set all array values to null
easiest way: create new array

Being Full
defining “full”: next add may fail on account of
  no space to store the entry
without a way to manage collisions, hash table
  is “full” after one entry

Avoiding Collisions
make it less likely for separate desired indexes
  to result in same wrapped index

let array length be a prime number
a handy private method: getNextPrime(int n)
  if n is 2 or less, return 2
  if n is even, add one to it
  start loop
    if n is prime, exit loop
    add 2 to n
  end loop and return n
another handy private method: isPrime(int n)
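
One possible shape for the two prime helpers is sketched below. The class name PrimeHelper is hypothetical, the methods are static and non-private only so they can be called directly, and the guard for n of 2 or less is an addition to the loop outlined above (without it, 2 would be skipped).

```java
// Sketch only: PrimeHelper is a hypothetical holder for the two helpers.
class PrimeHelper
{
  // Trial division by odd numbers up to the square root of n.
  static boolean isPrime(int n)
  {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (int d = 3; (long)d * d <= n; d += 2)
      if (n % d == 0) return false;
    return true;
  }

  // Returns the smallest prime that is greater than or equal to n.
  static int getNextPrime(int n)
  {
    if (n <= 2) return 2;       // added guard: 2 is the only even prime
    if (n % 2 == 0) n++;        // if n is even, add one to it
    while (!isPrime(n)) n += 2; // test odd candidates only
    return n;
  }
}
```

For example, getNextPrime(10) returns 11 and getNextPrime(14) returns 17.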

judicious hash code calculations
  key’s hashCode() inherited from Object
if app has too many collisions, extend key’s class
  and override Object.hashCode()
  detect by big oh testing
  ...or counting operations

still, collisions are not completely avoided!

Handling Collisions
overflowing into unused adjacent elements (probing)
stacking entries in a single array element (chaining)

Linear Probing
use unused adjacent index if wrapped index in use
“linear probing” -- traverse array from wrapped index
for (int i = 0; i < data.length; i++)
  ...data[(wrappedIndex + i) % data.length]...
  ...break if keys match...
operations are almost O(1) if array is mostly empty
  worst case, O(n)
  unsuccessful remove and contains are always O(n)

an improvement -- back to almost O(1)
no reason to traverse past a location if
  nothing was ever stored there
  because add would use that location
  before using any after that one
need a way to distinguish between never-used and previously-used
  then traversals can stop at never-used
elements null if never used, “blank” if previously-used
store reference to blank Entry as private constant
  set reference to this instead of null in remove
  private final Entry BLANK;
  BLANK = new Entry(); in constructor
  BLANK’s key and value are null

Testing For Blanks
if (data[index] == BLANK)
if (data[index] == null || data[index] == BLANK)
if (data[index] != null && data[index] != BLANK)
setting to BLANK in remove:
  data[index] = BLANK;
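
Putting linear probing and the BLANK marker together could look like the sketch below. The class name ProbingTable, the private find helper, and the generic Entry are assumptions for illustration; only BLANK, data, and the three tests above come from the notes. The sketch does no array expansion, so add fails once every location holds an entry.

```java
// Sketch of linear probing with a BLANK marker; names other than BLANK and data are assumptions.
class ProbingTable<K, V>
{
  private static class Entry<K, V>
  {
    K key;
    V value;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private final Entry<K, V> BLANK = new Entry<K, V>(null, null); // previously-used marker
  private Entry<K, V>[] data;

  @SuppressWarnings("unchecked")
  ProbingTable(int length) { data = (Entry<K, V>[])new Entry[length]; }

  private int getWrappedIndex(K key)
  {
    int index = key.hashCode() % data.length;
    if (index < 0) index += data.length;
    return index;
  }

  // Returns the index holding key, or -1; stops at the first never-used element.
  private int find(K key)
  {
    int start = getWrappedIndex(key);
    for (int i = 0; i < data.length; i++)
    {
      int index = (start + i) % data.length;
      if (data[index] == null) return -1; // never used: key cannot be further on
      if (data[index] != BLANK && data[index].key.equals(key)) return index;
    }
    return -1;
  }

  boolean add(K key, V value)
  {
    int index = find(key);
    if (index >= 0) { data[index].value = value; return true; } // replace
    int start = getWrappedIndex(key);
    for (int i = 0; i < data.length; i++)
    {
      index = (start + i) % data.length;
      if (data[index] == null || data[index] == BLANK) // unused or reusable
      {
        data[index] = new Entry<K, V>(key, value);
        return true;
      }
    }
    return false; // every location holds an entry
  }

  V getValue(K key) { int i = find(key); return i < 0 ? null : data[i].value; }

  boolean contains(K key) { return find(key) >= 0; }

  V remove(K key)
  {
    int index = find(key);
    if (index < 0) return null;
    V value = data[index].value;
    data[index] = BLANK; // not null, so later traversals keep going past here
    return value;
  }
}
```

With an array length of 7, keys 3, 10, and 17 all wrap to index 3 and land in locations 3, 4, and 5. After removing 10, location 4 is BLANK, so a search for 17 still continues past it, while a search for an absent key stops at the never-used location 6.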

Load Factors And Array Expansion
for “blanks” to work in probing, array needs to remain sparse
solution: array expansion when array becomes “too full”
detecting “too full”: calculate “load factor”
  #of used elements / array length
    beware: integer division!
if load factor exceeds maximum allowable load factor,
  double the array size

BUT do not use simple array copy, because
  wrapped indexes are function of array size (rehashing)

AND make sure doubled size is prime -- seek next higher prime
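
The expansion rules above can be sketched with a stripped-down table. To stay short, this hypothetical ExpandingKeyTable stores int keys only (no values, no remove, so no blanks), and the 0.5 maximum load factor is an assumed setting. The point is the expand method: entries are re-added one at a time so each gets a wrapped index for the new length.

```java
// Sketch only: a keys-only probing table that rehashes on expansion.
class ExpandingKeyTable
{
  private static final double MAX_LOAD_FACTOR = 0.5; // assumed threshold
  private Integer[] data; // null means never used
  private int used;       // #of used elements

  ExpandingKeyTable(int length) { data = new Integer[getNextPrime(length)]; }

  int length() { return data.length; }

  // The cast forces floating-point division; used / data.length would be 0.
  double loadFactor() { return (double)used / data.length; }

  void add(int key)
  {
    if (contains(key)) return;
    insert(data, key);
    used++;
    if (loadFactor() > MAX_LOAD_FACTOR) expand();
  }

  boolean contains(int key)
  {
    int start = wrap(key, data.length);
    for (int i = 0; i < data.length; i++)
    {
      Integer stored = data[(start + i) % data.length];
      if (stored == null) return false;
      if (stored == key) return true; // unboxed int comparison
    }
    return false;
  }

  // Doubles the size, moves up to the next prime, then rehashes every key.
  private void expand()
  {
    Integer[] bigger = new Integer[getNextPrime(2 * data.length)];
    for (Integer key : data)
      if (key != null) insert(bigger, key); // not a simple array copy
    data = bigger;
  }

  private static void insert(Integer[] array, int key)
  {
    int start = wrap(key, array.length);
    for (int i = 0; i < array.length; i++)
    {
      int index = (start + i) % array.length;
      if (array[index] == null) { array[index] = key; return; }
    }
  }

  private static int wrap(int key, int length)
  {
    int index = key % length; // an Integer's hash code is its int value
    if (index < 0) index += length;
    return index;
  }

  private static int getNextPrime(int n)
  {
    if (n <= 2) return 2;
    if (n % 2 == 0) n++;
    while (!isPrime(n)) n += 2;
    return n;
  }

  private static boolean isPrime(int n)
  {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (int d = 3; (long)d * d <= n; d += 2)
      if (n % d == 0) return false;
    return true;
  }
}
```

Starting at length 5, the third add pushes the load factor to 0.6, so the array grows to 11 (the next prime at or above 10) and the three keys are rehashed.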

Chaining
use array of MyLinkedLists of Entry references to “stack”
  entries at their wrapped indexes
handling blanks:
  array element null if never used,
  empty list if previously used but now empty
  although the distinction is not important
sequential search of MyLinkedList makes for imperfect O(1)

Java generics accommodations:
private Object[] data; // array of MyLinkedList<Entry>
data = new Object[size]; // in constructor and clear
apply type casting (pre-generics-style programming)
  ((MyLinkedList<Entry>)data[wrappedIndex]).add(...);
traversing a list at an index
  for (Entry entry: (MyLinkedList<Entry>)data[index])
Iterators need to traverse data array and
  each MyLinkedList -- use 2 tracking data members:
  int index; // for data[index]
  Iterator<Entry> it; //iterator inside data[index]

alternative: use java.util.LinkedList class instead of MyLinkedList
alternative: have no nulls -- fill array in constructor with empty lists
  for (int i = 0; i < size; i++)
    data[i] = new MyLinkedList<Entry>();
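
A chained table along these lines is sketched below, using both alternatives: java.util.LinkedList stands in for MyLinkedList (which is not shown in these notes), and the constructor fills the array with empty lists so there are no nulls to test for. The class name ChainedTable, the listFor helper, and the generic Entry are assumptions.

```java
// Sketch of chaining; java.util.LinkedList stands in for MyLinkedList.
class ChainedTable<K, V>
{
  private static class Entry<K, V>
  {
    final K key;
    V value;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private Object[] data; // array of java.util.LinkedList<Entry<K, V>>
  private int count;     // #of entries stored

  ChainedTable(int length)
  {
    data = new Object[length];
    for (int i = 0; i < length; i++)
      data[i] = new java.util.LinkedList<Entry<K, V>>(); // no nulls
  }

  // Returns the list at the key's wrapped index (pre-generics-style cast).
  @SuppressWarnings("unchecked")
  private java.util.LinkedList<Entry<K, V>> listFor(K key)
  {
    int index = key.hashCode() % data.length;
    if (index < 0) index += data.length;
    return (java.util.LinkedList<Entry<K, V>>)data[index];
  }

  void add(K key, V value)
  {
    for (Entry<K, V> entry : listFor(key))
      if (entry.key.equals(key)) { entry.value = value; return; } // replace
    listFor(key).add(new Entry<K, V>(key, value)); // stack at wrapped index
    count++;
  }

  V getValue(K key)
  {
    for (Entry<K, V> entry : listFor(key)) // sequential search of one list
      if (entry.key.equals(key)) return entry.value;
    return null;
  }

  boolean contains(K key) { return getValue(key) != null; } // assumes non-null values

  V remove(K key)
  {
    java.util.Iterator<Entry<K, V>> it = listFor(key).iterator();
    while (it.hasNext())
    {
      Entry<K, V> entry = it.next();
      if (entry.key.equals(key)) { it.remove(); count--; return entry.value; }
    }
    return null;
  }

  int size() { return count; }
}
```

Here a collision never makes add fail: with an array length of 7, keys 3 and 10 simply share the list at index 3.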

Big Oh Determinations
with good hash codes and 0.5 max load factor,
  operations are approx. O(1)
ops are so fast that timing tests are difficult
  use repetitions that start/stop the clock
  and accumulate elapsed time
  remember to reset before each rep
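
The start/stop timing idea might be coded as in the sketch below. OperationTimer and its accumulate method are hypothetical names, and passing the reset step and the timed operation as Runnable objects is a design assumption, not something the notes prescribe.

```java
// Sketch only: accumulates elapsed time over many short, separately timed reps.
class OperationTimer
{
  // Returns total nanoseconds spent inside operation across all reps.
  static long accumulate(int reps, Runnable reset, Runnable operation)
  {
    long elapsed = 0;
    for (int rep = 0; rep < reps; rep++)
    {
      reset.run();                          // restore starting state, clock stopped
      long start = System.nanoTime();       // start the clock
      operation.run();                      // the operation being measured
      elapsed += System.nanoTime() - start; // stop the clock and accumulate
    }
    return elapsed;
  }
}
```

To compare growth, run the same number of reps against tables holding n and 2n entries; for an O(1) operation the two accumulated times should be roughly equal.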

Programmer-designed Classes As Keys
usually, Java integer and string classes
  are sufficient for keys
but you may want to override hashCode()

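final class StudentId // Integer is final and cannot be extended, so wrap an int
{
  private final int id;

  public StudentId(int id)
  {
    this.id = id;
  }

  public boolean equals(Object other) // equal keys must have equal hash codes
  {
    return other instanceof StudentId && ((StudentId)other).id == id;
  }

  public int hashCode() // override Object.hashCode()
  {
    return id % 1000;
  }
}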
OR special class: e.g., one that is a cell in a 2D or 3D table...
be sure to override Object.hashCode()...
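
A key class for a cell in a 2D table could be sketched as below. CellKey, its fields, and the 31 * row + column formula are all assumptions chosen for illustration; any formula works as long as equal cells produce equal hash codes, which is why equals is overridden along with hashCode.

```java
// Sketch only: a hypothetical key identifying one cell of a 2D table.
final class CellKey
{
  private final int row;
  private final int column;

  CellKey(int row, int column)
  {
    this.row = row;
    this.column = column;
  }

  @Override
  public boolean equals(Object other) // two keys match if they name the same cell
  {
    if (!(other instanceof CellKey)) return false;
    CellKey that = (CellKey)other;
    return row == that.row && column == that.column;
  }

  @Override
  public int hashCode() // override Object.hashCode()
  {
    return 31 * row + column; // assumed formula; (2,3) and (3,2) differ
  }
}
```

Without the equals override, two separately created keys for the same cell would never match, and getValue would fail even though the hash codes agree.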
