CompSci-61B Lecture Topic 7
Hash Tables

Ref: Carrano ch. 19 and 20

Data Structures That Are O(1)
size-independent (almost) add, getValue, contains, remove, and replace
hash tables as key-value dictionary implementations

How Can We Get O(1)?
(almost) eliminate iteration to locate an entry
and use an array -- no linked structures
each entry is assigned a unique location in an array
based on value of its key
"hashing" -- converting a key value into an array index
its “desired index”

Practical Problems Of Hashing
there has to be lots of empty space in the array
to accommodate the possible range of key values
"collisions" are possible -- different keys whose
“desired index” is the same!
the #of unique hash codes may exceed the #of
elements in the array (length)
"holes" in the array
no used/unused separation

Wrapping The Index
determination of desired index should have no
knowledge of array’s length
use the “wrapped index”:
wrapped index = desired index % array length
always in the range 0 to length-1?
except for negative hash codes
so if (wrapped index < 0)
wrapped index += array length
a handy private method: getWrappedIndex(K key)
get desired index from key’s hashCode()
modulus with array length
if negative, add array length
return the int result
increases possibility of collisions...
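
The wrapping steps above can be sketched as follows. This is a minimal illustration, not the course's code: the class name WrapDemo is hypothetical, and the array length is passed as a parameter here so the method can stand alone (in the notes it is a private method that reads the table's own array length).

```java
// Hypothetical holder class, for illustration only.
class WrapDemo {
    // Maps a key's hash code to an index in the range 0 to length-1.
    static int getWrappedIndex(Object key, int length) {
        int desiredIndex = key.hashCode();        // may be negative
        int wrappedIndex = desiredIndex % length; // in Java, the sign follows the dividend
        if (wrappedIndex < 0) wrappedIndex += length;
        return wrappedIndex;
    }
}
```

With an Integer key, whose hash code is its own value, a key of -7 and a length of 5 gives -7 % 5 = -2, which becomes 3 after adding the length.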

Adding -- No Duplicates Allowed
first get “wrapped index”
if array location is not “occupied”, add
else if stored entry’s key matches,
replace the value: revert to “replace”
else collision! -- add fails...
“perfect” hash tables do not accommodate collisions
operation is O(1)

remove, getValue, and contains
first get “wrapped index”
if array location is not “occupied”, fail
else if stored entry’s key matches,
remove: set location to null & return value
getValue: return value
contains: return true
else not found -- operation fails
remove: return null
getValue: return null
contains: return false
operations are O(1)
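
A sketch of add, remove, getValue, and contains for a table with no collision handling, following the steps listed above. The class name PerfectHashTable, the nested Entry class, and the boolean result from add are assumptions made for illustration.

```java
// Illustrative sketch only; names are assumptions, not the course's code.
class PerfectHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private Object[] data; // each element is null or an Entry<K, V>

    PerfectHashTable(int length) { data = new Object[length]; }

    private int getWrappedIndex(K key) {
        int index = key.hashCode() % data.length;
        if (index < 0) index += data.length;
        return index;
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V> entryAt(int index) { return (Entry<K, V>) data[index]; }

    // Returns true if the entry was stored or replaced, false on a collision.
    boolean add(K key, V value) {
        int i = getWrappedIndex(key);
        Entry<K, V> e = entryAt(i);
        if (e == null) { data[i] = new Entry<K, V>(key, value); return true; }
        if (e.key.equals(key)) { e.value = value; return true; } // revert to "replace"
        return false; // collision: a different key already occupies this index
    }

    V getValue(K key) {
        Entry<K, V> e = entryAt(getWrappedIndex(key));
        return (e != null && e.key.equals(key)) ? e.value : null;
    }

    boolean contains(K key) {
        Entry<K, V> e = entryAt(getWrappedIndex(key));
        return e != null && e.key.equals(key);
    }

    V remove(K key) {
        int i = getWrappedIndex(key);
        Entry<K, V> e = entryAt(i);
        if (e == null || !e.key.equals(key)) return null; // not found
        data[i] = null; // set location to null and return the value
        return e.value;
    }
}
```

No method contains a loop, which is why every operation is O(1); the price is that add fails whenever two keys share a wrapped index.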

clear
set all array values to null
easiest way: create new array

Being Full
defining “full”: next add may fail on account of
no space to store the entry
without a way to manage collisions, hash table
is “full” after one entry

Avoiding Collisions
make it less likely for separate desired indexes
to result in same wrapped index

let array length be a prime number
a handy private method: getNextPrime(int n)
if n is even, add one to it
start loop
if n is prime, exit loop
add 2 to n
end loop and return n
another handy private method: isPrime(int n)
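
One possible version of the two prime helpers. The class name Primes is hypothetical, the methods are static here so they can be called on their own, and the guard for n <= 2 is an addition not in the pseudocode above (without it, an input of 2 would return 3).

```java
// Illustrative sketch; in the notes these are private methods of the hash table class.
class Primes {
    // Trial division by odd numbers up to the square root of n.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }

    // Returns the smallest prime that is greater than or equal to n.
    static int getNextPrime(int n) {
        if (n <= 2) return 2;       // added guard, see lead-in
        if (n % 2 == 0) n++;        // if n is even, add one to it
        while (!isPrime(n)) n += 2; // test odd candidates only
        return n;
    }
}
```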

judicious hash code calculations
key’s hashCode() inherited from Object
if app has too many collisions, extend key’s class
and override Object.hashCode()
detect by big oh testing
...or counting operations

still, collisions are not completely avoided!

Handling Collisions
overflowing into unused adjacent elements (probing)
stacking entries in a single array element (chaining)

Linear Probing
use unused adjacent index if wrapped index in use
“linear probing” -- traverse array from wrapped index
for (int i = 0; i < data.length; i++)
...data[(wrappedIndex + i) % data.length]...
...break if keys match...
operations are almost O(1) if array is mostly empty
worst case, O(n)
unsuccessful remove, getValue, and contains traverse the whole array: always O(n)

an improvement -- back to almost O(1)
no reason to traverse past a location if
nothing was ever stored there
because add would use that location
before using any after that one
need a way to distinguish between never-used and previously-used
then traversals can stop at never-used
elements null if never used, “blank” if previously-used
store reference to blank Entry as private constant
set reference to this instead of null in remove
private final Entry BLANK;
BLANK = new Entry(); in constructor
BLANK’s key and value are null

Testing For Blanks
if (data[index] == BLANK)
if (data[index] == null || data[index] == BLANK)
if (data[index] != null && data[index] != BLANK)
setting to BLANK in remove:
data[index] = BLANK;
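
A sketch of linear probing with the BLANK marker, assuming a non-generic nested Entry class and a private helper named find (both hypothetical). Searches stop at the first never-used (null) element but continue past BLANK elements.

```java
// Illustrative sketch only; names other than BLANK and data are assumptions.
class ProbingTable<K, V> {
    private static class Entry {
        Object key, value; // both null for BLANK
        Entry() {}
        Entry(Object key, Object value) { this.key = key; this.value = value; }
    }

    private final Entry BLANK; // marks previously-used elements
    private Entry[] data;

    ProbingTable(int length) {
        BLANK = new Entry();
        data = new Entry[length];
    }

    private int getWrappedIndex(K key) {
        int index = key.hashCode() % data.length;
        if (index < 0) index += data.length;
        return index;
    }

    // Returns the index holding key, or -1 if it is not in the table.
    private int find(K key) {
        int wrappedIndex = getWrappedIndex(key);
        for (int i = 0; i < data.length; i++) {
            int index = (wrappedIndex + i) % data.length;
            if (data[index] == null) return -1; // never used: key cannot be beyond here
            if (data[index] != BLANK && data[index].key.equals(key)) return index;
        }
        return -1;
    }

    // Returns false only if there is no null or BLANK element left.
    boolean add(K key, V value) {
        int found = find(key);
        if (found >= 0) { data[found].value = value; return true; } // replace
        int wrappedIndex = getWrappedIndex(key);
        for (int i = 0; i < data.length; i++) {
            int index = (wrappedIndex + i) % data.length;
            if (data[index] == null || data[index] == BLANK) {
                data[index] = new Entry(key, value);
                return true;
            }
        }
        return false;
    }

    boolean contains(K key) { return find(key) >= 0; }

    @SuppressWarnings("unchecked")
    V getValue(K key) {
        int i = find(key);
        return i < 0 ? null : (V) data[i].value;
    }

    @SuppressWarnings("unchecked")
    V remove(K key) {
        int i = find(key);
        if (i < 0) return null;
        V value = (V) data[i].value;
        data[i] = BLANK; // not null, so later searches keep probing past here
        return value;
    }
}
```

If remove stored null instead of BLANK, an entry that had probed past the removed one would become unreachable, because find would stop at the null.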

Load Factors And Array Expansion
for “blanks” to work in probing, array needs to remain sparse
solution: array expansion when array becomes “too full”
detecting “too full”: calculate “load factor”
#of used elements / array length
beware: integer division!
if load factor exceeds maximum allowable load factor,
double the array size

BUT do not use simple array copy, because
wrapped indexes are function of array size (rehashing)

AND make sure doubled size is prime -- seek next higher prime
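
A sketch of load-factor checking and expansion with rehashing. To stay short it stores int keys only (no values, no remove), uses each key as its own hash code, and assumes a maximum load factor of 0.5 and a starting length of 11; the class and method names are hypothetical.

```java
// Illustrative sketch only; keys-only table with linear probing.
class ExpandingTable {
    private static final double MAX_LOAD_FACTOR = 0.5; // assumed threshold
    private Integer[] data = new Integer[11];
    private int used = 0; // number of occupied elements

    private static int wrap(int hash, int length) {
        int index = hash % length;
        return index < 0 ? index + length : index;
    }

    private static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }

    private static int getNextPrime(int n) {
        if (n <= 2) return 2;
        if (n % 2 == 0) n++;
        while (!isPrime(n)) n += 2;
        return n;
    }

    // Cast to double first; used / data.length alone is integer division.
    double loadFactor() { return (double) used / data.length; }

    int length() { return data.length; }

    // Probes linearly in the given array; true if a new element was filled.
    private static boolean insert(Integer[] array, int key) {
        int start = wrap(key, array.length);
        for (int i = 0; i < array.length; i++) {
            int index = (start + i) % array.length;
            if (array[index] == null) { array[index] = key; return true; }
            if (array[index] == key) return false; // already present
        }
        return false;
    }

    void add(int key) {
        if (insert(data, key)) used++;
        if (loadFactor() > MAX_LOAD_FACTOR) expand();
    }

    boolean contains(int key) {
        int start = wrap(key, data.length);
        for (int i = 0; i < data.length; i++) {
            Integer slot = data[(start + i) % data.length];
            if (slot == null) return false;
            if (slot == key) return true;
        }
        return false;
    }

    // Not a simple array copy: each key is re-wrapped for the new length.
    private void expand() {
        Integer[] bigger = new Integer[getNextPrime(2 * data.length)];
        for (Integer key : data)
            if (key != null) insert(bigger, key);
        data = bigger;
    }
}
```

Starting at length 11, the sixth distinct key pushes the load factor to 6/11, above 0.5, so the array grows to 23, the next prime at or above 22.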

Chaining
use array of MyLinkedLists of Entry references to “stack”
entries at their wrapped indexes
handling blanks:
array element null if never used,
empty list if previously used but now empty
although the distinction is not important
sequential search of MyLinkedList makes for imperfect O(1)

Java generics accommodations:
private Object[] data; // array of MyLinkedList<Entry>
data = new Object[size]; // in constructor and clear
apply type casting (pre-generics-style programming)
((MyLinkedList<Entry>)data[wrappedIndex]).add(...);
traversing a list at an index
for (Entry entry: (MyLinkedList<Entry>)data[index])
Iterators need to traverse data array and
each MyLinkedList -- use 2 tracking data members:
int index; // for data[index]
Iterator<Entry> it; //iterator inside data[index]

alternative: use java.util.LinkedList class instead of MyLinkedList
alternative: have no nulls -- fill array in constructor with empty lists
for (int i = 0; i < size; i++)
data[i] = new MyLinkedList<Entry>();
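
A sketch of chaining that combines the two alternatives above: java.util.LinkedList stands in for MyLinkedList, and the constructor fills the array with empty lists so there are no nulls to test for. The class name ChainedTable and the helper listFor are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Illustrative sketch only; names are assumptions, not the course's code.
class ChainedTable<K, V> {
    private static class Entry {
        final Object key;
        Object value;
        Entry(Object key, Object value) { this.key = key; this.value = value; }
    }

    private Object[] data; // array of LinkedList<Entry>

    ChainedTable(int size) {
        data = new Object[size];
        for (int i = 0; i < size; i++)
            data[i] = new LinkedList<Entry>();
    }

    // Returns the list stored at the key's wrapped index.
    @SuppressWarnings("unchecked")
    private LinkedList<Entry> listFor(K key) {
        int index = key.hashCode() % data.length;
        if (index < 0) index += data.length;
        return (LinkedList<Entry>) data[index];
    }

    void add(K key, V value) {
        LinkedList<Entry> list = listFor(key);
        for (Entry entry : list)
            if (entry.key.equals(key)) { entry.value = value; return; } // replace
        list.add(new Entry(key, value)); // collisions simply share the list
    }

    boolean contains(K key) {
        for (Entry entry : listFor(key))
            if (entry.key.equals(key)) return true;
        return false;
    }

    @SuppressWarnings("unchecked")
    V getValue(K key) {
        for (Entry entry : listFor(key))
            if (entry.key.equals(key)) return (V) entry.value;
        return null;
    }

    @SuppressWarnings("unchecked")
    V remove(K key) {
        Iterator<Entry> it = listFor(key).iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.key.equals(key)) {
                it.remove(); // leaves an empty list behind, never a null
                return (V) entry.value;
            }
        }
        return null;
    }
}
```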

Big Oh Determinations
with good hash codes and 0.5 max load factor,
operations are approx O(1)
ops are so fast that timing tests are difficult
use repetitions that start/stop the clock
and accumulate elapsed time
remember to reset before each rep
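
One way the start/stop timing could look, as a sketch. java.util.HashMap is used here only as a stand-in for the table under test, and "reset" is interpreted as restoring the removed entry while the clock is stopped; the class and method names are hypothetical.

```java
import java.util.HashMap;

// Illustrative sketch only; HashMap stands in for the hash table being tested.
class TimingDemo {
    // Returns the average time in nanoseconds of one remove on a table of n entries.
    static double averageRemoveNanos(int n, int reps) {
        HashMap<Integer, Integer> table = new HashMap<Integer, Integer>();
        for (int i = 0; i < n; i++) table.put(i, i);

        long elapsed = 0;
        for (int rep = 0; rep < reps; rep++) {
            Integer key = rep % n;
            table.put(key, key);             // reset: restore the entry, clock stopped
            long start = System.nanoTime();  // start the clock
            table.remove(key);               // the operation being timed
            elapsed += System.nanoTime() - start; // stop the clock and accumulate
        }
        return (double) elapsed / reps;
    }
}
```

Calling averageRemoveNanos for n and then for 2n with the same number of reps should give similar averages if remove is O(1). Individual readings are noisy, so many reps are needed.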

Programmer-designed Classes As Keys
usually, Java's Integer and String classes
are sufficient for keys
but you may want to override hashCode()

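final class StudentId // Integer is final and cannot be extended, so wrap an int
{
  private final int id;

  public StudentId(int id)
  {
    this.id = id;
  }

  @Override
  public boolean equals(Object other) // equal keys must have equal hash codes
  {
    return other instanceof StudentId && ((StudentId) other).id == id;
  }

  @Override
  public int hashCode() // override Object.hashCode()
  {
    return id % 1000;
  }
}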

OR special class: e.g., one that is a cell in a 2D or 3D table...
be sure to override Object.hashCode()...
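
A sketch of such a key for a cell in a 2D table. The class name Cell and the 31 * row + column formula are illustrative choices, not from the notes. equals is overridden as well, because the table compares keys with equals after hashing.

```java
// Illustrative sketch only; an immutable key identifying one cell of a 2D table.
final class Cell {
    private final int row;
    private final int column;

    Cell(int row, int column) {
        this.row = row;
        this.column = column;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Cell)) return false;
        Cell that = (Cell) other;
        return row == that.row && column == that.column;
    }

    @Override
    public int hashCode() {
        // Cells that are equal always produce the same hash code.
        return 31 * row + column;
    }
}
```

The fields are final so a key's hash code cannot change after the key has been stored in a table.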

[ Home | Contact Prof. Burns ]
