CSCE 221 Chapter 8

Tuesday, February 22, 2011

Dictionaries and Hashing

A searchable collection of key-element items.
Multiple items are allowed to share the same key.

Functions

Function	Description
`int size()`	Number of items in dictionary
`bool isEmpty()`	Whether dictionary is empty
`ObjectIterator elements()`	Return elements stored in dictionary
`ObjectIterator keys()`	Return keys stored in dictionary
`Position find(k)`	Return position of an item with key equal to `k`
`PositionIterator findAll(k)`	Return an iterator over the positions of all items with key equal to `k`
`void insertItem(k, o)`	Insert element `o` into the dictionary with key `k`
`void removeElement(k)`	Remove an element with key equal to `k`
`void removeAllElements(k)`	Remove all elements with key equal to `k`

Log File

Dictionary implementation with an unordered sequence

insertItem() is $O(1)$
find() and removeElement() are $O(n)$

Good for insertion but not for retrieval (good for logging access or queries)
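The trade-off above can be seen in a minimal sketch, assuming a Python list as the unordered sequence. The class name `LogFile` and the snake_case method names are illustrative stand-ins for the ADT functions, not part of the course code.

```python
class LogFile:
    """Dictionary backed by an unordered list of (key, element) pairs."""

    def __init__(self):
        self._items = []

    def insert_item(self, k, o):
        # O(1) amortized: append at the end, no search for duplicates.
        self._items.append((k, o))

    def find(self, k):
        # O(n): scan the items in insertion order until a key matches.
        for key, element in self._items:
            if key == k:
                return element
        return None

    def remove_element(self, k):
        # O(n): locate the first match, then delete it.
        for i, (key, _) in enumerate(self._items):
            if key == k:
                return self._items.pop(i)[1]
        return None


log = LogFile()
log.insert_item("GET /", 200)
log.insert_item("GET /missing", 404)
log.insert_item("GET /", 304)   # duplicate keys are allowed
```

Every insertion is a single append, while each lookup may walk the whole list, which is why this layout suits write-heavy logs.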

Direct Address Table

Works like a Vector: the key itself is used as the index into the table.

Keys are drawn from $\{0,1,\ldots ,N-1\}$ and may not be repeated

insertItem(), find(), and removeElement() are $O(1)$
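A rough sketch of the idea, assuming integer keys that index a table of `N` cells and using `None` to mean "no item" (so `None` itself cannot be stored as an element). The class and method names are hypothetical.

```python
class DirectAddressTable:
    """Dictionary whose keys are integer table indices, each used at most once."""

    def __init__(self, N):
        self._cells = [None] * N          # cell k holds the element for key k

    def _check(self, k):
        if not 0 <= k < len(self._cells):
            raise IndexError("key outside the table")

    def insert_item(self, k, o):
        self._check(k)
        if self._cells[k] is not None:
            raise KeyError("keys may not be repeated")
        self._cells[k] = o                # O(1): a single array write

    def find(self, k):
        self._check(k)
        return self._cells[k]             # O(1): a single array read

    def remove_element(self, k):
        self._check(k)
        o, self._cells[k] = self._cells[k], None
        return o
```

The table occupies `N` cells no matter how many items are stored, which is the price paid for constant-time access.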

Hash Tables

Keys are treated as "addresses" into a table

Bucket Array

A main array $H$ of size $N$ whose cells act like containers for elements. The hash function h() maps each key to an index in $H[0\ldots N-1]$

Transform each key into an index in a way that spreads the keys evenly over the table

Avoid collisions, where several keys map to the same index of $H$

Chaining

Have a singly-linked list hang off every position in the table. When keys collide, the new item is added to the tail of the list at that position. (Requires $O(N+n)$ space.)
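A minimal sketch of chaining. Two assumptions not taken from the notes: a Python list stands in for each singly-linked list, and the built-in `hash()` followed by `% N` stands in for the hash function. `ChainedHashTable` is a hypothetical name.

```python
class ChainedHashTable:
    """Bucket array of N chains; colliding items share a chain."""

    def __init__(self, N=11):
        self._buckets = [[] for _ in range(N)]   # O(N) space for the array
        self._n = 0

    def _index(self, k):
        return hash(k) % len(self._buckets)

    def insert_item(self, k, o):
        # Add at the tail of the chain for this index.
        self._buckets[self._index(k)].append((k, o))
        self._n += 1

    def find(self, k):
        # Only the chain at h(k) is scanned, not the whole table.
        for key, element in self._buckets[self._index(k)]:
            if key == k:
                return element
        return None

    def remove_element(self, k):
        chain = self._buckets[self._index(k)]
        for i, (key, element) in enumerate(chain):
            if key == k:
                del chain[i]
                self._n -= 1
                return element
        return None

    def load_factor(self):
        return self._n / len(self._buckets)
```

With `N = 5`, the integer keys 3, 8 and 13 all land in bucket 3, so a lookup for 13 walks a chain of three items.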

Open addressing

Store the colliding item in another cell of the table. For deletion, mark the item as inactive to show that an item used to be there, so that later searches keep probing past that cell.

Linear probing

Try the next slot over, one slot at a time, until an open slot is found:

$H(k,i)=(h(k)+i)\mod N$

$O(N)$ possible probe sequences
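Linear probing combined with the "mark as inactive" deletion rule can be sketched as follows. The sentinel objects `_EMPTY` and `_INACTIVE`, the class name, and the use of Python's `hash()` as `h(k)` are assumptions made for illustration.

```python
_EMPTY = object()      # cell that has never held an item
_INACTIVE = object()   # cell whose item was removed

class LinearProbingTable:
    def __init__(self, N=11):
        self._N = N
        self._cells = [_EMPTY] * N

    def _probe(self, k):
        # Yield H(k, i) = (h(k) + i) mod N for i = 0 .. N-1.
        start = hash(k) % self._N
        for i in range(self._N):
            yield (start + i) % self._N

    def insert_item(self, k, o):
        for j in self._probe(k):
            if self._cells[j] is _EMPTY or self._cells[j] is _INACTIVE:
                self._cells[j] = (k, o)
                return j                     # index where the item landed
        raise OverflowError("table is full")

    def find(self, k):
        for j in self._probe(k):
            cell = self._cells[j]
            if cell is _EMPTY:
                return None                  # a never-used cell ends the search
            if cell is not _INACTIVE and cell[0] == k:
                return cell[1]
        return None

    def remove_element(self, k):
        for j in self._probe(k):
            cell = self._cells[j]
            if cell is _EMPTY:
                return None
            if cell is not _INACTIVE and cell[0] == k:
                self._cells[j] = _INACTIVE   # keep the probe chain intact
                return cell[1]
        return None
```

If a removed cell were reset to empty instead of inactive, a later `find` for a key stored further along the same probe chain would stop too early.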

Quadratic probing

On the $i$-th probe, look $i^{2}$ slots past the home position:

$H(k,i)=(h(k)+i^{2})\mod N$

$O(N)$ possible probe sequences
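A short sketch of the quadratic probe sequence, assuming the home position `h(k)` is already known. The helper name is hypothetical. It also shows that the sequence can revisit cells, so it does not necessarily reach every slot.

```python
def quadratic_probe_sequence(home, N):
    """Return H(k, i) = (home + i^2) mod N for i = 0 .. N-1."""
    return [(home + i * i) % N for i in range(N)]


seq = quadratic_probe_sequence(0, 7)
visited = sorted(set(seq))
```

For `N = 7` and home position 0 the sequence is `[0, 1, 4, 2, 2, 4, 1]`, which touches only the cells 0, 1, 2 and 4.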

Double hashing

Use a second hash function $h_{2}(k)$ that places items in the next available cell using

$h(k,i)=(h_{1}(k)+i\cdot h_{2}(k))\mod N\quad i=0,1,\ldots ,N-1$

The step $d(k)=h_{2}(k)$ must be nonzero and have no common factor with $N$ (so $N\mod d(k)\neq 0$ whenever $d(k)>1$); otherwise the secondary hash might not find an empty cell. Choosing $N$ to be prime guarantees this.

Secondary compression function: $d(k)=q-(k\mod q)$ where $q<N$ is prime.

Load factor (how full the table is): $\alpha =n/N$

Uniform hashing assumption: assuming hash values behave like random numbers, the expected number of probes for an insertion is ${\frac {1}{1-\alpha }}$

$O(N^{2})$ possible probe sequences
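The probe sequence and the load-factor estimate can be sketched as below. This assumes integer keys, takes $h_{1}(k)=k\mod N$ as the primary hash (the notes do not fix one), and reads the secondary compression function as `q - (k % q)`. All function names are hypothetical.

```python
def d(k, q):
    """Secondary compression function: a step in 1 .. q, never 0."""
    return q - (k % q)


def double_hash_sequence(k, N, q):
    """Cells visited for key k: (h1(k) + i * d(k)) mod N, i = 0 .. N-1."""
    h1 = k % N                     # assumed primary hash
    step = d(k, q)
    return [(h1 + i * step) % N for i in range(N)]


def expected_probes(n, N):
    """Expected probes for an insertion under uniform hashing: 1 / (1 - alpha)."""
    alpha = n / N                  # load factor
    return 1 / (1 - alpha)


prime_table = double_hash_sequence(10, 7, 5)   # N = 7 is prime
composite_table = double_hash_sequence(1, 6, 5)  # N = 6, step 4 shares a factor with 6
```

With `N = 7` the sequence `[3, 1, 6, 4, 2, 0, 5]` reaches every cell. With `N = 6` and a step of 4 it cycles through cells 1, 5 and 3 only, even though 6 mod 4 is not 0. A half-full table (`expected_probes(5, 10)`) gives 2 expected probes.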

Hash Function

Composition of 2 functions such that $h(x)=h_{2}(h_{1}(x))$ :

Hash code map: $h_{1}:{\mbox{keys}}\rightarrow {\mbox{integers}}$
Compression map: $h_{2}:{\mbox{integers}}\rightarrow [0,N-1]$

Goal is to disperse keys randomly and evenly

Modular Arithmetic is used heavily

Division: $h(k)=|k|\mod N$
Multiply, Add, Divide: $h(k)=|ak+b|\mod N$
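The two-stage composition can be illustrated with a small sketch. The polynomial hash code for strings and the constants `a = 33`, `a = 3`, `b = 5` are assumptions chosen for the example. The notes only specify the two compression maps.

```python
def hash_code(key, a=33):
    """h1: map a string key to an integer (polynomial code, assumed here)."""
    code = 0
    for ch in key:
        code = code * a + ord(ch)
    return code


def compress_division(code, N):
    """h2, division method: |k| mod N."""
    return abs(code) % N


def compress_mad(code, N, a=3, b=5):
    """h2, multiply-add-divide method: |a*k + b| mod N."""
    return abs(a * code + b) % N


def h(key, N):
    """h(x) = h2(h1(x)) using the division method."""
    return compress_division(hash_code(key), N)
```

For example, `hash_code("ab")` is 97 * 33 + 98 = 3299, and `h("ab", 11)` compresses that to index 10.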

Ordered Dictionary

Tuesday, March 1, 2011

Additional functions:

closestBefore(k) returns the position of the item with the largest key ≤ $k$
closestAfter(k) returns the position of the item with the smallest key ≥ $k$

Lookup Table

Dictionary implemented using a sorted sequence.

Performance:

find()	$O(\log n)$
insertItem()	$O(n)$
removeElement()	$O(n)$

Binary Search

Must be given random access to sorted sequence in order for this to work.

Look at the middle item
if the middle key is what we're looking for, return it
if the middle key < target, recurse on the right half
if the middle key > target, recurse on the left half
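The steps above can be sketched as a recursive function over a sorted Python list, which provides the random access the method needs. Returning `-1` for a missing key is a convention chosen for this example.

```python
def binary_search(seq, target, lo=0, hi=None):
    """Return an index of target in the sorted list seq, or -1 if absent."""
    if hi is None:
        hi = len(seq) - 1
    if lo > hi:
        return -1                      # empty range: target is not present
    mid = (lo + hi) // 2
    if seq[mid] == target:
        return mid
    if seq[mid] < target:
        return binary_search(seq, target, mid + 1, hi)   # right half
    return binary_search(seq, target, lo, mid - 1)       # left half


keys = [2, 5, 8, 12, 16, 23, 38]
```

Each call halves the remaining range, which gives the $O(\log n)$ bound for find() in the lookup table.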

Skip List

A collection of lists in which each "level" list contains a random subset of the level before it.
Every list also holds the special keys $-\infty$ and $\infty$

Searching

Find $x$ :

Start at $-\infty$
Compare $x$ with the key $y$ at the next position, after(current)
1. $x=y$ : return element(after(current))
2. $x>y$ : scan forward (current = after(current))
3. $x<y$ : drop down (current = below(current))
If we drop down past bottom list, key does not exist

Randomized Algorithms

Digital "coin tosses"

Insertion

"Flip a coin" until we get "tails" (let $i$ = number of heads flipped)
If $i\geq h$ (height of skip list), add new lists $S_{h+1}\ldots S_{i+1}$
Search for $x$ in the skip list and find the position of the item with the largest key less than $x$ in each list
Insert the new item into each of the lists $S_{0}\ldots S_{i}$, immediately after the positions found in the previous step
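The search and insertion steps can be put together in a compact sketch. It assumes nodes linked only by `after` and `below` pointers, counts heads to obtain $i$, and lets the caller inject the coin so that runs can be repeated. The names `Node`, `SkipList` and `coin` are hypothetical.

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")

class Node:
    def __init__(self, key, element=None):
        self.key, self.element = key, element
        self.after = None    # next node in the same list
        self.below = None    # same tower, one list down

class SkipList:
    def __init__(self, coin=None):
        # coin() returns True for heads.
        self._coin = coin or (lambda: random.random() < 0.5)
        self._top = self._new_level(None)   # S_0 with only the special keys
        self.height = 0                     # h: index of the top list

    def _new_level(self, lower_head):
        head, tail = Node(NEG_INF), Node(POS_INF)
        head.after, head.below = tail, lower_head
        return head

    def find(self, x):
        current = self._top                 # start at -infinity in the top list
        while current is not None:
            y = current.after.key
            if x == y:
                return current.after.element
            current = current.after if x > y else current.below
        return None                         # dropped below the bottom list

    def insert_item(self, x, o):
        i = 0
        while self._coin():                 # count heads until the first tails
            i += 1
        while self.height <= i:             # add lists up to S_{i+1}
            self._top = self._new_level(self._top)
            self.height += 1
        preds, current = [], self._top      # largest key < x in each list
        while current is not None:
            while current.after.key < x:
                current = current.after
            preds.append(current)
            current = current.below
        below = None
        for level in range(i + 1):          # S_0 .. S_i, bottom up
            p = preds[len(preds) - 1 - level]
            node = Node(x, o)
            node.after, p.after = p.after, node
            node.below, below = below, node
```

Passing a scripted coin, for instance `SkipList(coin=lambda: next(flips))` over an iterator of booleans, makes the tower heights deterministic for experiments.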

Analysis

Everything depends on random bits used by each insert

Known "facts":

Probability of getting $i$ consecutive heads is $1/2^{i}$
If each of $n$ items is present in a set with probability $p$ , expected size of set is $np$
If each of $n$ events has a probability $p$ , the probability that at least one event occurs is at most $np$

Expected space usage is $O(n)$

Expected height is $O(\log n)$
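One way to see how the known "facts" lead to these two bounds is the short derivation below. An item reaches list $S_{i}$ only after $i$ consecutive heads, so it is present there with probability $1/2^{i}$. The constant 3 in the height argument is a conventional choice, not something stated in the notes.

```latex
% Expected space: apply the expected-size fact to each list S_i, then sum.
E\bigl[\,|S_i|\,\bigr] = \frac{n}{2^{i}}
\quad\Longrightarrow\quad
\sum_{i=0}^{h} \frac{n}{2^{i}} \;<\; n \sum_{i=0}^{\infty} \frac{1}{2^{i}} \;=\; 2n \;=\; O(n)

% Expected height: apply the "at least one event" fact to list S_i.
\Pr\bigl[\,S_i \text{ contains at least one item}\,\bigr] \;\le\; \frac{n}{2^{i}},
\qquad
i = 3\log_2 n \;\Longrightarrow\; \frac{n}{2^{3\log_2 n}} = \frac{n}{n^{3}} = \frac{1}{n^{2}}
```

So the skip list has more than $3\log _{2}n$ levels with probability at most $1/n^{2}$, which gives the $O(\log n)$ height.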

Summary

Dictionary	`insertItem()`	`find()` and `removeElement()`	Space
Log File	$O(1)$	$O(n)$	$O(n)$
Direct Address Table	$O(1)$	$O(1)$	$O(N)$ (independent of $n$)
Hash Table (unsorted chaining)	$O(1)$	$O(n)$ worst case, $O\left({\tfrac {n}{N}}\right)$ average case, $O(1)$ best case	$O(N+n)$
Hash Table (sorted chaining)	$O(n)$ worst case, $O\left({\tfrac {n}{N}}\right)$ average case, $O(1)$ best case	$O(n)$ worst case, $O\left({\tfrac {n}{N}}\right)$ average case, $O(1)$ best case	$O(N+n)$
Hash Table (open addressing)	$O\left({\frac {1}{1-{\frac {n}{N}}}}\right)$ expected	$O\left({\frac {1}{1-{\frac {n}{N}}}}\right)$ expected	$O(N)$

Hash Codes

Questions

I was never able to fully grasp the concept of bitwise operators. Can we review?
What is the difference between high-order and low-order bits?
