CSCE 221 Chapter 8

From Notes
Jump to navigation Jump to search

« previous | Tuesday, February 22, 2011 | next »


Dictionaries and Hashing

  • A searchable collection of key-element items.
  • Multiple items allowed same key

Functions

Function Description
int size() Number of items in dictionary
bool isEmpty() Whether dictionary is empty
ObjectIterator elements() Return elements stored in dictionary
ObjectIterator keys() Return keys stored in dictionary
Position find(k) Return position of an item with key equal to k
PositionIterator findAll(k) Returns an iterator of all keys that match k
void insertItem(k, o) Insert element o into the dictionary with key k
void removeElement(k) Remove an element with key equal to k
void removeAllElements(k) Removes all elements with key equal to k


Log File

Dictionary implementation with an unordered sequence

  • insertItem() is
  • find() and removeElement() are

Good for insertion but not for retrieval (good for logging access or queries)


Direct Address Table

Functions similar to Vector in which the index can be used

Keys are and are not allowed to be repeated

insertItem(), find(), and removeElement() are


Hash Tables

Keys associated as "addresses"

Bucket Array

Main array of size where cells act like containers for elements. Keys from hash function h() ranges from

Transform key to index that will have nice properties

Avoid collisions where some keys map to same index of H

Chaining

have a singly-linked list come off of every position in the table. As keys fill up, collisions result in the item being added to the tail of the linked list. (Requires space)

Open addressing

Store in another cell of table. For deletion, mark item as inactive to show that an item used to be there.

Linear probing
Put item in next slot over until it runs into an open slot
possible probe sequences
Quadratic probing
Look in i^2 slots
possible probe sequences
Double hashing
use a second hash function that places item sin the next available cell using
, otherwise secondary hash might not find an empty cell. In other words, has to be prime.
secondary compression function where is prime.
load factor (how full the table is
uniform hashing assumption: assuming hash values are like random numbers, expected number of probes for insertion is
possible probe sequences


Hash Function

Composition of 2 functions such that :

  1. Hash code map:
  2. Compression map:

Goal is to disperse keys randomly and evenly

Modular Arithmetic is used heavily

  • Division:
  • Multiply, Add, Divide:


Ordered Dictionary

Tuesday, March 1, 2011

Additional functions:

  • closestBefore(k) return position of item with largest key ≤
  • closestAfter(k) return position of item with smallest key ≥

Lookup Table

Dictionary implemented using a sorted sequence Performance:

find()
insertItem()
removeElement()

Binary Search

Must be given random access to sorted sequence in order for this to work.

  1. Look at middle item
  2. if middle is what we're looking for, return it
  3. if middle <, recurse on right
  4. if middle >, recurse on left


Skip List

Bunch of lists with each "level" list containing a random subset of the level before it Special keys and

Searching

Find :

  1. Start at
  2. compare with next position
    1. : return element(after(current))
    2. : scan forward (current = after(current))
    3. : drop down (current = below(current))
  3. If we drop down past bottom list, key does not exist

Randomized Algorithms

Digital "coin tosses"

Insertion

  1. "flip coin" until we get "tails" (let # flips = )
  2. if (height of skip list), add new lists
  3. search for in the skip list and find positions of items with largest key less than in each list
  4. Insert entries into each list before positions found earlier

Analysis

Everything depends on random bits used by each insert

Known "facts":

  • Probability of getting consecutive heads is
  • If each of items is present in a set with probability , expected size of set is
  • If each of events has a probability , the probability that at least one event occurs is at most

Expected space usage is

Expected height is

Summary

Dictionary insertItem() find() and deleteItem() space
Log File
Direct Address Table (unrelated to )
Hash Table (unsorted chaining) w.c.
a.c.
b.c.
Hash Table (sorted chaining) w.c.
a.c.
b.c.
w.c.
a.c.
b.c.
Hashing (open addressing)

Hash Codes

Questions

  • I was never able to fully grasp the concept of bitwise operators. Can we review?
  • What is the difference between high-order and low-order bits?