CSCE 411 Lecture 37

From Notes
Jump to navigation Jump to search
Lecture Slides

« previous | Monday, November 26, 2012 | next »


Multithreaded Algorithms

Measuring Performance

work
total time to execute the entire computation on a single processor
equal to sum of times taken by each thread
span
longest time to execute threads along any path of computational DAG

Running time depends on number of processors and how scheduler allocats strands to processors:

  • is running time on single processor (work)
  • is running time on processors
  • is running time for as many processors as needed (span)

Example

On Fibonacci example, assuming unit time for each thread,

  • work = 17 time units
  • span = 8 time units

Work Law

An ideal parallel computer with processors can do at most units of work. The total work to do is .

Span Law

A -processor ideal parallel computer cannot run faster than a machine with an unlimited number of processors.

However, a computer with unlimited processors can emulate a -processor machine by only using of its processors

Speedup and Parallelism

speedup
parallelism

Greedy Scheduler

Assigns as many strands to processors as possible

Vocabulary:

complete step
at least strands are ready to execute on processors
incomplete step
not a complete step.

Proof

In each complete step, processors perform a total of work. Assume number of complete steps exceeds , then total work is

CONTRADICTION: This exceeds the total work required by the computation.


In each incomplete step,

let be the subgraph of the computation DAG that has yet to be executed at the start of an incomplete step, and
let be the subgraph that will be remaining to be executed at the end of an incomplete step.

Longest path ind DAG must start at vertex with in-degree 0 (otherwise just follow in-edge to new start of longer path).

Greedy scheduler executes all nodes in with in-degree 0, and so the longest path in will be 1 less than the longest path in .

In other words, an incomplete step reduces span of DAG by 1, so number of incomplete steps is at most .

Since each step must either be complete or incomplete, Q.E.D

Q.E.D.

Corollary

Greedy scheduler performance is within a factor of 2 of , the running time of an optimal scheduling:

Slackness

ratio of parellelism by :

If slackness is less than 1, we can't hope to achieve linear speedup


Work of Fibonacci

, where is the golden ratio.

Span:

Parallelism: