CSCE 411 Lecture 37
« previous | Monday, November 26, 2012 | next »
Multithreaded Algorithms
Measuring Performance
- work
- total time to execute the entire computation on a single processor
- equal to sum of times taken by each thread
- span
- longest time to execute threads along any path of computational DAG
Running time depends on number of processors and how scheduler allocats strands to processors:
- is running time on single processor (work)
- is running time on processors
- is running time for as many processors as needed (span)
Example
On Fibonacci example, assuming unit time for each thread,
- work = 17 time units
- span = 8 time units
Work Law
An ideal parallel computer with processors can do at most units of work. The total work to do is .
Span Law
A -processor ideal parallel computer cannot run faster than a machine with an unlimited number of processors.
However, a computer with unlimited processors can emulate a -processor machine by only using of its processors
Speedup and Parallelism
- speedup
- parallelism
Greedy Scheduler
Assigns as many strands to processors as possible
Vocabulary:
- complete step
- at least strands are ready to execute on processors
- incomplete step
- not a complete step.
Proof
In each complete step, processors perform a total of work. Assume number of complete steps exceeds , then total work is
CONTRADICTION: This exceeds the total work required by the computation.
In each incomplete step,
- let be the subgraph of the computation DAG that has yet to be executed at the start of an incomplete step, and
- let be the subgraph that will be remaining to be executed at the end of an incomplete step.
Longest path ind DAG must start at vertex with in-degree 0 (otherwise just follow in-edge to new start of longer path).
Greedy scheduler executes all nodes in with in-degree 0, and so the longest path in will be 1 less than the longest path in .
- In other words, an incomplete step reduces span of DAG by 1, so number of incomplete steps is at most .
Since each step must either be complete or incomplete, Q.E.D
Corollary
Greedy scheduler performance is within a factor of 2 of , the running time of an optimal scheduling:
Slackness
ratio of parellelism by :
If slackness is less than 1, we can't hope to achieve linear speedup
Work of Fibonacci
, where is the golden ratio.
Span:
Parallelism: