CSCE 434 Lecture 34

« previous | Friday, November 15, 2013 | next »

Lecture Slides

Optimization

Instruction Scheduling

Reorder instructions to hide latencies (prevent from waiting; e.g. on store/load to/from memory)

assumes a fixed program
changes demand registers

Program behavior remains unchanged; the only difference is that the code runs faster. This fact is provable.

This can be done easily within a basic block.

Tree-matching or instruction matching

Assume we have enough registers or target "important" information into registers

Register Allocation

Ideally, we do not want to spill registers onto the stack.

Assume infinite number of registers at first.

Data Storage

Data bus size determines word length.

On that note, synchronization variables should be the length of one cache line.

Simple Expressions

In following code, newtemp() should always return a register.

If we run out of registers, it will pick a victim, tell it to get out, spill it to the stack, and proceed to use it. However, it will have to be put back when we are done using the register

int expr(Node *node)
{
    int result, t1, t2, t3;
    switch (node.type) {
        case PLUS:
            t1 = expr(node.left);
            t2 = expr(node.right);
            result = newtemp();
            emit(ADD, result, t1, t2);
            break;
        case ID:
            t1 = base(node.val);
            t2 = offset(node.val);
            t3 = newtemp();
            emit(ADD, t3, t1, t2);
            break;
        case NUM:
            result = newtemp();
            emit(LOADI, result, node.val);
            break;
    }
    return result;
}

Complex Expressions

Array References

First, agree to storage scheme because memory is linear (i.e. agree on linearization scheme):

row-major order: rightmost subscript varies fastest; A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3]
column-major order: leftmost subscript varies fastest; A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3]
indirection vectors: vector of pointer to pointers to ... to values; uses much more space

Computing an Address

A[i]

${\mbox{base}}(A)+(i-1)\cdot {\mbox{sizeof}}(elem)$
row-major order: ${\mbox{base}}(A)+((i_{1}-low_{1})\cdot (high_{2}-low_{2}+1)+i_{2}-low_{2})\cdot {\mbox{sizeof}}(elem)$ .
column-major order: ${\mbox{base}}(A)+((i_{2}-low_{2})\cdot (high_{1}-low_{1}+1)+i_{1}-low_{1})\cdot {\mbox{sizeof}}(elem)$

Expensive!