CSCE 312 Lecture 2

From Notes
Jump to navigation Jump to search

« previous | Thursday, September 1, 2011 | next »


Great Realities of Computers

Precision

Ints are not integers, Floats are not Real Numbers

Only 32 bits to represent numbers:

  • Large Integers cause wraparound/overflow errors (i.e. is x2 ≥ 0? Not necessarily)
    • 40,000 * 40,000 = 1,600,000,000
    • 50,000 * 50,000 = ??
  • Floats of differing ranges cause truncation errors: (i.e. s (x + y) + z = x + (y + z)? Not for floats)
    • (1e20 + -1e20) + 3.14 = 3.14
    • 1e20 + (-1e20 + 3.14) = ??

Because of these limitations, computer arithmetic:

  • Cannot generate true random values
  • Sometimes doesn't follow usual mathematical rules


Assembly Language

You've got to know Assembly (even though you'll never write a program in it)

  • Be able to understand performance due to bugs
  • Über hackage! :D
  • Understand machine-level execution of higher-level code

Aside: Cycles and Scheduling

Using assembly to access PC may not give actual cycle count of program execution:

  • I/O done asynchronously through buffers
  • Processes handled alternatively by CPU depending on priority
  • Scheduling (i.e what process the CPU is currently focusing on) maintained by Kernel (OS-level)
Throughput
how many jobs the CPU can finish in 1 second (for example)
Response Time (Latency)
how fast a job can be completed

Memory Matters

Memory (unlike in theoretical Turing Machine) is not unbounded; we have a finite supply to allocate and manage.

Referencing bugs are nasty! C and C++ offer no memory protection for out-of-bounds, pointer errors, and memory leaks by default (code can access memory of any process!)

Memory Hierarchy (Top = fast, small, $$$$$; bottom = slow, abundant, $)

  1. Registers (L0)
  2. On-Chip Cache (L1; SRAM)
    • Note: SRAM chips are made from 6 transistors in circuit and are self-enhancing (no recharge required)
  3. Off-Chip Cache (L2; SRAM)
  4. Main Memory (L3; DRAM)
  5. Local Disks
  6. Network

Example: Taking Advantage of Memory Structure

Which code snippet is faster?

int key[1000];

int data[1000]; for (int i=0; i<1000; ++i) {

   int k = key[i];
   int d = data[i];
   // do something with k and d

}

struct pair {
   int key;
   int data;

}; pair data[1000]; for (int i=0; i<1000; ++i) {

   int k = data[i].key;
   int d = data[i].data;
   // do something with k and d

}

The first code snippet will grab a bunch of keys and store them in the cache, then immediately replace the cached keys with data. The code with the struct to bind data together keeps the key and data together. When the memory accesses the key element, it will also grab more information around it, including its data counterpart and store it in the cache.

Performance

There's more to performance than asymptotic complexity

Constant factors matter too!

True optimization requires knowledge of computer's inner workings.

Computer I/O

Computers do more than execute programs

Data Input/Output makes computers interactive and responsive

Network communications allow computers to even talk to each other, but introduces new potential problems:

  • concurrent operations by processes
  • unreliable communication/transfer media
  • platform compatibility


General Terms

Measuring Data

  • 1 bit = 0 or 1
  • 1 nibble = 4 bits
  • 8 bits = 2 nibbles = 1 byte
  • 4 bytes = 1 word
  • 1 kilobyte (KB) = 1024 (210) bytes
  • 1 megabyte (MB) = 1024 KB = 220 bytes
  • 1 gigabyte (GB) = 1024 MB = 230 bytes

Measuring Time

  • 1 ms = 10−3 sec
  • 1 µs = 10−6 sec
  • 1 ns = 10−9 sec
  • 1 ps = 10−12 sec


Model of a Computer

Computer Model.png
  • CPU: Central Processing Unit
  • ALU: Arithmetic/Logic Unit
  • PC: Program Counter
  • USB: Universal Serial Bus

Buses and Bridges

Hardware Bus
comparable to a bus in real-life (not a taxi)
makes scheduled round-trips
data knows when to get off
Hardware Bridge
where buses meet to transfer data to other routes

Main Memory

  • Temporary storage between permanent storage and CPU caches & registers
  • Holds both the program instructions and program data
  • Made of DRAM chips (each DRAM piece is a single transistor)
    • Note: DRAM 1 values must be "recharged" every 80 ms or they will lose capacitance and become 0s
  • Linear array of bytes

Processor

  • Interprets instructions that are stored in Main Memory
  • PC points to next instruction in memory
  • Processor does only one thing (at a very fast rate): execute the next instruction pointed to by the PC
  • Very simple instruction model defined by instruction set architecture (ISA)
    • Sample ISA's: 80x86/Pentium, PowerPC, DEC Alpha, MIPS, SPARC, HP, etc.
    • abstract interface between hardware and low-level software