CSCE 315 Lecture 7

« previous | Friday, February 3, 2012 | next »

Database Implementation Issues

Typically databases are large and used by many people (simultaneously)

Specialized algorithms have been developed for Efficiency and Reliability

Terminology:

Relation → Table
Tuple → Record
Attribute → Field

Storing a Record

Records can often be assumed to have a fixed (maximum) length.
Records are often concatenated

WASTES A LOT OF SPACE!

Location of record fields are stored as number:

Name VARCHAR(100); 100+1 bytes
State CHAR(2); 2 bytes
YearsInSenate INT; 1 byte
Party VARCHAR(11); 11+1 bytes

Pointers do not have a known size. Additionally, dynamic allocation is slow

Common solution to wasted space: store fixed-length fields first and variable length fields afterwards. Keep a header for each record that stores "local pointers" to the variable fields

Records can then be grouped into blocks that correspond with a "unit" of disk/storage. When block overflows, fragment information is stored in record fragment headers (pointers to next/previous)

Addressing

2 types of addresses:

location in database (on disk)
location in memory (when cached)

Translation table maps items in virtual memory to the overall database. "Pointer swizzling" keeps these pointers up-to-date

Insertion, Deletion, and Modification

Insert a record into its appropriate block. When a block overflows, insertion-sort the records, storing half of each record on the original and new blocks.

Deleting can use "tombstone" to mark a deleted record

Modifying a fixed-length record is straightforward, but variable-length is difficult to keep track of memory allocation.

Indices

Special (often sorted) data structures to find all records that satisfy some condition

Usually one record per record, but sparse index can be used to find the record before and start searching sequentially

Multiple indices can be used in a Skip List format.

Reliability

Checksums
Mirroring Disks
Parity Bits
RAID levels