CSCE 315 Lecture 7
« previous | Friday, February 3, 2012 | next »
Database Implementation Issues
Typically databases are large and used by many people (simultaneously)
Specialized algorithms have been developed for Efficiency and Reliability
Terminology:
- Relation → Table
- Tuple → Record
- Attribute → Field
Storing a Record
- Records can often be assumed to have a fixed (maximum) length.
- Records are often concatenated
WASTES A LOT OF SPACE!
Location of record fields are stored as number:
- Name VARCHAR(100); 100+1 bytes
- State CHAR(2); 2 bytes
- YearsInSenate INT; 1 byte
- Party VARCHAR(11); 11+1 bytes
Pointers do not have a known size. Additionally, dynamic allocation is slow
Common solution to wasted space: store fixed-length fields first and variable length fields afterwards. Keep a header for each record that stores "local pointers" to the variable fields
Records can then be grouped into blocks that correspond with a "unit" of disk/storage. When block overflows, fragment information is stored in record fragment headers (pointers to next/previous)
Addressing
2 types of addresses:
- location in database (on disk)
- location in memory (when cached)
Translation table maps items in virtual memory to the overall database. "Pointer swizzling" keeps these pointers up-to-date
Insertion, Deletion, and Modification
Insert a record into its appropriate block. When a block overflows, insertion-sort the records, storing half of each record on the original and new blocks.
Deleting can use "tombstone" to mark a deleted record
Modifying a fixed-length record is straightforward, but variable-length is difficult to keep track of memory allocation.
Indices
Special (often sorted) data structures to find all records that satisfy some condition
Usually one record per record, but sparse index can be used to find the record before and start searching sequentially
Multiple indices can be used in a Skip List format.
Reliability
- Checksums
- Mirroring Disks
- Parity Bits
- RAID levels