CSCE 470 Lecture 2
Wednesday, August 28, 2013
Abstract IR Architecture
Query → Query Representation; Document → Document Representation → Index; matching the query representation against the index yields the hits
Term-Document Incidence Matrix
Rows = terms
Columns = document IDs
Entry (t, d) is 1 if term t appears in document d, 0 otherwise
There will be very few ones and very many zeroes, so this will be a sparse matrix
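A minimal sketch of building such a matrix, assuming whitespace tokenization and lowercasing (neither is specified in the notes). The three sample documents and the name `build_incidence_matrix` are made up for illustration; a collection this small is not yet very sparse.

```python
# Hypothetical toy collection: document ID -> text.
docs = {
    0: "new home sales top forecasts",
    1: "home sales rise in july",
    2: "increase in home sales in july",
}

def build_incidence_matrix(docs):
    """Return (terms, doc_ids, matrix); matrix[i][j] is 1 if terms[i] occurs in doc_ids[j]."""
    doc_ids = sorted(docs)
    # Assumption: terms are lowercase, whitespace-separated tokens.
    tokenized = {d: set(docs[d].lower().split()) for d in doc_ids}
    terms = sorted(set().union(*tokenized.values()))
    matrix = [[1 if t in tokenized[d] else 0 for d in doc_ids] for t in terms]
    return terms, doc_ids, matrix

terms, doc_ids, matrix = build_incidence_matrix(docs)

# Print one row per term, one column per document ID.
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

Each printed row is the incidence vector for one term, which is what the Boolean queries later in the notes operate on.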
Sorting Keys
- Linear search (O(n))
- Lexicographically ordered: binary search (O(log n))
- Hash Map (O(1) expected)
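The three lookup strategies above can be sketched side by side. This is an illustrative comparison only; the function names and the sample keys are assumptions, and Python's built-in `dict` stands in for the hash map.

```python
from bisect import bisect_left

def linear_lookup(keys, term):
    """Scan every key in turn; works on unsorted keys."""
    for i, key in enumerate(keys):
        if key == term:
            return i
    return -1

def binary_lookup(sorted_keys, term):
    """Binary search; requires keys in lexicographic order."""
    i = bisect_left(sorted_keys, term)
    if i < len(sorted_keys) and sorted_keys[i] == term:
        return i
    return -1

# Hypothetical term keys, sorted so that binary search applies.
keys = sorted(["sales", "home", "july", "rise", "forecasts"])

# Hash map from term to its position in the sorted key list.
position = {key: i for i, key in enumerate(keys)}

print(linear_lookup(keys, "july"))
print(binary_lookup(keys, "july"))
print(position.get("july", -1))
```

All three return the same position for a key that exists and -1 for one that does not; they differ only in how much work the lookup takes as the number of keys grows.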
Boolean queries:
AND, OR, NOT
Represent each row of the matrix as a bit vector and apply the corresponding bitwise operations (AND, OR, NOT) to the vectors to get the results
No ranking: a document either matches the query or it does not
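A minimal sketch of answering a Boolean query with bitwise operations, using Python integers as bit vectors (bit d is set when the term appears in document d). The rows, the four-document collection, and the helpers `pack` and `unpack` are hypothetical.

```python
# Hypothetical incidence rows over 4 documents (IDs 0..3).
rows = {
    "brutus":    [1, 1, 0, 1],
    "caesar":    [1, 1, 0, 1],
    "calpurnia": [0, 1, 0, 0],
}

NUM_DOCS = 4
ALL_DOCS = (1 << NUM_DOCS) - 1  # mask with one bit per document

def pack(row):
    """Pack a 0/1 row into an integer; bit d corresponds to document d."""
    bits = 0
    for doc_id, flag in enumerate(row):
        if flag:
            bits |= 1 << doc_id
    return bits

def unpack(bits):
    """Return the document IDs whose bits are set."""
    return [d for d in range(NUM_DOCS) if (bits >> d) & 1]

vectors = {term: pack(row) for term, row in rows.items()}

# brutus AND caesar AND NOT calpurnia.
# NOT is masked with ALL_DOCS because ~ on a Python int yields a negative number.
result = vectors["brutus"] & vectors["caesar"] & (~vectors["calpurnia"] & ALL_DOCS)
print(unpack(result))  # [0, 3]

# brutus OR calpurnia
print(unpack(vectors["brutus"] | vectors["calpurnia"]))  # [0, 1, 3]
```

The result is an unordered yes/no answer per document, which is why this model has no ranking.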
Index
Throw away all the zeroes, only keep the ones
Dictionary maps term keys to a collection of documents.
Boolean query operations now become set operations (union, intersect, complement)
Search optimizations now come into play
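A minimal sketch of such an index, assuming whitespace tokenization and Python sets as the per-term document collections. The sample documents and the names `build_index`, `postings`, and `intersect_all` are hypothetical. Intersecting the smallest collections first is one common optimization, shown here as an example; the notes do not say which optimizations are meant.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it (only the ones are stored)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

# Hypothetical toy collection: document ID -> text.
docs = {
    0: "new home sales top forecasts",
    1: "home sales rise in july",
    2: "increase in home sales in july",
}
index = build_index(docs)
all_docs = set(docs)

def postings(term):
    """Documents containing the term; empty set if the term is unknown."""
    return index.get(term, set())

and_hits = postings("home") & postings("july")      # AND -> intersection
or_hits = postings("rise") | postings("increase")   # OR  -> union
not_hits = all_docs - postings("july")               # NOT -> complement

def intersect_all(terms):
    """AND over several terms, smallest sets first so the intermediate result stays small."""
    sets = sorted((postings(t) for t in terms), key=len)
    if not sets:
        return set()
    result = set(sets[0])
    for s in sets[1:]:
        result &= s
        if not result:  # nothing left to match, stop early
            break
    return result

print(and_hits, or_hits, not_hits)
print(intersect_all(["home", "sales", "july"]))
```

Note that NOT needs the full set of document IDs, since the index itself no longer stores the zeroes.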