CSCE 470 Lecture 2


Wednesday, August 28, 2013


Abstract IR Architecture

Query → Query Representation
Document → Document Representation → Index
Query Representation + Index → Hits

Term-Document Incidence Matrix

Rows = terms

Columns = document IDs

Entry (t, d) is 1 if term t appears in document d, 0 otherwise

There will be very few ones and very many zeroes, so this will be a sparse matrix
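
As a minimal sketch of the matrix described above, the following builds a term-document incidence matrix for a made-up three-document corpus. The corpus, the whitespace tokenization, and the helper name `sparsity` are assumptions for illustration, not part of the lecture.

```python
# Hypothetical toy corpus; document IDs are the list positions 0..2.
docs = [
    "brutus killed caesar",
    "caesar praised brutus",
    "calpurnia warned caesar",
]

# Rows are terms (sorted for a stable order), columns are document IDs.
terms = sorted({word for text in docs for word in text.split()})

# matrix[i][j] is 1 if terms[i] appears in docs[j], 0 otherwise.
matrix = [[1 if term in text.split() else 0 for text in docs] for term in terms]


def sparsity(m):
    """Return the fraction of entries in m that are zero."""
    total = sum(len(row) for row in m)
    zeros = sum(row.count(0) for row in m)
    return zeros / total
```

Even in this tiny corpus half of the entries are zero; with a realistic vocabulary and collection size the fraction of zeroes is far higher.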


Sorting Keys

  • Linear search (O(n))
  • Lexicographically ordered: binary search (O(log n))
  • Hash Map (O(1) expected)
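
The three lookup strategies listed above can be sketched as follows. The key list and the function names are hypothetical; `bisect_left` comes from the Python standard library, and the hash map is an ordinary `dict`.

```python
from bisect import bisect_left


def linear_search(keys, target):
    """Scan every key in turn; works on unsorted input."""
    for i, key in enumerate(keys):
        if key == target:
            return i
    return -1


def binary_search(sorted_keys, target):
    """Halve the search range each step; requires lexicographically sorted keys."""
    i = bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return -1


# Hypothetical term list, already in lexicographic order.
keys = ["brutus", "caesar", "calpurnia", "killed"]

# Hash map from term to its position: one hash computation per lookup.
position = {key: i for i, key in enumerate(keys)}
```

All three return the same position for a given term; they differ only in how much work each lookup costs as the vocabulary grows.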


Boolean queries:

AND, OR, NOT

Represent each row of the matrix as a bit vector, then apply the corresponding bitwise operations to the vectors to get the results

No ranking of results
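
A small sketch of Boolean queries as bitwise operations, assuming each matrix row is stored as a Python integer whose bit j is set when the term appears in document j. The row values, `N_DOCS`, `ALL_DOCS`, and `doc_ids` are made up for illustration.

```python
N_DOCS = 3
# Mask with one bit per document, so that NOT stays inside the collection.
ALL_DOCS = (1 << N_DOCS) - 1

# Hypothetical rows: bit j is set when the term appears in document j.
rows = {
    "brutus": 0b011,     # documents 0 and 1
    "caesar": 0b111,     # documents 0, 1 and 2
    "calpurnia": 0b100,  # document 2
}


def doc_ids(bits):
    """Convert a bit vector back into a list of document IDs."""
    return [j for j in range(N_DOCS) if (bits >> j) & 1]


# brutus AND caesar
both = rows["brutus"] & rows["caesar"]
# brutus OR calpurnia
either = rows["brutus"] | rows["calpurnia"]
# caesar AND NOT brutus
caesar_not_brutus = rows["caesar"] & (~rows["brutus"] & ALL_DOCS)
```

The result is an unordered set of matching documents: a document either satisfies the query or it does not, which is why there is no ranking.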

Index

Throw away all the zeroes, only keep the ones

A dictionary maps each term (key) to the collection of documents that contain it.

Boolean query operations now become set operations (union, intersect, complement)
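
A minimal sketch of such an index, assuming a made-up corpus and whitespace tokenization: only the ones are stored, as a set of document IDs per term, and the Boolean operators become set operations. The names `index`, `all_docs`, and `postings` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus keyed by document ID.
docs = {
    0: "brutus killed caesar",
    1: "caesar praised brutus",
    2: "calpurnia warned caesar",
}

# Dictionary: term -> set of IDs of the documents containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# Needed for NOT, which is a complement relative to the whole collection.
all_docs = set(docs)


def postings(term):
    """Return the documents containing term (empty set if the term is unseen)."""
    return index.get(term, set())


brutus_and_caesar = postings("brutus") & postings("caesar")       # intersection
brutus_or_calpurnia = postings("brutus") | postings("calpurnia")  # union
not_brutus = all_docs - postings("brutus")                        # complement
```

Storage now grows with the number of ones rather than with terms × documents.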


Search optimizations now come into play