CSCE 470 Lecture 2

From Notes
Jump to navigation Jump to search

« previous | Wednesday, August 28, 2013 | next »


Abstract IR Architecture

Query → Query Representation
+ Document → Document Representation → index
= hits

Term-Document Incidence Matrix

Rows = terms

Columns = document IDs

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle M_{ij}} is 1 if term Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle i} appears in document Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle j} , 0 otherwise

There will be very few ones and very many zeroes, so this will be a sparse matrix


Sorting Keys

  • Linear search (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O\left( n \right)} )
  • Lexicographically ordered: binary search (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O\left( \log{n} \right)} )
  • Hash Map (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O\left( 1 \right)} )


Boolean queries:

AND, OR, NOT

Represent rows of matrix as vector, perform corresponding bitwise operations on vector to get results

no ranking Face-sad.svg

Index

Throw away all the zeroes, only keep the ones

Dictionary maps term keys to a collection of documents.

Boolean query operations now become set operations (union, intersect, complement)


Search optimizations now come in to play