Tables and Information Retrieval: Introduction
Objectives of this lecture
- Introduce another approach to information retrieval, one that does not rely on key comparisons.
- Learn how to transform a 2-D representation into a linear one by means of an index function.
- Learn how to use access tables for searching and sorting.
Introduction:
- Our best searching method so far (binary search) requires O(log n) comparisons, while our best sorting methods require O(n log n). Can we do better than that?
- If we have a relatively small number of records whose integer index (key field) lies in a small range, say 1 … 500, then we can store the records in a 1-D array indexed by the key. In this case, we can locate any record directly, in O(1) time.
- In this lecture and the next, we shall study how to extend this idea to cover more general cases.
- We shall adopt the convention of using a parenthesized index expression such as (i, j) to refer to element a[i, j] of an array.
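The direct-lookup idea above can be sketched as follows. This is a minimal illustration, assuming keys in the range 1 … 500 as in the text; the names table, insert and lookup and the sample records are hypothetical.

```python
# Assumed key range, taken from the example in the text.
MIN_KEY, MAX_KEY = 1, 500

# One slot per possible key; None marks an empty slot.
table = [None] * (MAX_KEY - MIN_KEY + 1)

def insert(key, record):
    """Store the record in the slot determined directly by its key."""
    table[key - MIN_KEY] = record

def lookup(key):
    """Return the record stored under key, or None if the slot is empty.

    The slot is computed from the key itself, so no key comparisons
    are needed and the cost does not depend on the number of records.
    """
    return table[key - MIN_KEY]

# Hypothetical sample data.
insert(42, "Alice")
insert(500, "Bob")
print(lookup(42))
print(lookup(7))
```

The price of O(1) access is one slot for every possible key, which is why the approach suits only a small key range.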
Rectangular array (2-D array)
- Although we think of a 2-D array as lying in a plane (consisting of rows and columns), the compiler has to convert it to a linear layout for storage in memory.
- A simple formula for doing this for an m x n array stored in row-major order is given by the function:
f(i, j) = ni + j,
where (i, j) denotes the cell in row i, column j, with rows and columns numbered from 0.
- This formula is called an index function.
- Another way of achieving this mapping, one that does not involve multiplication, is to use an auxiliary array that stores the values:
0, n, 2n, 3n, . . . , (m-1)n.
- The position of element (i, j) is obtained by taking the entry in position i of the auxiliary array and adding j to it.
- The following figure demonstrates this idea.
- This auxiliary table is an example of an access table.
- In general, an access table is an auxiliary array that is used to find data stored in another array.
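Both mappings for a rectangular array can be sketched as below. This is only an illustration, assuming 0-based rows and columns; the function names and the 3 x 4 sample array are hypothetical.

```python
def index_function(i, j, n):
    """Row-major position of cell (i, j) in an array with n columns."""
    return n * i + j

def build_access_table(m, n):
    """Return [0, n, 2n, ..., (m-1)n], built by repeated addition."""
    table = []
    offset = 0
    for _ in range(m):
        table.append(offset)
        offset += n  # the next row starts n cells further on
    return table

# Hypothetical 3 x 4 array stored linearly; flat[k] == k for easy checking.
m, n = 3, 4
flat = list(range(m * n))
access = build_access_table(m, n)

i, j = 2, 1
print(access)                          # start position of each row
print(flat[index_function(i, j, n)])   # lookup using multiplication
print(flat[access[i] + j])             # lookup using the access table
```

The access table costs m extra entries but replaces the multiplication in every lookup with one table read and one addition.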
Sparse Matrix:
- In many applications, a large part of a 2-D array (matrix) consists of zero entries. Examples are diagonal matrices, triangular matrices, etc. The figure below shows a lower triangular matrix.
- For matrices of this type, we can save space by using index functions that allocate memory only to the entries that can be non-zero.
- For the above example (lower triangular matrix), we observe that there are:
0 entries before row 0,
1 entry before row 1,
1+2 entries before row 2,
. . .
1+2+3+ . . . +i = ½ i(i+1) entries before row i.
- Thus, the index function is given by
f(i, j) = ½ i(i+1) + j, for j ≤ i.
- The access table for this example is:
0, 1, 1+2, 1+2+3, . . .
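The lower triangular case can be sketched in the same way. This is a minimal illustration, assuming 0-based indices and that every entry above the diagonal is zero; the function names and the 4 x 4 example are hypothetical.

```python
def tri_index(i, j):
    """Position of entry (i, j), j <= i, in the packed lower triangle."""
    if j > i:
        raise IndexError("entry above the diagonal is not stored")
    return i * (i + 1) // 2 + j

def build_tri_access_table(size):
    """Return [0, 1, 1+2, 1+2+3, ...]: the start position of each row."""
    table = []
    offset = 0
    for i in range(size):
        table.append(offset)
        offset += i + 1  # row i holds i + 1 stored entries
    return table

def get(packed, access, i, j):
    """Read entry (i, j) of the full matrix from the packed storage."""
    if j > i:
        return 0  # entries above the diagonal are zero by assumption
    return packed[access[i] + j]

# Hypothetical 4 x 4 lower triangular matrix: 10 stored entries, not 16.
size = 4
packed = list(range(1, size * (size + 1) // 2 + 1))
access = build_tri_access_table(size)

print(access)
print(get(packed, access, 2, 1), get(packed, access, 1, 3))
```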
General tables:
- Access tables can be used to access even arrays that have no special form. We only need to know the length of each row.
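For a table whose rows have arbitrary lengths, the access table is simply the running total of the row lengths. The sketch below assumes the rows are stored one after another in a single linear array; the name jagged_access_table and the sample rows are hypothetical.

```python
def jagged_access_table(row_lengths):
    """Return the start position of each row, given only the row lengths."""
    table = []
    offset = 0
    for length in row_lengths:
        table.append(offset)
        offset += length
    return table

# Hypothetical table with rows of length 2, 1, 3 and 0.
rows = [[5, 6], [7], [8, 9, 10], []]
flat = [value for row in rows for value in row]  # linear storage
access = jagged_access_table([len(row) for row in rows])

i, j = 2, 1
print(access)
print(flat[access[i] + j])  # entry (2, 1) of the original table
```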
Multi-key access tables:
- Another application of access tables arises when information must be kept sorted in several different orders (by id_no, name, address, etc.).
- Instead of keeping multiple copies of the records, we keep one access table for each required order.
- To list the records in a given order, we simply make a sequential (linear) traversal of the corresponding access table. The following figure demonstrates this.
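A multi-key arrangement might be sketched as follows. This is only an illustration: the records, their field values and the helper names are hypothetical, and each access table here holds the positions of the records in the single stored copy.

```python
# Hypothetical records, stored once in arbitrary order.
records = [
    {"id_no": 3, "name": "Carol", "address": "12 Oak St"},
    {"id_no": 1, "name": "Bob", "address": "9 Elm St"},
    {"id_no": 2, "name": "Alice", "address": "30 Pine St"},
]

def build_order_table(key):
    """Return the record positions arranged in increasing order of key."""
    return sorted(range(len(records)), key=lambda pos: records[pos][key])

def in_order(access):
    """Traverse an access table sequentially to list records in its order."""
    return [records[pos] for pos in access]

by_id = build_order_table("id_no")
by_name = build_order_table("name")

print(by_id, by_name)
print([r["name"] for r in in_order(by_name)])
print([r["id_no"] for r in in_order(by_id)])
```

Only the small tables of positions are duplicated per ordering; the records themselves are never copied or moved.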