Tables and Information Retrieval: Introduction

 

Objectives of this lecture

• Introduce an approach to information retrieval other than key comparison.

• Learn how to map a 2-D representation to a linear one using an index function.

• Learn how to use access tables for searching and sorting.

 

Introduction:

• Our best searching method so far (binary search) requires O(log n) comparisons, while our best sorting method requires O(n log n) comparisons. Can we do better than that?

• If we have a relatively small number of records whose integer index (key field) lies in a small range, say 1 … 500, then we can store the records in a 1-D array. In this case, we can locate any record directly, in O(1) time, without comparing keys.

• In this lecture and the next, we shall study how we can extend this idea to cover some general cases.

• We shall adopt the convention of using a parenthesized index expression like (i, j) to refer to element a[i, j] of an array.
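
The direct lookup described above can be sketched as follows. This is only an illustration: the key range 1 … 500 comes from the notes, while the names `table`, `insert`, `lookup` and the sample records are hypothetical.

```python
# Hypothetical records keyed by an integer in the range 1..500.
LOW, HIGH = 1, 500

# One slot per possible key; None marks "no record with this key".
table = [None] * (HIGH - LOW + 1)

def insert(key, record):
    """Store a record in the slot determined by its key."""
    table[key - LOW] = record

def lookup(key):
    """Return the record with this key, or None. No key comparisons are made."""
    if not LOW <= key <= HIGH:
        return None
    return table[key - LOW]

insert(42, "Amina")    # sample data, assumed for illustration
insert(500, "Bashir")
```

A call such as `lookup(42)` goes straight to slot `42 - LOW`, so the cost does not depend on how many records are stored.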

 

Rectangular array (2-D array)

• Although we think of a 2-D array as lying in a plane (consisting of rows and columns), the compiler has to convert it to a linear format for storage in memory.

• For an m x n array stored in row-major order, with rows and columns numbered from 0, a simple formula for doing this is given by the function:

f(i, j) = ni + j,  where (i, j) represents the cell in row i, column j.

 

• The above formula is called an index function.

• Another way of achieving the above mapping, one that does not involve multiplication, is to use an auxiliary array to store the starting position of each row:

0,  n,  2n,  3n,  . . . ,  (m-1)n.

 

• The linear position of the element (i, j) is obtained by taking the entry in position i of the auxiliary array and adding j to it.

 

 

• The following figure demonstrates this idea.

• This auxiliary table is an example of an access table.

• In general, an access table is an auxiliary array that is used to find data stored in another array.
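
As a minimal sketch of the two approaches above, the following compares the index function f(i, j) = ni + j with an access table holding 0, n, 2n, . . . The function names and the 3 x 4 sample array are assumptions made for illustration.

```python
def index_function(i, j, n):
    """Row-major index function f(i, j) = n*i + j for an array with n columns."""
    return n * i + j

def build_access_table(m, n):
    """Return [0, n, 2n, ..., (m-1)n], built by repeated addition only."""
    table = []
    offset = 0
    for _ in range(m):
        table.append(offset)
        offset += n
    return table

# Sample 3 x 4 array stored row by row in one linear list:
# row 0 = 0..3, row 1 = 4..7, row 2 = 8..11
m, n = 3, 4
flat = list(range(m * n))
row_start = build_access_table(m, n)

def get_element(i, j):
    """Fetch element (i, j) using the access table: one lookup and one addition."""
    return flat[row_start[i] + j]
```

Both routes give the same linear position; the access table trades m extra words of storage for avoiding a multiplication on every access.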

 

Sparse Matrix: 

• In many applications, a large part of a 2-D array (matrix) consists of zero entries. Examples are diagonal matrices, triangular matrices, etc. The figure below shows a lower triangular matrix.

 

• For matrices of this type, we can save space by using an index function that allocates memory only to the entries that can be non-zero.

• For the above example (lower triangular matrix), storing only the entries on or below the diagonal, row by row, we observe that there are:

0 entries before row 0,

1 entry before row 1,

1 + 2 entries before row 2,

. . .

1 + 2 + 3 + … + i = ½ i (i + 1) entries before row i.

 

• Thus, the index function is given by

f(i, j) = ½ i (i + 1) + j,  for 0 ≤ j ≤ i.

 

• The access table for this example is:

0,  1,  1+2,  1+2+3,  . . .

that is, entry i of the table holds ½ i (i + 1).
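
The packed storage of a lower triangular matrix might be sketched as below, assuming zero-based rows and columns and a hypothetical 4 x 4 matrix. The names `tri_index`, `build_tri_access_table` and `get_tri` are illustrative only.

```python
def tri_index(i, j):
    """Index function f(i, j) = i(i+1)/2 + j, valid for 0 <= j <= i."""
    if j > i:
        raise IndexError("entries above the diagonal are not stored")
    return i * (i + 1) // 2 + j

def build_tri_access_table(size):
    """Return [0, 1, 1+2, 1+2+3, ...]: the packed position where each row starts."""
    table = []
    offset = 0
    for i in range(size):
        table.append(offset)
        offset += i + 1   # row i contributes i + 1 stored entries
    return table

# Hypothetical 4 x 4 lower triangular matrix, packed row by row:
#   1
#   2  3
#   4  5  6
#   7  8  9 10
packed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tri_start = build_tri_access_table(4)

def get_tri(i, j):
    """Return element (i, j); anything above the diagonal is an implicit zero."""
    if j > i:
        return 0
    return packed[tri_start[i] + j]
```

Only 10 values are stored instead of 16; in general ½ n (n + 1) instead of n².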

 

 

General tables:

• Access tables can be used to access even those arrays that are not in any special form, for example arrays whose rows have different lengths. We only need to know the length of each row.
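
One way to picture this is a table whose rows have different lengths: the access table is then the running total of the row lengths. The sketch below uses made-up data and hypothetical names (`rows`, `jagged_start`, `get_jagged`).

```python
# Hypothetical table with rows of unequal length.
rows = [[5, 8], [1], [7, 2, 9, 4], [3, 6]]
row_lengths = [len(row) for row in rows]

# Linear storage: all rows laid end to end.
flat = [value for row in rows for value in row]

def build_access_table(lengths):
    """Entry i is the number of elements stored before row i."""
    table = []
    offset = 0
    for length in lengths:
        table.append(offset)
        offset += length
    return table

jagged_start = build_access_table(row_lengths)

def get_jagged(i, j):
    """Return element j of row i from the linear storage."""
    if not 0 <= j < row_lengths[i]:
        raise IndexError("column index outside row")
    return flat[jagged_start[i] + j]
```

The rectangular and triangular cases are special instances of this, with row lengths n, n, n, . . . and 1, 2, 3, . . . respectively.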


Multi-key access tables:

• Another application of access tables arises when information must be kept sorted in several different orders (by id_no, name, address, etc.).

 

• Instead of keeping multiple sorted copies of the records, we keep one access table for each of the orders required.

• To list the records in a given order, we simply make a sequential (linear) traversal of the corresponding access table. The following figure demonstrates this.
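
A minimal sketch of multi-key access tables follows. The field names id_no, name and address come from the notes; the sample records and the function names are assumptions. Each access table stores positions in the single `records` list, never copies of the records.

```python
# One copy of the data, in arbitrary order (sample records are made up).
records = [
    {"id_no": 3, "name": "Musa", "address": "Dhahran"},
    {"id_no": 1, "name": "Zainab", "address": "Abha"},
    {"id_no": 2, "name": "Ali", "address": "Riyadh"},
]

def build_access_table(key):
    """Return the record positions arranged in increasing order of the given field."""
    return sorted(range(len(records)), key=lambda pos: records[pos][key])

# One access table per required order.
by_id = build_access_table("id_no")
by_name = build_access_table("name")
by_address = build_access_table("address")

def in_order(access_table):
    """Traverse an access table sequentially to list records in that order."""
    return [records[pos] for pos in access_table]
```

For example, `in_order(by_name)` walks `by_name` from left to right and yields the records in name order, while `records` itself is never rearranged.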