Review of Searching Methods

 

Objectives of this lecture

q       Review two methods of searching

q       Introduce the concept of algorithm analysis

 

Introduction

q       Information retrieval is one of the most important applications of computers.

q       It usually involves being given a piece of information, called the key, and being asked to find a record that contains other associated information.

q       This is achieved by first going through the list to determine whether the given key exists, a process called searching.

q       Searching (like sorting) is usually a time-consuming process, and is therefore a good candidate for learning algorithm analysis.

q       Searching can be external (if the data is held on a disk, tape, CD-ROM, etc.) or internal (if the data is held in main memory).  We shall restrict our discussion to internal searching.

q       We shall assume the following declarations throughout this lecture.

 

#define MAXLIST ….

typedef …. KeyType;

typedef struct {
    .….
    KeyType key;
    …….
} ListEntry;

typedef struct {
    int count;
    ListEntry entry[MAXLIST];
} List;

 

Note that KeyType could be of type int, float, or string (char *).

 

q       We shall adopt the convention of returning the index (location) of the search key if it exists, or -1 if it does not exist.

 

Macros:

q       In searching, we often need to compare two keys to determine which comes first.  For numeric keys, we use the operators ‘<’ and ‘==’.  For string keys, however, we must use the function strcmp().

q       We would nevertheless like to code our algorithms so that they work in both cases.

 

q       One method of achieving this is to introduce functions such as:

Boolean EQ (KeyType key1, KeyType key2);

Boolean LT (KeyType key1, KeyType key2);

 

q       However, this method is not efficient since it involves a function call each time a pair of keys is compared.

q       Fortunately, C provides a feature that allows us to code these comparisons without the need for function calls: macros.

q       Macros look like functions: they are invoked with actual arguments, and the expression they expand to yields a result.

q       However, unlike functions, they are not handled by the compiler proper but by the C pre-processor, which substitutes the macro body into the source text before compilation.

E.g.  The following macro computes the square of a given argument:

#define square(x)   ((x)*(x))

 

Notice that the parentheses around each x are necessary in case the argument is an expression, and the outer parentheses keep the result intact when the macro is used inside a larger expression.

If the definition were:    #define square(x) x*x

the call square(a+b) would expand to:   a+b*a+b, which is evaluated as a + (b*a) + b.
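The effect of the missing parentheses can be checked directly. The following is a minimal sketch; the names BAD_SQUARE, SQUARE, bad_square_of_sum, square_of_sum and macro_demo are hypothetical and introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
#define BAD_SQUARE(x)  x*x            /* no parentheses at all */
#define SQUARE(x)      ((x)*(x))      /* argument and whole body parenthesized */

/* Returns the value of BAD_SQUARE(a+b), which expands to a+b*a+b. */
int bad_square_of_sum(int a, int b)
{
    return BAD_SQUARE(a+b);
}

/* Returns the value of SQUARE(a+b), which expands to ((a+b)*(a+b)). */
int square_of_sum(int a, int b)
{
    return SQUARE(a+b);
}

/* Small self-check: with a = 2 and b = 3 the two macros disagree. */
void macro_demo(void)
{
    assert(bad_square_of_sum(2, 3) == 2 + 3*2 + 3);   /* 11, not 25 */
    assert(square_of_sum(2, 3) == 25);
}
```

With a = 2 and b = 3, the unparenthesized macro yields 11 because the multiplication binds more tightly than the additions, while the parenthesized one yields the intended 25.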

 

q       The macros we need for the comparison of keys are:

For numeric keys

#define EQ(a,b)  ((a) == (b))

#define LT(a,b)  ((a) < (b))

 

For string keys

#define EQ(a,b)  (!strcmp((a),(b)))

#define LT(a,b)  (strcmp((a),(b)) < 0)
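Since both sets of macros use the same names, only one set can be active in a given program. One possible arrangement, sketched below, selects the set at compile time. The switch STRING_KEYS, the function CompareKeys and the sample keys are assumptions made for this illustration and are not part of the lecture's declarations; the string version also assumes that KeyType is const char * and that <string.h> is included for strcmp().

```c
#include <assert.h>
#include <string.h>

/* STRING_KEYS is a hypothetical switch: define it to select string keys. */
#ifdef STRING_KEYS
typedef const char *KeyType;
#define EQ(a,b)  (!strcmp((a),(b)))
#define LT(a,b)  (strcmp((a),(b)) < 0)
#else
typedef int KeyType;
#define EQ(a,b)  ((a) == (b))
#define LT(a,b)  ((a) < (b))
#endif

/* Returns -1, 0 or 1 as key1 is less than, equal to, or greater than key2.
   Written only in terms of EQ and LT, so it works for either key type. */
int CompareKeys(KeyType key1, KeyType key2)
{
    if (EQ(key1, key2))
        return 0;
    return LT(key1, key2) ? -1 : 1;
}

/* Self-check using sample keys appropriate to the chosen key type. */
void CompareKeysDemo(void)
{
#ifdef STRING_KEYS
    KeyType small = "apple", large = "banana";
#else
    KeyType small = 3, large = 7;
#endif
    assert(CompareKeys(small, large) == -1);
    assert(CompareKeys(large, small) == 1);
    assert(CompareKeys(small, small) == 0);
}
```

The body of CompareKeys never mentions the key type directly, which is the point of hiding the comparison behind EQ and LT.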

 

Sequential Search

q       This is the simplest searching method.  It simply scans the list from the beginning until the search key is found or the end of the list is reached.

/* SequentialSearch: contiguous version.
Pre:  The contiguous list has been created.
Post: If target exists, the function returns its location (success).
      Otherwise the function returns -1 (failure).
 */

int SequentialSearch(List list, KeyType target)
{
    int location;

    for (location = 0; location < list.count; location++)
        if (EQ(list.entry[location].key, target))
            return location;
    return -1;
}

 

Informal Analysis of Sequential Search:

q       The significant work for this algorithm is done inside the loop.  On each pass through the loop, one key is compared with the target.  Other statements are also executed, but they all depend on the key comparison.  Thus, the amount of work done by the algorithm depends on the number of key comparisons made.

q       This number depends on whether and when the target is found, as follows:

1.   Unsuccessful search: this requires n comparisons, where n is the number of items in the list.

2.   Best case: the target is at the first position; this requires only one comparison.

3.   Worst case: the target is at the last position; this requires n comparisons.

4.   Average case: assuming each element in the list is equally likely to be the target, the average number of comparisons for a successful search is found by summing the number of comparisons required for each element and dividing by n:

 

(1+2+3+...+n)/n = (½n(n+1))/n = ½(n+1)
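The counts above can be confirmed by instrumenting the search. The sketch below assumes numeric keys, a ListEntry with no fields other than the key, and MAXLIST set to 100; the functions CountingSequentialSearch and TotalComparisons are hypothetical helpers written for this illustration. Unlike the lecture's function, the list is passed by pointer, simply to avoid copying the whole array on every call.

```c
#include <assert.h>

#define MAXLIST 100                 /* assumed value; the lecture leaves it open */
typedef int KeyType;                /* assumed: numeric keys */
typedef struct { KeyType key; } ListEntry;   /* assumed: key is the only field */
typedef struct { int count; ListEntry entry[MAXLIST]; } List;
#define EQ(a,b)  ((a) == (b))

/* Sequential search that also reports how many key comparisons it made. */
int CountingSequentialSearch(const List *list, KeyType target, int *comparisons)
{
    int location;

    *comparisons = 0;
    for (location = 0; location < list->count; location++) {
        (*comparisons)++;
        if (EQ(list->entry[location].key, target))
            return location;
    }
    return -1;
}

/* Total comparisons over one successful search for each of the n keys
   (n must not exceed MAXLIST). */
int TotalComparisons(int n)
{
    List list;
    int i, comparisons, total = 0;

    list.count = n;
    for (i = 0; i < n; i++)
        list.entry[i].key = i;      /* keys 0, 1, ..., n-1 */
    for (i = 0; i < n; i++) {
        CountingSequentialSearch(&list, i, &comparisons);
        total += comparisons;
    }
    return total;
}
```

For n = 10, TotalComparisons returns 1+2+...+10 = 55, so the average is 55/10 = 5.5, which agrees with ½(n+1).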

 

Binary Search

q       For an ordered list, it is a waste of time to look for an item using sequential (linear) search; it would be like looking for a word in a dictionary by reading it from the first page.  In this case we apply binary search, which is more efficient.

q       Binary search works by comparing the target with the item at the middle of the list.  This leads to one of three results:

Ø      The middle item is the target: we are done.

Ø      The middle item is less than the target: we apply the algorithm to the upper half of the list.

Ø      The middle item is greater than the target: we apply the algorithm to the lower half of the list.

q       This process is repeated until the item is found or the list is exhausted.

q       The following function implements this approach:

 

/* BinarySearch: a version of binary search.
Pre:  The contiguous list has been created and is sorted in increasing order of key.
Post: If target exists, the function returns its location (success).
      Otherwise the function returns -1 (failure).
 */

int BinarySearch(List list, KeyType target)
{
    int bottom, middle, top;

    top = list.count - 1;       /* Initialize bounds to cover entire list. */
    bottom = 0;
    while (top >= bottom) {     /* Check terminating condition. */
        middle = bottom + (top - bottom) / 2;   /* Same as (top + bottom) / 2, but cannot overflow. */
        if (EQ(target, list.entry[middle].key))
            return middle;
        else if (LT(target, list.entry[middle].key))
            top = middle - 1;       /* Reduce to the bottom half of the list. */
        else
            bottom = middle + 1;    /* Reduce to the top half of the list. */
    }
    return -1;
}
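To see the function at work with string keys, the sketch below fills in the declarations with assumed values: MAXLIST is 8, KeyType is const char *, and ListEntry holds only the key. The helper FillSampleList and its five words are hypothetical. The search logic follows the binary search above, except that the list is passed by pointer to avoid copying the array.

```c
#include <assert.h>
#include <string.h>

#define MAXLIST 8                          /* assumed value */
typedef const char *KeyType;               /* assumed: string keys */
typedef struct { KeyType key; } ListEntry; /* assumed: key is the only field */
typedef struct { int count; ListEntry entry[MAXLIST]; } List;

#define EQ(a,b)  (!strcmp((a),(b)))
#define LT(a,b)  (strcmp((a),(b)) < 0)

/* Binary search over a list sorted in increasing key order.
   Returns the index of target, or -1 if it is not present. */
int BinarySearch(const List *list, KeyType target)
{
    int bottom = 0, top = list->count - 1, middle;

    while (top >= bottom) {
        middle = bottom + (top - bottom) / 2;
        if (EQ(target, list->entry[middle].key))
            return middle;
        else if (LT(target, list->entry[middle].key))
            top = middle - 1;       /* continue in the lower half */
        else
            bottom = middle + 1;    /* continue in the upper half */
    }
    return -1;
}

/* Fills list with five words in increasing (strcmp) order. */
void FillSampleList(List *list)
{
    static const char *words[] = { "ant", "bee", "cat", "dog", "eel" };
    int i;

    list->count = 5;
    for (i = 0; i < list->count; i++)
        list->entry[i].key = words[i];
}
```

After FillSampleList(&list), searching for "cat" returns 2 on the first comparison, while searching for "cow" inspects "cat" and then "dog" before returning -1.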

 

Informal Analysis of Binary Search:

q       The easiest way to see how binary search works is by drawing its comparison tree. 

q       This is done by tracing the action of the algorithm, representing each comparison of keys by a vertex and each possible outcome of the comparison by a branch.

q       The following figure shows the comparison tree for binary search algorithm for a list of 10 elements:

 

 

q       The number of comparisons required for a given search is the number of vertices traversed in going from the root of the tree down to the appropriate end node (also called a leaf node).

q       The maximum number of comparisons is the number of vertices in the longest path that occurs in the tree.  This is called the height of the tree.

q       Notice that the list keeps shrinking by half (to 1/2, 1/4, 1/8, etc. of its original size).  For a list of size n, this maximum number is therefore approximately the smallest integer k such that: 2^k ≥ n.

Taking the log (to base 2) on both sides of the inequality, we have:

k ≥ log2 n
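The height of the comparison tree can also be measured rather than derived. The sketch below assumes numeric keys and MAXLIST set to 1000; CountingBinarySearch and MaxVisited are hypothetical helpers written for this illustration. It counts the vertices (middle items) visited by each successful search and reports the largest count.

```c
#include <assert.h>

#define MAXLIST 1000                /* assumed value */
typedef int KeyType;                /* assumed: numeric keys */
typedef struct { KeyType key; } ListEntry;   /* assumed: key is the only field */
typedef struct { int count; ListEntry entry[MAXLIST]; } List;
#define EQ(a,b)  ((a) == (b))
#define LT(a,b)  ((a) < (b))

/* Binary search that counts the vertices (middle items) it visits. */
int CountingBinarySearch(const List *list, KeyType target, int *visited)
{
    int bottom = 0, top = list->count - 1, middle;

    *visited = 0;
    while (top >= bottom) {
        middle = bottom + (top - bottom) / 2;
        (*visited)++;
        if (EQ(target, list->entry[middle].key))
            return middle;
        else if (LT(target, list->entry[middle].key))
            top = middle - 1;
        else
            bottom = middle + 1;
    }
    return -1;
}

/* Largest number of vertices visited over all successful searches
   in a sorted list of n keys (n must not exceed MAXLIST). */
int MaxVisited(int n)
{
    static List list;   /* static: keeps the large array off the stack */
    int i, visited, max = 0;

    list.count = n;
    for (i = 0; i < n; i++)
        list.entry[i].key = i;      /* keys 0, 1, ..., n-1, already sorted */
    for (i = 0; i < n; i++) {
        CountingBinarySearch(&list, i, &visited);
        if (visited > max)
            max = visited;
    }
    return max;
}
```

MaxVisited(10) is 4 and MaxVisited(1000) is 10, compared with log2 10 ≈ 3.32 and log2 1000 ≈ 9.97. For this version the exact worst case works out to ⌊log2 n⌋ + 1 vertices, so when n is an exact power of two (for example n = 8, where the maximum is 4) the count is one more than log2 n; the log2 n figure should be read as an approximation.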