Hashing : Implementation

Objectives of this lecture

q Learn how a hash table may be implemented (open addressing approach)

q Introduce the Chaining approach

q Compare hashing with other searching methods

Implementation of Open addressing method

Specification:

q The major operations of hashing are: CreateTable, ClearTable, InsertTable and RetrieveTable.

q The specification for these operations is as follows:

Implementation:

q A major problem with hashing is that its implementation depends heavily on the particular application being considered. For example, the choice of hash function depends on the type of the key. The same goes for the other operations.

q Thus, the best we can do is to consider a specific example and show how the above operations may be implemented using it.

q The example we shall consider assumes that:

Ø The key field is a string

Ø The hash function is obtained by adding the ASCII codes of the characters in the key and taking the result modulo the table size.

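/* Hash: sum the character codes of key, then reduce modulo the table size. */

int Hash(char *key)

{ int value = 0;

while (*key != '\0')

value += (unsigned char) *key++; /* cast keeps the sum non-negative */

return (value % HASHSIZE); /* remainder, so the result is in 0..HASHSIZE-1 */

}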

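Ø Collisions are resolved by open addressing: the probe increment starts at 1 and grows by 2 after each successive collision, so the i-th probe examines position (h + i*i) mod HASHSIZE. (This is quadratic probing rather than strictly linear probing.)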
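To get a feel for how the hash function above behaves, the following sketch restates it in self-contained form. The name `hash_sum` and the explicit `table_size` parameter are illustrative choices, not part of the lecture code. Because addition ignores the order of the characters, any two keys built from the same characters must collide.

```c
/* Additive string hash: sum the character codes, then reduce modulo
   the table size. hash_sum and table_size are illustrative names. */
int hash_sum(const char *key, int table_size)
{
    int value = 0;

    while (*key != '\0')
        value += (unsigned char) *key++;   /* cast keeps the sum non-negative */

    return value % table_size;             /* result lies in 0..table_size-1 */
}
```

For example, with a table size of 997, `hash_sum("abc", 997)` is 97 + 98 + 99 = 294, and `hash_sum("cab", 997)` yields the same slot, so the second key has to be placed by the collision resolution scheme.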

With this in mind, we could have the following:

/* declarations for a hash table with open addressing */

#define HASHSIZE 997

typedef char *Key_type;

typedef struct item_tag {

Key_type key;

} Item_type;

typedef Item_type HashTable[HASHSIZE];

/* Initialize Hash Table to empty

Pre: None.

Post: The hash table H has been created and initialized to

be empty. */

void CreateTable (HashTable H)

{ int i;

for (i=0; i<HASHSIZE; i++)

H[i].key = NULL;

}

/* Resets Hash Table to empty

Pre: The hash table H has been created.

Post: The hash table H has been cleared and is empty. */

void ClearTable (HashTable H)

{ int i;

for (i=0; i<HASHSIZE; i++)

H[i].key = NULL;

}

/* Insert: insert an item using open addressing with an increasing probe increment.

Pre: The hash table H has been created and is not full. H has no

current entry with key equal to that of newitem.

Post: The item newitem has been inserted into H.

Uses: Hash. */

void Insert(HashTable H, Item_type newitem)

{

int pc = 0; /* probe count to be sure that table is not full */

int probe; /* position currently probed in H */

int increment = 1; /* increment used for probing; grows by 2 per collision */

probe = Hash(newitem.key);

while (H[probe].key != NULL && /* the location is not empty */

strcmp(newitem.key, H[probe].key) && /* No Duplicate key */

pc <= HASHSIZE / 2) /* Table not exhausted */

{

pc++;

probe = (probe + increment) % HASHSIZE;

increment += 2; /* Prepare increment for next iteration. */

}

if (H[probe].key == NULL)

H[probe] = newitem; /* Insert the new item. */

else if (strcmp(newitem.key, H[probe].key) == 0)

Error("The same key cannot appear twice in the hash table.");

else

Error("Hash table is full; insertion cannot be made.");

}
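The loop in Insert adds 1, then 3, then 5, and so on to the probe position, so the probes fall at offsets 0, 1, 4, 9, ... from the home slot. The sketch below makes this visible; `ProbePositions` and `DistinctProbeSlots` are hypothetical helpers written for this illustration, and `start` is assumed to lie in the range 0 to HASHSIZE - 1.

```c
#include <string.h>

#define HASHSIZE 997   /* same prime table size as in the declarations */

/* Record the first `count` positions visited when probing from `start`. */
void ProbePositions(int start, int count, int positions[])
{
    int probe = start;
    int increment = 1;
    int i;

    for (i = 0; i < count; i++) {
        positions[i] = probe;
        probe = (probe + increment) % HASHSIZE;
        increment += 2;                    /* 1, 3, 5, ... gives offsets 1, 4, 9, ... */
    }
}

/* Count how many different slots HASHSIZE consecutive probes reach. */
int DistinctProbeSlots(int start)
{
    char seen[HASHSIZE];
    int probe = start;
    int increment = 1;
    int distinct = 0;
    int i;

    memset(seen, 0, sizeof seen);
    for (i = 0; i < HASHSIZE; i++) {
        if (!seen[probe]) {
            seen[probe] = 1;
            distinct++;
        }
        probe = (probe + increment) % HASHSIZE;
        increment += 2;
    }
    return distinct;
}
```

Starting from slot 995, the first five positions are 995, 996, 2, 7 and 14. Because 997 is prime, the sequence reaches only 499 distinct slots, roughly half the table. This is why Insert stops after about HASHSIZE / 2 probes, and why it may report a full table even when some slots are still empty.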

RetrieveTable : Exercise.

Note:

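q Deleting an entry from a hash table created using open addressing is difficult. This is because an empty location is used as the signal to stop the search for a key: emptying a slot in the middle of a probe sequence would make any key stored further along that sequence unreachable.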
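A retrieval routine makes the note above concrete. What follows is one possible shape for the retrieval exercise, not the book's solution: the signature of `RetrieveTable` and the helpers `FindSlot` and `NaiveDelete` are assumptions made for this sketch. `NaiveDelete` is included only to show what goes wrong when a slot is simply emptied.

```c
#include <stddef.h>
#include <string.h>

#define HASHSIZE 997

typedef char *Key_type;
typedef struct item_tag { Key_type key; } Item_type;
typedef Item_type HashTable[HASHSIZE];

int Hash(const char *key)
{
    int value = 0;

    while (*key != '\0')
        value += (unsigned char) *key++;
    return value % HASHSIZE;
}

/* FindSlot (hypothetical helper): follow the same probe sequence as
   Insert. Return the index of the slot holding target, or -1 if an
   empty slot or the probe limit is reached first. */
int FindSlot(HashTable H, Key_type target)
{
    int pc = 0;
    int probe = Hash(target);
    int increment = 1;

    while (H[probe].key != NULL && pc <= HASHSIZE / 2) {
        if (strcmp(target, H[probe].key) == 0)
            return probe;
        pc++;
        probe = (probe + increment) % HASHSIZE;
        increment += 2;
    }
    return -1;
}

/* Return a pointer to the stored item, or NULL if target is absent. */
Item_type *RetrieveTable(HashTable H, Key_type target)
{
    int slot = FindSlot(H, target);

    return slot < 0 ? NULL : &H[slot];
}

/* NaiveDelete: empties the slot and nothing more. Shown only to
   illustrate the problem; it breaks later probe sequences. */
void NaiveDelete(HashTable H, Key_type target)
{
    int slot = FindSlot(H, target);

    if (slot >= 0)
        H[slot].key = NULL;
}
```

Suppose "abc" sits in its home slot 294 and "cab", which hashes to the same slot, was placed in slot 295. After `NaiveDelete(H, "abc")` empties slot 294, `RetrieveTable(H, "cab")` stops at the empty slot and reports the key as missing, although it is still stored in slot 295.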

Collision Resolution by chaining:

q An array is the natural choice for hashing, since our aim is to have random access.

q However, we can simplify the collision resolution process and also save space by combining an array with linked lists. That is, we use a linked list to store the records that hash to the same position in the hash table, so that the table itself is simply an array of pointers, as shown below. This method is called chaining, and the individual lists are called chains.

Advantages of Chaining:

q Saving Storage: With open addressing, the array must be declared large enough to keep collisions rare. If the records are large, considerable space may be saved using chaining, since the table itself contains only pointers, unlike open addressing where the records are stored in the array. Moreover, it is no longer necessary for the table size to exceed the number of records.

q Simple Collision handling: If a collision occurs, we simply create a node and link it to the appropriate chain. This simplifies the collision resolution process. Moreover, with a good hash function, the lists are short and can be searched quickly.

q Deletion is easy: Deletion proceeds in exactly the same way as deletion from a linked list.

Disadvantages:

q Waste of Storage: All the pointers require space. If the record size is large, then this is negligible, but if the records are small, then it is not.

q Could be slow: With a bad hash function, the chains may be long, and since a linked list must be searched linearly, the process may be slow.

Implementation:

q We shall study this after we study/review linked lists, but it is worth noting that the hashing operations are coded using the linked list operations.

q A chained hash table takes the simple declaration:

q The code for creating the table, for example, is simply:
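The declaration and code referred to above are not reproduced in these notes. A minimal sketch of what they might look like follows, reusing the string key and additive hash function from the open addressing example. All names here (`Node_type`, `ChainTable`, `CreateChainTable`, `InsertChain`, `RetrieveChain`, `DeleteChain`) are assumptions made for illustration, and `InsertChain` assumes the key is not already present.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 997

typedef char *Key_type;
typedef struct item_tag { Key_type key; } Item_type;

typedef struct node_tag {          /* one node of a chain */
    Item_type info;
    struct node_tag *next;
} Node_type;

typedef Node_type *ChainTable[HASHSIZE];   /* the table is an array of chain heads */

int Hash(const char *key)
{
    int value = 0;

    while (*key != '\0')
        value += (unsigned char) *key++;
    return value % HASHSIZE;
}

/* Creating the table only requires making every chain empty. */
void CreateChainTable(ChainTable H)
{
    int i;

    for (i = 0; i < HASHSIZE; i++)
        H[i] = NULL;
}

/* Link a new node at the head of its chain.
   Returns 0 on success, -1 if memory is exhausted. */
int InsertChain(ChainTable H, Item_type newitem)
{
    int slot = Hash(newitem.key);
    Node_type *node = (Node_type *) malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->info = newitem;
    node->next = H[slot];
    H[slot] = node;
    return 0;
}

/* Search one chain linearly; return the item or NULL if absent. */
Item_type *RetrieveChain(ChainTable H, Key_type target)
{
    Node_type *p;

    for (p = H[Hash(target)]; p != NULL; p = p->next)
        if (strcmp(p->info.key, target) == 0)
            return &p->info;
    return NULL;
}

/* Ordinary linked list deletion. Returns 1 if a node was removed, 0 if not found. */
int DeleteChain(ChainTable H, Key_type target)
{
    Node_type **link = &H[Hash(target)];

    while (*link != NULL) {
        if (strcmp((*link)->info.key, target) == 0) {
            Node_type *victim = *link;
            *link = victim->next;          /* unlink, then release the node */
            free(victim);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}
```

Note that a collision needs no probing at all: two keys that hash to the same slot simply end up on the same chain, and deleting one of them leaves the other reachable.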

Advantage of Hashing

q Insertion, Deletion and Retrieval can be done in constant average time

Disadvantage of Hashing

q Can require considerably more storage than other searching methods.

q Ordering of data is not supported.

Ø For example, the following operations are not supported:

Ø finding the minimum value of a data set.

Ø finding the maximum value of a data set.

Ø printing the data in sorted order.
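The operations listed above can still be carried out on a hash table, but only by examining every slot, because the hash function scatters keys without regard to their order. A small sketch for the open addressing table, where `MinKey` is a hypothetical helper and keys are compared with `strcmp`:

```c
#include <stddef.h>
#include <string.h>

#define HASHSIZE 997

typedef char *Key_type;
typedef struct item_tag { Key_type key; } Item_type;
typedef Item_type HashTable[HASHSIZE];

/* Return the smallest key in strcmp order, or NULL if the table is empty.
   Every one of the HASHSIZE slots must be inspected. */
Key_type MinKey(HashTable H)
{
    Key_type best = NULL;
    int i;

    for (i = 0; i < HASHSIZE; i++)
        if (H[i].key != NULL &&
            (best == NULL || strcmp(H[i].key, best) < 0))
            best = H[i].key;
    return best;
}
```

The cost is proportional to the table size rather than constant, in contrast to insertion and retrieval.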

q Try Exercises E3 – E7 on page 365 of your book.