Hashing: Implementation
Objectives of this lecture
• Learn how a hash table may be implemented (open addressing approach)
• Introduce the chaining approach
• Compare hashing with other searching methods
Implementation of Open addressing method
Specification:
• The major operations of hashing are: CreateTable, ClearTable, InsertTable and RetrieveTable.
• The specification for these operations is as follows:
Implementation:
• A major problem with hashing is that its implementation depends heavily on the particular application being considered. For example, the choice of hash function depends on the type of the key, and the same goes for the other operations.
• Thus, the best we can do is to consider a specific example and show how the above operations may be implemented for it.
• The example we shall consider assumes that:
  ◦ The key field is a string.
  ◦ The hash function is obtained by adding the ASCII codes of the characters in the key and taking the result modulo the table size.
int Hash(const char *key)
{
    int value = 0;

    while (*key != '\0')
        value += (unsigned char)*key++;   /* add each character code */
    return value % HASHSIZE;              /* reduce to an index 0..HASHSIZE-1 */
}
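  ◦ Collisions are resolved by open addressing: the probe increment starts at 1 and grows by 2 after each successive collision (1, 3, 5, ...), so the i-th probe lands i² slots past the home position. This scheme is usually called quadratic probing, although the code comments below refer to it as linear probing.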
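To make these assumptions concrete, the following minimal sketch computes the hash of a key and the slot examined after a given number of collisions. The helper name `probe_slot` is hypothetical and not part of the lecture's code, and the hash uses `%` on the assumption that the modulus described in the text is what is intended.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 997

/* Sum the character codes of key, reduced modulo the table size. */
int Hash(const char *key)
{
    int value = 0;

    assert(key != NULL);
    while (*key != '\0')
        value += (unsigned char)*key++;
    return value % HASHSIZE;
}

/* Hypothetical helper: the slot examined after `collisions` collisions,
   using the increments 1, 3, 5, ... described in the text. */
int probe_slot(const char *key, int collisions)
{
    int probe = Hash(key);
    int increment = 1;
    int i;

    for (i = 0; i < collisions; i++) {
        probe = (probe + increment) % HASHSIZE;
        increment += 2;
    }
    return probe;
}
```

For example, `Hash("abc")` and `Hash("cab")` both give 294 (97 + 98 + 99), because an additive hash ignores the order of the characters. The slots probed for either key are then 294, 295, 298, 303, and so on.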
With this in mind, we could have the following:
/* declarations for a hash table with open addressing */
#include <string.h>   /* strcmp, used by Insert */

#define HASHSIZE 997

typedef char *Key_type;

typedef struct item_tag {
    Key_type key;
} Item_type;

typedef Item_type HashTable[HASHSIZE];
/*
Initialize Hash Table to empty
Pre:  None.
Post: The hash table H has been created and initialized to be empty.
*/
void CreateTable(HashTable H)
{
    int i;

    for (i = 0; i < HASHSIZE; i++)
        H[i].key = NULL;
}
/*
Resets Hash Table to empty
Pre:  The hash table H has been created.
Post: The hash table H has been cleared and is empty.
*/
void ClearTable(HashTable H)
{
    int i;

    for (i = 0; i < HASHSIZE; i++)
        H[i].key = NULL;
}
/*
Insert: insert an item using open addressing and linear probing.
Pre:  The hash table H has been created and is not full. H has no
      current entry with key equal to that of newitem.
Post: The item newitem has been inserted into H.
Uses: Hash.
*/
void Insert(HashTable H, Item_type newitem)
{
    int pc = 0;          /* probe count, to be sure that the table is not full */
    int probe;           /* position currently probed in H */
    int increment = 1;   /* increment used for probing */

    probe = Hash(newitem.key);
    while (H[probe].key != NULL &&                     /* the location is not empty */
           strcmp(newitem.key, H[probe].key) != 0 &&   /* no duplicate key */
           pc <= HASHSIZE / 2) {                       /* table not exhausted */
        pc++;
        probe = (probe + increment) % HASHSIZE;
        increment += 2;   /* prepare increment for next iteration */
    }
    if (H[probe].key == NULL)
        H[probe] = newitem;   /* insert the new item */
    else if (strcmp(newitem.key, H[probe].key) == 0)
        Error("The same key cannot appear twice in the hash table.");
    else
        Error("Hash table is full; insertion cannot be made.");
}
RetrieveTable: Exercise.
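The Insert routine above reports problems through Error, whose definition is not shown in these notes. As a rough sketch of how the table might be exercised, the variant below returns the slot used, or a negative status code, instead of calling Error. The name `InsertAt` and the two status codes are hypothetical and are not part of the lecture's code. The hash again uses `%`, on the assumption that the modulus described in the text is intended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASHSIZE 997

typedef char *Key_type;
typedef struct item_tag { Key_type key; } Item_type;
typedef Item_type HashTable[HASHSIZE];

/* Hypothetical status codes returned in place of calling Error. */
enum { DUPLICATE_KEY = -1, TABLE_FULL = -2 };

static int Hash(const char *key)
{
    int value = 0;

    assert(key != NULL);
    while (*key != '\0')
        value += (unsigned char)*key++;
    return value % HASHSIZE;
}

void CreateTable(HashTable H)
{
    int i;

    for (i = 0; i < HASHSIZE; i++)
        H[i].key = NULL;
}

/* Same probing logic as Insert, but returns the slot where newitem was
   stored, or DUPLICATE_KEY / TABLE_FULL when nothing was stored. */
int InsertAt(HashTable H, Item_type newitem)
{
    int pc = 0;
    int probe = Hash(newitem.key);
    int increment = 1;

    while (H[probe].key != NULL &&
           strcmp(newitem.key, H[probe].key) != 0 &&
           pc <= HASHSIZE / 2) {
        pc++;
        probe = (probe + increment) % HASHSIZE;
        increment += 2;
    }
    if (H[probe].key == NULL) {
        H[probe] = newitem;
        return probe;
    }
    if (strcmp(newitem.key, H[probe].key) == 0)
        return DUPLICATE_KEY;
    return TABLE_FULL;
}
```

Inserting the keys "abc", "cab" and "bca" in that order into a freshly created table places them in slots 294, 295 and 298: all three hash to 294, so the second and third are moved along by the growing increment. Inserting "abc" a second time returns DUPLICATE_KEY.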
Note:
• Deleting an entry from a hash table built with open addressing is difficult. This is because an empty location is the signal that stops the search for a key, so emptying a slot can cut off keys that were placed beyond it after a collision.
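The difficulty can be seen in a small sketch. To keep it short, it uses integer keys, a seven-slot table and plain linear probing (increment 1) rather than the lecture's string keys and growing increment. All names here (`TABLESIZE`, `EMPTY`, `insert_key`, `find_key`, `naive_delete`) are illustrative assumptions.

```c
#include <assert.h>

#define TABLESIZE 7    /* small illustrative table, not the lecture's HASHSIZE */
#define EMPTY (-1)     /* marks a slot that holds no key */

/* Insert a non-negative key by linear probing; assumes the table is not full. */
void insert_key(int table[], int key)
{
    int probe = key % TABLESIZE;

    assert(key >= 0);
    while (table[probe] != EMPTY)
        probe = (probe + 1) % TABLESIZE;
    table[probe] = key;
}

/* Return the slot holding key, or -1 once an empty slot ends the search. */
int find_key(const int table[], int key)
{
    int probe = key % TABLESIZE;
    int pc;

    for (pc = 0; pc < TABLESIZE && table[probe] != EMPTY; pc++) {
        if (table[probe] == key)
            return probe;
        probe = (probe + 1) % TABLESIZE;
    }
    return -1;
}

/* Naive deletion: simply mark the slot as empty again. */
void naive_delete(int table[], int key)
{
    int slot = find_key(table, key);

    if (slot >= 0)
        table[slot] = EMPTY;
}
```

If keys 3 and 10 are inserted into an empty table, both hash to slot 3, so 10 ends up in slot 4. After `naive_delete(table, 3)`, a search for 10 stops at the now empty slot 3 and reports that 10 is absent, even though it is still stored in slot 4. A common remedy is to mark a deleted slot with a special "deleted" value that searches skip over and insertions may reuse.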
Collision Resolution by chaining:
• An array is the natural choice for hashing, since our aim is to have random access.
• However, we can simplify the collision resolution process and also save space by combining an array with linked lists; i.e., we use a linked list to store the records that hash to the same position, so that the table itself is simply an array of pointers.
This method is called chaining, and the individual lists are called chains.
Advantages of Chaining:
• Saving storage: With open addressing it is desirable to declare an array large enough to keep collisions rare. If the records are large, considerable space may be saved by chaining, since the table itself contains only pointers, whereas with open addressing the records themselves are in the array. Moreover, it is no longer necessary for the table size to exceed the number of records.
• Simple collision handling: If a collision occurs, we simply create a node and link it to the appropriate chain. Moreover, with a good hash function the lists are short and can be searched quickly.
• Deletion is easy: Deletion proceeds in exactly the same way as deletion from a linked list.
Disadvantages:
• Waste of storage: All the pointers require space. If the records are large this overhead is negligible, but if the records are small it is not.
• Could be slow: With a bad hash function the chains may be long, and since a linked list must be searched linearly, retrieval may be slow.
Implementation:
• We shall study this after we study/review linked lists, but it is worth noting that the hashing operations are coded using the linked list operations.
• A chained hash table takes the simple declaration:
typedef List HashTable[HASHSIZE];
• The code for creating the table, for example, is simply:
for (i = 0; i < HASHSIZE; i++)
    CreateList(H[i]);
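Since the List type and its operations are not defined in these notes, the following is only a sketch of how a chained table might look with a hand-written node type in place of List. The names `Node_type`, `ChainedTable`, `CreateChained`, `FindChained`, `InsertChained` and `DeleteChained` are assumptions made for illustration. Keys are stored by pointer, not copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 997

typedef struct node_tag {
    const char *key;            /* stored by pointer, not copied */
    struct node_tag *next;
} Node_type;

typedef Node_type *ChainedTable[HASHSIZE];   /* one chain head per slot */

static int Hash(const char *key)
{
    int value = 0;

    assert(key != NULL);
    while (*key != '\0')
        value += (unsigned char)*key++;
    return value % HASHSIZE;
}

void CreateChained(ChainedTable H)
{
    int i;

    for (i = 0; i < HASHSIZE; i++)
        H[i] = NULL;            /* every chain starts empty */
}

/* Search the single chain that key hashes to. */
Node_type *FindChained(ChainedTable H, const char *key)
{
    Node_type *p;

    for (p = H[Hash(key)]; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return p;
    return NULL;
}

/* Link a new node at the head of its chain.
   Returns 1 on success, 0 if the key is present or memory runs out. */
int InsertChained(ChainedTable H, const char *key)
{
    int h = Hash(key);
    Node_type *p;

    if (FindChained(H, key) != NULL)
        return 0;
    p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->key = key;
    p->next = H[h];
    H[h] = p;
    return 1;
}

/* Ordinary linked-list deletion within one chain. Returns 1 if removed. */
int DeleteChained(ChainedTable H, const char *key)
{
    Node_type **link = &H[Hash(key)];

    while (*link != NULL) {
        if (strcmp((*link)->key, key) == 0) {
            Node_type *old = *link;
            *link = old->next;
            free(old);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}
```

Here a collision needs no probing at all: "abc" and "cab" hash to the same slot and simply share a chain, and deleting one of them leaves the other reachable.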
Comparison with other information retrieval methods:
Advantage of Hashing
• Insertion, deletion and retrieval can be done in constant average time.
Disadvantage of Hashing
• Can require considerably more storage than other searching methods.
• Ordering of data is not supported. For example, the following operations are not supported directly, since each requires examining the whole table:
  ◦ finding the minimum value of a data set.
  ◦ finding the maximum value of a data set.
  ◦ printing the data in sorted order.
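As an illustration of why ordering is costly, the sketch below finds the smallest key in the open addressing table declared earlier. Because hashing scatters keys with no regard to their order, the only option is to examine every slot. The function name `MinKey` is hypothetical, and "smallest" is taken to mean smallest in `strcmp` order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASHSIZE 997

typedef char *Key_type;
typedef struct item_tag { Key_type key; } Item_type;
typedef Item_type HashTable[HASHSIZE];

/* Hypothetical helper: return the smallest key in strcmp order,
   or NULL if the table is empty. Visits all HASHSIZE slots. */
Key_type MinKey(HashTable H)
{
    Key_type min = NULL;
    int i;

    assert(H != NULL);
    for (i = 0; i < HASHSIZE; i++)
        if (H[i].key != NULL &&
            (min == NULL || strcmp(H[i].key, min) < 0))
            min = H[i].key;
    return min;
}
```

The cost is proportional to the table size, however few keys are stored. Printing the data in sorted order is worse still, since the keys must first be collected and then sorted.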
Exercises:
• Try Exercises E3 – E7 on page 365 of your book.