Binary Search Trees (BST)
Objectives of this lecture
• Learn about binary search trees and how they address the drawbacks of linked lists.
• Study the BST operations TreeSearch, InsertTree, TreeSort, and DeleteNode.
What is a Binary Search Tree?
• A binary search tree is another important type of binary tree, used in information retrieval applications.
• A binary search tree is defined as a binary tree that is either empty or in which every node contains a key such that:
   - The keys in the left subtree (if it exists) are less than the key in the root.
   - The keys in the right subtree (if it exists) are greater than the key in the root.
   - The left and right subtrees (if they exist) are again binary search trees.
• Note that the above definition can be modified to allow for duplicate keys.
• Notice also that in a BST there must be a key as part of the entry. Thus we shall assume that TreeEntry has the following declaration.
typedef … KeyType;

typedef struct {
    KeyType key;
    ……
} TreeEntry;
• We can apply the operations already defined for general binary trees to a BST without difficulty. These include CreateTree, ClearTree, TreeEmpty, TreeFull and the three traversal functions.
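The definition above can be turned into a small checking routine. The following is only a sketch under stated assumptions: KeyType is taken to be int, the TreeNode layout (entry, left, right) is assumed from the way the later listings use it, and IsBST and InRange are hypothetical names that do not appear in the lecture. It uses the strict form of the definition, so duplicate keys are rejected.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Returns true when every key in the subtree lies strictly between *lo and
   *hi. A NULL bound means there is no limit on that side. */
static bool InRange(const TreeNode *root, const KeyType *lo, const KeyType *hi)
{
    if (root == NULL)
        return true;                        /* an empty tree is a BST */
    if (lo != NULL && root->entry.key <= *lo)
        return false;                       /* too small for this position */
    if (hi != NULL && root->entry.key >= *hi)
        return false;                       /* too large for this position */
    /* Keys on the left must stay below this key, keys on the right above it. */
    return InRange(root->left, lo, &root->entry.key) &&
           InRange(root->right, &root->entry.key, hi);
}

/* Hypothetical helper: checks the three conditions of the definition. */
bool IsBST(const TreeNode *root)
{
    return InRange(root, NULL, NULL);
}
```

Passing the bounds down matters: comparing each node only with its own children would accept a tree in which a key deep in the left subtree is larger than the root.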
Further operations on Binary Search Trees
TreeSearch:
• The first important additional operation for a BST is the search operation.
• To search for a target, we first compare it with the key of the root. If it is the same, we are done. If it is not the same, we go to the left subtree or to the right subtree as appropriate and repeat the process. This continues until we either find the target or reach a subtree that is empty.
/* TreeSearch: search for target starting at node root.
   Pre:  The tree to which root points has been created.
   Post: It returns a pointer to a tree node that matches target,
         or NULL if the target is not in the tree.
*/
TreeNode *TreeSearch(TreeNode *root, KeyType target)
{
    if (root) {
        if (LT(target, root->entry.key))
            root = TreeSearch(root->left, target);
        else if (GT(target, root->entry.key))
            root = TreeSearch(root->right, target);
    }
    return root;
}
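Because each step of the search only moves down one link, the same idea can also be written as a loop. The version below is a sketch, not the lecture's function: it assumes integer keys compared with the built-in operators instead of the LT and GT macros, it assumes the TreeNode layout shown, and TreeSearchIter is a hypothetical name.

```c
#include <stddef.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Iterative search: returns the node whose key equals target,
   or NULL if the search runs off the bottom of the tree. */
TreeNode *TreeSearchIter(TreeNode *root, KeyType target)
{
    while (root != NULL && root->entry.key != target) {
        if (target < root->entry.key)
            root = root->left;              /* target can only be on the left */
        else
            root = root->right;             /* target can only be on the right */
    }
    return root;
}
```

The loop makes the same comparisons as the recursive version but does not use stack space proportional to the height of the tree.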
Analysis
• If we apply binary search to an ordered (contiguous) list and draw its comparison tree (figure (a) below), we can see that TreeSearch does the same number of comparisons when applied to this same tree. Thus, for a well (or nearly) balanced BST, the performance of TreeSearch is the same (or about the same) as that of BinarySearch, i.e. O(log n).
• However, since the binary search tree representing a given set of keys is not unique (see the other possible trees (b) to (e) above), it cannot be guaranteed that TreeSearch will always give O(log n) performance. In the worst case the tree degenerates into a chain, and a search then takes O(n) comparisons.
• Nevertheless, in practice, if the keys are built into a binary search tree in random order, it is very unlikely that it will degenerate as badly as in figures (d) and (e) above.
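The dependence on the shape of the tree can be observed by building two trees from the same keys and comparing their heights. The sketch below makes several assumptions: integer keys, an assumed TreeNode layout, and the hypothetical helpers InsertKey, Height and BuildFrom, with InsertKey standing in for a simplified insertion routine that allocates its own nodes.

```c
#include <stdlib.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Hypothetical helper: allocates a node for key and inserts it. */
TreeNode *InsertKey(TreeNode *root, KeyType key)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        if (root == NULL)
            abort();                        /* out of memory */
        root->entry.key = key;
        root->left = root->right = NULL;
    } else if (key < root->entry.key)
        root->left = InsertKey(root->left, key);
    else
        root->right = InsertKey(root->right, key);
    return root;
}

/* Number of nodes on the longest path from root to a leaf (0 if empty). */
int Height(const TreeNode *root)
{
    if (root == NULL)
        return 0;
    int l = Height(root->left);
    int r = Height(root->right);
    return 1 + (l > r ? l : r);
}

/* Builds a tree by inserting keys[0..n-1] in the order given. */
TreeNode *BuildFrom(const KeyType *keys, int n)
{
    TreeNode *root = NULL;
    for (int i = 0; i < n; i++)
        root = InsertKey(root, keys[i]);
    return root;
}
```

With the keys 1 to 7, inserting them in sorted order gives a chain of height 7, while the order 4, 2, 6, 1, 3, 5, 7 gives a perfectly balanced tree of height 3. The number of comparisons in a search is bounded by this height.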
Insertion into a binary search tree
• The next important operation is the insertion of a new node into a binary search tree. It must be done such that the keys remain in order.
• To insert a node into an empty tree, we only need to make it the root and set its left and right subtrees to NULL.
• To insert into a non-empty tree, we compare the new key with the key at the root. If it is less, we insert into the left subtree. If it is greater, we insert into the right subtree. If it is equal, we adopt the convention of inserting the duplicate key into the right subtree.
/* InsertTree: insert a new node in the tree.
   Pre:  The binary search tree to which root points has been created.
         The parameter newnode points to a node that has been created and
         contains a key in its entry.
   Post: The node newnode has been inserted into the tree in such a way
         that the properties of a binary search tree are preserved.
*/
TreeNode *InsertTree(TreeNode *root, TreeNode *newnode)
{
    if (!root) {
        root = newnode;
        root->left = root->right = NULL;
    } else if (LT(newnode->entry.key, root->entry.key))
        root->left = InsertTree(root->left, newnode);
    else
        root->right = InsertTree(root->right, newnode);
    return root;
}
• The following figure shows what happens when we insert the keys e, b, d, f, a, g, c into an initially empty tree in the order given.
• Note that it is quite possible for a different order of insertion to produce the same binary search tree, for example e, f, g, b, a, d, c or e, b, d, c, a, f, g.
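The three insertion orders mentioned above can be compared in code. The sketch below reuses the InsertTree logic from the lecture, but everything around it is an assumption made for illustration: KeyType is char, LT is defined as a plain comparison macro, the TreeNode layout is assumed, and MakeNode, BuildTree and Preorder are hypothetical helpers.

```c
#include <stdlib.h>

typedef char KeyType;                       /* assumption: single-letter keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

#define LT(a, b) ((a) < (b))                /* assumed definition of LT */

TreeNode *InsertTree(TreeNode *root, TreeNode *newnode)
{
    if (!root) {
        root = newnode;
        root->left = root->right = NULL;
    } else if (LT(newnode->entry.key, root->entry.key))
        root->left = InsertTree(root->left, newnode);
    else
        root->right = InsertTree(root->right, newnode);
    return root;
}

/* Hypothetical helper: allocates a node holding key. */
TreeNode *MakeNode(KeyType key)
{
    TreeNode *p = malloc(sizeof *p);
    if (p == NULL)
        abort();                            /* out of memory */
    p->entry.key = key;
    p->left = p->right = NULL;
    return p;
}

/* Inserts the characters of keys, in order, into an initially empty tree. */
TreeNode *BuildTree(const char *keys)
{
    TreeNode *root = NULL;
    for (; *keys != '\0'; keys++)
        root = InsertTree(root, MakeNode(*keys));
    return root;
}

/* Writes the keys in preorder (root, left, right) starting at out and
   returns the position just after the last key written. */
char *Preorder(const TreeNode *root, char *out)
{
    if (root != NULL) {
        *out++ = root->entry.key;
        out = Preorder(root->left, out);
        out = Preorder(root->right, out);
    }
    return out;
}
```

Calling BuildTree("ebdfagc"), BuildTree("efgbadc") and BuildTree("ebdcafg") and writing each result with Preorder into a zero-filled buffer gives the string "ebadcfg" in all three cases. For a binary search tree with distinct keys the preorder sequence determines the shape, so the three trees are identical.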
TreeSort:
• Observe that another sorting method, called TreeSort, can be obtained by inserting a list of elements into a binary search tree and then using inorder traversal to output them.
• One advantage of this sorting method is that the elements need not all be available at the start of the process; they are built into the tree as they become available. Hence TreeSort can be very useful in applications where elements are recorded one at a time.
• The performance of TreeSort on a randomly ordered list is O(n log n).
• However, it suffers from the same drawback as QuickSort: for a degenerate tree, the performance can be O(n^2).
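A minimal sketch of TreeSort for an array of integers follows. It is an illustration under assumptions, not the lecture's code: KeyType is int, the TreeNode layout is assumed, InsertKey and InorderCopy are hypothetical helpers, and the tree is discarded once the keys have been copied back.

```c
#include <stdlib.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Hypothetical helper: allocates a node for key and inserts it.
   Equal keys go to the right, following the lecture's convention. */
static TreeNode *InsertKey(TreeNode *root, KeyType key)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        if (root == NULL)
            abort();                        /* out of memory */
        root->entry.key = key;
        root->left = root->right = NULL;
    } else if (key < root->entry.key)
        root->left = InsertKey(root->left, key);
    else
        root->right = InsertKey(root->right, key);
    return root;
}

/* Inorder traversal: copies the keys to out in ascending order, frees each
   node after its right subtree has been visited, and returns the position
   just after the last key written. */
static KeyType *InorderCopy(TreeNode *root, KeyType *out)
{
    if (root != NULL) {
        out = InorderCopy(root->left, out);
        *out++ = root->entry.key;
        out = InorderCopy(root->right, out);
        free(root);
    }
    return out;
}

/* Sorts a[0..n-1] into ascending order by way of a binary search tree. */
void TreeSort(KeyType *a, size_t n)
{
    TreeNode *root = NULL;
    for (size_t i = 0; i < n; i++)
        root = InsertKey(root, a[i]);
    InorderCopy(root, a);                   /* all keys are in the tree by now */
}
```

For an input that is already sorted, every insertion walks the full chain built so far, which is where the O(n^2) behaviour comes from. The recursion depth also grows with the height of the tree.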
Deletion from a binary search tree:
• There are three cases to consider when deleting a node from a binary search tree.
• If the node to be deleted is a leaf node, then the deletion is easy; we need only replace the link to the deleted node by NULL, as shown by the following figure:
• If the node to be deleted has only one subtree, the deletion is again easy; we need only link the parent of the node to its single child, as shown in the figure below:
• If the node to be deleted has two children, then the operation is more complicated: how do we link the two hanging subtrees? There are two methods.
Method 1:
• The right subtree is linked to the parent of the deleted node. If no such parent exists (i.e. the root is being deleted), the right child becomes the root.
• The left subtree is linked to the smallest element of the right subtree. The smallest element of the right subtree is found by starting at the right child and following left links until a node with no left child is reached. The following figure illustrates this.
• The following code implements these ideas.
/* DeleteNodeTree: delete a node from the tree.
   Pre:  The parameter p is the address of the link (the root pointer or a
         parent's left or right field) to a node in a binary search tree,
         and *p is not NULL.
   Post: The node *p has been deleted from the binary search tree and the
         resulting tree has the properties of a binary search tree.
*/
void DeleteNodeTree(TreeNode **p)
{
    TreeNode *r = *p, *q;    /* q is used to find the place for the left subtree */

    if (r == NULL)
        Error("Attempt to delete a nonexistent node");
    else if (r->right == NULL) {
        *p = r->left;        /* Reattach left subtree. */
        free(r);             /* Release node space. */
    } else if (r->left == NULL) {
        *p = r->right;       /* Reattach right subtree. */
        free(r);
    } else {                 /* Neither subtree is empty. */
        for (q = r->right; q->left; q = q->left)
            ;                /* q stops at the inorder successor of r */
        q->left = r->left;   /* Reattach left subtree. */
        *p = r->right;       /* Reattach right subtree. */
        free(r);
    }
}
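• Note that most of the time we only have the key of the node to be deleted, so before we can use the above function we must search for the node, as TreeSearch does. Since DeleteNodeTree expects the address of the link that points to the node (the root pointer or the parent's left or right field) and not a copy of that pointer, the search must keep track of that link.
• The problem with Method 1 is that it can make the height of the resulting tree greater than it was before the deletion, as shown by the following figure: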
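Both points can be illustrated with a short sketch. It is written under assumptions and is not the lecture's code: KeyType is int, the TreeNode layout is assumed, FindLink, DeleteAtLink, DeleteKey, InsertKey and Height are hypothetical names, and DeleteAtLink repeats the Method 1 logic but returns a status instead of calling Error.

```c
#include <stdlib.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Hypothetical helper: returns the address of the link that points to the
   node holding key, or the address of the NULL link where the search ended. */
TreeNode **FindLink(TreeNode **link, KeyType key)
{
    while (*link != NULL && (*link)->entry.key != key)
        link = (key < (*link)->entry.key) ? &(*link)->left : &(*link)->right;
    return link;
}

/* Method 1 deletion through a link. Returns 1 if a node was deleted
   and 0 if the link was empty. */
int DeleteAtLink(TreeNode **p)
{
    TreeNode *r = *p, *q;
    if (r == NULL)
        return 0;
    if (r->right == NULL)
        *p = r->left;                       /* leaf, or left subtree only */
    else if (r->left == NULL)
        *p = r->right;                      /* right subtree only */
    else {
        for (q = r->right; q->left != NULL; q = q->left)
            ;                               /* smallest node of right subtree */
        q->left = r->left;                  /* hang left subtree below it */
        *p = r->right;
    }
    free(r);
    return 1;
}

/* Deletes the first node found with the given key, if there is one. */
int DeleteKey(TreeNode **root, KeyType key)
{
    return DeleteAtLink(FindLink(root, key));
}

/* Hypothetical helper: allocates a node for key and inserts it. */
TreeNode *InsertKey(TreeNode *root, KeyType key)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        if (root == NULL)
            abort();                        /* out of memory */
        root->entry.key = key;
        root->left = root->right = NULL;
    } else if (key < root->entry.key)
        root->left = InsertKey(root->left, key);
    else
        root->right = InsertKey(root->right, key);
    return root;
}

/* Number of nodes on the longest path from root to a leaf (0 if empty). */
int Height(const TreeNode *root)
{
    if (root == NULL)
        return 0;
    int l = Height(root->left), r = Height(root->right);
    return 1 + (l > r ? l : r);
}
```

If the keys 4, 2, 6, 1, 3, 5, 7 are inserted in that order, the tree has height 3. After DeleteKey(&root, 4) the node 6 becomes the root and the subtree rooted at 2 hangs below 5, so the height grows to 4.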
Method 2:
• To delete a node d with two children, we first find the node with the next higher (or lower) key. We then swap this node with d and then delete d. This ensures that the height of the resulting tree is not greater than that of the original tree.
• To find the node with the next higher key (also called the inorder successor), we start at d's right child and follow left links until the next left link is NULL.
• To find the node with the next lower key (also called the inorder predecessor), we start at d's left child and follow right links until the next right link is NULL.
• The following figure illustrates this method, but its implementation is left as an exercise.
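As a starting point for the exercise, the two traversals described above might be sketched as follows. Only the search for the neighbouring key is shown; the swap and the deletion itself are not. The sketch assumes integer keys and the TreeNode layout shown, and SuccessorBelow and PredecessorBelow are hypothetical names.

```c
#include <stddef.h>

typedef int KeyType;                        /* assumption: integer keys */
typedef struct { KeyType key; } TreeEntry;
typedef struct treenode {                   /* assumed node layout */
    TreeEntry entry;
    struct treenode *left, *right;
} TreeNode;

/* Node with the next higher key. Requires that d has a right child,
   which always holds in the two-children case. */
TreeNode *SuccessorBelow(const TreeNode *d)
{
    TreeNode *q = d->right;
    while (q->left != NULL)                 /* leftmost node of right subtree */
        q = q->left;
    return q;
}

/* Node with the next lower key. Requires that d has a left child. */
TreeNode *PredecessorBelow(const TreeNode *d)
{
    TreeNode *q = d->left;
    while (q->right != NULL)                /* rightmost node of left subtree */
        q = q->right;
    return q;
}
```

The node returned by SuccessorBelow has no left child, and the node returned by PredecessorBelow has no right child, so removing either one falls under the leaf or single-subtree case.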
Exercises:
Implement the DeleteNode function using Method 2.
Try Exercises E1-E5 on pages 409-410 of your book.