Lecture 24:  Sorting 4 – Merge & Quick

 

Objectives of this lecture

q       Learn the divide-and-conquer sorting methods: Merge Sort and Quick Sort.

q       Learn the relative advantages and disadvantages of the two methods. 

 

Merge Sort

q       This is one of the algorithms that use the divide-and-conquer principle.

q       It is sometimes called EasySplit/HardJoin. The quicksort we consider next is known as HardSplit/EasyJoin.

q       The sorting process in the merge sort algorithm consists of the following steps:

Ø      Split the list into two equal (or nearly equal) sub-lists  – since smaller lists are easier to sort.

Ø      Repeat the process on each sub-list (recursively) until every sub-list has size 1, which means it is already sorted.

Ø      Unwind the recursion by merging the sub-lists to form larger sorted lists.  At the end, the original list is sorted.

q       The following diagram illustrates merge sort.

[28  81  36  13  17  47  55  65  23  18  67  38  3]

                   [28  81  36  13  17  47]  [55  65  23  18  67  38  3]

                [28  81  36]  [13  17  47]  [55  65  23]  [18  67  38  3]

     [28]  [81  36]  [13]  [17  47]  [55]  [65  23]  [18  67]  [38  3]

            [81]  [36]        [17]  [47]       [65]  [23]   [18]  [67]  [38] [ 3]

             [36  81]           [17  47]         [23  65]     [18  67]  [3  38]

        [28  36  81]   [13  17  47]    [23  55  65 ]  [3  18  38  67]

             [13  17  28  36  47  81]      [3  18  23  38  55  65  67]

     [3  13  17  18  23  28  36  38  47  55  65  67  81]


q       The following program implements the merge sort method.

 

#include <stdio.h>

#define SIZE 13

 

void MergeSort(int x[], int first, int last);

void Merge(int x[], int init1, int final1, int init2, int final2);

 

int main(void)

{   int i;

     int list[SIZE] = {28,81,36,13,17,47,55,65,23,18,67,38,3};

     MergeSort(list, 0, SIZE-1);

     for(i = 0; i < SIZE; i++)

       printf("%d ", list[i]);

     return 0;

}

 

void Merge(int x[],int init1,int final1,int init2,int final2)

{ int n,j,k;

   int temp[SIZE];

   n = init1; j = init1; k = init2;

  while(j <= final1 && k <= final2)

  {   if(x[j] < x[k])

             temp[n++] = x[j++];

      else

             temp[n++] = x[k++];

   }

 

  while(j <= final1)

      temp[n++] = x[j++];

  while(k <= final2)

      temp[n++] = x[k++];

  for(n = init1; n <= final2; n++)

     x[n] = temp[n];

 return;

}

 

void MergeSort(int x[], int first, int last)

 { int mid;

    if(last > first)

    { mid = (first + last)/2;

      MergeSort(x, first, mid);

      MergeSort(x, mid + 1, last);

      Merge(x, first, mid, mid + 1, last);

    }

  return;

 }

Performance:

q       Partitioning a list of size n in the way described above requires about log2 n levels of recursion, and the merging at each level requires about n comparisons.  Thus the performance is about n log2 n, which is better than a typical quadratic sorting method.

q       There is no distinct best or worst case: the list is always partitioned down to single-element sub-lists, so the running time is of order n log2 n whatever the initial order.

q       The disadvantage of merge sort is that merging the sub-lists requires a separate array of the same size as the original.  This takes extra space and extra time for copying.
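
The two points above (about n log2 n comparisons, and the need for a second array) can be checked with a small instrumented variant of the program. This is only a sketch: the names merge_counted, sort_counted and merge_sort_counted are hypothetical and not part of the lecture's program, and the scratch array is allocated once with malloc instead of being declared inside Merge.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not the lecture's own code. */

/* Merge the sorted runs x[lo..mid] and x[mid+1..hi] through the scratch
   array tmp (which must hold at least hi+1 ints).
   Returns the number of element comparisons made. */
static long merge_counted(int x[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, n = lo;
    long comparisons = 0;

    while (i <= mid && j <= hi) {
        comparisons++;
        if (x[i] <= x[j])
            tmp[n++] = x[i++];
        else
            tmp[n++] = x[j++];
    }
    while (i <= mid) tmp[n++] = x[i++];   /* leftovers of the first run  */
    while (j <= hi)  tmp[n++] = x[j++];   /* leftovers of the second run */

    /* Copy the merged run back: this is the extra data movement. */
    memcpy(&x[lo], &tmp[lo], (size_t)(hi - lo + 1) * sizeof x[0]);
    return comparisons;
}

/* Recursive merge sort on x[lo..hi]; returns total comparisons. */
static long sort_counted(int x[], int tmp[], int lo, int hi)
{
    long c = 0;
    if (hi > lo) {
        int mid = lo + (hi - lo) / 2;
        c += sort_counted(x, tmp, lo, mid);
        c += sort_counted(x, tmp, mid + 1, hi);
        c += merge_counted(x, tmp, lo, mid, hi);
    }
    return c;
}

/* Sorts x[0..n-1]. Returns the comparison count, or -1 if the
   scratch array cannot be allocated. */
long merge_sort_counted(int x[], int n)
{
    int *tmp;
    long c;

    if (n < 2) return 0;
    tmp = malloc((size_t)n * sizeof *tmp);   /* the second array of size n */
    if (tmp == NULL) return -1;
    c = sort_counted(x, tmp, 0, n - 1);
    free(tmp);
    return c;
}
```

Calling merge_sort_counted(list, 13) on the 13-element list used above sorts it and returns the number of comparisons. Each level of merging makes fewer than n comparisons and there are 4 levels for n = 13, so the count cannot exceed 13 x 4 = 52.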

 

Quick Sort

q       Quick sort is another divide-and-conquer algorithm which spends more time in the splitting step than merge sort. This is why Quick Sort is called HardSplit/EasyJoin.

q       To do the splitting, Quick Sort first selects an element called the pivot and conceptually splits the list into two sub-lists with respect to the pivot: the first consists of all elements less than or equal to the pivot, and the second consists of all elements greater than or equal to the pivot.

q       These two sub-lists are then sorted using the same idea.  By the time the sub-lists reduce to single elements, the whole list is sorted.

q       The partitioning of a list is achieved by setting pointers left and right to the first and last index respectively and allowing them to move towards each other.

q       The left pointer is allowed to increase until it reaches an element greater than or equal to the pivot.

q       Similarly, the right pointer is allowed to decrease until it reaches an element less than or equal to the pivot.

q       Provided the pointers have not crossed, the elements they point to are swapped and the pointers move again.  This process continues until the pointers cross, at which stage the partition has been achieved.

q       The following diagram illustrates this idea.

 

Original list:

28    81    36    13    17    47    55    65    23    18    67    38    3
left                                                                    right

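q       First we choose a pivot, say 47, an element near the middle of the list.  (The program given later takes the element at index (start + end)/2, which is 55 for this list; the trace here uses 47.)

q       The left pointer moves and stops at 81, since 81 >= 47.  The right pointer, however, cannot move, since 3 <= 47.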

28    81    36    13    17    47    55    65    23    18    67    38    3
      left                                                              right

q       We now swap the values to obtain the following.

28    3     36    13    17    47    55    65    23    18    67    38    81
      left                                                              right

q       Next, left moves and stops at 47, since 47 >= 47.  Right moves and stops at 38, since 38 <= 47.  Thus we have the following:

28    3     36    13    17    47    55    65    23    18    67    38    81
                              left                                right

q       We swap these values to obtain the following:

28    3     36    13    17    38    55    65    23    18    67    47    81
                              left                                right

q       Left moves and stops at 55 and right moves and stops at 18.

28    3     36    13    17    38    55    65    23    18    67    47    81
                                    left              right

q       We swap the values to obtain:

28    3     36    13    17    38    18    65    23    55    67    47    81
                                    left              right

q       Left moves and stops at 65 and right moves and stops at 23.  The two values are swapped to obtain:

28    3     36    13    17    38    18    23    65    55    67    47    81
                                          left  right

q       Finally, left moves to 65 and stops, and right moves to 23 and stops.

28    3     36    13    17    38    18    23    65    55    67    47    81
                                          right left

q       Since the pointers have crossed, we stop.  We now have a partition in which all elements of the first sub-list are less than or equal to the pivot and those of the second are greater than or equal to the pivot.  The two lists are:

[28  3  36  13  17  38  18  23]  [65  55  67  47  81]

q       The algorithm is now applied on these sub-lists and the process continues.

q       The following program implements this method.

/* Quick sorts an array in increasing order */

 

#include <stdio.h>

#define SIZE 13

void swap(int *a , int *b);

void partition(int x[], int *left , int *right, int pivot);

void QuickSort(int x[], int start, int end);

 

int main(void)

{   int i;

     int list[SIZE] = {28,81,36,13,17,47,55,65,23,18,67,38,3};

     QuickSort(list, 0, SIZE-1);

     for(i = 0; i < SIZE; i++)

         printf("%d ", list[i]);

     return 0;

}

 

void QuickSort(int x[], int start, int end)

 {   int left = start, right = end, pivot = x[(start + end)/2];

 

     partition(x, &left, &right, pivot);

     if(start < right)

         QuickSort(x, start, right);

     if(left < end)

         QuickSort(x, left, end);

     return;

 }

 

void partition(int x[], int *left , int *right, int pivot)

{  do

    {   while(x[*left] < pivot)

                 (*left)++;

            while(x[*right] > pivot)

                (*right)--;

 

            if(*left < *right)

            {  swap(&x[*left] , &x[*right]);

                (*left)++;

                (*right)--;

              }

             else if(*left == *right)

                 (*left)++;

    }while(*left <= *right);

    return;

 }

void swap(int *a , int *b)

 {

   int temp;

   temp =  *a;

   *a =  *b;

   *b =  temp;

   return;

  }
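
The walkthrough before the program can be reproduced with a small tracing variant of partition. This is a sketch only: partition_trace is a hypothetical helper, not part of the lecture's program. It takes the pivot value as a parameter (so that 47 can be used, as in the diagrams), assumes the pivot is a value that occurs in the list so that the pointers stay within bounds, and prints the list after every swap.

```c
#include <stdio.h>

/* Hypothetical helper for illustration. Partitions x[0..n-1] around
   pivot with the same pointer movement as the lecture's partition(),
   printing the list after each swap. Assumes pivot occurs in x.
   On return *left and *right hold the crossed positions; the return
   value is the number of swaps performed. */
int partition_trace(int x[], int n, int pivot, int *left, int *right)
{
    int swaps = 0, i, tmp;

    *left = 0;
    *right = n - 1;
    do {
        while (x[*left] < pivot)      /* stop at an element >= pivot */
            (*left)++;
        while (x[*right] > pivot)     /* stop at an element <= pivot */
            (*right)--;

        if (*left < *right) {
            tmp = x[*left];
            x[*left] = x[*right];
            x[*right] = tmp;
            swaps++;

            for (i = 0; i < n; i++)   /* show the list after this swap */
                printf("%d ", x[i]);
            printf("\n");

            (*left)++;
            (*right)--;
        } else if (*left == *right) {
            (*left)++;
        }
    } while (*left <= *right);

    return swaps;
}
```

For the 13-element list with pivot 47, the four lines printed correspond to the four swaps in the diagrams (81 with 3, 47 with 38, 55 with 18, 65 with 23), and the pointers finish crossed with right on 23 and left on 65.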

 

Performance:

q       If we assume that at each stage a list is partitioned into two sub-lists of approximately the same length, then the performance is about n log2 n, as for merge sort.  However, quick sort requires far less movement of data and does not need a second array, so in practice it is usually faster.
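
The n log2 n estimate can be sketched as a recurrence. The notation C(n) for the number of comparisons is introduced here for illustration, and the sketch assumes, as the text does, that every partition costs about n comparisons and splits the list into two halves of equal length:

```latex
C(n) \approx 2\,C(n/2) + n, \qquad C(1) = 0
\quad\Longrightarrow\quad
C(n) \approx \underbrace{n + n + \dots + n}_{\log_2 n \text{ levels}} = n \log_2 n
```

Each level of recursion handles all n elements once, and halving the list repeatedly reaches single elements after about log2 n levels.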

 

Choice of Pivot:

q       For random data, the choice of pivot is not critical, but for ordered or nearly ordered data it can be critical.  For example, consider the following list:

1       2       3       4       5       6       7

q       If we choose 1 as the pivot, then we would have the following sub-lists.

[1]  [2  3  4  5  6  7]

 

q       This is the worst case: each partition removes only one element, so about n levels of partitioning are needed and the performance is of the order n*n.
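
The effect of the pivot choice on ordered data can be measured with an instrumented quick sort. This is a sketch under stated assumptions: qs_count and sorted_input_cost are hypothetical names, the partitioning follows the same pointer scheme as the program above, and the flag use_first (not in the original) selects the first element as pivot instead of the middle one.

```c
/* Hypothetical instrumented quick sort, for illustration only.
   If use_first is non-zero the pivot is x[start]; otherwise it is
   the middle element. Returns the number of comparisons between
   an element and the pivot. */
static long qs_count(int x[], int start, int end, int use_first)
{
    int left = start, right = end, tmp;
    int pivot = use_first ? x[start] : x[start + (end - start) / 2];
    long c = 0;

    do {
        /* c++ counts every comparison, including the one that stops the scan */
        while (c++, x[left] < pivot)
            left++;
        while (c++, x[right] > pivot)
            right--;

        if (left < right) {
            tmp = x[left];
            x[left] = x[right];
            x[right] = tmp;
            left++;
            right--;
        } else if (left == right) {
            left++;
        }
    } while (left <= right);

    if (start < right)
        c += qs_count(x, start, right, use_first);
    if (left < end)
        c += qs_count(x, left, end, use_first);
    return c;
}

/* Fills x[0..n-1] with 1..n (already sorted), sorts it, and returns
   the comparison count for the chosen pivot rule. */
long sorted_input_cost(int x[], int n, int use_first)
{
    int i;
    for (i = 0; i < n; i++)
        x[i] = i + 1;
    return qs_count(x, 0, n - 1, use_first);
}
```

Comparing sorted_input_cost(x, n, 1) with sorted_input_cost(x, n, 0) for growing n shows the first-element pivot count growing roughly like n*n/2 on sorted input, while the middle-element pivot stays close to n log2 n.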