Lecture #16 (14 March 2002)

Algorithms: Iteration and Recursion
(Case Study: Sorting)

Overall Reading
Brookshear: pp. 191-194,
pp. 206-211,
Question 5 on p. 195, and its solution on p. 572,
bottom of p. 515 through top of p. 518.

Outline:

  • A Case Study: Sorting

  • Iterative Approaches:
      • Selection Sort (Question 5 on p. 195, and its solution on p. 572)
      • Insertion Sort (pp. 191-194)

  • Divide-and-Conquer Approach:
      • Merge Sort (bottom of p. 515 through top of p. 518)

  • A Case Study: Sorting

    Imagine we are given a list of names in arbitrary order and we want to alphabetize them.
    How would you do this?

    We assume that the items are originally sitting in consecutive memory locations, though in arbitrary order.


    Iterative Approaches:

  • Selection Sort (Question 5 on p. 195, and its solution on p. 572)

    Intuition is straightforward:

  • Look for the "smallest" name, swapping any candidate you find to the front of the list, as you go.
  • Find the next "smallest" name, which should be placed as the second entry of the list.
  • etc.
  • Here is one possible pseudocode version (from p. 572):

    procedure SelectionSort(List)
      assign N the value 1;
      while (N is less than the length of List) do
        (assign J the value N + 1;
         while (J is no greater than length of List) do
           (if (the entry in position J is less than the entry in position N)
              then (interchange the two entries)
            assign J the value J+1)
         assign N the value N + 1)  
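
    As a rough illustration, the pseudocode above might be transcribed into Python as follows. This is only a sketch: the function name selection_sort and the sample names are made up for the example, and because Python lists are indexed from 0, position N of the pseudocode corresponds to index n = N - 1 here.

```python
def selection_sort(items):
    """Sort items in place, swapping as soon as a smaller candidate is seen."""
    n = 0  # index of the position currently being filled (N - 1 in the pseudocode)
    while n < len(items) - 1:
        j = n + 1
        while j < len(items):
            if items[j] < items[n]:
                # interchange the two entries
                items[n], items[j] = items[j], items[n]
            j += 1
        n += 1
    return items


# Hypothetical sample data, not taken from the lecture.
names = ["Dora", "Olga", "Sam", "Al", "Mia"]
print(selection_sort(names))
```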
    

    Slightly better would be the following, which avoids swapping until you are sure you have found the proper item for a position.

    procedure SelectionSort2(List)
      assign N the value 1;
      while (N is less than the length of List) do
        (assign MIN_INDEX_SO_FAR the value N;
         assign J the value N + 1;
         while (J is no greater than length of List) do
           (if (the entry in position J is less than the entry in position MIN_INDEX_SO_FAR)
              then assign MIN_INDEX_SO_FAR the value J
            assign J the value J+1)
         interchange entries in positions N and MIN_INDEX_SO_FAR
         assign N the value N + 1)  
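
    A possible Python rendering of SelectionSort2 is sketched below, again assuming 0-based indexing. The names selection_sort2 and min_index_so_far simply mirror the pseudocode and are not from the textbook.

```python
def selection_sort2(items):
    """Sort items in place, making one interchange per position."""
    n = 0
    while n < len(items) - 1:
        min_index_so_far = n
        j = n + 1
        while j < len(items):
            if items[j] < items[min_index_so_far]:
                # remember where the smallest entry seen so far sits; do not swap yet
                min_index_so_far = j
            j += 1
        # a single interchange once the proper entry for position n is known
        items[n], items[min_index_so_far] = items[min_index_so_far], items[n]
        n += 1
    return items


print(selection_sort2(list("DOSAMPLE")))
```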
    

    Advantage: The number of swaps is at most one less than the length of the list.

    Efficiency Analysis
    If the length of the list is denoted N, then Selection Sort always requires a number of operations that grows in proportion to N^2, although SelectionSort2 performs at most N-1 swaps.
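
    One way to check these counts is to instrument the second version of the algorithm. The sketch below is an illustration only: the function name count_selection_sort2 is invented, it counts comparisons of list entries as a stand-in for "operations", and it counts every interchange the pseudocode performs (including one that leaves an entry where it already was).

```python
def count_selection_sort2(items):
    """Return (comparisons, swaps) used to sort a copy of items."""
    items = list(items)
    comparisons = 0
    swaps = 0
    for n in range(len(items) - 1):
        min_index_so_far = n
        for j in range(n + 1, len(items)):
            comparisons += 1
            if items[j] < items[min_index_so_far]:
                min_index_so_far = j
        items[n], items[min_index_so_far] = items[min_index_so_far], items[n]
        swaps += 1
    return comparisons, swaps


for size in (10, 100, 1000):
    comparisons, swaps = count_selection_sort2(range(size, 0, -1))
    print(size, comparisons, swaps)
```

    For a list of length N the comparison count is N(N-1)/2 regardless of the original order, which grows in proportion to N^2, while the swap count is N-1.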


  • Insertion Sort (pp. 191-194)

    The intuition is as follows:

  • If we consider only the first item in the list by itself, it can be viewed as a (very small) alphabetized list.

  • If we then consider only the first two items in the list, they might be alphabetized already, or else by switching them we can make them alphabetized.

  • In general, if the first N-1 items have already been alphabetized, we can determine the alphabetized list of the first N items by simply determining where the Nth item should be placed relative to the earlier items, moving a group of those earlier items one spot farther down the list to make room for the new item.
  • Let's look at some examples, and then consider expressing the algorithm in pseudocode (as in Figure 4.11):

      procedure InsertionSort (List)
        assign N the value 2;
        while (the value of N does not exceed the length of List) do
          (
           Select the Nth entry in List as the "pivot" entry;
           Move the pivot entry to a temporary location leaving a hole in List;
           while (there is a name above the hole AND
                  that name is greater than the pivot) do
             (
              move the name above the hole down into the hole
              leaving a hole above the name
             )
           Move the pivot entry into the hole in List;
           assign N the value N+1
          )
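
    The same steps might look like this in Python. This is a sketch under the assumption of 0-based indexing, with the variable hole (a name chosen for this example) tracking the index of the empty slot described in the pseudocode.

```python
def insertion_sort(items):
    """Sort items in place by inserting each entry among the ones before it."""
    n = 1  # index of the second entry; the first entry alone is already sorted
    while n < len(items):
        pivot = items[n]  # move the pivot to a temporary location
        hole = n          # the pivot's old slot is now the hole
        while hole > 0 and items[hole - 1] > pivot:
            # move the entry above the hole down into the hole
            items[hole] = items[hole - 1]
            hole -= 1
        items[hole] = pivot  # move the pivot into the hole
        n += 1
    return items


print(insertion_sort(list("DOSAMPLE")))
```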
    

    Advantage: Can be very quick when original list was almost sorted.
    Disadvantage: In some cases (for example, when the original list is in reverse order), many entries must be moved.

    Efficiency Analysis
    Insertion Sort has an efficiency that depends very much on the original order of the list. In the best case (when the list is already sorted, or nearly so), the overall number of operations is proportional to N. In the worst case, however, the overall number of moves and other operations is proportional to N^2.
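
    The gap between the best and worst cases can be seen by counting. The following sketch (the function name count_insertion_sort is hypothetical, and counting comparisons and moves is just one reasonable way to measure "operations") runs the algorithm on an already sorted list and on a reversed list.

```python
def count_insertion_sort(items):
    """Return (comparisons, moves) used by insertion sort on a copy of items."""
    items = list(items)
    comparisons = 0
    moves = 0
    for n in range(1, len(items)):
        pivot = items[n]
        hole = n
        while hole > 0:
            comparisons += 1
            if items[hole - 1] <= pivot:
                break  # the pivot belongs in the current hole
            items[hole] = items[hole - 1]
            hole -= 1
            moves += 1
        items[hole] = pivot
    return comparisons, moves


size = 1000
print(count_insertion_sort(range(size)))         # already sorted
print(count_insertion_sort(range(size, 0, -1)))  # reverse order
```

    On the sorted input each entry needs one comparison and no moves (N-1 comparisons in total), whereas on the reversed input both counts equal N(N-1)/2.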


  • Divide-and-Conquer Approach:

  • Merge Sort (bottom of p. 515 through top of p. 518)
    We design a recursive sort based on the observation that it is relatively easy to merge two lists together, if both of those lists are already sorted.

    The recursive procedure can be described as:

      procedure MergeSort (List)
        if (List has more than one entry)
          then (Apply the procedure MergeSort to sort the first half of the List;
                Apply the procedure MergeSort to sort the second half of the List;
                Apply the procedure MergeLists to the two halves
               )
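
    The recursive structure can be sketched in Python as below. Two assumptions are worth flagging: this version returns a new sorted list instead of rearranging List in place, and it borrows heapq.merge from the Python standard library as a stand-in for the MergeLists procedure.

```python
from heapq import merge  # standard-library merge of already-sorted inputs


def merge_sort(items):
    """Return a new sorted list holding the entries of items."""
    if len(items) <= 1:
        return list(items)  # zero or one entry: nothing to sort
    middle = len(items) // 2
    first_half = merge_sort(items[:middle])
    second_half = merge_sort(items[middle:])
    # stand-in for "apply the procedure MergeLists to the two halves"
    return list(merge(first_half, second_half))


print(merge_sort(list("DOSAMPLE")))
```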
    

    The only detail we still need to specify is how a procedure MergeLists can be designed to efficiently merge two lists which are each already known to be sorted.

      procedure MergeLists (InputListA, InputListB, OutputList)
        if (both input lists are empty) then (Stop, with OutputList empty)
    
        if (InputListA is empty)
          then (Declare it to be exhausted)
          else (Declare its first entry to be its current entry)
        if (InputListB is empty)
          then (Declare it to be exhausted)
          else (Declare its first entry to be its current entry)
    
        while (neither input list is exhausted) do
          (Put the "smaller" current entry in OutputList;
           if (that current entry is the last entry in its corresponding input list)
             then (Declare that input list to be exhausted)
             else (Declare the next entry in that input list to be the list's current entry)
          )
    
        Starting with the current entry in the input list that is not exhausted,
          copy the remaining entries to OutputList.
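
    A Python sketch of MergeLists follows. As simplifying assumptions, it returns the output list rather than filling a third parameter, and it uses two indices (a and b, names chosen for this example) to mark each list's current entry; an index that runs past the end of its list plays the role of "exhausted".

```python
def merge_lists(input_list_a, input_list_b):
    """Merge two already-sorted lists into one new sorted list."""
    output_list = []
    a = 0  # index of the current entry in input_list_a
    b = 0  # index of the current entry in input_list_b
    while a < len(input_list_a) and b < len(input_list_b):
        # put the "smaller" current entry in the output
        if input_list_b[b] < input_list_a[a]:
            output_list.append(input_list_b[b])
            b += 1
        else:
            output_list.append(input_list_a[a])
            a += 1
    # at most one of these still has entries; copy whatever remains
    output_list.extend(input_list_a[a:])
    output_list.extend(input_list_b[b:])
    return output_list


print(merge_lists(list("ADOS"), list("ELMP")))
```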
    
    
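    To understand how the recursion unfolds, let's consider the order of the characters after each activation of the procedure MergeLists completes. Square brackets enclose sublists of two or more entries that are known to be sorted at each point in time, and the note at the right identifies the portion that was most recently merged.

    D  O  S  A  M  P  L  E            (original list)

    [D  O]  S  A  M  P  L  E          (merged D with O)

    [D  O]  [A  S]  M  P  L  E        (merged S with A)

    [A  D  O  S]  M  P  L  E          (merged D O with A S)

    [A  D  O  S]  [M  P]  L  E        (merged M with P)

    [A  D  O  S]  [M  P]  [E  L]      (merged L with E)

    [A  D  O  S]  [E  L  M  P]        (merged M P with E L)

    [A  D  E  L  M  O  P  S]          (merged A D O S with E L M P)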
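
    This sequence of orderings can be reproduced with a small in-place variant of merge sort that records the list after every merge. The sketch below is an assumption-laden illustration: the function name merge_sort_trace and its low/high parameters are invented here, and heapq.merge from the Python standard library stands in for MergeLists.

```python
from heapq import merge  # standard-library merge of already-sorted inputs


def merge_sort_trace(items, low=0, high=None, snapshots=None):
    """Sort items[low:high] in place; return the list's state after each merge."""
    if high is None:
        high = len(items)
    if snapshots is None:
        snapshots = []
    if high - low > 1:
        middle = (low + high) // 2
        merge_sort_trace(items, low, middle, snapshots)
        merge_sort_trace(items, middle, high, snapshots)
        # merge the two sorted halves back into their original positions
        items[low:high] = list(merge(items[low:middle], items[middle:high]))
        snapshots.append("  ".join(items))
    return snapshots


for line in merge_sort_trace(list("DOSAMPLE")):
    print(line)
```

    The seven printed lines correspond to the seven activations of the merge step, in the same order as the orderings listed above after the original list.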
    

  • Efficiency Analysis
    Merge Sort guarantees that the overall number of operations used will be proportional to N log_2 N.

    The difference between sorting in time proportional to N log N versus N^2 is dramatic
    (as was the difference between searching in log N operations rather than N operations).

    N           100      1,000       10,000         100,000          1,000,000
    N log N     660     10,000      130,000       1,660,000         20,000,000
    N^2      10,000  1,000,000  100,000,000  10,000,000,000  1,000,000,000,000
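
    The figures in the table can be recomputed with a few lines of Python. This sketch assumes the logarithm is taken base 2 (as in the N log_2 N bound); the helper name growth_row is made up for the example. The exact products differ slightly from the table, which appears to round them.

```python
import math


def growth_row(n):
    """Return (N, N log_2 N rounded to the nearest integer, N^2)."""
    return n, round(n * math.log2(n)), n * n


for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    size, n_log_n, n_squared = growth_row(n)
    print(f"{size:>9,}  {n_log_n:>12,}  {n_squared:>17,}")
```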


    Proof: Let's look at an explanation of this guarantee.

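    If we first consider the number of operations which are used by the MergeLists procedure, we claim that the total is proportional to the sum of the lengths of the two input lists. The reason is that each pass through the loop places one entry in OutputList, and every entry is placed exactly once.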

    Now, if we analyze the overall MergeSort computation, we can look at the hierarchy of subproblems, as shown in Figure 11.10.

    The two key facts are:

  • The decomposition has log_2 N levels.
  • The combined work at each given level is proportional to N.

    Multiplying the two gives a total proportional to N log_2 N.
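
    Both facts can be observed directly by tallying, level by level, how many entries the merges write. The sketch below is illustrative only: merge_sort_counting is a hypothetical name, "entries written" is used as a proxy for work, and heapq.merge from the Python standard library stands in for MergeLists.

```python
from collections import Counter
from heapq import merge  # standard-library merge of already-sorted inputs


def merge_sort_counting(items, level=0, work=None):
    """Return (sorted copy, Counter mapping level -> entries merged at that level)."""
    if work is None:
        work = Counter()
    if len(items) <= 1:
        return list(items), work
    middle = len(items) // 2
    first_half, _ = merge_sort_counting(items[:middle], level + 1, work)
    second_half, _ = merge_sort_counting(items[middle:], level + 1, work)
    work[level] += len(items)  # a merge writes each of its entries once
    return list(merge(first_half, second_half)), work


result, work = merge_sort_counting(list("DOSAMPLE"))
print(sorted(work.items()))  # [(0, 8), (1, 8), (2, 8)]
print(sum(work.values()))    # 24
```

    For this 8-entry list there are 3 = log_2 8 levels of merging with 8 entries handled at each, for a total of 24 = 8 log_2 8.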

    mhg@cs.luc.edu
    Last modified: 14 March 2002