COMP 150, Lecture Notes 18, Spring 2002

Lecture #18 (21 March 2002)

Data Structures: Binary Trees

Overall Reading
Brookshear:	Ch. 7.5

Outline:

Trees (pp. 341-345 [Br])

Application: A simple database (pp. 345-352 [Br])

Trees (pp. 341-345 [Br])

We see hierarchies in many different places, both within Computer Science and outside of it:

Company Organization (President, VPs, Managers, ...)

Biology Taxonomy (Kingdom, Phylum, Class, ...)

Genealogy (Abraham, Isaac, Jacob, ...)

Table of Contents for a text book

Parsing Arithmetic expressions

File Systems (folders and sub-folders)

Web Portals (such as Yahoo)

One Caveat: For today, we will assume that every hierarchy is defined so that each item appears under exactly one category at the level above it.

Terminology:

We will call such a hierarchy a "tree," even though we usually draw it upside down, with the root at the top.

Each position will be called a "node"

The topmost position is the "root" of the tree.

Nodes at the other extreme are called "leaves"

We will talk about a node's "parent," its "children," and its "siblings"

There is also a nice recursive nature to trees. That is, if you consider any particular node together with that node's descendants, the result has the same structure as a tree. We will call such a structure a "subtree" of the original tree.

Implementation
For today, we will concern ourselves only with a special class of trees called binary trees. These are trees in which each node has at most two children.

Such trees are often stored in memory using a linked structure similar to a linked list. Recall that each node of a linked list stores some data as well as one pointer that references the following node of the list. For trees, each node will have the following components:

the data

a pointer to the node's first child ("left child")

a pointer to the node's second child ("right child")

(optionally, we may wish to store an additional pointer, to the parent)

By convention, either or both of a node's pointers can be set to NIL to designate a situation where no such child exists.
Also, we will need to make sure to keep a pointer to the root of the tree somewhere, or we won't be able to find it.
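
The node layout described above can be sketched in Python. This is only an illustration, not the textbook's notation: the class name Node and the field names data, left, and right are assumptions, and Python's None plays the role of NIL. The optional parent pointer is left out.

```python
class Node:
    """One node of a linked binary tree (all names here are illustrative)."""

    def __init__(self, data, left=None, right=None):
        self.data = data      # the data stored at this node
        self.left = left      # left child, or None (NIL) if there is none
        self.right = right    # right child, or None (NIL) if there is none


# Build a small tree by hand:   B
#                              / \
#                             A   C
root = Node("B", Node("A"), Node("C"))  # keep a reference to the root

print(root.data)        # B
print(root.left.data)   # A
print(root.right.left)  # None, because C has no left child
```

Every other node is reached by following pointers from root, which is why the root pointer must be kept somewhere.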

Let's view an example, such as that of Figure 7.18 (p. 344 of [Br])

Alternate Implementation
A binary tree can also be represented using an array, rather than a linked structure, as follows. We store the root in the first cell of the array, and its left and right children in cells 2 and 3, respectively. In general, if a node is found in cell n, then its children should be stored in cells 2n and 2n+1.
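
The cell arithmetic above can be tried out with a short Python sketch. The tree contents are made up for illustration. Because Python lists start at index 0, cell 0 is left unused here so that the root sits in cell 1, as in the description. The parent helper is not mentioned above; it follows from the 2n and 2n+1 rule by integer division.

```python
# Cell 0 is unused so that the root sits in cell 1.
# None marks a cell with no node. The tree contents are illustrative only.
tree = [None, "D", "B", "F", "A", "C", "E", None]


def left_child(n):
    return 2 * n


def right_child(n):
    return 2 * n + 1


def parent(n):
    return n // 2   # integer division undoes both 2n and 2n+1


n = 2  # the cell holding "B"
print(tree[left_child(n)], tree[right_child(n)])  # A C
print(tree[parent(n)])                            # D
```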

Figure 7.19 (p. 345 of [Br]) shows the array representation of the same tree that was discussed in Figure 7.18.

Application: A simple database (pp. 345-352 [Br])

Let's revisit the issue of storing a list of names. Assume that we have an application in which we want to have the following operations:

search for the presence of an entry

print the list in alphabetical order

insert a new entry

How do we accomplish this?

Use an array?

If we choose to keep the items alphabetized, we can use the binary search algorithm to search for an entry efficiently.

Also, straightforward to print in alphabetized order.

However, inserting a new item can be inefficient. We might need to shift a large number of items to make room in the middle.
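
As a rough sketch of the array approach, Python's standard bisect module can stand in for the binary search algorithm. The names in the list and the helper contains are made up for illustration.

```python
import bisect

names = ["Alice", "Carol", "Dave", "Frank"]   # kept in alphabetical order


def contains(sorted_names, target):
    """Binary search: find where target would go, then check that cell."""
    i = bisect.bisect_left(sorted_names, target)
    return i < len(sorted_names) and sorted_names[i] == target


print(contains(names, "Dave"))   # True
print(contains(names, "Bob"))    # False

# Inserting keeps the order, but every later item shifts over by one cell.
bisect.insort(names, "Bob")
print(names)   # ['Alice', 'Bob', 'Carol', 'Dave', 'Frank']
```

Finding the insertion point is fast, but the shift that follows is what makes insertion costly when the list is long.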

Use a linked list?

If we know where to place a new item, we can insert it into the linked list efficiently. (The effort is independent of the overall number of items.)

Straightforward to print in alphabetized order.

However, searching for an item is inefficient. We are forced to use sequential search rather than binary search because there is no way to "jump" to the middle of the list for comparison.
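
The trade-off for linked lists can be sketched as follows. The class ListNode and the helpers insert_after and contains are illustrative names, not taken from the textbook.

```python
class ListNode:
    """One node of a singly linked list (illustrative names)."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


def insert_after(node, name):
    """Splice a new node in after `node`; the cost ignores list length."""
    node.next = ListNode(name, node.next)


def contains(head, target):
    """Sequential search: visit the nodes one at a time from the head."""
    current = head
    while current is not None:
        if current.name == target:
            return True
        current = current.next
    return False


head = ListNode("Alice", ListNode("Carol"))
insert_after(head, "Bob")          # Alice -> Bob -> Carol
print(contains(head, "Carol"))     # True
print(contains(head, "Dave"))      # False
```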

Use a "binary search tree"?

Instead, we can use a binary tree structure with the following additional properties:

Every name stored in the "left" subtree of a given node comes alphabetically before the name stored at that node.

Every name stored in the "right" subtree of a given node comes alphabetically after the name stored at that node.

In essence, we can take a given sequence of names and build a "binary search tree" whose structure mirrors the comparisons made by the binary search algorithm.

Let's view an example, such as that of Figure 7.21 (p. 347 of [Br]). Such a tree can be built as follows. We put the 'middle' entry at the root of the tree, with all entries from the first half of the list stored recursively in the "left" subtree of the root. Similarly, all entries that come after the 'middle' entry are stored recursively in the "right" subtree.
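
The 'middle entry at the root' idea can be sketched recursively in Python. This is an illustration under assumptions: the names Node and build are made up, the input list is assumed to be already alphabetized, and the seven single-letter names are sample data rather than the entries of Figure 7.21.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left      # None means "no left child"
        self.right = right    # None means "no right child"


def build(sorted_names):
    """Return the root of a binary search tree, or None for an empty list."""
    if not sorted_names:
        return None
    mid = len(sorted_names) // 2
    return Node(
        sorted_names[mid],               # the middle entry becomes the root
        build(sorted_names[:mid]),       # the first half goes to the left
        build(sorted_names[mid + 1:]),   # entries past the middle go right
    )


root = build(["A", "B", "C", "D", "E", "F", "G"])
print(root.data, root.left.data, root.right.data)   # D B F
```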

search for the presence of an entry

The binary search tree properties are defined precisely so that they support the search operation.

  procedure BinarySearch(Tree, TargetValue)
    assign CurrentPointer the value in the root pointer of Tree
    assign Found the value "false"
    while (Found is "false" and CurrentPointer is not NIL) do
     [
      case 1:   TargetValue = current node's value
         (assign Found the value "true")         

      case 2:   TargetValue < current node's value
         (assign CurrentPointer the value in
            current node's left child pointer)

      case 3:   TargetValue > current node's value
         (assign CurrentPointer the value in
            current node's right child pointer)
     ]
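
One possible Python rendering of the search procedure above follows. The names Node and binary_search are illustrative, and the sample tree is made up. Instead of a Found flag, this sketch returns True or False directly.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def binary_search(root, target):
    """Return True if target is stored somewhere in the tree rooted at root."""
    current = root
    while current is not None:
        if target == current.data:      # case 1: found it
            return True
        if target < current.data:       # case 2: continue in the left subtree
            current = current.left
        else:                           # case 3: continue in the right subtree
            current = current.right
    return False                        # fell off the bottom of the tree


root = Node("D", Node("B", Node("A"), Node("C")), Node("F"))
print(binary_search(root, "C"))   # True
print(binary_search(root, "E"))   # False
```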

print the list in alphabetical order

Printing the names in alphabetical order does not seem as straightforward as when using a linked list. However, it can be very straightforward if we think recursively.

  procedure PrintTree(Tree)
    if (Tree is not empty) then
      [
        apply PrintTree to the root node's left subtree

        print the name at the root
 
        apply PrintTree to the root node's right subtree
      ]
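
The recursive procedure above can be sketched in Python as follows. The names Node and in_order are illustrative. As an assumption made to keep the result easy to inspect, the sketch appends each name to a list rather than printing it immediately.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def in_order(node, out):
    """Append the names in the subtree rooted at node to out, alphabetically."""
    if node is not None:              # an empty tree contributes nothing
        in_order(node.left, out)      # everything that comes before this name
        out.append(node.data)         # the name at the root of this subtree
        in_order(node.right, out)     # everything that comes after this name


root = Node("D", Node("B", Node("A"), Node("C")), Node("F"))
names = []
in_order(root, names)
print(names)   # ['A', 'B', 'C', 'D', 'F']
```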

Let's try to walk through the execution on an example, recalling how recursion works from Ch. 4.5 of [Br].

insert a new entry

Finally, we can insert new items into an existing binary search tree as follows. To find a proper place to insert the name, start at the root and move down the tree, as if performing a search for the new name.

Since the entry is new, the unsuccessful search will eventually reach the bottom of the tree. We can properly insert the new item as the left or right child of the node where the search ended. (Intuitively, this is the correct place to put it because it is where a later search for the new name will look for it.)
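
The insertion steps described above might look like this in Python. The names Node and insert are illustrative and the sample names are made up. The text assumes the entry is new; as a choice made only for this sketch, a repeated name would be placed in the right subtree.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def insert(root, name):
    """Insert name into the tree and return the (possibly new) root."""
    new_node = Node(name)
    if root is None:                     # empty tree: new node is the root
        return new_node
    current = root
    while True:                          # walk down as if searching for name
        if name < current.data:
            if current.left is None:     # search ended here: attach on left
                current.left = new_node
                return root
            current = current.left
        else:
            if current.right is None:    # search ended here: attach on right
                current.right = new_node
                return root
            current = current.right


root = None
for name in ["D", "B", "F", "A", "C"]:
    root = insert(root, name)
root = insert(root, "E")                 # E > D, then E < F
print(root.right.left.data)              # E
```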

Again, we can walk through an example, such as that of Figure 7.25 (p. 350 of [Br])

Aside: If many items get inserted in one section of the tree, we no longer have the perfect correspondence between the structure of the tree and the behavior of the binary search algorithm. That is, our tree may become unbalanced. In practice, we can either hope that insertions do not drastically unbalance our tree, or take additional steps to alter the tree at certain points to restore the balance.
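
To see how insertion order affects balance, here is a small Python experiment. All names (Node, insert, height) are illustrative, and height counts the nodes on the longest path from the root down to a leaf. The same seven names are inserted in two different orders.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(node, name):
    """Recursive insertion; returns the root of the updated subtree."""
    if node is None:
        return Node(name)
    if name < node.data:
        node.left = insert(node.left, name)
    else:
        node.right = insert(node.right, name)
    return node


def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


chain = None
for name in ["A", "B", "C", "D", "E", "F", "G"]:   # alphabetical order
    chain = insert(chain, name)

balanced = None
for name in ["D", "B", "F", "A", "C", "E", "G"]:   # middle entries first
    balanced = insert(balanced, name)

print(height(chain))      # 7
print(height(balanced))   # 3
```

Inserting the names in alphabetical order yields a tree in which every node has only a right child, so a search behaves like a sequential search of a linked list.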

A Binary Search Tree demo.

mhg@cs.luc.edu

Last modified: Mon Mar 18 20:37:13 CST 2002