Lecture #19 (26 March 2002)

Databases

Overall Reading
Brookshear: Ch. 9.1, pp. 404-409

Outline:

  • General Issues (Ch. 9.1 [Br])
  • (skipping Ch. 9.2 [Br] on Implementation)
  • Relational Database Model (pp. 404-415 [Br])
  • Relational Design (pp. 404-409 [Br])

  • General Issues (Ch. 9.1 [Br])

    In an earlier lecture, we gave an example of a "simple database" which stored a list of names such that we could:
  • search for the presence of an entry
  • print the list in alphabetical order
  • insert a new entry
  • We saw some data structures which might help do this efficiently. However, as we described it then, this was one-dimensional.
    That is, we were only allowed to search for information based on one particular field of information (the name).

    More generally, we want databases to handle information about many different items, each of which may have many different fields of data.

    Employee Record:

  • First Name
  • Last Name
  • Social Security Number
  • Birthday (Date/Month/Year)
  • Home Address
  • Start Date
  • End Date
  • Job Title
  • Department
  • Manager
  • Office
  • Phone
  • Salary
  • Already, this is a more complicated situation.
  • Searching on a single field:
    If the only operation I needed was to be able to search for employees based on the last name, we could keep all of the records in a flat-file, with all records pre-sorted based on the name. (We could be efficient, and use binary search!)

  • Searching on other fields?
    But what if I wanted to search for people with a birthday today?!
    Sequential search is slow. But cannot use binary search unless all employee records were sorted based on birthdate.

    If efficiency important, perhaps I should keep two copies of all of the records, with one version sorted by name and another version sorted by birthdate.

  • Searching based on multiple fileds?
    What if I want to find all people who have been with the company 10 years or more, but are making less than $50000?

  • Duplication is bad
    Consider the suggestion that we keep two copies of all records, so that we can sort one by name and one by birthday. This wastes memory. Also updating employee information becomes perilous, because we need to make sure of consistency. (And the situation just gets worse if we have three copies or four copies, etc.)

  • Privacy:
    Different people should have different views and access to the information (schema vs. sub-schema)

    For instance, payroll should have access to my salary and home address, but general employees should probably only see my name, office, and phone number. My manager might know my salary, but not my social security number.

  • Integrated data system:
    We want to think about all of these needs ahead of time, and to design an integrated database which can meet all of these needs, as efficiently as possible in terms of time and memory usage.

  • (skipping Ch. 9.2 [Br] on Implementation)

    General theme is that there are many layers of abstraction separating the user's view form the details of the actual data in memory.


    Relational Database Model (Ch. 9.3 [Br])

  • Overview:

    The most popular databases in use today are based on a relational database model (e.g., Oracle, Informix, Microsoft Access)

    The database portrays the data as if it were stored in rectangular tables, called relations.

    Example (from Fig. 9.3):

    EmplIDNameAddressSSNum
    25X15 Joe E. Baker 33 Nowhere St. 111223333
    34Y70 Cheryl H. Clark 563 Downtown Ave. 999009999
    23Y34 G. Jerry Smith 1555 Circle Dr. 111005555
    . . . .
    . . . .
    . . . .

    We call one row of this table a tuple.


  • Relational Design (pp. 404-409 [Br])

  • An example (of a non-relational database):

    Name Birthday Zodiac Birthstone
    Art Alexakis Apr. 12 Aries diamond
    Hank Azaria Apr. 25 Tauras diamond
    Antonio Banderas Aug. 10 Leo peridot
    Lucas Black Nov. 29 Sagittarius citrine
    Matthew Broderick Mar. 21 Pisces aquamarine
    Sandra Bullock July 26 Leo ruby
    Steve Buscemi Dec. 13 Sagittarius turquoise
    Nicolas Cage Jan. 7 Capricorn garnet
    Jim Carrey Jan. 17 Capricorn garnet
    George Clooney May 6 Tauras emerald
    Courteney Cox June 15 Gemini pearl
    John Cusack June 28 Cancer pearl
    Joan Cusack Oct. 11 Libra opal
    Matt Damon Oct. 8 Libra opal
    Robert De Niro Aug. 17 Leo peridot
    Leonardo DiCaprio Nov. 11 Scorpio citrine
    Cameron Diaz Aug. 30 Virgo peridot
    Stephen Dorff July 29 Leo ruby
    Minnie Driver Jan. 31 Aquarius garnet
    Calista Flockhart Nov. 11 Scorpio citrine
    Harrison Ford July 13 Cancer ruby
    Jodie Foster Nov. 19 Scorpio citrine
    Mel Gibson Jan. 3 Capricorn garnet
    Joseph Gordon-Levitt Feb. 17 Aquarius amethyst
    Tom Hanks July 9 Cancer ruby
    Dustin Hoffman Aug. 8 Leo peridot
    Timothy Hutton Aug. 16 Leo peridot
    Jeremy Irons Sep. 19 Virgo sapphire
    Joshua Jackson June 11 Gemini pearl
    Greg Kinnear June 17 Gemini pearl
    Kevin Kline Oct. 24 Scorpio opal
    Jeremy London Nov. 7 Scorpio citrine
    Courtney Love July 9 Cancer ruby
    Peter MacNicol Apr. 10 Aries diamond
    William H. Macy Mar. 13 Pisces aquamarine
    John Malkovich Dec. 9 Sagittarius turquoise
    Julianna Margulies June 8 Gemini pearl
    Matthew McConaughey Nov. 4 Scorpio citrine
    Dylan McDermott Oct. 26 Scorpio opal
    Frances McDormand June 23 Cancer pearl
    Jack Nicholson Apr. 22 Aries diamond
    Edward Norton Aug. 18 Leo peridot
    Gwyneth Paltrow Sep. 28 Libra sapphire
    Sean Penn Aug. 17 Leo peridot
    Matthew Perry Aug. 19 Leo peridot
    Michelle Pfeiffer Apr. 29 Tauras diamond
    Brad Pitt Dec. 18 Sagittarius turquoise
    Natalie Portman June 9 Gemini pearl
    Aidan Quinn Mar. 8 Pisces aquamarine
    Giovanni Ribisi Mar. 31 Aries aquamarine
    Julia Roberts Oct. 28 Scorpio opal
    Keri Russell Mar. 23 Pisces aquamarine
    Adam Sandler Sep. 9 Virgo sapphire
    Susan Sarandon Oct. 4 Libra opal
    Rick Schroder Apr. 13 Aries diamond
    Gary Sinise Mar. 17 Pisces aquamarine
    Kevin Spacey July 26 Leo ruby
    Rider Strong Dec. 11 Sagittarius turquoise
    Billy Bob Thornton Aug. 4 Leo peridot
    Uma Thurman Apr. 29 Tauras diamond
    Skeet Ulrich Jan. 20 Capricorn garnet
    Eddie Vedder Dec. 23 Capricorn turquoise
    Robin Williams July 21 Cancer ruby
    Kate Winslet Oct. 5 Libra opal
    Robin Wright Penn Apr. 8 Aries diamond

    Though this may contain the information that we want, there are several problems with such a structure.

  • Duplication: For example, Jack Nicholson, Uma Thurman and Michelle Pfeiffer were all born in April and all have a diamond as a birthstone. We seem to be storing this information in several places.

  • Making modifications: What if it is decided that April's birthstone will be changed to quartz? It appears you will need to go and change many entries of your table.

  • Deletions: In making deletions, you might accidentally lose some useful information. For instance, Joseph Gordon-Levitt is the only person in our database born in February. We might have a reason to remove that person from our database. But now, we seem to have lost the correspondance between February and amethyst. That is, if we later add a new person to the database born in February, we do not seem to know the proper birthstone.
  • A better way: We can instead represent our database using a combination of three relations, as follows:

    Name Birthday
    Art Alexakis Apr. 12
    Hank Azaria Apr. 25
    Antonio Banderas Aug. 10
    Lucas Black Nov. 29
    Matthew Broderick Mar. 21
    Sandra Bullock July 26
    Steve Buscemi Dec. 13
    Nicolas Cage Jan. 7
    Jim Carrey Jan. 17
    George Clooney May 6
    Courteney Cox June 15
    John Cusack June 28
    Joan Cusack Oct. 11
    Matt Damon Oct. 8
    Robert De Niro Aug. 17
    Leonardo DiCaprio Nov. 11
    Cameron Diaz Aug. 30
    Stephen Dorff July 29
    Minnie Driver Jan. 31
    Calista Flockhart Nov. 11
    Harrison Ford July 13
    Jodie Foster Nov. 19
    Mel Gibson Jan. 3
    Joseph Gordon-Levitt Feb. 17
    Tom Hanks July 9
    Dustin Hoffman Aug. 8
    Timothy Hutton Aug. 16
    Jeremy Irons Sep. 19
    Joshua Jackson June 11
    Greg Kinnear June 17
    Kevin Kline Oct. 24
    Jeremy London Nov. 7
    Courtney Love July 9
    Peter MacNicol Apr. 10
    William H. Macy Mar. 13
    John Malkovich Dec. 9
    Julianna Margulies June 8
    Matthew McConaughey Nov. 4
    Dylan McDermott Oct. 26
    Frances McDormand June 23
    Jack Nicholson Apr. 22
    Edward Norton Aug. 18
    Gwyneth Paltrow Sep. 28
    Sean Penn Aug. 17
    Matthew Perry Aug. 19
    Michelle Pfeiffer Apr. 29
    Brad Pitt Dec. 18
    Natalie Portman June 9
    Aidan Quinn Mar. 8
    Giovanni Ribisi Mar. 31
    Julia Roberts Oct. 28
    Keri Russell Mar. 23
    Adam Sandler Sep. 9
    Susan Sarandon Oct. 4
    Rick Schroder Apr. 13
    Gary Sinise Mar. 17
    Kevin Spacey July 26
    Rider Strong Dec. 11
    Billy Bob Thornton Aug. 4
    Uma Thurman Apr. 29
    Skeet Ulrich Jan. 20
    Eddie Vedder Dec. 23
    Robin Williams July 21
    Kate Winslet Oct. 5
    Robin Wright Penn Apr. 8

    StartDate EndDate Birthstone
    January 1 January 31 garnet
    February 1 February 29 amethyst
    March 1 March 31 aquamarine
    April 1 April 30 diamond
    May 1 May 31 emerald
    June 1 June 30 pearl
    July 1 July 31 ruby
    August 1 August 31 peridot
    September 1 September 30 sapphire
    October 1 October 31 opal
    November 1 November 30 citrine
    December 1 December 31 turquoise

    StartDate EndDate Zodiac
    Feb. 19 Mar. 20 Pisces
    Mar. 21 Apr. 19 Aries
    Apr. 20 May 20. Tauras
    May 21 June 20 Gemini
    June 21 July 22 Cancer
    July 23 Aug. 22 Leo
    Aug. 23 Sep. 22 Virgo
    Sep. 23 Oct. 22 Libra
    Oct. 23 Nov. 21 Scorpio
    Nov. 22 Dec. 21 Sagittarius
    Dec. 22 Jan. 19 Capricorn
    Jan. 20 Feb. 18 Aquarius

    Query: Find all Capricorns


  • More complicated example (from text):
    Revisiting the exployee database, imagine that we want more detailed information such as the person job history within the company. Details should include the job title, a job identification code, the skill code associated with each job, the department, and the period of time duirng which the employee held that job.

    First approach (Fig. 9.4 of text):

    EmplIDNameAddressSSNum JobIDJobTitleSkillCodeDept StartDateTermDate
    25X15 Joe E. Baker 33 Nowhere St. 111223333 F5 Floor manager FM3 Sales 9-1-1998 9-30-1999
    25X15 Joe E. Baker 33 Nowhere St. 111223333 D7 Dept. head D2 Sales 10-1-1999 present
    34Y70 Cheryl H. Clark 563 Downtown Ave. 999009999 F5 Floor manager FM3 Sales 10-1-1998 present
    23Y34 G. Jerry Smith 1555 Circle Dr. 111005555 S25X Secretary T5 Personnel 3-1-1996 4-30-1998
    23Y34 G. Jerry Smith 1555 Circle Dr. 111005555 S25X Secretary T6 Accounting 5-1-1998 present
    . . . . . . . . . .
    . . . . . . . . . .
    . . . . . . . . . .

    Again, this suffers from the same pitfalls

  • Redundancy (such as Joe's address)

  • Deletions may cause loss of information. (if Joe quits, we would lose informaiton such as the Skill Code for job D7)


  • An alternative solution: Use three separate relations. (Fig. 9.5)

  • EMPLOYEE relation
    EmplIDNameAddressSS Num
    25X15 Joe E. Baker 33 Nowhere St. 111223333
    34Y70 Cheryl H. Clark 563 Downtown Ave. 999009999
    23Y34 G. Jerry Smith 1555 Circle Dr. 111005555
    . . . .
    . . . .
    . . . .

  • JOB relation
    JobIDJobTitleSkillCodeDept
    S25X Secretary T5 Personnel
    S26Z Secretary T6 Accounting
    F5 Floor manager FM3 Sales
    . . . .
    . . . .
    . . . .

  • ASSIGNMENT relation
    EmplIDJobIDStartDateTermDate
    23Y34 S25X 3-1-1996 4-30-1998
    34Y70 F5 10-1-1998 present
    23Y34 S25Z 5-1-1998 present
    . . . .
    . . . .
    . . . .
  • Query: Find all people in Accounting

  • A Flawed Design

    Imagine that you had the following relation:

    EmplIDJobTitleDept
    25X15 Floor manager Sales
    28Z25 Secretary Sales
    23Y34 Secretary Accounting

    and that you tried to replace it with a combination of the following two relations:

    EmplIDJobTitle
    25X15 Floor manager
    28Z25 Secretary
    23Y34 Secretary

    JobTitleDept
    Floor manager Sales
    Secretary Sales
    Secretary Accounting

    Is this new formation equivalent? No!

    Example: How do you determine the employees in the Sales department?


  • comp150 Class Page
    mhg@cs.luc.edu
    Last modified: Mon Mar 18 20:37:13 CST 2002