Lecture Notes 03 (22 January 2002)

Bits, Memory and Information


Overall Reading
Brookshear: Ch. 1.1-1.4 and pp. 63-65
Decker/Hirshfield: pp. 206-207

Outline:

  • Overview
  • Analog vs. Digital (Ch. 1.1 [Br])
  • a "bit" (binary digit) of information
  • Orders of Magnitude
  • Data Storage (Ch. 1.2-1.3 [Br])
  • Main Memory (Ch. 1.2 [Br])
  • Mass Storage (Ch. 1.3 [Br])
  • Data Representation (Ch. 1.4, pp. 63-65 [Br]; pp. 206-207 [DH])
  • Numbers (next lecture)
  • Text (pp. 36-37 [Br]; pp. 206-207 [DH])
  • Audio (p. 64 [Br])
  • Images (pp. 42-43, 63-65 [Br])
  • Video (pp. 64-65 [Br])
  • Encryption
  • Proprietary Formats

  • Overview

  • Analog vs. Digital
    Many traditional properties of the physical world are analog, measured or transmitted using a continuous scale (e.g. volume, distance, weight).

    However, a unifying feature of modern computing systems is that they represent information using physical means which generally can be in one of a fixed number of distinct states. We refer to such representations as digital.

    In particular, almost all modern systems rely on a physical means with exactly one of two distinct states. Such a two-state mechanism can be manifested by means of:

  • hand gestures ("thumbs up" vs. "thumbs down")
  • flag and flagpole
  • dots and dashes in Morse Code.
  • light switch with physical circuit opened or closed.
  • "cores" (1960s)
    Magnetic rings threaded on wires. Electromagnetism could be used to set the field in positive/negative direction.
  • "capacitor"
    two small metallic plates with a small distance between them. Can be charged or discharged.
  • "flip/flop"
    electronic circuit with output as high/low voltage
    (more on these circuits later in the course)
  • "quantum bit"
    orientation of electron spin in an atom
    (though issue remains, how do you read value?)

  • Each of the above has a varying level of power usage, cost, and volatility.
  • Core will keep its state (the direction of its magnetic field) even after power is shut off.
  • Flip-flop will lose its data when power is turned off.
  • Capacitors use such small charges that they sometimes lose their charge while running, unless recharged regularly
    (this is called "dynamic memory").

  • A "bit" (binary digit) of information (part of Ch. 1.1 [Br])

    No matter which of the physical means is being used, we think of such a two-state system as an abstraction called the "bit" (binary digit).

    Often we think of a bit as

  • 'on' or 'off'
  • 'open' or 'closed'
  • '0' or '1'
  • 'true' or 'false'
  • Even when the true physical mechanism may differ.


  • Orders of Magnitude

    So, one bit can be in either of two distinct states.

    If you look at two distinct bits together, you will find that the pair can be set in one of four distinct states (00, 01, 10, 11).

    If you look at three distinct bits, you can set them in one of eight distinct states.

    In general, a series of n bits can be set in any of 2^n distinct patterns.
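
    The counting argument above can be checked by listing the patterns directly. The following is a minimal sketch; the function name bit_patterns is made up for illustration and is not from the readings.

```python
from itertools import product

def bit_patterns(n):
    """Return every distinct pattern of n bits as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# Each extra bit doubles the number of patterns: 2, 4, 8, ...
for n in range(1, 4):
    patterns = bit_patterns(n)
    print(n, len(patterns), patterns)
```

    For n = 2 this lists 00, 01, 10, 11, matching the four states described above.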

    Common quantities:

    1 byte           = 8 bits       (thus 2^8 = 256 distinct settings)
    1 Kilobyte (KB)  = 2^10 bytes   = 1024 bytes
    1 Megabyte (MB)  = 2^10 KB      = 1048576 bytes
    1 Gigabyte (GB)  = 2^10 MB      = 1 billion+ bytes
    1 Terabyte (TB)  = 2^10 GB      = 1 trillion+ bytes
    1 Petabyte (PB)  = 2^10 TB      = 2^50 bytes
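
    Since each unit is 2^10 times the previous one, the exact byte counts can be computed rather than memorized. A small sketch (the names UNITS and unit_sizes are illustrative only):

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]

def unit_sizes():
    """Map each unit name to its size in bytes, each 2**10 times the previous."""
    return {name: 2 ** (10 * (i + 1)) for i, name in enumerate(UNITS)}

for name, size in unit_sizes().items():
    print(f"1 {name} = {size:,} bytes")
```

    The output shows why a Gigabyte is "1 billion+" bytes: 2^30 is somewhat larger than 10^9.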

  • Data Storage (Ch. 1.2-1.3 [Br])

  • Main Memory (Ch. 1.2 [Br])
    Main memory is a collection of cells (or 'words').
    Cells may be one byte or several bytes, depending on the design of the machine.

    Each cell is assigned a unique name, called its "address"
    (e.g., cell #0, #1, #2, #3, ...)

    RAM == Random Access Memory
    Allows "random access" ('arbitrary' access is more fitting term)
    (Note difference between DVD vs. VHS, or CD vs. Audio Tape)

    Memory might be "read-only" or may allow both read/write.
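
    One way to picture addressed cells and "arbitrary" access is a tiny simulated memory. This is only a sketch under assumptions: one-byte cells, eight of them, and made-up helper names read and write. Real hardware is of course not a Python object.

```python
memory = bytearray(8)  # 8 one-byte cells, addresses 0..7, all initially 0

def write(address, value):
    """Store a value (0-255) in the cell with the given address."""
    memory[address] = value

def read(address):
    """Return the value held in the cell with the given address."""
    return memory[address]

# Cells can be reached in any order, without stepping through the others.
write(5, 200)
write(0, 17)
print(read(5), read(0), read(3))
```

    Reaching cell #5 takes no longer than reaching cell #0, which is the point of the DVD vs. VHS comparison.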


  • Mass Storage (Ch. 1.3 [Br])

  • Main Memory (perhaps up to 1MB)
    Generally, due to the need for speed, main memory is based on a collection of electronic circuits.
    Downside is expense, size and that memory is lost without power supply.

  • Magnetic Disks (1.44MB to 20GB)
    Single or multiple magnetic disks mounted on a common spindle.
    Two degrees of motion:
  • - read/write heads move in/out in distance from center.
  • - disks spin so that each part of disk can be reached by heads.
  • A Track is a circle of a fixed radius.
    A Sector is an arc of that circle of a given size
    A Cylinder is a collection of tracks, one per disk, all at the same radius.

    Relevant times:

  • - seek time: time to move heads from one track to another
  • - rotation delay: average time for the desired data to rotate around to the heads (about the time to rotate the disk halfway)
  • - transfer rate: rate at which data can be transferred.
  • Essentially, disks are a bit slow to start up, though once they get going they can transfer data relatively quickly.
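
    The three times above can be combined into a rough estimate of how long one read takes. The drive parameters below (10 ms seek, 6000 rpm, 20 MB/s) are hypothetical numbers chosen for easy arithmetic, not figures from the readings, and the function name is made up.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_mb):
    """Estimate one read: seek time + half a rotation + transfer time."""
    full_rotation_ms = 60_000 / rpm          # one full turn, in milliseconds
    rotation_delay_ms = full_rotation_ms / 2  # on average, half a turn
    transfer_ms = request_mb / transfer_mb_per_s * 1000
    return seek_ms + rotation_delay_ms + transfer_ms

# Hypothetical drive: 10 ms seek, 6000 rpm, 20 MB/s transfer rate.
small = access_time_ms(10, 6000, 20, 0.004)  # a 4 KB request
large = access_time_ms(10, 6000, 20, 4)      # a 4 MB request
print(round(small, 1), round(large, 1))
```

    Under these assumptions the small request is dominated by the start-up cost (seek plus rotation), while the large one is dominated by the transfer itself.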

  • Compact Disks -- hold 600+ MB
    reflective material, with irregularities read by laser beam.

    Has single track that spirals.
    (to transfer data at constant rate, disk must actually be rotated at varying speeds)

  • DVD (Digital Versatile Disk) -- hold approximately 10 GB
    Similar to a CD, but able to support multiple, independent layers of 'depth' on the same surface.

  • Magnetic Tape
    reel-to-reel. Old fashioned but high-capacity.

    Downside is that you have to fast-forward/rewind to get to a desired part of the tape.


  • Data Representation (Ch. 1.4, pp. 63-65 [Br]; pp. 203-207 [DH])

  • Numbers
    Numbers are the most important type of data to represent digitally (in part because other types of information depend on representing numbers).

    We will study the details of this next time...

  • Text (pp. 36-37 [Br]; pp. 206-207 [DH])

    Text is relatively easy to represent, because an alphabet is already a discrete collection of symbols. We can encode text digitally by using a different pattern of bits in place of each alphabet character of the given text.

    Example: Morse Code

    If all characters are encoded using the same number of bits, this is a fixed-length encoding.
    (Note: Morse Code is not fixed-length; commonly used letters were assigned shorter codes)

    Most important is that both the creator and the reader use the same conventions for numbering the different symbols.
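
    The difference between a variable-length and a fixed-length encoding can be sketched with a handful of letters. The Morse entries below are standard International Morse Code; the 3-bit FIXED table is invented purely for illustration, as are the function names.

```python
# A few International Morse Code letters (variable-length codes).
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "S": "...", "O": "---"}

# A made-up fixed-length code: every letter gets exactly 3 bits.
FIXED = {letter: format(i, "03b") for i, letter in enumerate(sorted(MORSE))}

def encode(text, table, sep=""):
    """Replace each character by its code from the given table."""
    return sep.join(table[ch] for ch in text)

def decode_fixed(bits, width=3):
    """Decode by cutting the bit string into equal-width pieces."""
    reverse = {code: letter for letter, code in FIXED.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))

print(encode("SOS", MORSE, sep=" "))
print(encode("SEAT", FIXED))
print(decode_fixed(encode("SEAT", FIXED)))
```

    Decoding only works because decode_fixed uses the same table as encode, which is the "same conventions" requirement stated above. Note also that the Morse output needs a separator between letters, while the fixed-length output does not.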

  • ASCII (American Standard Code for Information Interchange)
    Standards were set up for representing characters with 7-bit patterns, which can represent 2^7 = 128 different characters. This covers all upper/lower case letters, punctuation, digits and several "control" characters. Using 8-bit patterns instead fits naturally into a byte and can represent an additional 128 characters, though no consistent standard existed for these additional characters.

  • UNICODE
    More recently developed standard uses 16-bit patterns. This allows 65536 different characters, and thus was developed to support all other languages such as Chinese and Japanese, as well as common typography from other disciplines, such as mathematics, music, etc.

  • ISO (International Organization for Standardization)
    could develop 24-bit patterns to represent symbols (nearly 17 million of them, potentially), or even 32-bit patterns (over 4 billion of them).
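
    The 7-bit ASCII patterns can be inspected with Python's built-in ord, which returns a character's numeric code. This is a small sketch; the helper name ascii_bits is made up. The last line prints how many distinct patterns 7, 16, 24 and 32 bits allow.

```python
def ascii_bits(ch):
    """Return the 7-bit ASCII pattern for a character."""
    code = ord(ch)
    if code >= 2 ** 7:
        raise ValueError("not an ASCII character")
    return format(code, "07b")

for ch in "Hi!":
    print(ch, ord(ch), ascii_bits(ch))

# Number of distinct patterns for 7-, 16-, 24- and 32-bit codes.
print(2 ** 7, 2 ** 16, 2 ** 24, 2 ** 32)
```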
  • Audio (p. 64 [Br])
    First issue is converting a physical medium (sound waves) from analog to digital and vice versa. Can digitize sound by sampling the amplitude of the sound waves at regular intervals.

    To achieve acceptable quality for human perception, there are two issues:

  • Must sample often enough. (Audio CDs: 44100 samples per second)
  • Amplitude values are represented digitally; must decide how many bits to use for range of values. (Audio CDs: 16 bits per sample, per channel)
  • Based on above, 1 second of music in stereo requires over 1 million bits (44100 x 16 x 2 = 1,411,200).

    Can try to use compression techniques to reduce the space requirements.
    e.g., MP3 (MPEG-1 Audio Layer 3) can achieve 12:1 compression ratios.
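
    The storage arithmetic for CD audio can be written out as follows. The sample rate, bit depth and 12:1 ratio come from the notes above; the three-minute song length is just an assumed example, and the names are illustrative.

```python
SAMPLES_PER_SECOND = 44_100  # Audio CD sampling rate
BITS_PER_SAMPLE = 16         # per channel
CHANNELS = 2                 # stereo

def audio_bits(seconds):
    """Bits needed for uncompressed CD-quality stereo audio."""
    return SAMPLES_PER_SECOND * BITS_PER_SAMPLE * CHANNELS * seconds

one_second = audio_bits(1)
three_minutes_mb = audio_bits(180) / 8 / 2 ** 20  # bits -> bytes -> MB
compressed_mb = three_minutes_mb / 12             # assuming a 12:1 ratio

print(one_second)
print(round(three_minutes_mb, 1), round(compressed_mb, 1))
```

    So an assumed three-minute song takes roughly 30 MB uncompressed, and a few MB at a 12:1 ratio.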

  • Images (pp. 42-43, 63-65 [Br])
    Again, images are inherently a continuous (analog) medium. Two main approaches to digitizing:

  • Bitmap Techniques (two-dimensional array of pixels)
  • BMP (primitive file format used by Microsoft)
  • TIFF (Tagged Image File Format)
  • GIF (Graphics Interchange Format)
  • JPEG (Joint Photographic Experts Group)
  • Vector Techniques (mathematical representations of curves and lines)
    Advantage is that it is independent of any particular display scale.
  • EPS (Encapsulated PostScript)
  • PICT (Macintosh's file format)
  • Note: Many of the above formats also use compression to reduce storage space.
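
    The contrast between the two approaches can be sketched with a single line segment. Here the "vector" form is just two endpoints in unit coordinates, and a made-up rasterize function turns it into a bitmap at whatever size is requested. This is a toy illustration, not how EPS or any of the listed formats actually store data.

```python
LINE = ((0.0, 0.0), (1.0, 1.0))  # vector form: two endpoints, no pixels

def rasterize(line, size):
    """Render the line into a size x size bitmap (list of rows of 0/1)."""
    (x0, y0), (x1, y1) = line
    bitmap = [[0] * size for _ in range(size)]
    steps = size * 2  # sample the line finely enough to hit every cell
    for i in range(steps + 1):
        t = i / steps
        col = min(int((x0 + t * (x1 - x0)) * size), size - 1)
        row = min(int((y0 + t * (y1 - y0)) * size), size - 1)
        bitmap[row][col] = 1
    return bitmap

def show(bitmap):
    """Draw the bitmap using '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if p else "." for p in row) for row in bitmap)

print(show(rasterize(LINE, 4)))
print(show(rasterize(LINE, 8)))
```

    The same two endpoints produce a clean image at either size, whereas the bitmap needs 16 pixel values at one scale and 64 at the other.
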
  • Video (pp. 64-65 [Br])
    Audio track can use similar techniques as audio-only.

    Video track can save space by taking advantage both of image compression and temporal similarity from one frame to the next.

  • MPEG (Moving Picture Experts Group), MPEG-2, MPEG-3
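
    The idea of exploiting temporal similarity can be sketched by storing only the pixels that change from one frame to the next. This is a deliberately simplified toy (frames are short lists of pixel values, and the function names are made up); the actual MPEG techniques are considerably more elaborate.

```python
def frame_delta(previous, current):
    """List (index, new_value) for each pixel that changed between frames."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [0, 0, 5, 5, 0, 0, 0, 0]
frame2 = [0, 0, 0, 5, 5, 0, 0, 0]  # the bright patch moved one pixel right

delta = frame_delta(frame1, frame2)
print(delta)
print(apply_delta(frame1, delta) == frame2)
```

    Here only two of the eight pixels need to be stored for the second frame, since the rest are unchanged.
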
  • Encryption
    Often we want to save or transmit information digitally so that an intended recipient can reconstruct the original information, but an unintended interceptor cannot.

    For those interested, read second full paragraph on p. 159 [Br], though we will not cover this topic in this course.

  • Proprietary Formats
    Companies or industries can develop their own conventions for representing information digitally, when saved to a file. Sometimes, formats are chosen for efficiency; sometimes as a form of encryption.
  • Microsoft Word files
  • DVD Movie Formats

  • comp150 Class Page
    mhg@cs.luc.edu
    Last modified: 22 January 2002