Numbers
Numbers are the most important type of data to represent digitally,
in part because the representations of other types of information
depend on representing numbers.
We will study the details of this next time...
Text
(pp. 36-37 [Br]; pp. 206-207 [DH])
Text is relatively easy to represent, because an alphabet is already a
discrete collection of symbols. We can encode text digitally by using
a different pattern of bits in place of each alphabet character of the
given text.
Example: Morse Code
[
alphabet,
MP3 sample,
Real Audio sample,
demonstration Applet,
another Applet
]
If all characters are encoded using the same number of bits, this is
a fixed-length encoding.
(Note: Morse Code is not fixed-length; commonly used letters were
assigned shorter codes)
Most important is that both the creator and the reader use the same
conventions for numbering the different symbols.
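A minimal sketch of a fixed-length encoding, assuming a made-up 5-bit numbering of the 26 upper-case letters (A = 0, B = 1, ...). The table and the names ALPHABET, encode and decode are illustrative only, not a real standard. Decoding works only because the reader uses the same table as the creator.

```python
# Hypothetical 5-bit fixed-length code: 'A' -> 00000, 'B' -> 00001, ..., 'Z' -> 11001.
# This numbering is an assumption made for illustration, not a real standard.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BITS_PER_CHAR = 5  # 2**5 = 32 patterns, enough for 26 letters


def encode(text):
    """Replace each letter with its 5-bit pattern."""
    return "".join(format(ALPHABET.index(ch), "05b") for ch in text)


def decode(bits):
    """Cut the bit string into 5-bit chunks and look each one up."""
    chunks = [bits[i:i + BITS_PER_CHAR] for i in range(0, len(bits), BITS_PER_CHAR)]
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)


message = "HELLO"
bits = encode(message)
print(bits)          # 25 bits: 5 letters x 5 bits each
print(decode(bits))  # recovers the original text
```

Because every character uses exactly 5 bits, the reader knows where each character ends without any separator. A variable-length code such as Morse Code needs pauses (or a similar device) for that.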
ASCII (American Standard Code for Information Interchange)
Standards were set up for representing characters with 7-bit patterns,
which can represent 2^7 = 128 different characters. This covers
all upper- and lower-case letters, punctuation, digits, and several
"control" characters. Using 8-bit patterns instead fits
naturally into a byte and can represent an additional 128
characters, though no consistent standard existed for these
additional characters.
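As a quick illustration, Python's built-in ord() and chr() convert between a character and its code number, and format() can show that number as a 7-bit pattern. The variable name ascii_bits is just a label chosen for this sketch.

```python
# ord() gives the code number of a character; chr() goes the other way.
for ch in "Hi!":
    code = ord(ch)
    print(ch, code, format(code, "07b"))  # character, number, 7-bit pattern

# The same text as a list of 7-bit patterns
ascii_bits = [format(ord(ch), "07b") for ch in "Hi!"]
print(ascii_bits)

print(chr(65))   # the character numbered 65
print(2 ** 7)    # how many different 7-bit patterns exist
```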
UNICODE
A more recently developed standard that originally used 16-bit patterns.
This allows 65,536 different characters, and thus was developed to
support other languages such as Chinese and Japanese, as
well as common typography from other disciplines,
such as mathematics, music, etc.
ISO (International Organization for Standardization)
could develop 24-bit patterns to represent symbols
(almost 17 million of them, potentially), or even 32-bit patterns (over 4 billion of them).
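The number of characters available with n-bit patterns is 2^n. A small sketch of that arithmetic, followed by the code numbers Python reports for a few characters outside the ASCII range. The name pattern_counts and the sample characters are chosen here for illustration.

```python
# Number of distinct patterns for several pattern lengths
pattern_counts = {n_bits: 2 ** n_bits for n_bits in (7, 8, 16, 24, 32)}
for n_bits, count in pattern_counts.items():
    print(f"{n_bits:2d} bits -> {count:,} patterns")

# Code numbers of a Latin letter, a Greek letter and a Chinese character
for ch in "Aπ中":
    print(ch, ord(ch))
```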
Audio
(p. 64 [Br])
The first issue is converting a physical medium (sound waves) from
analog to digital and vice versa. We can digitize sound by sampling
the amplitude of the sound waves at regular intervals.
To achieve acceptable quality for human perception, there are two
issues:
Must sample often enough. (Audio CDs: 44100 samples per second)
Amplitude values are represented digitally; must decide how many
bits to use for the range of values. (Audio CDs: 16 bits per channel)
Based on the above, 1 second of music in stereo requires about 1.4 million bits
(44100 samples x 16 bits x 2 channels = 1,411,200 bits).
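A rough sketch of the two decisions above, using deliberately tiny numbers so the result is readable: a hypothetical 1 Hz tone, 8 samples per second and 4 bits per sample. All constant names are illustrative. The last line repeats the audio CD arithmetic.

```python
import math

SAMPLE_RATE = 8      # samples per second (tiny, for illustration; audio CDs use 44100)
FREQUENCY = 1.0      # hypothetical 1 Hz tone
BITS = 4             # bits per sample (audio CDs use 16)
LEVELS = 2 ** BITS   # number of distinct amplitude values that fit in BITS bits

samples = []
for n in range(SAMPLE_RATE):                           # one second of sound
    t = n / SAMPLE_RATE                                # time of the n-th sample
    amplitude = math.sin(2 * math.pi * FREQUENCY * t)  # analog value in [-1, 1]
    level = round((amplitude + 1) / 2 * (LEVELS - 1))  # nearest available level
    samples.append(level)

print(samples)                                 # one small integer per sample
print([format(s, "04b") for s in samples])     # the same samples as 4-bit patterns

# Audio CD: samples per second x bits per sample x 2 channels
cd_bits_per_second = 44100 * 16 * 2
print(cd_bits_per_second)
```

Raising SAMPLE_RATE captures faster changes in the wave, and raising BITS reduces the rounding error in each amplitude. Both increase the number of bits stored per second.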
Can try to use compression techniques to reduce the space
requirements.
e.g., MP3 (MPEG-1 Audio Layer 3) can achieve 12:1 compression ratios.
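To get a feel for what a 12:1 ratio means, here is the arithmetic for a hypothetical 3-minute song at audio CD quality. The song length is an assumption, and the ratio is treated as exact even though real ratios vary with the music and the settings.

```python
BITS_PER_SECOND = 44100 * 16 * 2   # uncompressed CD-quality stereo
SONG_SECONDS = 180                 # hypothetical 3-minute song
COMPRESSION_RATIO = 12             # the 12:1 figure, treated as exact here

raw_bytes = BITS_PER_SECOND * SONG_SECONDS // 8        # 8 bits per byte
compressed_bytes = raw_bytes // COMPRESSION_RATIO

print(f"uncompressed: {raw_bytes / 1_000_000:.1f} million bytes")
print(f"compressed:   {compressed_bytes / 1_000_000:.1f} million bytes")
```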
Images
(pp. 42-43, 63-65 [Br])
Again, images are inherently a continuous (analog) medium. Two main
approaches to digitizing:
Bitmap Techniques (two-dimensional array of pixels)
BMP (primitive file format used by Microsoft)
TIFF (Tagged Image File Format)
GIF (Graphics Interchange Format)
JPEG (Joint Photographic Experts Group)
Vector Techniques (mathematical representations of curves and lines)
Advantage is that it is independent of any particular display scale.
EPS (Encapsulated PostScript)
PICT (Macintosh's file format)
Note: Many of the above formats also use compression to reduce storage
space.
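A small sketch of the difference between the two approaches, under simplifying assumptions: the "vector" image is just one line segment given by its endpoints, and rasterize is a hypothetical helper that turns it into a bitmap of a chosen size. Real formats such as EPS are far richer than this.

```python
# Vector form: a line segment described by its endpoints in the unit square.
# (Hypothetical representation chosen for illustration.)
segment = ((0.0, 0.0), (1.0, 1.0))


def rasterize(segment, size):
    """Turn the vector description into a size x size bitmap of 0/1 pixels."""
    (x0, y0), (x1, y1) = segment
    bitmap = [[0] * size for _ in range(size)]
    steps = 2 * size                      # sample points along the line
    for i in range(steps + 1):
        t = i / steps
        col = min(int((x0 + t * (x1 - x0)) * size), size - 1)
        row = min(int((y0 + t * (y1 - y0)) * size), size - 1)
        bitmap[row][col] = 1
    return bitmap


for size in (4, 8):                       # same description, two display scales
    for row in rasterize(segment, size):
        print("".join("#" if pixel else "." for pixel in row))
    print()
```

The vector description stays the same two endpoints at any scale, while the bitmap needs size x size pixels and must be recomputed (or stretched, with loss of quality) when the display scale changes.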
Video
(pp. 64-65 [Br])
The audio track can use the same techniques as audio-only recordings.
Video track can save space by taking advantage both of image
compression and temporal similarity from one frame to the next.
MPEG (Moving Picture Experts Group), MPEG-2, MPEG-3
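The idea of temporal similarity can be sketched with two tiny made-up grayscale frames: instead of storing the second frame in full, store only the pixels that changed. This is a simplified illustration, not how MPEG actually encodes video, and the names diff and apply_diff are hypothetical.

```python
# Two tiny hypothetical 3x4 grayscale frames; only one pixel changes.
frame1 = [[10, 10, 10, 10],
          [10, 50, 50, 10],
          [10, 10, 10, 10]]
frame2 = [[10, 10, 10, 10],
          [10, 50, 60, 10],
          [10, 10, 10, 10]]


def diff(prev, curr):
    """List (row, col, new_value) for every pixel that changed."""
    return [(r, c, curr[r][c])
            for r in range(len(curr))
            for c in range(len(curr[r]))
            if curr[r][c] != prev[r][c]]


def apply_diff(prev, changes):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = [row[:] for row in prev]      # copy so the previous frame is untouched
    for r, c, value in changes:
        frame[r][c] = value
    return frame


changes = diff(frame1, frame2)
print(changes)                                 # far fewer entries than 12 pixels
print(apply_diff(frame1, changes) == frame2)   # the frame can be rebuilt exactly
```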
Encryption
Often we want to save or transmit information digitally so that the
intended recipient can reconstruct the original information but an
unintended interceptor cannot.
For those interested, read second full paragraph on
p. 159 [Br], though we will not cover this topic in this course.
Proprietary Formats
Companies or industries can develop their own conventions
for representing information digitally, when saved to a file.
Sometimes, formats are chosen for efficiency; sometimes as a form of
encryption.
Microsoft Word files
DVD Movie Formats