Often we want to represent data, e.g. characters or images, in binary form, that is, using only the symbols "0" and "1". Binary representation allows us to conveniently store, retrieve, and manipulate data with a computer. To work with data in binary form we must have a fixed way of encoding (representing) a given data stream. The set of all binary sequences used in a representation of some data is called a code. (Note that this has nothing to do with cryptology.) We usually refer to the data that we want to represent by bits as a source.
Let us consider a very practical example of these ideas. Let our source be a stream of English characters that we want to represent as bits, say, to store on a computer or send over the Internet. First we need to know the number of such characters, which is (traditionally) conveniently set to 128. The number 128 is obtained by summing the upper case characters (26), the lower case characters (26), the digits (10), brackets and punctuation (20), odd characters such as "&" (14), and control characters (32).
Obviously we need a unique representation of each of the 128 characters. This can be obtained, for example, by exhausting the 128 combinations that 7 concatenated bits give (2^7 = 128). Thus we have devised a 7-bit code. A well known 7-bit code is ASCII, short for "American Standard Code for Information Interchange". Adding a parity bit for error control to the ASCII code forms an 8-bit code. As an example, the representation of an "A" in ASCII is 1000001.
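As a quick illustration, here is a minimal Python sketch (not part of the original module) that produces the 7-bit ASCII pattern of a character and appends a parity bit; the choice of even parity is our assumption, since the text does not specify a parity convention.

    # A minimal sketch: 7-bit ASCII plus an (assumed) even-parity bit.

    def ascii7(ch):
        """Return the 7-bit ASCII representation of a character as a bit string."""
        code = ord(ch)
        assert code < 128, "not a 7-bit ASCII character"
        return format(code, "07b")

    def with_even_parity(bits):
        """Append a parity bit so the total number of 1s is even (assumed convention)."""
        parity = bits.count("1") % 2
        return bits + str(parity)

    print(ascii7("A"))                    # 1000001, as in the text
    print(with_even_parity(ascii7("A")))  # 10000010: two 1s already, so parity bit is 0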
Now, one can ask whether the 7-bit ASCII code is an optimal representation, in the sense of using, on average, the minimum number of bits to represent the English characters. We will return to this question later (in example 3).
When representing a source we want to use as few bits as possible, since fewer bits means that less disk space is required for storage and that transmission over the Internet is quicker. However, we do not want to use so few bits that the receiver cannot determine what was sent or stored.
So, for a given source, what is the minimal representation? Here we take the minimal representation to be the one that uses the minimum number of bits (on average) to encode the source without errors. According to Shannon's source coding theorem, for a source that produces statistically independent outcomes, the minimum average number of bits per symbol is the entropy of the source! (A classical example of a source that produces statistically independent outcomes is throwing a die.)
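To make the die example concrete, the following short Python sketch computes the entropy of a fair six-sided die; the function name and the uniform probabilities are ours, introduced only for illustration.

    # Sketch: Shannon entropy of a fair six-sided die.

    from math import log2

    def entropy(probabilities):
        """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
        return -sum(p * log2(p) for p in probabilities if p > 0)

    fair_die = [1 / 6] * 6
    print(entropy(fair_die))  # about 2.585 bits per throw

So no error-free code can use fewer than about 2.585 bits per throw on average to represent the outcomes of a fair die.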
"Average" indicates that the number of bits used for one symbol may differ from the number of bits used for another. For example, as opposed to ASCII coding, we might represent an "A" with 7 bits, but an "E" with only 3 bits. This also implies that when you receive a stream of symbols, the number of bits you receive per time unit, say per second, will not be exactly constant; averaged over a long period, however, the number of bits received grows proportionally to time, with the average number of bits per symbol as the proportionality constant.
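The following sketch illustrates how such an average code length is computed for a variable-length code; the codeword lengths and symbol probabilities below are made up for illustration and are not taken from any standard code.

    # Sketch: average length of a (hypothetical) variable-length code.

    code = {"E": "110", "A": "1000001"}  # assumed codewords of lengths 3 and 7
    freq = {"E": 0.7, "A": 0.3}          # assumed symbol probabilities

    avg_bits = sum(freq[s] * len(code[s]) for s in code)
    print(avg_bits)  # 0.7*3 + 0.3*7 = 4.2 bits per symbol on average

Because the frequent symbol "E" gets the shorter codeword, the average of 4.2 bits per symbol is below the 7 bits that a fixed-length code would spend on every symbol.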