Be end claime armed yes a bigged wenty for me
fearabbag girl Humagine ther mightmarkling themany the scarecrow pass and I havely and lovery
wine end at then only we pure never
many words appear, and many combinations of letters that
might be words, but aren't quite. “Humagine” is suggestive, though it is not clear exactly what “mightmarkling” might mean.
When m=4,
Water of everythinkies friends of the scarecrow
no head we She time toto be well as somealthough to they would her been Them became the
small directions and have a thing woodman
the vast majority of words are actual English, though conjunctions of
words (such as “everythinkies”) are not uncommon. The output also begins to strongly
reflect the text used to derive the probabilities. Since many four-letter combinations occur only once,
the method has no choice but to continue spelling a longer word; this is why the “scarecrow”
and the “woodman” figure prominently. For m=5 and above,
the “random” output is recognizably English, and strongly dependent on the text used:
Four trouble and to taken until the bread
hastened from its Back to you over the emeraldcity and her in toward the will Trodden and
being she could soon and talk to travely lady
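The letter-level procedure described above can be sketched in a few lines. What follows is a minimal Python illustration, not the textsim.m program itself: the function names `build_table`, `generate`, and `forced_fraction`, the restart rule for a dead-end context, and the short sample string are all assumptions made for this example. The last function counts how many m-letter contexts have only one possible next letter, which is the effect that makes the output track the source text as m grows.

```python
import random
from collections import defaultdict

def build_table(text, m):
    """Map each m-letter context to the list of letters that follow it.

    Followers are stored with repetition, so a uniform draw from the list
    samples each letter in proportion to how often it followed the context.
    """
    table = defaultdict(list)
    for i in range(len(text) - m):
        table[text[i:i + m]].append(text[i + m])
    return table

def generate(text, m, length, seed=0):
    """Generate `length` characters from an order-m letter model of `text`."""
    rng = random.Random(seed)
    table = build_table(text, m)
    start = rng.randrange(len(text) - m)
    out = text[start:start + m]
    while len(out) < length:
        followers = table.get(out[-m:])
        if not followers:
            # Assumed rule: the context occurs only at the very end of the
            # text, so restart from a randomly chosen context.
            start = rng.randrange(len(text) - m)
            out += text[start:start + m]
            continue
        out += rng.choice(followers)
    return out[:length]

def forced_fraction(text, m):
    """Fraction of m-letter contexts that have exactly one distinct follower."""
    table = build_table(text, m)
    forced = sum(1 for f in table.values() if len(set(f)) == 1)
    return forced / len(table)

# Hypothetical sample text, far shorter than a real book.
sample = "the scarecrow and the tin woodman walked with the lion toward the emerald city "
for m in (1, 2, 4):
    print(m, round(forced_fraction(sample, m), 2), repr(generate(sample, m, 40)))
```

On a full-length book, the forced fraction would be expected to rise with m, which is one way to quantify when the output starts copying the input.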
Run the program textsim.m using the input file carroll.mat, which contains the text of Lewis Carroll's
Through the Looking Glass, with m=1, 2, ..., 8.
At what point does the output repeat large phrases from the input text?
Run the program textsim.m using the input file foreign.mat, which contains a book that is not in English.
Looking at the output for various m, can you tell
what language the input is in? What is the smallest m (if any) at which it becomes obvious?
The following two problems may not appeal to everyone:
The program textsim.m operates at the level of letters
and the probabilities of transition between successive sets of m-length letter sequences. Write an analogous
program that operates at the level of words and the probabilities of transition between successive sets of m-length
word sequences. Does your program generate plausible sounding phrases or sentences?
There is nothing about the technique of textsim.m that
is inherently limited to dealing with text sequences. Consider a piece of (notated) music as a sequence of
symbols, labelled so that each “C” note is 1, each “C♯” note is 2, each “D” note is 3, etc. Create a table of transition probabilities from a piece of music, and then
generate “new” melodies in the same way that textsim.m
generates “new” sentences. (Observe that this procedure can be
automated using standard MIDI files as input.)
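As a starting point for the two problems above, the same idea can be written so that it works on any sequence of symbols, whether words or note numbers. The sketch below is a hedged Python illustration under stated assumptions: the names `transition_probabilities` and `generate_sequence`, the rule of restarting from the opening context at a dead end, and the tiny example melody are all invented for this example, and reading notes from a MIDI file is left out.

```python
import random
from collections import Counter, defaultdict

def transition_probabilities(symbols, m):
    """Estimate P(next symbol | previous m symbols) from a symbol sequence."""
    counts = defaultdict(Counter)
    for i in range(len(symbols) - m):
        context = tuple(symbols[i:i + m])
        counts[context][symbols[i + m]] += 1
    return {
        ctx: {s: n / sum(c.values()) for s, n in c.items()}
        for ctx, c in counts.items()
    }

def generate_sequence(symbols, m, length, seed=0):
    """Generate a new sequence by sampling from the transition table."""
    rng = random.Random(seed)
    probs = transition_probabilities(symbols, m)
    out = list(symbols[:m])
    while len(out) < length:
        dist = probs.get(tuple(out[-m:]))
        if dist is None:
            # Assumed rule: a context seen only at the end of the input has
            # no follower, so restart from the opening context.
            out.extend(symbols[:m])
            continue
        choices, weights = zip(*dist.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return out[:length]

# Hypothetical melody using the numbering in the text: C=1, C sharp=2, D=3, ...
melody = [1, 3, 5, 1, 3, 5, 6, 5, 3, 1, 3, 5, 8, 6, 5, 3, 1]
print(generate_sequence(melody, 2, 16))

# The same functions accept a list of words instead of note numbers.
words = "the scarecrow and the woodman and the lion and the scarecrow".split()
print(" ".join(generate_sequence(words, 1, 10)))
```

Using tuples of symbols as dictionary keys is the design choice that lets one table serve letters, words, and notes alike.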
Because this method derives the multiletter probabilities directly
from a text, there is no need to compile transition probabilities for other languages. Using Vergil's
Aeneid (with m=3) gives
Aenere omnibus praeviscrimus habes ergemio nam
inquae enies Media tibi troius antis igna volaesubilius ipsis dardatuli Cae sanguina fugis
ampora auso magnum patrix quis ait longuin
which is not real Latin. Similarly,
Que todose remosotro enga tendo en guinada y
ase aunque lo Se dicielos escubra la no fuertapare la paragales posa derse Y quija con figual
se don que espedios tras tu pales del