Be end claime armed yes a bigged wenty for me fearabbag girl Humagine ther mightmarkling themany the scarecrow pass and I havely and lovery wine end at then only we pure never

many words appear, and many combinations of letters that might be words, but aren't quite. “Humagine” is suggestive, though it is not clear exactly what “mightmarkling” might mean. When m=4,

Water of everythinkies friends of the scarecrow no head we She time toto be well as somealthough to they would her been Them became the small directions and have a thing woodman

the vast majority of words are actual English, though conjunctions of words (such as “everythinkies”) are not uncommon. The output also begins to strongly reflect the text used to derive the probabilities. Since many four-letter combinations occur only once, the method has no choice but to continue spelling a longer word; this is why the “scarecrow” and the “woodman” figure prominently. For m=5 and above, the “random” output is recognizably English, and strongly dependent on the text used:

Four trouble and to taken until the bread hastened from its Back to you over the emeraldcity and her in toward the will Trodden and being she could soon and talk to travely lady
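The mechanism described above can be sketched in a few lines. The following Python fragment is only an illustration of the idea, not a transcription of textsim.m: the function names `build_table`, `generate`, and `forced_fraction` are made up for this sketch, and it assumes the input text is longer than m characters. It records which letter follows each m-letter sequence in the source text, draws the next letter from those recorded followers, and measures how many m-letter sequences leave the method no choice at all.

```python
import random
from collections import defaultdict


def build_table(text, m):
    """Map each m-letter sequence to the list of letters that follow it in text."""
    table = defaultdict(list)
    for i in range(len(text) - m):
        table[text[i:i + m]].append(text[i + m])
    return table


def generate(text, m, length, seed=None):
    """Generate `length` characters by sampling followers of the last m letters."""
    rng = random.Random(seed)
    table = build_table(text, m)
    start = rng.randrange(len(text) - m)
    out = text[start:start + m]
    while len(out) < length:
        followers = table.get(out[-m:])
        if not followers:
            # The sequence occurs only at the very end of the text:
            # restart from a random position (an assumption of this sketch).
            start = rng.randrange(len(text) - m)
            out += text[start:start + m]
            continue
        # Repeated followers stay in the list, so a letter is drawn
        # in proportion to how often it follows this sequence.
        out += rng.choice(followers)
    return out[:length]


def forced_fraction(text, m):
    """Fraction of distinct m-letter sequences with only one possible successor."""
    table = build_table(text, m)
    forced = sum(1 for f in table.values() if len(set(f)) == 1)
    return forced / len(table)
```

Calling `forced_fraction` on the same text for increasing m gives one way to see why the output tracks the source more and more closely: as more sequences have a single possible successor, the "random" choice is increasingly dictated by the text itself.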

Run the program textsim.m using the input file carroll.mat, which contains the text of Lewis Carroll's Through the Looking Glass, with m=1, 2, ..., 8. At what point does the output repeat large phrases from the input text?

Run the program textsim.m using the input file foreign.mat, which contains a book that is not in English. Looking at the output for various m, can you tell what language the input is? What is the smallest m (if any) at which it becomes obvious?

The following two problems may not appeal to everyone:

The program textsim.m operates at the level of letters and the probabilities of transition between successive sets of m-length letter sequences. Write an analogous program that operates at the level of words and the probabilities of transition between successive sets of m-length word sequences. Does your program generate plausible sounding phrases or sentences?

There is nothing about the technique of textsim.m that is inherently limited to dealing with text sequences. Consider a piece of (notated) music as a sequence of symbols, labelled so that each “C” note is 1, each “C♯” note is 2, each “D” note is 3, etc. Create a table of transition probabilities from a piece of music, and then generate “new” melodies in the same way that textsim.m generates “new” sentences. (Observe that this procedure can be automated using standard MIDI files as input.)
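As a possible starting point for the labelling step, the Python sketch below numbers the notes and tabulates transition probabilities for the simplest case, m=1. It rests on several assumptions that the text does not settle: the "etc." is taken to continue up the chromatic scale to “B” as 12, octaves and note durations are ignored, the sharp is written as `#`, and the short melody is invented purely for illustration. The names `NOTE_TO_LABEL` and `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed labelling: chromatic scale, C=1, C#=2, D=3, ..., B=12 (octaves ignored).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
NOTE_TO_LABEL = {name: i + 1 for i, name in enumerate(NOTE_NAMES)}


def transition_probabilities(labels):
    """Estimate P(next label | current label) from a sequence of labels (m=1)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(labels, labels[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


# A made-up melody, used only to exercise the functions above.
melody = ["C", "D", "E", "C", "C", "D", "E", "C"]
labels = [NOTE_TO_LABEL[n] for n in melody]
probs = transition_probabilities(labels)
```

Generating a “new” melody then amounts to repeatedly drawing the next label according to the row of `probs` for the current label; longer contexts (m greater than 1) would use tuples of labels as the keys.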

Because this method derives the multiletter probabilities directly from a text, there is no need to compile transition probabilities for other languages. Using Vergil's Aeneid (with m=3) gives

Aenere omnibus praeviscrimus habes ergemio nam inquae enies Media tibi troius antis igna volaesubilius ipsis dardatuli Cae sanguina fugis ampora auso magnum patrix quis ait longuin

which is not real Latin. Similarly,

Que todose remosotro enga tendo en guinada y ase aunque lo Se dicielos escubra la no fuertapare la paragales posa derse Y quija con figual se don que espedios tras tu pales del

Source:  OpenStax, Software receiver design. OpenStax CNX. Aug 13, 2013 Download for free at http://cnx.org/content/col11510/1.3