Using full-scale Wordcorr to guide Sherlock Holmes's thinking.

Data into Wordcorr

Putting together a Holmesian story was fun. It didn't require much real work, because I already had a Wordcorr collection, JG-SulSel12, that included the three languages talked about.

To begin, I used the Connexions tutorial Installing Wordcorr to install Wordcorr on a new XP laptop, a Lenovo T43 ThinkPad. I had written other parts of these tutorials on one of the Toshiba laptops used to create Wordcorr itself; but after three and a half years of pretty intense work its hard disk was going bad.

At an early stage in the development of Wordcorr, I had already imported some WordSurv data into Wordcorr: the SulSel (Indonesian abbreviation for South Sulawesi) data on twelve languages of the region. Then I exported it to an XML file in order to test both operations. For fifteen years before that I had kept the same data under an old program, WordSurv 1, which had preserved the data for me but could handle neither International Phonetic Alphabet (IPA) symbols nor comparative analysis.

So I already had a Wordcorr XML file containing the data and some earlier analyses. Before I imported it to the new computer, I had already verified the correctness of most of the IPA transcriptions on the older computer, and edited the faulty ones using Wordcorr's intuitive approach to typing IPA.

The Holmes view

Within the Wordcorr collection that I was doing serious work on, I put up a special view called "Holmes" that showed only the three languages I was focusing on and left the other nine out. (Wordcorr never destroys or loses data; but any new view can be set to bypass some varieties.) I set a threshold value of 60% for the Holmes view, to make sure that all correspondence sets contained information from at least two of the three languages.
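
The arithmetic behind that 60% figure can be sketched in a few lines of Python. This is only an illustration, and it assumes that the threshold means the minimum share of the varieties in a view that must contribute a form to a correspondence set; that reading is inferred from the paragraph above, not taken from Wordcorr's documentation. The function names are hypothetical and are not part of Wordcorr.

```python
def min_varieties_required(num_varieties: int, threshold_percent: int) -> int:
    """Smallest number of varieties whose share reaches the threshold.

    Assumption: the threshold is a minimum percentage of the varieties
    in the view. Integer ceiling division avoids floating-point rounding.
    """
    return -(-num_varieties * threshold_percent // 100)


def meets_threshold(attested: int, num_varieties: int, threshold_percent: int) -> bool:
    """True if `attested` out of `num_varieties` reaches the threshold."""
    return attested * 100 >= threshold_percent * num_varieties


# The Holmes view: three varieties, 60% threshold.
print(min_varieties_required(3, 60))   # 2
print(meets_threshold(1, 3, 60))       # False: 1 of 3 is about 33%
print(meets_threshold(2, 3, 60))       # True: 2 of 3 is about 67%
```

Under that assumption, one language out of three falls short of 60% and two out of three clears it, which is why the setting guarantees at least two languages per correspondence set.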

Then I went through Wordcorr's basic Annotate-Tabulate-Refine cycle for the first hundred entries. You'll hear enough about that cycle later, so I'll resist getting you bogged down in the details here. It didn't take very long: a few entries one day, a couple of dozen the next, whenever other projects made me sleepy.

Finally I invoked the Summarize Evidence function from the Refine panel, giving it a cutoff of 0.0 on Frantz's measure of strength in order to filter out reconstructions whose component correspondence sets were only weakly attested.

Pinpointing the relevant data

At that point I settled down to scrutinize the patterns that were emerging. Starting with the best attested correspondence sets (those accompanied in the reconstructions only by other well attested correspondence sets), in about three hours I identified which forms were worth having Holmes call attention to.

Retrogression

Since the International Phonetic Association was just getting going around 1888, I retranscribed the raw data using the phonetic alphabet that Henry Sweet in London (the prototype for Shaw's Henry Higgins in Pygmalion, later My Fair Lady) and his colleagues in Paris were discussing at that time, with a little fudging on details they hadn't gotten around to yet. The IPA has been upgraded several times since then, but its essential character has changed little.

While I was doing the real linguistics, I also poked around in Wikipedia.org for all kinds of historical, political, computational, linguistic, and literary information, besides reading through a collection of A. Conan Doyle's Holmes stories, paying special attention to their style. I found a suitable MacGuffin that got a sorry excuse for a plot going: the languages it was convenient for me to use are spoken just west of the Spice Islands, and British boiled mutton seems to have been even blander then than it is now.

And I hope you enjoyed reading it as much as I enjoyed writing it. Maybe you saw something there about language that you hadn't noticed before.





Source:  OpenStax, Comparative phonology using wordcorr. OpenStax CNX. Aug 30, 2007 Download for free at http://cnx.org/content/col10351/1.19