Behind the Scenes

Module by: Joseph Grimes. E-mail the author

Summary: Using full scale Wordcorr to guide Sherlock Holmes's thinking.

Data into Wordcorr

Putting together a Holmesian story was fun. It didn't require much real work, because I already had a Wordcorr collection, JG-SulSel12, that included the three languages talked about.

To begin, I used the Connexions tutorial Installing Wordcorr to install Wordcorr on a new XP laptop, a Lenovo T43 ThinkPad. I had written other parts of these tutorials on one of the Toshiba laptops used to create Wordcorr itself; but after three and a half years of pretty intense work its hard disk was going bad.

At an early stage in the development of Wordcorr, I had imported some WordSurv data into it: the SulSel (Indonesian abbreviation for South Sulawesi) data on twelve languages of the region. Then I exported it to an XML file in order to test both operations. For fifteen years before that I had kept the same data under an old program, WordSurv 1, which had preserved the data for me but could handle neither International Phonetic Alphabet (IPA) symbols nor comparative analysis.

So I already had a Wordcorr XML file containing the data and some earlier analyses. Before I imported it to the new computer, I had already verified the correctness of most of the IPA transcriptions on the older computer, and edited the faulty ones using Wordcorr's intuitive approach to typing IPA.

The Holmes View

Within the Wordcorr collection that I was doing serious work on, I put up a special view called "Holmes" that showed only the three languages I was focusing on and left the other nine out. (Wordcorr never destroys or loses data; but any new view can be set to bypass some varieties.) I set a threshold value of 60% for the Holmes view, to make sure that all correspondence sets contained information from at least two of the three languages.
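The arithmetic behind that 60% figure can be sketched in a few lines of Python. This is only an illustration, not Wordcorr's own code: the function names are hypothetical, and it assumes the threshold is read as the minimum share of the view's varieties that must supply a form.

```python
def min_varieties_required(threshold_percent: int, varieties_in_view: int) -> int:
    """Smallest count of varieties whose share reaches the threshold.

    Assumed rule (not taken from Wordcorr itself): a correspondence set
    is kept when attested / varieties_in_view >= threshold_percent / 100.
    """
    # Integer ceiling division, so no floating-point rounding is involved.
    return -(-threshold_percent * varieties_in_view // 100)


def meets_threshold(attested: int, threshold_percent: int, varieties_in_view: int) -> bool:
    """True when enough varieties are attested, under the same assumed rule."""
    return attested * 100 >= threshold_percent * varieties_in_view


# With three varieties in the view, 60% works out to at least two of them.
print(min_varieties_required(60, 3))
print(meets_threshold(1, 60, 3), meets_threshold(2, 60, 3))
```

Under that reading, one variety out of three is only about 33%, which falls short, while two out of three is about 67%, which passes.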

Then I went through Wordcorr's basic Annotate-Tabulate-Refine cycle for the first hundred entries. You'll hear enough about that cycle later, so I'll resist getting you bogged down in the details here. It didn't take very long: a few entries one day, a couple of dozen the next, whenever other projects made me sleepy.

Finally I invoked the Summarize Evidence function from the Refine panel, giving it a cutoff of 0.0 on Frantz's measure of strength in order to filter out reconstructions whose component correspondence sets were only weakly attested.

Pinpointing the Relevant Data

At that point I settled down to scrutinize the patterns that were emerging. Starting with the best attested correspondence sets (those accompanied in the reconstructions only by other well attested correspondence sets), in about three hours I identified which forms were worth having Holmes call attention to.


Since the International Phonetic Association was just getting going around 1888, I retranscribed the raw data using the phonetic alphabet that Henry Sweet in London (the prototype for Shaw's Henry Higgins in Pygmalion, later My Fair Lady) and his colleagues in Paris were discussing at that time, with a little fudging on details they hadn't gotten around to yet. The alphabet has been revised several times since then, but its essential character has changed little.

While I was doing the real linguistics, I also poked around for all kinds of historical, political, computational, linguistic, and literary information, and read through a collection of A. Conan Doyle's Holmes stories, paying special attention to their style. I found a suitable MacGuffin that got a sorry excuse for a plot going: the languages it was convenient for me to use are spoken just west of the Spice Islands, and British boiled mutton seems to have been even blander then than it is now.

And I hope you enjoyed reading it as much as I enjoyed writing it. Maybe you saw something there about language that you hadn't noticed before.
