Everything Wordcorr does takes place within a collection. You may have more than one collection, one for each family of languages you are working on. If your focus is research, you may spend most of your time on a single collection. If your focus is teaching, you are likely to keep several collections on hand for your students to work on.
A collection consists of:
- two or more possibly related speech varieties (like Buduma, Cibak, and so forth); you can't make comparisons with fewer than two
- one or more entries, each containing words of roughly the same meaning (for example, words for 'cat' in one entry, 'sleep' in another, and so forth), ideally with data from every variety; the more entries, the better
- a full name for the collection (like "Chadic-Biu-Mandara")
- a short name for the collection (like "ChadBM")
- information about the collection as a whole
- information about each speech variety
- a file name under which the collection can be stored (usually the same as its short name)
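The list above can be read as a small data model. The sketch below is a hypothetical Python illustration only. It is not Wordcorr's actual file format or internal classes, and all class, field, and method names (`Variety`, `Entry`, `Collection`, `check`) are assumptions. It encodes the two firm requirements stated above (at least two varieties, at least one entry), defaults the file name to the short name, and reports entries that do not yet have data for every variety.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical model of a collection; names are illustrative, not Wordcorr's own.

@dataclass
class Variety:
    name: str                  # e.g. "Buduma"
    info: str = ""             # information about this speech variety

@dataclass
class Entry:
    gloss: str                 # the shared meaning, e.g. "cat"
    # variety name -> phonetic (or phonemic) transcription
    words: dict[str, str] = field(default_factory=dict)

@dataclass
class Collection:
    full_name: str             # e.g. "Chadic-Biu-Mandara"
    short_name: str            # e.g. "ChadBM"
    info: str = ""             # information about the collection as a whole
    varieties: list[Variety] = field(default_factory=list)
    entries: list[Entry] = field(default_factory=list)
    file_name: str = ""        # defaults to the short name

    def __post_init__(self) -> None:
        if not self.file_name:
            self.file_name = self.short_name

    def check(self) -> list[str]:
        """Return notes on requirements not met and data still missing."""
        notes = []
        if len(self.varieties) < 2:
            notes.append("needs at least two varieties to compare")
        if not self.entries:
            notes.append("needs at least one entry")
        names = {v.name for v in self.varieties}
        for entry in self.entries:
            missing = names - set(entry.words)
            if missing:
                notes.append(f"entry '{entry.gloss}' has no data yet for {sorted(missing)}")
        return notes
```

Keeping each entry's words in a mapping keyed by variety name makes gaps easy to spot. The sketch only reports missing forms rather than rejecting them, since the text treats full coverage as an ideal, not a requirement.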
You have two main interests in the collection:
- the raw data, written phonetically (or phonemically if you are confident about the phonologies)
- various analyses you make of the same data
By the time you are ready to interact with other linguists, you will have developed some other interests:
- documentation of where each piece of data came from, whether published or unpublished
- accurate information on each of the speech varieties, either taken from the Ethnologue or suitable for inclusion in the Ethnologue
- names and Wordcorr IDs for every linguist who has worked with you
- names or other identifiers (like "70-year-old man from Kabundu") for every speaker of the languages you cover who has given you data