Sunday, 29 June 2008

"the average language" and possible rarity

One site, n= true, uses the brilliant World Atlas of Language Structures to work out what the most "average" language would theoretically be like, from its phonology to its basic vocabulary.

Many of the features seem strange or unfamiliar when you look at them, and I am not foolish enough to deny that some of the samples are terribly biased and unrepresentative (especially the ones on colour terms). Even so, I calculated the rarity index for a language with the characteristics (minus a few inconsistencies) of the first post mentioned above, and found its rarity to be about 0.525666302.
According to the rarity graph at the site above (unfortunately my efforts to download it failed), which is based upon natural logarithms of the average rarity index, the lowest rarity indices are:

- about 0.75 for circa 120 features
- two cases of about 0.71 for circa 105 features
- about 0.60 for circa 90 features
- about 0.55 for circa 47 features
- about 0.52 (the level of the "average language") for circa 41 features
- about 0.50 for circa 38 features
- about 0.45 for circa 19 features

In all, there were fourteen languages with at least 25 features coded that lie below the 1% mark of minimum rarity, which means their mean rarity is lower than that of 990 out of every thousand randomised imaginary languages based on the WALS data. The rarity map below suggests that most of these "ultra normal" languages are likely to be in New Guinea or the Himalayas, and to be either extinct or spoken by extraordinarily small numbers of people. Even so, it would be fascinating to find out where the most "normal" languages are located, and even their actual identity. The lowest rarity areas are almost always de-skewing zones, where isolation causes a reversion to the universal default, unhampered by the areal skewing brought about by contact between languages spreading outward.
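
For anyone wanting to try the "1% mark" comparison themselves, here is a rough sketch in Python of how I understand it. It is not the site's actual method. The rarity scores per feature value are taken as given inputs, the function names are my own, and the two toy feature tables at the bottom are invented purely for illustration (they are not WALS figures). The idea is simply to build many imaginary languages by drawing each feature's value in proportion to how common it is, and then see what share of them have a higher mean rarity than the language being tested.

```python
import random


def mean_rarity(scores):
    """Average the per-feature rarity scores of one language."""
    return sum(scores) / len(scores)


def simulate_language(feature_tables, rng):
    """Mean rarity of one randomised imaginary language.

    feature_tables is a list with one entry per feature; each entry is a
    list of (weight, rarity_score) pairs, where weight is how common that
    value is. Both numbers are assumed inputs, not computed here.
    """
    scores = []
    for table in feature_tables:
        weights = [w for w, _ in table]
        values = [r for _, r in table]
        # Draw one value for this feature, weighted by how common it is.
        scores.append(rng.choices(values, weights=weights, k=1)[0])
    return mean_rarity(scores)


def share_of_simulations_above(observed, feature_tables, trials=1000, seed=0):
    """Fraction of imaginary languages whose mean rarity exceeds `observed`."""
    rng = random.Random(seed)
    sims = [simulate_language(feature_tables, rng) for _ in range(trials)]
    return sum(1 for s in sims if s > observed) / trials


# Invented toy data: two features, each with (weight, rarity_score) pairs.
toy_tables = [
    [(0.7, 0.6), (0.3, 1.9)],
    [(0.5, 0.8), (0.4, 1.1), (0.1, 3.0)],
]
```

Under these assumptions, a language would sit below the 1% mark when `share_of_simulations_above(...)` comes out at 0.99 or more, that is, when at least 990 of the thousand imaginary languages are rarer than it is.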

