# Language as a Clue to Prehistory

#### lpetrich

##### Contributor
Even more evident are shared patterns of irregularity, especially suppletion: using different word forms for different parts of a paradigm.

English has some of that in a few verb conjugations and comparison paradigms: go - went - gone, good - better - best, bad - worse - worst. For "to be", it's a mixture of suppletion and incomplete reduction of ancestral conjugations by the standards of English verbs.

Suppletive comparison paradigms are found in other Germanic languages, in Latin and Romance, in Celtic, in Slavic, etc.

Though other Germanic languages don't have the suppletion that English has for "to go", the Romance languages have plenty of suppletion there.
• Italian: andare -- pres. vado, vai, va, andiamo, andate, vanno -- impf. andavo, pret. andai, fut. andrò
• Spanish: ir -- pres. voy, vas, va, vamos, vais, van -- impf. iba, pret. fui, fut. iré
• French: aller -- pres. vais, vas, va, allons, allez, vont -- impf. allais, pret. allai, fut. irai
The non-present tenses are much more regular than the present tense.
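
One rough way to see the suppletion in the three paradigms listed above is to group the forms by their first letter. This is only a sketch: the first letter is a crude stand-in for "same stem" that I am assuming for illustration, not a real morphological analysis, and the function name is made up.

```python
# Forms copied from the Italian, Spanish and French lists above.
PARADIGMS = {
    "Italian andare": ["vado", "vai", "va", "andiamo", "andate", "vanno",
                       "andavo", "andai", "andrò"],
    "Spanish ir": ["voy", "vas", "va", "vamos", "vais", "van",
                   "iba", "fui", "iré"],
    "French aller": ["vais", "vas", "va", "allons", "allez", "vont",
                     "allais", "allai", "irai"],
}


def stems_by_initial(forms):
    """Group forms by first letter (a crude proxy for 'same stem')."""
    groups = {}
    for form in forms:
        groups.setdefault(form[0], []).append(form)
    return groups


for verb, forms in PARADIGMS.items():
    groups = stems_by_initial(forms)
    print(f"{verb}: {len(groups)} apparent stems {sorted(groups)}")
```

With this heuristic Italian shows two apparent stems (a-, v-), while Spanish and French show three each, which matches the impression that the present tense mixes stems while the other tenses mostly stick to one.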

Suppletion can be traced back to Proto-Indo-European for some verbs, like the copula verb ("to be").

The two main PIE roots are *es- and *bhuH- (> is, be), joined by some others in some of the dialects, like *wes- in Germanic.

Dawn of Verbal Suppletion in Indo-European Languages -- discussing several examples. Suppletive verbs mostly have very general sorts of meanings, as do adjectives with suppletive comparisons in the dialects.

First- and second-person pronouns are also suppletive in the dialects, with suppletion in them reconstructed for PIE: English I/me, thou/thee, we/us. The PIE singular pronouns are *egho-/*me- and *tu-/*te-, with the plural ones being more difficult to reconstruct.

#### Swammerdami

Staff member
The Wikipedia article on Suppletion gives many examples, but every single example comes from an Indo-European language. And many textbooks devote little attention to the topic.

Even defining suppletion isn't easy. Igor Mel'cuk gives the English word 'close' (instead of 'unopen') as an example!
And some examples that look like suppletion, such as yeux as the plural of French oeil (or IIUC, mice as the plural of English mouse) are the result of regular sound changes.

If you want a long write-up on Suppletion including examples from several non-IE languages you can get a PDF at
(That pdf is free. If you prefer you can pay \$35 for the same file at other sites.)

#### lpetrich

##### Contributor
The reconstructed Proto-Indo-European 1st- and 2nd-person pronouns:

| Case | 1sg | 2sg | 1pl | 2pl |
|---|---|---|---|---|
| Nominative | *egoH | *tuH | *wei | *yuH |
| Oblique stem | *me- | *te- | *nos-, *ns- | *wos-, *us- |
| Verb ending | *-m, *-oH | *-s, *-eHi | *-me | *-te |

Uralic: I looked in the Finnish-language version of Wikipedia (Kantaurali – Wikipedia, language name: Suomi) and in Appendix:Finnish possessive suffixes - Wiktionary.

I have included possessive suffixes and personal verb endings for Finnish and Hungarian. The latter has both definite and indefinite verb conjugations, depending on whether the object is definite or indefinite.

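| Language | 1sg | 2sg | 1pl | 2pl |
|---|---|---|---|---|
| Finnish | minä | sinä | me | te |
| Hungarian | én | te | mi | ti |
| Proto-Uralic | *mi- | *ti- | *me | *te |
| Finnish possessive | -ni | -si | -mme | -nne |
| Hungarian possessive | -om | -od | -unk | -otok |
| Finnish verb | -n | -t | -mme | -tte |
| Hungarian indefinite verb | -ok | -sz | -unk | -tok |
| Hungarian definite verb | -om | -od | -juk | -játok |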

Let's look at Altaic, with the personal pronouns in core Altaic:

| Family | 1sg | 2sg | 1pl | 2pl |
|---|---|---|---|---|
| Turkic | *bi | *si | *bis | *sis |
| Mongolian | *bi | *ti | *ba | *ta |
| Tungusic | *bi | *si | *bö | *sö |

There is a m-t pattern in them, though Altaic has m > b, and several members have t > s.
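
The m-t claim can be checked mechanically against the singular forms in the tables above. A minimal sketch: it undoes the two sound changes named here (m > b, t > s) on the initial consonant and compares the result. Applying the reversal blindly to every form is my simplification, and the helper name is made up for illustration.

```python
# (1sg, 2sg) forms taken from the pronoun tables in this thread.
PRONOUNS = {
    "PIE oblique stems": ("me", "te"),
    "Proto-Uralic": ("mi", "ti"),
    "Turkic": ("bi", "si"),
    "Mongolian": ("bi", "ti"),
    "Tungusic": ("bi", "si"),
}

# Sound changes named in the post, reversed: b < m, s < t.
UNDO = {"b": "m", "s": "t"}


def initial_pattern(first, second):
    """Return the older initial consonants of a 1sg/2sg pair, e.g. 'm-t'."""
    c1 = UNDO.get(first[0], first[0])
    c2 = UNDO.get(second[0], second[0])
    return f"{c1}-{c2}"


for family, (first, second) in PRONOUNS.items():
    print(f"{family}: {first}/{second} -> {initial_pattern(first, second)}")
```

Every row comes out as m-t, but only because the two sound changes are assumed in advance. The check shows the forms are consistent with the pattern, not that the families are related.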

#### lpetrich

##### Contributor
Proto-Indo-European, Proto-Uralic, and the Transeurasian languages also share subject-object-verb order and related word orders, like mainverb-auxverb, though such syntactical similarities can be areal effects, from language contact.

The hypothesis of their relationship is part of the Eurasiatic and Nostratic hypotheses, which also take in several other language families and isolates. Joseph Greenberg's Eurasiatic comprised Indo-European, Uralic-Yukaghir, Altaic, Korean-Japanese-Ainu, Chukchi-Kamchatkan, and Eskimo-Aleut, and Nostratic typically includes Kartvelian, Dravidian, and Afro-Asiatic.

Some similarities are the m-t pattern of personal pronouns, and among some of them, noun dual -k and plural -t.

Going even further is Borean, which joins Nostratic with several other proposed macrofamilies. This covers all of premodern humanity except for sub-Saharan Africa, New Guinea, and Australia.

Though that is *very* speculative, there is a rather entrancing feature of it. It covers essentially all the non-Negroid populations of humanity. That means that some offshoot population in Eurasia in the Upper Paleolithic had spoken Proto-Borean. This population had relatively light skin and straight or lightly-curled hair, though continuing to have black-colored hair and brown eyes. Light skin is an adaptation to low sunlight, but straight hair is less explicable. Was it an adaptation? Or sexual selection to distinguish some group? Or a result of Neanderthal admixture?

#### lpetrich

##### Contributor
Returning to much closer to our time, I've found Dated language phylogenies shed light on the ancestry of Sino-Tibetan | PNAS
Given its size and geographical extension, Sino-Tibetan is of the highest importance for understanding the prehistory of East Asia, and of neighboring language families. Based on a dataset of 50 Sino-Tibetan languages, we infer phylogenies that date the origin of the language family to around 7200 B.P., linking the origin of the language family with the late Cishan and the early Yangshao cultures
That's about 5200 BCE. The Sino-Tibetan homeland is located on the lower part of the Yellow River, but inland from the coast and Beijing's location. The people there domesticated broomcorn millet, foxtail millet, pigs, and sheep.

That family's two main branches are the Chinese dialects (Sinitic) and the Tibeto-Burman languages, named for containing Tibetan and Burmese. Proto-Chinese speakers were Sino-Tibetan stay-at-homes, while Proto-Tibeto-Burman speakers moved westward and then southward. They reached Xishanping at the NE end of the Tibetan plateau at 5250 - 4000 BP / 3250 - 2000 BCE. The Proto-Tibeto-Burman people and the Proto-Chinese people acquired horses, cattle, and rice at around this time, the horses and cattle likely from Indo-European speakers to the west and the rice from the Baligang people to the south, where it had been domesticated since 8700 - 8300 BP.
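
The BP-to-BCE conversions above are simple arithmetic. A minimal sketch, assuming "BP" follows the radiocarbon convention of counting back from 1950 CE (the paper may mean "present" more loosely) and ignoring the missing year zero:

```python
BP_EPOCH = 1950  # assumed convention: "Before Present" counts back from 1950 CE


def bp_to_bce(years_bp):
    """Convert years BP to an approximate BCE year (valid for dates before 1 CE)."""
    return years_bp - BP_EPOCH


for bp in (7200, 5250, 4000):
    print(f"{bp} BP is about {bp_to_bce(bp)} BCE")
```

This gives 5250, 3300 and 2050 BCE. The figures quoted in the posts (5200, 3250, 2000 BCE) amount to subtracting a round 2000 instead, which is well within the uncertainty of the dates.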

Proto-Tibeto-Burman speakers continued southwestward into Tibet and southward into the mountains of Southeast Asia, and Chinese speakers expanded southward much later.

Interesting curiosity: the Chinese word for horse, ma, is likely cognate with English "mare", from Proto-Indo-European *mark-. Checking horse - Wiktionary, I find:
• Sino-Tibetan: Chinese: ma, Old Chinese *mra:?, Tibetan: rta, rmang, Burmese: rmang
• Turkic: Turkish: at, Azeri: at, Tatar: at, Kazakh: jılqı, at, Turkmen: at, Kyrgyz: at, jılqı, Chuvash: lasha
• Tungusic: Manchu: morin, Nanai: morin, Oroqen: murin, Evenki: murin, Jurchen muri, Proto-Tungus *murin
Reconstruction:Proto-Tungusic/murin - Wiktionary notes the similar words for this animal in Mongolian, Chinese, Japanese, and some other Central and Southeast Asian languages. Seems like a Wanderwort, a "wander word", a word that travels with what it names.


#### lpetrich

##### Contributor
Chinese is the oldest attested Sino-Tibetan language, going back over 3000 years.
• Middle Chinese: ~ 400 - 1300 CE
• - Chinese rhyming dictionary
• Old Chinese: ~ 1600 (Shang Dynasty) - 200 BCE
A big difficulty is that Chinese writing is not phonetic but instead logographic, with one symbol for each word or morpheme (word part treated as a unit). But Chinese characters often have a phonetic part and a semantic part, like mother = woman + horse (kind of woman whose word sounds like the word for horse, "ma").

For Middle Chinese, one can look back with the help of rhyming dictionaries, present-day words, words from Chinese in Korean, Japanese, and Vietnamese, and borrowings into Chinese.

Middle Chinese grammar was much like present-day Chinese grammar in being mostly isolating, without inflections. However, Old Chinese had initial and final consonant clusters, something lacking from the present-day dialects. Reduction of these clusters induced the development of tones.
Most researchers trace the core vocabulary of Old Chinese to Sino-Tibetan, with much early borrowing from neighbouring languages. During the Zhou period, the originally monosyllabic vocabulary was augmented with polysyllabic words formed by compounding and reduplication, although monosyllabic vocabulary was still predominant. Unlike Middle Chinese and the modern Chinese dialects, Old Chinese had a significant amount of derivational morphology. Several affixes have been identified, including ones for the verbification of nouns, conversion between transitive and intransitive verbs, and formation of causative verbs.[4] Like modern Chinese, it appears to be uninflected, though a pronoun case and number system seems to have existed during the Shang and early Zhou but was already in the process of disappearing by the Classical period.[5] Likewise, by the Classical period, most morphological derivations had become unproductive or vestigial, and grammatical relationships were primarily indicated using word order and grammatical particles.

#### lpetrich

##### Contributor
Turning to other putative members of Dene-Sino-Caucasian, I take another look at Basque. I searched Google Scholar for "Vasco-Caucasian" and "Euskaro-Caucasian", and also more broadly.
Attempts to estimate the time of Basque-NC divergence yield the early Neolithic, about right for Basque-NC to be a language family spread by the European Neolithic farmers. Basque-NC words in Latin, Germanic, Slavic, and Greek imply a spread all over Europe, making it likely to be *the* language family that those long-ago farmers spread.

#### lpetrich

##### Contributor

Pre-Finno-Ugric substrate refers to substratum loanwords from unidentified non-Indo-European and non-Uralic languages that are found in various Finno-Ugric languages, most notably Sami. The presence of Pre-Finno-Ugric substrate in Sami languages was demonstrated by Ante Aikio.[1] Janne Saarikivi points out that similar substrate words are present in Finnic languages as well, but in much smaller numbers.[2]

The number of substrate words in Sámi likely exceeds one thousand words.[3]

Borrowing to Saami from Paleo-Laplandic probably still took place after the completion of the Great Saami Vowel Shift. Paleo-Laplandic likely became extinct about 1500 years ago.[4]

The Nganasan language also has many substrate words from unknown extinct languages in the Taimyr peninsula.[5]
Lapland is the northern part of the Scandinavian peninsula. The Taimyr Peninsula is on the northern coast of central Siberia.
Vladimir Napolskikh has attempted to link them to the hypothetical Dené–Caucasian language family, but later had to admit that these substrate words have no apparent parallels in any known language on Earth.[9]

Yuri Kuzmenko tried to compare them to the hypothetical Pre-Germanic substrate words, but found no similarities apart from the distinction between central and peripheral accentuation.[10]
So Paleo-Laplandic is related to no known language.

"Irregular correspondences among Uralic languages are frequent among some words, such as 'to milk' and 'hazelnut'. These are presumed to be non-native loanwords by Aikio (2021)"

Such irregular correspondences would be the result of sound shifts in the donor languages, sound shifts not parallel to those in recipient languages. I've seen the same argument about pre-Indo-European substrate vocabulary in Europe, that irregular correspondences mean borrowing from different languages.

#### Swammerdami

Staff member
There are many texts on historical linguistics available for free download on the 'Net. I'll track down some of the URLs if there's interest. Here are just a few:

(1) One interesting book, English: the Language of the Vikings, argues the case that Middle English is descended from Old Norse instead of Old English! Most will reject that as ridiculous but the book has much evidence and interesting discussion. I might conclude that Middle English was a hybrid of the two sources, but the idea of a hybrid is rejected by BOTH sides of the debate! Briefly, Danish residents of the Danelaw retained their language. When William the Bastard and the Normans arrived and the natives resisted, the English-vs-Danish political situation flipped completely. Since one's enemy's enemy is one's ally there was suddenly incentive to merge (or form a koine of) two languages which were already close enough to facilitate bilingualism; it was even politically expedient to try for a 50-50 combination! As London — just to the South of the Danelaw — became the new center of England (and underwent a big influx of immigrants from the Danelaw region), Anglicized Norse (or Norsified English in the traditional view) became the new standard of English.

A very large portion of English words have agreed-upon Norse or Danish etymology. In addition many words have cognates in both Old English and Old Norse and, since by default those words are usually assigned an Old English etymology, the true contribution of Norse/Danish to English may be even higher.

(2) Trask's Historical Linguistics is on-line and has been updated since his death. It may have little new to offer, but has an interesting account of how the discovery of Hittite confirmed Saussure’s laryngeal theory which had been thought of as a clever but unimportant conjecture.

Trask's book also gives this famous story:
By about 1500, it is clear that people were often finding it exceedingly difficult to understand English-speakers from other areas. In a famous passage from 1490, the printer William Caxton reports that a merchant from the north of England walked into a tavern to the east of London and asked for eggys, and was told by the tavern-keeper that she could not understand French. The exchange became quite heated before another man stepped in and explained that the merchant was asking for eyren. This little bit of interpretation did the trick, and the merchant got his eggs. Here the merchant was using a northern word with a northern plural ending, while the tavern-keeper only knew the southern forms typical of Essex and Kent.

(3) Language Classification by Lyle Campbell and William Poser is available on-line. Something I learned from that book is that Gottfried Wilhelm von Leibniz, the great polymath sometimes compared to Leonardo da Vinci, wrote several papers on historical linguistics beginning in the 1690s! If anyone tracks down one of these papers, please let me know. Here's an example of one of several mentions.

#### lpetrich

##### Contributor
An article on the substrate in Vedic Sanskrit notes several substratum-derived features of that language. One of them is "retroflex" consonants, versions of (t,d,n) produced with the tongue at the roof of the mouth instead of at the teeth ("dental"). This is shared across the Indian subcontinent, and is clearly an areal effect, as linguists say. The Central Asian nomads who brought Sanskrit with them also picked up retroflex consonants when they settled down in India.
In 1955 Burrow listed some 500 words in Sanskrit that he considered to be loans from non-Indo-European languages. ...

These loanwords cover local flora and fauna, agriculture and artisanship, terms of toilette, clothing and household. Dancing and music are particularly prominent, and there are some items of religion and beliefs.[15] They only reflect village life, and not the intricate civilization of the Indus cities, befitting a post-Harappan time frame.[17] In particular, Indo-Aryan words for plants stem in large part from other language families, especially from the now-lost substrate languages.[5]
The sorts of words that settlers might want to borrow.
Mayrhofer identified a "prefixing" language as the source of many non-Indo-European words in the Rigveda, based on recurring prefixes like ka- or ki-, that have been compared by Michael Witzel to the Munda prefix k- for designation of persons, and the plural prefix ki seen in Khasi, though he notes that in Vedic, k- also applies to items merely connected with humans and animals.[9]: 12 ...

Witzel remarks that these words span all of local village life. He considers that they were drawn from the lost language of the northern Indus Civilization and its Neolithic predecessors. As they abound in Austroasiatic-like prefixes, he initially chose to call it Para-Munda, but later the Kubhā-Vipāś substrate.[5]
Some of this substrate vocabulary came in early, as Proto-Indo-Iranian nomads went south from their Sintashta homes and overran the Bactria–Margiana Archaeological Complex before splitting up with some going into the Indian subcontinent and some going into Iran.
Terms borrowed from an otherwise unknown language include those relating to cereal-growing and breadmaking (bread, ploughshare, seed, sheaf, yeast), waterworks (canal, well), architecture (brick, house, pillar, wooden peg), tools or weapons (axe, club), textiles and garments (cloak, cloth, coarse garment, hem, needle) and plants (hemp, mustard, soma plant).[3]
Also, "There are an estimated thirty to forty Dravidian loanwords in Vedic" even if not much that can be identified from the Munda languages (east India).

#### lpetrich

##### Contributor
Austronesian influence and Transeurasian ancestry in Japanese in: Language Dynamics and Change Volume 7 Issue 2 (2017) by Martine Robbeets

She figures in a lot of the recent work on Transeurasian languages, especially under that name for them, instead of "Macro-Altaic" or something similar.
Bringing together data from linguistics and archaeology, this paper suggests an alternative way to reconcile prehistoric Austronesian influence with Transeurasian ancestry in Japanese. It proposes that Japanese underwent Austronesian influence at a time when the so-called “Japanic” ancestor of Japanese was still spoken on the eastern coast of the Asian continent, neighbored by a sister language of proto-Austronesian, called “para-Austronesian.”

... The specific interpretation of this hypothesis proposed here is that the Transeurasian homeland correlates with the early Neolithic Xinglongwa culture (6200–3750 BC), situated in Southern Manchuria from the seventh millennium BC onwards, while the homeland of Japanic is situated on the Liaodong Peninsula between the third and second millennium BC, with its speakers adopting rice agriculture from a para-Austronesian population within the Liaodong-Shandong interaction sphere. Sagart first hypothesized that a form of pre-Austronesian was spoken on the Shandong Peninsula during that time (Sagart, 1995), and he suggested that the linguistic ancestors of the Japanese acquired rice cultivation from speakers of an eastern language within the Sino-Tibetan-Austronesian macrofamily with whom they were once in contact (Sagart, 2011).
She first gets into Japanese-Transeurasian similarities.
Japanese and the other Transeurasian languages have a fair number of structural features in common, many of which are not shared with the Austronesian languages: e.g., vowel harmony, absence of initial velar nasals, absence of initial r-, preference for non-verbal strategies of verbal borrowing, mixed verbal and nominal encoding of property words, predominantly suffixing inflectional morphology, SOV (Subject-Object-Verb) sentence order, GAN (Genitive-Noun/Adjective-Noun) phrase order, extensive use of converbs, predominant use of locative existential construction to encode predicative possession, use of the ablative case form to encode predicative comparison, etc. (see Robbeets, 2017b).
The last two are constructions like "At me is a book" for "I have a book" and "Blankets from sheets are thick" for "Blankets are thicker than sheets".

A converb is a non-finite verb form that expresses adverbial subordination, like 'when', 'because', 'after' and 'while'.
Moreover, these languages can be shown to display a single set of regular correspondences for consonants and vowels, they have basic vocabulary and non-cultural vocabulary in common, count a large proportion of verbs among their cognates, share common verb morphology and spread their correspondences consistently over five branches (Robbeets, 2005, 2015).
She proposes (JK, Altaic) with JK: (Japonic, Korean), Altaic: ((Turkic, Mongolian), Tungusic)
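
The bracketing above is a tree, so it can be written down directly as nested tuples. A small sketch (the variable and function names are mine, not from the paper) that prints the grouping in Newick-style notation and lists the leaves:

```python
# Nested tuples mirror the bracketing given in the post.
JK = ("Japonic", "Korean")
ALTAIC = (("Turkic", "Mongolian"), "Tungusic")
TRANSEURASIAN = (JK, ALTAIC)


def to_newick(node):
    """Render a nested-tuple tree as a Newick-style string (no branch lengths)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Return the leaf names of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


print(to_newick(TRANSEURASIAN) + ";")
print(len(leaves(TRANSEURASIAN)), "branches:", leaves(TRANSEURASIAN))
```

The five leaves are the "five branches" over which she says the correspondences are spread.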

Austronesian?
In contrast, the similarities shared between Japanese and the Austronesian languages, shown in Fig. 3, are of a different nature. Japanese has only few structural features in common with the Austronesian languages that are not shared by other Transeurasian languages as well (see Murayama, 1976, 1978; Kawamoto, 1985: 105–110; Robbeets, 2017b). Examples of these properties exclusively shared between Japanese and Austronesian are a small vowel inventory, open syllable (CVCV) structure, and reduplication to express plurality.

There are at least two different sets of sound correspondences between Japanese and the Austronesian languages, which has led Kawamoto (1984) to propose that Japanese was “Austronesianized” twice. Moreover, the proposed cognates consist mainly of cultural vocabulary and nouns (Kawamoto, 1984; Benedict, 1990; Sakiyama, 1996; Kumar, 2009).
This suggests at least two waves of borrowing.

#### lpetrich

##### Contributor
She has this table:
| | Japanese-Transeurasian | Japanese-Austronesian |
|---|---|---|
| Delimiting structural features | many | few |
| Sets of regular correspondences | 1 | min. 2 |
| Common vocabulary | basic/non-cultural | cultural |
| Word class of cognates | mainly verbs | mainly nouns |
| Comparative setting | five branches | binary |
As an example of what she has in mind, I'll take English. Is it a Germanic language or a Romance language? From the large amounts of Norman French vocabulary in English, one might think that it is a Romance language. But most of its more basic vocabulary and its grammar are very recognizably Germanic.

After a long discussion of early-Holocene millet agriculture in the parts of China near the Korean Peninsula, she gets into rice cultivation.
3.3. The integration of rice and millet agriculture after 3000 BC

A second major demographic pulse in Northern China is associated with the integration of rice into the millet agricultural assemblage and a subsequent population spread. The Hongshan and Houwa cultures in southern Manchuria were contemporary with the Yangshao (5000–2800 BC) and Dawenkou (4100–2600 BC) cultures of the Yellow River Basin.
Who were they?
Whereas the Yangshao culture is generally associated with the homeland of Sino-Tibetan, some scholars such as Sagart (2008, 2011: 27; Sagart et al., this issue), Blench (2008) and Van Driem (1998: 93–94) suggest that the Dawenkou culture should be linked to a para-Austronesian presence.

Indications for an Austronesian connection to the Dawenkou culture come from various kinds of evidence: the use of pottery with supporting legs, house structure, myths on the sun, burial rituals such as the use of slab tombs (Zhang, 2009), cranial measurements (Wu and Olsen, 2009), and the shared ritual of tooth ablation, notably the extraction of healthy upper lateral incisors as a puberty rite (Han and Nakahasi, 1996: 47–48; Pietrusewsky et al., 2014). Moreover, it is more likely that Austronesian agriculture spread to Taiwan from Shandong than from the Lower Yangtze River, as previously suggested by Blust (1996) and Bellwood (2005), because millets and rice arrived as an integrated assemblage in Taiwan around 3000–2400 BC, while Lower Yangtze agriculture focused exclusively on rice until 2000 BC (Weber and Fuller, 2008: 80; Stevens and Fuller, 2017). The excavation of marine shell midden sites (Yuan et al., 2002) has further revealed that the Dawenkou was a maritime-focused culture, in contrast to the Lower Yangtze culture, which lacked marine sources. The extent of the correlations between the coastal cultures of Shandong and Taiwan remains to be investigated, but it is probable that the millet-rice agricultural assemblage was transmitted around 3000 BC from Shandong to Taiwan over a maritime route. In addition, Ko et al. (2014: 430) find evidence from mitochondrial DNA that supports a separation between Austronesian and Sino-Tibetan populations around 8000–6000 BC, well before Austronesian populations started to expand into Taiwan.
On the spread of rice farming,
Archaeobotanical studies such as Miyamoto (2009) and Ahn (2010) show that wet-rice cultivation came to Korea in the late second millennium BC (1300–1000 BC) via the Shandong and Liaodong Peninsulas. This marks the beginning of the Mumun culture (1300 BC–0 AD) in Korea. Rice agriculture was more popular in the central and southwestern regions of Korea than in the southeast, where dry-field crops including millet and soybean remained important.

...
The final spread of millets and rice into Japan is dated to the beginning of the first millennium BC, marking the beginning of the Yayoi period (1000 BC–300 AD). It is associated with an influx of farmers from the Korean Peninsula (Harunari, 1990; Nelson, 1993; Hudson, 1999; Crawford and Shen, 1998; Crawford and Lee, 2003; Harunari and Imamura, 2004; Barnes, 2015), who probably brought the Japonic language to Japan. Apart from rice, millets and various crops, Northeast Asian influences include pottery, stone and wooden agricultural tools, domesticated pigs, ditched settlements and megalith burials. It is clear that agriculture arrived in Japan as a “package” of Northeast Asian culture, even if this package had a southern, Austronesian-like touch. Wet-rice agriculture was ultimately derived from the south, and certain elements of Yayoi culture such as ritual tooth ablation (Han and Nakahasi, 1996: 58, Brace and Nagai, 1982: 405), tattooing with dragon figures to ward off monstrous fishes (Pauly, 1980: 82; Sasaki, 1991: 26–27; Solheim, 1993: 2; Bellwood, 1997: 108, 135; Oppenheimer, 1998: 77; Palmer, 2007: 51), and granaries with raised floors, curved roof-lines and gable horns (Pauly, 1980: 84; Waterson, 1997: 17; Arbi et al., 2015) indicate an Austronesian connection. As a result, the most parsimonious hypothesis, in my view, is the early, continental insertion of Austronesian elements into an essentially North East Asian cultural package, as illustrated in Fig. 5.

#### lpetrich

##### Contributor
Words for subsistence activities?
As illustrated below, common words indicative of cultivation and weaving can be reconstructed back to proto-Transeurasian, while shared maritime vocabulary and rice terminology are lacking (see also Robbeets, 2017c, for the reconstruction of additional vocabulary that associates proto-Transeurasian with broad-spectrum subsistence, including consumable plants such as nuts and roots and subsistence activities such as “grinding” and “kneading,” and indirect lexical evidence for pottery production). This observation supports the identification of the Xinglongwa culture with proto-Transeurasian. By contrast, Japanese and Korean share coastal subsistence terms, but they lack common rice vocabulary, an observation which supports the association of proto-Japano-Koreanic with the Neolithic Houwa cultures on the Liaodong Peninsula. Finally, the observation that some Japanic rice terms seem to derive from Austronesian supports the addition of rice to the earlier millet agricultural assemblage under influence of the—presumably para-Austronesian—Dawenkou culture.
There are also Transeurasian words for "to twine" and "to weave". Twining is how one makes string or cord or rope. Both activities are dependent on agricultural sources of fibers.

Japanese and Korean also share some coastal-subsistence vocabulary, like for boats and fish and crabs.

Then, rice.
The Transeurasian languages lack a common rice vocabulary. In Japonic many words relating to rice agriculture can be derived language-internally. For instance, OJ momi ‘hulled rice,’ OJ ipi1 ‘steamed rice, cooked millet’ and OJ nuka ‘rice bran’ seem to be deverbal nouns, from the original verbs underlying OJ mom- ‘rub,’ MJ if- ‘to eat’ and OJ nuk- ‘remove,’ respectively (see Robbeets, 2017a).

The analysis of OJ ipi1 ‘steamed rice, cooked millet’ along these lines is given in Vovin (1998: 371–372) and Robbeets (2005: 552). Interestingly, parallel formations of ‘cooked rice’ are found in Old Chinese and Austronesian.
Then some discussion of Austronesian connections of Japanese rice-related vocabulary.

#### lpetrich

##### Contributor
A complication in historical-linguistics work is areal effects. These can produce a linguistic area (Sprachbund): a group of languages that share a lot of features that their ancestors did not have.

Numeral classifiers are a feature of eastern Asian languages that is most likely an areal feature. WALS Online - Chapter Numeral Classifiers with WALS Online - Feature 55A: Numeral Classifiers shows their distribution, which is rather curiously patchy.

Chinese classifiers emerged over the history of the Chinese language from words that were originally distinct words.
Classifier systems in many nearby languages and language groups (such as Vietnamese and the Tai languages) are very similar to the Chinese classifier system in both grammatical structure and the parameters along which some objects are grouped together. Thus, there has been some debate over which language family first developed classifiers and which ones then borrowed them—or whether classifier systems were native to all these languages and developed more through repeated language contact throughout history.
"When a noun is preceded by a number, a demonstrative such as this or that, or certain quantifiers such as every, a classifier must normally be inserted before the noun."

"three cats" in Chinese is 三只猫 - sān zhī māo - three (animal) cat

Very similar systems are in Japanese, Korean, Vietnamese, Khmer, Thai, Burmese, Bengali, etc.
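
The numeral + classifier + noun rule quoted above can be sketched as a lookup. This is a toy illustration only: the "cat" entry and zhī come from the example above, while the "paper" entry (zhǐ, with the sheet classifier zhāng) and all the names in the code are my own additions for illustration.

```python
NUMERALS = {3: "sān"}

# Classifier chosen by the semantic class of the noun.
CLASSIFIERS = {"animal": "zhī", "sheet": "zhāng"}

# noun -> (pinyin, semantic class); "paper" is added here for illustration.
NOUNS = {
    "cat": ("māo", "animal"),
    "paper": ("zhǐ", "sheet"),
}


def count_phrase(number, noun):
    """Build 'numeral classifier noun', as in sān zhī māo."""
    pinyin, noun_class = NOUNS[noun]
    return f"{NUMERALS[number]} {CLASSIFIERS[noun_class]} {pinyin}"


print(count_phrase(3, "cat"))
print(count_phrase(3, "paper"))
```

The point of the sketch is that the classifier is picked by the noun's class, not by the numeral, so a learner has to know which class each noun belongs to.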

#### Copernicus

Basically, the way to look at classifier linguistics is to consider the distinction between mass and count nouns in English. Mass nouns do not incorporate a countable unit in their semantics, so "beer" in "I would like some beer" would usually be considered a mass noun, but "beer" in "I would like a beer" would be a count noun. When you say, "I would like five glasses of beer", that is the type of structure where a classifier is used with a mass noun. Those Asian languages that have classifiers are essentially languages in which most or all nouns are mass nouns. Such languages are not exclusive to eastern Asia, but that area has a lot of languages that treat nouns as mass nouns and require a classifier in contexts where countability is important in communicating a thought.

#### lpetrich

##### Contributor
I tried comparing lists of Eastern Asian classifiers, but I was not very successful.
• Sheets: Ch. zhāng, Jap. mai, Kor. jang, mai (paper), Viet. tờ (paper), lá (small paper), Thai pàen, bai (paper), Burm. ywet, chat
• Long / thin objects: Ch. gēn (rigid), tiáo (flexible), Jap. hon/pon/bon, Kor. ?, Viet. cây, Thai sâyn, tâeng, Burm. chaung
• Small / round objects: Ch. méi, Jap. ko, Kor. al, Viet. quả/trái, Thai mét (?), Burm. loun
A lot of the classifier categories are rather specialized, and they don't seem to overlap very much. Seems like these classifiers were separately developed from existing words.

#### Copernicus

Bear in mind that noun meanings are not inherently count or mass. Among languages that have a lot of count nouns, there can be lots of differences. For example, "information" seems inherently a mass noun to English speakers, but French speakers treat it as a count noun. So you will sometimes hear native French speakers using the plural "informations" when speaking in English. Languages that tend to have few or no count nouns need classifiers when they refer to the countability of things, but the classifiers themselves can also come with all sorts of other semantic baggage. So I don't think it is terribly significant that quantifying classifiers in different languages have conventionally different semantics. Languages always change over time, and they can change in arbitrary ways in speech communities that are isolated from each other.

#### Swammerdami

Staff member
I think Copernicus' distinction between mass nouns and count nouns is a very good way to look at the issue, especially if mass noun includes nouns which are countable but the unit of counting is ambiguous. ("We'll have three beers please." Sarcastic waitress: "Three glasses or three pitchers? Three kegs?")

In Thai some words, e.g. 'finger,' are their own classifier. But you'd probably say "He show three finger," not "He show finger three finger." But the noun/classifier is sometimes repeated, e.g. for emphasis in "He kill person five person."

Many classifiers are based on shape. But 'automobile,' 'vehicle', 'spoon' and 'fork' all use the same classifier ('handled object'). I'll guess this is left over from a time when vehicles were handle-shaped, e.g. palanquin or buffalo-drawn cart.

English has one animal name that needs a classifier: you say "ten head of cattle", not "ten cattle." An English word that functions as a classifier is "train": you say "How many trains leave this railroad station every day?" not "How many railroads? (or cars?)"

Thai has a classifier /kha-buan/ meaning "train", but it's missing from Wiktionary's list of 145 Thai classifiers, perhaps because it's ONLY used with the nouns for 'rail-road' and 'parade.' I use it in the market when buying ears of corn: the little kernels of corn remind me of a parade! Another reason I use this whimsical classifier is that the correct classifier (/fak/) sounds just like an English obscenity. I can indulge in such whimsy since I am a foreigner; if I were a native they'd probably find me idiotic for hopelessly confusing a classifier.

#### lpetrich

##### Contributor
In English, "railroad" refers to the tracks or the owners of the tracks, not the vehicles that travel on them.

As to vehicles being related to handles, I think that it is from how one controls them: they have handled parts for controlling them, and one uses a steering wheel as if it was a handle.

"Train" and "parade" have in common their being sequences: sequences of railcars or marchers.

#### Swammerdami

Staff member
Thank you, Captain Obvious!
I wanted to segue into my intended-as-humorous anecdote about the Thai 'train' classifier /kha-buan/ and noticed that in the English "a train of railroad cars," 'train' functions as a collective noun, much like "a pride of lions." Such collective nouns are akin to classifiers.

#### lpetrich

##### Contributor
Wiktionary - part of the Wikimedia family of sites: Wikimedia Foundation - "The Wikimedia Foundation is the nonprofit that hosts Wikipedia and our other free knowledge projects."

Wikimedia Projects – Wikimedia Foundation has
• Reference
  • Wikipedia - the well-known online encyclopedia
  • Wikibooks - online textbooks
  • Wiktionary - online dictionary
  • Wikiquote - online collection of quotes
• Collections
  • Wikimedia Commons - pictures, videos, soundfiles
  • Wikisource - source texts and historical documents
  • Wikiversity - online learning resources
  • Wikispecies - species database
• Technology
  • Wikidata - database used by Wikimedia projects
  • MediaWiki - the software behind Wikipedia
• Guides
  • Wikivoyage - online travel guide
  • Wikinews - online news source
• Collaboration
  • Meta-Wiki - project coordination tool
Wiktionary looks comprehensive, and it also has etymologies going back to Proto-Germanic, Proto-Indo-European, and other protolanguages. It also has word inflections, including reconstructed protolanguage inflections.

I recently got the idea of looking at words for "bridge", because of German Brücke being a cognate, but Latin pons and Slavic most not being cognates.

English bridge < Middle English brigge < Old English brycg
Cognate with Dutch brug and German Brücke, and descended from reconstructed Proto-Germanic *brugjôn and Proto-Indo-European *bhrew- / *bherw- “wooden flooring, decking, bridge”

Latin pôns, pont- "bridge" is obviously not cognate, and the Romance languages all have descendants of it: Italian ponte, Spanish puente, Portuguese ponte, French pont, etc. Romanian punte is a small bridge, and the language's more general word is pod, a borrowing from Slavic. They are all descended from the accusative or direct-object form pontem, something typical of Romance nouns descended from Latin ones that ended in -s.

Latin pôns is descended from PIE *pent- "path", with such descendants as English "to find" and Proto-Slavic *poti "way, path", with descendants pot / put / ... A derivative word is putnik "traveler", and a derivative of that is sputnik "fellow traveler, satellite of celestial body".

The Slavic languages have most / mostu / mist, descended from Proto-Slavic *mostu. Likely from PIE *masd-to-s “aggregate of timbers/boards” and related to *mazdos "pole, mast" with descendants like English "mast"

Scottish Gaelic drochaid and Irish droichead are descended from Old Irish drochet, a compound meaning "wheel path".

The Modern Greek word for bridge is gefira, from Classical Greek gephura, with dialect variants bephura, dephura, diphoura. Its origin is unknown, but its suffix -ura suggests that it's from some pre-Hellenic language.

Armenian kamurj may have the same origin.

There's also a Sanskrit word for bridge, setu.

Though all the earlier dialects had words for bridge, most of them are unrelated, meaning that PIE had no clearly-reconstructible word for bridge.

-

I looked in other language families, and I found that the Finnish word silta was borrowed from a Baltic Indo-European language like Lithuanian, with tiltas. For Semitic, we have Arabic jisr, Aramaic gishra, Hebrew gesher, and Akkadian gishru. In Arabic, g > j often happened.

Proto-Turkic had *köpürüg, with descendants like Turkish köprü, and it was likely borrowed into Mongolian, which has güür.

I couldn't find much else that indicates much prehistory. Seems like making enough bridges to have a word for them is something that is not far before having written language.

#### Copernicus

One thing to keep in mind about historical reconstruction is that we can only posit cognate sets--sets of words that we think might descend from the same protoword. Unfortunately, anything can mess up that assumption, most especially the possibility of regional borrowing or a borrowing from a related language. One principle established in the 19th century is that "sound change is regular". That is, the pronunciation of words does not usually change arbitrarily. Pronunciation changes across the entire vocabulary all at once. So valid cognate sets will obey the established rules of regular sound change--consonant and vowel shifts, for example. Hence, it isn't reasonable to assume that words in related languages belong to a given cognate set merely because they look similar. They also need to show regularities that apply to the pronunciations of a large number of words in cognate sets. The process of reconstructing a historical protolanguage is tedious and requires serious scholarship. Although it is fun to speculate about word origins, we need to validate it by linking the speculation to sound correspondence patterns at the very least.
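The regularity requirement can be made concrete with a small tally. The sketch below is only an illustration: the word pairs are standard textbook Latin/English cognates, spelling is used as a crude stand-in for pronunciation, and the helper names `initial_sound`, `correspondences`, and `is_regular` are made up for this example.

```python
from collections import Counter

# Textbook Latin/English cognate pairs; spelling stands in for sound.
COGNATE_PAIRS = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("caput", "head"),
    ("cornu", "horn"),
    ("centum", "hundred"),
    ("tres", "three"),
    ("tu", "thou"),
]

def initial_sound(word):
    """Return a rough label for the first consonant of a word."""
    if word.startswith("th"):
        return "th"
    if word.startswith("c"):
        return "k"  # Latin c was pronounced /k/
    return word[0]

def correspondences(pairs):
    """Count how often each pair of initial sounds lines up."""
    return Counter((initial_sound(a), initial_sound(b)) for a, b in pairs)

def is_regular(pair, counts, minimum=2):
    """A proposed pair is plausible only if its correspondence recurs."""
    key = (initial_sound(pair[0]), initial_sound(pair[1]))
    return counts[key] >= minimum

counts = correspondences(COGNATE_PAIRS)
for (latin, english), n in sorted(counts.items()):
    print(f"Latin {latin}- : English {english}- ({n} pairs)")

# Latin pons and English bridge look vaguely alike in meaning only;
# the p : b correspondence never recurs in the list above.
print(is_regular(("pons", "bridge"), counts))  # False
```

A real comparison would align whole words in phonetic transcription and use far more data, but the principle is the same: a correspondence counts only if it keeps recurring.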

#### Bomb#20

##### Contributor
Based on my completely unscientific anecdotal exhaustive survey, the most commonly appearing phrase in the Icelandic language is...

... "One lane bridge".

#### Swammerdami

Staff member
Examples of sound changes and semantic shifts

Here's an example of four different English words, all meaning 'leader of a group' and cognates of each other: all derive ultimately from the exact same PIE root (*kaput) but with four different initial consonant sounds: head, captain, chief, chef. The four different initial sounds (H, K, CH, SH) are all due to regular sound changes but these words, despite having almost the same meanings, followed different trajectories. captain, chief, chef were all borrowings Latin->French->English but the ordering between sound change and borrowing matters. In addition to the change in initial consonant, a P>F transition has occurred in two of the words and those words have lost their third consonant (that's common: consider English 'captain' > 'cap' [slang]). Head lost the middle consonant from Old English 'heafod.'

These words have all changed their original PIE meaning, which was 'top part of an animal's body.' Just as many sound changes are one-way streets (K can mutate to CH but seldom vice versa), so this semantic shift is one-way. To fill the gap when a word shifts from 'top of animal' to 'leader', a word for 'jug or bowl' may come to mean 'top of animal' by shape analogy (German Kopf or English 'jughead'!) The French word tête has undergone both these transitions: 'cup' [Latin testa] > 'top of animal' > 'leader.'

#### lpetrich

##### Contributor
PIE *kaput, kapwet- > Proto-Germanic *haubudan > Old English heafod > Middle English hed > Modern English head
PIE *kaput > Latin caput, capit-
Latin caput > Late Latin capitâneus > Old French capitaine > Middle English capitain > English captain
Latin caput > Late Latin capus > Old French chief > Middle English chef > English chief
Old French chief > Middle French chief > French chef > English chef

English also has "capital" and "cattle" and "chattel" from Latin capitâlis and some Old French words derived from it.

Italian capo "leader" and Spanish cabeza "head" also have this origin, though the Spanish one is from interpreting the plural of caput - capita - as a singular noun.

Latin testa "earthenware pot" gives rise to Italian testa and French tête

That may be from PIE *ters- "dry" like Latin terra "land" and English "thirst".

Looking at Celtic, Welsh pen and Gaelic ceann are derived from Proto-Celtic *kwennom - a word of obscure origin

The words in Slavic languages, like Russian golova and Serbo-Croatian glava, are from Proto-Slavic *golva, in turn from PIE *gelH- listed in Wiktionary as "naked" and "head".

Appendix I - Indo-European Roots - couldn't find it there.

#### Copernicus

Examples of sound changes and semantic shifts

Here's an example of four different English words, all meaning 'leader of a group' and cognates of each other: all derive ultimately from the exact same PIE root (*kaput) but with four different initial consonant sounds: head, captain, chief, chef. The four different initial sounds (H, K, CH, SH) are all due to regular sound changes but these words, despite having almost the same meanings, followed different trajectories. captain, chief, chef were all borrowings Latin->French->English but the ordering between sound change and borrowing matters. In addition to the change in initial consonant, a P>F transition has occurred in two of the words and those words have lost their third consonant (that's common: consider English 'captain' > 'cap' [slang]). Head lost the middle consonant from Old English 'heafod.'

Unidirectional changes of this sort are called "implicational universals", because they tend to hold across all human languages, although one does find occasional counterexamples. The point is that linguists discovered these types of universals in the first half of the 20th century, thanks to the famous Prague School linguist, Roman Jakobson. He published Kindersprache, Aphasie und allgemeine Lautgesetze from Sweden in 1941, which was the seminal work on these universals. (It was later published for the first time in English in 1968 as Child Language, Aphasia, and Phonological Universals, although the translator mistranslated "implicational universals" as "universal rules of solidarity". Big goof.) Jakobson was in Sweden, because he had to be evacuated as Hitler's troops marched into Prague. The Soviet Union wanted him to come back to his native Russia, but Jakobson feared Stalin as much as Hitler. (I actually met a Russian who claimed to have spent the night with Jakobson in a Prague hotel trying to convince him to return to Russia, and he claimed Jakobson would have agreed but for the rapid advance of the Wehrmacht. Jakobson's famous colleague, Nikolai Trubetzkoy, was later interrogated by the Nazis and died as a result.)

The point is that implicational universals of the sort Swammerdami mentioned had all kinds of implications for linguistics, because Jakobson pointed out that the sounds on the lefthand side of his universals tend to be the first sounds produced by infants during language acquisition, and that the sounds on the righthand side usually tend to be produced afterwards. He also pointed out that people suffering from motor aphasia (loss of pronunciation) tend to lose the last sounds acquired during acquisition first. So, if k>ch, then it will be common for toddlers to mispronounce "cheese" as "keese", and someone who suffers from aphasia might mispronounce the word in the same way. That's because the palatal "ch" sound tends to be mastered later than "k".

Now, there are reasons why the implicational rules proposed by Jakobson work, but that's for people interested in the branch of linguistics known as phonology. The relevance of these universals here is that languages undergoing changes of pronunciation produce these universal patterns in daughter languages. So, if you propose cognate sets across a suspected family of related languages and discover patterns of sound correspondence, then you prove that the languages are related. But how do you know what the original sound was in the protolanguage? It is unlikely to be just any sound. If you have a knowledge of common unidirectional sound changes, you can infer the original sound by tracing back to it via implicational rules.
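The idea of tracing back through one-way changes can be sketched as a small graph search. This is a toy under stated assumptions: the table `ONE_WAY_CHANGES` is a hypothetical, hand-picked set of changes based on the head / captain / chief / chef example in this thread, and the function names are invented for the illustration.

```python
# Hypothetical table of one-way changes: source sound -> possible outcomes.
ONE_WAY_CHANGES = {
    "k": {"ch", "h"},
    "ch": {"sh"},
    "p": {"f"},
}

def reachable(start, changes):
    """All sounds obtainable from `start` by zero or more one-way changes."""
    seen = {start}
    stack = [start]
    while stack:
        sound = stack.pop()
        for nxt in changes.get(sound, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def candidate_protosounds(reflexes, changes):
    """Sounds from which every attested reflex can be derived."""
    reflexes = set(reflexes)
    sounds = set(changes) | reflexes
    for targets in changes.values():
        sounds |= targets
    return sorted(s for s in sounds if reflexes <= reachable(s, changes))

# Initial consonants of head, captain, chief, chef.
print(candidate_protosounds({"h", "k", "ch", "sh"}, ONE_WAY_CHANGES))  # ['k']
```

With this table, only "k" can yield all four attested sounds, which matches the reconstruction of *kaput. Real reconstruction weighs many such changes along with their conditioning environments.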

These words have all changed their original PIE meaning, which was 'top part of an animal's body.' Just as many sound changes are one-way streets (K can mutate to CH but seldom vice versa), so this semantic shift is one-way. To fill the gap when a word shifts from 'top of animal' to 'leader', a word for 'jug or bowl' may come to mean 'top of animal' by shape analogy (German Kopf or English 'jughead'!) The French word tête has undergone both these transitions: 'cup' [Latin testa] > 'top of animal' > 'leader.'

I haven't seen much literature on the subject of semantic implicational universals, and the reason for the phonological universals can be linked to the physical difficulties inherent in the articulation of sounds. So I would be much more skeptical that one can establish implicational universals for semantic shifts quite as convincingly as Jakobson did for sound shifts.

#### lpetrich

##### Contributor
In what order does a child learn phonemes? - Quora has one answer, with a detailed chart.
noting
When are Speech Sounds Developed? | Mommy Speech Therapy
noting
sound_development_chart - sound_development_chart.pdf

Age at which each sound is learned, in (initial), (medial), (final) positions:
• p 232, b 222, m 222, f 334, v 665
• t 333, d 243, n 223, th dh 777, s sh 555, z 755, ch j 555
• k 333, g 333, ng _35
• l 555, r 665
• (initial only) y 5, w 3, kw 4, bl 5, br dr fl fr gl gr kl gr pl 6, sl sp sw 7
dh = voiced th
The r is the English r, not the French r or the trilled r (very common). English also doesn't have "kh" (velar fricative, an almost-k h-like sound), another common sound.

Anything on vowels?

#### Copernicus

A more accurate way to look at these studies is to take them as claims about speech production. The two sides of language are perception and production, but the two sides are much further apart in child language learners than adult speakers. So researchers look at the sounds that come out of the mouths of babes, but they don't really have an easy way of knowing what the children are hearing or trying to say. There is considerable evidence that young learners try to pronounce pretty much the full set of adult phonemes, but that their articulation is impeded by the need to "tune" their muscular coordination to produce the adult repertoire. Most studies of language, however, are focused on cataloging sounds that can be detected in speech, not in the mind. This is particularly true of studies by language pathologists, whose job it is to get speakers to produce normal speech.

Added to this problem is the theoretical question of what a phoneme is. Since about the 1930s, the concept was defined as a type of perceived sound contrast. However, since the time that the word "phoneme" was coined (about 1887 in Kazan University by Baudouin de Courtenay) until the 1930s, it was mostly defined as sounds that speakers were trying to produce in speech (aka the "psychological phoneme"). So Baudouin allowed for the fact that the phonetic output could be quite different from the phonemic input to articulation. He also noticed that there was a huge discrepancy in the speech of children and a lesser (but still significant) discrepancy in adults.

#### Swammerdami

Staff member
Examples of sound changes and semantic shifts

Here's an example of four different English words, all meaning 'leader of a group' and cognates of each other: ... The four different initial sounds (H, K, CH, SH) are all due to regular sound changes ...

Unidirectional changes of this sort are called "implicational universals", because they tend to hold across all human languages, although one does find occasional counterexamples. The point is that linguists discovered these types of universals in the first half of the 20th century, thanks to the famous Prague School linguist, Roman Jakobson.
...
But how do you know what the original sound was in the protolanguage? It is unlikely to be just any sound. If you have a knowledge of common unidirectional sound changes, you can infer the original sound by tracing back to it via implicational rules.

These words have all changed their original PIE meaning, which was 'top part of an animal's body.' Just as many sound changes are one-way streets (K can mutate to CH but seldom vice versa), so this semantic shift is one-way. To fill the gap when a word shifts from 'top of animal' to 'leader', a word for 'jug or bowl' may come to mean 'top of animal' by shape analogy (German Kopf or English 'jughead'!) The French word tête has undergone both these transitions: 'cup' [Latin testa] > 'top of animal' > 'leader.'

I haven't seen much literature on the subject of semantic implicational universals, and the reason for the phonological universals can be linked to the physical difficulties inherent in the articulation of sounds. So I would be much more skeptical that one can establish implicational universals for semantic shifts quite as convincingly as Jakobson did for sound shifts.
I was intrigued by this comment. Semantic shifts may be much less regular than sound changes, but there are repeating patterns, and the changes may be unidirectional. For example, the two changes I mentioned:
. . . . head > leader
. . . . cup > head
have occurred in multiple languages and, or so my intuition tells me, are much less likely to occur in the opposite direction.

Here are some other semantic shifts mentioned in a Lyle Campbell textbook, observed in more than one language, and in most cases likely to be one-directional. The first three are shifts due to euphemism.

. . . . sleep/kiss/lay > copulate
. . . . medicine > poison
. . . . girl/child > prostitute

. . . . horse-rider > gentleman
. . . . silver > money
. . . . journal/daily > newspaper
. . . . cool > relax
. . . . excellency > you (polite)

Surely there are other, better, examples of common semantic shifts.

#### lpetrich

##### Contributor
PHOIBLE 2.0 -
PHOIBLE is a repository of cross-linguistic phonological inventory data, which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages.
It uses the International Phonetic Alphabet, but it is rather strict, distinguishing short and long vowels, and also modified consonants like aspirated ones.

One can see how common each phoneme is with PHOIBLE 2.0 - Segments

Looking under PHOIBLE 2.0 - Inventories I found PHOIBLE 2.0 - Inventory English (American) (UZ 2175) and PHOIBLE 2.0 - Inventory English (British) (UZ 2178)

Though /p/, /t/, /k/ are very common sounds, those pages list English as having the aspirated variants /ph/, /th/, /kh/ which are much less common: /k/ is 90% and /kh/ 20%. However, it lists those variants as having allophones /p/, /t/, /k/.

So I'd have to download the databases and look through them for languages with neither /k/ nor /kh/.
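A search like that might look roughly as follows. This is a sketch, not a tested recipe for the real download: it assumes a CSV with one row per inventory/segment pair and columns named `InventoryID`, `LanguageName`, and `Phoneme`, and the sample rows are invented placeholders rather than real PHOIBLE data. The function name `inventories_lacking` is also made up here.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows in the assumed column layout.
SAMPLE = """InventoryID,LanguageName,Phoneme
1,Language A,p
1,Language A,t
1,Language A,k
2,Language B,pʰ
2,Language B,tʰ
2,Language B,kʰ
3,Language C,p
3,Language C,t
3,Language C,ʔ
"""

def inventories_lacking(rows, segments):
    """Names of inventories that contain none of the given segments."""
    phonemes = defaultdict(set)
    names = {}
    for row in rows:
        phonemes[row["InventoryID"]].add(row["Phoneme"])
        names[row["InventoryID"]] = row["LanguageName"]
    return sorted(
        names[inv] for inv, segs in phonemes.items() if not (segs & segments)
    )

rows = csv.DictReader(io.StringIO(SAMPLE))
print(inventories_lacking(rows, {"k", "kʰ"}))  # ['Language C']
```

For a downloaded file, the `io.StringIO(SAMPLE)` part would be replaced by something like `open(path, newline="", encoding="utf-8")`, after checking the actual column names in the file.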

#### lpetrich

##### Contributor
WALS Online - Home has some such results.

WALS Online - Chapter Consonant Inventories -- the smallest is of Rotokas, with /p t k b d g/ -- all stops, no nasals (/n/, /m/, /ng/, ...), no fricatives (/f/ /v/ /s/ /z/ ...), no affricates (/ts/ /dz/ ...), ...

WALS Online - Chapter Voicing in Plosives and Fricatives -- does it treat (for example) stops /t/ /d/ as separate? Or fricatives /s/ /z/ as separate? Voicing in fricatives but not stops was not as common as the three alternatives.

WALS Online - Chapter Absence of Common Consonants -- the large majority of languages have fricatives, nasals, and bilabials (/p/ /b/ /m/). 1/10 of them lack fricatives, some 1/50 lack nasals, and some 1/100 lack bilabials. There was only one each of languages that lack nasals and either bilabials or fricatives.

WALS Online - Chapter Voicing and Gaps in Plosive Systems -- of /p/ and /b/, /p/ sometimes drops out. Likewise, of /k/ and /g/, /g/ sometimes drops out.

WALS Online - Feature 19A: Presence of Uncommon Consonants - (/th/ /dh/) fricatives, notably present in English, are present only about 1/10 of the time that they are absent. Likewise, labial-velar stops (/k-p/ /g-b/) are present only about 1/10 of the time that they are absent. They are mainly present in West Africa and in eastern New Guinea, but seldom elsewhere.

WALS Online - Chapter Vowel Quality Inventories - languages differ in how many phonemically different vowels they have. English has a relatively large number of vowel phonemes, and Spanish an average number.

#### lpetrich

##### Contributor
WALS Online - Chapter The Velar Nasal - the "ng" sound.

About half of the sample does not have /ng/ as a separate phoneme, though it may have it as an allophone of /n/ or /m/.

Of the other half, 2/3 of them can have /ng/ be an initial consonant, and English is one of the 1/3 which can't.

WALS Online - Chapter Syllable Structure -- English is on the complex side, allowing plenty of initial and final consonant clusters, like "stink" -- CCVCC (C = consonant, V = vowel).

The type that is always present is CV. Some languages add V (vowel alone), combined as (C)V.

Moderate complexity is CCV, CVC, CCVC, VC, usually with the second consonant in a cluster being limited to liquids /r/ /l/ or glides /j/ /w/ (/j/ is English y).
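The three-way grouping above can be sketched in a few lines. The assumptions are loud ones: each letter is treated as one sound, only a, e, i, o, u count as vowels, and the restriction on which consonants may come second in a cluster is ignored. The names `cv_skeleton` and `complexity` are invented for the example.

```python
VOWELS = set("aeiou")

def cv_skeleton(syllable):
    """Map each letter to C or V (one letter = one sound, by assumption)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable.lower())

def complexity(skeleton):
    """Classify a single-syllable CV skeleton as simple, moderate, or complex.

    Raises ValueError if the skeleton contains no V.
    """
    onset_len = skeleton.index("V")
    coda_len = len(skeleton) - skeleton.rindex("V") - 1
    if onset_len <= 1 and coda_len == 0:
        return "simple"      # V, CV
    if onset_len <= 2 and coda_len <= 1:
        return "moderate"    # CCV, CVC, CCVC, VC
    return "complex"         # anything with bigger clusters

for word in ["ma", "at", "plan", "stink"]:
    shape = cv_skeleton(word)
    print(word, shape, complexity(shape))
```

Here "stink" comes out as CCVCC and lands in the complex group, as described above.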

#### Copernicus

If one studies articulatory phonetics, then it is possible to rank sounds in terms of the complexity of articulation required to physically produce a sound or sound combination. Logically, the most complex sounds will come with the greatest number of ways to mispronounce them. For example, a voiceless consonant like bilabial [p] is produced by closing the lips and releasing them as air passes through the oral cavity. Air passes freely across the glottis without phonation (vocal vibration). The bilabial [b] is produced the same way, except that phonation is required simultaneously. Therefore, [b] is more complex. It can be mispronounced by failing to maintain simultaneity of lip closure and vocal vibration.

Why is this significant? Because it explains why a lot of languages have devoicing processes--for example, final devoicing in languages like Russian and German. English-learning children may tend to mispronounce /b/ as [p] in the early stages of language acquisition by either failing to voice the /b/ or failing to shut down or start up phonation at the same time as lip closure. To acquire English /b/, the child must suppress the tendency to mispronounce it as a voiceless sound. However, Russian and German children don't have to suppress devoicing when the sound comes at the end of a word or syllable, so their language ends up with a final devoicing process. And that is why Russian and German adults tend to devoice final sounds when they learn English. As a person gets older, their ability to acquire fine details of muscular coordination degrades, especially after puberty. So learning English later in life makes it harder for Russian and German learners to pronounce final voiced sounds.
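Final devoicing is easy to state as a rewrite rule. The sketch below is a simplification under stated assumptions: it uses rough letter-for-sound spellings, handles only the single last sound of a word, and covers only the five pairs in the hypothetical `DEVOICE` table.

```python
# Voiced obstruent -> voiceless counterpart (rough, letter-based).
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def final_devoice(word):
    """Devoice the last sound of a word if it is a voiced obstruent."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word

# Rough spellings of German Hund, Tag, Rad, Haus.
for w in ["hund", "tag", "rad", "haus"]:
    print(w, "->", final_devoice(w))
```

An English speaker has to learn not to apply this rule, which is why "bed" and "bet" stay distinct in English but would merge under it.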

Generally speaking, all of phonology can be understood, if you understand this very fundamental fact about how pronunciation evolves in language learners. The suppression of mispronunciations is extremely important in acquiring a mature pronunciation of a language, and the ability to suppress mispronunciations degrades rapidly, especially after puberty, when the brain undergoes significant changes. One can then understand why typological patterns occur in the phonemic inventories of languages and why some sounds occur in most languages, but others do not.

#### lpetrich

##### Contributor
Children's Consonant Acquisition in 27 Languages: A Cross-Linguistic Review | American Journal of Speech-Language Pathology
I had to look in to see what's what in it.

The ability to make most phonemes is acquired very early in life, around 2 to 4 years, with some of them acquired as late as 6 or 7 years. The basic stops and nasals are acquired early, around 2 to 3 years of age. Fricatives /h/ /f/ /x/ (the kh fricative) are acquired a little later, and sibilants /s/ /z/ /S/ (sh) and /Z/ (zh) and affricates /ts/ /dz/ /tS/ (ch) /dZ/ (English j) around 3 to 4, though some others are acquired very early. Fricatives /th/ /dh/ (English voiceless and voiced th) are acquired relatively late, at 4 to 5. Semivowels /j/ (English y) /w/ are acquired around 2 to 3, /l/ around 3 to 4, and /r/ (trilled r) around 4 to 5.

This early acquisition of language suggests that we have some adaptations for generating spoken language, adaptations that are absent from even the closest living species. The first attempts to teach human language to chimpanzees yielded "mama" "papa" "cup" "up" -- very phonemically limited. That's in part due to the top of the windpipe (trachea) sticking well into the mouth. Human babies are born with that condition, but the top soon moves downward, allowing the tongue more freedom of motion, and also making it easier to choke.

The greatest success in teaching chimpanzees language has been with sign language, and they can learn a large number of individual signs. But the most they can do beyond that is 2-sign or 3-sign compounds, like watermelon "drink fruit" and radish "cry hurt food".

Also, every well-documented full-scale human society has language, with no known exceptions. I say "full-scale" to exclude societies of deaf people. Being unable to hear makes it difficult to generate spoken language.

#### Copernicus

I would say that the ability to pick up phonemes does not depend all that much on having the same general architecture in the oral cavity. Children born with severe defects learn to compensate. The skill that human children have in rapidly acquiring speech articulation is likely a genetic predisposition.

One thing to bear in mind is that one's ability to articulate sounds is severely limited only when attempting to produce speech. That is when native phonology constrains pronunciation, and it is the main reason why adult language learners have trouble acquiring sounds not native to their dominant language. So people attempting to "speak in tongues" (glossolalia) have a very limited range of sounds that they think of as "foreign". They don't tend to produce exotic sounds that we hear in genuine foreign language articulation. If one is trying to just make noises with the vocal tract, no such limitations apply. You can produce any sound that exists in a foreign language unhindered. It is just when you try to speak what you think of as words of a real language that the articulatory "programming" kicks in automatically.

#### lpetrich

##### Contributor
Speaking the f and v sounds was enabled by eating soft foods | Internet Infidels Discussion Board

These are "labiodentals", made with one's upper teeth against one's lower lip. These sounds are enabled by having an overbite, a side effect of a diet of relatively soft foods, something made possible by relatively advanced cooking technology. Without such technology, our ancestors had more even bites, and more difficulty making labiodentals. Looking at the languages of recently low-tech peoples and looking at the protolanguages of those with more advanced technology, they are very short on labiodentals. So labiodentals were separately invented, and their presence doesn't indicate much about ancestry.

There are some other environmental influences on language features that I found out about recently. But for introduction, let us consider the concept of speech-sound sonority or perceived loudness. The most sonorous speech sound is "ah", and the least sonorous are the voiceless stop consonants: p, t, k. The sonority scales in the literature contradict each other in some details, though they agree on the overall hierarchy. Here is what I have found:
• Vowels:
  • Low: a
  • Mid: e, o
  • High: i, u
• Semivowels (glides): y, w
• Liquids:
  • Rhotic: r
  • Lateral: l
• Nasals: m, n, ng
• Fricatives:
  • Voiced: z, zh, dh, v, gh
  • Voiceless: s, sh, th, f, kh
• Affricates:
  • Voiced: dz, dzh
  • Voiceless: ts, tsh
• Stops or plosives:
  • Voiced: b, d, g
  • Voiceless: p, t, k
• Approximants: semivowels + liquids
• Sonorants or resonants: vowels + semivowels + liquids + nasals
• Obstruents: fricatives + affricates + stops
• Continuants: vowels + semivowels + liquids + fricatives

This concept has broader value. One application is to syllable structure. As a general rule, syllables have rising sonority then falling sonority. There are some exceptions, like initial /s/ + stop and final stop + /s/ in English.
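
The rising-then-falling rule can be sketched in a few lines of Python. This is only an illustration: the numeric ranks are my own assumption, chosen to follow the order of the list above, and the syllables are given as hand-made lists of sounds rather than real phonetic transcriptions.

```python
# Illustrative sonority ranks (higher = more sonorous), in the order of the
# list above. The numeric values themselves are an assumption of this sketch.
SONORITY = {}
for rank, sounds in enumerate([
    ["p", "t", "k"],                 # voiceless stops
    ["b", "d", "g"],                 # voiced stops
    ["ts", "tsh"],                   # voiceless affricates
    ["dz", "dzh"],                   # voiced affricates
    ["s", "sh", "th", "f", "kh"],    # voiceless fricatives
    ["z", "zh", "dh", "v", "gh"],    # voiced fricatives
    ["m", "n", "ng"],                # nasals
    ["l"],                           # lateral liquid
    ["r"],                           # rhotic liquid
    ["y", "w"],                      # semivowels
    ["i", "u"],                      # high vowels
    ["e", "o"],                      # mid vowels
    ["a"],                           # low vowel
]):
    for sound in sounds:
        SONORITY[sound] = rank


def rises_then_falls(syllable):
    """Return True if sonority strictly rises to one peak and then strictly falls."""
    ranks = [SONORITY[sound] for sound in syllable]
    peak = ranks.index(max(ranks))
    rising = all(a < b for a, b in zip(ranks[:peak], ranks[1:peak + 1]))
    falling = all(a > b for a, b in zip(ranks[peak:], ranks[peak + 1:]))
    return rising and falling


print(rises_then_falls(["p", "l", "a", "n", "t"]))  # "plant"
print(rises_then_falls(["s", "t", "o", "p"]))       # "stop": initial /s/ + stop
print(rises_then_falls(["k", "a", "t", "s"]))       # "cats": final stop + /s/
```

The first syllable follows the rule, and the other two are the English exceptions mentioned above, so the check rejects them.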

#### lpetrich

##### Contributor
I must mention here a rather curious ecological hypothesis for a certain type of speech sound. First some introduction about stop consonants in general. Stops have several possible voicings:
• Breathing just after the stop. Voicing begins:
  • Before the stop: voiced
  • Just after the stop: voiceless
  • A little after the stop: voiceless aspirated
• No breathing just after the stop: ejective or glottalic
Evidence for Direct Geographic Influences on Linguistic Sounds: The Case of Ejectives

Abstract:
We present evidence that the geographic context in which a language is spoken may directly impact its phonological form. We examined the geographic coordinates and elevations of 567 language locations represented in a worldwide phonetic database. Languages with phonemic ejective consonants were found to occur closer to inhabitable regions of high elevation, when contrasted to languages without this class of sounds. In addition, the mean and median elevations of the locations of languages with ejectives were found to be comparatively high. The patterns uncovered surface on all major world landmasses, and are not the result of the influence of particular language families. They reflect a significant and positive worldwide correlation between elevation and the likelihood that a language employs ejective phonemes. In addition to documenting this correlation in detail, we offer two plausible motivations for its existence. We suggest that ejective sounds might be facilitated at higher elevations due to the associated decrease in ambient air pressure, which reduces the physiological effort required for the compression of air in the pharyngeal cavity–a unique articulatory component of ejective sounds. In addition, we hypothesize that ejective sounds may help to mitigate rates of water vapor loss through exhaled air. These explications demonstrate how a reduction of ambient air density could promote the usage of ejective phonemes in a given language. Our results reveal the direct influence of a geographic factor on the basic sound inventories of human languages.
In short, the higher the altitude where one lives, the more likely one is to use ejective stops.

The authors note six regions that are both high in altitude and relatively flat. "These regions consist of (1) the North American cordillera, including the Rocky Mountains, Colorado plateau, and the Mexican altiplano, (2) the Andes and the Andean altiplano, (3) the southern African plateau, (4) the plateau of the east African rift and the Ethiopian highlands, (5) the Caucasus range and the associated Javakheti plateau, and (6) the massive Tibetan plateau and adjacent plateaus, most notably the Iranian plateau."

"We see as well that there are eight visual clusters of languages with ejectives, highlighted via white rectangles. Two of the largest of these are located within the North American cordillera. Another is located immediately to the east of the cordillera, on the associated Colorado plateau. A fourth cluster is located just southeast of Mexican altiplano. A fifth cluster is located on the southern African plateau. The sixth and seventh clusters are located along the East African rift, on two areas of the plateau associated with this rift. The eighth cluster is located in the region of the Caucasus mountains and the Javakheti Plateau. In addition, a glance at South America reveals that a number of the languages with ejectives on that landmass are located in the Andean cordillera or on the Andean altiplano in Bolivia, as Maddieson has noted."

A good fit.

"Remarkably, then, the clusters of languages with ejectives tend to be located on or very near five of the six major non-contiguous regions of high elevation on the earth’s inhabitable surface. The only major region of high elevation where languages with ejectives are absent is the large Tibetan plateau, along with adjacent regions of high altitude. It is not particularly surprising that one region should present such an exception, and in fact it strikes us as remarkable that only one region presents an exception."

"Conversely, some of the richest areas of the world linguistically, in terms of languages and linguistic stocks, are largely devoid of languages with ejectives. The areas in question are Oceania (including New Guinea and Australia), Southeast Asia, West Africa, and Amazonia."

All without large high-altitude areas.

"More generally, the languages with ejectives in high altitude zones represent myriad language stocks including Southern Khoisan, Central Khoisan, Caucasian, Athapaskan (Na-Dene), Semitic (Afro-Asiatic), Lezgic (Nakh-Daghestanian), Armenian, Aymaran, Hadza, Mayan, Salishan, Cahuapanan, Quechuan, Siouan, Cushitic (Afro-Asiatic), Nilo-Saharan, Oto-Manguean, and Eyak (Na-Dene)."

A very mixed bag.

So ejectives likely indicate living at high altitudes or else having recently done so, rather than shared ancestry.

#### Swammerdami

Staff member
On the general topic of teeth configuration affecting one's voice ...
Speaking the f and v sounds was enabled by eating soft foods | Internet Infidels Discussion Board

These are "labiodentals", made with one's upper teeth against one's lower lip. These sounds are enabled by having an overbite, a side effect of a diet of relatively soft foods, something made possible by relatively advanced cooking technology. Without such technology, our ancestors had more even bites, and more difficulty making labiodentals.
... by coincidence my son mentioned to me just today that the famous singer Freddie Mercury had four extra teeth (mesiodens incisors). It seems he credited this configuration for his large singing range. (Top Google hits seem quite doubtful of this.)

#### lpetrich

##### Contributor
has some ecological hypotheses directly connected to sonority, mentioning climate and vegetation cover.

Frontiers | Language Adapts to Environment: Sonority and Temperature | Communication
This study looks at brief samples of spoken material from 100 languages, dividing the speech into sonorous and obstruent time fractions. The percentage of sonorous material is the sonority score. This score correlates quite strongly with mean annual temperature in the area where the languages are spoken, with higher temperatures going together with higher sonority scores. The role of tree cover and annual precipitation, found to be important in earlier work, is not found to be significant in this data.
So it's sonorants vs. obstruents that they checked on, finding a greater fraction of sonorants in warmer climates.
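
As I read that description, the score is just a time fraction. Here is a minimal sketch of that arithmetic, assuming the speech has already been split into segments labeled sonorous or obstruent. The segment format and the durations are made up for illustration and are not the authors' data or code.

```python
def sonority_score(segments):
    """Percentage of total speech time that is sonorous.

    segments: list of (is_sonorous, duration_ms) pairs. This format is a
    hypothetical one for this sketch, not the one used in the study.
    """
    total = sum(duration for _, duration in segments)
    if total <= 0:
        raise ValueError("need a positive total duration")
    sonorous = sum(duration for is_sonorous, duration in segments if is_sonorous)
    return 100.0 * sonorous / total


# Made-up example: a sonorous stretch, an obstruent, another sonorous stretch.
sample = [(True, 120), (False, 80), (True, 300)]
print(sonority_score(sample))  # 420 ms sonorous out of 500 ms -> 84.0
```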

The authors speculate that
This result may be explained if absorption and scattering are more important than reflection. Atmospheric absorption is greater at higher temperatures and peaks at higher frequencies with increasing temperature. Small-scale local perturbations (eddies) in the atmosphere created by high air temperatures also degrade the high-frequency spectral characteristics that are critical to distinguishing between obstruent consonants, leading to reduction in contrasts between them, and fewer clusters containing obstruent strings.
That hypothesis can be tested by considering long-distance propagation of sound. In arenas and stadiums, sound does not get distorted very much. Only sound from much farther away is noticeably muffled, with much less of its high-frequency content reaching our ears than its low-frequency content. That the high-frequency content was present in the original is evident from relatively close thunder, so it takes several kilometers of propagation before the increased extinction of high-frequency sound becomes apparent.

Sonority and Climate in a World Sample of Languages: Findings and Prospects - John G. Fought, Robert L. Munroe, Carmen R. Fought, Erin M. Good, 2004
In a world sample (N = 60), the indigenous languages of tropical and subtropical climates in contrast to the languages spoken in temperate and cold zones manifested high levels of sonority. High sonority in phonetic segments, as found for example in vowels (versus consonants), increases the carrying power of speech sounds and, hence, audibility at a distance. We assume that in the course of daily activities, the speakers in warm/hot climates (a) are often outdoors due to equable ambient temperatures, (b) thereby frequently transmit messages distally, and (c) transmit such messages relatively intelligibly due to the acoustic and functional advantages of high sonority. Our conceptual model is similar to that of population biology, where there are well-known correlations between climate and somatic variables, and where it is assumed that communicative modalities and behaviors are selected or designed for success in specific habitats. We also take up possible alternative hypotheses and consider directions for future research.
I find that much more plausible than air attenuation, because sound follows the inverse-square law in addition to being absorbed or scattered as it travels.
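
To put rough numbers on that, here is a sketch comparing the two kinds of loss for an idealized point source in open air. Spherical spreading costs 20*log10(r/r0) dB, which is the inverse-square law, and absorption costs a fixed number of dB per meter. The absorption coefficients below are placeholders that I picked for illustration, not measurements, and the function names are my own.

```python
import math


def spreading_loss_db(distance_m, reference_m=1.0):
    """Inverse-square (spherical spreading) loss relative to the reference distance."""
    return 20.0 * math.log10(distance_m / reference_m)


def absorption_loss_db(distance_m, alpha_db_per_m, reference_m=1.0):
    """Loss from atmospheric absorption, which grows linearly with distance."""
    return alpha_db_per_m * (distance_m - reference_m)


# ASSUMED, illustrative absorption coefficients in dB per meter.
ASSUMED_ALPHA_DB_PER_M = {"low-frequency": 0.002, "high-frequency": 0.03}

for distance_m in (10.0, 100.0, 1000.0):
    spread = spreading_loss_db(distance_m)
    for band, alpha in ASSUMED_ALPHA_DB_PER_M.items():
        absorbed = absorption_loss_db(distance_m, alpha)
        print(f"{distance_m:6.0f} m  {band:14s}  "
              f"spreading {spread:5.1f} dB  absorption {absorbed:6.2f} dB")
```

With these assumed coefficients, spreading loss is the same in both bands and dominates at talking and shouting distances. The gap between the bands from absorption only becomes large at around a kilometer.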

#### lpetrich

##### Contributor
Climate, Econiche, and Sexuality: Influences on Sonority in Language - EMBER - 2007 - American Anthropologist - Wiley Online Library
also at
untitled - Climate_Econiche_and_Sexuality_Influence.pdf
Previous cross-cultural research by Robert Munroe and colleagues has linked two features of language to warm climates—a higher proportion of consonant-vowel syllables and a higher proportion of sonorous (more audible) sounds. The underlying theory is that people in warmer climates communicate at a distance more often than people in colder climates, and it is adaptive to use syllables and sounds that are more easily heard and recognized at a distance. However, there is considerable variability in warm as opposed to cold climates, which needs to be explained. In the present research report, we show that additional factors increase the predictability of sonority. We find that more specific features of the environment—such as type of plant cover and degree of mountainous terrain—help to predict sonority. And, consistent with previous research on folk-song style, measures of sexual restrictiveness also predict low sonority.
Vegetation density is negatively correlated with sonority in warm climates but positively in cold climates. Being more sexually restrictive is also negatively correlated with sonority.

The fraction of consonant-vowel (CV) syllables varies among languages. Some languages allow CV as the only kind of syllable that contains a consonant, while others allow consonant clusters at both ends.

Like sonority, CV fraction is also larger in warm climates, and it is also negatively correlated with sexual restrictiveness. But it had only a weak correlation with vegetation density. CV fraction has a positive correlation with holding of babies, and a negative correlation with literacy.
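
For concreteness, here is one way that a CV fraction could be computed. This is a sketch under a big simplifying assumption: each letter counts as one sound and the vowels are a, e, i, o, u. The paper's actual coding procedure is not described here, so the function and the syllables are purely illustrative.

```python
VOWELS = set("aeiou")  # assumed vowel letters for this toy example


def syllable_shape(syllable):
    """Turn a syllable like 'pat' into its shape, like 'CVC'."""
    return "".join("V" if letter in VOWELS else "C" for letter in syllable)


def cv_fraction(syllables):
    """Fraction of syllables whose shape is exactly one consonant plus one vowel."""
    if not syllables:
        raise ValueError("need at least one syllable")
    cv_count = sum(1 for syllable in syllables if syllable_shape(syllable) == "CV")
    return cv_count / len(syllables)


# Made-up syllables: three CV, one bare vowel, one CVC.
print(cv_fraction(["ka", "na", "ta", "a", "pat"]))  # 3 of 5 -> 0.6
```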

#### lpetrich

##### Contributor
Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots
Considers how languages with "complex tone" are distributed, those with at least three phonemic tones, tones that distinguish words. Chinese is the best-known tonal language. Compares that to "simple tone", with two phonemic tones, and to "no tone", with no phonemic tones.

No tone was the most common in low-humidity areas, and complex tone was the most common in high-humidity areas, like sub-Saharan Africa, Southeast Asia, and New Guinea. Complex tone is rare in the Americas, where it is mainly found in southern North America and in the Amazon basin.

Languages in Drier Climates Use Fewer Vowels
Finding the "vowel index", the fraction of sounds that are vowels. The authors used absolute humidity, not relative humidity, and cold climates were thus coded as low-humidity ones. Climate as a separate variable would have been interesting.

There was a correlation, but not a very big one. In (humidity, vowel index) coordinates, the line for individual languages goes from (0, 0.39) to (0.02, 0.49) with a scatter of about 0.1. For language families, it goes from (0, 0.38) to (0.02, 0.51) with a scatter of 0.05 - 0.1.
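
A quick bit of arithmetic on those endpoints shows how the trend compares to the scatter. The numbers are the ones I read off above. The helper function is my own, and the humidity units are whatever the paper's axis used.

```python
def trend(start, end, scatter):
    """Slope and total rise of a line between two (humidity, vowel index) points,
    plus the rise expressed in units of the scatter."""
    (h0, v0), (h1, v1) = start, end
    rise = v1 - v0
    return {
        "slope": rise / (h1 - h0),
        "rise": rise,
        "rise_over_scatter": rise / scatter,
    }


by_language = trend((0.0, 0.39), (0.02, 0.49), scatter=0.1)
by_family_wide = trend((0.0, 0.38), (0.02, 0.51), scatter=0.1)
by_family_narrow = trend((0.0, 0.38), (0.02, 0.51), scatter=0.05)

for label, result in [("languages", by_language),
                      ("families, scatter 0.1", by_family_wide),
                      ("families, scatter 0.05", by_family_narrow)]:
    print(f"{label:24s} slope {result['slope']:.1f}  "
          f"rise {result['rise']:.2f}  rise/scatter {result['rise_over_scatter']:.1f}")
```

By languages, the rise across the whole humidity range is about equal to the scatter. By families, it is about 1.3 to 2.6 times the scatter.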

#### Politesse

##### Lux Aeterna
Alright, repeat after me, class:

"CORRELATION..."

"...DOES NOT PROVE CAUSATION!"

"I CRAFTED AN UNFALSIFIABLE 'THEORY' TO EXPLAIN THE DATA SET I ALREADY POSSESSED..."

"...AND IT WAS JUST SO!"

#### lpetrich

##### Contributor
I agree that one has to be careful about the direction of causation.

As to what may cause the correlations that I'd mentioned earlier: in a warm climate, thick vegetation may muffle sound going through it, and otherwise make long-range interaction difficult. In a cold climate, relatively thick vegetation means that it is warm enough for it to grow, and thus warm enough for people to be talkative outdoors. The authors coded climate by number of cold months, without precisely defining "cold". Below freezing (0 C)? Halfway between room temperature (20 C) and freezing (10 C)?

As to literacy being inversely correlated with sonority, it seems like a side effect of what places have the most advanced economic development: temperate climates. David Landes's book "The Wealth And Poverty Of Nations" proposes that before electrically-powered air conditioning, tropical and subtropical climates were too hot for doing a lot of work, and as a result, a midday siesta is a time-honored custom in some such places.

So with sonority and literacy, they have a shared cause, climate, rather than one causing the other.

#### lpetrich

##### Contributor
In dry climates, one may want to avoid losing water in one's breath, and that may account for greater fractions of consonants.

Languages Support Efficient Communication about the Environment: Words for Snow Revisited
Abstract:
The claim that Eskimo languages have words for different types of snow is well-known among the public, but has been greatly exaggerated through popularization and is therefore viewed with skepticism by many scholars of language. Despite the prominence of this claim, to our knowledge the line of reasoning behind it has not been tested broadly across languages. Here, we note that this reasoning is a special case of the more general view that language is shaped by the need for efficient communication, and we empirically test a variant of it against multiple sources of data, including library reference works, Twitter, and large digital collections of linguistic and meteorological data. Consistent with the hypothesis of efficient communication, we find that languages that use the same linguistic form for snow and ice tend to be spoken in warmer climates, and that this association appears to be mediated by lower communicative need to talk about snow and ice. Our results confirm that variation in semantic categories across languages may be traceable in part to local communicative needs. They suggest moreover that despite its awkward history, the topic of “words for snow” may play a useful role as an accessible instance of the principle that language supports efficient communication.

Franz Boas observed that certain Eskimo languages have unrelated forms for subtypes of snow (e.g. aput: snow on the ground, qana: falling snow), and thus subdivide the notion of snow more finely than English does
Many of the Inuit languages' words for snow are words for different kinds of snow, and many of them are derived from only a few roots, like the ones mentioned here. They are thus comparable to the compound words for kinds of snow that English has: snowpack, snowdrift, snowstorm, ...

The authors then address some alternative causes.
For example, it is known that complexity of the lexicon in several semantic domains correlates with societal complexity [27], and societal complexity tends to be lower in regions near the equator [28]. Thus, languages spoken in warm regions might tend to have fewer and broader semantic categories generally, not just for ice and snow. Moreover, the link between temperature and ice/snow might be only rather weakly significant relative to other comparable links in the same dataset.
So they checked a lot of pairs of words for closely related concepts, and they found only two pairs with a stronger link to temperature than ice/snow: man/male-animal and air/wind.

As to how much snow and ice are mentioned, they looked at usage statistics: the lower the average temperature, the more mentions.
As it happens, a recent study has argued that Boas’ original claim was in fact correct. Krupnik and Müller-Wille [5] have argued, contra Pullum, and on the basis of several empirical datasets, that “the English vocabulary for snow and related phenomena is clearly inferior to those recorded in several Eskimo/Inuit languages and dialects” (p. 391). They argue further that this phenomenon is not limited to Eskimo/Inuit languages, but also extends to other languages spoken in cold climates where snow is common, such as Russian. They illustrate this point with several Russian lexemes, including one that interestingly captures the absence of snow where it might be expected: “protalina (open ground where the snow has melted)” (p. 394). Finally, they suggest that the entire debate has been somewhat empirically misdirected, in that Eskimo languages tend to exhibit a richer vocabulary for types of sea ice than for types of snow—and that a truly rich snow vocabulary may be found elsewhere, among the Norwegian Sámi.
I recall in another forum someone listing several compound words for kinds of snow in Swedish, and English has a sizable snow vocabulary.

#### lpetrich

##### Contributor
As to snow and ice not being well-distinguished in some languages, there are some that had no words for either until recent centuries. John Locke: An Essay Concerning Human Understanding
If I myself see a man walk on the ice, it is past probability; it is knowledge. But if another tells me he saw a man in England, in the midst of a sharp winter, walk upon water hardened with cold, this has so great conformity with what is usually observed to happen that I am disposed by the nature of the thing itself to assent to it; unless some manifest suspicion attend the relation of that matter of fact. But if the same thing be told to one born between the tropics, who never saw nor heard of any such thing before, there the whole probability relies on testimony: and as the relators are more in number, and of more credit, and have no interest to speak contrary to the truth, so that matter of fact is like to find more or less belief. Though to a man whose experience has always been quite contrary, and who has never heard of anything like it, the most untainted credit of a witness will scarce be able to find belief.

The king of Siam. As it happened to a Dutch ambassador, who entertaining the king of Siam with the particularities of Holland, which he was inquisitive after, amongst other things told him that the water in his country would sometimes, in cold weather, be so hard that men walked upon it, and that it would bear an elephant, if he were there. To which the king replied, Hitherto I have believed the strange things you have told me, because I look upon you as a sober fair man, but now I am sure you lie.

An interesting issue about snow vs. ice is whether or not the words are related, like "dust ice" vs. "rock ice". English "ice" and "snow" don't look related, and looking at etymologies in wiktionary.org reveals that they are not. That was true in most other languages and language families that I looked at in detail, like Germanic, Latin/Romance, Slavic, and Greek and their superfamily Indo-European, also Uralic, Turkic, Mongolian, Japanese, Chinese, Eskimo, and Semitic.

Tibetan may be an exception, with gangs "snow" and 'khyags pa "ice".

For the Semitic languages, words for snow come from Proto-Semitic *ṯalg-, while Hebrew, Aramaic, and Arabic words for ice look like they were borrowed from Latin gelidus "icy".

Wiktionary has some entries for some Eskimo/Inuit languages: Greenlandic and Inuktitut. For Greenlandic, aput "snow (on the ground)", siku "ice (on water)", sermeq "ice (on ground)". For Inuktitut, aput "snow (in general)", mauja "deep soft snow", siku "ice".

I looked in snow - Wiktionary and ice/translations - Wiktionary

#### lpetrich

##### Contributor
On the universal structure of human lexical semantics
Using data from "polysemy", words having multiple meanings distinguished by context. Which sets of multiple meanings do words tend to have?

For example, English "moon" and "month". They and their Germanic cognates are from Proto-Germanic *mênô and *mênôths, in turn from PIE, with *mêh1ns for both words. Many other languages have the same word for the celestial body and the period of time.

The authors found a semantic network of concepts related by polysemies, meaning that the concepts have the same word in some languages. Some of the strongest relationships:
• moon - month
• sun - day/daytime
• sky - heaven
• wind - air
• earth/soil - ground, country
• earth/soil - dust - ash(es)
• smoke - mist
• stone - mountain - hill
• water - liquid
• river - stream
• lake - pond
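
The idea of such a network can be sketched as a count over dictionaries: two concepts get a link whose weight is the number of languages with one word covering both. The three "languages" and their words below are invented for illustration and are not from the paper's data. The paper's actual weighting and sampling are more careful than this.

```python
from collections import Counter
from itertools import combinations

# Invented toy lexicons: language -> word -> set of meanings the word covers.
toy_lexicons = {
    "lang_a": {"wa": {"moon", "month"}, "so": {"sun", "day"}},
    "lang_b": {"mi": {"moon", "month"}, "ku": {"sun"}, "de": {"day"}},
    "lang_c": {"lu": {"moon"}, "me": {"month"}, "ti": {"sun", "day"}},
}


def polysemy_weights(lexicons):
    """Count, for each pair of concepts, how many languages have a single word
    that covers both. Each language counts at most once per pair."""
    weights = Counter()
    for words in lexicons.values():
        linked_pairs = set()
        for meanings in words.values():
            linked_pairs.update(combinations(sorted(meanings), 2))
        weights.update(linked_pairs)
    return weights


for (concept_1, concept_2), count in sorted(polysemy_weights(toy_lexicons).items()):
    print(f"{concept_1} - {concept_2}: {count} languages")
```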

They state their results in their "Significance" section:
Semantics, or meaning expressed through language, provides indirect access to an underlying level of conceptual structure. To what degree this conceptual structure is universal or is due to properties of cultural histories, or to the environment inhabited by a speech community, is still controversial. Meaning is notoriously difficult to measure, let alone parameterize, for quantitative comparative studies. Using cross-linguistic dictionaries across languages carefully selected as an unbiased sample reflecting the diversity of human languages, we provide an empirical measure of semantic relatedness between concepts. Our analysis uncovers a universal structure underlying the sampled vocabulary across language groups independent of their phylogenetic relations, their speakers’ culture, and geographic environment.

#### lpetrich

##### Contributor
Larger communities create more systematic languages
Research over the past decade has suggested that linguistic diversity may result from differences in the social environments in which languages evolve. Specifically, recent work found that languages spoken in larger communities typically have more systematic grammatical structures. However, in the real world, community size is confounded with other social factors such as network structure and the number of second languages learners in the community, and it is often assumed that linguistic simplification is driven by these factors instead.
The authors tested that by having people invent languages for communicating with each other. Their procedure:
Participants were asked to create a fantasy language and use it in order to communicate about different novel scenes. Participants were not allowed to communicate in any other way besides typing, and their letter inventory was restricted: it included a hyphen, five vowel characters (a,e,i,o,u) and 10 consonants (w,t,p,s,f,g,h,k,n,m), which participants could combine freely.

The experiment had 16 rounds, comprising three phases: group naming (round 0), communication (rounds 1–7; rounds 9–15) and test (round 8; round 16).
Then going into a lot of further detail.
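
The constraints in that procedure are simple enough to write down. The following is only a sketch of the rules as quoted, with function names of my own. It is not the experimenters' software.

```python
# Allowed characters, as quoted: a hyphen, five vowels, and ten consonants.
ALLOWED_CHARACTERS = set("aeiou") | set("wtpsfghknm") | {"-"}


def is_valid_word(word):
    """True if the word is non-empty and uses only the allowed characters."""
    return len(word) > 0 and set(word) <= ALLOWED_CHARACTERS


def phase_of_round(round_number):
    """Map a round number to its phase, following the quoted schedule."""
    if round_number == 0:
        return "group naming"
    if round_number in (8, 16):
        return "test"
    if 1 <= round_number <= 15:
        return "communication"
    raise ValueError("round number out of range")


print(is_valid_word("wa-ki"))  # only allowed characters
print(is_valid_word("blue"))   # 'b' and 'l' are not in the inventory
print(phase_of_round(8))
```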

They find, from the abstract:
Here, we show that in contrast to previous assumptions, community size has a unique and important influence on linguistic structure. We experimentally examine the live formation of new languages created in the laboratory by small and larger groups, and find that larger groups of interacting participants develop more systematic languages over time, and do so faster and more consistently than small groups. Small groups also vary more in their linguistic behaviours, suggesting that small communities are more vulnerable to drift. These results show that community size predicts patterns of language diversity, and suggest that an increase in community size might have contributed to language evolution.