
Language as a Clue to Prehistory

From Wiktionary, the free dictionary

Proto-Germanic *hlaibaz "bread" > Old English hlâf "bread" > English "loaf" meaning "block of bread or something similar"
Also > Proto-Slavic *khlebu > Russian khleb among others
Proto-Slavic *xlěbъ is actually an example of a word presumed to be a loan from Germanic into Proto-Slavic, but it's not a particularly informative one for dating sound shifts and/or contact situations, since it doesn't show, and isn't expected to show, any of the sound changes in question. The one exception is the "fall of the yers", i.e. the dropping of the short lax high vowels. Thanks to those vowels, Proto-Slavic had almost exclusively open syllables and very restricted consonant clusters, while modern Slavic languages have among the fewest restrictions on consonant clusters among extant IE languages. But that change happened late enough to be documented in writing, i.e. Old Church Slavonic still had yers in the late 9th century.

 
Returning to macrolinguistics, I note Vladimir Orel, who worked on an etymological dictionary of Afroasiatic.
Orel also dealt with the Indo-European languages, especially the Balto-Slavic, Germanic, Albanian, and Celtic branches. He also took interest in Semitic languages, Hebrew in the first place, and more broadly in Afroasiatic languages as a whole, where lie his most controversial results. Through collaboration with Olga Stolbova he published Hamito-Semitic Etymological Dictionary (1995) which on one hand brought a number new sub-lexical comparisons, especially Semitic-Chadic. On the other hand, the value of the benefits of reduced transcriptions used and inaccurate translations, absence of primary sources for non-written languages, and especially countless pseudo-reconstructions formulated ad hoc often on two or even a single word were seriously frowned upon by specialists, who also pointed out other serious errors in the work (especially in Cushitic material, as well as not neglecting the massive amount of Arabic loanwords in Berber languages).
So there are problems with his reconstruction of Afrasian.

Etnograficheskoe obozrenie :: №4 :: The Difficulties of Reconstructing the Cultural Lexicon for a Macrofamily-Level ProtoLanguage (Based on the Afrasian Example) by George Starostin
Abstract:
The paper offers a critical analysis of several Afrasian etymologies with presumably “military” semantics, put forward by Alexander Militarev. The conclusion is that these etymologies typically suffer from multiple problems, such as lack of proper attention to the historical typology of semantic shifts and insufficient consideration for the distribution of potential reflexes in daughter languages. Because of this, the reconstructability of a large Proto-Afrasian lexical layer of specifically “military” terms remains questionable – at least not until such a reconstruction has been diligently conducted on each of the chronological levels preceding Proto-Afrasian (particularly on the various intermediate levels of the Cushitic family, since only a secure reconstruction of any select etymon on the Proto-Cushitic level can in turn properly guarantee its Proto-Afrasian status).
From the full paper,
In modern comparative studies, the hypothesis of Afroasiatic relationship, from the point of view of the “basic” material, is in a much more favorable position than, for example, the Nostratic hypothesis of Vladislav M. Illich-Svitych or the Sino-Caucasian hypothesis of Sergei A. Starostin: comparative material confirming it from the field of grammatical paradigmatics and basic vocabulary, although not numerous, is still usually considered sufficient for the world linguistic community to accept as a given the descent of the Semitic, Berber, Chadic, Egyptian and Cushitic languages from a common linguistic ancestor (some doubts have recently been expressed only regarding the Omotic languages).
Paradigmatic - especially the pronouns, both independent and bound: possession, verb conjugations
However, reconstruction of the Proto-Afroasiatic cultural vocabulary and, in general, Afroasiatic etymology as such, are in a much more difficult situation. None of the two Afroasiatic etymological dictionaries published to this date (by Christopher Ehret and by Vladimir Orel and Olga Stolbova) enjoys significant authority, and although in general the dictionary of Orel and Stolbova is significantly stronger than the works of Ch. Ehret from the point of view of the quality of phonetic and semantic comparisons, most of the etymologies presented in it can be considered “raw material” for building a full-fledged dictionary rather than the final product.
I checked on VO and OS, and they worked in Moscow universities, so they are Moscow-school macro-linguists, like Vladislav Illich-Svitych, Aharon Dolgopolsky, Sergei and George Starostin, etc.

So is this some Muscovite thinking that some fellow Muscovites are doing good work because they work most like how he works?
 
Afroasiatic Comparative Lexica: Implications for Long (and Medium) Range Language Comparison
One way to test the reliability of the comparative method would be to undertake the following experiment. Take a set of languages for which a relationship has been suggested, but for which regular sound correspondences and a reconstructed phonemic system of the proto-language have not yet been established. Furnish two libraries on opposite sides of the world with all the available and relevant information on the languages (dictionaries, grammars, texts). Take two researchers trained in the comparative method, put them in the libraries, keep them in isolation from each other and see what they come up with. If it is a reliable procedure then two trained practitioners of it confronted with the same body of data should come up with broadly similar results-- repeatability of experiments should be expected as in natural science. Unfortunately, as so often in linguistics, ethical considerations prevent us from subjecting real human beings to such an experiment.

The world of comparative linguistics is fortunate, therefore, that something very close to this experiment came to be performed by accident.
Referring to the Ehret and Orel-Stolbova etymological dictionaries of Afroasiatic or Afrasian.

E: Semitic, Egyptian, Cushitic, Omotic, Chadic, Berber, but ignoring Berber
OS: breaking Cushitic up into Beja, Agaw, “East Cushitic,” Dahalo, Mogogodo, Rift

"A further difficulty is that E often gives reconstructed forms only without attestation of actual language data."

E and OS disagree on proto-phonology and on sound correspondences. On 2-consonant vs. 3-consonant roots -- E: 3C ones are all 2C ones with fossilized suffixes -- OS: some 2C ones are 3C ones with dropped-out sounds, some 3C ones have fossilized prefixes.

Even when the two agree on cognate sets, they differ in reconstruction. For instance, "to die" and related words -- Proto-Semitic *mawut-, Egyptian mwt, Berber mmt, Hausa (Chadic) mutù -- E maaw OS mawut -- this meaning is one of the most stable ones.

E and OS agree on only a small fraction of entries: 59, which is 6% of E's entries and 2% of OS's -- OS propose about 2.5 times as many entries as E.
Die-hard opponents of long-distance comparison may gleefully leap to the conclusion that the method is 94 to 98% inaccurate even at medium depths, but such a conclusion would be premature. Still the fact remains that two sets of scholars have been able to reconstruct mutually unrecognizable proto-languages, and this demands an explanation.
One or the other of the two must have found lots of spurious correspondences -- or both of them did.

Reviewer Robert R. Ratcliffe noted that if semantics are loose enough, then coincidences become very likely. He used as an example different kinds of birds - OS have 52 putative cognate sets with the meaning "bird" - the sets are mostly of different kinds of birds, like:
Egyptian "falcon" ~ Central Chadic "vulture", "hen" ~ Eastern Chadic "great bustard" ~ Agaw "kind of bird"
Semitic "parrot" ~ Western Chadic "quail"
Central Chadic "hawk" ~ Eastern Chadic "dove"
Berber "butterfly, small bird" ~ Chadic "guinea fowl" ~ Beja "pelican"
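
Ratcliffe's point about loose semantics can be put as a back-of-the-envelope calculation. This is only a minimal sketch under assumptions of my own: that any two unrelated words have some fixed small chance p of looking alike by accident, and that loose semantics lets one word be compared against k candidate words in another language instead of just one. The function name and the value p = 0.02 are purely illustrative, not figures from the review.

```python
def chance_match_probability(p: float, k: int) -> float:
    """Probability of at least one accidental look-alike among k candidates.

    Assumes each candidate independently resembles the target word
    with probability p (a simplifying assumption).
    """
    return 1.0 - (1.0 - p) ** k


# p = 0.02 is a made-up illustrative value, not a measured one.
for k in (1, 5, 20, 50):
    print(k, round(chance_match_probability(0.02, k), 3))
```

With a strict one-to-one meaning match (k = 1) the chance of a spurious hit stays at p, but it climbs quickly as "falcon", "vulture", "hen" and "dove" all become acceptable partners.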

On average, OS reconstructs 10 cognate sets for each member of the Swadesh 100-word list; 52 for "bird" is the largest number. By comparison, E did not reconstruct most of the entries in that list, including "bird". E preferred to reconstruct verbs, while OS preferred to reconstruct nouns.

A further problem: E used an Arabic etymological dictionary for Semitic, and that gave him numerous derived forms to choose from - why not some Proto-Semitic one?


Reviewer RRR summed up by noting several problems, like looking in several langs without attempting to reconstruct subfamily protoforms, and looking at different times in langs with long histories, like Egyptian. Also the atypical nature of the reconstructions, like numerous synonyms, and E's reconstruction of the basic vocabulary as mostly abstract sorts of verbs. "In general the underived, basic vocabulary of a language is specific and concrete, while abstract words are formed by derivation."
 
Jabal al-Lughat: The unreliability of Afroasiatic etymologies - blog name "Climbing the Mountain of Languages"

Using Alexander Militarev's etymologies in Tower of Babel Databases - finding numerous problems.
Suspiciously many entries are listed as having a cognate in only one Berber language (eg earth, hide, skin, run away). ...

In several cases, a single proto-Berber root is split across several AA ones, due to mistaken sound correspondences. ...

Similarly, unrelated forms may be grouped together due to accidental similarity ...

Another problem is undetected loans; this applies especially in sub-Saharan Africa, where little work has been done on their impact ...

Interestingly, most of the problem cases I've noticed in this quick skim are related to agricultural terminology. I wonder if that has anything to do with the particular interest of such terms for archeologists motivating a more intense search for cognates.
 
On a positive note, I've found some positive things on Austro-Tai.

Phylogenetic evidence reveals early Kra-Dai divergence and dispersal in the late Holocene | Nature Communications - its origin time is around 2000 BCE / 4000 BP.

We have a correlation between language and archeology for the origin of the Austronesian family in the arrival of the Dapenkeng culture on Taiwan around 4000 - 3000 BCE / 6000 - 5000 BP

I'd earlier mentioned Austro-Tai Revisited (2013) by Weera Ostapirat

He used a 24-word test list of very stable vocabulary: blood, bone, ear, eye, hand, nose, tongue, tooth, dog, fish, horn, louse, fire, stone, Sun, water, I, thou, one, two, to die, name, full, new

Old Chinese and Tibeto-Burman have 15 matches, and Austronesian and Kra-Dai 14 matches. But across those two pairs there are far fewer, only 2 or 3.

He then addresses the question of whether KD is an offshoot of AN and he decides that it is not, and after that, sound correspondences.

Kra-dai and Austronesian: notes on phonological correspondences and vocabulary distribution
Schlegel and Benedict base their arguments on lexical similarities and call attention to some lexical items which do appear common to Tai and Austronesian. Unfortunately for the Austro-Tai case, many additional far less convincing relationships are presented by Benedict (1975, 1990), who not infrequently resorts to loose resemblances, semantic leaps and to a practice known as ‘proto-form stuffing’ – the making up of maximal earlier forms to account for all desired modern cognate relationships.
 
Macrophyletic Trees of East Asian Languages Re-examined by Weera Ostapirat

First noting the issue of the earliest branchings in Sino-Tibetan. (Chinese, Tibeto-Burman)? Chinese from within "Tibeto-Burman"? Like being closest to Bodic, a branch that includes Tibetan.

He then mentions his 24-word test, and his comparison of more-stable and less-stable 100 items each of the 200-item Swadesh list. Proto-Tai with:
  • Austronesian: 21, 8
  • Old Chinese: 6, 20
Austro-Tai is well-supported, but Sino-Tai isn't -- that one is most likely borrowings.

After that, WO takes on Miao-Yao (Hmong-Mien) numerals. Those >= 10 are from Chinese, those from 4 to 9 are from some Tibeto-Burman source except maybe for 5, and those from 1 to 3 most likely related to Austroasiatic.

Given that words for numbers go from native to borrowed, that supports AA-MY/HM. Also supporting that macrofamily is some basic vocabulary, like "name" and "nose".

But he's noncommittal about Austric: AA, MY/HM, KD, AN, saying that it needs further research.
 
Papers from the 30th Annual Meeting of SEALS - JSEALS_Special_Publication_8_SEALSXXX.pdf
NEW EVIDENCE FOR AUSTRO-TAI AND OBSERVATIONS ON VOWEL CORRESPONDENCES
Alexander D. SMITH
In this paper, a newly created list of Kra-Dai and Austronesian shared lexical items is presented, along with already-proposed shared lexical items from earlier works resulting in a list of some 71 shared lexical items which may be ultimately inherited from a putative Proto-Austro-Tai language. The list is also used to analyze correspondences between Kra-Dai and Austronesian vowels in final syllables. It is shown that several regular correspondences are shared by the two families and that these regular correspondences most likely date back to a shared common ancestor.
Assessing the work of Paul Benedict some decades ago, author AS asks:
(1) Can the comparisons be reconstructed to at least one primary-level protolanguage in both AN and KD?

(2) Based on our current understanding of sound correspondences, are the comparisons regular?

(3) Are the proper syllables being compared between the two groups (specifically, do the KD monosyllables correspond to AN final-syllables)?
(1) AS rejected some of PB's comparisons as only using Indonesian and not Proto-Malayo-Polynesian or Proto-Austronesian.

(2) and (3) Proto-AN had two-syllable roots while Proto-KD had one-syllable roots. But with the stress on the final syllable, the initial syllable dropped out. PB sometimes compared a KD root to the first syllable of an AN one, and AS was careful to omit such comparisons.

About 1 to 10, he notes final-syllable preservation in Buyang, a Kra lang, but I checked for myself in The Numbers List and Numeral Systems of the World and while that is indeed correct for some members, it is not universally correct. Borrowings? Reworkings?

He accepts some additional comparisons, and he finds sound correspondences for vowels, adding to Weera Ostapirat's ones for consonants.

Finally,
With an increase in comparisons between AN and KD comes an increase in our understanding of the sound correspondences which exist between the two families. Although irregularity certainly exists, there is a high amount of regularity as well.

...
The AT Hypothesis remains a tentative hypothesis, although the evidence in its favor continues to grow. The evidence for a special relationship between AN and KD is both of a higher quality and quantity now than any time in the past, and it is hoped that more research in the area will help us understand the precise nature of this relationship.
 
ESTABLISHING GENETIC RELATIONSHIP BETWEEN LANGUAGE FAMILIES IN SOUTHEAST ASIA ON A MORE SOLID LINGUISTIC BASIS on JSTOR by Paul Jen-kuei Li
This is an evaluation of the various hypotheses regarding the possible genetic relationships between different language families in Southeast Asia, including Sino-Tai, Austro-Tai, Austro-Kra-Dai, Austric, Sino-Austronesian, and Sino-Miao-Yao. In order to establish reliable genetic relations between different language families, we need more solid linguistic evidence to distinguish between true cognates and loanwords. Vocabulary gets borrowed easily, whereas morphology is the most resistant to change. The genetic relationship between Chinese and Tibetan-Burman languages is well established, and perhaps so are Austric and Austro-Tai or Austro-Kra-Dai. However, there is not much chance for the genetic relationships of Sino-Tai, Sino-Austronesian or Sino-Miao-Yao.
Austric - was he discussing Narrow Austric or Broad Austric?
Narrow Austric = Austroasiatic + Miao-Yao
Broad Austric = Narrow Austric + Austro-Tai
 
To get back to number bases again, I decided to look for words for 100 to see how common it was to invent words for this number.

First I note this oddity. English "hundred" < Proto-Germanic *hundaradan < *hundan (inherited from PIE) + *radan "count". For distinguishing the word from *hundaz "dog"?

  • Indo-European 10 *dekm "two hands"? 100 *kmtóm "big ten"? 1000 *ghéslom "full hand"? *tuHsont- "swollen hundred"? 10,000 Greek murios
  • Turkic 10 on 100 *yür' 1000 *bing
  • Mongolic 10 *harban 100 *jaxun (dZakhun) 1000 *mingan 10,000 Mongolian tum
  • Cherokee 10 sgohi 100 sgohitsiqua
  • Navajo 10 neeznáá 100 neeznádiin
  • Quechua 10 chunka 100 pachak 1000 waranqa
  • Sino-Tibetan 10 *gip, *tsi ~ *tsyay 100 *brgya, 1000 *stawng 10,000 Old Chinese *m(l)ans
  • Abkhaz-Abaza 10 *Zwá 100 *Swé 1000 *zaké
  • Nakh 10 *itt 100 Chechen-Ingush *bia
  • Austronesian 10 *puluq 100 *gatus 1000 Malayo-Polynesian *ribu
  • Dravidian 10 *paHtu 100 *nûtu
  • Semitic 10 *asar- 100 *mi/at- 1000 */alp- 10,000 Hebrew rebaba, Aramaic rebbuta
  • Egyptian 10 medju 100 shet 1000 kha 10,000 djeba
  • Hausa 10 gōmà 100 ɗàrī 1000 dubū
  • Zulu 10 ishumi 100 ikhulu
I found a heck of a lot of borrowings of relatively high numbers. Finno-Ugric *seta 100 from early Iranian, Korean baek and Japanese hyaku, both 100, from Middle Chinese paek, ... not sure which was borrowed from which in Turkic *bing and Mongolic *mingan 1000.

Base 10 seems close to universal for counting past squares of one's number base. The main exceptions I know of are the Mayans, with base 20, and the Sumerians, with base 60.
 
On numeral complexity in hunter-gatherer languages (2012)

I had to read it carefully to understand it.

Australian, South American, and African hunter-gatherers usually have low-limit number systems, featuring counting up to 3 or 4, sometimes 1 or 2, sometimes 5 or 6. The exception is North American ones, sometimes going to 100 or more, and a difference may be needing to store food for surviving in the winter. Farmers usually have higher limits, though with a lot of overlap. Quechua goes up to 1000, but Quechua was the language of the Inca empire.

The authors also found the highest "atomic" number words, atomic meaning unanalyzable. South America has mainly 2, while North America is divided between 2 and 4 and Australia 2 and 3. Are North American systems recent elaborations?

By comparison, Indo-European 1 to 10 are all atomic.

A common form of elaboration is to use a number base. The highest base: (none), 2, 3, 4, 5, 10, 16, 20, 25, with varying distributions in the different study regions. North Americans usually use base 10, and to a lesser extent, base 20. In general, higher bases ~ higher upper limits. Of the lower bases, Australians preferred 2 and South Americans 5, though base 2 is implemented as adding lots of 2's, in all the examples I've seen.

There is one case with bases 2 and 5: 1, 2, 2+1, 2+2, 5, 5+1, 5+2, ... Huaorani of South America, the only one which goes past 10. Sandawe in Africa has bases 5 and 10, and in North Asia, Ket and Ainu have bases 10 and 20.
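
To make the "adding lots of 2's" pattern and the mixed 2-and-5 pattern concrete, here is a small sketch. It assumes a greedy decomposition (use the largest base as often as it fits, then the next one), which is my own simplification for illustration; the function name is hypothetical, and actual languages may combine their bases differently, especially past the values listed above.

```python
def additive_numeral(n: int, bases: tuple[int, ...]) -> str:
    """Spell n as a sum of the given bases (largest first) plus a remainder.

    Greedy decomposition is an assumption made for illustration; real
    languages may combine their bases differently.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    parts = []
    for base in sorted(bases, reverse=True):
        while n >= base:
            parts.append(str(base))
            n -= base
    if n:
        parts.append(str(n))  # leftover smaller than every base
    return "+".join(parts)


# Pure "base 2" counting: 1, 2, 2+1, 2+2, 2+2+1, 2+2+2
print([additive_numeral(n, (2,)) for n in range(1, 7)])
# Bases 2 and 5 together: 1, 2, 2+1, 2+2, 5, 5+1, 5+2
print([additive_numeral(n, (5, 2)) for n in range(1, 8)])
```

The length of the spelled-out sum grows quickly in the pure base-2 case, which may be one reason such systems tend to have low upper limits.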
 
Thus, while we may observe that restricted systems tend to cluster in regions in which hunting and gathering are focal subsistence practices (Australia, South America, southern Africa), it is difficult to tease out the relevance of subsistence, as opposed to other regional trends, to numerosity. Moreover, the fact that relatively complex numeral systems exist widely among hunter-gatherers in North America, Siberia, and other parts of the world indicates that agriculturally oriented subsistence is not required for the development of numeral complexity.
My food-storage hypothesis would explain this discrepancy very nicely. In cold climates, one has to survive winters, times when it's too cold to do very much, while in warm climates, that is not so necessary. Warm climates may have dry seasons, however.

"A likely variable associated with numerosity is exchange." - one might trade animal skins for spearheads, for instance.

Then a section on "Etymological sources of numeral terms". 1 and "alone" or "this" are often related. That reminds me of Merritt Ruhlen's claim that *tik often means 1 and/or "finger, to point". But words for 2 can't be analyzed. In "base 2" systems, common in Australia, one counts 1, 2, 2+1, 2+2, 2+2+1, 2+2+2, ... "Hand" > 5 and "foot" > 10 are common.

In North America, however, 1 to 3 are unanalyzable, 4 is derived from 2 in some of the langs, and 5 is often "hand", with 6 to 9 often derived from other ones. In Africa, some number terms are derived from words for fingers, like some Bantu above 5.

But in South America, non-numerical etymologies are common, like 2 ~ "eye" and "deer footprint" and 3 ~ "rhea bird footprint" (three toes) and "rubber tree seed". Words for 4 often involve social relationships.

"We therefore emphasize the possibility that, despite the enormous variation among numerals in contemporary hunter-gatherer languages, such restricted systems were much more common throughout most of human history."
 
Regarding Austro-Tai:

Many years ago I read of a hypothesis that Daic is descended from an Austronesian language of the Philippines. Some ancient group migrated back to the mainland, and managed to survive despite being surrounded by non-Austronesians.

IIRC, at least two mainland adstrata are identified in Daic. (That's not including the extensive and more recent(?) borrowings from Sanskrit/Pali.)
 
First I note this oddity. English "hundred" < Proto-Germanic *hundaradan < *hundan (inherited from PIE) + *radan "count". For distinguishing the word from *hundaz "dog"?
Presumably the similarity between "hundan" and "hundaz" implies a common origin, likely in the ancient mythological saga of the Hundred and One Dalmatians.
 
Yes, I've seen that hypothesis myself, and if Austro-Tai is supported well enough, then that support can be used to test that sort of hypothesis. Weera Ostapirat has attempted to do that in "Austro-Tai Revisited", and he concludes that Kra-Dai and Austronesian are separately descended from Proto-Austro-Tai instead of KD branching off from inside of AN.

Looking in Austro-Tai Revisited | WO and Wiktionary, the free dictionary and Austronesian Comparative Dictionary - PAN Index and "Tower of Babel" StarlingDB Databases


A difficulty in doing historical linguistics with Thai and related langs is its lack of word morphology:  Thai language

Nouns have no plural form, and one can indicate plurality with phuak "many" and indicate collectives by "reduplication" as linguists call it: dek "child", dek dek "group of children".

Possession is indicated with khong "of, belonging to" - "my" is literally "of me".

Adjectives do not change form when used as adverbs, thus no counterpart of English "-ly". They follow what they modify. Adjectives can also act as verbs: "to be <adj>". Comparatives are done with (what) <adj> kwa (ref), much like English "more <adj> than" and unlike English "-er", and superlatives with <adj> thi sut, much like English "the most <adj>" and unlike English "-est".

Classifiers / measure words are used with some kinds of adjectives, like numerals.

Verbs have no personal conjugations, and tenses and aspects and the like are done with adverbs and auxiliary verbs: mueawan "yesterday", phrungni "tomorrow", kamlang (before: ongoing action), yuu (after: state of ongoing action), dai (before: past/completed action, after: potential action), laeo "already" (after: state of being completed), cha (before: future), cha <verb> laeo (future state of being completed), thuk (before: passive voice), mai "not" (before: negation), ...

Verbs are often chained, and chained without function words. (PDF) Basic serial verb constructions in Thai is a linguistic analysis, and it has examples like

yaak kin -- want eat -- he wants to eat

The usual word order is subject-verb-object.

Looking at examples of Thai in pages on Thai grammar, and using Google Translate to translate into Thai, I couldn't find any suppletion. Nothing like good-better-best or go-went-gone or be-is-are-was.
 
Thai pronouns do not change form for different grammatical/syntactical roles, though they vary a lot by social context. Thanx to Swammerdami for pointing out the low status of the ancestral ones.

 Thai language
กู ku [kūː] I/me (informal/impolite)
มึง mueng [mɯ̄ŋ] you (informal/impolite/vulgar)

กู - Wiktionary, the free dictionary
กู -- k ū, guu, ku, /kuː˧/ -- (now considered vulgar and offensive) a first person pronoun: I.

From Proto-Tai *kuːᴬ (“singular first-person pronoun (weak)”).
Cognate with Lao ກູ (kū), Lü ᦅᦴ (kuu), Tai Dam ꪀꪴ, Zhuang gou.
Compare Proto-Hlai *ɦuː (“I”), Proto-Austronesian *aku (whence Malay aku, Tagalog ako and Javanese aku).

An Austro-Tai comparison without mention of AT itself.

มึง - Wiktionary, the free dictionary
มึง -- m ụ ŋ, mʉng, mueng, /mɯŋ˧/ -- (vulgar, derogatory, offensive) a second person pronoun: you.

From Proto-Tai *mɯŋᴬ (“singular second-person pronoun (weak)”).
Cognate with Lao ມຶງ (mưng), Lü ᦙᦹᧂ (mueng), Tai Dam ꪣꪳꪉ, Zhuang mwngz.
Compare Proto-Hlai *C-mɯː (“you (singular)”).

Looking in Starling Databases I've found

Tai-Kadai 100-wordlists : Query result - page 3 - look under "I"
My reconstruction *kV
Tai-Kadai 100-wordlists : Query result - page 5 - look under "thou"
My reconstruction *mV

V = vowel

Both of them have Proto-Zhuang-Tai and Proto-Kam-Sui but no further reconstructions of Kra-Dai (Tai-Kadai). The source for this database is work by  Ilia Peiros from 1998.

Weera Ostapirat 2013:
1sg: *aku:
2sg: *tsu, *amu:

 Proto-Austronesian language
1sg: *i-aku
2sg: *i-(ka)Su
2pl: *i-kam
 
A difficulty in doing historical linguistics with Thai and related langs is its lack of word morphology:  Thai language

Yes, I think Thai may be the most isolating (or analytic -- what exactly is the difference?) language that there is!
Chinese is the "go-to example" of an isolating language, but one can find journal articles comparing "grammaticalization" in Chinese and Thai. It is Chinese which is grammaticalizing more.

Nouns have no plural form, and one can indicate plurality with phuak "many" and indicate collectives by "reduplication" as linguists call it: dek "child", dek dek "group of children".

Thai has no inflections, no conjugations. There are a few commonly used prefixes, e.g. to convert verbs or adjectives into nouns.

Many markers which are obligatory in English are optional and frequently omitted. My wife has relayed stories about an acquaintance's sibling. She didn't know if the sibling was a brother or sister (though she knew he/she was older than the acquaintance!). (In many contexts to speak of a "sibling" in English would mystify: "why is speaker keeping the gender secret?")

I've been a bystander in conversations between native speakers of the same dialect where lack of adverb or helping verb left confusion. As just one example, on a visit to the health clinic there was a question about a "vaccine on the 15th of the month," with question and answer leaving all mystified. It was on the ride home that I suddenly realized: Nurse was asking about NEXT month; Wife answered about LAST month. Another silly exchange I overheard was between two government officials.

thuk (before: passive voice)

This is an example of grammaticalization mentioned in the aforementioned journal article.

The passive voice marker ถูก (tùuk) is only used when the target of the action undergoes something unfavourable.
/doon/ is also used as a passive-voice marker, again only when unfavorable.

Verbs are often chained, and chained without function words. (PDF) Basic serial verb constructions in Thai is a linguistic analysis, and it has examples like

I've previously posted a supposedly genuine example of a 13-word sentence: a pronoun followed by TWELVE consecutive verbs!

An interesting feature of Thai grammar is long sequences of verbs. Two consecutive verbs are very common: 'lie-down sleep', 'look see', 'think reveal', 'eat sate.' In each of these examples, 'not' can be inserted between the two verbs to form a three-word sentence with obvious meaning, e.g. '(I) lie-down (but did) not sleep.' But verb sequences can be MUCH longer than just two verbs.

Multi-verb phrases occur in English: "Please hurry and go fetch a mop and try to clean it." This is 6 verbs (if we call 'please' a verb), but 6 other (non-verb) words are included to make the sentence grammatical. But here's a Thai sentence that consists ONLY of six consecutive verbs:
. . . 'Climb Return Ascend Go Collect Continue'
. . . /Piin Glaap Kheun Pai Kep Tor/
meaning "climb back up (the tree) and continue gathering (coconuts?)"

And six consecutive verbs is NOT the record. The record may be ... TWELVE consecutive verbs in a 13-word sentence!
. . . 'He intend walk go arrange seek buy come collect keep use provide amuse'
. . . /khao tang-jai deuen pai chat haa seu maa gep wai chai hai sanuk/
Is the meaning clear? He intends to go (by foot) to market and arrange the discovery and purchase of something to collect and set aside for future fun. One can quibble that this isn't quite 12 verbs: /sanuk/ 'amuse(verb)' can be treated as 'fun(noun)' and /gep-wai/ 'collect keep' can be treated as a single compound verb. ('Collect' and 'keep' are nearly synonyms: concatenation of near-synonyms like this is common. /tang-jai/ 'intend' is a compound word that decomposes into 'set heart.')

TWELVE consecutive verbs! I asked my wife about this sentence, and it sounded like ordinary Thai to her. Three informants younger than my wife told me it sounded contrived, but not ungrammatical. This amazing example comes from page 220 of "Tai peoples and their languages: A preliminary observation" by Suriya Ratanakul who introduces it with "Another tour de force example was heard by the present writer in an actual conversation."
 

That paper has a lot of information. It may keep me occupied for a while! :)

Rereading pages 220 and 221 just now, I see that Suriya next discusses final particles (always single-syllable, as are most Thai words). These occur in the spoken language, not written. They don't fit into any of the usual "part of speech" categories (noun, verb, adjective etc.); they often relate to tone or emotion, and are untranslatable. Although called "final particles," a sentence will often end with TWO of these particles. The younger generation seems to be experimenting with novel such syllables.

Do some other languages have such particles?

ETA: The above is all my own musing, NOT Suriya's.

Suriya writes "[these Final Particles] are very important for Tai speakers who want smooth, clear and polite conversation ... [they reveal] the speaker's attitude, mood, emotion [etc.]."
 
Austronesian language phylogenies: myths and misconceptions about Bayesian computational methods
SIMON J. GREENHILL and RUSSELL D. GRAY
A second study (Atkinson et al. 2008) used phylogenetic methods to test claims that speakers often use their language as a social tool for increasing group cohesion and demarcating groups (Labov 1994). The results showed a strong relationship between the total amount of lexical change and the number of language splitting events along the tree: between 10% to 33% of the total lexical change in the Bantu, Indo-European, and Austronesian languages occurred as a rapid burst of change shortly after languages diverged. This punctuational change (e.g. Bowern 2006) is consistent with rapid language change in small founder populations and differentiation as a cultural marker.
Wanting to distinguish one's group from other groups. That may also be true of other language changes, like phonology and grammar.

If one has some new method, then one ought to compare it to old methods. Indo-European, Austronesian, and Bantu have been abundantly researched, so they make good test cases.

The authors then discussed some early lexicostatistical work that placed the origin of Austronesian in Melanesia, for instance in the Bismarck Archipelago, north of New Guinea. That is grossly at variance with previous results, and the difference was due to using a tree-finding algorithm that assumes a constant rate of change. If Melanesians had changed their vocabulary at a greater rate than the others, then their languages would look more divergent, and that would make them seem like the earliest branch. Could this be from settling on some islands that already had some inhabitants?

The authors then get into Bayesian phylogenetics. This uses Bayes's theorem, a sort of inverse-probability theorem. If one has some hypotheses H and some possible data values D, and one writes P(A|B) for the probability of A given B, then one can find P(H|D) from P(D|H) and a prior assumption about P(H) in general:

P(H|D) = P(D|H) * P(H) / P(D)

where P(D) is the sum over all H of P(D|H) * P(H)
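To make the formula concrete, here is a minimal sketch in Python. The three "trees" and all the numbers are invented for illustration; they are not from the paper.

```python
def posterior(prior, likelihood):
    """Return P(H|D) for every hypothesis H, given P(H) and P(D|H)."""
    # P(D) is the sum over all H of P(D|H) * P(H)
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    # Bayes's theorem: P(H|D) = P(D|H) * P(H) / P(D)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

# Hypothetical inputs: three candidate family trees with equal priors
# and made-up probabilities of producing the observed data.
prior = {"tree_1": 1 / 3, "tree_2": 1 / 3, "tree_3": 1 / 3}
likelihood = {"tree_1": 0.06, "tree_2": 0.03, "tree_3": 0.01}

post = posterior(prior, likelihood)
for name, p in post.items():
    print(name, round(p, 3))
```

With equal priors, the posterior is just the likelihoods rescaled to sum to 1, so the tree that explains the data best gets the largest share.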

So for one's data, one calculates the probability of getting it from a given family tree, and then one randomly samples family trees. One starts with a random initial tree and then makes small tweaks to it, finding the probability for it at each step. One skews the selection of new trees with this probability, so that more probable trees get visited more often: Markov Chain Monte Carlo (MCMC). One saves a tree only after every several tweaks, to avoid saving trees that are too similar.
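The sampling loop described above can be sketched as a toy Metropolis sampler. This is only an illustration under loud assumptions: the "tree space" is just the three rooted trees on three invented languages A, B, C, the weights are made-up values of P(D|H) * P(H), and real phylogenetics software uses far richer tree tweaks and likelihood models.

```python
import random

# Hypothetical unnormalized posterior weights, P(D|H) * P(H), for the
# three rooted trees on invented languages A, B, C.
WEIGHTS = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}

def propose(tree, rng):
    """A 'small tweak': jump to one of the other two trees, chosen evenly."""
    return rng.choice([t for t in WEIGHTS if t != tree])

def sample_trees(n_steps, thin, seed=0):
    rng = random.Random(seed)
    current = rng.choice(list(WEIGHTS))  # random initial tree
    kept = []
    for step in range(1, n_steps + 1):
        candidate = propose(current, rng)
        # Metropolis rule: always accept a more probable tree; accept a
        # less probable one with probability equal to the weight ratio.
        if rng.random() < WEIGHTS[candidate] / WEIGHTS[current]:
            current = candidate
        # Thinning: save a tree only every `thin` steps.
        if step % thin == 0:
            kept.append(current)
    return kept

samples = sample_trees(n_steps=20000, thin=5)
freq = {t: samples.count(t) / len(samples) for t in WEIGHTS}
print(freq)
```

In the long run the sampled frequencies approach the normalized weights (here 0.6, 0.3, 0.1), without P(D) ever being computed, since only ratios of weights enter the acceptance step.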

Once one has some trees, one looks for which subgroupings they agree on at least partially, making a consensus tree.
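One common way to do this is a majority-rule consensus: keep every subgrouping that appears in more than half of the sampled trees. A minimal sketch, assuming each tree is represented simply as a set of subgroupings and using invented languages A to D (I don't know exactly which consensus method the authors used):

```python
from collections import Counter

def majority_clades(trees, threshold=0.5):
    """Return subgroupings found in more than `threshold` of the trees.

    Each tree is a set of subgroupings (clades); each clade is a
    frozenset of language names. The result maps clade -> support.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

# Hypothetical sample of four trees over invented languages A, B, C, D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("ABC")},
]

support = majority_clades(trees)
for clade, s in support.items():
    print(sorted(clade), s)
```

Here the groupings A+B and A+B+C each appear in three of the four trees, so they make it into the consensus, while C+D and A+C do not. Two groupings that each appear in more than half of the trees must occur together in at least one tree, so they cannot contradict each other.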
 
The results: an initial split in Taiwan, with at least 7 branches being represented by Taiwanese stay-at-homes and Malayo-Polynesian being only one branch. So Proto-Austronesian speakers settled in Taiwan, dispersing over the island, with some of their descendants going onward.

MP has a branching between the Philippine langs and the rest of MP, and there are some other well-defined groups like Malayic-Chamic, South Halmahera/West New Guinea, and Oceanic. Well-supported subgroupings of Oceanic: Polynesian, Micronesian, Southeast Solomonic, Eastern Outer Islands, Admiralties. There is a split between a western and an eastern part, with the western part having Papuan Tip, North New Guinea and Willaumez Meso-Melanesian, and the eastern part the majority of Meso-Melanesian and the Remote langs (Polynesian, ...).

For dating, one can calibrate with settlement times for Eastern Polynesian (1200 - 1300 years ago) and Chamic speakers of Vietnam being mentioned in Chinese sources 1800 years ago. For one tree, one finds for Proto-AN 5310 BP and for Proto-MP 4240 BP. One finds in general an interval of about 1000 years between the settlement of Taiwan and the spread outward from Taiwan.

Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement | Science from 2009 by R. D. Gray, A. J. Drummond, S. J. Greenhill

Proto-Austronesian: 5230 BP

Proto-Malayo-Polynesian: 4500 - 3800 BP
Then a relatively quick spread from the Philippines to Fiji.

Habitation of Reefs / Santa Cruz: 3200 - 3000 BP; New Caledonia, Vanuatu: 3000 BP; Tonga, Samoa, Fiji (W Polynesia): 3200 - 2900 BP.

Greater Central Philippines expansion: 2500 - 2000 BP

The first pause between the settlement of Taiwan and the Philippines may have been due to the difficulties in crossing the 350-km Bashi channel between Taiwan and the Philippines (4, 6). The invention of the outrigger canoe and its sail may have enabled the Austronesians to move across this channel before spreading rapidly over the 7000 km from the Philippines to Polynesia (4). This is supported by linguistic reconstructions showing that the terminology associated with the outrigger canoe complex can only be traced back to Proto-Malayo-Polynesian and not Proto-Austronesian (41).
An outrigger is a mini-hull set a short distance from the main hull. An outrigger boat may have one or two of them, one on each side. Outriggers make a boat more stable, enabling it to safely travel long distances.  Outrigger boat
One possible reason for the second long pause in Western Polynesia is that the final pulse into the far-flung islands of Eastern Polynesia required further technological advances. These might have included the ability to estimate latitude from the stars, the ability to sail across the prevailing easterly tradewinds, and the use of double-hulled canoes with greater stability and carrying capacity (4, 42). Alternatively, the vast distances between these islands might have required the development of new social strategies for dealing with the greater isolation found in Eastern Polynesia (42). These technological and social advances in Eastern Polynesia may also underlie the fourth pulse into Micronesia.
 
How Accurate and Robust Are the Phylogenetic Estimates of Austronesian Language Relationships? | PLOS ONE
Simon J. Greenhill, Alexei J. Drummond, Russell D. Gray

Their estimated age of Austronesian: 4,750 - 5,200 - 5,800 BP (95% confidence)

At the east end of the main Malayo-Polynesian expansion:  Lapita culture - E New Guinea - Bismarck Archipelago - Solomon Islands - Vanuatu - New Caledonia - Fiji - Tonga - Samoa - over 3600 - 2500 BP (1600 - 500 BCE)
The intrusion of the Lapita cultural complex into Near Oceania brought a marked shift in cultures from the non-Austronesian societies to the Austronesian-style agricultural society. Lapita society was not only agricultural, but many of the common food plants and domesticated animals can be traced back to Southeast Asia origin [44], [45], [46]. The social organisation of Lapita was distinctively Austronesian [47], [48]. Many Lapita characteristics can be reconstructed in the Proto-Oceanic (POc) lexicon [44], [49].
Like words for axe, types of houses, fishing equipment, and outrigger-canoe parts.

"Driving a Volvo does not make one a Swede; however, if you also eat distinctively Swedish cuisine, live in a distinctively Swedish-type society, and have a wide collection of Swedish cultural artifacts, then there is a very high probability that you are indeed Swedish."

The authors then point out some items that have reconstructed Proto-Oceanic words for them, like kinds of houses and fishing equipment.
 