• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Language as a Clue to Prehistory

Greenberg and his disciples regard Amerind as very well-supported, certainly better supported than Austric or Nostratic.
His method seems to me too hand-wavy, which is why I bumped it down.

Problematic Use of Greenberg’s Linguistic Classification of the Americas in Studies of Native American Genetic Variation - PMC
Whereas Greenberg’s classification has been widely and uncritically used by human geneticists, it has been rejected by virtually all historical linguists who study Native American languages. There are many errors in the data on which his classification is based (Goddard 1987; Adelaar 1989; Berman 1992; Kimball 1992; Poser 1992), and Greenberg’s criteria for determining linguistic relationships are widely regarded as invalid. His method of multilateral comparison assembled only superficial similarities between languages, and Greenberg did not distinguish similarities due to common ancestry (i.e., homology) from those due to other factors (which other linguists do). Linguistic similarities can also be due to factors such as chance, borrowing from neighboring languages, and onomatopoeia, so proposals of remote linguistic relationships are only plausible when these other possible explanations have been eliminated (Matisoff 1990; Mithun 1990; Goddard and Campbell 1994; Campbell 1997; Ringe 2000). Greenberg made no attempt to eliminate such explanations, and the putative long-range similarities he amassed appear to be mostly chance resemblances and the result of misanalysis—he compared many languages simultaneously (which increases the probability of finding chance resemblances), examined arbitrary segments of words, equated words with very different meanings (e.g., excrement, night, and grass), failed to analyze the structure of some words and falsely analyzed that of others, neglected regular sound correspondences between languages, and misinterpreted well-established findings (Chafe 1987; Bright 1988; Campbell 1988, 1997; Golla 1988; Goddard 1990; Rankin 1992; McMahon and McMahon 1995; Nichols and Peterson 1996).
Excrement, night, grass -- all dark-colored.

While there are plenty of odd semantic shifts that have happened, it is best not to rely on them to assess language relationship, because they make relationship hypotheses difficult to falsify.

One should either avoid them outright or stick to very common ones.
 
Grass dark-colored? That may seem surprising, but there is some motivation for that, from limited development of generic color terms.

I'll use abbreviations W: white, R: red, Y: yellow, G: green, B: blue, K: black.

How the Munsell Book of Color Revolutionized Linguistics Part 5 | Munsell Color System; Color Matching from Munsell Color Company

That page has an illustration from "Color Naming across Languages", Kay, Berlin, Maffi and Merrifield (1997), expanding on "Basic Color Terms", Berlin and Kay (1969), and a diagram showing the illustration's groupings of colors.

I've also found handprint : the geometry of color perception with an illustration: "development hierarchy in color terms"

The second one has:
  • (light) -> W + (warm)
  • (warm) -> R + Y
  • (dark) -> K + (cool)
  • (cool) -> G + B
with splits WRY GBK -> W RY GBK -> W RY GB K -> W R Y GB K -> W R Y G B K

The first one has these splits:
  1. WRY GBK
  2. W RY GBK
  3. W RY GB K -- W RY G BK -- W R Y GBK -- W R YGB K
  4. W R Y GB K -- W R Y G BK -- W R YG B K
  5. W R Y G B K

So green is one of the dark colors.
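To make that conclusion easy to check, here is a small sketch that goes through the stage systems of the first diagram exactly as listed above and picks out the ones where green (G) shares a term with black (K). The names `STAGES` and `shares_term` are just mine for illustration.

```python
# Stage systems from the first diagram, as listed in the post. Each string is
# one partition of the six colors; a space separates one color term from the next.
STAGES = {
    1: ["WRY GBK"],
    2: ["W RY GBK"],
    3: ["W RY GB K", "W RY G BK", "W R Y GBK", "W R YGB K"],
    4: ["W R Y GB K", "W R Y G BK", "W R YG B K"],
    5: ["W R Y G B K"],
}

def shares_term(system, a, b):
    """True if colors a and b are covered by the same term in this system."""
    return any(a in term and b in term for term in system.split())

# For each stage, the systems in which green and black fall under one term.
green_with_black = {
    stage: [s for s in systems if shares_term(s, "G", "K")]
    for stage, systems in STAGES.items()
}

for stage, systems in green_with_black.items():
    print(stage, systems)
```

Green goes with black in stages 1 and 2 and in one of the four stage-3 systems, and is split off after that.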
 
My impression is that the Amerind family is accepted by almost every linguist who is a "lumper"; this includes all linguists who accept hypotheses like Dene-Caucasian, Nostratic, Austric, or even Eurasiatic, Afro-Asiatic or Nilo-Saharan. It was Greenberg himself, applying the same methods, who achieved great fame and respect by postulating the four macro-families of Africa, and he has said that he found the evidence for Amerind much greater than the evidence for Nilo-Saharan, etc. Lumpers have rebuttals for the arguments of the anti-Amerindians but, much like with the Shakespeare Authorship Question, this is an issue where dissent is ridiculed and drowned out at "consensus" sites, most notably Wikipedia.

The age of families like Dene-Caucasian or Eurasiatic corresponds roughly with the arrival of Amerinds and their genes into the New World. So it would be logical that the Amerind languages form a family with similar age. Or, to turn the argument about, IF there were two (or more) unrelated macro-languages spoken in pre-Columbian South America THEN Where did that 2nd family of languages come from anyway?

Almost all linguists agree that ALL languages could be pushed back farther and farther, though with unknowable evidence, perhaps all the way to a single proto-World language. The controversy is about how far macro-families can be pushed back based on available evidence before speculation becomes impossible or pointless.

One may well ask why "splitters" protest so vehemently against the Amerind hypothesis, without being equally nasty against proponents of Eurasiatic, Nilo-Saharan, etc. Some attribute this to the large number of linguists researching Amerindian language subfamilies, who have a vested interest against the Amerind hypothesis. Ruhlen points to one such researcher who wasted considerable effort trying to pin down kinship terms in one language subfamily -- a puzzle that disappears when the (t'ana, t'una, t'ina) pattern of proto-Amerind is considered.
 
Relevant to the Amerindian controversy:

Greenberg's Reply to Campbell on the Classification of American Indian Languages

A 60-page article by Bengtson and Ruhlen called "Global Etymologies" provides constructive criticism of Greenberg's methods.

There was a critical review of an Amerindian paper with objections so trivial and picayune it seemed like a parody. In the "mass comparison" method, many THOUSANDS of words are first collected (strength in numbers! But very time-consuming). Yet despite this, the review's picayune objections included glossing a word which means 'boy' as 'son' and a substitution of /kana/ for /k'ana/ or some such. I'd display that paper, but can't find it just now.
 
More stable words being better preserved than less stable ones: Preliminary Lexicostatistics as a Basis for Language Classification: a New Approach by George Starostin (2010), son of Sergei Starostin.

He calls it "Jaxontov's law", using an alternate transcription of linguist Sergei Yakhontov's last name. Sergei Yakhontov - Серге́й Евге́ньевич Я́хонтов

The Yakhontov Test is usually applied by dividing the reference list in two, and comparing the number of cognates in the more-stable half to that in the less-stable half.
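As a rough sketch of the test as described here: given the 1-based ranks of the cognates in a stability-ordered reference list, count how many fall in the more-stable part and how many in the rest. The function name, the `split` parameter, and the example ranks are made up for illustration and come from no paper.

```python
def yakhontov_split(cognate_ranks, list_size, split=None):
    """Count cognates in the more-stable and less-stable parts of a list.

    cognate_ranks: 1-based positions of the cognates in the reference list,
                   which is assumed to be ordered from most to least stable.
    split: last rank counted as "more stable"; defaults to half the list.
    """
    if split is None:
        split = list_size // 2
    stable = sum(1 for r in cognate_ranks if r <= split)
    return stable, len(cognate_ranks) - stable

# Hypothetical example: cognates at ranks 1, 10, 15 and 60 of a 100-word list.
stable, unstable = yakhontov_split([1, 10, 15, 60], 100)
print(stable, unstable)
```

More cognates in the first count than in the second is the pattern expected from inheritance rather than chance or borrowing.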

I've thought of an alternative method, one that uses all ranks. To use it, sort one's cognate list in the order of one's reference list, then find the average of

(index in cognate list) / (max of indices in cognate list)
-
(index in reference list) / (max of indices in reference list)

I also plot (index in reference list) against (index in cognate list).
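Here is one possible reading of that formula, worked as code. It assumes that "index in cognate list" is the cognate's position (1, 2, 3, ...) after sorting by reference rank, that "max of indices in cognate list" is the number of cognates, and that "max of indices in reference list" is the length of the reference list. The name `rank_skew` is just for illustration.

```python
def rank_skew(cognate_ranks, list_size):
    """Average of (k / n) - (r_k / N) over the cognates.

    cognate_ranks: 1-based ranks of the cognates in the reference list.
    list_size:     N, the length of the reference list (assumed to be the
                   "max of indices in reference list").
    n is the number of cognates; k is a cognate's position after sorting.
    """
    ranks = sorted(cognate_ranks)
    n = len(ranks)
    if n == 0:
        raise ValueError("need at least one cognate")
    diffs = [k / n - r / list_size for k, r in enumerate(ranks, start=1)]
    return sum(diffs) / n

# Cognates at ranks 1, 10 and 15 of a 100-word list:
# ((1/3 - 0.01) + (2/3 - 0.10) + (3/3 - 0.15)) / 3, about 0.58
print(round(rank_skew([1, 10, 15], 100), 2))

# If every word in the list is a cognate, each term is zero.
print(rank_skew(range(1, 101), 100))
```

On this reading, a positive value means the cognates sit toward the stable end of the list, and a value near zero means they are spread evenly through it.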

I've used two ordered reference lists as comparisons: the 110-word Swadesh-Starostin list and the 200-word extended Leipzig-Jakarta list. Here are the best numbers:
  • Indo-Uralic: 0.20
  • Samoyedic (Uralic) - Yukaghir: 0.23
  • Altaic (Narrow): 0.14, (Broad): 0.12 (minimum of maximum)
  • Chukotko-Kamchatkan - Nivkh: 0.17
  • Basque - N Caucasian: 0.25
  • Yeniseian - Burushaski: 0.00
  • Yeniseian - Na-Dene (AET): 0.10 (AET = Athabaskan-Eyak-Tlingit; Haida has too few cognates)
  • Hurro-Urartian - Dene-Caucasian: 0.21
  • Hattic - Dene-Caucasian: 0.16
  • Sumerian - Hurrian: 0.13
  • Austro-Tai: 0.31
  • Austric: 0.17
  • Proto-World: 0.11 (Bengtson & Ruhlen "Global Etymologies", with *kaka "older male relative", *ma "not")
Only some of these were scramble-tested in the papers that I've found: Indo-Uralic, Altaic, and the circumpolar ones (Sam-Yuk, CK-Nivkh, Yen-ND, Yen-Bur).

Austric seems well-supported, as do various subgroupings of Eurasiatic and Dene-Caucasian; those whole groupings seem somewhat less so.

I'm surprised that Proto-World does as well as it does in this criterion. Could greater stability mean more data to work from?
 
I first define Terminal Pleistocene as from the Last Glacial Maximum, about 26,000 - 20,000 years ago, to the end of that geological epoch, about 12,000 years ago, the beginning of the Holocene Epoch.

It is often defined as longer, however: as the whole last glacial period, starting from the end of the previous interglacial period, the Eemian, about 130,000 - 115,000 years ago.

Afroasiatic is Terminal Pleistocene in age, and Eurasiatic and Dene-Caucasian probably are also, being bounded from below by Early Holocene macrofamilies Altaic and Euskaro-Caucasian. Austric I think is likely Early Holocene, with a homeland in southern China.

Amerind is bounded from above by the time of the first human settlers of the New World: Last Glacial Maximum. Since its members are distant enough to make their relationship far from obvious, Amerind is likely Terminal Pleistocene in age.

This makes Borean Last Glacial Maximum in age or somewhat older, bounded from above by the first human settlement of Northern Eurasia (Europe and Siberia), roughly 45,000 years ago.
 
I can't find where I did the Northern Eurasian pronouns, so I'll do them again. :(

pl = plural, obl = oblique, combining form, vb = form attached to a verb, poss = possessive form
"this": proximal (near speaker), "that": distal (away from speaker)
"who?": animate interrogative, "what?": inanimate interrogative

Indo-European:
1s *eg(h)o-, obl *me-, vb *-m, *-H -- 1d vb *-we -- 1p *wei, obl *nos-, vb *-me
2s *tuH, *tiH, obl *te-, vb *-s -- 2d vb *-to -- 2p *yuH, obl *wos-, vb *-te
"this", "that" *so, *to- and *e-, *i- -- vb 3s *-t, 3d *-ta, 3p *-nt
"who?" *kwi-, adj *kwo- "what?" (neuter of "who?")

Uralic:
1s *minä, *mun, vb *-m -- 1d vb *-mäjn -- 1p *me, vb *-mät
2s *tinä, *tun, vb *-n, *-t -- 2d vb *-täjn -- 2p *te, vb *-tät
"this" *to, pl *no -- "that" *tä, pl *nä -- 3s *se, 3p *ne, vb 3s *-, 3d *-kë, 3p *-t, poss 3s *-sä, 3d *-säjn, 3p *-sät
"who?" *ku-, *ke -- "what?" *mi

Turkic:
1s *be, obl *ben-, vb *-m, poss *-m -- 1p *bir', obl *bir'n-, vb *-mir', *-bir', poss *-mir'
2s *se, obl *sen-, vb *-n, poss *ng -- 2p *sir', obl *sir'n-, vb *-sir', poss *-ngir'
"this" *bu, obl *bun- -- "that", 3x *ol, obl *an-, vb 3s *-, poss 3s *-(s)i
"who?" *kem -- "what?" *nê-

Mongolic:
1s *bi, obl *nad-, poss pron *mini -- 1p *bid, obl *bidn-, poss pron *bidni
2s *tSi, obl *tSam-, poss pron *tSini -- 2p *ta, obl *tan-
"this" *ene, obl *üün-, pl *edeger -- "that", 3x *tere, obl *tüün-, pl *tedeger
"who?" *ken -- "what?" *yaxun
(x is the kh fricative)

Tungusic:
1s *bi, obl *min- -- 1px *bu, obl *mun-, 1pi *münti
2s *si, obl *sin- -- 2px *su, obl *sun-
"this" *ër --"that" *tari
"who?" *ngüi -- "what?" *xa-

The Tungusic langs distinguish between inclusive and exclusive we: "you and I", "we without you".

Sources: Wiktionary, Wikipedia, On the structure of Proto-Uralic | Juha Janhunen - Academia.edu and Proto-Turkic - Wikibooks, open books for an open world and Evenki grammar and Manchu grammar
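To see how far these five families share the northern Eurasian m-t-k pattern (m- in 1s, t- in 2s, k- in "who?"), here is a crude sketch that lines up the initial consonants of forms copied from the lists above. Where a family has several forms, picking one is my own choice for illustration, and `FORMS`, `DIGRAPHS`, and `initial` are made-up names. Comparing first letters is only a rough eyeball check and is not the comparative method.

```python
# One 1s form, one 2s form and one "who?" form per family, taken from the
# lists in this post. Which variant is used is an assumption for illustration.
FORMS = {
    "Indo-European": ("*me-", "*te-", "*kwi-"),
    "Uralic": ("*minä", "*tinä", "*ku-"),
    "Turkic": ("*be", "*se", "*kem"),
    "Mongolic": ("*bi", "*tSi", "*ken"),
    "Tungusic": ("*bi", "*si", "*ngüi"),
}

# Letter pairs in this transcription that stand for a single consonant.
DIGRAPHS = ("ng", "tS", "kw")

def initial(form):
    """Return the initial consonant of a reconstructed form like '*minä'."""
    stem = form.lstrip("*")
    for digraph in DIGRAPHS:
        if stem.startswith(digraph):
            return digraph
    return stem[0]

initials = {
    family: tuple(initial(f) for f in forms)
    for family, forms in FORMS.items()
}

for family, triple in initials.items():
    print(f"{family:14} {'-'.join(triple)}")
```

Uralic comes out as m-t-k and Indo-European as m-t-kw, while the three Altaic families show b- in 1s and s- or tS- in 2s with these particular forms.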
 
Appendix:Swadesh lists - Wiktionary, the free dictionary is a good place to go for pronouns.

For Koreanic, as it may be called, I used mostly Middle Korean:
1s na -- 1p wuli -- 2s ne -- 2p nehuy
"this" i -- "that (near)" ku -- "that (far)" tye -- 3s ku, 3p kutul
"who?" nwu -- "what?" enu, musum

Proto-Japonic:
1s *a, *wa -- 1p *wa, *waya -- 2s *u, *o -- 2p *uya, *ura
"this" *ko (?) -- "that (near)" *së -- "that (far)" *kë -- 3s *ka, 3p *kata
"who?" *ta -- "what?" *na, *në

Korean and Japanese use a 3-way distinction in their demonstratives, like Spanish, and unlike English, with its 2-way distinction:
  • "this" is proximal, near the speaker -- Sp. éste
  • "that (near)" is medial, near the listener -- Sp. ése
  • "that (far)" is distal, far from both the speaker and the listener -- Sp. aquél

Neither of them looks very Eurasiatic, though for Korean, I've seen the theory that those pronouns are the result of the original ones' n-stem declension with the roots dropped off: *mana > *na, *tene > *te, *kunu > *nu.
 
Chukotko-Kamchatkan is rather difficult:
1s *gëm -- 1p *muRi -- 2s *gët -- 2p *tuRi
where R is Chukchi r, Koryak j, Itelmen z.

But it fits in with the rest of Eurasiatic.

Appendix:Eskimo-Aleut basic vocabulary - Wiktionary, the free dictionary - Aleut, Proto-Eskimo
1s ting, vb -kuq, poss *-ng -- *vi, vb *-tua, *-kuq, poss *-nga
1d tingix -- *vik
1p tingin(s) -- *vit
2s txin, vb -kuxt, poss *-n -- *ëtlvant, vb *-it, poss *-in
2d txidix -- *ël-ptek
2p txichix -- *ël-vcet
3s ilaa, uda, vb *-kux, poss *-(n)gan -- *ël-nga, *una, vb *-tuq, poss *-ngan
3p laan(s), udan(s) -- *ël-ngat, *ukuat
"who?" kiin -- *kina
"what?" alqux -- *cangu

Not very Eurasiatic either, though PEsk 1x *v- seems like it corresponds with *m- and "who?" *kin- obviously corresponds. Dual *-k and plural *-t correspond with Uralic, and to a lesser degree with Indo-European and Altaic.
 
I'll now assess Japonic, Ainu, and Austric. Ainu is spoken in Hokkaido, the northernmost of Japan's four big islands, and its Wiktionary Swadesh list is also in Appendix: Paleosiberian Swadesh lists - Wiktionary, the free dictionary

Proto-Japonic:
1s *a, *wa -- 1p *wa, *waya -- 2s *u, *o -- 2p *uya, *ura
"this" *ko (?) -- "that (near)" *së -- "that (far)" *kë -- 3s *ka, 3p *kata
"who?" *ta -- "what?" *na, *në

Ainu:
1s kuani, kani -- 1px ciokai, 1pi anokai -- 2s eani -- 2p ecioka
"this" tanpe -- "that" toanpe -- 3s sinuma, anihi, 3p okai
"who" nen, hunna -- "what" nep, hemanta

1s: kuani < ku- (1s prefix) + an "is" + -i (nominalizer) -- "that which is me"
1px: ciokai < ci- (1p prefix) + oka "are" + -i (nominalizer) -- "those which are us"
1pi: anokai < a- (impersonal prefix) + oka "are" + -i (nominalizer) -- "those which are someone"

2s: eani < e- (2s prefix) + an "is" + -i (nominalizer) -- "that which is you"
2p: ecioka < eci- (2p prefix) + oka "are" + -i (nominalizer) -- "those which are you"

3s: anihi < an "is" + i (nominalizer) -- "that which is" with doubled final vowel
3p: okai < oka "are" + -i (nominalizer) -- "those which are"

nen < ne- (interr root) + n (person) -- "which person"
nep < ne- (interr root) + p (thing) -- "which thing"

So one finds
1s *ku- -- 1p *ci- -- 2s *e- -- 2p *eci-
demonstr. *tVnpe
interr *na-
 
I found Ainu language (Shibatani).pdf

Pronouns in verbs:

Classical Ainu (used in compositions about gods and heroes):
Intransitive subject: 1s -an -- 1p -an
Transitive subject: 1s a- -- 1p a-
Object:1s i- -- 1p i-

Colloquial Ainu:
Intransitive subject: 1s ku- -- 1px -as -- 1pi -an
Transitive subject: 1s ku- -- 1px ci- -- 1pi a- (an-)
Object: 1s en- -- 1px uni -- 1pi i-

All of them have: 2s e- -- 2p eci- -- 3x (none)

1s ku- seems like Austronesian.
 

Austronesian:
1s *i-aku -- 1px *i-(k)ami -- 1pi *i-(k)ita -- 2s *i-(ka)Su -- 2p *i-kamu
"this" *i-ni -- "that" *i-Cu -- 3s *si-ia -- 3p *si-ida -- C becomes t in Malayo-Polynesian
"who" *si-ima -- "what" *na-nu

Tai family, Kra-Dai:
1s (KD) *aku: -- 1p (T) *raw-A -- 2s (T) *mung-A -- (the final A is a syllable tone)
"this" (T) *naj
(not very much in Wiktionary)

Austroasiatic:
1s *anj -- 1px *je -- 1pi *i -- 2s *mi -- 2p *pi
"this" *ti -- "that (near)" *to -- "that (far)" *ni, *ne -- 3x *gi
"who?" *mVh -- "what?" *meh, *m(o), *m(o)h

Some resemblance, but not on the level of northern Eurasian m-t-k.
 
Ainu and Austric: Evidence of Genetic Relationship by John Bengtson and Vaclav Blazhek

Austronesian influence and Transeurasian ancestry in Japanese in: Language Dynamics and Change Volume 7 Issue 2 (2017)

Sino-Austronesian languages

Putting the pieces of this puzzle together, I have come up with a hypothesis.

The first eastern Asians to farm were in southern China, and they spoke Proto-Austric. They spread out from there to Southeast Asia (Austroasiatic, Kra-Dai, Hmong-Mien/Miao-Yao), to Taiwan (Austronesian), and northward to northern China (Austric contribution to Sino-Tibetan, Japonic, also Ainu?)


I'll review the prehistory of Japan.

The first human inhabitants arrived some 30,000 - 40,000 years ago, and the first evidence of sedentary populations dates back some 16,000 years, a bit farther back than some other sedentary hunter-gatherer populations, like the Natufian culture in the Levant at some 15,000 years. This was the time of the Bølling–Allerød Interstadial, a brief warm period some 3,000 years before the end of the Pleistocene.
The approximately 14,000-year Jōmon period is conventionally divided into several phases, progressively shorter: Incipient (13,750–8,500 BC), Initial (8,500–5,000), Early (5,000–3,520), Middle (3,520–2,470), Late (2,470–1,250), and Final (1,250–500).

...
The Early Jōmon period saw an explosion in population, as indicated by the number of larger aggregated villages from this period. This period occurred during the Holocene climatic optimum, when the local climate became warmer and more humid.

...
The degree to which horticulture or small-scale agriculture was practiced by Jōmon people is debated. Currently, there is no scientific consensus to support a conceptualization of Jōmon period culture as only hunter-gatherer.
Middle Jōmon
Highly ornate pottery dogū figurines and vessels, such as the so-called "flame style" vessels, and lacquered wood objects remain from that time.
Late and Final Jōmon
During the Final Jōmon period, a slow shift was taking place in western Japan: steadily increasing contact with the Korean Peninsula eventually led to the establishment of Korean-type settlements in western Kyushu, beginning around 900 BC. The settlers brought with them new technologies such as wet rice farming and bronze and iron metallurgy, as well as new pottery styles similar to those of the Mumun pottery period. The settlements of these new arrivals seem to have coexisted with those of the Jōmon and Yayoi for around a thousand years.

Outside Hokkaido, the Final Jōmon is succeeded by a new farming culture, the Yayoi (c. 300 BC – AD 300), named after an archaeological site near Tokyo.
The Yayoi people brought rice farming and the Japanese language to Japan.

The Ainu?
A study by Lee and Hasegawa of Waseda University concluded that the Jōmon period population of Hokkaido consisted of two distinctive populations which later merged to form the proto-Ainu in northern Hokkaido. The Ainu language can be connected to an "Okhotsk component" which spread southwards. They further concluded that the "dual structure theory" regarding the population history of Japan must be revised and that the Jōmon people had more diversity than originally suggested.
Seems like a population that went from the Amur River area in Siberia to Sakhalin, and then to Hokkaido and northern Honshu.

So if Ainu has some Austric in it, then that means some Austric speakers going far north, to the Amur River.
 
I'm surprised that Proto-World does as well as it does in this criterion. Could greater stability mean more data to work from?

It's not clear to me what your metric is. Suppose there are 100 words in reference list; and #1, #10, #15 are the only cognates. What measure would you calculate?

It's clear what a cognate of a language pair is. But what does it mean to be a "cognate of proto-World"?
 
I'm surprised that Proto-World does as well as it does in this criterion. Could greater stability mean more data to work from?
It's not clear to me what your metric is. Suppose there are 100 words in reference list; and #1, #10, #15 are the only cognates. What measure would you calculate?
If the words are ordered in some language's alphabetical order, then they are essentially random. If they are ordered by word-form stability, with more stable before less stable, then for genuinely related languages the cognates should cluster among the more stable ones.

One estimates stability by looking at languages and language families with known histories and seeing how often some word form is replaced in them. Known histories include not only written records but also reconstructed protolanguages.

Several researchers have come up with several lists of highly-stable word forms, and they tend to agree that certain meanings have very stable word forms. So if some languages have highly-stable cognates, those langs likely inherited them from a common ancestor rather than borrowing from one to another.

Top ten:
  • Indo-European: 2, 3, 5, who?, 4 -- 1s, 1, 1p, when?, tongue
  • Bantu: to eat, tooth, 3, eye, 5 -- hunger, elephant, 4, human being, child
  • Austronesian: 2, 3, to die, eye, 4 -- 50, 10, 7, 5, tongue
  • Swadesh-Starostin: 1p, 2, 1s, eye, 2s -- who?, fire, tongue, stone, name
  • Dolgopolsky: 1s, 2, 2s, who?, tongue -- name, eye, heart, tooth, not
  • (Extended): 5, 3, 4, 6, 1s -- 2, 7, 8, 2s, who?
  • Leipzig-Jakarta: fire, nose, to go, water, mouth -- tongue, blood, bone, 2s, root
1s = I/me, 2s = thou, you (sg.), 1p = we/us, 2p = you (pl.)

Notice that certain kinds of words are persistently rated as highly stable: positive integers from 1 to 5, simple personal pronouns, interrogative pronouns, body parts, common actions, natural phenomena, and "name".
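A quick way to back that up is to tally how many of the seven top-ten lists each meaning appears in. The sketch below uses the lists exactly as given above, with "tooth, 3" in the Bantu list read as two separate items. The names `TOP_TEN`, `counts`, and `shared` are just for illustration.

```python
from collections import Counter

# The seven top-ten lists as given in this post.
TOP_TEN = {
    "Indo-European": ["2", "3", "5", "who?", "4", "1s", "1", "1p", "when?", "tongue"],
    "Bantu": ["to eat", "tooth", "3", "eye", "5", "hunger", "elephant", "4", "human being", "child"],
    "Austronesian": ["2", "3", "to die", "eye", "4", "50", "10", "7", "5", "tongue"],
    "Swadesh-Starostin": ["1p", "2", "1s", "eye", "2s", "who?", "fire", "tongue", "stone", "name"],
    "Dolgopolsky": ["1s", "2", "2s", "who?", "tongue", "name", "eye", "heart", "tooth", "not"],
    "(Extended)": ["5", "3", "4", "6", "1s", "2", "7", "8", "2s", "who?"],
    "Leipzig-Jakarta": ["fire", "nose", "to go", "water", "mouth", "tongue", "blood", "bone", "2s", "root"],
}

# Number of lists in which each meaning occurs.
counts = Counter(m for meanings in TOP_TEN.values() for m in meanings)

# Meanings that show up in at least four of the seven lists.
shared = {m: c for m, c in counts.items() if c >= 4}

for meaning, c in sorted(shared.items(), key=lambda item: -item[1]):
    print(meaning, c)
```

"2" and "tongue" each appear in five lists, and "3", "4", "5", "who?", "1s", "2s" and "eye" in four, which fits the numerals, pronouns and body parts picked out above.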

(PDF) Borrowability and the Notion of Basic Vocabulary

In the authors' sample, nouns are twice as likely to be borrowed as all the other kinds of words: adjectives and adverbs (modifiers), verbs, and function words (pronouns, adpositions, conjunctions, ...)

It's clear what a cognate of a language pair is. But what does it mean to be a "cognate of proto-World"?
I meant cognate in Proto-World. In my argument, greater stability means more likely reconstructed, giving more to work with. An alternate possibility is that it is easier to search for words with the most stable meanings.
 
For example, negation:
The Proto-Sapiens Prohibitive/Negative Particle *Ma | John D Bengtson - Academia.edu
also at
(PDF) THE PROTO-SAPIENS PROHIBITIVE/NEGATIVE PARTICLE *MA

That seems too much like Confirmation bias - RationalWiki (https://rationalwiki.org/wiki/Confirmation_bias) -- the enumeration of favorable circumstances -- selecting the hits while omitting the misses.

A more proper approach would be to get *every* word for negation, including every morpheme (word part), and then to look for patterns.

So I looked at
not - Wiktionary, the free dictionary
no/translations - Wiktionary, the free dictionary
and I found a *lot* of variation.
 
I'm surprised that Proto-World does as well as it does in this criterion. Could greater stability mean more data to work from?
It's not clear to me what your metric is. Suppose there are 100 words in reference list; and #1, #10, #15 are the only cognates. What measure would you calculate?

What was (and still is) unclear to me is your own criterion. I gave "#1, #10, #15 are the only cognates" as an example, hoping you would define your metric by working through the exact arithmetic.

The methods I've seen always compare EXACTLY TWO languages or families. When estimating the age of "Indo-Uralic" for example, Proto-IE and Proto-Uralic are compared. (Of course a word form can be guessed from its presence in some subset of Uralic, without actually reconstructing the word in Proto-Uralic). Most (though not all) of the examples in your first post from Sunday are of PAIRS: "Samoyedic (Uralic) - Yukaghir," etc.

It's clear what a cognate of a language pair is. But what does it mean to be a "cognate of proto-World"?
I meant cognate in Proto-World.

I'm not sure this answers my question. proto-World is not a PAIR. Almost every word is present in a plural subset of world languages, while about zero words are present in every world language! But I'm being silly now. The only intent of my response was to seek clarification of
(index in cognate list) / (max of indices in cognate list)
-
(index in reference list) / (max of indices in reference list)
Only NOW did I even see that Minus Sign! Still it might not be painfully difficult to present the worked arithmetic for a 3- or 2-cognate example and thereby remove any ambiguity.
 