• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Language as a Clue to Prehistory

Greenberg and his disciples regard Amerind as very well-supported, certainly better supported than Austric or Nostratic.
His method seems to me too hand-wavy, so that's why I bumped it down.

Problematic Use of Greenberg’s Linguistic Classification of the Americas in Studies of Native American Genetic Variation - PMC
Whereas Greenberg’s classification has been widely and uncritically used by human geneticists, it has been rejected by virtually all historical linguists who study Native American languages. There are many errors in the data on which his classification is based (Goddard 1987; Adelaar 1989; Berman 1992; Kimball 1992; Poser 1992), and Greenberg’s criteria for determining linguistic relationships are widely regarded as invalid. His method of multilateral comparison assembled only superficial similarities between languages, and Greenberg did not distinguish similarities due to common ancestry (i.e., homology) from those due to other factors (which other linguists do). Linguistic similarities can also be due to factors such as chance, borrowing from neighboring languages, and onomatopoeia, so proposals of remote linguistic relationships are only plausible when these other possible explanations have been eliminated (Matisoff 1990; Mithun 1990; Goddard and Campbell 1994; Campbell 1997; Ringe 2000). Greenberg made no attempt to eliminate such explanations, and the putative long-range similarities he amassed appear to be mostly chance resemblances and the result of misanalysis—he compared many languages simultaneously (which increases the probability of finding chance resemblances), examined arbitrary segments of words, equated words with very different meanings (e.g., excrement, night, and grass), failed to analyze the structure of some words and falsely analyzed that of others, neglected regular sound correspondences between languages, and misinterpreted well-established findings (Chafe 1987; Bright 1988; Campbell 1988, 1997; Golla 1988; Goddard 1990; Rankin 1992; McMahon and McMahon 1995; Nichols and Peterson 1996).
Excrement, night, grass -- all dark-colored.

While there are plenty of odd semantic shifts that have happened, it is best not to rely on them to assess language relationship, because they make relationship hypotheses difficult to falsify.

One should either avoid them outright or stick to very common ones.
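
To make the falsifiability point concrete, here is a rough back-of-envelope sketch in Python of how quickly chance look-alikes pile up as more languages and looser meanings are allowed. The per-pair probability `p_pair`, the function name, and the independence of the comparisons are all assumptions made for illustration; none of them come from the quoted paper.

```python
def p_any_chance_match(p_pair, n_languages, n_meanings=1):
    """Probability of at least one chance look-alike somewhere.

    Assumes (hypothetically) that every pair of languages has the same
    independent probability p_pair of a chance resemblance for one meaning,
    and that accepting n_meanings loosely related meanings multiplies the
    number of independent tries.
    """
    pairs = n_languages * (n_languages - 1) // 2
    trials = pairs * n_meanings
    return 1.0 - (1.0 - p_pair) ** trials


if __name__ == "__main__":
    # Illustrative value only: a 1% chance per pair and per meaning.
    for n in (2, 5, 20):
        print(n, round(p_any_chance_match(0.01, n), 3))
```

Under these assumptions, a 1% chance per pair stays at 1% for two languages but rises above 80% once 20 languages are compared at once, and allowing several acceptable meanings per word pushes it higher still.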
 
Grass dark-colored? That may seem surprising, but there is some motivation for that, from limited development of generic color terms.

I'll use abbreviations W: white, R: red, Y: yellow, G: green, B: blue, K: black.

How the Munsell Book of Color Revolutionized Linguistics Part 5 | Munsell Color System; Color Matching from Munsell Color Company

That page has an illustration from "Color Naming across Languages", Kay, Berlin, Maffi and Merrifield (1997), expanding on "Basic Color Terms", Berlin and Kay (1969), and a diagram showing the illustration's groupings of colors.

I've also found handprint : the geometry of color perception with an illustration: "development hierarchy in color terms"

The second one has:
  • (light) -> W + (warm)
  • (warm) -> R + Y
  • (dark) -> K + (cool)
  • (cool) -> G + B
with splits WRY GBK -> W RY GBK -> W RY GB K -> W R Y GB K -> W R Y G B K

The first one has these splits:
  1. WRY GBK
  2. W RY GBK
  3. W RY GB K -- W RY G BK -- W R Y GBK -- W R YGB K
  4. W R Y GB K -- W R Y G BK -- W R YG B K
  5. W R Y G B K

So green is one of the dark colors.
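
As a small check on that reading, the stage sequence from the second source can be written down as partitions and queried for the term that covers green. This is only a sketch: the list `CHAIN` is transcribed from the splits above, and the function names are made up for the illustration.

```python
# Stage sequence as transcribed above from the second source; each
# space-separated chunk is one color term covering those focal colors.
CHAIN = ["WRY GBK", "W RY GBK", "W RY GB K", "W R Y GB K", "W R Y G B K"]


def term_containing(color, stage):
    """Return the composite term of `stage` that covers `color`."""
    for term in stage.split():
        if color in term:
            return term
    raise ValueError(f"{color!r} not found in stage {stage!r}")


def is_refinement(finer, coarser):
    """True if every term of `finer` lies inside one term of `coarser`."""
    return all(
        any(set(f) <= set(c) for c in coarser.split())
        for f in finer.split()
    )


if __name__ == "__main__":
    for stage in CHAIN:
        print(stage, "->", term_containing("G", stage))
```

Green shares a term with black in the first two stages and only gets a term of its own at the last one. The same `is_refinement` helper also shows that "W R YGB K" from the first source is not a pure split of "W RY GBK", since yellow moves over to the green-blue group.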
 
My impression is that the Amerind family is accepted by almost every linguist who is a "lumper"; this includes all linguists who accept hypotheses like Dene-Caucasian, Nostratic, Austric, or even Eurasiatic, Afro-Asiatic or Nilo-Saharan. It was Greenberg himself, applying the same methods, who achieved great fame and respect by postulating the four macro-families of Africa, and he has said that he found the evidence for Amerind much stronger than the evidence for Nilo-Saharan, etc. Lumpers have rebuttals for the arguments of the anti-Amerindians but, much like with the Shakespeare Authorship Question, this is an issue where dissent is ridiculed and drowned out at "consensus" sites, most notably Wikipedia.

The age of families like Dene-Caucasian or Eurasiatic corresponds roughly with the arrival of Amerinds and their genes in the New World. So it would be logical for the Amerind languages to form a family of similar age. Or, to turn the argument about: IF there were two (or more) unrelated macro-families spoken in pre-Columbian South America, THEN where did that second family of languages come from anyway?

Almost all linguists agree that ALL languages could be pushed back farther and farther, though with unknowable evidence, perhaps all the way to a single proto-World language. The controversy is about how far macro-families can be pushed back based on available evidence before speculation becomes impossible or pointless.

One may well ask why "splitters" protest so vehemently against the Amerind hypothesis, without being equally nasty against proponents of Eurasiatic, Nilo-Saharan, etc. Some attribute this to the large number of linguists researching Amerindian language subfamilies, who have a vested interest against the Amerind hypothesis. Ruhlen points to one such researcher who wasted considerable effort trying to pin down kinship terms in one language subfamily -- a puzzle that disappears when the (t'ana, t'una, t'ina) pattern of proto-Amerind is considered.
 
Relevant to the Amerindian controversy:

Greenberg's Reply to Campbell on the Classification of American Indian Languages

A 60-page article by Bengtson and Ruhlen called "Global Etymologies" provides constructive criticism of Greenberg's methods.

There was a critical review of an Amerindian paper with objections so trivial and picayune it seemed like a parody. In the "mass comparison" method, many THOUSANDS of words are first collected (strength in numbers! But very time-consuming). Yet despite this, the review's picayune objections included glossing a word which means 'boy' as 'son' and a substitution of /kana/ for /k'ana/ or some such. I'd display that paper, but can't find it just now.
 
More stable words being better preserved than less stable ones: Preliminary Lexicostatistics as a Basis for Language Classification: a New Approach by George Starostin (2010), son of Sergei Starostin.

He calls it "Jaxontov's law", using an alternate transcription of linguist Sergei Yakhontov's last name. Sergei Yakhontov - Серге́й Евге́ньевич Я́хонтов

The Yakhontov Test is usually applied by dividing the reference list in two, and comparing the number of cognates in the more-stable half to that in the less-stable half.
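
A minimal Python sketch of that split-half count, assuming each cognate is identified by its 1-based rank in a stability-ordered reference list and that an odd-sized list is split with the smaller half first. The function name and the sample ranks are hypothetical.

```python
def split_half_counts(cognate_ranks, reference_size):
    """Count cognates in the more-stable and less-stable halves.

    cognate_ranks: 1-based ranks (in the reference list) of the items
    for which a cognate was proposed; rank 1 is the most stable item.
    reference_size: total number of items in the reference list.
    """
    half = reference_size // 2  # assumption: floor for odd-sized lists
    stable = sum(1 for r in cognate_ranks if 1 <= r <= half)
    less_stable = sum(1 for r in cognate_ranks if half < r <= reference_size)
    return stable, less_stable


if __name__ == "__main__":
    # Made-up ranks against a 110-item list, for illustration only.
    print(split_half_counts([1, 2, 3, 60, 100], 110))
```

A relationship passes the test when the first count is clearly larger than the second, which is what one expects if the matches are inherited rather than accidental.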

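I've thought of an alternative method, one that uses all ranks. To use it, sort one's cognate list in the order of one's reference list, then find the average, over all the cognates, of the difference

(index in cognate list) / (max of indices in cognate list)
-
(index in reference list) / (max of indices in reference list)

A positive average means that the cognates are concentrated among the more stable words, and an average near zero means that they are spread evenly over the reference list.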

I also plot (index in reference list) against (index in cognate list).
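
The all-ranks measure can be sketched in Python as follows. This is one reading of the description, not a definitive implementation: it assumes 1-based indices and takes "max of indices in reference list" to be the full length of the reference list. The names `rank_excess` and `plot_points` are invented for the example.

```python
def rank_excess(cognate_ranks, reference_size):
    """Average of k/K - r_k/N over the cognates.

    cognate_ranks: 1-based reference-list ranks of the cognates.
    reference_size: N, the length of the reference list (an assumption
    about what "max of indices in reference list" means).
    """
    ranks = sorted(cognate_ranks)  # cognate list in reference-list order
    if not ranks:
        raise ValueError("need at least one cognate")
    k_max = len(ranks)
    total = sum(
        k / k_max - r / reference_size
        for k, r in enumerate(ranks, start=1)
    )
    return total / k_max


def plot_points(cognate_ranks):
    """(index in reference list, index in cognate list) pairs for plotting."""
    return [(r, k) for k, r in enumerate(sorted(cognate_ranks), start=1)]


if __name__ == "__main__":
    # Made-up ranks against a 10-item list, for illustration only.
    print(rank_excess([1, 2, 3, 4, 5], 10))    # cognates in the stable half
    print(rank_excess([2, 4, 6, 8, 10], 10))   # cognates spread evenly
```

With these toy inputs the first call comes out at about 0.3 and the second at about 0, so values like those listed below would indicate a moderate tilt toward the stable end of the list.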

I've used two ordered reference lists as comparisons: the 110-word Swadesh-Starostin list and the 200-word extended Leipzig-Jakarta list. Here are the best numbers:
  • Indo-Uralic: 0.20
  • Samoyedic (Uralic) - Yukaghir: 0.23
  • Altaic (Narrow): 0.14, (Broad): 0.12 (minimum of maximum)
  • Chukotko-Kamchatkan - Nivkh: 0.17
  • Basque - N Caucasian: 0.25
  • Yeniseian - Burushaski: 0.00
  • Yeniseian - Na-Dene (AET): 0.10 (AET = Athabaskan-Eyak-Tlingit; Haida has too few cognates)
  • Hurro-Urartian - Dene-Caucasian: 0.21
  • Hattic - Dene-Caucasian: 0.16
  • Sumerian - Hurrian: 0.13
  • Austro-Tai: 0.31
  • Austric: 0.17
  • Proto-World: 0.11 (Bengtson & Ruhlen "Global Etymologies", with *kaka "older male relative", *ma "not")
Only some of these were scramble-tested in the papers that I've found: Indo-Uralic, Altaic, and the circumpolar ones (Sam-Yuk, CK-Nivkh, Yen-ND, Yen-Bur).

Austric seems well-supported, as are various subgroupings of Eurasiatic and Dene-Caucasian, though those groupings as wholes are somewhat less so.

I'm surprised that Proto-World does as well as it does in this criterion. Could greater stability mean more data to work from?
 
I first define Terminal Pleistocene as from the Last Glacial Maximum, about 26,000 - 20,000 years ago, to the end of that geological epoch, about 12,000 years ago, the beginning of the Holocene Epoch.

It is often defined as longer, however: the whole last glacial period, starting from the end of the previous interglacial period, the Eemian, about 130,000 - 115,000 years ago.

Afroasiatic is Terminal Pleistocene in age, and Eurasiatic and Dene-Caucasian probably are also, being bounded from below by Early Holocene macrofamilies Altaic and Euskaro-Caucasian. Austric I think is likely Early Holocene, with a homeland in southern China.

Amerind is bounded from above by the time of the first human settlers of the New World: the Last Glacial Maximum. Since its members are distant enough to make their relationship far from obvious, Amerind is likely Terminal Pleistocene in age.

This makes Borean Last Glacial Maximum in age or somewhat older, bounded from above by the first human settlement of Northern Eurasia (Europe and Siberia), roughly 45,000 years ago.
 
I can't find where I did the Northern Eurasian pronouns, so I'll do them again. :(

pl = plural, obl = oblique, combining form, vb = form attached to a verb, poss = possessive form
"this": proximal (near speaker), "that": distal (away from speaker)
"who?": animate interrogative, "what?": inanimate interrogative

Indo-European:
1s *eg(h)o-, obl *me-, vb *-m, *-H -- 1d vb *-we -- 1p *wei, obl *nos-, vb *-me
2s *tuH, *tiH, obl *te-, vb *-s -- 2d vb *-to -- 2p *yuH, obl *wos-, vb *-te
"this", "that" *so, *to- and *e-, *i- -- vb 3s *-t, 3d *-ta, 3p *-nt
"who?" *kwi-, adj *kwo- -- "what?" (neuter of "who?")

Uralic:
1s *minä, *mun, vb *-m -- 1d vb *-mäjn -- 1p *me, vb *-mät
2s *tinä, *tun, vb *-n, *-t -- 2d vb *-täjn -- 2p *te, vb *-tät
"this" *to, pl *no -- "that" *tä, pl *nä -- 3s *se, 3p *ne, vb 3s *-, 3d *-kë, 3p *-t, poss 3s *-sä, 3d *-säjn, 3p *-sät
"who?" *ku-, *ke -- "what?" *mi

Turkic:
1s *be, obl *ben-, vb *-m, poss *-m -- 1p *bir', obl *bir'n-, vb *-mir', *-bir', poss *-mir'
2s *se, obl *sen-, vb *-n, poss *ng -- 2p *sir', obl *sir'n-, vb *-sir', poss *-ngir'
"this" *bu, obl *bun- -- "that", 3x *ol, obl *an-, vb 3s *-, poss 3s *-(s)i
"who?" *kem -- "what?" *nê-

Mongolic:
1s *bi, obl *nad-, poss pron *mini -- 1p *bid, obl *bidn-, poss pron *bidni
2s *tSi, obl *tSam-, poss pron *tSini -- 2p *ta, obl *tan-
"this" *ene, obl *üün-, pl *edeger -- "that", 3x *tere, obl *tüün-, pl *tedeger
"who?" *ken -- "what?" *yaxun
(x is the kh fricative)

Tungusic:
1s *bi, obl *min- -- 1px *bu, obl *mun-, 1pi *münti
2s *si, obl *sin- -- 2px *su, obl *sun-
"this" *ër -- "that" *tari
"who?" *ngüi -- "what?" *xa-

The Tungusic languages distinguish between inclusive and exclusive "we": "you and I" (1pi) and "we without you" (1px).

Sources: Wiktionary, Wikipedia, On the structure of Proto-Uralic | Juha Janhunen - Academia.edu and Proto-Turkic - Wikibooks, open books for an open world and Evenki grammar and Manchu grammar
 
Appendix:Swadesh lists - Wiktionary, the free dictionary is a good place to go for pronouns.

For Koreanic, as it may be called, I used mostly Middle Korean:
1s na -- 1p wuli -- 2s ne -- 2p nehuy
"this" i -- "that (near)" ku -- "that (far)" tye -- 3s ku, 3p kutul
"who?" nwu -- "what?" enu, musum

Proto-Japonic:
1s *a, *wa -- 1p *wa, *waya -- 2s *u, *o -- 2p *uya, *ura
"this" *ko (?) -- "that (near)" *së -- "that (far)" *kë -- 3s *ka, 3p *kata
"who?" *ta -- "what?" *na, *në

Korean and Japanese use a 3-way distinction in their demonstratives, like Spanish, and unlike English, with its 2-way distinction:
  • "this" is proximal, near the speaker -- Sp. éste
  • "that (near)" is medial, near the listener -- Sp. ése
  • "that (far)" is distal, far from both the speaker and the listener -- Sp. aquél

Neither of them looks very Eurasiatic, though for Korean, I've seen the theory that those pronouns are the result of the original ones' n-stem declension with the roots dropped off: *mana > *na, *tene > *ne, *kunu > *nu.
 
Chukotko-Kamchatkan is rather difficult:
1s *gëm -- 1p *muRi -- 2s *gët -- 2p *tuRi
where R is Chukchi r, Koryak j, Itelmen z.

But it fits in with the rest of Eurasiatic.

Appendix:Eskimo-Aleut basic vocabulary - Wiktionary, the free dictionary - Aleut, Proto-Eskimo
1s ting, vb -kuq, poss *-ng -- *vi, vb *-tua, *-kuq, poss *-nga
1d tingix -- *vik
1p tingin(s) -- *vit
2s txin, vb -kuxt, poss *-n -- *ëtlvant, vb *-it, poss *-in
2d txidix -- *ël-ptek
2p txichix -- *ël-vcet
3s ilaa, uda, vb *-kux, poss *-(n)gan -- *ël-nga, *una, vb *-tuq, poss *-ngan
3p laan(s), udan(s) -- *ël-ngat, *ukuat
"who?" kiin -- *kina
"what?" alqux -- *cangu

Not very Eurasiatic either, though PEsk 1x *v- seems like it corresponds with *m- and "who?" *kin- obviously corresponds. Dual *-k and plural *-t correspond with Uralic, and to a lesser degree with Indo-European and Altaic.
 
I'll now assess Japonic, Ainu, and Austric. Ainu is spoken in Hokkaido, the northernmost of Japan's four big islands, and its Wiktionary Swadesh list is also in Appendix: Paleosiberian Swadesh lists - Wiktionary, the free dictionary

Proto-Japonic:
1s *a, *wa -- 1p *wa, *waya -- 2s *u, *o -- 2p *uya, *ura
"this" *ko (?) -- "that (near)" *së -- "that (far)" *kë -- 3s *ka, 3p *kata
"who?" *ta -- "what?" *na, *në

Ainu:
1s kuani, kani -- 1px ciokai, 1pi anokai -- 2s eani -- 2p ecioka
"this" tanpe -- "that" toanpe -- 3s sinuma, anihi, 3p okai
"who?" nen, hunna -- "what?" nep, hemanta

1s: kuani < ku- (1s prefix) + an "is" + -i (nominalizer) -- "that which is me"
1px: ciokai < ci- (1p prefix) + oka "are" + -i (nominalizer) -- "those which are us"
1pi: anokai < a- (impersonal prefix) + oka "are" + -i (nominalizer) -- "those which are someone"

2s: eani < e- (2s prefix) + an "is" + -i (nominalizer) -- "that which is you"
2p: ecioka < eci- (2p prefix) + oka "are" + -i (nominalizer) -- "those which are you"

3s: anihi < an "is" + -i (nominalizer) -- "that which is" with doubled final vowel
3p: okai < oka "are" + -i (nominalizer) -- "those which are"

nen < ne- (interr root) + n (person) -- "which person"
nep < ne- (interr root) + p (thing) -- "which thing"

So one finds
1s *ku- -- 1p *ci- -- 2s *e- -- 2p *eci-
demonstr. *tVnpe
interr *ne-
 