
Language as a Clue to Prehistory

Looking in the PubMed archive, I find some papers I've discussed earlier, but PubMed is good at cross-referencing.

Ultraconserved words point to deep language ancestry across Eurasia - PMC by Mark Pagel et al.

though it used the Starling Etymological Databases.

Ultraconserved words and Eurasiatic? The “faces in the fire” of language prehistory - PMC
“Ultraconserved words” are invalidated by several basic principles of linguistics: the relationship between sound and meaning is essentially arbitrary; change proceeds largely independently on each level; and in sound, changes generally apply without exception, irrespective of words’ meanings. Stability in meaning is powerless against instability in sound. Even if cognacy may survive for tens of millennia, the ability to detect it at all depends on sound, whose decay clock ticks far faster (witness water: Latin [akwam] to French [o] in just two millennia). This is the limitation that Pagel et al. should test: whether enough phonetic signal survives to judge cognacy reliably back to 14,450 BP—let alone 70 millennia (4), analogous to trying to radiocarbon date back ∼300,000 y.
That seems to me to be beside the point. The stability at issue is stability against borrowing and internal replacement, and some word forms are *very* stable.

Short, frequent words are more likely to appear genetically related by chance - PMC

Reply to Mahowald and Gibson and to Heggarty: No problems with short words, and no evidence provided - PMC by Pagel et al.
 
Responding to that first criticism, Latin aqua /akwa/ -> French eau /o/, I compared the other Romance languages.

Romanian apa /apa/, Italian acqua /akkwa/, Spanish, Portuguese agua /agwa/, Catalan aigua /aigwa/, French eau /o/, Sardinian akwa, abba, ... so French is atypical.
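One rough way to put a number on "atypical" is to count the edits needed to get from the Latin form to each modern one. This is a minimal sketch using plain Levenshtein distance over the simplified transcriptions above. Treating each letter as one segment and weighting every edit equally are my own simplifying assumptions, not anything the cited papers do.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

LATIN = "akwa"
# Simplified transcriptions as given in the post.
REFLEXES = {
    "Romanian": "apa",
    "Italian": "akkwa",
    "Spanish": "agwa",
    "Portuguese": "agwa",
    "Catalan": "aigwa",
    "French": "o",
    "Sardinian": "abba",
}

distances = {lang: levenshtein(LATIN, form) for lang, form in REFLEXES.items()}
for lang, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{lang:<11}{REFLEXES[lang]:<7}{d}")
```

On this crude measure French is the farthest from Latin, with every other form within one or two edits.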

How do we use language? Shared patterns in the frequency of word use across 17 world languages - PMC
Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use.

...
A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change.
  • Indo-European: Germanic (English), Italic (French, Spanish, Portuguese), Slavic (Russian, Czech, Polish), Greek
  • Uralic: Finnish, Estonian
  • Niger-Congo: Swahili
  • Altaic: Turkish
  • Austronesian: Maori
  • isolate: Basque
  • Creole: Tok Pisin
The average inter-correlation among the languages in the frequencies of use across the 200 word meanings is 0.73 (p < 0.0001), using the single Indo-European mean. Previously, we found an average inter-correlation of 0.85 for English, Russian, Greek and Spanish, and here we find an average inter-correlation among the nine Indo-European languages of 0.82.
Then discussing factors that lead to anomalies, like Finnish having "rotten" more often than average -- likely from wood rotting in damp environments.
 
The history of number words in the world's languages—what have we learnt so far? - PMC
Taken together, the evidence suggests that numbers are at the cross-roads of language history. For languages that do have productive and consistent number systems, numerals one to five are among the most reliable available linguistic fossils of deep history, defying change yet still bearing the marks of the past, while higher numbers emerge as innovative tools looking to the future, derived using language-internal patterns and created to meet the needs of modern speakers.
Some number systems have very limited number words.
For more languages than previously assumed, either number words are virtually non-existent, or else they do not designate exact quantities [8, p. 414]. ...
Such languages are found in the Amazon, Papua New Guinea, and Australia. These places have warm climates, where it is not necessary to store food to survive a winter, so their speakers may not have much need to count anything.

Mundurukú: 2: fat, 3: arms, 4: parents (not consistently)
Hup dialect: 2: eye quantity (also in Karitiana), 3: rubber-tree seed quantity

One pattern that appears to hold mostly unchallenged (dialects of Hup notwithstanding) is the representation of low-limit number words by means of unanalysable atoms. For languages that do have them, the history of such low-limit number words is astonishingly deep. Indo-European, Bantu, Austronesian and Pama-Nyungan languages have all been shown to preserve cognate forms of low-number words that have stubbornly lingered around for tens of thousands of years of linguistic evolution [7,17].

Noting
The deep history of the number words - PMC
We have previously shown that the ‘low limit’ number words (from one to five) have exceptionally slow rates of lexical replacement when measured across the Indo-European (IE) languages. Here, we replicate this finding within the Bantu and Austronesian language families, and with new data for the IE languages. Number words can remain stable for 10 000 to over 100 000 years, or around 3.5–20 times longer than average rates of lexical replacement among the Swadesh list of ‘fundamental vocabulary’ items. Ordinal evidence suggests that number words also have slow rates of lexical replacement in the Pama–Nyungan language family of Australia.
 
Ultraconserved words point to deep language ancestry across Eurasia - PMC - finding
Nostratic: (Dravidian, Kartvelian, Eurasiatic)
Eurasiatic: ((Indo-European, Uralic), (Altaic, Chukotko-Kamchatkan, Eskimo-Aleut))

with a time depth of roughly the beginning of the Holocene Epoch.

Turning to Support for linguistic macrofamilies from weighted sequence alignment - PMC: Eurasiatic appears there also, though not the broader Nostratic or Sino-Caucasian.

Japanese and Ainu turned out closest to Austroasiatic -- they seem to be Eurasiatic-Austric hybrids.

Hybrids? English has a very hybrid vocabulary, with much of it from Old French. But its grammar and basic vocabulary are nevertheless Germanic.

Using hybridization networks to retrace the evolution of Indo-European languages - PMC

Not much discussion of comparison to earlier linguistic scholarship, however.

PNAS Plus: Evolutionary dynamics of language systems - PMC
At first glance, our finding that structural features evolve more rapidly than basic vocabulary is incompatible with hypotheses proposing extremely deep signal in grammatical features. However, we do identify a set of highly stable structural features (Dataset S2). The highly stable features include many that have been previously suggested (5, 29), such as inclusive vs. exclusive distinctions and gender distinctions. For example, gender distinction in third person, and gender distinction in third person only are highly stable, while the presence of gender distinctions is in the medium rate category. Other features suggested to be unstable also appear in the fast rate category; for example, the presence of numeral classifiers is rapidly changing consistent with predictions that these are highly areal features (30). Suggestions that definite articles are unstable are also borne out by our results (31).

In contrast, however, other structural features predicted to be stable are not identified as such by our results. For example, case systems are proposed to be stable over time (30), but the four case-marking features (presence of case marking on core nominal noun phrases, on oblique nominal noun phrases, on core pronouns, and on oblique pronouns) all fall into the medium rate category. Likewise, constituent order features (order of numeral and noun, order of subject and verb), which have been claimed to be stable cross-linguistically (31), are allocated to the fast or medium rate category in this analysis. A major difference between our results and these studies is that we precisely estimate rates of change within a single, well-known language family, whereas the other analyses produce aggregate measures estimated from a phylogenetically disparate sample (30, 31). Such differences raise the possibility that features may vary in genealogical stability across different linguistic lineages (13), and suggest that future work should take a dynamics approach to language stability and attempt to identify the situations in which features are stable and those in which they not.

Gerhard Jäger has another paper:
Global-scale phylogenetic linguistic inference from lexical resources - PMC
Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant world-wide linguistic diversity. First, we estimated Pointwise Mutual Information scores between sound classes using weighted sequence alignment and general-purpose optimization. From this we computed a dissimilarity matrix over all ASJP word lists. This matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training of an SVM classifier on expert cognacy judgments. Third, we defined two types of binary characters, based on automatically inferred cognate classes and on sound-class occurrences. Several tests are reported demonstrating the suitability of these characters for character-based phylogenetic inference.
Not some macro-linguistic work, but instead a test of reliability of automated methods.
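The first step in that abstract, Pointwise Mutual Information between sounds, compares how often two sounds line up in aligned words with how often they would line up by chance: PMI(a, b) = log(p(a, b) / (q(a) q(b))). Here is a minimal count-based sketch. The aligned pairs are invented toy data, and the paper's iterated alignment and optimization are not reproduced here.

```python
import math
from collections import Counter

def pmi_scores(aligned_pairs):
    """PMI for each unordered pair of sounds found in aligned positions.

    aligned_pairs: iterable of (sound_a, sound_b) tuples from word alignments.
    """
    pair_counts = Counter()
    sound_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[frozenset((a, b))] += 1
        sound_counts[a] += 1
        sound_counts[b] += 1
    n_pairs = sum(pair_counts.values())
    n_sounds = sum(sound_counts.values())

    scores = {}
    for pair, count in pair_counts.items():
        sounds = sorted(pair)
        a, b = sounds[0], sounds[-1]  # identical when a sound is aligned with itself
        p_joint = count / n_pairs
        # Probability of the pairing if sounds were aligned independently;
        # two distinct sounds can meet in either order, hence the factor 2.
        p_indep = (sound_counts[a] / n_sounds) * (sound_counts[b] / n_sounds)
        if a != b:
            p_indep *= 2
        scores[pair] = math.log(p_joint / p_indep)
    return scores

# Invented toy alignments: t~d and k~g line up often, t~k rarely.
pairs = ([("t", "t")] * 6 + [("t", "d")] * 3 + [("d", "d")] * 2 +
         [("k", "k")] * 6 + [("k", "g")] * 2 + [("g", "g")] * 2 +
         [("t", "k")] * 1)
scores = pmi_scores(pairs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("~".join(sorted(pair)), round(score, 2))
```

A positive score means the two sounds correspond more often than chance would predict, so in this toy set t~d scores above zero and t~k below it.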
 
Merritt Ruhlen had constructed a list of "global etymologies" and they include
  • tik = 1, finger, to point
  • pal = 2
He proposed a challenge to critics: how likely is one to find the reverse, tik = 2 and pal = 1?

This paper contains this permutation test applied to full-sized word lists: Permutation test applied to lexical reconstructions partially supports the Altaic linguistic macrofamily | Evolutionary Human Sciences | Cambridge Core - (PDF version)
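The logic of such a permutation test can be sketched in a few lines: count the matches between two word lists aligned by meaning, then shuffle the meanings of one list many times and see how often chance does as well. The word lists below are hypothetical, and "same initial letter" is only a stand-in for whatever similarity criterion a real study would use.

```python
import random

def count_matches(list_a, list_b, is_match):
    """Number of positions (meanings) at which the two words count as a match."""
    return sum(is_match(a, b) for a, b in zip(list_a, list_b))

def permutation_p_value(list_a, list_b, is_match, trials=10_000, seed=0):
    """Share of random meaning-shuffles that match at least as well as the real alignment."""
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b, is_match)
    shuffled = list(list_b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if count_matches(list_a, shuffled, is_match) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)

def same_initial(a, b):
    """Stand-in similarity criterion: the words start with the same letter."""
    return a[0] == b[0]

# Hypothetical word lists; index i is the same meaning in both.
lang_a = ["tik", "pal", "mano", "kun", "bur", "aya"]
lang_b = ["tek", "pol", "men", "ken", "bor", "oya"]

observed, p = permutation_p_value(lang_a, lang_b, same_initial)
print(observed, p)
```

If the real alignment beats nearly all of the scrambled ones, chance resemblance is an unlikely explanation, though borrowing would still have to be ruled out separately.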

Also in PubMed is Triangulation supports agricultural spread of the Transeurasian languages - PMC

Both of them agree on (Turkic, (Mongolic, Tungusic)) for (Narrow) Altaic, and also that Japanese and Korean are more distant.
 
Just south of the Transeurasian or (Broad) Altaic family's speakers is where Sino-Tibetan's speakers live.
The second one finds a clean split between Sinitic (Chinese languages) and Tibeto-Burman, the rest of ST, while the first one has a 0.54 probability of (Sinitic, Sal) with the rest of TB having similar probabilities in its early branchings.

ST thus has an age of about 7,000 - 8,000 BP, and the early TB branchings 6,000 - 5,000 BP.

That is around the time of origin of millet farming in the Yellow River valley, around 8,000 BP (6,000 BCE).

Likewise, Transeurasian originated around 9,000 BP, the time of origin of millet farming in the West Liao valley.

For Vasco-Caucasian (Euskaro-Caucasian), Basque being brought to southwest Europe by Neolithic farmers makes the language macrofamily's origination time 12,000 - 10,000 BP (10,000 - 8,000 BCE).
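The BP-to-BCE figures in these posts are rounded. "Before Present" conventionally counts back from 1950 CE, and a small helper makes the offset explicit. This is a sketch, and the idea that the round figures here come from rounding to the nearest thousand is my assumption.

```python
def bp_to_calendar(bp: int, present: int = 1950) -> str:
    """Convert years Before Present to a BCE/CE label (there is no year 0)."""
    year = present - bp  # astronomical year numbering: 0 means 1 BCE
    if year <= 0:
        return f"{1 - year} BCE"
    return f"{year} CE"

for bp in (12_000, 10_000, 9_000, 8_000):
    print(bp, "BP =", bp_to_calendar(bp))
```

So 8,000 BP comes out as 6051 BCE, which rounds to the 6,000 BCE quoted for Yellow River millet farming.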

This may also explain some Germanic substrate vocabulary like "lamb" ~ NC "sheep" and some Greek substrate vocabulary - Notes on some Pre-Greek words in relation to Euskaro-Caucasian (North Caucasian + Basque)

As to Germanic "lamb" ~ NC "sheep", I searched for "sheep" in Search for data in: North Caucasian etymology and I found several entries. One of them is:
Proto-North Caucasian: *ɫVmbagV
Meaning: sheep
Proto-Avaro-Andian: *lVmbagV
Proto-Lezghian: *lamp:ak:
Notes: The root is rather restricted geographically and has a peculiar shape (as well as non-standard reflexes of medial *-mb-), thus old borrowing is not excluded (however, the source is not clear).
 
The opposite of highly-stable word forms is Wanderwörter -  Wanderwort - "wander words" - words that travel widely with what they name.

Here is a case of two rival wander words for the same thing: WALS Online - Feature 138A: Tea -- tea/translations - Wiktionary

Tea was invented in China, and which word for it elsewhere in the world depends on which dialect of Chinese was the source of that word, and that was a function of which place in China that traders bought tea from. English "tea" is from southern-dialect "te", while Russian "chai" is from northern-dialect "cha".

WALS has an example of neither origin, Hawaiian kî, but that looks like a borrowing of English "tea" /tî/ that was fit into Hawaiian phonology.

Some other ones are Polish herbata, from recent Latin herba thea, and Ojibwe aniibiishaaboo, literally "leaf soup" or "leaf liquid".
 
The Tower of Babel - An Etymological Database Project - with lots of macro-linguistics work, including Articles and Books: The Tower of Babel Electronic Library and The Site's Etymological Databases - including "Long-range etymologies" -- the Borean hypothesis.

Also there is The Global Lexicostatistical Database

Quest for the mother tongue: the story behind the search for "proto-World," a primeval language that most linguists believe will never be found, that many believe never existed, but that some say they're already piecing together. - Document - Gale Academic OneFile
Author: Robert Wright
Date: Apr. 1991
From: Atlantic(Vol. 267, Issue 4)
Publisher: The Atlantic Monthly Group LLC
Document Type: Cover story
Length: 15,529 words
Then describing Vitaly Shevoroshkin, a macro-linguist who immigrated to the US from the Soviet Union in 1974, and his odd turns of phrase.
Something is being gained in translation here. Shevoroshkin speaks English imperfectly. If he knew that Americans generally reserve the term "twisted" for a select subset of felons, he doubtless would have described Goddard and Campbell more moderately--as, say, "fools." Similarly, "falsifies facts," upon elaboration, turns out to be the rough equivalent of "interprets the evidence wrongheadedly."
 
Then going into comparative and historical linguistics, and then Indo-European and Nostratic.
But meanwhile, back in Moscow, trouble is brewing. Shevoroshkin's fellow Nostraticists are growing displeased. Though they like Shevoroshkin, and value the work he's done in publicizing Nostratics, a number are put out by all his talk of proto-World. They fear that it is only confusing Westerners about what Nostratics is and giving Nostraticists a reputation for wild-eyed speculation. This isn't to say that many of them doubt the past existence of a proto-World language; it's just that now may not be the ideal time to dwell on the subject--and certainly not to dwell on it in the manner that Shevoroshkin does. Some of the claims he has been making about proto-World, his fellow Nostraticists observe, could stand a bit more--how do you say?--nuance. He tends to get carried away, and journalists seem to abet this tendency.
Proto-Human, Proto-Sapiens, Proto-World: humanity's first language, or at least the last common ancestor of every documented one.

For that long-ago language: (PDF) Global etymologies at ResearchGate
By John Bengtson and Merritt Ruhlen
27 word roots that they find all over the world.

But it was spoken at least 70,000 years ago, before the first of our present species left Africa and started to settle the rest of the world. That's plenty of time for both these settlers and the African stay-at-homes to mangle this language beyond recognition -- and to mangle it in different directions as they dispersed -- making it impossible to reconstruct.

All the well-established language families whose original-speaker populations can be plausibly identified historically or archeologically are mid-Holocene or later, though Sino-Tibetan pushes into the early Holocene. Speculative families with plausible archeological correlates, like Transeurasian (Broad Altaic) or Vasco-Caucasian, are early Holocene.

It's hard to identify anything archeological with Afro-Asiatic, the oldest well-established language family, let alone the likes of Eurasiatic, Nostratic, Dene-Sino-Caucasian, Amerind, or Austric.
 
Back to this article.
But the Moscow Nostraticists are in solid agreement with Shevoroshkin about one thing: what a frustrating group of people American comparative linguists are. They have a way of casually dismissing, or simply ignoring, interesting ideas. Almost none of them, for example, has more than taken a glance at the Nostratic thesis. ...

Joseph Greenberg, of Stanford University, the only eminent American comparative linguist who strongly believes in proto-World (and who believed in it before Shevoroshkin started talking about it), summarizes the willful agnosticism of most American linguists by paying them a caustic compliment. "They have succeeded," he says, "in suppressing their curiosity." Shevoroshkin is similarly impressed. "They are not interested in many things which are interesting," he says. He is referring specifically to his colleagues at Michigan, but he then proceeds to generalize beyond them with what is either a deft bit of sarcasm or a pioneering use of terminology. "And that's the American way," he says.
A critic of macro-linguistics, Indo-Europeanist Eric Hamp, points out that sound changes can produce odd results, like the Armenian word for "three", erek'. For "two", sound changes have produced something equally odd: erku. If one was doing Greenberg-style mass comparison, one would not think that they are related, so one would get false negatives.

Let's look for 2 and 3 elsewhere in IE. English two, three /tû, thrî/, German zwei, drei /tsvai, drai/, Swedish två, tre /tvô, trê/, French deux, trois /dö, trwa/, Spanish dos, tres, Italian due, tre, Irish dhá, trí, /ghâ, trî/, Greek duo, treis /dhio, tris/, Russian dva, tri, Persian do, se, Hindi dô, tîn, Bengali dui, tin, Sinhalese deka, tuna, ...

Even with only the present-day IE languages, one can easily recognize a pattern.

For "four", one gets more variation, like Welsh pedwar, Spanish cuatro, Greek teseris, Russian chetyre, Hindi châr, ... and a likely Greenbergian false negative.
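Hamp's false-negative point can be made concrete with the kind of crude surface check that automated mass comparison relies on: reduce each word to the class of its first consonant and compare. The class table below is a simplified, hypothetical stand-in loosely modeled on Dolgopolsky-style consonant classes, and the forms are rough ASCII versions of the words for "two" quoted above.

```python
# Hypothetical, deliberately coarse consonant classes (an assumption for illustration).
CLASS_LETTERS = {"P": "pbf", "T": "td", "S": "sz", "K": "kg",
                 "M": "m", "N": "n", "R": "rl", "W": "vw"}
CONSONANT_CLASS = {letter: label
                   for label, letters in CLASS_LETTERS.items()
                   for letter in letters}

def first_consonant_class(word):
    """Class label of the first classified consonant, or None if there is none."""
    for ch in word:
        if ch in CONSONANT_CLASS:
            return CONSONANT_CLASS[ch]
    return None

# Words for "two", in simplified ASCII transcription.
TWO = {
    "English": "tu", "German": "tsvai", "Swedish": "tvo", "French": "do",
    "Spanish": "dos", "Italian": "due", "Irish": "gha", "Greek": "dhio",
    "Russian": "dva", "Persian": "do", "Hindi": "do", "Bengali": "dui",
    "Sinhalese": "deka", "Armenian": "erku",
}

reference = first_consonant_class(TWO["English"])
mismatches = [lang for lang, word in TWO.items()
              if first_consonant_class(word) != reference]
print(mismatches)
```

Armenian erku fails this check even though it is a regular cognate, and so does the lenited Irish form as transcribed here. Those are exactly the false negatives that surface comparison produces.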
 
Then about Vladislav Illich-Svitych's work on Nostratic. Western linguists are skeptical.
To begin with, reconstructed proto-Nostratic words strike Hamp as a bit more phonetically complex than proto-Indo-European words. ...

Hamp also complains about a certain scruffiness in some of the Nostratic etymologies, a problem he traces partly to the inherent difficulty of any one scholar's mastering the complexities of six language families.
Though PIE *ayes- means different things in different descendants, like Latin aes "bronze" and English "ore".

Then turning to American linguist Joseph Greenberg.
Greenberg holds the distinction of having been denounced by both Eric Hamp and Shevoroshkin. He classifies languages in a way that conservatives and many radicals alike find subversive. But whether Shevoroshkin likes it or not, Greenberg's methodology is now the tool of choice among scholars who look for ancestral affinities among all the world's language families.
He has worked on several macro-linguistic classifications:
  • Africa: Niger-Congo, Nilo-Saharan, Afro-Asiatic, and Khoisan
  • New Guinea, Tasmania, and the Andaman Islands: Indo-Pacific
  • The Americas: Eskimo-Aleut, Na-Dene, Amerind (all the others)
  • Northern Eurasia: IE, Uralic-Yukaghir, Transeurasian (Broad Altaic), Chukotko-Kamchatkan, Eskimo-Aleut, Nivkh, Ainu: Eurasiatic, part of Nostratic
The scholarly feedback accorded Greenberg's ideas about American Indian languages is more or less typified by the suggestion several years ago by Lyle Campbell, of Louisiana State University, that Greenberg be "shouted down." Campbell says he finds it "really depressing" that the proto-World publicity blitz has lately brought Greenberg some uncritical popular attention. He compares Greenberg to the scientist who last year predicted an early December earthquake in New Madrid, Missouri, and then (through November, at least) basked in publicity while his soberer colleagues labored in obscurity. Campbell's fellow American Indianist Ives Goddard, of the Smithsonian, uses a different simile. He says Greenberg's Amerind thesis is like the claims made for cold fusion. (As for the reconstruction of proto-World: Campbell compares it to medieval alchemy, Goddard to the search for Bigfoot.)
 
Greenberg calls his methodology "mass comparison." Other people have other names for it. One linguist calls it "megalocomparison." Eric Hamp calls it "crude and puerile." He elaborates: "I mean, quite frankly, around here, in our classes here, if any of our students did any stuff like that, we wouldn't even tolerate it. You would flunk a person who did that kind of stuff."
EH then noted sound changes, a source of false negatives, and coincidences and borrowings, sources of false positives.

JG got around borrowing by sticking to basic sorts of vocabulary, the stuff of Morris Swadesh's big list and similar lists.

This highlights what Baxter says is a more legitimate gripe about Greenberg: he doesn't spell out criteria for deciding when two words correspond closely enough to qualify as a match. Greenberg himself may not need such pedantry; his intuitive sense for linguistic affinity is the subject of some renown. But other linguists may. And science is supposed to be a game anyone can play. Greenberg's critics say his notebooks are just an unusually long Rorschach test: he looks at them and sees what he wants to see, granting himself whatever phonetic (not to mention semantic) latitude turns out to be necessary. (Even the significance of the n-m pattern for first- and second-person Amerind pronouns--a fairly simple-sounding thing--Greenberg has not yet demonstrated to everyone's satisfaction.)
The n-m pattern is common, like the m-t pattern of northern Eurasia, but it is far from universal. Also, one can do computerized versions of Greenberg's method, but such methods have their own limitations.
By failing to make his analysis more accessible, Baxter says, Greenberg has allowed everyone to ignore the methodological issues he's raising. And that's not the only excuse he's given them. Many simple mistakes have been found in Language in the Americas: words in the wrong language, words with the wrong meaning. Greenberg's detractors often preface their indictments by denouncing his methodology, but then, rather than elaborate, they almost invariably veer toward their favorite subject: his sloppiness. There seems to be an unofficial contest among them to characterize most powerfully the magnitude of the error. (Campbell may win with his pithy "The bulk of Greenberg's data is non-data.")
JG himself admits that there are mistakes in his work. But he is confident that by doing enough comparison, a few mistakes won't make much difference.
 
Both Greenberg and Shevoroshkin subscribe to the theory (a "super-Nostratic" theory, some call it) that Amerind and Nostratic are sister phyla--that their proto-languages share a parent proto-proto-language.
Then mentioning Dene-Sino-Caucasian.
This has to do with part two of Starostin's theory--the contention that Sino-Caucasian and Nostratic are sister phyla, sprung from a proto-proto-language spoken more than 15,000 years ago. Starostin has observed that if he is right, and if Greenberg and Shevoroshkin are right about a comparable Amerind-Nostratic kinship, there must have been a single language whose progeny went on to occupy almost all of Eurasia and the New World. Call it proto-SCAN: Sino-Caucasian, Amerind, Nostratic.

... (Actually, you might make that SCAAN. Starostin and Greenberg believe that proto-Afro-Asiatic was a sister of proto-Nostratic, not a daughter. And this revision is a good example, Greenberg believes, of what's wrong with mainstream methodology; if Illich-Svitych and Dolgopolsky had first gotten the big picture through mass comparison, they would have had a clearer idea of which language families to compare fastidiously.)
Broad Nostratic = Narrow Nostratic + Afro-Asiatic
Narrow Nostratic = Eurasiatic + Kartvelian + Dravidian
SCA(A)N = Borean

The remaining ex-African macrofamilies are Austric, Indo-Pacific, and Australian (incl. Pama-Nyungan). Is Austric closest to Dene-Sino-Caucasian ("Dene-Daic")? The closest relative of Borean? Not particularly close to Borean?

In Africa, that leaves us Khoisan, Niger-Kordofanian (incl. Niger-Congo), and Nilo-Saharan, with a proposal of Congo-Saharan for the latter two.

Our present species originated in sub-Saharan Africa and dispersed from there -  Early human migrations - with the first successful ex-Africa people going along southern Eurasia from Arabia to India to Southeast Asia to Australia ("Southern dispersal"). Some of these people then went northward, into the rest of Eurasia and then into the Americas.
 
The article then gets into "Global Etymologies", starting off with Vitaly Shevoroshkin being annoyed at an article titled "Mother Tongue" and subtitled "How Linguists Have Reconstructed the Ancestor of All Living Languages."

Then discussing the work of Vaclav Blazek (Blazhek) and Merritt Ruhlen, a proud Greenbergian.
Ruhlen's quest for proto-World can be traced to a 1976 lecture at which Greenberg said that in languages around the world he had seen words sounding something like "tik" for "one," or for "pointing finger" or just "finger."
Like Indo-European *deik- and its numerous descendants, like English "toe" and "teach" and Latin dîcere "to say" and digitus "finger, toe".

JG wanted to address that issue after finishing up his book on Eurasiatic: "Indo-European and its Closest Relatives" -- "sort of finish up the whole world". But "it's such a big job," and "the fact is, I have a finite intelligence." That book was published in two volumes in 2000 and 2002, and JG died in 2001, so he never had a chance to address this issue. He dedicated his book to Merritt Ruhlen, "optimo discipulo" ("to (my) best disciple" in Latin).

So it was up to MR to finish the job. JG counted some 50 global etymologies suspected by himself, Vladislav Illich-Svitych, Aharon Dolgopolsky, and the early-20th-century Italian linguist Alfredo Trombetti. "They practically ran him out of the linguistics community," JG said about AT.

MR posits some 31, like *pal "two", *aya "mother, older female relative", *bur "ashes, dust", *ku, *kun "who?", *mano "man, person", and *mena "to think". VB includes *tali "tongue" and *gini, *nigi "tooth".

VB uses 8 macrofamilies, accepting a root if it is in at least 3 of them. But 11 of his 12 are in at least 4 of them, and *tali "tongue" in all 8.

MR is more cautious, dividing them into 32 macrofamilies, and some of his roots are found in as few as 6 of them.
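How demanding a threshold like "at least 3 of 8" is depends on how likely a chance lookalike is within any one macrofamily. This sketch uses a binomial tail. The per-family chance probabilities are purely hypothetical, and treating the families as independent trials is an assumption.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more chance hits in n independent families, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical per-family probabilities of a chance lookalike for a given root.
for p in (0.05, 0.10, 0.20):
    vb = prob_at_least(3, 8, p)    # VB's rule: at least 3 of 8 macrofamilies
    mr = prob_at_least(6, 32, p)   # MR's sparsest roots: 6 of 32 families
    print(f"p={p:.2f}  3-of-8: {vb:.4f}  6-of-32: {mr:.4f}")
```

The point is only qualitative: the more latitude allowed in sound and meaning, the higher p gets, and the faster such thresholds stop being demanding.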
 
Joseph Greenberg had a defense of his work on the languages of the Americas: he'd point to some pattern being more common there than in the rest of the world. But that's not available for putative global etymologies. Merritt Ruhlen came up with a workaround: does one find scrambled versions of his global etymologies? Like instead of *tik = 1, *pal = 2, does one find *tik = 2, *pal = 1? That's a good test, and versions of that have been done in some automated mass-comparison tests, like for Broad Altaic (Transeurasian).

The article then mentions lumpers and splitters, and how each one is a valuable corrective for the other.

Finally, Vladislav Illich-Svitych's short poem, about historical linguistics itself.
Language is a ford across the river of Time,
It leads us to the dwelling place of those who are gone;
But he will not be able to come to this place
Who fears deep water.

And Eric Hamp might add: He will not be able to come to this place who isn't accompanied by some cautious soul insisting that they tread carefully.
 
Indo-European and Its Closest Relatives: The Eurasiatic Language Family, Volume 1, Grammar
Amazon.com: Indo-European and Its Closest Relatives: The Eurasiatic Language Family, Volume 1, Grammar: 9780804738125: Greenberg, Joseph H.: Books
Indo-European and Its Closest Relatives: The Eurasiatic Language Family, Volume 2, Lexicon
Amazon.com: Indo-European and Its Closest Relatives: The Eurasiatic Language Family, Volume 2, Lexicon: 9780804746243: Greenberg, Joseph H.: Books

Eurasiatic (Greenberg): Indo-European, Uralic-Yukaghir, Altaic (Turkic, Mongolian, and Tungus-Manchu), Japanese-Korean-Ainu (possibly a distinct subgroup of Eurasiatic), Gilyak, Chukchi-Kamchatkan, and Eskimo-Aleut.

"The Eurasiatic-Amerind family represents a relatively recent expansion (circa 15,000 BP) into territory opened up by the melting of the Arctic ice cap."
 
Apologies if this is off-topic or uninteresting but it didn't seem worth its own thread.


Thailand has, according to a convenient if dated source, 31 separate languages: --

Ten languages in the Mon-Khmer subdivision of Austric: Tonga, Monic, the pair Khmer & Katuic (grouped together as Eastern Mon-Khmer), and finally Northern Mon-Khmer (4 Khmuic, 2 Western Palaungic).

Katuic, by the way, is a subsubfamily with 23 languages: 16 found in Laos, 5 in Vietnam, and 1 each in Cambodia and Thailand. (Those latter two clade together as Kuy-Suei.)

Eight languages in the Southwestern Tai sub-branch of Austric: Yong, Southern Tai, and clading together, Lao-Phutai, Nyaw and Chiang Saeng (4; includes Northern and Central Thai). The details are more complicated, since there are dialects, e.g. tone variations.

Six languages in Sino-Tibetan: Karen (3) and Lolo (3).

Four languages in the Malay-Moklen subdivision of Austric-->Polynesian-->Sundic.

Two deaf sign languages: Thai and Hill;

One language in the Miao subbranch of Austric: Hmong.
A villager in Surin -- the border province in the southern part of Isaan especially famous for its elephants -- may be quadrilingual: Khmer, Suei (in Katuic), Thai, and Lao-Phuthai (both in the East Central branch of Southwestern Tai).


By comparison, France is shown with just nine languages: Breton (Insular Celtic), two deaf sign languages (Lyons and Paris?), and six languages from the Romance sub-sub-family. The six Romance languages spoken in France are Corsican (Southern Romance), Gallo-Romance (3 languages), and Ibero-Romance (2 languages). (The latter two groups clade together as Western Romance.)
 
The_Dene_Sino_Caucasian_hypothesis_state.pdf
John Bengtson (Santa Fe Institute)
George Starostin (Russian State University for the Humanities; Russian Presidential Academy)
The Dene-Sino-Caucasian hypothesis: state of the art and perspectives

Proto-DSC is reconstructed as having an elaborate system of consonants, much like Proto-North-Caucasian, but most of the other branches have simplified it, for instance to a voiceless vs. voiced distinction or to an aspirated vs. nonaspirated distinction among voiceless consonants.

In turn, PNC's consonants are reconstructed from the North Caucasian langs, which have elaborate systems of consonants.

PDSC: labial (p, b, m, w), dental (t, d, n, r), alveolar (s, z, ts, dz), postalveolar (sh, zh, tsh, dzh), in between those two (s-sh, z-zh, ts-sh, dz-zh, y), lateral (l, tl, thl), velar (k, g, kh, gh, ng), uvular (like velar, but farther down), pharyngeal (like h but in the throat), laryngeal/glottal (h).

It also includes glottalized or ejective consonants, which have a short pause with a blocked airway before the voice starts. Aspirated consonants are similar, with a bit of breathing before the voice starts.


There is one well-known member of this macrofamily that has an aspirated-nonaspirated distinction: Mandarin Chinese. In the Wade–Giles transcription, this distinction is written as p' - p, t' - t, k' - k (combined: T' - T), while in Pinyin, it is p - b, t - d, k - g (combined: T - D). Wade-Giles also uses a lot of dashes which Pinyin omits. Thus: Peking -- Beijing and Mao Tse-Tung -- Mao Zedong.
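The stop correspondence between the two transcriptions is just a lookup on the syllable's initial. This sketch covers only the six initials named above. Other initials, tones, and the many finals that the two systems spell differently are not handled, so it is only meant for syllables whose finals happen to be written the same way in both.

```python
# Wade-Giles stop initials and their Pinyin equivalents (only the six from the post).
WG_TO_PINYIN_STOPS = {"p'": "p", "p": "b", "t'": "t", "t": "d", "k'": "k", "k": "g"}

def convert_initial(syllable: str) -> str:
    """Rewrite a Wade-Giles stop initial in Pinyin style, leaving the rest untouched."""
    if syllable.startswith(("ts", "tz")):
        return syllable  # affricates are outside this sketch
    # Try the aspirated (apostrophe) spellings first so "t'" is not read as plain "t".
    for wg in sorted(WG_TO_PINYIN_STOPS, key=len, reverse=True):
        if syllable.startswith(wg):
            return WG_TO_PINYIN_STOPS[wg] + syllable[len(wg):]
    return syllable

for wg in ("p'an", "pan", "t'a", "ta", "k'an", "kan"):
    print(wg, "->", convert_initial(wg))
```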

Looking at other Chinese dialects / languages, Cantonese also has this distinction, which I write for short as Th, T, while Hokkien has a three-way distinction, adding voiced: Th, T, D, much like present-day Thai. Middle Chinese has that distinction, as does Old Chinese. That distinction may go back to the Proto-Sino-Tibetan language.

Elsewhere, Na-Dene is (Haida, (Tlingit, (Eyak, Athabaskan))). The Proto-Athabaskan language has the distinction Th, T', T, where T' is glottalized. It also distinguishes velar and uvular consonants.

Basque and the Yeniseian language Ket only have T, D, while Burushaski has Th, T, D, along with velar-uvular.

In this transcription, PNC and PDSC have T, T', D.
 
That article continued with "A parallel issue concerns ‘stabilizing the borders’ of DSC." -- resolving the issue of some proposed members. Some of them have also been proposed for Eurasiatic and Nostratic, though not very often.

 Paleo-European languages mentions many of them.

 Hattic - Anatolia, 2nd mill BCE
Northwest Caucasian?

 Hurro-Urartian languages
Hurrian: N Mesopotamia, 2300 - 1000 BCE
Urartian: E Anatolia, 9th cy. - 585 BCE
Northeast Caucasian?

 Tyrsenian languages
Etruscan: NE, W Italy, 700 BCE - 50 CE
Lemnian: Lemnos, Greece, 550 BCE
Raetic: W Austria, nearby, 500 - 100 BCE
Camunic(?): N Italy, 1st mill BCE
Dene-Sino-Caucasian?? Eurasiatic?? Nostratic??

 Sumerian language - Mesopotamia, 2900 BCE - 1700 BCE - (classical language) - 100 CE
After dropping out of everyday use, Sumerian survived as a classical language, much like Latin or Sanskrit, for nearly 2 millennia.
Dene-Sino-Caucasian?? Nostratic??
 
«Evolution of Human Languages»: current state of affairs (03.2014) at Home | Santa Fe Institute

Lots of long rangers in it: Sergei Starostin, Anna Dybo, Oleg Mudrak, Alexander Militarev, Olga Stolbova, Vladimir Dybo, Václav Blažek, Merritt Ruhlen

Then noting Distant Language Relationship: The Current Perspective - 2009

The highest reliability is the likes of Germanic, Polynesian, Turkic, and next down is Indo-European, Turkic, Dravidian, Uto-Aztecan. With the second level, one can get good results with the help of first-level protolanguages, and some families on the second level have long-ago attested members, making them comparable to the first level. To get to the third, macrofamily level, one ought to use reconstructed protoforms from the first and second levels.

Some language families have gotten much more research than others.
Today, Indo-European comparative linguistics is still — for good reason — seen as the best standard to which one can hold comparative research done on the basis of other language families, ‘classic’ and ‘non-traditional’ alike. Yet there are also certain types of research for which it makes little sense to hold them up to such high standards.
That is because of limited data, a limited amount of research, and the like. But I think that well-established families like IE are good checks for whatever methods one might want to apply, like automated mass comparison.
 