
Language as a Clue to Prehistory

Uri Tadmor et al., in their Borrowability paper, refer to a 61-word list that is in this thesis:

Lohr, Marisa. 1998. Methods for the Genetic Classification of Languages. PhD dissertation, University of Cambridge.

But it's paywalled: Methods for the genetic classification of languages. -- the request page states "This item is under controlled access and is available subject to an access fee of £75"

They also state that Aharon Dolgopolsky has a list of 23 stable meanings, though I don't know how they counted those meanings, because his short list has 15 meanings and his long list 40 meanings.

AD also investigated Indo-European, Afroasiatic, Uralic, Altaic, Chukotko-Kamchatkan, Kartvelian, and Sumerian, and he concluded that all but Sumerian are recognizably related, thus supporting Nostratic.

One could extend this work with the 207-word Swadesh lists at Appendix:Swadesh lists - Wiktionary, the free dictionary -- though Wiktionary's contributors have some protoforms in it, their coverage has odd gaps like lack of Proto-Uralic.
 
In Hot Pursuit of Language in Prehistory - Bengtson, John D. - John Benjamins Publishing Company - Torrossa
contains
The Languages of Northern Eurasia: Inference to the Best Explanation
This paper discusses the development of hypotheses of classification of the languages of northern Eurasia, from the early “Scythian” hypothesis to the later Nostratic, Eurasiatic, Sino-Caucasian, and Dene-Caucasian proposals. The concept of scientific “proof” is discussed and contrasted with an alternative concept of “best explanation.” Eurasiatic and Dene-Caucasian can then be viewed as testable and fruitful hypotheses that, so far, provide the best explanations for language diversification in northern Eurasia.

"Just within the past decade there seems to be a growing consensus that there is a “core” Eurasiatic family consisting of Indo-European, Kartvelian, Uralic, Yukaghir, Altaic (including Korean and Japanese), Chukchi-Kamchatkan, and Eskimo-Aleut."
and
"Dravidian and Elamite may be further outliers, and many paleolinguists now agree that Afro-Asiatic can be considered another macrofamily roughly coordinate with Nostratic/Eurasiatic, rather than a part of it."

"... the present-day Dene-Caucasian hypothesis includes Basque, Caucasian, Burushaski, Sino-Tibetan, Yeniseian, and Na-Dene."

Then comparing Eurasiatic and Dene-Caucasian.

Phonology:

DC: "lateral affricates", tl, dl, tl' (glottal), in North Caucasian, Na-Dene, lost in the others
Eurasiatic: none

DC: velar-uvular (k-q) distinction: all but Basque, Sino-Tibetan
Eurasiatic: none in all but Kartvelian (areal in that one?)
 
Morphology:

Noun classes or genders:
DC: several, in North Caucasian, Burushaski, Yeniseian, fossilized in Basque, Sino-Tibetan, new system in Na-Dene
Eurasiatic: animate/inanimate, lost in Uralic, Altaic, Kartvelian, feminine only in Indo-European

1st and 2nd personal pronouns:
DC:
1s1: Cauc ni, Bur a-, Yen b-, 'ab-, ng, Bsq ni -- DC *('a)ngV
1s2: Cauc zô, Bur dza, Yen 'adz -- DC *('a)dzu
2s1: Cauc ghwV, Bur gu-, go-, Yen kV-, 'Vk-, Bsq hi -- DC *('u)Gwu
2s2: Cauc wô, Bur un, Yen 'aw, 'u -- DC *('a)wu
.
Sino-Tibetan: 1s *ngV ~ 1s1, 2s *na-(ng)-, (Tibetan, Burmese) *Kwa ~ 2s1
Na-Dene: 1s (Athabaskan) *sh- ~ 1s2?, 2s (Athabaskan) *ngän, (Haida) dang (like ST), (plural) (Athabaskan) *qhi- (Haida) *yi- ~ 2s1
.
Eurasiatic: 1s *m-, *b-, 2s *t-, *s-

Interrogative pronouns:

DC: (all) *s-, (all but Na-Dene) *n-
Eurasiatic: (animate/personal) *k-, (inanimate/impersonal) *m-, *y-
 
John Bengtson then got into "Fruitfulness" -- does a hypothesis enable further discoveries that are at least somewhat consistent with it?

He then mentioned some DC words that he discovered. Na-Dene and North Caucasian both have words with tl' that mean "left" or "bottom".

Also with a lateral affricate (tl or dl) some words for putting on and taking off footwear, with some words for footwear and barefoot.

Then "Inference to the Best Explanation". Is JB conceding that it's hard to get anything as strong as for (say) Indo-European?

He described the development of macrofamily hypotheses for northern Eurasia over the decades, ending up with Eurasiatic and Dene-Caucasian, with Afroasiatic being somewhat close to Eurasiatic. This is a natural result of looking in more and more detail, building on the work of one's predecessors.

He then gets into shared irregularities, like suppletive paradigms. English good, better, best is shared across Germanic, and English be, am, is is shared across Indo-European: *bheuH-, *h1es-.

Indo-European has *egHom "I", *me "me" and Chukchi-Kamchatkan has South Kamchadal kim "I", ma "me", but those could be separate developments. Oleg Mudrak reconstructs Proto-CK as having 1s *ɣə-m, 2s *ɣə-š, oblique *ɣə-n (Kirill Babaev's big collection of pronouns: Once Again on the Comparison of Personal Pronouns in Proto-Languages).

BTW, looking at the Amerind pronouns that KB lists, n-m stands out: Hokan, Penutian, Aztec-Tanoan, Chibchan, Quechuan, with the last of these being 1s *nu-qa, 2s *qa-m -- qa is in different places.
 
"For some perspective, it must be admitted that the evidence for Eurasiatic and Dene-Caucasian is comparable to that assembled so far for African (macro-)families (or phyla) such as Niger-Congo."

So John Bengtson proposes in north Eurasia two macrofamilies, Eurasiatic/Nostratic and Dene-Caucasian, to the southwest, Afroasiatic, and to the southeast, Austroasiatic. Further to the southeast, Indo-Pacific -- Kusunda: An Indo-Pacific language in Nepal - PMC

Another thing that needs to be done is to define more precisely the membership of each macrofamily. It is embarrassing that each paleolinguist seems to have a different list of members of Eurasiatic and/or Dene-Caucasian, a fact that prevents some linguists from taking either hypothesis seriously.
What a difficulty.

Also,
The most probable taxonomic status of, for example, Etruscan, Sumerian, Gilyak (Nivkh), Kusunda, and Ainu should be more definitively determined.
 
Niger-Congo? I looked in Wikipedia and Google Scholar for more:

Wikipedia:
Niger–Congo
Atlantic–Congo
Volta–Congo
Benue–Congo
Bantoid
Southern Bantoid
Bantu

Bantu speakers live in most of central Africa and much of southern Africa. Speakers of the non-Bantu Bantoid languages live in W Cameroon and E Nigeria, and speakers of the non-Bantoid Benue-Congo languages in E Nigeria.

Searching for "Benue-Congo" gives this:
Niger-congo, with a Special Focus on Benue-congo | The Oxford Handbook of African Languages | Oxford Academic (PDF) by Jeff Good
Niger-Congo is the largest referential language group in Africa. The extent to which it represents a true genealogical grouping is not established, though there is a large core set of members of the family that all specialists currently accept as related.
JG agrees with Wikipedia on Bantu and Bantoid, but the two disagree further out. JG's Benue-Congo is roughly Wikipedia's Benue-Congo and Volta-Niger. His map of well-established Niger-Congo is roughly Wikipedia's Atlantic-Congo with some differences here and there, like his omitting Ubangian and some differences in West Africa.

Volta-Niger has subgrouping YEAI with Igbo and Yoruba, two widely-spoken languages in Nigeria.

Searching for "Volta-Congo" gives this:
Recurrent sound correspondences of Akan and Yoruba and their significance for Proto-Benue-Kwa (East Volta-Congo) C1 reconstruction by Obadele Kambon
OK agrees with JG about Benue-Congo, with his East BC being Wikipedia's BC and his West BC being Wikipedia's Volta-Niger.

With the Kwa languages, incl. Akan, just to the west of Volta-Niger, he has what might be called East Volta-Congo (Proto-Benue-Kwa, Proto-Potou-Akanic-Bantu) and West Volta-Congo (Volta-Kru).

Google Scholar doesn't have much on Atlantic-Congo, but it does have a lot on Niger-Congo.
 
JG and OK have a somewhat broader definition of Benue-Congo than Wikipedia:
  • Their East BC - Wikipedia BC
  • Their West BC - Wikipedia Volta-Niger

 Proto-Niger–Congo language - its discussion of noun classes mainly draws from "Niger-congo, with a Special Focus on Benue-congo", by JG, mentioned earlier, and Niger-Congo Verb Extensions: Overview and Discussion

For noun classes, JG lists 1 (human sg), 3, 4 (tree sg, pl), 5, 6 (assortment sg, pl), 6a (liquids) -- the numbering is from Bantu-class numbering.

For Atlantic (far west), Gur (NW of BC), Kwa (W of BC), BC itself, Bantu, Kordofanian (off at a distance in the NE):

Class 6 has a rather motley collection of semantics: Kord "egg", Atlantic "head, name", Gur "egg, head", Kwa, BC "egg, head, name", Bantu "egg, name"

Class 6a is likely numbered that because it was merged with Class 6 plural in Bantu, but separate elsewhere in NC.

Noun-class markers are prefixes in Bantu and elsewhere sometimes suffixes and sometimes both prefixes and suffixes. They sometimes drop out in langs recognizably related to langs with them.
 
Niger-Congo Verb Extensions: Overview and Discussion - paper1603.pdf

These are suffixes with a variety of functions. The paper refers to Voeltz.1977.pdf for reconstructions. I checked in it, and it referred to Proto-Bantu and Proto-Atlantic reconstructions, but only individual langs elsewhere. I found those langs scattered all over Atlantic-Congo, and also in some doubtful Niger-Congo branches like Ubangian and Kordofanian.

The forms in that paper sometimes vary quite a lot in phonetics and semantics, but some forms seem to be reconstructible.

  • Causative: like English -ify
  • Benefactive: like "doing something for someone" with the "for" attached to the verb
  • Passive voice
  • Reciprocal: "each other"
  • Reversive: the opposite of the bare verb's action, like un- or de- in some English words: unscrew, decouple
  • Stative: stationary state, like "is <verb>able"

Swahili verb extensions: Appendix:Swahili verbal derivation - Wiktionary, the free dictionary

There are some possible members of Niger-Congo that are considered doubtful because of lack of noun classes -- Mande, Dogon, Kru, Ijoid
 
I looked for whether anyone has done lexicostatistical work on any proposed macrofamilies other than Transeurasian (Macro-Altaic, Broad Altaic).

I'd earlier found "On the Homelands of Indo-European and Eurasiatic: Geographic Aspects of a Lexicostatistical Classification" by Alexander Kozintsev -- found three main clusters: Eurasiatic, Afroasiatic, and North Caucasian. In Eurasiatic, Kartvelian is close to the others and Dravidian more distant.

Etnograficheskoe obozrenie :: №4 :: Asia or Africa? Locating the Afrasian Homeland

The six branches of Afrasian, as he calls it, diverged very rapidly, something that suggests an African homeland. As to connection with Natufian early farmers in the Levant, I propose a two-step process. Some Natufians move to NE Africa, and they then split up there.

He used nonmetric multidimensional scaling, and I finally found an article on it: Metric vs Nonmetric MDS: A Comparison

Ordinary MDS starts with a distance matrix and finds low-dimensional vectors with distances that best fit.

Nonmetric MDS only uses the rank order of distances, the rank of each sorted distance.
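
A minimal sketch of that difference, in Python with made-up distances (the language labels and numbers are illustrative assumptions, not data from Kozintsev's paper). Nonmetric MDS sees only the rank order of the pairwise distances, so any order-preserving rescaling of the distances gives it exactly the same input:

```python
import math

# Hypothetical pairwise distances between four languages (illustrative only).
distances = {
    ("A", "B"): 0.20,
    ("A", "C"): 0.55,
    ("A", "D"): 0.90,
    ("B", "C"): 0.50,
    ("B", "D"): 0.85,
    ("C", "D"): 0.40,
}

def rank_order(dist):
    """Return the language pairs sorted from closest to farthest."""
    return sorted(dist, key=lambda pair: dist[pair])

# A monotone (order-preserving) transformation of the distances.
rescaled = {pair: math.sqrt(d) for pair, d in distances.items()}

original_order = rank_order(distances)
rescaled_order = rank_order(rescaled)
print(original_order)
print(original_order == rescaled_order)
```

Metric MDS would fit different coordinates to the two sets of numbers; a nonmetric fit would not distinguish them, since the ordering of the pairs is the same.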
 
NorthEuraLex - Lexicostatistical Database of Northern Eurasia
The current release version 0.9 covers a list of 1,016 concepts across 107 languages of Northern Eurasia, with a focus on Uralic and Indo-European, but also including all the language families conveniently summarized as Altaic/Transeurasian and Paleosiberian, a selection of Caucasian languages, some major contact languages from adjacent families, as well as the most well-known isolates of Northern Eurasia.
All the words had International Phonetic Alphabet transcriptions, generated automatically in most cases from standard spellings or transcriptions.

Described in
NorthEuraLex: a wide-coverage lexical database of Northern Eurasia | Language Resources and Evaluation
Describing how it was built.

Part of the EVOLAEMP project:
EVOLAEMP | University of Tübingen
Language Evolution: The Empirical Turn | EVOLAEMP | Project | Fact sheet | FP7 | CORDIS | European Commission
 
Combining Information-Weighted Sequence Alignment and Sound Correspondence Models for Improved Cognate Detection - ACL Anthology (PDF)

Hard to tell how it works. Does it use some sort of transition matrix of probabilities between kinds of sounds? Something like what is done for amino acids and nucleotides in protein and gene sequences.

Or does it try to guess sound correspondences and then find which ones do best?
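
I can't say which of the two the paper does. Just to make the first idea concrete, here is a sketch of a dynamic-programming alignment (the same family of technique used for protein and gene sequences) in which substitutions within a sound class cost less than substitutions across classes. The sound classes, the costs, and the test words are all assumptions for illustration, not values from the paper:

```python
# Hypothetical sound classes; the grouping and the costs are assumptions
# for illustration, not values from the paper.
SOUND_CLASS = {
    "p": "labial", "b": "labial", "f": "labial", "m": "labial",
    "t": "dental", "d": "dental", "s": "dental", "n": "dental",
    "k": "velar", "g": "velar", "h": "velar",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}
GAP_COST = 1.0

def substitution_cost(x, y):
    """Cost of matching segment x against segment y."""
    if x == y:
        return 0.0
    cx, cy = SOUND_CLASS.get(x), SOUND_CLASS.get(y)
    if cx is not None and cx == cy:
        return 0.3  # same class: treated as a plausible sound change
    return 1.0

def weighted_distance(a, b):
    """Minimum total alignment cost between two segment strings."""
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * GAP_COST
    for j in range(1, cols):
        cost[0][j] = j * GAP_COST
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i - 1][j] + GAP_COST,   # gap in b
                cost[i][j - 1] + GAP_COST,   # gap in a
                cost[i - 1][j - 1] + substitution_cost(a[i - 1], b[j - 1]),
            )
    return cost[-1][-1]

print(weighted_distance("pater", "fadar"))
print(weighted_distance("pater", "kunig"))
```

With these made-up costs, the pair that differs only by within-class substitutions scores as much closer than the unrelated-looking pair. The second idea (guessing sound correspondences and keeping the ones that do best) would amount to learning the substitution costs from the data instead of fixing them in advance.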


Back to Alexander Kozintsev on Afrasian / Afroasiatic -- from his diagrams, the branching between the six subfamilies of AA is relatively short compared to the lengths in the branches themselves, meaning relatively quick branching. That would explain why AA has had numerous relationship hypotheses over the decades.

From "Language Dispersal Beyond Farming" ed. by Martine Robbeets and Alexander Savelyev,

Proposes (Omotic, (Cushitic, (Semitic, Egyptian, Berber, Chadic)))

Alexander Kozintsev agrees, and proposes (Chadic, (Semitic, (Egyptian, Berber)))

Relationship hypotheses have varied like crazy, with the only clear consensus being (Omotic, all the others).
 
It's hard for me to find macrofamily lexicostatistics, despite its obvious utility in testing macrofamily hypotheses.

Some 70 years ago, Joseph Greenberg classified the languages of Africa into four macrofamilies, Afroasiatic, Nilo-Saharan, Niger-Congo, and Khoisan. I have earlier discussed AA and NC, and I'll now discuss NS.

 Nilo-Saharan languages
In his book The Languages of Africa (1963), Joseph Greenberg named the group and argued it was a genetic family. It contains the languages which are not included in the Niger–Congo, Afroasiatic or Khoisan groups. Although some linguists have referred to the phylum as "Greenberg's wastebasket", into which he placed all the otherwise unaffiliated non-click languages of Africa, specialists in the field have accepted its reality since Greenberg's classification. Its supporters accept that it is a challenging proposal to demonstrate but contend that it looks more promising the more work is done.

The Nilo-Saharan hypothesis tested through lexicostatistics: current state of affairs. by George Starostin
This somewhat preliminary report follows the same lines as my previous report on East Sudanic (2015). It summarizes all of my lexicostatistical work (mixed with elements of etymological analysis) on the various potential members of the «Nilo-Saharan» phylum, whose goals are to clarify their internal relationships and assess whether a «Nilo-Saharan», in any form, is detectable on the level of comparison between the most stable segments of the basic lexicon (approximately the same way that one could detect Indo-European by comparing the basic lexicon of even the most radically divergent present day Indo-European languages, or the same way that even such controversial taxa like «Indo-Uralic» and «Altaic» also receive some lexicostatistical support).

Like Greenberg, I follow a «step-by-step» methodology in trying to progressively assemble larger taxonomic blocks from smaller ones. The crucial difference, which becomes more and more important as one goes deeper in time, is that the methodology tries to reconstruct the optimal equivalent for the required Swadesh meaning on each taxonomic level and then proceed to compare it further, instead of allowing to compare any form from any modern language with a wide range of meanings semantically connected to the Swadesh meaning. This is a serious safeguard against «garbage parallels», caused by sheer accidence or by linguistic contacts between parts of the family (e. g. West Nilotic languages of the East Sudanic family with Moru-Maɗi languages of the Central Sudanic family)
He'd earlier done East Sudanic: Lexicostatistical Studies in East Sudanic I: On the genetic unity of Nubian-Nara-Tama and Lexicostatistical Studies in East Sudanic II: The Case of Nyimang

He also confirms Central Sudanic and Saharan, and he confirms a proposed family, which he calls Koman-Gumuz. Its namesake members are separate in Wikipedia's NS article as Koman and B'aga (Gumuz). But Wikipedia's article on the latter mentions Komuz as a name for this combined family.

NS has several isolates, Kuliak, Berta, Kunama, Songhay, Maba, Fur, Kadu, Shabo, and he considered which of these families they are closest to.

He found ((East Sudanic, Fur), Berta) and (Central Sudanic, (Maba, Kunama)) and a possible distant relation between these two groups. He reconstructs pronouns 1s *a-, 2s *i-, with CS having 1s *ma, 2s *i ~ *mi ~ *nyi in its subfamilies.

So he ends up with (Macro-ES, Macro-CS), Saharan, Koman-Gumuz, Kuliak, Songhay, Shabo -- six branches with very little evident connection.
 
I've seen speculations about Songhay-Saharan and Koman-Gumuz-Shabo, reducing the number of independent branches to four.

Looking at Niger-Congo, its deepest well-established branchings are in West Africa, and its doubtful branchings are in West and North Central Africa: Mande, Dogon, Ijoid, Ubangian, Kordofanian.

Nilo-Saharan's speakers are just to the north and northeast of Niger-Congo's ones, mainly in the northeast, and Afroasiatic speakers even further north and northeast.

So there is a strip just south of the Sahara Desert where people didn't move around very much over much of the Holocene Epoch, if not longer.

Rather curiously, this strip does not have very mountainous topography, except at its east end, at the Ethiopian Highlands. Such topography has enabled people there to continue speaking lots of very different languages in several parts of the world, like New Guinea, parts of Southeast Asia, the west-coast mountains of North America, and the Caucasus Mountains. The Pyrenees Mountains likewise have helped Basque speakers hold out.

 Climate of Africa shows the present climate, with a thick strip of savanna from the southern third of the West African lobe to the Ethiopian highlands. Curiously, Bantu speakers expanded to the south of there into parts with similar climates but not into that Sahara-bordering strip.  Savanna -- grassland with scattered trees.

In the early Holocene, however, Africa was wetter -- African humid period and Studying early human culture in Africa | Turkana Basin Institute and esd.ornl.gov/projects/qen/nercAFRICA.html and Vegetation cover in northern Africa. (A) present day; (B) early Holocene. and Pleistocene sea-level fluctuations and human evolution on the southern coastal plain of South Africa | Semantic Scholar -- what's now the Sahara Desert was grassland back then, and lakes were both larger and more numerous than they are today.

So it was easier to expand from some homeland back then.

But in the last Ice Age, it was more difficult, because Africa was drier, with the Sahara Desert extending further south than today, with only a thin strip of grassland and savanna and forest on the south coast of the western lobe:  Last Glacial Maximum
 
I looked for other lexicostatistics work, and I found Hokan_Family_and_Lexicostatistics.pdf

Looks at several languages in that family and finds all but two to be a well-defined genetic grouping. Most of those in the sample are Northern Californian, but mixed in among them are some Southern Californian, Western Arizonan, and Northwestern Mexican branches. Also gives an age estimate of Early Holocene, from language-change rates.
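
I don't know exactly which rates that paper used. For reference, the classic Swadesh-Lees glottochronology formula is t = ln(c) / (2 ln(r)), where c is the fraction of shared cognates and r is the retention rate per millennium. A sketch, assuming the conventional r of about 0.86 for the 100-word list and an input value made up for illustration:

```python
import math

def divergence_time_millennia(shared_fraction, retention_rate=0.86):
    """Classic glottochronology estimate: t = ln(c) / (2 ln(r)).

    shared_fraction: fraction of cognates shared by two languages (0 < c <= 1)
    retention_rate: fraction of a word list retained per millennium; 0.86 is
        the conventional value for the 100-word list (an assumption here,
        not necessarily the rate used in the Hokan paper).
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    return math.log(shared_fraction) / (2 * math.log(retention_rate))

# Hypothetical input: two languages sharing 5% of the list.
print(round(divergence_time_millennia(0.05), 1))
```

The factor of 2 is there because both languages have been changing independently since the split. With these assumptions, a few percent of shared cognates comes out at roughly ten millennia, which is the kind of depth where chance resemblances start to matter as much as inherited ones.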

I looked in Penutian and some other proposed macrofamilies of the Americas, but I couldn't find any of that sort of work.
 
I've covered most of humanity's (natural) languages, and I now turn to the Khoisan languages, united only by having lots of clicks in their phonology. They consist of three families, Khoe-Kwadi, Tuu, and Kx'a (Ju-Hoan) in southwestern Africa and two isolates, Hadza and Sandawe, in East Africa. Sandawe is possibly related to Khoe-Kwadi, and George Starostin proposes

( (Khoe-Kwadi, Sandawe), (Tuu, Kx'a) )

with Hadza remaining an isolate.

From
A Lexicostatistical Approach towards Reconstructing Proto-Khoisan by George Starostin (no family trees in it, however, but referred to in that Wikipedia article)
It could thus be argued, in terms of historical typology, that the situation with Khoisan historical phonetics might well be similar to that of, for instance, the Proto-West-Caucasian system. In the latter case, while the actual modern day phonological systems of West Caucasian languages such as Abkhaz and Adygh, already quite rich and complex by themselves, are quite close to each other, the correspondences between them are of an extremely complex nature and betray a proto-system even richer and more complex in oppositions than any of its daughter languages.
How would the original systems come about? From variations of sounds becoming interpreted as separate phonemes? But such reinterpretation may instead have happened in various descendants.
 
I've mentioned that nouns and verbs can be inflected quite a lot.

Nouns:
  • Number -- singular, dual, plural
  • Case -- much like adpositions (prepositions and postpositions)
  • Definiteness -- indefinite: "a/an", definite: "the", partitive: some of, ...

Verbs:
  • Tense -- relative time: past, present, future, ...
  • Aspect -- state of completion: imperfective: incomplete action, perfective: complete action, stative: being in some state, ...
  • Mode (mood) -- indicative: a plain statement, imperative: a command, subjunctive: something hypothetical, optative: something desired, conditional: something dependent on something else happening, potential: being possible, presumptive: being presumed to have happened, ...
  • Voice -- active, passive: object is in subject position, reflexive: on oneself, reciprocal: on each other, causative: make do something, applicative: preposition attached to the verb, ...
  • Evidentiality -- direct experience, was inferred, was told about something, ...

Adjectives can be either noun-like or verb-like, complete with the appropriate inflections, and adjectives have their own inflections, for comparison.

Adverbs usually have no inflection, and many adverbs are derived from adjectives in various ways. Affixes like English -ly are common, though some languages use certain adjective inflections, like Slavic langs, and some use no inflections at all.
 
English has three strategies for doing comparison: phrases for the longer words, suffixes for the shorter words, and, in a few cases, suppletion:

A, more A, the most A
A, A-er, the A-est
good, better, the best
bad, worse, the worst

The Romance languages have forms that are literally
A, more A, the more A
good, better, the better
bad, worse, the worse

Spanish, for instance:
A, más A, el más A
bueno, mejor, el mejor
malo, peor, el peor

Latin has
A, A-ior, A-issimus
bonus, melior, optimus
malus, peior, pessimus

So "optimus" and "pessimus" dropped out of the Romance languages, though they were reintroduced in words for optimism and pessimism.

Slavic languages usually have some suffix -i or -iyi for the comparative, and they form the superlative from the comparative, in most of them with the prefix nai- (naj-), and in Russian and Belarusian with a separate word, samy.

Ancient Greek has comparative/superlative suffixes -teros/-tatos and sometimes -iôn/-istos

It's hard to reconstruct the Proto-Indo-European comparative and superlative forms, but most of the inflected descendant forms are derived from *-yôs "very, rather", often with other suffixes. Modern Greek keeps that comparative, and uses o "the" with the comparative for the superlative, like the Western Romance languages.

As to the comparison reference, English uses "than", which is related to "then", and derived from Proto-Germanic *than "at that, at that time, then", of obscure origin. The Western Romance ones are derived from Latin quod, "what, that (relative pronoun)", and Latin uses quam, with a similar origin. Slavic ones use the preposition ot "from". Classical Greek uses the genitive case (the of-case), and Modern Greek uses apo "from".
 
That's just Indo-European langs. Here are some examples from elsewhere:

Finnish has comparative -mpi, superlative -in

Hungarian has comparative -bb, superlative leg- -abb
Comparison reference: mint <ref>, <ref>-nal

Turkish Grammar - Comaratives and Superlatives
Turkish uses <ref>-ABL daha <adj> for the comparative, en <adj> for the superlative:

Kazakistan Türkiye'den daha büyük. - Kazakhstan is bigger than Turkey
Bu, hayatımın en mutlu günü! - This is the happiest day of my life!

Thai Language Lessons - Comparitive and Superlative
The word "gwah" makes comparatives:
lek gwah - smaller
noo lek gwah mee-ow - a mouse is smaller than a cat
No inflection or "than" word for the reference.

The word "tee soot" makes superlatives:
yai tee soot - biggest
pooget pben gaw yai tee soot nai pra-tayt thai - Phuket is the biggest island in Thailand


 Adverb - formed in a variety of ways from their corresponding adjectives. Sometimes with a suffix (English -ly, Romance -ment, -mente, ...), or with an inflection (Swedish neuter, Russian short neuter, ...), and sometimes with no change.

 Flat adverb -- in English, adverbs are usually formed with -ly; flat ones are unchanged ones. Flat ones are much less common nowadays than they were in past centuries.

Adverbs also have comparative and superlative forms.
 
Noam Agmon has written about the shift in some Semitic-language ancestor from 2-consonant roots to 3-consonant ones, and I'd mentioned that earlier in that thread.

Materials and Language: Pre-Semitic Root Structure Change Concomitant with Transition to Agriculture in: Brill's Journal of Afroasiatic Languages and Linguistics Volume 2 Issue 1 (2010) - Materials_and_Language_Pre_Semitic_Root.pdf

Statistics of Language Morphology Change: From Biconsonantal Hunters to Triconsonantal Farmers | PLOS ONE by Noam Agmon, Yigal Bloch
Firstly the reconstructed Proto-Semitic fire and hunting lexicons are predominantly 2c, whereas the farming lexicon is almost exclusively 3c in structure. Secondly, while Biblical verbs show the usual Zipf exponent of about 1, their 2c subset exhibits a larger exponent. After the 2c > 3c transition, this could arise from a faster decay in the frequency of use of the less common 2c verbs. Using an established frequency-dependent word replacement rate, we calculate that the observed increase in the Zipf exponent has occurred over the 7,500 years predating Biblical Hebrew namely, starting with the transition to agriculture.
This assumes the Western-Asia homeland hypothesis for the Afro-Asiatic langs, with Semitic speakers descended from stay-at-homes, but I think it more likely that AA originated in NE Africa, and that Proto-Semitic speakers went from there to the Levant, where they then spread out.

But this argument also works for the NE-Africa AA-homeland hypothesis.
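
On what a "Zipf exponent" is operationally: sort the words by frequency, and the frequency falls off roughly as rank^(-s), with s the exponent. A crude sketch of estimating s by a least-squares line on log-log axes, using synthetic counts (this is only an illustration, not the authors' fitting procedure, and the numbers are not from their data):

```python
import math

def zipf_exponent(counts):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope

# Synthetic counts following frequency = 1000 / rank**s exactly.
flat = [1000 / rank ** 1.0 for rank in range(1, 51)]
steep = [1000 / rank ** 1.5 for rank in range(1, 51)]
print(round(zipf_exponent(flat), 3), round(zipf_exponent(steep), 3))
```

A larger exponent means the rarer items are rarer still relative to the common ones, which is the pattern the paper reports for the 2c subset of verbs.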
 