
Language as a Clue to Prehistory

lpetrich

Contributor
Going further, we find Nostratic. That is roughly (Afro-Asiatic, (Kartvelian, Dravidian, Eurasiatic))

From  Proto-Dravidian language I find an estimated age of about 5,000 years. Its homeland was likely central to south India.

I can't find a date for  Proto-Kartvelian language, but the speakers of the present-day Kartvelian languages live in the southern Caucasus mountains, suggesting that homeland for their ancestral language's speakers.

Turning to the Afro-Asiatic family, its members are Berber (North Africa), Chadic (North-Central Africa), Cushitic (East Africa, Horn of Africa), Egyptian (Nile River), Omotic (East Africa), and Semitic. The best-known subfamily is that last one, of mostly Middle-Eastern languages.  Proto-Semitic language mentions an estimated divergence date of 3,750 BCE, nearly 6,000 years ago. But homeland proposals jump around like crazy: the Levant, the Arabian Peninsula, the Sahara Desert, and the Horn of Africa.

I've also found Appendix:proto-Semitic stems - Wiktionary

 Proto-Berber language - a recent dispersion around the time of the Roman Empire, and splitting off from other Afro-Asiatic speakers around 10,000 - 9,000 years ago, likely due to the Neolithic dispersion of farmers and herders.

 Proto-Afroasiatic language - some 16,000 to 12,000 years ago. The lower value is about when agriculture was invented in the Middle East.

 Afroasiatic Urheimat - Levant, Red Sea / Horn of Africa, North Africa, Sahara / Sahel

The hypothesis of spread with agriculture (farming) / pastoralism (herding) points to the Levant.
 

lpetrich

Contributor
There is another proposed Eurasian big family roughly comparable to Nostratic: the  Dené–Caucasian languages

Its members are Na-Dene (North America), Yeniseian (near that Siberian river), Burushaski (N Pakistan), North Caucasian, and Vasconic (Basque and Aquitanian in SW Europe).

Though Dene-Caucasian is usually considered doubtful at best,  Dené–Yeniseian languages seems well-supported.

Dene-Caucasian likely split up about 10,000 years ago, with Na-Dene splitting off first. A lot of these early split dates are very hand-wavy, I must concede.

A possible subdivision is Macro-Caucasian, containing Basque, North Caucasian, and Burushaski.

 Proto-Eskimo - The Eskimo and Aleut branches split around 4,000 years ago, with the Eskimo speakers spreading across the North American Arctic to Alaska, N Canada, and Greenland.

Before the ancestral Eskimo-Aleut and Na-Dene speakers arrived in the New World, the ancestral Amerind speakers did so. Amerind is Joseph Greenberg's name for his classification of nearly all the New-World languages. However, it has been strongly criticized, and it is widely rejected.

A common Amerind feature is first-person n- and second-person m-. By comparison, a Eurasiatic feature is first-person m- and second-person t-.
 

lpetrich

Contributor
Turning to  Linguistic homeland or Urheimat (German: "original home") I find a collection of homeland hypotheses.

Like saying that the Turkic homeland is not known with any confidence, and that it may not be in Mongolia but somewhere else between the Caspian Sea and

The Algonquian languages were spread over much of North America, including a thick swath across southern Canada east of the Rocky Mountains, the Atlantic coast to North Carolina, Michigan - Illinois - Tennessee, and Nebraska - Colorado. The  Proto-Algonquian language was spoken some 2,500 - 3,000 years ago, though we are not sure where.

With the languages Wiyot and Yurok of the North Coast of California, they form Algic, and  Proto-Algic was likely spoken some 7,000 years ago, likely in the Columbia Plateau or some other northwestern location.

The Uto-Aztecan languages were spoken in the southwestern United States and western and southern Mexico. From  Proto-Uto-Aztecan language - "Authorities on the history of the language group have usually placed the Proto-Uto-Aztecan homeland in the border region between the United States and Mexico, namely the upland regions of Arizona and New Mexico and the adjacent areas of the Mexican states of Sonora and Chihuahua, roughly corresponding to the Sonoran Desert and the western part of the Chihuahuan Desert. It would have been spoken by Mesolithic foragers in Aridoamerica, about 5,000 years ago."
 

lpetrich

Contributor
I round out Eurasia with Southeast Asia.

I start with the Tai languages, with national languages Thai and Lao of Thailand and Laos. The  Proto-Tai language has been reconstructed. This family is in the  Kra–Dai languages, likely originating in southern China from several early branchers being there.

Its closest relative may be Austronesian:  Austro-Tai languages

Also in Southeast Asia are the Austroasiatic languages, including Vietnamese and Khmer (Cambodian).  Proto-Austroasiatic language likely dates back to 2,000 BCE, 4,000 years ago, and was spoken either in the Mekong basin or between it and the Yangtze River.

Rounding it out are the Hmong-Mien or Miao-Yao languages, scattered over Southeast Asia and Southern China.  Proto-Hmong–Mien language has some very different dates for it: 2,500 and 4,300 years ago.

Most of these Southeast Asian languages have largely analytic grammar, with verb tense and aspect expressed with adverbs.


How might they be related? A prominent hypothesis is  Austric languages


Some recent research has been done with the  Automated Similarity Judgment Program - The ASJP Database -
It uses a 40-word Swadesh-based word list and a standardized spelling. But it has not been widely accepted.
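
To make the word-list idea concrete, here is a minimal sketch of how two short lists in a standardized spelling could be compared. It assumes similarity is scored with a length-normalized edit distance, which is a common choice for this kind of automated comparison; the function names and the three-word lists are hypothetical illustrations, not real ASJP data or the program's actual code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_list_distance(list_a: dict, list_b: dict) -> float:
    """Average normalized distance over the meanings both lists share."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)


# Hypothetical mini word lists in a simplified spelling (invented for illustration).
LANG_A = {"water": "wasa", "fire": "fir", "stone": "ston"}
LANG_B = {"water": "vata", "fire": "fir", "stone": "sten"}

if __name__ == "__main__":
    print(mean_list_distance(LANG_A, LANG_B))
```

A lower average means more similar lists. A score like this cannot tell inherited words from borrowings or chance resemblances, which is one reason such results get treated with caution.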
 

lpetrich

Contributor
Can we go even further? Some linguists have done so. Proposed by Harold C. Fleming and Sergei Starostin,  Borean languages includes all but sub-Saharan Africa, Australia, New Guinea, and the Andaman Islands.

Seems to me that Borean vs. non-Borean corresponds to non-Negroid (light skin, straight or loosely-curled hair) vs. Negroid (dark skin, tightly-curled hair). So Borean was likely spread by some offshoot human population colonizing northern Eurasia in the latest part of the last glacial era.

SS dated Borean at 16,000 years ago, in the Upper Paleolithic, roughly around when the  Last Glacial Maximum ended. SS's family tree of languages and well-established language families:

  • Altaic: Japonic, Koreanic, Turkic, Tungusic, Mongolic, Nivkh, Ainu
  • Indo-Uralic: Indo-European, Altaic, Uralic, Yukaghir
  • Paleosiberian: Eskimo–Aleut, Chukotko-Kamchatkan
  • Nostratic: Indo-Uralic, Paleosiberian, Sumerian, Elamite, Kartvelian, Dravidian, Afroasiatic
  • Sino-Caucasian: Sino-Tibetan, Burushaski, North Caucasian, Hattic, Hurro-Urartian
  • Dene-Caucasian: Yeniseian, Na-Dene, Iberian, Basque, Sino-Caucasian
  • Austro-Tai: Austronesian, Tai–Kadai
  • Austric: Austro-Tai, Hmong–Mien, Austroasiatic
  • Dene–Daic: Dene-Caucasian, Austric
Eurasiatic = Indo-Uralic + Paleosiberian
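
Since several of these groupings nest inside one another, it can help to expand them mechanically. The following is a minimal sketch that stores the groupings exactly as listed above and expands any grouping down to the families that are not themselves subdivided in the list; the names `GROUPS` and `leaf_families` are my own, and the nesting is only as reliable as the proposal itself.

```python
# Each proposed grouping maps to its immediate members, as listed in the post.
GROUPS = {
    "Altaic": ["Japonic", "Koreanic", "Turkic", "Tungusic", "Mongolic",
               "Nivkh", "Ainu"],
    "Indo-Uralic": ["Indo-European", "Altaic", "Uralic", "Yukaghir"],
    "Paleosiberian": ["Eskimo-Aleut", "Chukotko-Kamchatkan"],
    "Eurasiatic": ["Indo-Uralic", "Paleosiberian"],
    "Nostratic": ["Indo-Uralic", "Paleosiberian", "Sumerian", "Elamite",
                  "Kartvelian", "Dravidian", "Afroasiatic"],
    "Sino-Caucasian": ["Sino-Tibetan", "Burushaski", "North Caucasian",
                       "Hattic", "Hurro-Urartian"],
    "Dene-Caucasian": ["Yeniseian", "Na-Dene", "Iberian", "Basque",
                       "Sino-Caucasian"],
    "Austro-Tai": ["Austronesian", "Tai-Kadai"],
    "Austric": ["Austro-Tai", "Hmong-Mien", "Austroasiatic"],
    "Dene-Daic": ["Dene-Caucasian", "Austric"],
}


def leaf_families(name: str) -> list:
    """Expand a grouping into the families that the list does not subdivide."""
    members = GROUPS.get(name)
    if members is None:          # not a grouping in the list: treat as a leaf
        return [name]
    leaves = []
    for member in members:
        leaves.extend(leaf_families(member))
    return leaves


if __name__ == "__main__":
    for group in ("Eurasiatic", "Nostratic", "Dene-Daic"):
        print(group, len(leaf_families(group)), leaf_families(group))
```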

Some of SS's colleagues believe that the comparative method has given strong evidence for Eurasiatic and Dene-Caucasian, but not as much for Afroasiatic and Austric. They and SS are not sure about Amerind, though HF includes Amerind in Borean. I've seen "SCAN" for this version of Borean: Sino-Caucasian-Amerind-Nostratic.
 

lpetrich

Contributor
Distant Language Relationship: The Current Perspective by Murray Gell-Mann, Ilya Peiros, George Starostin

Estimates these split times:
  • Sino-Caucasian: 10 KYA, North Caucasian: 6 KYA, Sino-Tibetan: 6 KYA, Yeniseian: 2 KYA, Basque: modern, Burushaski: modern
  • Eurasiatic: 12 KYA, Altaic: 8 KYA, Eskimo: 2 KYA, Dravidian: 5 KYA, Indo-European: 7 KYA, Uralic: 6 KYA, Kartvelian: 4–3 KYA
  • Afroasiatic: 12 KYA, Omotic: 7 KYA, Cushitic: 9 KYA, Chadic: 7 KYA, Semitic: 7 KYA, Berber: 3 KYA
  • Austric: 10 KYA, Austronesian: 5 KYA, Tai-Kadai: 5 KYA, Austroasiatic: 7 KYA, Miao-Yao: 4 KYA
These dates are extrapolations with statistics, and for Austronesian, I don't know how much it includes non-Malayo-Polynesian members -- that date is not far off for MP itself.

Sino-Caucasian likely also includes at least some of the European Neolithic substrate. This is from such likely substrate borrowings as Proto-Germanic *lambaz (English "lamb", etc.).
 

lpetrich

Contributor
Turning to Australia, most of the Aboriginal languages are in the Pama-Nyungan family, which covers nearly all of Australia except the more westerly of the northernmost parts of that continent. The  Proto-Pama–Nyungan language was likely spoken about 5,000 years ago in the Gulf Plains of NE Australia (the SE shore of the big gulf in N Australia).

The non-Pama-Nyungan ones have more obscure relations, though I've seen  Macro-Pama–Nyungan languages

As to what caused that colonization of 5,000 years ago, it may be some ecological issue like climate getting more hostile, like what happened to the Sahara Desert over the Holocene. Rainfall regimes of the Green Sahara | Science Advances
During the “Green Sahara” period (11,000 to 5000 years before the present), the Sahara desert received high amounts of rainfall, supporting diverse vegetation, permanent lakes, and human populations. Our knowledge of rainfall rates and the spatiotemporal extent of wet conditions has suffered from a lack of continuous sedimentary records. We present a quantitative reconstruction of western Saharan precipitation derived from leaf wax isotopes in marine sediments. Our data indicate that the Green Sahara extended to 31°N and likely ended abruptly. We find evidence for a prolonged “pause” in Green Sahara conditions 8000 years ago, coincident with a temporary abandonment of occupational sites by Neolithic humans. The rainfall rates inferred from our data are best explained by strong vegetation and dust feedbacks; without these mechanisms, climate models systematically fail to reproduce the Green Sahara. This study suggests that accurate simulations of future climate change in the Sahara and Sahel will require improvements in our ability to simulate vegetation and dust feedbacks.
That 5,000 years agrees with the end of the "Green Sahara" period. Whatever made "Green Sahara" may also have made "Green Australia". Australia's desert covers roughly the same range of latitudes as the Sahara, though on the opposite side of the Equator, and both deserts are the result of large-scale atmospheric circulation. So whatever might have made the Sahara less dry would likely have made Australia less dry. I'd have to look for Australia paleoclimate research.
 

Swammerdami

Squadron Leader
Staff member
Take care when guessing a language family's origin from the present location of its speakers. Algonquian was common in the Northeast U.S. when Europeans arrived, but probably originated far to the West since Blackfoot is its most divergent member. Similarly Dravidian is common in Southern India but probably originated far from there. Part of the evidence for that is that Brahui, found in Pakistan, is the family's most divergent member. Basque's distant ancestral language didn't come from northern Spain; instead it survived there because of the relative isolation of that mountainous region. Similarly, Caucasian language families need not have originated in the Caucasus Mountains.

Prof. Merritt Ruhlen divides all the world's languages into 12 families:
Kartvelian
Dravidian
Amerindian
Niger-Kordofanian (Niger-Congo)
Eurasiatic
Australian
Afro-Asiatic
Dene-Caucasian
Nilo-Saharan
Austric
Indo-Pacific
Khoisan
Ruhlen isn't afraid to posit even larger macro-families, with all languages ultimately descended from a single proto-language — indeed he provides such a speculative language tree in his 26 year-old book — but it makes sense to "draw a line" somewhere.

I've listed the 12 language families very roughly by decreasing certainty: Kartvelian (aka South Caucasian) is a tiny language family devoid of any controversy. Dravidian Proper is a large but non-controversial family; I think the combination of Dravidian Proper and Brahui to form Dravidian is well accepted. At the other end, Khoisan, at least when Hatsa and Sandawe languages are included, is regarded by many as a paraphyletic grouping of languages which all happen to use click-sounds; and Indo-Pacific is a loose aggregation of little-studied languages.

The farther back in time we go, and the looser the language macro-families become, the harder it is IMO to derive interesting conclusions about pre-history.

Some speculations I find interesting are:
  • The original homeland of Dravidian is a mystery. Snake worship is a part of Dravidian culture leading some to imagine an exodus from Egypt!
  • Within Austric, Austronesian and Daic are especially close. This has led some to posit a migration from the Philippines back to the mainland, with proto-Daic drastically influenced by mainland languages like Chinese.
  • Prof. Dixon argues that Australian forms a distinct family NOT because the Australian languages all share a recent genetic ancestor, but for the opposite reason! The Australian languages are so ancient that their long-ago alignment into genetic groups has been rendered invisible; instead borrowings and Sprachbunds over hundreds of centuries have created a single large, if fuzzy, family.
  • Roger Blench claims that the entire Niger-Congo family is a branch within the Central Sudanic branch of Nilo-Saharan.
  • I show Amerindian as the 3rd most certain of the 12 families; ahead of Niger-Congo where some question the affiliation of Kordofanian with Niger-Congo proper.
As an interested layman observing the authorities debate, I sometimes feel that the quality of debate by so-called experts becomes a more interesting topic than the subject of the debate! (A few weeks ago I started a thread in Miscellaneous on another such controversy.) The evidence that Amerindian is a single language family is very strong; Sapir and Greenberg — the two greatest historical linguists ever — regarded it as clear-cut. Google-search for excerpts from the anti-Amerindian papers: you can almost feel the spittle fly as these professors lose their objectivity in some strange fear or hatred. (There is a Wiki on Amerind languages.)
 

Politesse

Sapere aude
I don't think most people are "enraged" at Edward Sapir or Joseph Greenberg! The field was very different when they were doing most of their work. Indeed, Sapir's vigorous documentation of the North American languages gave us a hefty portion of our data set. But it is in the nature of the sciences for theory to change and evolve as new facts come to light and old guiding paradigms transmute to new ones. I can see where debates and academic politics can seem off-putting or even ridiculous to the interested layman, but mutual critique and review are actually a very important part of the scientific process by which we improve our theories over time.
 

lpetrich

Contributor
Turning to New Guinea, not much work has been done on the  Papuan languages. "Statistical analyses designed to pick up signals too faint to be detected by the comparative method, though of disputed validity, suggest five major Papuan stocks (roughly Trans–New Guinea, West, North, East, and South Papuan languages)" I've found  Proto-Trans–New Guinea language but not much else. Joseph Greenberg has suggested "Indo-Pacific" for the Andaman Islands, New Guinea, and Tasmania, though not Australia, but that is not very widely accepted.


So we have one region remaining: Africa south of the Sahara Desert. This is the place where humanity originated, where our present human species emerged from an earlier species. Joseph Greenberg's macro-classification here is generally accepted, unlike his macro-classifications for other places. He proposed four macrofamilies:
  • Afro-Asiatic
  • Niger-Congo
  • Nilo-Saharan
  • Khoisan
This is rather unlike what one might expect from humanity emerging there -- Africans' languages falling into numerous families with no clear relationship between them. Like what we find in New Guinea, but on a larger scale. So let us take a closer look at the three macrofamilies that I have not discussed yet.
 

lpetrich

Contributor
First,  Niger–Congo languages, sometimes called Niger-Kordofanian ones.

It is something of a hand-wave, it must be conceded. Its highest-level split is between the  Atlantic–Congo languages and some languages in West Africa, mostly "Far West" Africa: Ijo, Dogon, Mande, Katla, and Rashad. Mande, especially, is only doubtfully related to the rest of Niger-Congo.

Turning to Atlantic-Congo, its genetic unity is generally accepted. It is composed of  Volta–Congo languages and some Far-West "Atlantic" languages.

In turn, Volta-Congo is composed of  Benue–Congo languages and several West African languages, like Volta-Niger langs Igbo and Yoruba, and the North-Central Zande langs.

The Benue-Congo langs extend only a tiny bit west of the Central-Southern-Africa Atlantic coast. Wiktionary lists Appendix:proto-Benue-Congo reconstructions - Wiktionary and notes a book, The Noun-Class System of Proto-Benue-Congo | De Gruyter

I checked that Wiktionary page, and the two sets of reconstructions differ almost completely. The first style is from de Wolf 1971, and the second style from Blench 2004.
belly *-bumu
belly *-mani
20. #-koo belly

buffalo *-zati
buffalo *-poŋ
145. #(n)-(g)yati buffalo

egg *-kiŋ, *-tiŋ
4. #eje egg

headpad *-kata
12. ekãta head-pad

salt *-nunu
salt *-mu
307. #mana salt

scorpion *-nan
scorpion *-get
15. #keNkere scorpion

water *-izi (±)
water *-ni (±)
218. #-mbal- water
 

lpetrich

Contributor
The Benue-Congo langs contain the Cross-River ones of Nigeria and nearby, and the  Bantoid languages. Those are in turn divided into  Northern Bantoid languages and  Southern Bantoid languages, and the latter contains the  Bantu languages. The non-Bantu Bantoid languages are a tiny sliver on the northwest end of the total distribution.

The Bantu languages are spoken over much of central and southern Africa, and they are very recognizably related. A  Proto-Bantu language has been reconstructed with some success. The Bantu languages have several noun classes marked out with prefixes, and Proto-Bantu is reconstructed as also having them. The singular and plural prefixes are unpredictably different, and they are used for both adjective and verb agreement.

Appendix:Swahili noun classes - Wiktionary

Examples:

Watu wazuri wawili wale wameanguka (watu = people; -zuri = good; -wili = two; -le = those; -anguka = fall down)
Kenya (“Kenya”) → Mkenya (“Kenyan”)
-gonjwa (“sick”) → mgonjwa (“sick person”)
tende (“date”) → mtende (“date palm”)
Uingereza (“England”) → Kiingereza (“English”)
-baya (“bad”) → ubaya (“badness”)
-Ganda (“Ganda”) → Uganda (“Uganda”)
Kristo (“Christ”) → Ukristo (“Christianity”)
-soma (“read”) → kusoma (“reading; to read”)

Also
Kiswahili ("Swahili language") - Waswahili("Swahili people")
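
The agreement pattern in the first example can be shown mechanically: every word in that sentence carries the same plural-human prefix wa-. Below is a minimal sketch, using only the stems and prefixes from the examples above. The function names are hypothetical, and treating "meanguka" as one verb stem (perfect marker -me- plus the root) is my simplification rather than a full analysis.

```python
# Prefix shared by every word in the example sentence (plural, human class).
PLURAL_HUMAN_PREFIX = "wa"

# Stems from the example: people, good, two, those, have-fallen-down.
# "meanguka" bundles the marker -me- with the verb; a simplification.
STEMS = ["tu", "zuri", "wili", "le", "meanguka"]


def agree(prefix: str, stems: list) -> str:
    """Put the same class prefix on every stem and capitalize the sentence."""
    sentence = " ".join(prefix + stem for stem in stems)
    return sentence[0].upper() + sentence[1:]


def derive(prefix: str, stem: str) -> str:
    """Form a derived word by attaching a class prefix to a bare stem."""
    return prefix + stem.lstrip("-")


if __name__ == "__main__":
    print(agree(PLURAL_HUMAN_PREFIX, STEMS))
    for prefix, stem in [("m", "-gonjwa"), ("u", "-baya"), ("ku", "-soma")]:
        print(stem, "->", derive(prefix, stem))
```

This only covers the regular cases shown above. The singular counterparts of this sentence do not all take one uniform prefix, so the sketch deliberately stays with the plural.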
 

lpetrich

Contributor
So we get a picture of speakers of Proto-Niger-Congo living in Far-West Africa and inventing agriculture. They slowly spread eastward, and when they reach what's now Cameroon, they start spreading much faster.

 Proto-Bantu language,  Bantu expansion - the Proto-Bantu speakers likely lived some 3,500 - 4,000 years ago (1,500 - 2,000 BCE) in Cameroon. They then spread eastward to the Indian Ocean coast, then southward into southern Africa. Some of them went southward along the Atlantic coast to southern Africa.

Thus, Africa's Holocene prehistory resembles that of Europe and the Pacific islands, where a population of farmers spread over a large land area, mixing with and displacing the people already present. While that is evident from the language history of Africa (Niger-Congo) and the Pacific islands (Austronesian), it is much less apparent in Europe, because a later migration came along and erased most of the earlier one's linguistic evidence. There is a little bit that survives, like various words for "goat", "sheep", "rye", "barley", "chickpea", and the like.


That Sahara wet period was the  African humid period - roughly 14,500 years ago to 5,500 years ago. Afro-Asiatic speakers would have had an easier time spreading during it than today.


 Nilo-Saharan languages is not very strongly supported. "Nilo-Saharan languages present great differences, being a highly diversified group. It has proven difficult to reconstruct many aspects of Proto-Nilo-Saharan. Two very different reconstructions of the proto-language have been proposed by Lionel Bender and Christopher Ehret."


 Khoisan languages - they share click consonants, but not much else. It has three families, Khoe-Kwadi, Kx'a, and Tuu, and two isolates, Hadza and Sandawe, with very little evidence of relationship.

Some Southern African Bantu languages also have clicks, but they may have been borrowed from local Khoisan speakers.
 

lpetrich

Contributor
 Sub-Saharan Africa is Africa south of the Sahara Desert. From "Genetic history",
In addition, whole genome sequencing analysis of modern populations inhabiting sub-Saharan Africa has observed several primary inferred ancestry components: a Pygmy-related component carried by the Mbuti and Biaka Pygmies in Central Africa, a Khoisan-related component carried by Khoisan-speaking populations in Southern Africa, a Niger-Congo-related component carried by Niger-Congo-speaking populations throughout sub-Saharan Africa, a Nilo-Saharan-related component carried by Nilo-Saharan-speaking populations in the Nile Valley and African Great Lakes, and a West Eurasian-related component carried by Afroasiatic-speaking populations in the Horn of Africa and Nile Valley.
noting
Early Back-to-Africa Migration into the Horn of Africa
The genetics of East African populations: a Nilo-Saharan component in the African genetic landscape | Scientific Reports

Pygmies themselves nowadays speak Bantu languages, so they can't provide much linguistic evidence.
 

Swammerdami

Squadron Leader
Staff member
... I can see where debates and academic politics can seem off-putting or even ridiculous to the interested layman, but mutual critique and review are actually a very important part of the scientific process by which we improve our theories over time.

I'm not alone in finding Ruhlen's critics overly shrill. Michael Witzel, distinguished Harvard Professor and President of the Association for the Study of Language in Prehistory since 1999 writes:

Michael Witzel said:
Another Severe Attack on Ruhlen

Some linguists criticized me last year for being too harsh in some of my comments on linguists. I said I was sorry. But now, good colleagues one and all can read something very very harsh from the other side. Even Lyle Campbell and Ives Goddard are pussycats compared to some of the critics of Greenberg, and more recently Ruhlen.

Get a copy of Anthony Grant's review of Merritt Ruhlen's On the Origin of Languages: Studies in Linguistic Taxonomy, 1994 which appears in Anthropological Linguistics 37, number 1, 1995,93-96. After reading that piece of academic Schadenfreude, no one will ever again accuse me or Lyle Campbell of being harsh. By the way, that journal (AL) seems to have joined Language and IJAL in being totally biased. Like the three famous monkeys: see no evil, hear no evil and speak no evil -- where the Amerind theory is evil incarnate, in the body of Joe Greenberg. Heavens!

I'd post more examples, but paywalls are popping up everywhere.
 

lpetrich

Contributor
The best-known of the Bantu languages is Swahili, an East African language widely learned there as a second language.
Here are some others. I've listed those that have grammar descriptions in Wikipedia:
 Proto-Bantu language lists the reconstructed noun classes:
  • 1 *mu- human being, animate
  • 2 *ba- (plural of 1)
  • 3 *mu- plant, natural force, body part, inanimate
  • 4 *mi- (plural of 3)
  • 5 *li- (various)
  • 6 *ma- (plural of 5), liquids (mass nouns)
  • 7 *ki- various, tools, diminutives, manner/way/language
  • 8 *bi- (plural of 7)
  • 9 *n- animals, inanimate
  • 10 *din- (plural of 9, 11)
  • 11 *lu- abstractions, things with extended outline shapes, various
  • 12 *ka- diminutives
  • 13 *tu- (plural of 12)
  • 14 *bu- abstractions
  • 15 *ku- infinitive / gerund of verb ("-ing")
  • 16 *pa- location on (proximal, exact)
  • 17 *ku- location at (distal, approximate)
  • 18 *mu- location in (interior)
  • 19 *pi- diminutive
There are possible additional ones, like *ghi- augmentative (big version of something).
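
The singular/plural pairings in that list are easy to lose track of, so here is a minimal sketch that encodes them as lookup tables. It uses only the class numbers and prefixes listed above; the names `PREFIXES`, `PLURAL_OF`, `plural_prefix`, and `homophonous_classes` are my own.

```python
from collections import defaultdict

# Reconstructed prefixes by class number, as listed above (asterisks omitted).
PREFIXES = {
    1: "mu", 2: "ba", 3: "mu", 4: "mi", 5: "li", 6: "ma", 7: "ki",
    8: "bi", 9: "n", 10: "din", 11: "lu", 12: "ka", 13: "tu", 14: "bu",
    15: "ku", 16: "pa", 17: "ku", 18: "mu", 19: "pi",
}

# Which class serves as the plural of which, per the list (10 pluralizes 9 and 11).
PLURAL_OF = {1: 2, 3: 4, 5: 6, 7: 8, 9: 10, 11: 10, 12: 13}


def plural_prefix(singular_class: int):
    """Return the reconstructed plural prefix, or None if no plural is listed."""
    plural_class = PLURAL_OF.get(singular_class)
    if plural_class is None:
        return None
    return "*" + PREFIXES[plural_class] + "-"


def homophonous_classes() -> dict:
    """Group class numbers that share the same prefix shape."""
    by_prefix = defaultdict(list)
    for number, prefix in sorted(PREFIXES.items()):
        by_prefix[prefix].append(number)
    return {p: nums for p, nums in by_prefix.items() if len(nums) > 1}


if __name__ == "__main__":
    print(plural_prefix(1), plural_prefix(11), plural_prefix(14))
    print(homophonous_classes())
```

The second function makes it visible that *mu- and *ku- each serve more than one class, so the prefix shape alone does not identify the class.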
 

lpetrich

Contributor
To that list of Bantu languages I add  Bemba language,  Luganda,  Kinyarwanda (all (East) Central African),  Herero language,  Otjiherero grammar (Southwest African)

Hendrikse and Poulos have proposed this sequence of semantics:
  • 1/2, 3/4, 9/10 -- nouns - concreteness (five senses)
  • 5/6, 7/8, 11
  • 12/13, 19, 20, 21, 22 -- adjective-like nouns - attribution (two senses)
  • -
  • 16, 17, 18, 23 - adverb-like nouns - spatial orientation (one sense)
  • 14
  • 15 -- verb-like nouns - abstractness (no sense)
In addition to the 1-19 I'd earlier listed, some linguists have proposed 20 *ghu- putative, 21 *ghi- augmentative, 22 (?), 23 *i- locative

The noun-class prefixes are used for adjective agreement and as subject and object prefixes for verbs. Pronouns also have prefix forms for verbs.

I couldn't find much more about Proto-Bantu grammar in the Wiki article -- it mostly discussed the noun classes and their prefixes.

I looked elsewhere, and I've found Did the Proto-Bantu verb have a Synthetic or an Analytic Structure? - lots of separate words or mashed together to form one big word?

Author Derek Nurse concludes that Proto-Bantu verbs were on the synthetic side, much like most present-day Bantu languages, like Swahili. He also concludes that Proto-Niger-Congo verbs were on the analytic side, much like many present-day West African languages. But he doesn't go into a lot of detail about what can be reconstructed.

Reconstructing the Proto-Bantu Verbal Unit: Internal Evidence - author Larry Hyman notes Achille Meeussen's reconstruction:

Verbal Unit = (pre-stem) + (stem)

The stem part is generally accepted:
(verb root) + (optional extension) + (inflectional final vowel)

The pre-stem part is less generally accepted, but according to AM:
Pre-stem = (pre-initial) + (subject) + (negative) + (tense) + (formative) + (object)

He argues that the subject and the object were already prefixes in Proto-Bantu, though he is unsure about the rest. Some tense markers likely originated from auxiliary verbs, for instance.
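
The slot template above can be written out as a small data structure. This is a minimal sketch, assuming every slot is simply concatenated in the order given; the class name `VerbalUnit` and its field names are hypothetical, and filling the slots with the Swahili word "wameanguka" from earlier in the thread (subject wa-, marker -me- placed in the tense slot, root anguk-, final vowel -a) is my own illustration, not a Proto-Bantu form.

```python
from dataclasses import dataclass

# Slot order: the six pre-stem slots, then the three stem slots.
SLOT_ORDER = (
    "pre_initial", "subject", "negative", "tense", "formative", "object",
    "root", "extension", "final_vowel",
)


@dataclass
class VerbalUnit:
    """One verb form, with each template slot either filled or left empty."""
    root: str
    final_vowel: str
    pre_initial: str = ""
    subject: str = ""
    negative: str = ""
    tense: str = ""
    formative: str = ""
    object: str = ""
    extension: str = ""

    def filled_slots(self) -> list:
        """Non-empty slot contents, in template order."""
        values = (getattr(self, slot) for slot in SLOT_ORDER)
        return [value for value in values if value]

    def surface(self) -> str:
        """The verb as one word."""
        return "".join(self.filled_slots())

    def segmented(self) -> str:
        """The verb with hyphens between the filled slots."""
        return "-".join(self.filled_slots())


if __name__ == "__main__":
    verb = VerbalUnit(root="anguk", final_vowel="a", subject="wa", tense="me")
    print(verb.surface(), "=", verb.segmented())
```

Real verb forms also involve sound changes at slot boundaries, which plain concatenation ignores.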
 

lpetrich

Contributor
 Bantu languages -  Proto-Bantu language


Superfamilies:
 Southern Bantoid languages (  Northern Bantoid languages )
 Bantoid languages

Non-Bantu Bantoid is mainly at the southern Nigeria-Cameroon border, while Bantu is over a much larger area.

Reconstructing Benue-Congo Person Marking I. Proto-Bantoid

Bantu personal pronouns:
  • subject pronoun / verb prefix
  • object pronoun / verb affix
  • possessive pronoun / noun or verb suffix
  • independent stressed (emphatic) subject pronoun
Reconstructed Proto-Bantu:
  • 1s: subject prefix *nyi-, non-subject *(a)me, independent *(i)me
  • 2s: subject prefix *v-, object affix *-kv-, possessive *(a)we, independent *(i)we
  • 1p: subject prefix *tv-, non-subject *ac(u)e, independent *(be)c(u)e / *ic(u)e
  • 2p: subject prefix *mv-, non-subject *an(u)e, independent *(be)n(u)e / *in(u)e
Reconstructed Proto-Bantoid:
  • 1s: subject prefix *nyi-, non-subject+independent *(a)me
  • 2s: subject prefix *v-, non-subject+independent *(a)we
  • 1p: subject prefix *tv-, non-subject+independent *(be)c(u)e
  • 2p: subject prefix *mv-, non-subject+independent *(be)n(u)e
 

lpetrich

Contributor
Going back further with Bantu < Bantoid:
 Benue–Congo languages
Going further into Nigeria.

Proto-Benue-Congo word list - Voorhoeve, Jan and Paul P. de Wolf. Benue-Congo noun class systems. Leiden: West African Linguistic Society, Afrika-Studiecentrum, 1969

Repeated in Appendix:proto-Benue-Congo reconstructions - Wiktionary without the noun classes

I'll list them:
  • *a SC 2
  • *bi SC 1
  • *bu SC 4 - abstractions (Bantu 14)
  • *bu/*a 8
  • *bu/*í 8
  • *ì SC 5
  • *ì/*í 41 - animals, some inanimate (Bantu 9)
  • *ka SC 2
  • *ka/*í 3
  • *ka/*ti 1
  • *ki/*a 2
  • *ki/*bi 18 - inanimate, some body parts, some animals (Bantu 7, 8)
  • *ku/*a 16
  • *ku/*í 7 - body parts
  • *li SC 6
  • *li/*a 44 - body parts (Bantu 5)
  • *lu/*í 2
  • *ma SC 12 - liquids (Bantu 6)
  • *ù/*ba 24 - human beings (Bantu 1, 2)
  • *ú/*í 15
  • *ú/*ti 10
All of them are of inanimate objects in general unless indicated otherwise. Some of them are recognizable as Bantu ones.
 

lpetrich

Contributor
 Volta–Congo languages (  Volta–Niger languages )
On the west, Liberia and Côte d'Ivoire (Ivory Coast), and extending eastward, including a strip north of the Bantu region.

The Volta-Niger subgroup contains Igbo (Ibo) and Yoruba, two languages of Nigeria.

Neither language has noun classes, though Yoruba has different interrogative pronouns for human vs. nonhuman: "ta ni" ("who?") and "ki ni" ("what?"). Or more generally, reasoning vs. nonreasoning.


 Atlantic–Congo languages
On the west coast of Africa: Senegal to Liberia, also some spots to the north.

Then finally,
 Niger–Congo languages
It covers the remaining bits of West Africa, inland of Senegal-Liberia.

That article lists it as "hypothetical", and it has a question mark after every branch but Atlantic-Congo, branches like Mande and Dogon.

It also has (noun classes) for Atlantic-Congo.
 

lpetrich

Contributor
 Proto-Niger–Congo language - there is a big gap in listed protolanguages between that and Proto-Bantoid (no separate Wiki article) and  Proto-Bantu language

Niger-Congo -- Atlantic-Congo -- Volta-Congo -- Benue-Congo -- Bantoid -- Southern Bantoid -- Bantu

Some linguists reconstruct noun classes for Proto-Niger-Congo, though the family has a lot of variety in them.

A survey of Niger-Congo noun class agreement systems -- Hepburn-Gray.pdf
Gives these examples:

Otoro ethnonyms:
gwu-toro "Toro person", li-toro "Toro people", o-toro "Toro land", dhi-toro "Toro language"

Baïnounk Gubëeher botanical names:
so-dooma "kaba tree", bu-dooma "kaba-tree fruit", ja-dooma "kaba-tree leaves", tin-dooma "kaba-tree sap"

Otoro is one of the  Talodi–Heiban languages, a branch of Kordofanian, and I've found (PDF) Do Heiban and Talodi form a genetic group and how are they related to Niger-Congo? | Roger Blench - Academia.edu
Talodi-Heiban is likely in Atlantic-Congo.

Niger-Congo languages - Widespread characteristics of Niger-Congo languages | Britannica
The number of noun classes varies from language to language. Within the Atlantic branch, for instance, the number of noun classes varies from 3 to nearly 40. In the Gur branch 11 classes are most commonly found. In Bantu languages 12 to 15 noun classes frequently occur, and early Bantu, as reconstructed by scholars, is thought to have had some 23 noun classes.

Atlantic = Atlantic languages | African language | Britannica =  West Atlantic languages (Atlantic-Congo)

Gur is  Gur languages (Volta-Congo)

 Proto-Niger–Congo language itself mentions Kordofanian (Talodi-Heiban?), (West) Atlantic , Oti-Volta (Gur), Kwa (in Volta-Congo), Benue-Congo, and Bantu.

So all these examples are Atlantic-Congo, and some are restricted further, in Volta-Congo, Benue-Congo, and Bantu.
 

lpetrich

Contributor
Niger-Congo Noun Classes: Reconstruction, Historical Implications, and Morphosyntactic Theory - ProQuest

Asking "Is the NC-type system unique?" and addressing issues like "Probability of chance resemblance" and "Can a noun class system be borrowed?"

Discussing:
  • Benue-Congo incl. Bantu (in Volta-Congo)
  • Kwa: Guang, Ghana-Togo-Mountain (in Volta-Congo)
  • Atlantic: Fula-Sereer, Bainounk-Kobiana-Kasanga, Tenda, Cangin, Bak, Mel (in Atlantic-Congo)
  • Kordofanian: Heiban, Talodi, Rashad, Katloid (in Atlantic-Congo?)
  • Kru (in Niger-Congo)
  • Gur-Adamawa: Gur, Senufo, Adamawa, Ubangi (in Volta-Congo)
Noun classes:
  • 1 *gwu-,*a-|*gwu-
  • 1a *gwu-,*a-|*∅-
  • 2 *ba-|*ba-
  • 2a *ba-|*-VN
  • 3 *gu-|*gu-
  • 4 *gi-|*gi-
  • 5 *di-|*di-
  • 6 *Nwa-|*Nwa-
  • 22 *mi-|*mi-
  • 23 *ma-|*ma-
Why the jump in numbering? Then discusses "Noun Class Evidence for Volta-Congo" and "Is there evidence for an Atlantic-Congo branch?"

Seems like much of the evidence is rather restricted in distribution.
 

lpetrich

Contributor
Looking at putative Niger-Congo languages outside of Atlantic-Congo ( Dogon languages,  Mande languages, and  Ijoid languages), they lack a noun-class system, and their preferred word order is subject-object-verb rather than the usual subject-verb-object.


In addition to noun classes, one can reconstruct verbal extensions, at least for Atlantic-Congo. These are suffixed to the verb root, and they have such meanings as passive voice, causative, and reciprocal ("each other").


As an aside, in addition to Appendix:Swahili noun classes - Wiktionary Wiktionary has Appendix:Swahili verbs - Wiktionary and Appendix:Swahili verbal derivation - Wiktionary


Reconstruction of Proto-Niger-Congo pronouns:
  • 1s: *mV(front)
  • 2s: *mV(back)
  • 1p: *TV(close)
  • 2p: *NV(close)
 

lpetrich

Contributor
Turning to Proto-Niger-Congo numerals, I have found two different reconstructions, by Pozdniakov and by Güldemann (2-5 only):
  • 1: *ku-(n)-di (> ni/-in), *do, *gbo/*kpo
  • 2: *ba-di -- *Ri
  • 3: *tat / *tath -- *ta(C)
  • 4: *na(h)i -- *na(C)
  • 5: *tan, *nu(n) -- *nU
  • 6: 5+1
  • 7: 5+2
  • 8: *na(i)nai (< 4 reduplicated)
  • 9: 5+4
  • 10: *pu / *fu
  • 20: < ‘person’
The best matches are for 3 and 4, with 2 and 5 having rather limited matches. I checked on Appendix:proto-Niger-Congo numerals - Wiktionary

Here are some more general lists:
 

lpetrich

Contributor
I'll give some examples of Swahili, likely the best-known of the Bantu languages. I used Google Translate, so it may be a bit too schematic.

I have a good book.
I have one good book.
I have good books.
I have two good books.
I have three good books.

Nina kitabu kizuri.
Nina kitabu kimoja kizuri.
Nina vitabu vizuri.
Nina vitabu viwili vizuri.
Nina vitabu vitatu vizuri.

BTW, the Swahili word for book is borrowed from Arabic kitab. Its first syllable was then reinterpreted as a Swahili noun-class prefix.
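
The agreement in the book sentences above can be sketched in a few lines of Python. This is only an illustration for the ki-/vi- class; the function name and the stem spellings (-tabu, -zuri, -moja, -wili, -tatu) are my own segmentation of those sentences, not a general Swahili grammar.

```python
# Minimal sketch of ki-/vi- noun-class agreement (singular ki-, plural vi-).
# Stems are segmented by hand from the example sentences; hypothetical helper.
PREFIX = {"sg": "ki", "pl": "vi"}

def have_phrase(noun_stem, adj_stem, number="sg", numeral_stem=None):
    """Build 'Nina <noun> [<numeral>] <adjective>.' with matching prefixes."""
    prefix = PREFIX[number]
    words = ["Nina", prefix + noun_stem]
    if numeral_stem is not None:
        # The numeral takes the same class prefix as the noun.
        words.append(prefix + numeral_stem)
    words.append(prefix + adj_stem)
    return " ".join(words) + "."

print(have_phrase("tabu", "zuri"))
print(have_phrase("tabu", "zuri", "pl", "tatu"))
```

The same prefix is copied onto the noun, the numeral, and the adjective, which is the whole point of the noun-class system. Other classes would need their own prefix tables.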

I went through a lot of other words:

Dream
Nina ndoto nzuri.
Nina ndoto nzuri.
Thought
Nina mawazo mazuri.
Nina mawazo mazuri.
Eye
Nina jicho zuri
Nina macho mazuri
Hand
Nina mkono mzuri.
Nina mikono nzuri.
Bed
Nina kitanda kizuri.
Nina vitanda vizuri.
Key
Nina ufunguo mzuri.
Nina funguo nzuri.
Car
Nina gari nzuri.
Nina magari mazuri.
Rock
Nina mwamba mzuri.
Nina miamba mzuri.
Star
Nina nyota nzuri.
Nina nyota nzuri.
Lake
Nina ziwa zuri.
Nina maziwa mazuri.
Tree
Nina mti mzuri.
Nina miti mizuri
Flower
Nina maua mazuri.
Nina maua mazuri.
Butterfly
Nina kipepeo mzuri.
Nina vipepeo wazuri.
Dog
Nina mbwa mzuri.
Nina mbwa wazuri.
Cat
Nina paka mzuri.
Nina paka nzuri.
Lion
Nina simba mzuri.
Nina simba wazuri.
Elephant
Nina tembo mzuri.
Nina tembo wazuri.
Sister
Nina dada mzuri.
Nina dada wazuri.
Teacher
Nina mwalimu mzuri.
Nina walimu wazuri.
 

lpetrich

Contributor
Sometimes a word is distinguished by noun class, like ndege: "bird" when animate, "airplane" when inanimate:

The big bird arrived.
The big birds arrived.
The big airplane arrived.
The big airplanes arrived.

Ndege mkubwa alifika.
Ndege wakubwa walifika.
Ndege kubwa ilifika.
Ndege kubwa zilifika.

One can see some object agreement on the verb:

I love my sister.
I love my sisters.
Nampenda dada yangu.
Nawapenda dada zangu.
Lake
Ninapenda ziwa langu.
Ninapenda maziwa yangu.
Book
Ninapenda kitabu changu.
Ninapenda vitabu vyangu.

Some verb conjugation. Notice a sort of negative conjugation:

I love my sister.
I don't love my sister.
Nampenda dada yangu.
Sipendi dada yangu.
You
Unampenda dada yangu.
Humpendi dada yangu.
He/She
Anampenda dada yangu.
Hampendi dada yangu.
The cat
Paka anampenda dada yangu.
Paka hapendi dada yangu.
The flower
Maua yanampenda dada yangu.
Maua hayampendi dada yangu.
The tree
Mti unampenda dada yangu.
Mti haumpendi dada yangu.
The computer
Kompyuta inampenda dada yangu.
Kompyuta haimpendi dada yangu.
The book
Kitabu kinampenda dada yangu.
Kitabu hakimpendi dada yangu.
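
The pattern in these pairs can be sketched roughly as follows. The class labels (1sg, cl3, and so on) are my own shorthand, and the table covers only the subjects used in the examples above, so take it as a toy model rather than a full conjugator.

```python
# Sketch of the present-tense pattern in the examples above.
# Affirmative: subject + na + object marker + root + a
# Negative:    negative subject + object marker + root + i   (no -na-)
# Labels are my own shorthand; only subjects from the examples are listed.
SUBJECT = {
    "1sg": ("ni", "si"),    # I
    "2sg": ("u", "hu"),     # you
    "3sg": ("a", "ha"),     # he/she, also animals such as paka
    "cl3": ("u", "hau"),    # mti
    "cl6": ("ya", "haya"),  # maua
    "cl7": ("ki", "haki"),  # kitabu
    "cl9": ("i", "hai"),    # kompyuta
}

def present(subject, root="pend", obj="m"):
    """Return (affirmative, negative) present forms for one subject."""
    aff, neg = SUBJECT[subject]
    return aff + "na" + obj + root + "a", neg + obj + root + "i"

for label in SUBJECT:
    print(label, *present(label))
```

For 1sg this gives the uncontracted ninampenda where the examples have the shortened Nampenda, and Google Translate's Sipendi and hapendi also leave out the object marker -m-, so the sketch is more regular than the translated sentences.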

Possessive pronouns:

I love my sister.
Nampenda dada yangu.
Your
Nampenda dada yako.
The cat's
Nampenda dada wa paka.
The tree's
Nampenda dada wa mti.
My cat
Ninampenda paka wangu.
Flower
Ninapenda maua yangu.
Book
Ninapenda kitabu changu.
 

lpetrich

Contributor
I love my sister.
Nampenda dada yangu.
Will love
Nitampenda dada yangu.
Was loving
Nilikuwa nikimpenda dada yangu.
Loved
Nilipenda dada yangu.
Have loved
Nimempenda dada yangu.
Would love
Ningempenda dada yangu.
Should love
Ninapaswa kumpenda dada yangu.
Might love
Naweza kumpenda dada yangu.

They love.
They are loved.
They love each other.

Wanapenda.
Wanapendwa.
Wanapendana.
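
These three forms show the verbal extensions at work: the passive and reciprocal markers sit between the root and the final vowel. A minimal sketch, assuming my own segmentation wa-na-pend-(ext)-a and a hypothetical helper name:

```python
# Sketch: subject wa- + present na- + root + optional extension + final -a.
# Only the two extensions seen in the examples are included.
EXTENSION = {"plain": "", "passive": "w", "reciprocal": "an"}

def they_form(root, kind="plain"):
    """Build the 'they ...' present form of a verb root."""
    return "wa" + "na" + root + EXTENSION[kind] + "a"

for kind in EXTENSION:
    print(kind, they_form("pend", kind).capitalize() + ".")
```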
 
Can and canino are valid terms for the same animal in continental Spanish as well. Perro is preferred informally and in the colonias, but the Latin-related term wasn't lost. Many think that perro began as some manner of onomatopoeia, since it doesn't seem to have any equivalents in neighboring languages.
Curiously, in English exactly the same process happened to exactly the same word. "Dog" is preferred informally, but the Germanic-related term "hound" is still a valid term for the same animal, and "hound" is the Germanic cognate of Latin "canis", and nobody knows where "dog" came from, and onomatopoeia has been proposed.
Academic Etymologists Miss the Obvious. That's the Way They Are.

"Dog" has prehistoric origins. It is related to the Latin "digitus" and means pointer. Also, Greek "deiknumi," which means "show" and is the origin of "paradigm."
 

Copernicus

Industrial Grade Linguist
I have never heard of this claim. The etymology of "dog" still remains something of a mystery. Can you back this claim up with a reference?
 
Turning to grammar, some things look very different. Latin had several noun cases, while the Romance languages have much fewer. But there is more similarity than what one might at first think. The Romance languages carry over Latin's prepositions, and use some of them to substitute for cases. To illustrate, let us consider "horse's head" or "head of the horse".

Latin: caput equi
(equi: genitive or of-case for equus, "horse")

Italian: testa di cavallo
Spanish: cabeza de caballo
Portuguese: cabeça de cavalo
French: tête de cheval
Romanian: capul calului

All of them are "head of horse" or "head of-horse" with that word order. Most of the Romance languages use descendants of the Latin preposition "de" ("from") for "of". Note also the word-form substitutions. For "horse", the Romance words descend from Late Latin "caballus", becoming common in the last few centuries of the Western Roman Empire. The words for "head" are more complicated, with "caput" descendants surviving in some cases, but being replaced by descendants of Latin "testa" ("pot") in others. The descendants of Latin "caput" in Italian and French are "capo" and "chef", both meaning "leader", as English "head" sometimes does.

Although this is simpler than Latin, the Western Romance languages have some complications. In particular, they have a definite article, a word for "the", a word that Latin lacks. In most of the Romance languages, it is derived from Latin "ille", meaning "that". They also have a lot of contractions of definite articles with prepositions.

French:
  • de + le = du, à + le = au
  • de + les = des, à + les = aux
The other combinations are written separately: de la and à la. Writing -ux instead of -us is a French spelling quirk.

Spanish:
  • de + el = del, a + el = al
The others are written separately here also.

Italian:
  • il (masc. sg. bf cons.) di + il = del
  • lo (masc. sg. bf c clus.) di + lo = dello
  • l' (sg. bf vowel) di + l' = dell'
  • la (fem. sg. bf cons.) di + la = della
  • i (masc. pl. bf cons.) di + i = dei
  • gli (masc. pl. bf vwl/clus.) di + gli = degli
  • le (fem. pl.) di + le = delle
All contracted here. The prepositions da ("from"), a ("to"), in ("in"), and su ("on") are similar, though in becomes ne-.
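
The contractions listed above amount to a lookup table. Here is a small sketch; the language codes and the function name are mine, and only the pairs actually listed are included, with everything else falling back to being written separately.

```python
# Lookup-table sketch of preposition + definite article contractions.
# Only the pairs listed above are included; anything else is written separately.
CONTRACTIONS = {
    "fr": {("de", "le"): "du", ("à", "le"): "au",
           ("de", "les"): "des", ("à", "les"): "aux"},
    "es": {("de", "el"): "del", ("a", "el"): "al"},
    "it": {("di", "il"): "del", ("di", "lo"): "dello", ("di", "l'"): "dell'",
           ("di", "la"): "della", ("di", "i"): "dei", ("di", "gli"): "degli",
           ("di", "le"): "delle"},
}

def combine(lang, prep, article):
    """Return the contracted form if one is listed, else 'prep article'."""
    return CONTRACTIONS[lang].get((prep, article), prep + " " + article)

print(combine("fr", "de", "le"))
print(combine("fr", "de", "la"))
print(combine("it", "di", "gli"))
```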
The Uneducated Slur, Transpose, Shorten, and Fancify. The Romance Languages Are Latin Ebonics.

Caballus is actually derived from equus. Ekuvus Kuvus Kubus Cubus Cabus Caballus.
 

lpetrich

Contributor
Pure Goropianism.  Johannes Goropius Becanus (1519-1573) is known for this:
Goropius theorized that Antwerpian Brabantic, a particular dialect of Dutch spoken in the region between the Scheldt and Meuse Rivers, was the original language spoken in Paradise. Goropius believed that the most ancient language on Earth would be the simplest language, and that the simplest language would contain mostly short words. Since Brabantic has a higher number of short words than do Latin, Greek, and Hebrew, Goropius reasoned that it was the older language.[citation needed]

A corollary of this theory was that all languages derived ultimately from Brabantic. For example, Goropius derived the Latin word for "oak", quercus, from werd-cou (Brabantic for "keeps out cold"). Similarly, he derived the Hebrew name "Noah" from nood ("need"). Goropius also believed that Adam and Eve were Brabantic names (from Hath-Dam, or "dam against hate", for "Adam", and from Eu-Vat ("barrel from which people originated") or Eet-Vat ("oath-barrel") for "Eve", respectively). Another corollary involved locating the Garden of Eden itself in the Brabant region. In the book known as Hieroglyphica, Goropius also allegedly proved to his own satisfaction that Egyptian hieroglyphics represented Brabantic.[citation needed]
 
I have never heard of this claim. The etymology of "dog" still remains something of a mystery. Can you back this claim up with a reference?
Links Are Part of a Chain. Anyone Flashing Links Is Pulling Your Chain.

What you hear should not be respected, especially as objecting to such an obvious derivation. Deal with digitus and deiknumi, not with academic scribbles. Those who devote themselves to non-practical subjects like etymology are self-indulgent escapists and not very intelligent. We shouldn't look up to them as final authorities. Apply to them the attitude expressed in "War is too important to be left to the generals."
 

Copernicus

Industrial Grade Linguist
OK, so I see that as an admission that you have no source at all for your information, nor do you have any clue how etymologists actually do research. You expect me to trust your intuition over that of people who actually know what they are talking about, claiming that such people are somehow less knowledgeable or intelligent than you. We shouldn't take them as authorities, but we should take you as an authority on a subject that you know nothing about.

So, thanks for nothing, and thanks for letting us all know that you are clueless on this subject. It saves me taking the trouble to try to educate you further, since I have been one of those "academic scribblers" that you seem to detest. ;)


A big problem. Latin d- and Greek d- correspond regularly with Germanic t-, like duo ~ two.

Reconstruction: Proto-Indo-European/deyḱ- - Wiktionary

With descendants like English toe, token, teach, Latin digitus "finger, toe", dîcere "to say, ...", Greek dikê "custom, law, judgment", deiknûmi "I show, point out", ...
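
The regularity argument can be turned into a crude check. The sketch below only looks at the first letter, and only knows two correspondences: d ~ t from the duo ~ two example, and c/k ~ h as in canis ~ hound. The table and function names are mine, and spelling is standing in for sound here, so it is a rough filter and nothing more.

```python
# Sketch: does a proposed Latin/Greek ~ English cognate pair respect the
# regular initial-consonant correspondence? Toy table with two rules only:
# d ~ t (duo ~ two) and c/k ~ h (canis ~ hound).
EXPECTED_GERMANIC = {"d": "t", "c": "h", "k": "h"}

def initial_matches(classical, english):
    """True/False if a rule applies, None if the toy table has no rule."""
    expected = EXPECTED_GERMANIC.get(classical[0].lower())
    if expected is None:
        return None
    return english.lower().startswith(expected)

for pair in [("duo", "two"), ("digitus", "toe"),
             ("canis", "hound"), ("digitus", "dog")]:
    print(pair, initial_matches(*pair))
```

A pair that fails this check, like digitus ~ dog, is not thereby impossible, but it would have to be explained some other way than as an inherited cognate.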

Of course, the problem is that loan words screw up sound correspondences, and nobody knows for sure when the word entered English. It may not form a cognate set within the Germanic branch, or it could have been borrowed into Old English from some other historical dialect of Germanic or even Brythonic.
 

Swammerdami

Squadron Leader
Staff member
In America there are many named varieties of dog: setter, Labrador, boxer, beagle, poodle, terrier, collie etc. The etymologies of these variety names are varied and uninteresting. If a group of migrants arrived unwilling to learn English, they might be slightly surprised by the difference between indigenous canines and their home's varieties, and pick the name of some preferred variety -- say collie, for example -- and eventually promote that name (whether originating in pre- or post-migration home) to become a generic word for dog.

Could this have been the situation when Anglo-Saxons arrived in Britain? They focused on one particular variety and, whether its name was Germanic or Celtic, exercised that one specific varietal name, even though it had never previously been a generic word for dog.

@ Experts, is this plausible? In such a case, trying to trace 'dog' to some pre-English generic word would be a fool's errand; instead the word would have some obscure etymology lost in time.
 

Copernicus

Industrial Grade Linguist
That's plausible, but it's still just speculation. Sage was just speculating that the word had something to do with "point", and he speculated that that had something to do with a Greek or Latin root. I believe that pointers were first bred in England, so there might be some kind of plausibility to such a speculation. Until you realize that the breed was created in the 18th century, but the etymology of "dog" goes back to Old English, before English had gone through the Great English Vowel Shift (GEVS). That is, it had to enter the language relatively soon after West Germanic and Norse invaders settled in Britain. (See  Pointer (dog breed).)
 

lpetrich

Contributor
 Proto-Indo-European verbs - it's remarkable how much grammar one can find for Proto-Indo-European.

Like distinguishing between "thematic" and "athematic" verb conjugations. Athematic ones have personal endings directly attached to the verb stem. The present of "to be" is an athematic one:
*h1és-ti ~ Skt asti ~ Latin est ~ German ist ~ English is
*h1s-énti ~ Skt santi ~ Latin sunt ~ German sind ~ Old English sind
(notice the ablaut in the stem; its vowel gets reduced to zero grade when the accent moves onto the ending)

Thematic ones have a vowel in between, something much more common.

There are a variety of affixes that can be used to form aspects.

For imperfective verbs, there is reduplication, repeating the first consonant and sticking a vowel in between. Two other ones are an -n- infix and a -nu suffix, and these ones are likely related. There were also -ye-, -ske-, -se-, -eh1-, -eye- (causative, something repeated), -(h1)se- (something desirable), -ye-, and -h1-.

English "break" and Latin frangô ("I break") are cognates, from *bhreg-. But notice the -n- in the Latin word. It's absent from frêgi "I have broken" and frâctus "broken" (passive participle).

For perfective verbs, there is the -s- suffix and rarely reduplication. That -s- appears in the Greek "sigmatic aorist", and in several Latin 3rd-conjugation perfects, like dîcô "I say" and dîxi "I have said".

For stative verbs, there is reduplication.
 

lpetrich

Contributor
Origins of ‘Transeurasian’ languages traced to Neolithic millet farmers | Language | The Guardian

Transeurasian is what some call Macro-Altaic, and it includes ordinary Altaic (Turkic, Mongolic, and Tungusic), and also Koreanic and Japonic.
This language family’s beginnings were traced to Neolithic millet farmers in the Liao River valley, an area encompassing parts of the Chinese provinces of Liaoning and Jilin and the region of Inner Mongolia. As these farmers moved across north-eastern Asia over thousands of years, the descendant languages spread north and west into Siberia and the steppes and east into the Korean peninsula and over the sea to the Japanese archipelago.

...
The researchers devised a dataset of vocabulary concepts for the 98 languages, identified a core of inherited words related to agriculture, and fashioned a language family tree.

Archaeologist and study co-author Mark Hudson said the researchers examined data from 255 archaeological sites in China, Japan, the Korean peninsula and the far east of Russia, assessing similarities in artifacts including pottery, stone tools and plant and animal remains. They also factored in the dates of 269 ancient crop remains from various sites.

The researchers determined that farmers in north-eastern China eventually supplemented millet with rice and wheat, an agricultural package that was transmitted when these populations spread to the Korean peninsula by about 1300BC and from there to Japan after about 1000BC.

... The originators of the Sino-Tibetan language family farmed foxtail millet at roughly the same time in China’s Yellow River region, paving the way for a separate language dispersal, Robbeets said.
Nice to see some resolution of that issue.
 

lpetrich

Contributor
Triangulation supports agricultural spread of the Transeurasian languages | Nature - journal paper.
The ancestor of the Mongolic languages expanded northwards to the Mongolian Plateau, Proto-Turkic moved westwards over the eastern steppe and the other branches moved eastwards: Proto-Tungusic to the Amur–Ussuri–Khanka region, Proto-Koreanic to the Korean Peninsula and Proto-Japonic over Korea to the Japanese islands (Fig. 1b).

Through a qualitative analysis in which we examined agropastoral words that were revealed in the reconstructed vocabulary of the proto-languages (Supplementary Data 5), we further identified items that are culturally diagnostic for ancestral speech communities in a particular region at a particular time. Common ancestral languages that separated in the Neolithic, such as Proto-Transeurasian, Proto-Altaic, Proto-Mongolo-Tungusic and Proto-Japano-Koreanic, reflect a small core of inherited words that relate to cultivation (‘field’, ‘sow’, ‘plant’, ‘grow’, ‘cultivate’, ‘spade’); millets but not rice or other crops (‘millet seed’, ‘millet gruel’, ‘barnyard millet’); food production and preservation (‘ferment’, ‘grind’, ‘crush to pulp’, ‘brew’); wild foods suggestive of sedentism (‘walnut’, ‘acorn’, ‘chestnut’); textile production (‘sew’, ‘weave cloth’, ‘weave with a loom’, ‘spin’, ‘cut cloth’, ‘ramie’, ‘hemp’); and pigs and dogs as the only domesticated animals.
So they didn't have cows or horses, and they didn't have that stereotypical eastern Asian grain: rice.
By contrast, individual subfamilies that separated in the Bronze Age, such as Turkic, Mongolic, Tungusic, Koreanic and Japonic, inserted new subsistence terms that relate to the cultivation of rice, wheat and barley; dairying; domesticated animals such as cattle, sheep and horses; farming or kitchen tools; and textiles such as silk (Supplementary Data 5). These words are borrowings that result from linguistic interaction between Bronze Age populations speaking various Transeurasian and non-Transeurasian languages.
Genetic evidence?
We report genomic analyses of 19 authenticated ancient individuals from the Amur, Korea, Kyushu and the Ryukyus and combined them with published genomes that cover the eastern steppe, West Liao, Amur and Yellow River regions, Liaodong, Shandong, the Primorye and Japan between 9500 and 300 bp (Fig. 3a, Extended Data Fig. 4, Supplementary Data 11, 13, 17).
So the descendants of the West Liao farmers spread westward and northward and eastward, much as other inventors of farming had done.
 

lpetrich

Contributor
The Late Bronze Age saw extensive cultural exchange across the Eurasian steppe, which resulted in the admixture of populations from the West Liao region and the Eastern steppe with western Eurasian genetic lineages. Linguistically, this interaction is mirrored in the borrowing of agropastoral vocabulary by Proto-Mongolic and Proto-Turkic speakers, especially relating to wheat and barley cultivation, herding, dairying and horse exploitation.

Around 3300 bp, farmers from the Liaodong–Shandong area migrated to the Korean peninsula, adding rice, barley and wheat to millet agriculture. This migration aligns with the genetic component modelled as Upper Xiajiadian in our Bronze Age sample from Korea and is reflected in early borrowings between Japonic and Koreanic languages. Archaeologically it can be associated with agriculture in the larger Liaodong–Shandong area without being specifically restricted to Upper Xiajiadian material culture.

In the third millennium bp, this agricultural package was transmitted to Kyushu, triggering a transition to full-scale farming, a genetic turn-over from Jomon to Yayoi ancestry and a linguistic shift to Japonic. ...

Turning to how they did their research, they used a Swadesh-like list of relatively stable meanings to look for cognates.

They used these ages as calibration: (Japonic 2100 bp ± 175, Koreanic 800 bp ± 175, Turkic 2100 bp ± 175, Mongolic 750 bp ± 50, Tungusic 1900 bp ± 275). Japonic = Japanese - Ryukyuan (in the Ryukyu Islands)

BP = Before Present (1950 CE)
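
For anyone who wants to line these BP figures up with BCE/CE dates, here is a small conversion sketch. It treats the BP values as plain calendar years counted back from 1950 and ignores radiocarbon calibration entirely; the function name is my own.

```python
BP_EPOCH = 1950  # "Present" in Before Present

def bp_to_calendar(bp):
    """Convert years BP to (year, era), skipping the nonexistent year 0."""
    year = BP_EPOCH - bp
    if year > 0:
        return year, "CE"
    # 1950 BP is 1 BCE, 1951 BP is 2 BCE, and so on.
    return 1 - year, "BCE"

# Central calibration ages quoted above.
for name, bp in [("Japonic", 2100), ("Koreanic", 800), ("Turkic", 2100),
                 ("Mongolic", 750), ("Tungusic", 1900)]:
    print(name, *bp_to_calendar(bp))
```

Given the quoted uncertainties of 50 to 275 years, the one-year offset from the missing year 0 is of course negligible; it is only there to keep the arithmetic honest.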

Structure of Transeurasian language family revealed by computational linguistic methods | Max Planck Institute for the Science of Human History

Finding
(
( (Turkic: 100 BCE, Mongolian: 1250 CE): 2000 BCE , Tungusic: 100 CE): 3500 BCE
(Japonic: 100 BCE, Koreanic: 1200 CE): 2500 BCE
): 5000 BCE
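
That bracket notation is easier to work with as a nested data structure. A sketch, assuming signed years (negative for BCE) and placeholder labels of my own for the unnamed internal nodes; the Mongolic tip is the one listed as Mongolian above.

```python
# Each node: (label, date, children); dates are signed years, negative = BCE.
# Internal labels other than "Transeurasian" are placeholders of mine.
TREE = ("Transeurasian", -5000, [
    ("node A", -3500, [
        ("node B", -2000, [("Turkic", -100, []), ("Mongolic", 1250, [])]),
        ("Tungusic", 100, []),
    ]),
    ("node C", -2500, [("Japonic", -100, []), ("Koreanic", 1200, [])]),
])

def to_newick(node):
    """Render the topology only, in Newick-style brackets."""
    label, _date, children = node
    if not children:
        return label
    return "(" + ",".join(to_newick(child) for child in children) + ")"

def splits(node, depth=0):
    """Yield (depth, label, date) for every node, parents before children."""
    label, date, children = node
    yield depth, label, date
    for child in children:
        yield from splits(child, depth + 1)

print(to_newick(TREE) + ";")
for depth, label, date in splits(TREE):
    era = "BCE" if date < 0 else "CE"
    print("  " * depth + f"{label}: {abs(date)} {era}")
```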
 

lpetrich

Contributor
In the West Liao basin, agriculture started with the growing of broomcorn millet around 9000 BP (7000 BCE).

Bayesian phylolinguistics reveals the internal structure of the Transeurasian family | Journal of Language Evolution | Oxford Academic

They used 23 out of 27 present-day Turkic languages and also Old Turkic, 10 out of 17 Mongolic languages and also Written Mongolian and Middle Mongolian, 10 out of 13 Tungusic languages and also Manchu and Jurchen, Korean and also Middle Korean, and Japanese and 5 out of 14 Ryukyuan languages and also Old Japanese.

They also used a 200-word version of the  Leipzig–Jakarta list, a list of meanings with seldom-replaced word forms, a list derived using statistics on etymologies. Something like the Swadesh lists, with a lot of overlap.
It is highly unlikely that all similarities between the basic items in our dataset are the result of contact instead of genealogical relationship. Traditionally, the strength of basic vocabulary lies in the fact that words with basic meanings tend to resist borrowing more successfully than random lexical items. The very fact that we find 150 Transeurasian etymologies covering 107 distinct basic vocabulary concepts thus is a strong argument against borrowing by itself. In addition, we can advance other arguments against borrowing, such as (1) the misfit with the expected borrowing hierarchy; (2) the misfit with the expected typology of verbal borrowing; (3) the regularity and complexity of sound correspondence; (4) the occurrence of broken contact chains; (5) the multiple setting; and (6) the well-spread distribution of the cognates; see also Robbeets (in press, 2019).
They then go into detail.

"First, among the concepts of the Leipzig-Jakarta list, we find fifty-nine actions, thirty-two property words, twenty-three deictic or grammatical items and eighty-six nominal concepts." and "Empirically, it is observed that languages tend to borrow lexical items more easily than grammatical ones and nouns more easily than verbs (a.o. Wohlgemuth 2009; Matras 2009; Tadmor et al. 2010)." That is, words with standalone meanings tend to be borrowed more easily than words with meanings connected to other words. "In contrast to this tendency, there are more correlations for verbs (65%) and deictic and grammatical items (57%) in the Transeurasian basic vocabulary than for nouns (43%)."

"Second, as far as the mechanisms of loan verb accomodation are concerned, most recipient languages can be categorized into two distinct groups: borrowed verbs either arrive as verbs, needing no formal accommodation, or, they arrive as nonverbs and need formal accommodation. Most Transeurasian languages can be assigned to the second group because they display a clear preference for the nonverbal strategy (Wohlgemuth 2009: 159, 161)." In Japanese, borrowed verbs often have form "to do" <verb> (<verb> suru).

"Third, the comparative sets for basic vocabulary display regular correspondences for each consonant of the verb root and for each but the root-final vowel, conform to the requirements in Supplementary Data (SI 1)." Usually close to other work, like Palaeolexicon - Table of (Macro-) Altaic Phonology

"Fourth, gaps in the attestation of members of an etymology, whereby a cognate is absent in one or more intermediate contact branches are indicative of borrowing."

"Fifth, most examples of borrowing have a binary setting in common: they typically go from a model language into a recipient language. Especially for verbs and grammatical markers, examples of the same item progressing into a third or fourth language are relatively rare." - but it is grammatical items and verbs that are best-preserved of the cognate list used.

"Finally, the distribution of a certain basic item to a single language or to only few languages of a certain subgroup could serve as an indication of borrowing. However, such cases do not occur among our basic vocabulary etymologies."
 

lpetrich

Contributor
Here are the ranges of dates that they used for the four families with more than one member.
What | Lower | Upper
Proto-Turkic | 500 BCE | 100 CE
Proto-Mongolic | 1000 CE | 1300 CE
Proto-Tungusic | 600 BCE | 500 CE
Proto-Japonic | 200 BCE | 500 CE

"Linguists and archeologists associate proto-Japonic with the beginnings of Yayoi-culture (900 BCE–300 CE) on the Japanese Islands. ... The ancestor of the languages now spoken in the Ryukyuan Islands is thought to have remained in northeastern Kyushu until around 900 CE, when full-scale agriculture was introduced to the Ryukyus."

"Proto-Mongolic is nearly equivalent with the language spoken by the historical Mongols around the time of the Mongol Empire (1206–1368), which is documented in historical sources, written in several different scripts and collectively termed Middle Mongol."

The first split in Turkic is between Oghuric / Bulgaric and Common Turkic. Chuvash is the only surviving Oghuric language, Bulgar and Khazar being extinct, while Common Turkic contains all other present-day Turkic languages, including Turkish.

Oghuric is sometimes called Lir Turkic and Common Turkic Shaz Turkic from these sound correspondences:
Proto-Turkic | Oghur | Common
*ly | l | sh
*ry | r | z

They considered these hypotheses:
( (Tungus, JK), (Turk, Mongol) )
( Tungus, JK, (Turk, Mongol) )
( JK, ( Turk, (Tungus, Mongol) ) )
( JK, ( Tungus, (Turk, Mongol) ) ) -- their best fit.
-- essentially various placements of Tungusic in (JK, (Turk, Mongol)) with JK = (Japonic, Korean)
 

Copernicus

Industrial Grade Linguist
I simply don't buy the Bayesian analysis, which is entirely untested and subject to possible biases inherent in setting up priors. I don't think that there is any way to measure how strongly languages are affected by borrowed traits. For example, the Breton language has been hugely affected by French, which almost completely reworked its system of time and tense reference. English has quite a few very strong effects of borrowing from both Scandinavian and French influence, some of which significantly reworked its syntactic structure and morphology. I don't have any further knowledge of this study than what has been released to the public, but it is being met with more skepticism in the linguistic community than the popular media, which is to be expected, I suppose. It's fun to speculate, but going back more than a few thousand years is really making a lot of leaps.
 

Swammerdami

Squadron Leader
Staff member
It sounds like the Transeurasian Hypothesis lpetrich writes about is based on the Swadesh-like list, and NOT on grammatical characters.

Ringe et al did consider grammatical and phonological characters as well as lexical when they reconstructed the tree structure of the Indo-European family. But as discussed in the long linked-to article they tried to exclude all but rare and irreversible characters.

For me, as I mentioned (#92) last year in this thread, the most interesting thing about Ringe's study is the anomalous status of Germanic. His software was unable to attach Germanic consistently. Approaching the I-E expansion from the archaeological direction, dots can be connected to other I-E families (e.g. Globular Amphora ↣ Bell Beaker ↣ Urnfield Bronze ↣ Hallstatt Iron ↣ Celtic) but Germanic is much less clear. Was not Corded Ware supplanted by Comb Ceramic in Sweden, and might its successors have been dominant in the Nordic Bronze Age? I think there were at least THREE languages that contributed to proto-Germanic: Centum, Satem, and a non-IE language spoken by TRB or Comb Ceramic. There was likely a stage of creolization. Germanic has a non-IE word for king/koenig which might suggest an unconquered people adopting I-E voluntarily.

Is the linguistic evidence enough to construct a hypothesis for the early evolution of proto-Germanic?
 

lpetrich

Contributor
Looking at Don Ringe's paper, he finds a reasonably good tree if he omits Germanic.
  • Root: (Anatolian, (Tocharian, Classic))
  • Classic: (Albanian, (Italo-Celtic, Core))
  • Italo-Celtic: ((Welsh, Old Irish), (Latin, (Oscan, Umbrian)))
  • Core: ((Greek, Armenian), (Balto-Slavic, Indo-Iranian))
  • Balto-Slavic: (Old Church Slavonic, (Old Prussian, (Lithuanian, Latvian)))
  • Indo-Iranian: (Vedic, (Avestan, Old Persian))

"Classic" refers to the conception of Proto-Indo-European before Hittite and Tocharian were discovered.
All the inflectional characters that give any precise information about the position of Germanic - namely M5, M6 and M8 - place it in the large subgroup that also includes Balto-Slavic, Indo-Iranian and Greek; and since those are the characters that are the most reliable indicators of genetic descent, it appears that Germanic should be placed in what we are calling the core of the family - the residue after the departure of Anatolian, Tocharian and Italo-Celtic.
Authors Don Ringe and Ann Taylor conclude that early Germanic was an offshoot of Core IE, but one whose speakers lived near some early Celtic or Italo-Celtic speakers and got a lot of vocabulary from them.

The three Italic languages have the usually-accepted subgrouping, as do the three Baltic ones and the three Indo-Iranian ones.
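The bulleted subgroup definitions in this post can be expanded into one nested tree. This is a minimal sketch, assuming each definition is written as a nested tuple; the function names are illustrative only, and Germanic is left out as in the tree without it.

```python
# Subgroup definitions copied from the bulleted list; tuples are branchings.
SUBGROUPS = {
    "Root": ("Anatolian", ("Tocharian", "Classic")),
    "Classic": ("Albanian", ("Italo-Celtic", "Core")),
    "Italo-Celtic": (("Welsh", "Old Irish"), ("Latin", ("Oscan", "Umbrian"))),
    "Core": (("Greek", "Armenian"), ("Balto-Slavic", "Indo-Iranian")),
    "Balto-Slavic": ("Old Church Slavonic",
                     ("Old Prussian", ("Lithuanian", "Latvian"))),
    "Indo-Iranian": ("Vedic", ("Avestan", "Old Persian")),
}


def expand(node):
    """Replace every subgroup name by its definition, recursively."""
    if isinstance(node, tuple):
        return tuple(expand(child) for child in node)
    if node in SUBGROUPS:
        return expand(SUBGROUPS[node])
    return node  # an actual language


def leaves(tree):
    """List the languages at the tips of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def to_text(tree):
    """Write the tree in the parenthesized notation used in this thread."""
    if isinstance(tree, str):
        return tree
    return "(" + ", ".join(to_text(child) for child in tree) + ")"


FULL_TREE = expand("Root")
print(to_text(FULL_TREE))
print(len(leaves(FULL_TREE)), "languages")
```

The expanded tree has 17 languages at its tips, which is the number of end nodes the tree search has to handle.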
 

Swammerdami

Squadron Leader
Staff member
Yes, Germanic's sources include both a "core" language (Satem like Balto-Slavic, or para-Satem like Albanian) and a Western Centum language (Italic or Celtic), but there must have been a THIRD source as well. I think the third source was a sea-faring Baltic people, either the  Pitted Ware culture or the  Pit–Comb Ware culture.

The sea-faring terms Ship, Sail, Sea, Seal, Keel, Eel, possibly Ice and perhaps even Boat are all non-IE words found in both West Germanic and North Germanic. The Finnic (or Fennic) language is often associated with these Scandinavian seal-hunters but I don't think any of the eight words just mentioned has a clear Uralic cognate. Although 'Boat' has a possible PIE etymology (*bheid- "to split"), cognates of Boat in Romance languages are considered borrowings from Germanic. (And Irish bád is borrowed from Old English.)

Basic vocabulary words found in both Western and Northern Germanic but not in other I-E languages include finger, toe, neck, bone, wife, oak, berry and even horse.

Ocean-going ships were in use in the Baltic as early as 2500 BC, about the same time as Corded Ware farmers arrived in Sweden. But some fisher-gatherers of Sweden rejected farming and adopted a rich economy on the shores of the Baltic. They could trade furs and amber for agricultural products; or even use their sea-going skills as pirates to raid and steal what they wanted.

The Nordic Bronze Age was centered in Sweden, not Germany or Denmark. I think the "proto-Vikings" — whose existence isn't even hinted at in Barry Cunliffe's otherwise excellent Europe Between the Oceans — gained control during that Age. (Perhaps their sea-faring skills gave them access to the English tin needed for bronze.) At some point they switched to the I-E (Corded Ware) language of those they conquered but they retained some of their old language, calling their king Kuningaz instead of Rēx, and so on.

The origin of the Germanic people and their language is surely a fascinating story but one we'll never be able to reconstruct. Still, I think linguistics may offer some clues.
 

lpetrich

Contributor
I simply don't buy the Bayesian analysis, which is entirely untested and subject to possible biases inherent in setting up priors. ...
Bayesian phylolinguistics reveals the internal structure of the Transeurasian family | Journal of Language Evolution | Oxford Academic

I checked "Methods", and the authors used existing software that is often used in molecular phylogeny: BEAST (Bayesian Evolutionary Analysis Sampling Trees). See BEAST Software - Bayesian Evolutionary Analysis Sampling Trees | BEAST Documentation - BEAST 2

 Bayesian inference in phylogeny describes how it works. Since the number of possible family trees grows factorially with the number of end nodes or leaves, a completely comprehensive search is impractical. For n end nodes, the number of rooted binary trees to search is:

(2n-3)!! = (2n-3)*(2n-5)*...*5*3*1
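To get a feel for how fast that count grows, here is a short sketch that evaluates (2n-3)!! directly. The function name is made up for illustration, and the count is for rooted binary trees with n labeled end nodes.

```python
def rooted_tree_count(n):
    """Number of rooted binary trees on n labeled end nodes: (2n-3)!!."""
    if n < 1:
        raise ValueError("need at least one end node")
    count = 1
    for factor in range(3, 2 * n - 2, 2):  # 3, 5, ..., 2n-3
        count *= factor
    return count


for n in (3, 4, 5, 10, 17, 20):
    print(n, rooted_tree_count(n))
```

With 10 end nodes there are already 34,459,425 trees, which is why sampling is used instead of an exhaustive search.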

So one must sample them at random. The way to do that is to compare a tree to randomly generated tweaks of it, then repeat starting from a good one of these trees: Markov chain Monte Carlo. I say a good one and not the best one, because always taking the best one would be hill climbing. Using a good one means more sampling of the space of possibilities.
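The "good one, not the best one" rule can be sketched with a bare Metropolis sampler. This is a toy illustration, not what BEAST does internally: the four "trees" T1 to T4, their scores, and the ring-shaped neighbour rule are all made-up stand-ins for real tree space and real tree tweaks.

```python
import random


def metropolis(score, propose, start, steps, rng):
    """Visit states with frequency proportional to score(state).

    `propose` must be symmetric: stepping a -> b is as likely as b -> a.
    """
    current = start
    samples = []
    for _ in range(steps):
        candidate = propose(current, rng)
        # A better candidate is always accepted; a worse one is accepted
        # with probability equal to the score ratio, so it is not pure
        # hill climbing.
        if rng.random() < score(candidate) / score(current):
            current = candidate
        samples.append(current)
    return samples


# Made-up scores for four hypothetical trees arranged on a ring.
SCORES = {"T1": 1.0, "T2": 2.0, "T3": 4.0, "T4": 1.0}
NAMES = list(SCORES)


def neighbour(state, rng):
    """Random 'tweak': step to the previous or next tree on the ring."""
    i = NAMES.index(state)
    return NAMES[(i + rng.choice((-1, 1))) % len(NAMES)]


rng = random.Random(0)
samples = metropolis(SCORES.get, neighbour, "T1", 20000, rng)
freq = {name: samples.count(name) / len(samples) for name in NAMES}
print(freq)
```

The visit frequencies should come out close to the scores divided by their total (1/8, 2/8, 4/8, 1/8), even though the sampler only ever compares neighbouring trees.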

 Computational phylogenetics -  List of phylogenetics software -  List of phylogenetic tree visualization software

Bayes's inverse-probability theorem: for data values D and hypotheses H:

\( \displaystyle{ P(H \text{ if } D) = \frac{ P(D \text{ if } H) P(H) } { P(D) } ,\ P(D) = \sum_H P(D \text{ if } H) P(H) } \)

The P(H) values are the prior values of hypotheses H.
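A small numeric sketch of that formula, which also shows how much the priors can matter. The hypothesis labels, likelihoods, and priors below are invented for illustration and are not taken from the paper.

```python
def posterior(priors, likelihoods):
    """Bayes: P(H if D) = P(D if H) P(H) / sum over H of P(D if H) P(H)."""
    evidence = sum(likelihoods[h] * priors[h] for h in priors)  # P(D)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Made-up P(D if H) values for four hypothetical trees.
likelihoods = {"H1": 0.01, "H2": 0.01, "H3": 0.02, "H4": 0.04}

# Two made-up choices of prior P(H): flat, and one that disfavours H3 and H4.
flat_priors = {"H1": 0.25, "H2": 0.25, "H3": 0.25, "H4": 0.25}
skewed_priors = {"H1": 0.4, "H2": 0.4, "H3": 0.1, "H4": 0.1}

flat_post = posterior(flat_priors, likelihoods)
skewed_post = posterior(skewed_priors, likelihoods)
print(flat_post)
print(skewed_post)
```

With flat priors H4 gets half of the posterior probability; with the skewed priors it gets noticeably less from the same data, which is the kind of prior sensitivity raised earlier in the thread.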
 

lpetrich

Contributor
The development of negation in the Transeurasian languages by Martine Robbeets

A  Negative verb is a verb that expresses negation, with the main verb as its object, like the English auxiliary "do not". Negative verbs are common in the Uralic languages, and also in the Transeurasian ones.

Negative verbs often get turned into auxiliary verbs, and then into particles or prefixes or suffixes. English "not" is a negation particle, and Turkish and Japanese both use negation suffixes.

With a negative verb, the main verb is originally a non-finite (non-inflected) form (participle, infinitive). In Uralic, inflection gradually gets transferred from the negative verb to the main verb, in this order: voice, aspect, mood, tense, person/number, imperative.

MR reconstructs negative verbs Proto-Transeurasian *ana-, Proto-Altaic *e-, Proto-Turkic *ma-

Transeurasian basic verbs: copy or cognate?
"Copy" is a MUCH better word than "borrow", because the original words are unchanged.
Empirically it is observed that languages tend to copy nouns more easily than verbs (e.g. Moravcsik 1975, 1978, Muysken 2000, Wichmann & Wohlgemuth 2008, Wohlgemuth 2009, Matras 2009, Tadmor et al. 2010). From the seventeenth to the nineteenth century, for instance, Japanese underwent intensive contact from Dutch leading to the global copying of over 300 words and the selective copying of syntax, but Japanese did not copy a single verb from Dutch (Irwin 2011). The relative stability of verbs is interrelated with a number of factors, such as the fact that verbal semantics tend to be less culturally determined than the meanings of nouns, that verbs are less perceivable as a distinct unit because they need more adaption to the morpho-syntactic frame of the sentence, and that there simply are less verbs than nouns.
The Swadesh Lists are short on verbs, but a recent list of stable meanings, the Leipzig-Jakarta one, adds a lot of verbs.

Loan verbs in a typological perspective

"Direct insertion" - using a root form directly, "indirect insertion" - using some affix, "light-verb strategy" - using some verb like "to do"

The latter is a common strategy in Transeurasian languages like Turkish and Korean and Japanese.
 

lpetrich

Contributor
 Jespersen's Cycle was described by linguist Otto Jespersen a century ago.

Words for negation get used a lot, so they tend to become weakened. The speakers then press some other word into service to make negation more prominent, and the same thing then happens. Like in French:

Jeo ne dis (Old French)
Je ne dis pas (modern standard French)
Je dis pas (colloquial French)

"Pas" is literally "step", and English words like "no" and "not" have similar origins: "no" is a reduced form of "none", itself from "ne one", and "not" is from "ne aught" ("anything", making "not" literally "nothing").
 

lpetrich

Contributor
A very recent coinage is "to click", meaning to press a mouse button, named for the sound the button then makes.
  1. Spanish: clicar, cliquear, hacer clic ("make click") (Catalan, Portuguese similar)
  2. Greek: κάνω κλικ, κλικάρω
  3. Turkish: klik-le-, klik et-
  4. Japanese: クリックする kurikku suru ("do click")
Sources:
  • 1, 2: "Loan verbs in a typological perspective"
  • 3: "Transeurasian basic verbs: copy or cognate?"
  • 1, 4: click - Wiktionary
Several languages use direct insertion, like German klicken, Danish klikke, Swedish klicka, French cliquer, Italian cliccare, Polish klikać, Thai klik, etc.

Illustration of the three strategies inside English itself:
  • Direct insertion: to click
  • Indirect insertion: to clickify
  • Light verb: to do click

As to word morphology, I recall reading that a little after 1800, Danish linguist Rasmus Rask showed that the Germanic languages were a well-defined family by describing their grammatical similarities, because vocabulary comparisons seemed too prone to borrowing. (BTW, Otto Jespersen was also Danish).

For verbs, their simple past tenses and past/passive participles are very obviously related. There are several classes of "strong" verbs, those with vowel shifts, and one class of "weak" verbs, those with -ed and its cognates.
 