Language as a Clue to Prehistory

Swammerdami · Mar 22, 2025

Ringe, for his classification of IE languages, identified grammar features that were UNLIKELY to be borrowed.
(If there's interest, I'll try to hunt down a reference.)

lpetrich · Mar 22, 2025

The authors used artificial neural networks to fit their data. Though they went into detail on training strategies, some details went unmentioned. Were the ANNs supervised-learning or unsupervised-learning ones? In supervised learning, the ANN has some target that it tries to fit. Unsupervised learning is essentially a search for clusters. Related to that is principal components analysis, a search for the directions in data space with the most variation.

It's apparently supervised learning, going from two-consonant word shape to family membership, as far as I can tell.

They used two lexical databases, Lexibank (Swadesh 100-word) and ASJP (original 40-word), and also a grammatical database, Grambank.

They calculated a baseline case for comparison: ASJP 84%, Lexibank 83%. Training and testing their ANNs, they found ASJP 80%, Lexibank 84%, Grambank 68%, and combined 88%.

They did several tests, like training on the core Indo-European langs and testing on the two outliers, Anatolian and Tocharian. They used Lexibank, because of a lack of coding for Grambank, and they found 100% success.

For Sino-Tibetan, they trained on Tibeto-Burman and tested on Sinitic, the Chinese languages. Lexibank: 97.5%, combined: 98%, Grambank: most members classified as either Hmong-Mien or Austroasiatic. That points to areal effects. Classifiers or measure words are a good example of such an effect.

For Uto-Aztecan, they trained on the southern branch and tested on the northern branch. Lexibank: 40%, Grambank: 10%, usually Cariban (S America), Chibchan (C & S America), or Pama-Nyungan (Australia), combined: 26%.

Then some isolates.

Bangime (Mali, W Africa)

Bangime is classified consistently as Dogon in the Lexibank model (95%) and as Mande (66%) or Atlantic-Congo (29%) in the Grambank model. Consequently, the combined model primarily has Bangime unclassified (86%).

Basque (Pyrenees, SW Europe)

Basque on the other hand remains mostly unclassified in both the Lexibank (46%) and the Grambank model (47%), although the latter also tends to propose an affiliation with Sino-Tibetan (39%). The combined model proposes an affiliation with Indo-European (23%) or Sino-Tibetan (18%) but also includes the unclassified affiliation (32%).

Kusunda (Nepal, Himalayas)

In the Lexibank model, Kusunda is affiliated either with Nuclear Trans-New Guinea (51%), Austroasiatic (19%), or left unclassified (17%). The Grambank model mostly affiliates Kusunda with the Sino-Tibetan family (60%). The combined model mostly proposes no affiliation of Kusunda with other language families (79%).

Mapudungun (SE S America)

Mapudungun is split between several language families in the Lexibank model: Timor-Alor-Pantar (29%), Austronesian (18%), or Unclassified (11%). The Grambank model mostly suggests a Salishan affiliation (68%) or no affiliation (17%). The combined model, again, finds no clear affiliation pattern (88%).

Kusunda - seems like some Trans-New-Guinea speakers went all the way to Nepal, then were influenced by Sino-Tibetan grammar.

The closeness between Basque and Indo-European suggested by the combined model has been recently suggested using both traditional and computational methods Blevins (2018); Blevins and Sproat (2021). The connection of Basque with Sino-Tibetan suggested by the Grambank model and the connection with Nakh-Daghestanian suggested by the Lexibank model would fit with the far-ranging proposal of a Sino-Caucasian macro-family, in which scholars at times include Basque (Starostin, 2017).

I've seen Basque proposed to be recognizably related to Indo-European, but that seems to me very unconvincing. Stronger IMO is its being related to North Caucasian, at least according to John Bengtson's work. JB pointed out cognates in highly-stable vocabulary, and at least in this test, Euskaro-Caucasian seems about as strong as Indo-Uralic.

From our findings, we can make three conclusions, (a) language affiliation achieves promising results even for language relations way back in time, (b) grammar alone is not sufficient for a successful affiliation, and (c) combined models seem to work very well, reflecting that languages are best affiliated by using lexicon plus a bit of grammar.

That's why long-rangers focus on lexicon -- it's hard to find very long-lived bits of grammar.

lpetrich · Mar 22, 2025

Swammerdami said:
I've always been most impressed by the methods and results of Ringe and Warnow. I attach their proposed chronology below.
I see that the initial split in Italic between LA (Latin) and OS-UM occurs at 1300 BC, exactly as lpetrich proposes.

Indo-European and Computational Cladistics by Don Ringe, Tandy Warnow, and Ann Taylor

Some more recent work:

Rapid radiation of the inner Indo-European languages: an advanced approach to Indo-European lexicostatistics by Alexei S. Kassian, Mikhail Zhivlov, George Starostin, Artem A. Trofimov, Petr A. Kocharov, Anna Kuritsyna and Mikhail N. Saenko
Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages by Paul Heggarty, Cormac Anderson, Matthew Scarborough, Benedict King, Remco Bouckaert, Lechosław Jocz, Martin Joachim Kümmel, Thomas Jügel, Britta Irslinger, Roland Pooth, Henrik Liljegren, Richard F. Strand, Geoffrey Haig, Martin Macák, Ronald I. Kim, Erik Anonby, Tijmen Pronk, Oleg Belyaev, Tonya Kim Dewey-Findell, Matthew Boutilier, Cassandra Freiberg, Robert Tegethoff, Matilde Serangeli, Nikos Liosis, Krzysztof Stroński, Kim Schulte, Ganesh Kumar Gupta, Wolfgang Haak, Johannes Krause, Quentin D. Atkinson, Simon J. Greenhill, Denise Kühnert, and Russell D. Gray
Informal review of Heggarty et al. 2023 (Indo-European phylogeny) by Alexei S. Kassian and George Starostin
Phylogeny of the Indo-European languages: state of the art (EAA, Belfast, 2023) by Alexei S. Kassian

Paul Heggarty et al. find somewhat greater ages than the others by a few thousand years, but that may be due to the model for dating that they used.

They all agree on grouping Greek and Armenian together, Anatolian branching off first, then Tocharian. But beyond that, they don't agree very much:

Ringe-Warnow: (Ital, Celt), (Germ, (Gk-Ar, (Bl-Sl, In-Ir) ) )
Heggarty et al.: Gk-Ar, (In-Ir, (Bl-Sl, (Ital, (Germ, Celt) ) ) )
Kassian et al.: Gk-Ar, (Ital, Germ, Celt), (Bl-Sl, In-Ir)

Albanian is rather difficult to place, but it's in Core IE, all of IE but Anatolian and Tocharian. Ringe et al. also found Germanic difficult to place, something that points to a northern-European dialect continuum some 4,500 years ago. Germanic and Balto-Slavic share a word for "thousand": Proto-Germanic *þūsundī, Proto-Balto-Slavic *tū́ˀsantis (> Proto-Slavic *tysǫti) < PIE *tuHsont-. Latin, Greek, and Indo-Iranian, however, point to PIE (*sm-) + *gheslom.

lpetrich · Mar 22, 2025

Swammerdami said:
Ringe, for his classification of IE languages, identified grammar features that were UNLIKELY to be borrowed.
(If there's interest, I'll try to hunt down a reference.)

Is it this paper? Indo-European and Computational Cladistics
On language acquisition,

Importantly for historical linguistics, that process is tightly constrained; for instance, the system of morphosyntactic categories is normally mastered by age four, and native acquisition of a language is virtually impossible after the onset of puberty.

Morphosyntactic: about morphology (word forms) and syntax (word arrangement)

Moreover, it appears that every successful acquisition of a native language gives rise to a robust grammatical 'signature' which persists throughout life. The most important details and their consequences can be summarised as follows.

Recent research on native-language acquisition by children shows that the contrastive system of sounds, the inflectional morphology and the basic syntax of a native language are acquired in the first six or seven years of life, and that 'mixed' grammars are not acquired even in multilingual environments.

and

Recent research in sociolinguistics shows that, while most linguistic structures can be borrowed between closely related dialects, natively acquired sound systems and inflections are resistant to change later in life; attempts to acquire a non-native phonemic contrast, phonological rule or inflectional category are at best only partly successful.

and

Borrowing between speechforms that are not very similar appears to be even more severely constrained, as one would expect. Studies of the bilingual situations in which borrowing occurs show that the phonology and morphosyntax of one's native language are typically carried over into a language learned later in life, but not usually the other way round.

"... one often encounters claims that practically anything can be borrowed into one's native language in a suitable bilingual situation." But that looks at results rather than process.

... morphosyntactic structures even of very different languages can apparently be borrowed into a community's native language in the context of community-wide bilingualism persisting for many generations.

Like Hebrew plurals in Yiddish, a dialect of German.

It seems likely that some first-language learners in such situations misinterpret frequent code-switching as monolingual behaviour and thus learn foreign morphosyntax as part of their native language, and that, given enough time, their analysis can become dominant in the community. However, we must also note that the only study in depth of such a process in progress concludes that even morphosyntactic borrowing of this kind is mediated by lexical borrowing: in effect, 'core' lexemes are borrowed and bring their morphosyntax with them.

...
A surprising number of examples of the supposed borrowing of foreign morphosyntax by native speakers can be reinterpreted rather easily as the influence of a native language on a second language.

That is, the native language in question is a second language for some of its speakers.

But that paper has no list of grammatical features that resist borrowing.

lpetrich · Mar 22, 2025

First, "borrow" and "loan" are bad names for this linguistic effect, since the originals are unchanged. "Copy" or "imitate" would be better.

Revisiting the Borrowability Scale(s) of Free Grammatical Elements: Evidence from Modern Greek Contact induced Varieties in: Journal of Language Contact Volume 12 Issue 3 (2019) - function words, like prepositions, conjunctions, and pronouns.

For instance, borrowability hierarchies lead to predictions that unbound forms are more borrowable than bound ones, lexical items more borrowable than grammatical items, semantically transparent forms more borrowable than semantically opaque ones, etc.

The article quoted several examples.

Whitney (1881):
- Function words: prepositions > conjunctions > pronouns
- Affixes: derivational > inflectional
Haugen (1950): nouns > verbs > adjectives > adverbs, prepositions, interjections
Muysken (1981): nouns > adjectives > verbs > prepositions > coordinating conjunctions > quantifiers > determiners > free pronouns > clitic pronouns > subordinating conjunctions
Matras (2007): Nouns, conjunctions > verbs > discourse markers > adjectives > interjections > adverbs > other particles, adpositions > numerals > pronouns > derivational affixes > inflectional affixes

The order varies somewhat, depending on what was researched. I add Haspelmath, Tadmor, & Taylor's "Borrowability and the Notion of Basic Vocabulary" (2010): Nouns are twice as likely to be borrowed as adjectives, adverbs, verbs, and function words, which are similar in borrowability.

Borrowability and the notion of basic vocabulary | John Benjamins

lpetrich · Mar 22, 2025

The Global Lexicostatistical Database - "Comparative Onomasiological Database for the Ancestral States of Eurasian Linguistic Families" - "Optimized for semantic and lexicostatistical analysis."

Onomasiology - about how to name things. "Lexical" would be a better word, at least in English.

General description by George Starostin; Alexei Kassian; Mikhail Zhivlov; Anna Dybo; Ilya Egorov; Roman Pavlov; Alexander Savelyev; Artem Trofimov.

The 400-item list is a planned expansion to the more "traditional" 100- or 200-item Swadesh wordlist. It is designed as a fixed, relatively universal set of discrete semantic notions, lexical equivalents to which can be located in all or most of the world's languages and reconstructed to all or most protolanguages for which sufficient data from descendant languages are available. (Relatively is an important word here, since even the much condensed 100-item wordlist is known not to be 100% universal, much less so any expanded version of it; nevertheless, at the very least the lexical items found on the list correspond to objects, processes, and qualities found across all of the world's regions and possibly pertaining to all types of societies).

This database also includes semantic shifts. This is necessary for long-distance comparisons, to increase the amount of data available.

List of basic semantic concepts (from the same 400-item list) that are known to be connected to the current meaning through "trivial" semantic connections (relatively simple metaphors and metonymies, synchronically or historically attested or reliably reconstructed for at least two or more linguistic situations).

Thus trying to keep the semantic spread from being too great, for maintaining falsifiability. Something additionally useful would be to have an estimate of how many times each shift has happened in one's reference sample. One can then weight an amount of semantic match by how common a shift is.
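That weighting idea can be sketched roughly as follows. The shift counts here are invented for illustration and are not taken from the database, and the function name `match_weight` is hypothetical. The scheme simply scales each shift by the most common shift in the sample, which is only one of several ways this could be done.

```python
# Hypothetical counts of attested semantic shifts in a reference sample.
# These numbers are made up for illustration only.
shift_counts = {
    frozenset(["water", "rain"]): 12,
    frozenset(["ear", "hear"]): 20,
    frozenset(["blood", "red"]): 8,
}

def match_weight(meaning_a, meaning_b, counts):
    """Weight of a semantic match between two meanings.

    Identical meanings get 1.0. A known shift gets its count divided by
    the count of the most common shift. An unattested shift gets 0.0.
    """
    if meaning_a == meaning_b:
        return 1.0
    most_common = max(counts.values())
    return counts.get(frozenset([meaning_a, meaning_b]), 0) / most_common
```

With such weights, a comparison like "water" with "rain" would count for less than an exact match but more than a shift that was never seen in the reference sample.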

How the words were classified:

N = "noun" (186 terms), V = "verb" (132 terms), A = "adjective / descriptive verb" (63 terms), P = "pronoun" (13 terms, including personal, deictic, and interrogative pronouns), Q = "quantifier" (5 terms, including numerals and such quantifier terms as 'all', 'many', etc.), C = "particle / clitic" (only for the word 'not').

It's still a work in progress, and all commentary in the database is still only in Russian, though the authors hope to eventually make English versions.

lpetrich · Mar 22, 2025

400-item basic wordlists for the Ancestral States of Eurasian Linguistic Families : Query result -- all in one page
and
Search for data in: 400-item basic wordlists for the Ancestral States of Eurasian Linguistic Families

Using reconstructed protoforms for Indo-European, Uralic, Yukaghir, Turkic, Mongolic, Tungusic, Korean, Japanese, Dravidian, Kartvelian, Eskimo, Aleut, Chukchee-Koryak, Itelmen, Nivkh, Yeniseian, Chinese.

Using each meaning's list of semantically-related meanings, I constructed a graph, a network of how these meanings are related.

Many of them are very straightforward, like day - Sun, house - nest, forget - lose, lick (v.) - tongue, fly (n.) - bee - honey, stomach - belly - intestines, ... but there were also larger ones, like a triangle of black - dark - night, with branches off of it black - coal, black - dirt - mud, night - evening, night - bat - mouse, and a huge one with many of the meanings. BTW, the Russian word for bat (animal) is literally "flying mouse".

I shrank it down by selecting out the meanings with lots of other meanings connected to them, and I still got some big networks. Here are some: green - new - white - shine, and raw as a central point, with raw - meat - eat - bite, raw - sour - bitter, raw - wet - rain - water.

I next found the meanings with the best-connected neighbor meanings, and made a graph out of that. I found a line, walk - come, with branches walk - live, walk - foot, walk - go - exist, come - enter, come - send, come - arrive, and also a hub at two, with branches two - twins, two - pair, two - four, and two - other - friend.
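The graph construction described in this post can be sketched in a few lines. This is a minimal illustration using only some of the meaning pairs mentioned above as edges. The helper names `build_graph` and `connected_components` are my own, and the real analysis used the full 400-item list.

```python
from collections import defaultdict

# A few of the meaning pairs mentioned above, as undirected edges.
edges = [
    ("day", "Sun"), ("house", "nest"), ("forget", "lose"),
    ("black", "dark"), ("dark", "night"), ("black", "night"),
    ("black", "coal"), ("black", "dirt"), ("dirt", "mud"),
    ("night", "evening"), ("night", "bat"), ("bat", "mouse"),
]

def build_graph(edge_list):
    """Adjacency sets for an undirected graph of meanings."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph):
    """Split the graph into its separate networks with a depth-first walk."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

graph = build_graph(edges)
# Largest networks first, and meanings ranked by number of neighbors.
components = sorted(connected_components(graph), key=len, reverse=True)
hubs = sorted(graph, key=lambda m: len(graph[m]), reverse=True)
```

Here the simple pairs come out as two-member components, while the black - dark - night triangle and its branches form one larger network, with "black" and "night" as the best-connected meanings.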

lpetrich · Apr 2, 2025

How they are related:

Macro-Uralic: Uralic, Yukaghir
Indo-Uralic: Indo-European, Macro-Uralic
Core Altaic: Turkic, Mongolic, Tungusic
Broad Altaic: Core Altaic, Korean, Japanese
Chukotko-Kamchatkan: Chukchee-Koryak, Itelmen
Eskimo-Aleut: Eskimo, Aleut
Eurasiatic: Indo-Uralic, Broad Altaic, Chukotko-Kamchatkan, Nivkh, Eskimo-Aleut
Nostratic: Eurasiatic, Kartvelian
Dene-Caucasian: Yeniseian, Chinese

Of these, Macro-Uralic, Indo-Uralic, Core Altaic, and CK-Nivkh have some statistical support.

lpetrich · Apr 13, 2025

I looked for cognates represented in at least one of Indo-Uralic and Core Altaic, being represented in at least one language family each, and I tried to stick to meanings in the Swadesh list. I found a sizable number of putative cognates:

IE *wed- "water" -- Ural *weti id -- Tung *udun "rain"
IE *klew- "to hear" -- Ural *kuwli, *kule- id -- Turk *kulkak "ear" -- Mong xulxi "earwax, inner ear" -- Tung *xül- "to sound"
IE *(s)kwalos "big fish" -- Ural *kala "fish" -- Tung *xolsa id
IE *ei- "to go" -- Ural *aja- "to drive, chase" -- Mong ajan "journey, campaign" > "to travel" -- other Altaic cognates?
Mong *hünesün "ashes" -- Tung *pulnje- id -- IE *pel- "flour, dust" -- Ural *pelme "dirt, dust, ashes"
Turk *kara "black" -- Mong *kara id -- Jap *kurua id -- IE *kers- id
Turk *êr "man (male)" -- Mong *ere id -- IE *wersen- id -- Uralic *urV id
Mong *hulahan "red" -- Tung *pula- id -- Kor mok -- IE *pelH- "pale, gray" -- ? Ural *pil'me "darkness"
Mong *kele(n) "tongue" -- Tung *xilngü id -- Ural *kele id -- IE *kelh1- "to call, shout"
Turk *kûrt "worm" -- Mong *kora id -- Tung *xirga- "fly (insect)" -- Ural *karmV id -- IE *(k)wrmis "worm"
Kor kurum "cloud" -- Jap kumua id -- Ural *kumV id -- ? IE *kemer/n- (Germanic *himinaz) ? IE *h2akmon- "stone"
Kor pir "fire" -- Jap *pëi id -- IE *peh2wr/n- id
Kor mân "many" -- Jap *mana id -- Ural *mone "some, many" -- IE *monoghos "many, much", *megh2- "big, great"
IE *esHr/n- "blood" -- Tung *sekse id -- Turk *sag "healthy" -- Mong *sayin "good"
IE *krewH- "blood" -- Turk *kïr'ïl "red" -- Kor guri "copper"
IE *ed- "to eat" -- Mong *ide- id -- Turk *etmek "bread"
Kor ani "not" -- Jap *an id -- Tung *ana id -- IE *ne id -- Ural *ne- id
Turk *ek(ki) "two", *ek(k)ir' "twin" -- Ural dual ending -kë

I found a sizable tilt toward the more stable words, as for Indo-Uralic and Core Altaic separately. So this means some support for Eurasiatic or "Northern Nostratic".

lpetrich · Apr 13, 2025

Possible cognate of IE *(s)kwalos "big fish" > "whale (pseudofish)"
Mong xalim (*kalimu) "whale" -- Even *xalim id -- Nivkh qalm id
I suspect some borrowing, because large fish and pseudofish are something that inland people are unlikely to have much direct experience of, unless they live near big rivers or big lakes.

Wiki (Aikio):
IE *h2ayer/n- "day" -- Ural *kaja "dawn, Sun"
https://en.wiktionary.org/wiki/अहर्#Sanskrit - ahar
Starling:
IE *gwhai- "light, bright" -- Ural *koje "dawn" -- Turk *kün "day, Sun" -- Mong *gegheghe "dawn" -- Tung *gianja- "dawn"
Greek phaios "gray", Lithuanian gaisas "glow, light, beam", Latvian gaiss "bright, well lit"

Wiki (Aikio):
IE *h2ag- "to drive" -- Ural *aja "to drive, flee"
Starling:
IE *ei- "to go" -- Ural *aja- "to drive, chase" -- Mong ajan "journey, campaign" > "to travel" -- other Altaic cognates?

This is not in any of the lists of meanings with highly-stable word forms that I've found, but it's well-preserved over Eurasiatic:

IE *plusis "flea" -- Turk *bürge, *bürtSe -- Mong büüreg (*bürge) -- Kor byeoruk (*pjërok)
The IE forms have a lot of consonant shuffling (taboo deformation?), though the meaning is well-preserved.
*plusis > Sanskrit plusi
*plus-ek- > *puslek- > Latin pûlex, -icis
*plow-kos > Proto-Germanic *flauhaz
*plus-y(e)h2 > *psuly(e)h2 > Greek psulla
*b(h)lus-eh2 > Proto-Balto-Slavic *bluSâ > Proto-Slavic *bluxâ
*pluso- > Armenian *lu, *luo-
*plews- > Albanian plesht
The Turkic and Mongolic forms look much alike, close enough to suspect borrowing. This is a bugbear of Altaic linguistics: Turkic, Mongolic, and Tungusic speakers living close together, but without doing many nontrivial sound shifts. Thus, inheritance and borrowing can be hard to disentangle. This is something that is often much easier in Indo-European linguistics, where one can easily sort out "father" vs. "paternal", for instance.

About blood, PIE had two words for it:
Constituent - associated with one's nature and ancestry
*esHr/n- -- Tung *sekse id -- Turk *sag "healthy" -- Mong *sayin "good"
Released -- associated with violence
*krewH- -- Turk *kïr'ïl "red" -- Kor guri "copper"

Negation
Kor ani "not" -- Jap *an id -- Tung *ana -- IE *ne id -- Ural *ne- id

Numerals
"one" Turk *bir -- Jap *pitä

"two"
Turkic *ek(ki) "two", *ek(k)ir' "twin"
Mongolic *ikire "twin" (borrowed from Turkic?)
Uralic *-kë (dual)
Eskimo-Aleut *-k (dual)
Indo-European *-H (dual)

lpetrich · Apr 13, 2025

I made the same plot of (meaning index) - (meaning stability index) and I found that Indo-Uralic-Core-Altaic does well, roughly comparable to Indo-Uralic and Core Altaic separately.

I allowed for some semantic shifts, like "ear", "to hear", "to sound" -- "dust", "ashes" -- "tongue", "to call, shout" -- "blood", "red" -- "water", "rain" -- "worm", "fly (insect)" -- "to eat", "bread".

Pronouns: 1sg (I/me), 2sg (thou), "that", "who?"

Sources:

Wiktionary, the free dictionary and Indo-Uralic languages
Proto-Indo-European-Uralic comparison from the probabilistic point of view [JIES 43, 2015]
Permutation test applied to lexical reconstructions partially supports the Altaic linguistic macrofamily | Evolutionary Human Sciences | Cambridge Core
Greenberg - Indo-European and Its Closest Relatives - The Eurasiatic Language Family, Vol. 1 - Grammar (2000) : Allan R. Bomhard : Free Download, Borrow, and Streaming : Internet Archive
Greenberg - Indo-European and Its Closest Relatives - The Eurasiatic Language Family, Vol. 2 - Lexicon (2002) : Allan R. Bomhard : Free Download, Borrow, and Streaming : Internet Archive
Databases at The Tower of Babel

So satisfying to more-or-less demonstrate what I'd read about long ago in the collections of papers in Vitaly Shevoroshkin's books, also available at the Internet Archive.

lpetrich · Apr 14, 2025

Starling: database software at the Tower of Babel site. Don't be afraid of that name; its authors consider that story to be pure mythology, a Just So Story about why people speak many languages. The God of the Bible comes off as a dick in that story, as he does in a lot of other places in the Bible. Nothing like "I see what you are trying to do. You won't be able to reach me no matter how high you build. But don't feel too bad. It's a great achievement to build a tower so high." Instead it's "I feel affronted by all you people building a tower to try to reach me. So I'll make you all speak different languages so you can't coordinate your efforts."

Google Translate of "I need some bricks" into present-day languages in the Middle East and nearby:

Hebrew: Ani tzrich kma levanim.
Arabic: 'Ahtaj 'iilaa baed altuwb.
Somali: Waxaan u baahanahay xoogaa leben.
Greek: Chreiázomai meriká toúvla.
Romanian: Am nevoie de niște cărămizi.
Ukrainian: Meni potribno trokhy tsehly.
Persian: Man bah moghdari ajer niaz daram.
Turkish: Tuğlaya ihtiyacım var.
Georgian: Ramdenime aguri mch’irdeba.
Chechen: Suna cẋacca daxça öşu.
Abkhaz: AţŹamcķəa sţahup.

"Bricks!" alone:

Hebrew: Levanim!
Arabic: Tub!
Somali: Leben!
Greek: Toúvla!
Romanian: Cărămizi!
Ukrainian: Tsehla!
Persian: Ajer!
Turkish: Tuğlalar!
Georgian: Aguri!
Chechen: Daxça!
Abkhaz: AţŹamcķəa!

lpetrich · Apr 18, 2025

Fish vs. whale -- from constructing that list of Northern Eurasian cognates

"Fish"

Indo-European
- *dhghuHs > Greek ikthus, Old Armenian dZukn, Balto-Slavic *zûs
- (Western) *peisk- > Italic (Latin piscis), Celtic, Germanic *fiskaz > E "fish"
- Balto-Slavic *ryba
- Indo-Iranian *matsyas
Uralic *kala
Turkic *bâlik
Mongolic *dZigasun
Tungusic *xolsa
Chukchi ynneen
Nivkh tS'o

"Whale"

Indo-European *(s)kwalos "big fish" > Latin squalus "big sea fish", Old Prussian kalis "sheatfish" (big catfish), Germanic *hwalaz "whale" > E "whale", Finnish valas, ...
Latin ballaena ~ Greek phallaina -- > Italian balena (> Turkish balina), French baleine, ...
Greek kêtos > Old Church Slavonic kitu > Russian kit > Kazakh kit, ...

Words for this animal have been borrowed a lot, something typical of words for exotic animals: "camel", "elephant", ...

Going from Western to Eastern Eurasia, we find

Mongolian xalim
Even, Evenki (Tungusic) kalim
Nivkh qalm "small whale"
Nivkh keng "medium-sized whale"

Seems like Nivkh or coastal Tungusic speakers first coined a word for this animal, then this word spread inland. A word like *kal-im-

Source: Wiktionary, the free dictionary for "fish" and "whale"

lpetrich · Apr 19, 2025

Reconstructing the Proto-Afroasiatic Pronouns – A New Approach – Linguistics and Nonsense

About THIS THING

A page that looks at historical linguistics (with a focus on the Afroasiatic Hypothesis of the origin of Proto-Indo-European). I occasionally delve into interesting parts of history, generally interesting stuff, and of course a bit of nonsense.

Proposes
1s /an-V-k- ... 1p /an-nV-k- ...2s /an-tV-k- ... 2p /an-tVN-k-
with both a prefix, /an-, and a suffix, -k-. I'm using / instead of a dotless ? to represent a glottal stop.

To check this out, I consulted Vaclav Blazhek's work: Afroasiatic Personal Pronouns - in: Diachronic Perspectives on Suppletion 2019 - somewhat complicated. The pronouns have independent and object forms, and second and third persons are gendered.

1s ind (/an-)/aku ... obj /ya/i/u ... 1px ind (/an-)hina/u ... obj na/i/u ... 1pi ind (/an-)muni ... obj muni
2sm ind (/an-)ta ... obj ku ... 2sf ind (/an-)ti ... obj ki
2pm ind (/an-)tunwa ... obj kunwa ... 2pf ind (/an-)tinya ... obj kinya
3sm ind Suwa ... obj Su ... 3sf ind Siya ... obj Si
3pm both Sunwa ... 3pf both Sinya

S = English "sh"

This may be summarized as

1s ind */aku ... obj */i- ... plural added to object form ... is independent form suffixed? /a-ku
2x ind *t- ... obj *k-
3x *S-
Plural: *-n
Gender: masc *-u, fem *-i
Prefix for 1x, 2x independent: */an-

From A Concatenative Analysis of Diachronic Afro-Asiatic Morphology by David Wilson:

1s /V- ... 1p nV- ... 2x tV- ... 3sm, 3p yV- ... 3sf tV-
1s -(â)ku ... 1p -(â)nV ... 2s -(â)tV ... 2p -(â)tVn ... 3xm - ... 3xf -, -t

lpetrich · Apr 22, 2025

Here, "I have some bricks."

Hebrew: Yesh lei kma levanim.
Arabic: Ladaya baed altuwba.
Somali: Waxaan haystaa xoogaa leben.
Greek: Écho meriká toúvla.
Romanian: Am niște cărămizi.
Ukrainian: U mene ye kilʹka tsehlyn.
Persian: Man chand ajer daram
Turkish: Benim birkaç tuğlam var.
Georgian: Me makvs ramdenime aguri.
Chechen: Cẋacca daxçanaş du san.
Abkhaz: Sara aķʹyrmytķəa symoup.

lpetrich · Apr 22, 2025

Anachronistic matches between Proto-Turkic, Proto-Mongolic and Proto-Tungusic - "The rough statistics suggests that prehistoric contacts between Proto-Turkic and Proto-Mongolic were much more intense than Proto-Mongolic—Proto-Tungusic contacts."

Some items in that list are obvious borrowings, like "wheel spoke" and "plow" and "bridle" and "saddle" and words for various domestic animals.

Nuclear Altaic phylogeny (Turkic, Mongolic, Tungusic): comparing reconstructed Swadesh wordlists of three proto-languages [WAC 2022]

The hypothetical Altaic a.k.a. Transeurasian macro-family consists of the nuclear families: Turkic, Mongolic, Tungusic, and the outliers: Korean and Japonic. The genealogical relationships between Turkic, Mongolic, Tungusic are obscured by prehistoric and later contacts. The working hypothesis of our Moscow team is that the genealogical filiation is [Turkic [Mongolic, Tungusic]] with intense post-split contacts between Turkic and Mongolic in the 1st millennium BC.

That latter time is around when Turkic speakers started spreading out of their Central Asian homeland.

Number of matches:

Turk - Mong, Turk - Tung, Mong - Tung
110 words: 12%, 10%, 13%
400 words: 37%, 22%, 33%

The numbers for the 400-word list are much less "flat" than those for the 110-word list. This suggests a pattern of borrowing:

Turk <-> Mong
Mong <-> Tung

meaning that the homelands of these three language families were likely in a chain. From west to east: Turk, Mong, Tung.
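The reasoning above can be made explicit with a little arithmetic. Treating the difference between the 400-word and 110-word percentages as a rough proxy for borrowing is my own simplification, not the authors' method. The pair with the smallest difference is then read as the two ends of the chain.

```python
# Shared-word percentages quoted above, per pair of proto-languages.
matches = {
    ("Turk", "Mong"): {"110": 12, "400": 37},
    ("Turk", "Tung"): {"110": 10, "400": 22},
    ("Mong", "Tung"): {"110": 13, "400": 33},
}

# Extra matches picked up by the longer, less stable list (percentage points).
excess = {pair: pct["400"] - pct["110"] for pair, pct in matches.items()}

# The pair with the smallest excess is taken as the two ends of the chain,
# and the remaining family as the one in the middle.
ends = min(excess, key=excess.get)
middle = ({"Turk", "Mong", "Tung"} - set(ends)).pop()
```

The excesses come out as 25 points for Turk - Mong, 20 for Mong - Tung, and 12 for Turk - Tung, which puts Mongolic in the middle.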

lpetrich · Apr 22, 2025

Proto-Indo-European and Proto-Uralic among other proto-languages of Eurasia: a lexicostatistical evaluation
A slideshow.

Lexicon -- Morphology -- Stability:
Cultural lexicon -- «Peripheral» grammatical affixes -- Low
«Peripheral» basic lexicon -- «Core» grammatical affixes -- High
«Core» basic lexicon -- «Fossilized» affixes -- Highest

Author George Starostin concedes that we are not likely to get very much over very long distances, certainly not as much as for Indo-European or Austronesian, for instance, both mid-Holocene.

His conclusions:

1. All formal tests dealing with «core» basic lexicon indicate a connection between Proto-Indo-European and Proto-Uralic.

2. The simplest and most logical explanation of this connection, given the nature of the data, is descent from a common ancestor.

3. Comparison of «automated» vs. «manual» results shows that the «strength» of the binary Indo-Uralic connection is more impressive than the «strength» of the larger Nostratic hypothesis, but is formally comparable with the «strength» of such deep level connections as Altaic or Vasco-Caucasian.

Altaic and Vasco-Caucasian are both Early Holocene, and that would make Indo-Uralic also Early Holocene.

lpetrich · Apr 22, 2025

Typological Expectations and Historic Reality: Once Again on the Issue of Lexical Cognates between Indo-European and Uralic

Discusses dividing the Swadesh list into more-stable and less-stable halves, and using Aharon Dolgopolsky's simplified-phonology method. Also of interest are true cognates that were not detected by this method: false negatives.
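As a rough sketch of how a simplified-phonology comparison of this kind works: each consonant is reduced to a broad class, and two words count as a match when their first two consonant classes agree. The letter-to-class table below is a much-reduced assumption for ASCII transcriptions, not Dolgopolsky's full published table, and the function names are hypothetical.

```python
# Simplified consonant classes loosely following Dolgopolsky's scheme.
# This table is an assumption for plain ASCII transcriptions only.
CLASSES = {
    "P": "pbfv", "T": "td", "S": "sz", "K": "kgxqc",
    "M": "m", "N": "n", "R": "rl", "W": "w", "J": "jy",
}
LETTER_TO_CLASS = {ch: cls for cls, letters in CLASSES.items() for ch in letters}

def consonant_skeleton(form, length=2):
    """Classes of the first `length` consonants.

    Vowels and any symbol missing from the table are skipped.
    """
    classes = [LETTER_TO_CLASS[ch] for ch in form.lower() if ch in LETTER_TO_CLASS]
    return "".join(classes[:length])

def is_match(form_a, form_b):
    """True if both forms have two consonants and the same class skeleton."""
    a, b = consonant_skeleton(form_a), consonant_skeleton(form_b)
    return len(a) == 2 and a == b
```

For example, `is_match("wed", "weti")` and `is_match("kala", "xolsa")` both hold under this table, while `is_match("kala", "balik")` does not. Requiring two consonants is a design choice here; words with only one consonant are simply left unmatched.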

Lower 50 -- 18 -- 4: ‘foot’, ‘new’, ‘tooth’, ‘two’ -- 14: ‘nail’, ‘ear’, ‘eye’, ‘heart’, ‘horn’, ‘I’, ‘name’, ‘night’, ‘star’, ‘sun’, ‘thou’, ‘tree’, ‘what’, ‘who’
Upper 50 -- 4 -- 2: ‘feather’, ‘stand’ -- 2: ‘knee’, ‘sand’

Indo-Uralic:

Lower 50 -- 7: ‘I’, ‘thou’, ‘name’, ‘hear’, ‘water’, ‘who’, ‘drink’
Upper 50 -- 5: ‘cold’, ‘lie (repose)’, ‘many’, ‘that’, ‘give’

Albanian: from Latin-Romance, Slavic

Lower 50 -- 7: ‘bone’, ‘foot’, ‘hair’, ‘head’, ‘hear’, ‘smoke’, ‘tail’
Upper 50 -- 14: ‘bark’, ‘come’, ‘feather’, ‘fish’, ‘fly’, ‘green’, ‘liver’, ‘many’, ‘red’, ‘road’, ‘round’, ‘sand’, ‘swim’, ‘yellow’

Brahui: from Indic, Iranian

Lower 50 -- 8: ‘bone’, ‘dog’, ‘egg’, ‘leaf’, ‘louse’, ‘star’, ‘tooth’, ‘tree’
Upper 50 -- 17: ‘breast’, ‘bark’, ‘cloud’, ‘earth’, ‘fat’, ‘feather’, ‘fish’, ‘full’, ‘good’, ‘knee’, ‘liver’, ‘many’, ‘person’, ‘root’, ‘round’, ‘sand’, ‘swim’

So Indo-Uralic looks like it has some common ancestry.
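The contrast behind that conclusion can be put in numbers. This small sketch assumes that "Lower 50" is the more stable half of the list, as the contrast with the loanword cases suggests, and uses only the three labeled sets of counts above.

```python
# (lower-50 matches, upper-50 matches) as listed above.
counts = {
    "Indo-Uralic": (7, 5),
    "Albanian loans": (7, 14),
    "Brahui loans": (8, 17),
}

def stable_share(lower, upper):
    """Fraction of all matches that fall in the lower (more stable) half."""
    return lower / (lower + upper)

shares = {name: round(stable_share(*pair), 2) for name, pair in counts.items()}
```

The Indo-Uralic matches lean toward the stable half (about 0.58), while the known borrowings into Albanian and Brahui lean the other way (about 0.33 and 0.32).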

lpetrich · May 1, 2025

Schleicher's fable - classic example of connected text in reconstructed Proto-Indo-European. Many of that article's versions in other languages include translations of the fable into those languages.

A Song of Sheep and Horses: eurafrasia nostratica, eurasia indouralica - development of Indo-European and Uralic with translations of August Schleicher's fable along the way.

Translations into the Big Three of Indo-European studies: Ouis equique (Versio Latina) | Memiyawanzi and ὄϊς ἵπποι τε (Versio Graeca) | Memiyawanzi and अविको ऽश्वाश्च (Versio Sanscrita) | Memiyawanzi

Schleicher's Fable in Old Enlgish : r/OldEnglish

Schleicher's Fable collection - Pastebin.com

lpetrich · May 2, 2025

Differential object marking - in many languages, direct objects are marked differently from subjects only some of the time. This is often from prominence, something that has two hierarchies in it:

Animacy: human > nonhuman animate > inanimate
Definiteness or specificity: personal pronoun > proper name > definite NP > indefinite specific NP > non-specific NP

NP = noun phrase

What's in the second hierarchy: the dogs bark > some dogs bark > dogs bark.

In the first hierarchy, Slavic languages with noun cases split their main masculine declension into animate and inanimate ones, with the animate accusative having the form of the genitive case and the inanimate accusative having the form of the nominative case. "Animate" and "inanimate" are in an informal sense, with plants often being inanimate.

Also in the first hierarchy, in every Indo-European language with a neuter or inanimate gender, the accusative case of that gender has the form of the nominative case: the "Neuter Law".

Turning to the second hierarchy, English has pronouns with separate object marking while nouns don't: I/me, we/us, thou/thee. This is also true of most other recent Germanic languages, most Romance languages, and Bulgarian. Exceptions: Icelandic and the older Germanic languages (Old English, Old Norse, Gothic), and Old French. Seems like incomplete loss of noun cases.

Most noun cases correspond to prepositions, and also postpositions, collectively adpositions. A postposition that becomes attached makes a noun case, and many languages have very modular cases - one case ending for each meaning that is attached after other endings. Indo-European is weird in having singular and plural cases that look very different.

Some languages have a version of the second hierarchy, having a separate accusative form only for a definite direct object. An indefinite direct object gets the nominative form. Examples are Turkic languages and Mongolian:

Turkic: nom -, def acc -i
Mongolian: nom -, def acc -gi

Yes, a zero ending on the nominative. Indo-European is an odd one out for having nonzero nominative singular endings.

Some languages have accusative prepositions. I must concede that that is like calling "of" the genitive preposition, "to" the dative preposition, "at" the locative preposition, "in" the inessive preposition, "on" the superessive preposition, "with" the instrumental-comitative preposition, "from" the ablative preposition, ...

Hebrew is one of them; it has <et>, a definite accusative one, for definite direct objects. It appears in places like Genesis 1:1.

Spanish also has an accusative preposition, the "personal <a>". This preposition usually means "to" and sometimes "at", but it is also used as a definite accusative preposition for people and pets, thus being in both hierarchies. It was easy to find discussions of this grammatical feature online, I must say, so I read several pages before coming to this conclusion.
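The two hierarchies can be sketched as ordered lists with a cutoff on each: an object gets marked when it sits at or above both cutoffs. The cutoffs below for Hebrew and English are rough assumptions for illustration, and the names `is_marked`, `HEBREW_ET`, and `ENGLISH` are hypothetical. Real systems have exceptions that this does not capture, such as the Spanish treatment of pets or English "you" and "it".

```python
# The two prominence hierarchies, ordered from lowest to highest.
ANIMACY = ["inanimate", "nonhuman animate", "human"]
DEFINITENESS = [
    "non-specific NP", "indefinite specific NP", "definite NP",
    "proper name", "personal pronoun",
]

def is_marked(animacy, definiteness, min_animacy, min_definiteness):
    """True if the object is at or above both cutoffs on the hierarchies."""
    return (ANIMACY.index(animacy) >= ANIMACY.index(min_animacy)
            and DEFINITENESS.index(definiteness) >= DEFINITENESS.index(min_definiteness))

# Rough cutoffs, as assumptions for illustration:
# Hebrew <et>: any animacy, definite NP or higher.
HEBREW_ET = ("inanimate", "definite NP")
# English object forms: personal pronouns only.
ENGLISH = ("inanimate", "personal pronoun")
```

Under these cutoffs, `is_marked("inanimate", "definite NP", *HEBREW_ET)` holds, while `is_marked("human", "proper name", *ENGLISH)` does not.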
