Language as a Clue to Prehistory

Swammerdami · Jul 6, 2025

Thirty languages are spoken in Thailand, according to Ethnologue, including eight languages in the Southwestern Tai subfamily. Among these, the most prominent are Central Thai (Siamese), Northern Thai (Lanna), and Isan (Lao). This variety persists even though government schools have taught only Central Thai for decades. Northern Thai is not just a dialect of Central Thai: it is a distinct language with different tones, a different script, different words and different idioms.

My children left Central Thailand several years ago to study at Chiang Mai University and have since picked up a lot of Northern Thai. I just had a conversation with my daughter that I found interesting. Northerners avoid Lanna words when speaking to a non-Lanna, but are happy to use Central Thai words with Northern meaning: because the words are Central Thai words, they think they're speaking Central Thai!

For example, the query "Where are you?" is (word-by-word translation) "Stay where?" in Central Thai but "Have where?" in Lanna. The Thai word "Have" gets used by Northerners in a lot of contexts incompatible with Central Thai. Wiktionary has some Northern Thai words -- here's /mii/ "to have" written with an obsolescent(?) alphabet.

The weirdest example my children give is จะไปไปไป /ja-pai-pai-pai/, literally "will go go go" in Central Thai. But /ja-pai/ "will go" means "Forbid" in the North! So /ja-pai-pai/ means "Do not go." The third /pai/ in the sentence is for emphasis.

Jokodo · Jul 12, 2025

Swammerdami said:
There are wild(?) conjectures connecting the Huns to Scandinavia, perhaps as a result of Gothic warriors mingling with Hunnic warriors. Here's a recent paper on that theory.

Here are some even wilder(?) musings claiming

... the archaeological record indicates a significant change in religious practices occurring in Scandinavia in the 5th Century AD. The dramatic alterations in artifact assemblages and burial practices strongly point to a change “coming from the south around 450 or a little earlier” (Brandt, p.30) a people who would have a significant impact on all aspects of life in Scandinavia – and yet it goes unnoticed in the standard texts.
This period in time marked the beginning of a change in the centers of power to Gamla Uppsala and Southeastern Bornholm. The thesis of the present study is that these changes were initiated by the arrival of Uldin / Odin and his mixed Ostrogoth / Herul and Hun / Alanic forces who established new dynasties and brought with them the unmistakable Y chromosome DNA signatures of Central Asia.

Snorri Sturluson's Heimskringla, particularly the Ynglinga saga, portrays Odin as a historical, albeit legendary, king of the Swedes. The saga also mentions Odin having a son named Sigi, who is said to have become the king of the Huns. IIUC, some of Snorri's writings show Odin as coming from Asia and bringing a new religion to Scandinavia. Odin may even be cognate to Uldin, Attila's predecessor. I don't think Odin/Wodan cognacy rules this out -- couldn't there be some conflation?

A pdf of "Scandinavia and the Huns: An Interdisciplinary Approach to the Migration Era" is on-line and free

... the introduction of the caftan as the distinct male warrior-dress in Scandinavia from the late fifth century has recently been ascribed to direct Asiatic influence on Scandinavia (Mannering 2006:197f). ... it has been argued that the type of saddle known from the chiefly burials at Vendel and Högom in Sweden was brought to Europe by the Huns and the Avars (Engström 1997:248f).

I did some digging. Apparently the most commonly accepted interpretation of Snorri's references to Asia is that this was one of the ways he and his contemporaries (he wrote in the 13th century) coped with the dilemma of keeping the old stories alive (and in Snorri's case, preserving them for posterity in writing) while being good Christians. A quarter millennium after the Christianisation of Scandinavia and Iceland, reconceptualising the gods of old as semi-historical heroes from a far-away land could have been their way to justify talking about them.

Another thing to note is that claiming a founding figure from the east was actually a fairly common cliché in late Iron Age to medieval (western) Europe. We have Aeneas of Troy as the mythical forefather of Romulus and Remus, Brute of Troy in Britain, and Anglo-Saxon chronicles that claim the Picts descend from Scythians. Snorri could have easily picked up elements from such stories. Now, this meme itself *could* represent a distorted memory of real migrations much deeper in prehistory - of Yamnaya-related groups in the early Bronze Age, or even of Anatolian farmers in the Neolithic period. But if so, the timelines do not add up if we attempt a literal interpretation of the stories.

One thing that's almost certainly an anachronistic addition is the equation Æsir~Asia. "Asia" was originally a specifically Greek term for a specific region in what's now Western Turkey (though its ultimate etymology is apparently unclear). It's unlikely its use, and in particular its generalised use referring to more or less what we call Asia today, would have been familiar in Scandinavia before Christianisation and the increased reliance on Greco-Roman tropes it brought with it, so *even if* the Æsir's conception in pre-Christian Scandinavia was partly based on the historical Huns, this one is almost certainly a 13th century add-on. No Old Norse or Proto-Norse speaker of the 9th or 5th century would have plausibly used a term related to Greek "Asia" to refer to someone from what we call Central Asia.

On the other hand, Snorri's account is pretty specific. It places the Æsir east of the river Tanais (the Don) and the Vanir west of it, and taken literally puts us in the 5th century. It's certainly possible that some version of oral history pertaining to the events in the region involving Huns and/or Goths was amalgamated with much older stories.

lpetrich · Jul 21, 2025

A remarkable kind of archeological evidence has emerged over the last few decades: sequencing genetic material from old human bones. This has enabled new precision in tracking down the ancestry of long-ago people, including testing hypotheses of long-ago migrations. The people who migrated may have spread various languages, and that in turn is a hypothesis that can be tested.

For instance, Long shared haplotypes identify the Southern Urals as a primary source for the 10th century Hungarians | bioRxiv - "Our results tightly link the Magyars to people of the Early Medieval Karayakupovo archaeological horizon on both the European and Asian sides of the southern Urals." - what one would expect from the Uralic nature of their language, Hungarian.

This is 10 years old, but still good: Massive migration from the steppe was a source for Indo-European languages in Europe - PubMed

We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000-5,000 years ago. At the beginning of the Neolithic period in Europe, ∼8,000-7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ∼24,000-year-old Siberian. By ∼6,000-5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ∼4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ∼75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ∼3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin of at least some of the Indo-European languages of Europe.

Europeans are descended from three main populations: Terminal Paleolithic settlers, Middle Eastern farmers, and early Indo-European speakers from what is now Ukraine and nearby. Why did the IE speakers take over? I've seen the bubonic plague proposed, but another explanation I've seen is more advanced dairying: drinking milk and eating cheese. They could then live wherever they could pasture their cows.

lpetrich · Jul 22, 2025

The Genetic Origin of the Indo-Europeans | bioRxiv and The Genetic Origin of the Indo-Europeans - PMC and The genetic origin of the Indo-Europeans | Nature with a more accessible summary at Ancient-DNA Study Identifies Originators of Indo-European Language Family | Harvard Medical School

The findings reveal that a population of Caucasus Lower Volga people moved west and started mixing with locals, forming the distinct Yamnaya genome.

“We found that the Yamnaya descend from just a few thousand people living in a handful of neighboring villages from 5,700 to 5,300 years ago,” Reich said. “Their descendants developed a radically new economy that allowed them to follow their herds of livestock into previously inaccessible open steppe lands. This led to a demographic explosion, so that in a few hundred years Yamnaya descendants numbered many tens of thousands and were spread from Hungary to eastern China.”

Language isn’t the only tradition the Yamnaya carried on from their Caucasus Lower Volga forebears. Both cultures buried their dead in kurgans, or large tombs with earth mounded on top. Lazaridis noted that these graves attracted generations of archaeologists and have now enabled the genetic reconstruction of their makers’ origins.

Ancient genomics support deep divergence between Eastern and Western Mediterranean Indo-European languages | bioRxiv with a more accessible summary in Ancient genomes provide final word in Indo-European linguistic origins

Findings indicate that Spanish, French and Italian populations received steppe ancestry from Bell Beaker groups, while Greek and Armenian groups acquired ancestry directly from Yamnaya populations. Their results are consistent with the Italo-Celtic and Graeco-Armenian linguistic models.

...
Bell Beaker populations originated from steppe pastoralists who mixed their steppe ancestry with local European farmers. Specifically, Bell Beaker groups carried steppe-related genetic profiles from earlier steppe populations, such as Yamnaya, combined with ancestries related to the pre-existing Globular Amphora Culture in Western Europe.

...
Steppe ancestry in Greece and Armenia was derived directly from Yamnaya populations of the Pontic steppe without significant admixture of locals.

In Greece, this ancestry was detected in individuals from the Peloponnese as early as 3,800 BP, preceding the emergence of the Greek language and the Mycenaean civilization. Steppe ancestry coincided with the political rise of Mycenaean culture.

Armenian steppe ancestry appeared during the Middle Bronze Age and was genetically similar to Greek populations, supporting the Graeco-Armenian linguistic hypothesis. Steppe ancestry paralleled the Kura-Araxes culture's decline and the Trialeti culture's emergence.

...
These findings are consistent with the Italo-Celtic and Graeco-Armenian linguistic migration hypotheses and do not align with alternative models such as Indo-Greek and Italo-Germanic.

So:

Yamnaya -> Corded Ware -> Bell Beaker (W Europe) -> W Mediterranean
Yamnaya -> Balkans -> E Mediterranean

Swammerdami · Jul 22, 2025

lpetrich said:
... Ancient-DNA Study Identifies Originators of Indo-European Language Family | Harvard Medical School

The findings reveal that a population of Caucasus Lower Volga people moved west and started mixing with locals, forming the distinct Yamnaya genome.

“We found that the Yamnaya descend from just a few thousand people living in a handful of neighboring villages from 5,700 to 5,300 years ago,” Reich said. “Their descendants developed a radically new economy that allowed them to follow their herds of livestock into previously inaccessible open steppe lands. This led to a demographic explosion, so that in a few hundred years Yamnaya descendants numbered many tens of thousands and were spread from Hungary to eastern China.”

The wheeled wagon was probably a key invention that allowed Yamnaya to exploit large pasturing territories; wagons may have assisted their conquest of Western Europe. It's unknown where wagons were initially invented -- wagons appeared almost simultaneously throughout Eastern Europe and neighboring regions -- but it was the terrain and economy of Yamnaya that could best take advantage. (For example, wagons had little utility in the rugged hills of Greece.)

The Afanasievo people (an early divergence from Yamnaya, and speakers of proto-Tocharian) had wagons. The wooden parts of wagons do not survive, but clay figurines of wagons have been found in Afanasievo excavations.

Early wagons were presumably drawn by oxen. When did horse-drawn wagons come into use?

lpetrich · Jul 30, 2025

Recent genetics research offers a window into the past, with some valuable data from sequencing the DNA of long-ago people. It can be preserved in the "petrous bone" even if nowhere else; the petrous bone is an especially hard sort of bone at the base of the skull. This has helped elucidate long-ago population migrations. For instance, Massive migration from the steppe was a source for Indo-European languages in Europe - PMC - Europeans are descended from three populations:

Terminal-Pleistocene Paleolithic people
Neolithic farmers from the Middle East
People from the western end of the Eurasian steppe zone, most likely early Indo-European speakers

A more recent paper proposes Eastern Siberian connections for Uralic and Yeniseian speakers, superimposed on "An Early Holocene Forest-Steppe Hunter-Gatherer Cline", from the Baltic Sea to NE Asia.

Yeniseian speakers are linked with the Cisbaikal region, just west of Lake Baikal. The Angara River flows from there to the Yenisei River, after which Yeniseian is named.

Uralic speakers are linked with the Transbaikal region, just east of Lake Baikal, and Yakutia, a bit northeastward. They spread from there to the west end of Mongolia, and then further westward in the Seima-Turbino cultures of a little more than 4,000 years ago.

There is not much discussion of Yukaghir speakers, however, even though they live north-northeast of Yakutia.

But the circumpolar-peoples paper does discuss Uralic and Yukaghir, more specifically Samoyedic (easternmost Uralic) and Yukaghir, finding statistically significant evidence of common linguistic and genetic ancestry. I find the pronouns to be strongly convincing.

There is also evidence of both in Chukotko-Kamchatkan (Kamchatka Peninsula and northward) and Nivkh (northern Sakhalin and the nearby mainland), but mainly linguistic evidence for Yeniseian - Burushaski - Na-Dene.

Results like these are a reason why present-day human population geneticists don't like to designate "races", despite their predecessors often doing so.

Swammerdami · Sep 3, 2025

A universal of speech timing: Intonation units form low-frequency rhythms may be an interesting paper.

Intonation units (IUs) are thought to serve as a fundamental organizing principle in human speech, facilitating essential communicative functions like information pacing and turn-taking. By analyzing natural speech recordings from 48 languages across diverse linguistic families and geographical regions, we reveal that IUs form a low-frequency rhythm that exhibits minimal variation across demographics and life stages. This stability in time highlights the central role of IUs in structuring spoken communication and underscores their importance for understanding the cognitive and linguistic underpinnings of human speech.

... We study the rate of IUs in 48 languages from every continent and from 27 distinct language families. Using an analytic method to annotate natural speech recordings, we identify a low-frequency rate of IUs across the sample, with a peak at 0.6 Hz,

That is, IUs are commonly about 1.7 seconds in duration (1 / 0.6 Hz). (That appears contrary to Figure 1 but I didn't pursue this poorly presented paper.)

and little variation between sexes or across the life span. We find that IU rate is only weakly related to speech rate quantified at the syllable level, and crucially, that cross-linguistic variation in IU rate does not stem from cross-linguistic variation in syllable rate.

I only skimmed the paper. I wish they'd shown concrete examples, e.g. of an English sentence divided into its intonation units,

?? . . . . . . Now is the time / for all good men / to come / to the aid / of their party . . . . . . ??

lpetrich · Oct 11, 2025

"An Algorithm For Building Language Superfamilies Using Swadesh Lists" by Bill Mutabazi - makes Andrew Ceolin's mistake, only worse. I like what he wants to do, but if one is developing a new or improved language-phylogeny algorithm, one ought to test it on well-established language families like Indo-European, and if one wants to work on speculative macrofamilies, one should first work on the stronger ones, like Indo-Uralic and Altaic and Austro-Tai.

Here's what he looked at:

Indo-European: German, Latin
Uralic: Finnish, Hungarian
Afroasiatic: Arabic
Bantu: Swahili, Kinyarwanda

It would make MUCH more sense to work with protolanguages as much as possible. That would cut away some of the noise of language change, since that noise would be recognized in the course of protolanguage reconstruction. Consider Indo-European words for "four" and "five". In the attested languages, the initial sounds for "four" are /f/, /p/, /b/, /kw/, /k/, /t/, /tS/, and for "five" they are /f/, /p/, /kw/, /tS/, /th/, /s/, /h/. In Proto-Indo-European, they were *kw and *p. All of them are descended from the protoforms, and almost all by regular sound correspondences.
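To make the noise problem concrete, here is a minimal Python sketch that counts how often a naive "same initial sound" test succeeds across a handful of attested Indo-European words for "four" and "five". The six languages and the rough ASCII initial sounds are my own hand-picked illustration, not data from Mutabazi's paper, and `matching_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Rough initial sounds, hand-assigned for illustration only (assumption:
# a crude ASCII transcription in the same style as the post above).
FOUR = {
    "English four": "f",
    "Latin quattuor": "kw",
    "Greek tettares": "t",
    "Welsh pedwar": "p",
    "Russian chetyre": "tS",
    "Irish ceathair": "k",
}
FIVE = {
    "English five": "f",
    "Latin quinque": "kw",
    "Greek pente": "p",
    "Welsh pump": "p",
    "Russian pyat'": "p",
    "Irish cuig": "k",
}

def matching_pairs(initials: dict[str, str]) -> int:
    """Count language pairs whose words start with the same sound."""
    return sum(1 for a, b in combinations(initials.values(), 2) if a == b)

total_pairs = len(list(combinations(FOUR, 2)))
print("four:", matching_pairs(FOUR), "of", total_pairs, "pairs match")
print("five:", matching_pairs(FIVE), "of", total_pairs, "pairs match")
```

Even though every one of these words is a true cognate, a surface comparison sees few or no matches, which is the argument for comparing reconstructed protoforms instead.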

So to do what he did, one should look at Proto-Indo-European, Proto-Uralic, Proto-Semitic, and Proto-Bantu. For Afroasiatic, one could include Old Kingdom Egyptian, Proto-Berber, Proto-Chadic, Proto-Cushitic, and Proto-Omotic, and for Bantu, one could do supersets like Proto-Benue-Congo or Proto-Atlantic-Congo. Or at least as far back as one can reconstruct most of the Swadesh list or similar lists without a lot of synonyms.

Synonyms are usually rare in well-established families, but they do exist. Like for Proto-Indo-European, "one" is *oynos and *sem-, and for Proto-Semitic, "one" is */ahhad- and *\isht-, while 2 to 10 have single roots.

Indo-European and Uralic likely qualify, but Indo-Uralic (IE + U) doesn't.

Swammerdami · Dec 11, 2025

The RobWords YouTube channel may not appeal to serious students of English linguistics, but I often enjoy his videos. His recent video "The English words nobody can explain" covers words like dog (dogge) that have etymologists stumped.

He starts by mentioning that frogge(<frosc), hogge, pygge and stagge are also animal names mostly new in Middle English. From frosc > frogge he conjectures that "gge" might have been an affectionate suffix, just as "y" in Modern doggy, puppy, bunny. If they were just nicknames, they wouldn't appear in the monk-scribed Old English documents. (Further evidence for this is the appearance of these words in early surnames and place names.)

bygge>big (man) is another mystery. Wiktionary takes it back to Old Norse, as Rob points out.

Both bridd>brid>bird and thrid>third underwent metathesis, but where did bridd ("young bird") come from? Wiktionary acknowledges possible cognacy with breed/brood but Rob goes farther -- the young of snakes were also called bridd as recently as 1500 -- with an OED cite I found Googling:

ca 1450+ cited in Gesta Romanorum said:
A serpent had made his nest..And broȝt forth his briddis þere.

Rob mentions other "mystery" words in the video, including boy, girl.

Here are Wiktionary's comments on two of the words mentioned above.

Wiktionary said:
Hog ...
... possibly from Old Norse hǫggva (“to strike, chop, cut”), from Proto-Germanic *hawwaną (“to hew, forge”), from Proto-Indo-European *kewh₂- (“to beat, hew, forge”).

Cognate with Old High German houwan, Old Saxon hauwan, Old English hēawan (English hew). Hog originally meant a castrated male pig, hence a sense of “the cut one”. (Compare hogget for a castrated male sheep.) More at hew. Alternatively from a Brythonic language, from Proto-Celtic *sukkos, from Proto-Indo-European *suH- and thus cognate with Welsh hwch (“sow”) and Cornish hogh (“pig”).

...

Old English bridd
...
the word may trace back to a Proto-West Germanic *bridi ~ *briddj-, possibly to Proto-Germanic *bridjaz, a derivative of *bredą (“board, plank, shelf", possibly also "perch, roost”), and may have therefore been used of young birds or fowl that were fledged and able to perch but not yet able to fly

lpetrich · Mar 8, 2026

I recently found STIG ELIASSON: Old Danish vigesimal counting: A comparison with Basque and Contact and Prehistory: The Indo-European Northwest

More generally: Mark Rosenfelder's Numbers List and Numeral Systems of the World and Typology of Numeral Systems

Let's consider number words. There are several kinds:

cardinal (counting, set members) - one, two, three, four, five
ordinal (in sequence) - first, second, third, fourth, fifth
adverbial - once, twice, (thrice) three times, four times, five times
multiplier - single, double, triple, quadruple, quintuple - uni-, bi-, tri-, quadri-, quinque- - mono-, di-, tri-, tetra-, penta-
collective - singlet, doublet, triplet, quadruplet, quintuplet - set of one, two, three, four, five
fractional - whole, half, third, fourth, fifth -
...

I looked at several languages in Wiktionary, and the words for cardinal numbers are the primary ones in all of the languages that I looked at, with words for other kinds derived from these.

For words for numbers (numerals), it is impractical to invent lots of linguistically separate words for numbers, so after some point, one must compose numerals from existing numerals. One does that with arithmetic, usually addition and multiplication, sometimes subtraction, and rarely division, almost always halving.

For high numbers, a convenient shortcut is powers of some number, the number base.

The most common base is decimal, base 10. I'll list some families with at least a word for 100 = 10^2. Indo-European, Turkic, Mongolic, Tungusic, Korean, Japanese, Nivkh, Dravidian, Semitic, Egyptian, Sino-Tibetan, Na-Dene: Athabaskan, Yeniseian, Austronesian, Bantu, ...

In Proto-Bantu, one can reconstruct as linguistically distinct forms only 1, 2, 3, 4, 5, and 10, without 6, 7, 8, 9. Bantu speakers fill in these ones by doing arithmetic, like for numbers greater than 10. That is, 6 = 5+1, 7 = 5+2, 8 = 5+3, 9 = 5+4 and sometimes other formulas. This makes a dual-base system, 2 on top of 5, what I call 2-5.
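The 2-5 pattern can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above (atomic words for 1 to 5 and 10, with 6 to 9 built as 5 + n); the function name `parts_2_5` is hypothetical, and the sketch ignores the "other formulas" that some Bantu languages use.

```python
def parts_2_5(n: int) -> str:
    """Describe how 1..10 is expressed when only 1-5 and 10 are atomic."""
    if not 1 <= n <= 10:
        raise ValueError("this sketch only covers 1..10")
    if n <= 5 or n == 10:
        return str(n)        # atomic numeral, its own word
    return f"5+{n - 5}"      # 6..9 are composed on top of 5

for n in range(1, 11):
    print(n, "=", parts_2_5(n))
```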

WAB · Mar 8, 2026

yeah but...

lpetrich · Mar 9, 2026

The next most common number base is vigesimal, base 20. Vigesimal systems have an odd areal distribution, rather than being scattered more-or-less evenly.

Beringian, named from being near the Bering Strait. Eskimo-Aleut, Chukotko-Kamchatkan, Na-Dene: Tlingit - constructions 2-2-5, 4-5, 2-10

Mesoamerican or Central American. Mayan, Totonacan, Mixe-Zoquean, Huave, Oto-Manguean, Misumalpan, southern Uto-Aztecan, southern Hokan - constructions 2-2-5, 4-5, 2-10, 10.2-5, 15.5

10.2-5 is 1 to 10, 10 + 1 to 4, 15, 15 + 1 to 4.
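As a rough sketch of that 10.2-5 construction, the Python below spells out how each number up to the base of 20 is composed. The function name `parts_10_2_5` is my own, and treating 20 as an atomic word is an assumption that follows from it being the base.

```python
def parts_10_2_5(n: int) -> str:
    """Describe how 1..20 is composed in a 10.2-5 vigesimal system."""
    if not 1 <= n <= 20:
        raise ValueError("this sketch only covers 1..20")
    if n <= 10:
        return str(n)            # 1 to 10: atomic
    if n < 15:
        return f"10+{n - 10}"    # 11 to 14: 10 + 1 to 4
    if n == 15:
        return "15"              # 15: its own word
    if n < 20:
        return f"15+{n - 15}"    # 16 to 19: 15 + 1 to 4
    return "20"                  # the base itself

for n in range(1, 21):
    print(n, "=", parts_10_2_5(n))
```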

Western European: Celtic, Danish, French, Basque

The Celtic languages have family tree

"Continental Celtic": Gaulish, Lepontic, Celtiberian, ...
"Insular Celtic"
- Goidelic: Irish, Scottish Gaelic, Manx
- Brythonic: Welsh, Breton, Cornish

I put in scare quotes because these groupings may not reflect ancestry. The Insular Celtic langs are likely closer to Gaulish than to other Continental Celtic langs.

The only tens word known from Gaulish is tricontis "thirty", very clearly decimal and inherited from Proto-Indo-European. Advancing in time to Old Irish, roughly 600 - 900 CE, it also has an inherited decimal system.

But after that, Celtic languages acquired vigesimal systems. Welsh has 10.2-5, and the others 2-10, though all are built out of inherited words. Welsh and Scottish Gaelic also have recently-introduced decimal systems. For example:

11 -- SG-x aon deug "1 10" -- W un ar ddeg "1 on 10" -- W-D un deg un "1 10 1"
16 -- SG-x sia deug "6 10" -- W un ar bymtheg "1 on 15" -- W-D un deg chwech "1 10 6"
40 -- SG dà fhichead "2 20" -- SG-D ceathrad "4 10"-- W deugain "2 20" -- W-D pedwar deg "4 10"

Manx and Cornish are vigesimal, and Irish and Breton have a mixture of decimal and vigesimal words.

All of them use inherited 100, and a common word for 50 is literally "half hundred".

lpetrich · Mar 9, 2026

Let us cross the English Channel, a.k.a. La Manche "The Sleeve".

Modern Standard French has inherited tens from 20 vingt to 60 soixante, and then a vigesimal system: 70 soixante-dix "sixty-ten", 80 quatre-vingts "four-twenties", 90 quatre-vingt-dix "four-twenties-ten". But Belgian and Swiss dialects have inherited decimal 70 septante, 80 huitante, 90 nonante. For 80, some French speakers use octante, from Latin octô.
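A small Python check of the arithmetic behind the three Standard French forms, using the literal glosses given above. The tuple layout (multiplier, unit, addend) and the name `FRENCH_TENS` are just my own way of writing it down.

```python
# word -> (multiplier, unit, addend), read as multiplier * unit + addend
FRENCH_TENS = {
    "soixante-dix": (1, 60, 10),      # "sixty-ten"
    "quatre-vingts": (4, 20, 0),      # "four-twenties"
    "quatre-vingt-dix": (4, 20, 10),  # "four-twenties-ten"
}

def value(word: str) -> int:
    """Evaluate the arithmetic that the numeral literally spells out."""
    multiplier, unit, addend = FRENCH_TENS[word]
    return multiplier * unit + addend

for word in FRENCH_TENS:
    print(word, "=", value(word))
```

Note that 70 is built on the inherited decimal 60 rather than on 3 x 20, so only 80 and 90 are vigesimal in the strict sense.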

But Old French had a full-scale 2-10 vigesimal system that coexisted with inherited decimal numerals.

Various other Romance languages have evidence of vigesimal numerals, like in southern France, nearby Switzerland, and southern Italy, especially Sicily.

But their ancestor, Latin, is decimal.

Turning to Germanic langs, English "score" was sometimes used for 20 in vigesimal numerals, like in the King James Version: Psalms 90:10 "The days of our years are threescore years and ten; and if by reason of strength they be fourscore years, yet is their strength labour and sorrow; for it is soon cut off, and we fly away." Modern-English translations all use decimal numerals, like the New English Translation: "The days of our lives add up to seventy years, or eighty, if one is especially strong. But even one’s best years are marred by trouble and oppression. Yes, they pass quickly and we fly away."

Old Danish had a 2-10 vigesimal system that coexisted with the inherited decimal system. Present-day Danish has 50 = (3-1/2)*20, 60 = 3*20, 70 = (4-1/2)*20, 80 = 4*20, and 90 = (5-1/2)*20

50: halvtreds < halvtredsindstyve < halvtredje ("half third": 3 - 1/2) + sinde "times" + tyve 20
60: tres < tresindstyve < tre 3 + sinde "times" + tyve 20
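The Danish formulas can be checked with a short Python sketch. It uses exact fractions for the "half third" style multipliers; the function name `danish_tens` and its two parameters are hypothetical, chosen only to mirror the (k - 1/2) * 20 pattern above.

```python
from fractions import Fraction

def danish_tens(k: int, halved: bool) -> int:
    """Return k * 20, or (k - 1/2) * 20 when the 'halv-' form is used."""
    multiplier = Fraction(k) - (Fraction(1, 2) if halved else 0)
    return int(multiplier * 20)

# (k, halved) pairs for the present-day Danish tens from 50 to 90
FORMS = {50: (3, True), 60: (3, False), 70: (4, True), 80: (4, False), 90: (5, True)}

for expected, (k, halved) in FORMS.items():
    print(expected, "=", danish_tens(k, halved))
```

So halvtreds is "half-third times twenty" = 2.5 x 20, with "half-third" meaning halfway from two to three.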

lpetrich · Mar 9, 2026

Danish also has words for tens borrowed from Norwegian and Swedish, like 50 femti "fifty" and 60 seksti "sixty". Wiktionary: "Sometimes used for clarity on checks and documents and in communication with other Scandinavians."

Where might these vigesimal numerals come from? There is a clue in the Pyrenees Mountains, where some people continue to speak Basque, a relic of the pre-Indo-European languages of Europe. Basque has a 2-10 vigesimal system, and many other Western Europeans may have used vigesimal numerals before Indo-European speakers moved in.

Let's see how far back this might go.

There is an additional center of vigesimal numerals in the Caucasus Mountains. Kartvelian, Northwest Caucasian, and some Northeast Caucasian (the rest are decimal), all with 2-10.

So did early Neolithic farmers have vigesimal numerals? If so, then they would have been very unstable, with only the overall vigesimality being preserved.

lpetrich · Mar 9, 2026

Theo Vennemann also proposes the survival of another structural quirk: two "copulas", two words for "to be".

The best-known example is likely in Spanish, which has two words for "to be", ser and estar. Of these, ser is often explained as persistent and estar as transitory, though ser is typically used with qualities and estar with locations. For instance (Google Translate, Wiktionary):

The tree is orange. - El árbol es anaranjado.
The tree is in the yard. - El árbol está en el patio.

The first one is a transitory state for deciduous trees, when their leaves have died but are still attached, yet it takes ser. The second one is a persistent state for trees, yet it takes estar.

Origins:

ser < Latin esse "to be"
estar < Latin stâre "to stand, stay"

Also with two copulas, with the same origins and similar meanings, are Portuguese (ser, estar), Galician (ser, estar), Catalan (ésser, estar), Old French (estre, ester), and to a lesser extent Italian (essere, stare), and Sicilian.

Spoken near Spanish is Basque, and it also has two words for "to be", used in much the same way in Southern Basque. izan ~ ser, egon ~ estar.

Latin, however, has only one, esse, and this feature seems derived from a substrate, like vigesimal numerals.

lpetrich · Mar 10, 2026

Going northward, Old English had two words for "to be"

wesan, 1s eom, 3s is, past 1s, 3s waes
beon, 1s bêo, 3s bith

Of these, wesan was used for most purposes, and beon for a "gnomic" (timeless) present and the future.

The two conjugations were merged in Middle English.

One reconstructs Proto-Germanic

*wesanan, 1s *immi, 3s *isti, past 1s, 3s *was
*beunan, 1s *biumi, 3s *biuthi, past 1s, 3s *was

They became merged into one conjugation in most of West Germanic, like German 1s bin, 3s ist, and only the first one survived in North Germanic.

Going further, one finds Proto-Indo-European *es- "to be, remain" (imperfective), *bheuH- "to be, become" (perfective), *h2wes- "to dwell, stay".

The first two are merged into one conjugation in Latin (1s sum, 3s est, past 1s fui), Balto-Slavic, and Celtic.

But Old Irish acquired a distinction between is (with a noun, pronoun, or adjective) and attá (with an adverb, an adverbial phrase, or a prepositional phrase). It survives in present-day Irish as is (with a noun phrase) and bí (with an adjective or a prepositional phrase).

Theo Vennemann then notes Irish English having a habitual construction "do be <verb>ing", as a result of influence from Irish.

So this is evidence of a two-copula substrate in northwestern Europe, in addition to southwestern Europe.

lpetrich · Mar 10, 2026

Theo Vennemann then gets into the position of the accent in the word. This is reconstructed as variable in Proto-Indo-European and in the protolanguages of most subfamilies, with three exceptions: Germanic, Italic, and Celtic, all in Western Europe. Their accent was on the first syllable. Substratum influence?

But this fits with the subgroupings found in Rapid radiation of the inner Indo-European languages: an advanced approach to Indo-European lexicostatistics. One of them is a western one: Germanic, Italic, and Celtic.
