Language as a Clue to Prehistory

lpetrich · Dec 2, 2023

Grammars Across Time Analyzed (GATA): a dataset of 52 languages | Scientific Data -- all of them langs without long literary traditions. Did the authors try to avoid literary traditions as linguistic interference?

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features | Scientific Data

The time and place of origin of South Caucasian languages: insights into past human societies, ecosystems and human population genetics | Scientific Reports - the Kartvelian ones, including Eurasian Georgian

They used

the past distribution ranges of wildlife elements whose names can be traced back to proto-Kartvelian roots
the distribution ranges of past cultures
the genetic variations of past and extant human populations.

They found

Zan: 1,200 BP - Laz, Megrelian
Karto-Zan: 2,617 BP - Georgian, Zan
Proto-Kartvelian: 7,641 BP - Svan, Karto-Zan

Our analyses place the Kartvelian Urheimat in an area that largely intersects the Colchis glacial refugium in the South Caucasus. The divergence of Kartvelian languages is strongly associated with differences in the rate of technological expansions in relation to landscape heterogeneity, as well as the emergence of state-run communities. Neolithic societies could not colonize dense forests, whereas Copper Age societies made limited progress in this regard, but not to the same degree of success achieved by Bronze and Iron Age societies.

Colchis is at the east end of the Black Sea.

According to the mean split dates estimated by the phylogenetic model, the divergence between Svan and Karto-Zan occurred prior to or at the beginning of the introduction of metallurgy in the study area, while Georgian and Zan diverged in the Iron Age, specifically during the Urartian period.

lpetrich · Dec 2, 2023

However,

Proto-Georgian-Zan language mentions date estimates of 3850 BP for Proto-Kartvelian and 2700 BP for Karto-Zan -- the Karto-Zan date is close to the paper's 2,617 BP, but the Proto-Kartvelian date is far off from its 7,641 BP.

Automated Similarity Judgment Program - Welcome to The ASJP Database - "The database of the Automated Similarity Judgment Program (ASJP) aims to contain 40-item word lists of all the world's languages. A lexical distance can be obtained by comparing the word lists, which is useful, for instance, for classifying a language group and for inferring its age of divergence."

Only 40 meanings? I'd want 100 or 200 for good statistics.
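For what it's worth, the kind of lexical distance that quote describes can be sketched as an edit distance averaged over shared meanings. This is only a minimal sketch under my own assumptions, not ASJP's actual code: the function names are made up, dividing by the longer word's length is just one common normalization, and the three-item word lists use ordinary spelling rather than ASJP's transcription.

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(list1, list2):
    """Average normalized distance over the meanings both lists contain."""
    shared = [meaning for meaning in list1 if meaning in list2]
    return sum(normalized_distance(list1[m], list2[m]) for m in shared) / len(shared)


# Toy lists in ordinary spelling (an assumption; real lists are transcribed).
english = {"two": "two", "three": "three", "four": "four"}
dutch = {"two": "twee", "three": "drie", "four": "vier"}
german = {"two": "zwei", "three": "drei", "four": "vier"}

print(lexical_distance(dutch, german))
print(lexical_distance(english, dutch))
```

With only three meanings, each word pair contributes a third of the result, so a single odd word moves the distance a lot. That is the same worry as with a 40-item list, just exaggerated.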

I must say that I find the Automated Similarity Judgment Program a disappointment. In ASJP Downloads are some versions of a comprehensive family tree that was found with it.

I checked it against known cases, and I found very odd results.

For Germanic, (English, (North Germanic, other West Germanic: ((Frisian, Dutch), German))) -- which is just plain absurd. English is a West Germanic language and is closest to Frisian.

For Romance, French is an outlier rather than being grouped with Catalan, Spanish, Portuguese, and Italian as Continental Western Romance langs, with Sardinian and Romanian being outside that group. It has a big fat omission, however: Latin.

The Slavic langs are also odd: ((((((Serbocroatian, Slovenian), (Czech, Slovak)), Polish), Bulgarian), (Belarusian, Russian)), Ukrainian) -- the accepted grouping is S Slavic: ((Serbocroatian, Slovenian), Bulgarian), W Slavic: ((Czech, Slovak), Polish), E Slavic: (Russian, Belarusian, Ukrainian)

But it does get Balto-Slavic correct: (Slavic, (Lithuanian, Latvian))

Dialect continuum - inhabitants of a continuum can easily understand and converse with speakers of nearby dialects, but not necessarily speakers of more distant dialects. That is a big problem for a tree model of language relationships, whereas a wave model can fit very well.

Several of the langs I've mentioned are members of dialect continua: the Continental West Germanic langs, the Continental North Germanic langs (Scandinavian ones), the Western Romance ones, the Eastern Romance ones, the Northern Slavic ones (West + East Slavic), and the Southern Slavic ones.

There are many others, like the Arabic ones, the Indic ones, and the Chinese ones.

These can cause trouble for tree algorithms, but such algs ought to be able to avoid gross mistakes like making English, French, and Ukrainian outliers. So there must be something about ASJP that causes trouble. Overly simplistic comparisons of word forms?

lpetrich · Dec 2, 2023

To see how this can happen, let us consider Dolgopolsky-style word-form reduction, collecting all voicings of consonants and ignoring vowels. Initial vowels and vowel-vowel sequences become H. Let us first consider a very easy example: "two".

Most Germanic: T, TW, German: CW (C = affricate, like ts)
Latin: TH, Romance: Italian TH, Spanish TS, French T, ...
Celtic T, Greek TH, Balto-Slavic TW
Indo-Iranian: Persian, Kurdish T, Sanskrit TW, later Indic TW, T, TK, P

The method would get the first consonant correct - T - though it would be confused by German C - but the following ones would be more difficult. W? H?

"Three": Germanic, Latin-Romance, Celtic, Balto-Slavic, early Indo-Iranian TR, except for Polish TS, Persian, Kurdish S, later Indic often TN.

So one would get T there also, with the second consonant likely R.

"Four": Germanic PR, PWR, PYR, Latin KWTWR, Romance KWTR, KTR, Romanian, some Sardinian PTR, Irish KHR, Welsh PTWR, Greek TSR, Lithuanian KTR, Slavic CTR, some STR, Kurdish CR, Persian CHR, Sanskrit CTR, present-day Indic CR, some SR, Sinhalese HTR.

So one would be confused like crazy about the first consonant. P? K? T? C? S? H? Though not M, N, L, R.

Armenian is a weird one, with 2 and 3 both HRK and 4 CRS. Albanian isn't much different from the others: 2 T, 3 TR, 4 KTR. Tocharian is somewhat different from the others: 2 W, 3 TR, 4 STWR.

One has a risk of both false positives and false negatives, as is readily evident here.

These can be resolved with sound correspondences, but if one is doing exploratory work, one does not know these correspondences in advance, and grouping by voicing is a crude form of sound correspondence.
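The reduction described above can be sketched in a few lines. This is a rough illustration, not Dolgopolsky's actual class inventory: the letter-to-class table is my own assumption (it follows the groupings used in the examples, such as f with P and v with W), the function name is made up, and the input is assumed to be a simple romanization with a, e, i, o, u as the only vowels.

```python
import re

VOWELS = set("aeiou")

# Hypothetical letter-to-class table; voicing is ignored within each class.
CLASSES = {}
for letters, label in [("pbf", "P"), ("td", "T"), ("kgq", "K"), ("sz", "S"),
                       ("c", "C"), ("m", "M"), ("n", "N"), ("l", "L"),
                       ("r", "R"), ("wv", "W"), ("yj", "Y"), ("h", "H")]:
    for letter in letters:
        CLASSES[letter] = label


def reduce_form(word):
    """Reduce a romanized word to a string of consonant-class letters.

    Single vowels inside a word are dropped; a word-initial vowel run or
    any run of two or more vowels is written as H.
    """
    out = []
    for match in re.finditer(r"[aeiou]+|[^aeiou]", word.lower()):
        chunk = match.group()
        if chunk[0] in VOWELS:
            if match.start() == 0 or len(chunk) > 1:
                out.append("H")
        elif chunk in CLASSES:
            out.append(CLASSES[chunk])  # unknown symbols are skipped
    return "".join(out)


for word in ["duo", "dos", "dva", "tri", "erku"]:
    print(word, reduce_form(word))
```

Latin duo comes out as TH and Armenian erku as HRK, so two words that are known cognates share no class letter at all. That is the false-negative risk in a nutshell.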

Swammerdami · Dec 3, 2023

lpetrich said:
For Germanic, (English, (North Germanic, other West Germanic: ((Frisian, Dutch), German))) -- which is just plain absurd. English is a West Germanic language and is closest to Frisian.

Two linguists wrote a book arguing that English is a North Germanic language! -- that Middle English descends from Old Norse, not Old English. Even if wrong, their book is an interesting read.

English: The Language of the Vikings

Abstract: English as North Germanic: Modern English is Modern Norse It is well known that Middle English (and its descendent Modern English) has a large number of words of Scandinavian origin. This is conventionally attributed to language contact and

www.academia.edu

Perhaps Old Norse and Old English speakers allied themselves in response to the Norman invasion ("my enemy's enemy is my friend"); it was then politically expedient to develop a new dialect with roughly equal inheritance from the two source languages.

lpetrich said:
These can cause trouble for tree algorithms, but such algs ought to be able to avoid gross mistakes like making English, French, and Ukrainian outliers. So there must be something about ASJP that causes trouble. Overly simplistic comparisons of word forms?

IIRC, the Gray-Atkinson study of the I-E tree had a defect. Suppose that a 3-way (A B C) doesn't resolve easily into two 2-ways, i.e. both (A (B C)) and ((A B) C) are possible. Gray-Atkinson force a choice; this will push back the date of proto-ABC.

lpetrich · Dec 3, 2023

Computational phylogenetics - tree-finding algorithms have been in use for some decades to find family trees of genes and proteins, and they have also been used on languages.

Distance matrices in phylogeny - a common kind of method is to create a matrix of distance values for amounts of difference, then find a tree from that matrix.
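As an illustration of going from a distance matrix to a tree, here is a minimal sketch of UPGMA (average-linkage clustering), one of the simplest distance-matrix methods. It is not what any particular study used; the function name, the labels L1 to L4, and the distance values are all made up for the example.

```python
def upgma(labels, dist):
    """Build a rooted binary tree (nested tuples) from a symmetric distance matrix."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)          # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = d[min(a, c), max(a, c)]
            d_bc = d[min(b, c), max(b, c)]
            # size-weighted average distance from the merged cluster to c
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Made-up distances between four hypothetical languages.
labels = ["L1", "L2", "L3", "L4"]
dist = [[0.0, 0.2, 0.5, 0.8],
        [0.2, 0.0, 0.5, 0.8],
        [0.5, 0.5, 0.0, 0.8],
        [0.8, 0.8, 0.8, 0.0]]
print(upgma(labels, dist))
```

Note that this always returns a fully binary tree, even when the matrix looks more like a chain or a three-way tie, so a dialect continuum gets forced into nested pairs whether or not that is meaningful.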

For dialect continua, one may want to work out relationships in other ways, like principal component analysis (PCA) and multidimensional scaling (MDS). PCA is fitting data to a multidimensional ellipsoid and then finding the directions and lengths of the axes. MDS is taking a vector for each data point, finding the distances between them, and making those distances fit the distance matrix for those data points.

I've found some papers about using PCA and MDS on dialect continua, but only two papers on using either technique for Indo-European and superfamilies of it, by Alexander Kozintsev. I'd mentioned them earlier, and I'd also mentioned a paper comparing various phylogenetic algorithms for finding the Indo-European family tree. They mostly agreed on the branchings in the subfamilies - oldest recorded members used - but not much in the overall branching, agreeing on a Greek-Armenian branch but not much else. Was Proto-Indo-European a dialect continuum for a while?

Another candidate for a dialect continuum is (Narrow) Altaic. In it, Turkic - Mongolic and Mongolic - Tungusic consistently turn out closer than Turkic - Tungusic. Triangulation supports agricultural spread of the Transeurasian languages | Nature proposes that their early speakers lived in locations

(west) - Turkic - Mongolic - Tungusic - (east)

thus being consistent with Proto-(Narrow)-Altaic being a dialect continuum.
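To make the chain idea concrete, here is a minimal sketch of one-dimensional classical MDS, done with power iteration so that it needs only the standard library. The distances are made up purely for illustration (they are not from the Nature paper or any dataset); they only encode the pattern that the two neighboring pairs are closer than the end-to-end pair. The function name is hypothetical.

```python
import math


def classical_mds_1d(dist, iterations=100):
    """Place points on a line so their spacing approximates the distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(iterations):           # power iteration for the top eigenvector
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


# Made-up distances: neighbors at 1.0, the two ends at 1.6.
names = ["Turkic", "Mongolic", "Tungusic"]
dist = [[0.0, 1.0, 1.6],
        [1.0, 0.0, 1.0],
        [1.6, 1.0, 0.0]]
coords = classical_mds_1d(dist)
for name, x in zip(names, coords):
    print(name, round(x, 3))
```

With distances of this shape the middle group lands between the other two on the line, which is what a chain of neighboring dialects would look like. A tree method run on the same matrix would have to pick one pair to join first.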

Jokodo · Dec 3, 2023

lpetrich said:
John Bengtson: "Burushaski and the Western Dene-Caucasian Language Family: Genetic and Cultural Linguistic Links"

With this proposed Dene-Caucasian family tree, split time 10,660 BCE (from glottochronology - treat as very approximate)

East: Sino-Dene - 9000 BCE - Sino-Tibetan, Na-Dene

West - 8330 BCE

6570 BCE - Burushaski, Yeniseian

Basque, North Caucasian

Cardium pottery makers were the first farmers in Basque country, and they arrived around 5700 BCE. They got their name from a style of decorating pottery, by pressing cockle shells into it to make heart-shaped imprints. Cockles are a kind of bivalve.

The first cardium-pottery makers lived in NW Greece near Albania around 6400 - 6200 BCE, and they were preceded by Neolithic farmers who reached the Greek islands in 7000 BCE. Their predecessors, Pre-Pottery Neolithic B spread across the Fertile Crescent and southern Anatolia, starting in 8800 BCE, and those ones' predecessors, Pre-Pottery Neolithic A, spread across the Levant, starting in 10,000 BCE, with the invention of agriculture. Their foraging but sedentary predecessors, the Natufians, go back to 13,000 BCE.

So the timing is about right for a Basque - NC split in the northern Fertile Crescent around 8000 - 7000 BCE.

JB finds cognate words in Basque, NC, and Burushaski for some kind of grain and also for threshing and threshing floor. Threshing - removing grain from the rest of the plant by beating it or crushing it. For beating the grain, a flail was often used, a stick with another stick loosely attached to its end. For crushing it, a hoofed animal stepping on it, dragging a board with embedded stone flakes across it, or using a roller on it. Afterwards is Winnowing - separating out the now-loosened grain from the rest of the plant, the chaff. Premodern winnowing was often throwing this mixture up into the air on a windy day, with the wind blowing away the chaff.

Machines for doing threshing and winnowing were invented in the 18th cy., and combines, machines that thresh and winnow grain crops as they reap those crops, have been used for well over a century.

This also means that the Basque-NC dispersion also involved the Burusho people, who now live in the mountains in the northern end of Pakistan.

Also cognate across Basque-NC-Burushaski are some words for domestic animals and dairying.

I may have said this before, but NC-Burushaski may tell us more about history than prehistory. Over the course of the last, say, 3500 years, there have been several empires spanning the area from the Caucasus to where Burushaski is spoken - Parthians, Seleucids, Sassanids, to name just a few of the better known ones. Empires have a habit of translocating subjugated populations - the Romans posted Sarmatians in Britain, the Hungarians fortified their Western border with Pechenegs from their Southern border, the Soviets forcibly resettled Volga Germans and Crimean Tatars from modern Ukraine to Central Asia, and Ashkenazi Jews to the "Jewish Autonomous Oblast" in the Russian Far East, along the Chinese border.

I find it entirely plausible that Parthians, Seleucids or Sassanids resettled a population from close to the modern day range of North Caucasian to the Pamir region, and that their parent population, or what remained of them, subsequently got linguistically assimilated by Azeris or Persians. In other words, Burushaski may well be a sister clade to North Caucasian without demonstrating that their family was once widespread all over Southwest Asia.

Jokodo · Dec 3, 2023

lpetrich said:
The Slavic langs are also odd: ((((((Serbocroatian, Slovenian), (Czech, Slovak)), Polish), Bulgarian), (Belarusian, Russian)), Ukrainian) -- the accepted grouping is S Slavic: ((Serbocroatian, Slovenian), Bulgarian), W Slavic: ((Czech, Slovak), Polish), E Slavic: (Russian, Belarusian, Ukrainian)

But it does get Balto-Slavic correct: (Slavic, (Lithuanian, Latvian))

Slavic was arguably one big dialect continuum at least until the 11th century, when the Germanization of Eastern Austria and the Magyarization of Hungary produced a barrier where there had been none before. What's more, it very much seems that all of Slavic was still mutually intelligible not much before: when Cyril and Methodius translated the Bible into the dialect of the Slavs in the hinterland of Thessaloniki (Greece), they apparently had no trouble using that to preach in Moravia in the 9th century, and in the 10th, the Rus' of Ukraine and Russia adopted the language of their translation as their liturgical register. Slavic even shows evidence of regular sound shifts occurring more or less at the same time across its entire then-territory long after it spread out, without a BBC or an Academia Real to push them.

Even today, there are isoglosses that transcend the South/West/East groupings: Slovak and Czech (but not Polish) among the West Slavic languages, as well as Ukrainian and Belorussian (but not Russian) in the East, have voiced [h] where the other languages have [g] - and apparently so do some (Northern) Slovenian dialects. In the East, Belorussian and Ukrainian have been strongly influenced by Polish, while Russian has a heavy South Slavic vocabulary layer from Church Slavonic, sometimes leading to doublets where the same Proto-Slavic root coexists in two forms, one with the expected East Slavic sound shifts and one with the South Slavic ones. I believe "vlast/volost" is an example: both nouns come from a root "to rule", but the East Slavic one (volost) has come to mean land ownership, while the South Slavic one (vlast) is reserved for more abstract meanings, like ruling a country, or the Lord ruling the Heavens.

Incidentally, Mr. Putin and Mr. Zelenski's first names are both basically the same, except Mr. Putin uses the Macedonian version. The "real" Russian version would be "Volodymir" too.

Tl;dr: a bracketing that doesn't conform to the traditional West/East/South scheme isn't necessarily wrong, it may just give different weights to different types of features.

bilby · Dec 3, 2023

Swammerdami said:
Two linguists wrote a book arguing that English is a North Germanic language! -- that Middle English descends from Old Norse, not Old English. Even if wrong, their book is an interesting read.

Perhaps Old Norse and Old English speakers allied themselves in response to the Norman invasion

England didn't really exist prior to the Norman Invasion, despite the insistence of romantic, hyper-patriotic, royalist Victorians, who recast all of history before the nineteenth century to fit a British Imperial perspective.

Even today, many dialects spoken in the North of England are obviously and clearly more related to Norwegian than to Dutch, and are frequently unintelligible to southern English speakers.

Geordies not only pronounce English words in ways that are barely recognisable to southerners; they also have their own vocabulary of words that clearly owe a lot more to their Norse heritage than to the Anglo-Saxons of the southern part of what is now England.

The Danelaw still casts a long shadow over the North, and particularly the North East. It's probably more sensible to think of pre-Norman, post-Roman* "England" as two countries, or better yet to think of the Early Medieval island of Great Britain as three countries - Scotland, Danelaw, and England.

The idea that England might not have dominated the island was anathema to the Victorians, but is a far better reflection of reality than the concept of a beleaguered England that bravely fought off viking interlopers. The Vikings weren't interlopers, any more than the Angles, Saxons or Jutes were. They lived there, albeit (like the Scots in later centuries) north of a border whose location went back and forth as various kings became powerful enough to shift it.

Immediately before the Norman Conquest (and I am talking weeks, not months or years), King Harold pushed the border a long way north with his victory at Stamford Bridge. The Normans were much more dedicated record-keepers than the peoples they conquered, so this temporary state of affairs has tended to become enshrined as the "pre-Conquest situation", without the realisation that it was novel and might plausibly have been ephemeral. Certainly it wasn't the norm for the preceding several centuries.

Modern English is a mess, and trying to unpick "the" precursors to it is futile. Old English owed plenty to Old Norse, but clearly also owed plenty to various Germanic languages, which themselves were related to Old Norse in various ways.

Geoffrey Chaucer would likely be better able to read Modern Icelandic than he would Modern English.

* Previously known as the "Dark Ages", now more reasonably as "Early Medieval"

Jokodo · Dec 3, 2023

bilby said:
It's probably more sensible to think of pre-Norman, post-Roman "England" as two countries, or better yet to think of the Early Medieval island of Great Britain as three countries - Scotland, Danelaw, and England.

Did you perchance forget Wales as a fourth country? And weren't at least the Lake District and Cornwall also still dominantly (Brythonic) Celtic at the time of the Norman Conquest, even if they had been under Anglo-Saxon and/or Norse suzerainty at various times?

Incidentally, this entire situation reminds me much of how Austria treats its Slavic heritage: Austria East of a line from Linz to Lienz (that is, a majority of its current territory, including the capital) was mostly Slavic in the early medieval period, with the Avars (who had originally come from East Asia in the 6th century, at least the part of them that gave them the name) exerting more or less direct control over the Slavic majority (more in some places, at some times, than others). In the 8th century, the Southern part of that area came under Bavarian suzerainty, and at the turn of the 9th, the Franks seized the Northern part from the Avars, but Germanization was a drawn out process that likely didn't result in a German speaking majority before the mid-11th century, probably later in some places. Guess how much of that is mentioned in our official history schoolbooks!

bilby · Dec 3, 2023

Jokodo said:
Did you perchance forget Wales as a fourth country? And weren't at least the Lake District and Cornwall also still dominantly (Brythonic) Celtic at the time of the Norman Conquest, even if they had been under Anglo-Saxon and/or Norse suzerainty at various times?

For sure, Wales*, Cornwall, and parts of what is today Cumbria were separate and owed more to Lesser Britain (Ireland) than to the rest of Great Britain.

And these areas also have significant viking settlement, particularly parts of Ireland.

None of this lends itself to a nice simple tale to relate to bored teenagers as part of instilling them with patriotic fervour for England and her global empire on which the sun never sets, though. So what little evidence survived of it was largely memory holed by the Victorians, who are a massive impediment to any understanding of English history before about 1870.

* If it even exists, and isn't just a made up place to scare children

Jokodo · Dec 4, 2023

bilby said:
For sure, Wales*, Cornwall, and parts of what is today Cumbria were separate and owed more to Lesser Britain (Ireland) than to the rest of Great Britain.

And these areas also have significant viking settlement, particularly parts of Ireland.

None of this lends itself to a nice simple tale to relate to bored teenagers as part of instilling them with patriotic fervour for England and her global empire on which the sun never sets, though. So what little evidence survived of it was largely memory holed by the Victorians, who are a massive impediment to any understanding of English history before about 1870.

* If it even exists, and isn't just a made up place to scare children

I personally find early medieval history much more interesting than the knights and castles and crusades that came after. And people were much more mobile than we give them credit for - and I'm not talking about the "migration period", but the centuries that came after. To stay on the topic of the thread - what language tells us about (pre-)history - I mentioned the unexpected uniformity Slavic maintained when it was already spread over half of Europe. I'm a mere syntactician, not a historical linguist, but apparently Ukraine, Eastern Austria/Germany and Macedonia (the current Greek province, but also the other one) underwent the same sound shifts more or less at the same time, 400-500 years after the language(s) radiated from wherever they radiated from, and I don't see how that is supposed to work if people don't talk (a lot), which in the absence of the telephone implies they moved a lot.

People also moved to and from the British Isles, including to Central Europe. The Christianisation of Southern Germany and Austria in the 7th and 8th centuries was in no small degree carried out by Gaelic (and to a smaller degree, Anglo-Saxon) monks/missionaries.

Virgil of Salzburg (or should I say Fearghal of Aghaboe) was apparently educated at Iona before ending up in what was then the Franks' Eastern periphery and becoming instrumental in the Christianisation of the Slavic population of today's Southern Austria. Meanwhile, the Tassilo Chalice, believed to have been donated by the last semi-independent duke of Bavaria (right before the Franks decided that a mostly loyal vassal wasn't good enough to guard their border with Central Asia, i.e. Avars and Bulgars) to one of the monasteries he founded to promote the Christianisation of Slavs in Northern Austria, seems to have been crafted by an artisan who learned his trade in England, if not produced in England itself.

lpetrich · Dec 5, 2023

I have a relative whom I shall call J. She majored in Russian literature in college, and she took a course in Lithuanian. Why? Because it is the most conservative of present-day Indo-European languages. Why that and not Sanskrit? I don't recall her answer. Nowadays, I'd ask about Hittite.

Let's look at Lithuanian. She noted nominative singular -s, notable in Latin, but surviving only in Greek, Lithuanian, Latvian, and when turned into r, Icelandic.

Checking on Lithuanian grammar and Lithuanian declension, I find that that is indeed correct, but a less-conservative feature is the limited survival of consonant-stem declensions. This type of declension is also present in the older IE langs: Hittite, Sanskrit, Greek, Latin, and early Germanic and Celtic, and it is reconstructed for PIE.

Thematic (o-stem): vilkas "wolf" < PIE *wlkwos & sniegas "snow" < PIE *snoigwhos
A-stem: plunksna "feather" < PIE *plunksneh2- < *plewk- "to fly, float" > Latin plūma "feather" & galva "head" < PIE *golHweh2- < *gelH- "naked, head" & migla "mist" < PIE *h3mighleh2- "mist, cloud"
I-stem: angis "viper" < PIE *h2engwhis, *h2ngwhe/i- "snake" & ugnis "fire" < PIE *h1ngwnis
U-stem: sûnus "son" < PIE *suHnus
N-stem: vanduô, vanden- "water" < PIE *wodr, *uden-, *wedn- & akmuo, akmen- "stone" < PIE *h2ekmon- & shuo, shun- "dog" < *k'won-
R-stem: sesuô, seser- "sister" < PIE *swésor- & dukte, dukter- "daughter" < PIE *dhugh2ter-
S-stem: menuo, menes- "Moon, month" < PIE *meh1ns

Some PIE consonant-stem nouns were made vowel-stem ones:

Consonant to i-stem: ausis "ear" < PIE *h2ows & akis "eye" < PIE *h3ekw- & nósis "nose" < PIE *neh2s & dantis "tooth" < PIE *h3donts & shirdis "heart" < PIE *kerd- & naktis "night" < PIE *nokwts
Consonant to e-stem: saule "Sun" < PIE *soh2wl- & zeme "land, Earth" < PIE *dheghôm, *ghm- & upe "river" < *h2ap- "water"
Consonant to thematic (a-stem): nagas "(finger)nail" < PIE *h3noghs

I used Appendix:Lithuanian Swadesh list - Wiktionary, the free dictionary - I expected many of the words to be inherited from PIE and I was right. I also expected many of them to have originally been consonant-stem words, and I was also right.

So while Lithuanian has 7 out of the 8 usually-reconstructed noun cases for PIE, some of its consonant-stem words were made vowel-stem ones.

lpetrich · Dec 6, 2023

Turning to verbs, Lithuanian ones are somewhat distant from the PIE ones. The language's personal endings:

-u, -i, -; -me, -te, - (3rd person singular and plural are the same)

The vowels combine with stem vowels - I looked at a lot of examples from Cool Lithuanian Verb Conjugator | Cooljugator.com and from Wiktionary's Lithuanian Swadesh List
-u, -i, -a; -ame, -ate, -a
-au, -ai, -o; -ome, -ote, -o
-iau, -ei, -e; -eme, -ete, -e
-iu, -i, -ia; -iame, -iate, -ia
-iu, -i, -i; -ime, -ite, -i

buti, "to be" is irregular:
Present: esu, esi, yra; esame, este, yra
Past: buvau, buvai, buvo; buvome, buvote, buvo

The Indo-European is-be suppletion is evident there also. The 3rd-person form of the present is yra and not esa, as one might expect from the rest of the conjugation.

Looking at other verbs, the past tense is often not predictable from the present tense, as one might expect from survivals of the PIE verb aspects: imperfective -> present, perfective -> past. One survival is the -n- infix in the present tense of some verbs but not the past tense, like for rasti "to find": present 3P randa, past 3P rado. But unlike the Germanic languages, the stem vowels seem completely leveled in every example that I found that did not look suppletive, meaning that PIE verb ablaut was gone.

Lithuanian also has a continuous past with -dav- and a future with -s-, and lots of compound tenses.

Reflexives are done with -s(i) -- *se tacked on, like in Romance, North Germanic, and Slavic. Modern Greek is the only present-day IE language to keep the old Indo-European mediopassive.

So Lithuanian is some way off from Proto-Indo-European.

Jokodo · Dec 7, 2023

Well, of course it is different: PIE is an extinct language.

Although if you asked me to name the living language that's closest to PIE, I'd probably go for Lithuanian (not before telling you it's a rather meaningless question without some context about what kind of feature you're interested in). Latvian is also fairly conservative structurally, I believe, but its lexicon has run amok with taboo replacements. You know how Germanic, Slavic and Baltic have lost their inherited IE word for "bear" (cf. Greek "arctos", Latin "ursus"), replacing it with euphemisms like "Brown one", or "Honey eater" ("medvjed"), or whatever it is the Balts use, because calling his name was apparently considered sacrilegious, or bad luck? Well, Latvian does that same thing for "son" and "sister"!

Certainly, reconstructed Balto-Slavic is, under anyone's reconstruction, much closer to Baltic than to Slavic. It would be easy to assume that the changes that make them distinct accrued over a prolonged period of time, say since 2000 or 1500 BC. This, however, doesn't appear to be the case: there is a layer of presumed Germanic loans in Slavic that - unless you have some non-standard assumptions about the Slavic homeland, or about the timing of the spread of East Germanic - can't easily have entered the proto-language much before the second century AD, and they show almost all of those changes. There are Slavic-derived placenames in parts of Austria and Greece that were Slavicized around 600 AD and Germanized/re-Hellenized not much before 800, and 8th-century transcriptions of Slavic proper names in Byzantine sources, which show that certain sound shifts that are today shared by all extant Slavic languages hadn't completed by 800 (but the Bible translation of Cyril and Methodius from the late 9th century shows the shifted versions). And we have the name of Charlemagne (German "Karl"), which was adapted into Slavic with the meaning "king" as "kralj" (Serbo-Croatian), "král" (Czech), "król" (Polish), "korol" (Russian) etc. There are also Romance loans (Dalmatian place names), Slavic loans into Romanian, and similar lines of evidence.

It turns out that a lot of what makes Slavic Slavic, which we might otherwise have pushed way back into prehistory, happened in the first millennium AD, and the ancestor of Proto-Slavic might have looked much like just another Baltic language (presumably, Baltic had already split into several languages) as recently as 2000, at most 2500 years ago.

Maybe this can serve as a cautionary tale against giving too much weight to glottochronology and similar approaches where we don't have external evidence as a calibration.
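
For anyone who wants to see how much rides on the assumed rate, here is a minimal sketch using the textbook glottochronological formula t = ln(c) / (2 ln(r)), where c is the fraction of shared cognates on a fixed wordlist and r is the fraction retained per millennium. The formula is not from this thread, the function name is made up for illustration, and the 50% cognate figure is a hypothetical input, not a measurement for any real language pair.

```python
import math


def divergence_time(shared_fraction, retention_per_millennium):
    """Estimated time since split, in millennia: t = ln(c) / (2 * ln(r)).

    shared_fraction: fraction of wordlist items still cognate in both languages.
    retention_per_millennium: assumed fraction of items one language keeps
    over 1,000 years (an assumption, which is exactly the weak point).
    """
    return math.log(shared_fraction) / (2 * math.log(retention_per_millennium))


# Hypothetical input: two languages sharing half of a wordlist.
shared = 0.50
for rate in (0.80, 0.86, 0.90):
    years = 1000 * divergence_time(shared, rate)
    print(f"assumed retention {rate:.2f} -> about {years:.0f} years")
```

With the same cognate count, moving the assumed retention rate from 0.80 to 0.90 more than doubles the estimated age, so without an external calibration the date is mostly a restatement of the assumed rate.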

Jokodo · Dec 7, 2023

Here are some specific examples:

Liquid metathesis, i.e. Ca{l,r}C -> C{l,r}aC (with slightly different outcomes in Polish and East Slavic): Rab (Croatian island) from Romance "Arbe"; "Gardiki" (Slavic toponym in the Peloponnese), where the modern South Slavic form would be something like "gradec"; "ardagastos", "dargameros" as glosses of personal names that would be "Rad(o)gost", "Dragomir" today; Karl -> kralj etc.

Second regressive velar palatalization: k,g,h -> ts,z,s: Zilja (German "Gail"), river in Carinthia, from a pre-Roman name *gīla or similar; the German version indicates that this must have been shifted in Slavic: if the local Romance dialect had done the shifting (cf. modern Italian), we wouldn't expect [g] in German. It even indicates either that the shift happened after the 8th century, when Bavarians showed up in the region, or that Romance remained sufficiently widely spoken under Slavic rule that the Bavarians could plausibly have picked up the Romance rather than the Slavic name.

Progressive palatalization: k->ts *after* front vowels: "Gardiki" in Greece above; "Piesting" (modern German, "pesteniccha" in medieval documents), cf Slovenian "peščenica", "sandy creek", in Lower Austria. The area was firmly controlled by Avars until ~800.

lpetrich · Dec 7, 2023

Expanding on the theme of computational phylogenetics, an important part of it is finding alignments of gene sequences and protein sequences. Part of that is finding which insertions and deletions (indels) produce the least sequence mismatch. One can get a perfect match with complete-sequence indels, so one sets "gap penalties" for the amount and size of indels.
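
To make the gap-penalty point concrete, here is a minimal sketch of global alignment (the Needleman-Wunsch dynamic-programming approach) with a linear gap penalty. The scoring values and the function name are illustrative assumptions, not anything tuned for real sequence data.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # match or mismatch
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("GATTACA", "GATACA"))
# With free gaps, two unrelated strings "align" with no mismatches at all:
print(align("ABC", "XYZ", gap=0))
print(align("ABC", "XYZ"))
```

The last two calls show why the penalty is needed: with `gap=0` the unrelated strings are aligned entirely by indels and never register a mismatch, while a nonzero penalty forces the mismatches to be counted.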

But for historical linguistics, the problem is much more difficult, in all three of phonology, morphology, and semantics.

Phonology: that's sound correspondences, and there are lots of nontrivial ones. Armenian erku "two" is sometimes cited, but it's an extreme case. More typical is in IE words for "four": most forms are cognate, but they start with p, f, kw, k, t, tS, S (S = sh). Not very obvious, but they are related by regular correspondences. Some common ones are p ~ f, ke/i ~ tSe/i, kw ~ p, s ~ h, h ~ ().

Aharon Dolgopolsky's simplified phonology and variations of it, grouping consonants by point of articulation and ignoring vowels, is a crude model of sound correspondence, one that produces both false negatives and false positives, but it is useful for automated comparisons. It gives false positives like Latin deus ~ Greek theos "god", both TH in this scheme, and false negatives like the words for 4 across the IE langs.
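
Here is a rough sketch of how such a sound-class comparison can be automated. The class table is a simplified assumption loosely modeled on the idea, not Dolgopolsky's actual table, and the ASCII transcriptions (kw, th, S for sh) and function name are likewise made up for illustration.

```python
# Hypothetical consonant classes, grouped roughly by point of articulation.
SOUND_CLASS = {
    "p": "P", "b": "P", "f": "P",
    "t": "T", "d": "T", "th": "T",
    "s": "S", "z": "S", "S": "S",
    "k": "K", "g": "K", "kw": "K", "x": "K",
    "m": "M", "n": "N",
    "r": "R", "l": "R",
    "w": "W", "v": "W",
    "j": "J", "h": "H",
}


def sound_key(word, length=2):
    """Return the classes of the first `length` consonants, ignoring vowels."""
    classes = []
    i = 0
    while i < len(word) and len(classes) < length:
        # Prefer two-character symbols such as "kw" or "th".
        if word[i:i + 2] in SOUND_CLASS:
            classes.append(SOUND_CLASS[word[i:i + 2]])
            i += 2
        elif word[i] in SOUND_CLASS:
            classes.append(SOUND_CLASS[word[i]])
            i += 1
        else:
            i += 1  # vowel or unknown symbol: skip it
    return "".join(classes)


# False positive: unrelated words that land on the same key.
print(sound_key("deus") == sound_key("theos"))
# False negative: cognate words for "four" that land on different keys.
print(sound_key("kwattuor"), sound_key("pedwar"))
```

Words with equal keys get flagged as candidate cognates, which is cheap to compute over large wordlists but inherits both kinds of error described above.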

One can use a hint from bioinformatics and use transition tables, and those have been worked out for nucleotides and amino acids. For nucleotides, there are two one-ring nucleobases (thymine/uracil, cytosine) and two two-ring ones (adenine, guanine), and one-ring <-> one-ring and two-ring <-> two-ring are much more likely than one-ring <-> two-ring.

The same could be worked out for phonology, including simplified phonology, and someone might have tried to do that.
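
As a sketch of what that might look like, here is an edit distance where substitutions between sounds that commonly correspond are made cheaper than arbitrary ones. The cost values are invented for illustration, and the cheap pairs are just the correspondences listed above (p ~ f, k ~ tS, kw ~ p, s ~ h, and h dropping out), not an empirically derived table.

```python
# Hypothetical "transition table": pairs assumed to correspond often.
CHEAP_PAIRS = {
    frozenset(pair)
    for pair in [("p", "f"), ("k", "tS"), ("kw", "p"), ("s", "h")]
}
CHEAP_COST = 0.25   # assumed cost of a common correspondence
FULL_COST = 1.0     # assumed cost of any other change


def sub_cost(x, y):
    """Cost of replacing segment x with segment y."""
    if x == y:
        return 0.0
    return CHEAP_COST if frozenset((x, y)) in CHEAP_PAIRS else FULL_COST


def indel_cost(x):
    """Cost of inserting or deleting segment x; losing h is assumed cheap."""
    return CHEAP_COST if x == "h" else FULL_COST


def weighted_distance(a, b):
    """Weighted edit distance between two lists of segments."""
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + indel_cost(y))
    for x in a:
        cur = [prev[0] + indel_cost(x)]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + sub_cost(x, y),  # substitute or match
                prev[j] + indel_cost(x),       # delete x
                cur[j - 1] + indel_cost(y),    # insert y
            ))
        prev = cur
    return prev[-1]


print(weighted_distance(["p", "a"], ["f", "a"]))   # common correspondence
print(weighted_distance(["p", "a"], ["m", "a"]))   # arbitrary substitution
```

Words are passed as lists of segments so that multi-letter symbols like kw and tS count as single sounds.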

Morphology - word compounds, derivation, reanalysis

Derivation? Romance has some examples of that.

Latin sôl "Sun" > Romanian soare, Italian sole, Catalan, Spanish, and Portuguese sol, but French soleil. The French form is from reconstructed Vulgar Latin *soliculum, sol with diminutive ending -iculum

Latin auris "ear" but the Romance forms are all from auricula -- auris -icula (diminutive) in turn becoming ôricula: Romanian ureche, Italian orecchia, Spanish oreja, Portuguese orelha, Catalan orella, French oreille

Latin avis "bird" > Spanish, Portuguese ave, Catalan au ... avis > aucellus "little bird" > Italian uccello, French oiseau

However, Latin passer "sparrow" > Romanian pasare ... passer > *passarum > Spanish pajaro, Portuguese passaro -- all "bird" in general

Reanalysis? English pea is from earlier pease, but that was reinterpreted as a plural. "a napron" > "an apron" and "a nadder" > "an adder".

Semantics - meanings. Word forms can have odd histories, jumping from meaning to related meaning.

How does one get from the city of Rome to love affairs?

It starts out with the people of Rome conquering much of their known world and being fondly remembered by many people long after the fall of their empire. Latin, the language of the Empire, survived as a highbrow language, with ordinary language being called "Roman": Romanicus > Old French Romanz.

That gives rise to the first sense of the word: the Romance languages' name.

A millennium ago, when people started writing in those languages, they wrote a lot of stories in them, and the word got transferred to book-length stories: French roman, etc.

But what does one like to tell stories about? Love affairs are a favorite subject, so for present-day English speakers, a romance is a love affair.

As to book-length stories, the usual English word is "novel", and that's a borrowing from Old French that means "new". "Novel" is also used as an adjective, with noun form "novelty".

But that's a bit on the unusual side.

Jokodo · Dec 8, 2023

What I'm trying to say is: We have multiple lines of evidence: Romance loans into Slavic (e.g. placenames from Carinthia to Dalmatia), Slavic loans into Balkan Romance, Germanic loans into Proto-Slavic, Greek and Latin glosses of Slavic personal names, Germanized and Hellenized Slavic toponyms... from the 8th century and earlier, which allow us to date Slavic sound changes at least coarsely. But this only works because these things happened at the edge of historicity: we have written sources that give us a good idea when the relevant contact situations might have arisen. Even so, every single line of evidence can be debated: if Thracian was a Balto-Slavic language, as some researchers have indeed suggested, many presumed Slavic loans into Balkan Romance could be substrate influences; or if we are wrong about where the Slavic homeland was, i.e. if it was in Northern Poland or Czechia rather than Southeastern Poland and Western Ukraine, Germanic loans could have entered much earlier and independently of the spread of Gothic; for all we know, implausible as it may be, some precursor of Slavic could have been spoken on Roman soil in Noricum (Austria and Slovenia) - that Noricum was Celtic is an assumption based on its belonging to an archeological culture shared with peoples we know to have spoken Celtic languages, but pots and people... It's only when different lines of evidence are combined that we can confidently draw a rough picture of what might have happened.

And this is early medieval Central Europe, with Franks, Byzantines, Arabs giving us written contemporary sources in Latin, Greek and Arabic. Imagine the uncertainties involved when what we are trying to reconstruct happened in Bronze Age, or paleolithic, Siberia instead! Good luck dating a contact situation between PIE and Uralic based on linguistic evidence alone when Slavic vs Baltic shows us how wildly the rate of change can vary! Good luck teasing apart contact from inheritance 5000 years before any of the languages involved got written down when Latvian shows what kind of core vocabulary substitutions superstition alone can produce!

Bomb#20 · Dec 9, 2023

Jokodo said:
bilby said:

For sure, Wales*, Cornwall, and parts of what is today Cumbria were separate and owed more to Lesser Britain (Ireland) than to the rest of Great Britain. ...

... People also moved to and from the British Isles, including to Central Europe. ...

Case in point. "Lesser Britain" isn't Ireland; it's Brittany. A lot of Britons moved there during the Anglo-Saxon takeover.

lpetrich · Dec 12, 2023

From Wiktionary, the free dictionary

Proto-Germanic *hlaibaz "bread" > Old English hlâf "bread" > English "loaf" meaning "block of bread or something similar"
Also > Proto-Slavic *khlebu > Russian khleb among others

Its origin is obscure, but it may be connected to Greek klibanos, kribanos, kribanon "oven for baking bread".

Another odd semantic shift: bakery business > nobility:

"lord" < OE *hlâfweard "loaf warden" - "bread guardian"
"lady" < OE *hlâefdîge "loaf digger/kneader" - "bread maker"

Looking at English "bread" itself, it is < PGmc *braudan "bread" with origin obscure. It may be connected with
English "to brew" < PGmc *brewwanan "to brew" < PIE *brewh1- "to boil, brew"

Irish arán < Old Irish arán "bread", Irish bairin "loaf" < Old Irish bairgen "bread, loaf", Welsh bara "bread", Armenian hac "bread" < Old Armenian hac "bread" - all with obscure origin

Latin panis also has obscure origin, though its descendants are well-represented in the Romance langs.

Greek artos - likely pre-Greek - sitos - obscure - Modern Greek psomi < Greek psomion "little morsel"

Sanskrit rotikâ "bread" has descendants like Hindi roti "bread, flatbread".

So most Indo-European words for bread have persisted for some 2,000 years, but cannot be traced beyond that with that meaning. Did Proto-Indo-European speakers never make bread? Or did they make it, but end up acquiring much of their bread making, and the words for it, from the people they conquered?

lpetrich · Dec 12, 2023

So I looked in the Middle East, the place where wheat was domesticated, for further clues.

The Semitic languages? Hebrew has lehhem "bread" and Aramaic lahhmâ "bread, food", but its Arabic cognate, lahhm, means "meat".

The Arabic word for bread, khubz, is close to South Arabian and Ethiosemitic words like Ge'ez hhabast. Borrowing? Some inherited word that dropped out elsewhere?

The Amharic word is dabo; Amharic is a present-day Ethiosemitic language.

So it's hard to tell.

But Egyptian had a word for bread, te, going back some 4,000-5,000 years, and Sumerian also had a word for this foodstuff, ninda, going back around as far.

History of bread - this foodstuff goes back some 9,100 years among early Neolithic farmers, and some evidence is even older, among the Mesolithic (populous sedentary Paleolithic) Natufian people.
