Language as a Clue to Prehistory


Jul 27, 2000
Eugene, OR
Basic Beliefs
One might first ask how that can be possible. Pots aren't people, as archeologists warn us, and words aren't people, either. Consider all the people who have learned English. It is the third most spoken first language, after Chinese and Spanish, and the most widely spoken second language. It is spoken by people all over the world, people of all races and almost all ethnicities.

But most people speak the language(s) that they grew up with, and that is how language can offer clues to prehistory.

Even so, people can borrow words, or more precisely, copy them, from other languages. Linguistic purists sometimes try to fight borrowing, like French linguistic purists who oppose "franglais" ("Frenglish"), English words borrowed into French, and who try to say "la fin de semaine" instead of "le weekend". But phrases like that are calques or loan-translations, formations using existing linguistic resources.

So why don't people's languages get all mixed up?

To see what happens, we must look at history. Fortunately, speakers of some languages have left long paper trails. Or papyrus trails or clay trails or rock trails, as the case may be. So let us look at some of these trails.
The people of the ancient Roman Republic and Roman Empire spoke Latin, but in the territory of that former state, nobody speaks Latin as a primary language. In much of that territory, people speak languages that are very similar to Latin in some ways -- the Romance languages. Much of their vocabulary is Latin-derived, as is much of their grammar.

Let us count from one to ten:

Latin ūnus duo trēs quattuor quinque sex septem octō novem decem

Italian uno due tre quattro cinque sei sette otto nove dieci
Spanish uno dos tres cuatro cinco seis siete ocho nueve diez
Portuguese um dois três quatro cinco seis sete oito nove dez
French un deux trois quatre cinq six sept huit neuf dix
Romanian unu doi trei patru cinci şase şapte opt nouă zece
There are also numerous similar-looking words in dialects and non-national languages like Romansh, Occitan, and Sardinian.

I'll do an approximate transcription of their pronunciation:

Latin: ûnus, dwô, três, kwattwor, kwînkwe, seks, septem, oktô, nowem, dekem

Italian: uno, due, tre, kwattro, tshinkwe, sei, sette, otto, nove, dyetshi
Spanish: uno, dos, tres, kwatro, thinko, seis, syete, otsho, nweve, dyes
Portuguese: uN, dois, tres, kwatru, siNku, seis, sete, oitu, nove, des
French: aN, dö, trwa, katr, seNk, sis, set, üit, nöf, dis
Romanian: unu, doi, trei, patru, tshintshi, shase, shapte, opt, noua, zetshe

They look rather similar, though some Latin sounds have been turned into other sounds. Something like tomayto vs. tomahto for that vegetable, but carried further. Such sound changes tend to be very regular, and they can be used to distinguish additional cognates from borrowings.
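As a rough illustration of "rather similar", here is a minimal Python sketch that scores each numeral against its Latin counterpart. It is only a sketch under stated assumptions: it uses the approximate transcriptions above (with the length marks on the Latin forms dropped), it covers only Spanish and French, and the helper names `similarity` and `mean_similarity` are made up for this example. The score comes from the standard library's `difflib.SequenceMatcher`, which measures shared character runs, not regular sound correspondences, so it is a crude stand-in for what a linguist actually does.

```python
from difflib import SequenceMatcher

# Approximate transcriptions of 1-10, taken from the lists above.
# Assumption: length marks on the Latin forms are dropped for simplicity.
NUMERALS = {
    "Latin": ["unus", "dwo", "tres", "kwattwor", "kwinkwe",
              "seks", "septem", "okto", "nowem", "dekem"],
    "Spanish": ["uno", "dos", "tres", "kwatro", "thinko",
                "seis", "syete", "otsho", "nweve", "dyes"],
    "French": ["aN", "dö", "trwa", "katr", "seNk",
               "sis", "set", "üit", "nöf", "dis"],
}


def similarity(a, b):
    """Return a 0..1 score based on shared character runs (not sound laws)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(lang_a, lang_b):
    """Average the word-by-word scores for the numerals 1-10."""
    scores = [similarity(x, y)
              for x, y in zip(NUMERALS[lang_a], NUMERALS[lang_b])]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for lang in ("Spanish", "French"):
        print(f"Latin vs {lang}: {mean_similarity('Latin', lang):.2f}")
```

On these transcriptions Spanish scores closer to Latin than French does, which fits the impression that French has carried its sound changes further.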

Another interesting result is that commonplace sorts of words tend to be preserved very well, seldom borrowed or replaced by other word forms. However, this is not an absolute rule. The Latin word for dog was canis, with accusative or object case canem. Italian: cane, Spanish: perro, Portuguese: cão, French: chien, Romanian: câine. So some medieval Spanish speaker used "perro" instead of something like "can" and it caught on. The other medieval Romance speakers did not participate, and they kept their Latin-descended words.
Can and canino are valid terms for the same animal in continental Spanish as well. Perro is preferred informally and in the former colonies, but the Latin-derived term wasn't lost. Many think that perro began as some manner of onomatopoeia, since it doesn't seem to have any equivalents in neighboring languages.
Turning to grammar, some things look very different. Latin had several noun cases, while the Romance languages have far fewer. But there is more similarity than one might at first think. The Romance languages carry over Latin's prepositions, and use some of them to substitute for cases. To illustrate, let us consider "horse's head" or "head of the horse".

Latin: caput equi
(equi: genitive or of-case for equus, "horse")

Italian: testa di cavallo
Spanish: cabeza de caballo
Portuguese: cabeça de cavalo
French: tête de cheval
Romanian: capul calului

All of them are "head of horse" or "head of-horse" with that word order. Most of the Romance languages use descendants of the Latin preposition "de" ("from") for "of". Note also the word-form substitutions. For "horse", the Romance words descend from Late Latin "caballus", becoming common in the last few centuries of the Western Roman Empire. The words for "head" are more complicated, with "caput" descendants surviving in some cases, but being replaced by descendants of Latin "testa" ("pot") in others. The descendants of Latin "caput" in Italian and French are "capo" and "chef", both meaning "leader", as English "head" sometimes does.

Although this is simpler than Latin, the Western Romance languages have some complications. In particular, they have a definite article, a word for "the", a word that Latin lacks. In most of the Romance languages, it is derived from Latin "ille", meaning "that". They also have a lot of contractions of definite articles with prepositions.

French:
  • de + le = du, à + le = au
  • de + les = des, à + les = aux
The other combinations are written separately: de la and à la. Writing -ux instead of -us is a French spelling quirk.

Spanish:
  • de + el = del, a + el = al
The others are written separately here also.

Italian:
  • il (masc. sg. before a consonant): di + il = del
  • lo (masc. sg. before a consonant cluster): di + lo = dello
  • l' (sg. before a vowel): di + l' = dell'
  • la (fem. sg. before a consonant): di + la = della
  • i (masc. pl. before a consonant): di + i = dei
  • gli (masc. pl. before a vowel or cluster): di + gli = degli
  • le (fem. pl.): di + le = delle
All contracted here. The prepositions da ("from"), a ("to"), in ("in"), and su ("on") are similar, though in becomes ne-.
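The Italian pattern is regular enough to write down as a small rule. The following Python sketch is only an illustration: the `STEMS` table and the function name `contract` are invented for this example, and the stem "de-" for di is read off the forms listed above in the same way that the text gives "ne-" for in.

```python
# Stem that each preposition takes before a definite article.
# Assumption: "di" -> "de" is inferred from del, dello, ...; "in" -> "ne" is from the text.
STEMS = {"di": "de", "a": "a", "da": "da", "in": "ne", "su": "su"}

ARTICLES = ("il", "lo", "l'", "la", "i", "gli", "le")


def contract(prep, article):
    """Join an Italian preposition with a definite article (illustrative rule)."""
    if prep not in STEMS or article not in ARTICLES:
        raise ValueError(f"unsupported combination: {prep} + {article}")
    stem = STEMS[prep]
    if article == "il":
        return stem + "l"          # di + il = del
    if article in ("i", "gli"):
        return stem + article      # di + i = dei, di + gli = degli
    return stem + "l" + article    # lo, l', la, le double the l: dello, dell', ...


if __name__ == "__main__":
    for article in ARTICLES:
        print(f"di + {article} = {contract('di', article)}")
```

The same function gives nel, nella, sul, dagli and so on for the other prepositions.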
I started off with Latin and Romance because this is a well-documented case of a language having descendants. The speakers of Latin in different parts of the former empire changed their language's sounds, its vocabulary, and its grammar, and changed them in different ways.

Returning to vocabulary, it is evident that much of the Romance languages' vocabulary is inherited from Latin. They also have lots of words borrowed straight from Latin itself, causing such doublets as Spanish cabeza, capital. In such doublets, the inherited word has sound changes that the reborrowed word lacks.

I now turn to word borrowing - what can easily be borrowed and what is seldom borrowed. The Romance languages are not very good for this, but one of their neighbors is: English. This language has numerous words borrowed from medieval Norman French, including lots of commonplace words. In particular, words for various animal meats are borrowings of Norman French words for those animals: beef, veal, pork, mutton. Modern French has boeuf, veau, porc, and mouton.

But there are plenty of words that English has inherited from pre-Norman Old English, and those include lots of function words and very commonplace words. Also, the grammar of English has continuity with the grammar of Old English. In particular, the past tenses and past participles are formed in essentially the same way, though English has lost most personal verb endings. English has two types of verb: strong verbs, with vowel shifts, and weak verbs, with -ed.

Several linguists have tried to find which sorts of words are seldom borrowed. In the mid-20th century, Morris Swadesh came up with a list of 200 word meanings, and later a list of 100. Wiktionary's "Appendix:Swadesh lists" has a list of 207 meanings, and the "Swadesh list" article has the 100-meaning list. That article also has a shorter 35-meaning list, and the Dolgopolsky list is a 15-meaning list of super-conserved words. A separate list of 100 words is the Leipzig–Jakarta list.

Dolgopolsky's list is I/me, two/pair, you (singular, informal), who/what, tongue, name, eye, heart, tooth, no/not, nail (finger-nail), louse/nit, tear/teardrop, water, dead.

The lists include such meanings as "name", "not", small numbers, pronouns, humanity and family relations, body parts, and common animals, plants, substances, natural phenomena, environment features, actions and properties, like basic colors.
Let us look at English again. The first recorded form of English is Old English or Anglo-Saxon. It is different enough from present-day English to be a foreign language to present-day speakers, but it nevertheless has a lot of continuity with present-day English.

Looking at several other northern European languages, one finds a lot of similarity in grammar and basic vocabulary, especially in earlier forms, forms like Old English and Old High German and Old Norse. But we do not have any written record of any possible ancestral language. Yet we nevertheless identify a Germanic family of languages, and we propose that it had an ancestor, Proto-Germanic.

So we are sure that Proto-Germanic existed, even though we have no written record of it, not even a single word. We even have a likely location for where it was spoken: the Jastorf culture of roughly 500 BCE - 1 CE in northern Germany and southern Scandinavia.

Likewise, in Eastern Europe, one can identify a Slavic family of languages and even propose a Slavic homeland: southern Poland and western Ukraine.

The champion of written record in Europe is Greek. The first surviving writings in Greek are from the Mycenaean period, around 1500 - 1200 BCE. The destruction of the Mycenaean palace society around 1200 BCE ended that period of literacy, with the burning palaces baking the clay tablets that were used for recordkeeping. Greek speakers became literate again around 750 BCE, and they have kept their writing system all the way to the present.

Turning to the Middle East, we find some more related languages, like Hebrew and Arabic and Aramaic and Akkadian, and in nearby east Africa, Amharic and the like. Someone named them Semitic after Noah's son Shem, who got the Middle East.

Looking further, in India, we find some languages with a long written history and with an ancestor called Sanskrit, much like the Romance languages and Latin.

So we have both direct evidence of ancestral languages - Latin and Sanskrit - and plenty of indirect evidence - Germanic, Slavic, Semitic, and Indic.
Curiously, in English exactly the same process happened to exactly the same word. "Dog" is preferred informally, but the Germanic term "hound" is still a valid term for the same animal, and "hound" is the Germanic cognate of Latin "canis". Nobody knows where "dog" came from, and onomatopoeia has been proposed.

Good point! Well, it is a common enough means for a neologism to come about, though there are others; I imagine lpetrich will be getting around to some of them.
Berger calls language a compendium of our collective experience and history. When some phenomenon becomes critical enough that it needs to be symbolized, language emerges to describe it. If it falls out of necessity, it falls out of our language. So language used in prehistory should have been a reflection of the circumstances the speakers lived in - mostly whatever was necessary to survive in hunter-gatherer conditions.

Wait, what are we talking about again?
I also wonder if you could do an analysis on complexity. In theory, the simplest words should have emerged first:

God, Bread, Dog, Horse, Sun, Tree

And so on.
Latin itself had some relatives, notably Faliscan, Oscan, and Umbrian. They are known from inscriptions around roughly 500 - 100 BCE, much like early Latin. In fact, one can see some changes from early Latin to canonical Classical Latin.

Lucius Cornelius Scipio (consul 259 BC) - his epitaph:

Honc oino ploirume cosentiont Romai
duonoro optumo fuise viro
Luciom Scipione. Filios Barbati
consol censor aidilis hic fuet apud vos,
hec cepit Corsica Aleriaque urbe,
dedet Tempestatebus aide meretod votam.

Classical Latin:
Hunc unum plurimi consentiunt Romae
bonorum optimum fuisse virum
Lucium Scipionem. Filius Barbati,
Consul, Censor, Aedilis hic fuit.
Hic cepit Corsicam Aleriamque urbem
dedit tempestatibus aedem merito.

English translation:
Romans for the most part agree,
that this one man, Lucius Scipio, was the best of good men.
He was the son of Barbatus,
Consul, Censor, Aedile.
He took Corsica and the city of Aleria.
He dedicated a temple to the Storms as a just return.

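Notice some changes between Old Latin and Classical Latin: oino became unum, ploirume became plurimi, and duonoro became bonorum; the ending -os became -us (Filios, Filius); and the final -m that the inscription often leaves unwritten (viro, Scipione, Corsica) is written out (virum, Scipionem, Corsicam).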
Can we go further? As Romans conquered Greece, they adopted a lot of Greek culture, even identifying Greek deities with theirs. They noticed that their languages are rather similar, and they concluded that Latin is descended from Greek. It was about 2000 years before anyone made any improvements on this. By the seventeenth century, Europeans had become acquainted with India, and they noticed something odd about Sanskrit, the language of the Vedas, the oldest Hindu religious literature. It had a remarkable resemblance to Latin and Greek. By the early nineteenth century, this led to the recognition of the Indo-European family, or Indo-Germanic (Indogermanisch) as German speakers like to call it.

The earliest recorded Germanic languages and their reconstructed ancestor, Proto-Germanic.
A prefixed * denotes a reconstruction.
Old English án twá þrí féower fíf sex seofon eahta niɣon tíen
Old High German ein zwâ drî fior fimf sehs sibun ahto niun zehan
Old Norse einn tveir þrír fjórir fimm sex sjau átta níu tíu
Gothic ains twai þreis fidwor fimf saíhs sibun ahtau niun taíhun
Proto-Germanic *ainaz *twai *þrijiz *fiþwor *fimfi *seks *sibum *ahtō *niwun *tehun

Latin ūnus duo trēs quattuor quinque sex septem octō novem decem

Classical Greek heīs dúō treīs téttares pénte héx heptá oktṓ ennéa déka
Modern Greek éna ðío tría téssera pénde éksi eftá oχtó ennéa ðéka

Russian odín dva tri četÿre pyat’ šest’ sem’ vósem’ dévyat’ désyat’
Czech jeden dva tři čtyři pět šest sedm osm devět deset
Serbo-Croat jèdan dvâ trî čètiri pêt šêst sëdam ösam dëvēt dësēt

Sanskrit éka dvá trí catúr páñca ṣaṣ saptá aṣṭá náva dáśa

With this reconstruction of their ancestral forms:
Proto-Indo-European *oynos / *sem *duwō *treyes *kwetwores *penkwe *sweks *septṃ *oktō *newṇ *dekṃ

Outside the Indo-European family:

Semitic languages:
Akkadian ištēn šena šalaš erbe ḫamiš šiššu sebe samāne tiše ešer
Arabic wāḥid iθnān θalāθah ’arba‘ah χamsah sittah sab‘ah θamāniyyah tis‘ah ‘ašarah
Classical Hebrew ’aḥat štayim šâlôš ’arba‘ ḥâmêš šêš šeba‘ šᵉmôneh têša‘ ‘eser
Biblical Aramaic ḥaḏ tərên təlāṯā ʾarbəʿâ ḥamšâ šittâ šiḇʿâ təmānyâ tišʿâ ʿaśrâ
Amharic and hulät sost arat ammɨst sɨddɨst säbat sɨmmɨnt zät’äññ asɨr

There is some resemblance in the words for 6 and 7, but that is about it.
As to what Proto-Indo-European was like, linguist August Schleicher decided to illustrate what he had worked out by writing a short fable in it, now known as Schleicher's fable:

The Sheep and the Horses

[On a hill,] a sheep that had no wool saw horses, one of them pulling a heavy wagon, one carrying a big load, and one carrying a man quickly. The sheep said to the horses: "My heart pains me, seeing a man driving horses." The horses said: "Listen, sheep, our hearts pain us when we see this: a man, the master, makes the wool of the sheep into a warm garment for himself. And the sheep has no wool." Having heard this, the sheep fled into the plain.

Schleicher 1868 (rather Sanskrit-like)

Avis, jasmin varnā na ā ast, dadarka akvams, tam, vāgham garum vaghantam, tam, bhāram magham, tam, manum āku bharantam. Avis akvabhjams ā vavakat: kard aghnutai mai vidanti manum akvams agantam.

Akvāsas ā vavakant: krudhi avai, kard aghnutai vividvant-svas: manus patis varnām avisāms karnauti svabhjam gharmam vastram avibhjams ka varnā na asti.

Tat kukruvants avis agram ā bhugat.

Lehmann and Zgusta 1979 (other recent versions are much like this one)

Owis eḱwōskʷe

Gʷərēi owis, kʷesjo wl̥hnā ne ēst, eḱwōns espeḱet, oinom ghe gʷr̥um woǵhom weǵhontm̥, oinomkʷe meǵam bhorom, oinomkʷe ǵhm̥enm̥ ōḱu bherontm̥. Owis nu eḱwobh(j)os (eḱwomos) ewewkʷet: "Ḱēr aghnutoi moi eḱwōns aǵontm̥ nerm̥ widn̥tei". Eḱwōs tu ewewkʷont: "Ḱludhi, owei, ḱēr ghe aghnutoi n̥smei widn̥tbh(j)os (widn̥tmos): nēr, potis, owiōm r̥ wl̥hnām sebhi gʷhermom westrom kʷrn̥euti. Neǵhi owiōm wl̥hnā esti". Tod ḱeḱluwōs owis aǵrom ebhuget.
Language does get mixed up together.

Spanish phrases are common in conversation here in Seattle.

I listened to Spanish language CDs. As I went along I realized language expresses a culture more than I had realized.

Spanish-speaking immigrants can sound like they are speaking bad English when they are conflating Spanish and English. They fit English into Spanish form.

It is like language expresses different thought processes and paradigms through structure.

Ethiopian immigrants will commonly say 'I get you' for 'I will get it for you'. Spanish speakers do something similar.

I'd say language is history. Society has become hyper fast. Text and verbal communication has been compressed with acronyms and other short phrases.

Some languages have no specific word for self, or words for leaving. One never leaves even if gone for weeks in an island culture.
Such linguistic crosstalk produces combinations like Spanglish - Spanish + English.

Standard Average European is likely a result of such crosstalk in western Europe. Here's a nice video on YouTube: "Euroversals - Are all European languages alike?" French and German are the most alike, the result of the "Charlemagne sprachbund", named after that medieval king's empire. A sprachbund is a set of languages which have converged on features because their speakers live close enough to suffer from lots of linguistic crosstalk. European languages, from most to fewest Euroversals:
  • French, German
  • Other Romance, Germanic languages
  • Slavic languages
  • (hardly any Euroversals) Celtic langs, Finnish, Turkish, Basque
This also correlates with the US State Department's estimates of language difficulty. It is roughly I: Romance and most Germanic languages, II: most languages, III: Arabic, Chinese, Japanese, Korean. Category I contains those languages high on Euroversals. Categories II and III contain those medium to low on Euroversals.
That is not a correct definition of the term cross talk, which refers to a systemic situation where signals are being exchanged but messages are not being correctly received and interpreted by their intended recipients. So two speakers of Spanglish are not engaging in cross talk, as they have actually created a consensus language that both are fluent in. Blended languages such as Spanglish occur because of two processes, pidginization and creolization. Cross talk is more common in cases where two speakers speak the same primary language but different dialects, sociolects, or vocational jargons.
I return to Proto-Indo-European.

Some linguists disdain trying to construct protolanguage text, but others consider it a good exercise for showing how much one can be confident of about a protolanguage. Schleicher's fable is the best-known, and "The king and the god" is a more recent example. In any case, much such research is on how one got from there to here.

Such reconstruction covers:
  • Phonology - how it was pronounced.
  • Vocabulary
  • Grammar
Indo-European is the best-studied of the larger language families, and even there, there are lots of differences in opinion on what Proto-Indo-European was like.

PIE phonology was a bit complicated, so I will discuss only a few issues about it here.

PIE had several stop consonants, inferred from correspondences like English foot ~ German Fuss ~ Latin ped- ~ Greek pod- ~ Sanskrit pad- ~ PIE *ped-, English two ~ German zwei ~ Latin duô ~ Greek duô ~ Russian dva ~ Sanskrit dvâ ~ PIE *dwô, English three ~ German drei ~ Latin três ~ Greek treis ~ Russian tri ~ Sanskrit trayas ~ PIE *treyes, English hundred ~ German hundert ~ Latin centum ~ Greek hekaton ~ Russian sto ~ Sanskrit satam ~ PIE *kmtom, ...
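The English side of these correspondences follows a regular pattern, known as Grimm's law (a name the discussion above does not use). The Python sketch below is a simplified illustration of that pattern for word-initial stops only, using the traditional reconstruction; the table `GRIMM` and the function `germanic_initial` are names made up for this example, and the rule ignores the later changes and exceptions that a full treatment would need.

```python
# Traditional PIE stop -> expected Germanic outcome (simplified Grimm's law).
# Assumption: only the first consonant of the root is considered.
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",   # voiced aspirates lose their aspiration
    "p": "f", "t": "th", "k": "h",     # unvoiced stops become fricatives
    "b": "p", "d": "t", "g": "k",      # voiced stops become unvoiced
}

# Roots and English reflexes taken from the correspondences above.
EXAMPLES = {
    "*ped-": "foot",
    "*dwô": "two",
    "*treyes": "three",
    "*kmtom": "hundred",
}


def germanic_initial(pie_root):
    """Return the expected Germanic initial for a PIE root, or None if no rule applies."""
    root = pie_root.lstrip("*")
    # Try two-letter stops (bh, dh, gh) before one-letter ones.
    for stop in sorted(GRIMM, key=len, reverse=True):
        if root.startswith(stop):
            return GRIMM[stop]
    return None


if __name__ == "__main__":
    for root, english in EXAMPLES.items():
        print(f"{root} -> {germanic_initial(root)}-  (English {english})")
```

Each predicted initial matches the start of the English word, while Latin, Greek, and Sanskrit keep something closer to the reconstructed stop.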

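In the traditional reconstruction, the stops form this table:

                  labial  dental  palatovelar  velar  labiovelar
unvoiced          *p      *t      *ḱ           *k     *kʷ
voiced            *b      *d      *ǵ           *g     *gʷ
voiced aspirate   *bh     *dh     *ǵh          *gh    *gʷh

The columns are for point of articulation, where the sound is made: labial (lips together), dental (tongue against teeth), palatovelar (tongue against upper back of mouth), velar (tongue against back of mouth), and labiovelar (like velar, but with lips close together).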

The rows are for voicing. They are traditionally reconstructed as unvoiced, voiced, and voiced aspirate, but in recent decades, some alternative reconstructions have been proposed. What are aspirate consonants? These consonants have a puff of breath after the main consonant sound, and English has some unvoiced aspirates.

till still
pill spill
kill skill

In each pair, the first word has an aspirated stop and the second a non-aspirated one. This "complementary distribution", as linguists call it, means that these two "phones" (low-level sounds) are "allophones" (sound variants) of the same "phoneme" (high-level sound).
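Complementary distribution means the variant can be predicted from its surroundings, so it can be stated as a rule. Here is a toy Python sketch of the rule for the pairs above. It is an illustration only: it works on spelling rather than on actual sounds, so it is only meaningful for simple words like these (it would be wrong for words like "the" or "knee"), and the function name is made up for this example.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def initial_stop_is_aspirated(word):
    """Toy rule: initial p/t/k is aspirated, but not when it follows s.

    Returns True or False when the rule applies, and None otherwise.
    Assumption: spelling stands in for pronunciation.
    """
    w = word.lower()
    if w[:1] in VOICELESS_STOPS:
        return True                      # till, pill, kill
    if w[:1] == "s" and w[1:2] in VOICELESS_STOPS:
        return False                     # still, spill, skill
    return None                          # no initial p/t/k: rule does not apply


if __name__ == "__main__":
    for word in ("till", "still", "pill", "spill", "kill", "skill"):
        print(word, initial_stop_is_aspirated(word))
```

Because the environment alone decides which variant appears, English speakers never need the two variants to tell words apart.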

However, some languages distinguish voiceless aspirates and nonaspirates. Chinese does, and it has no voiced stops. The Wade-Giles transcription of Chinese indicates aspirates with apostrophes, while the Pinyin transcription omits the apostrophes and writes nonaspirates with the letters for voiced stops. Thus:
Wade-Giles: Mao Tse-tung
Pinyin: Mao Zedong
The Thai language also distinguishes aspirates and nonaspirates, and also voiced consonants.
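The difference between the two Chinese transcriptions can be sketched as a lookup on the first consonant. This Python example is deliberately limited and hedged: it handles only the plain stops p, t, k at the start of a syllable, it leaves the rest of the syllable untouched (so it will not turn "tung" into "dong", since the vowel spelling also differs), and it skips "ts"/"tz" initials entirely. The names `STOPS` and `convert_initial` are made up for this example.

```python
# Wade-Giles initial -> Pinyin initial, plain stops only.
STOPS = {
    "p'": "p", "t'": "t", "k'": "k",   # aspirated: drop the apostrophe
    "p": "b", "t": "d", "k": "g",      # unaspirated: use the voiced letter
}


def convert_initial(syllable):
    """Convert the initial stop of a Wade-Giles syllable to Pinyin spelling.

    Assumption: the final is spelled the same in both systems.
    """
    if syllable.startswith(("ts", "tz")):
        return syllable                 # affricates are outside this sketch
    # Check aspirated initials first so "t'" is not read as plain "t".
    for wg in ("p'", "t'", "k'", "p", "t", "k"):
        if syllable.startswith(wg):
            return STOPS[wg] + syllable[len(wg):]
    return syllable


if __name__ == "__main__":
    for syl in ("t'ai", "tao", "kao", "k'ai", "p'ing", "pei"):
        print(syl, "->", convert_initial(syl))
```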
There are some problems with the traditional reconstruction of the PIE stops. Among known languages, when a language has voiced aspirates, it also has unvoiced ones, and in this reconstruction, there is no convincing evidence for voiceless aspirates.

Another problem is the rarity of *b, when the language has plenty of both *p and *bh. Example of the latter: English be ~ German bi- ~ Latin fu- ~ Greek phu- ~ Russian by- ~ Sanskrit bhav- ~ PIE *bheu-. When a language lacks one of /p/ and /b/, it is nearly always /p/ that is missing and not /b/. That suggests that the traditional voiced unaspirated stops were really some kind of voiceless stop.

A prominent alternative is the "glottalic theory", where instead of traditional T, D, Dh, it's T(h), T', D(h), where the T and D stand for unvoiced and voiced stops. The "glottalic" sounds are the T' ones, pronounced with a small pause between the consonant and the following sound. Here is a table:
  • Traditional: T, D, Dh
  • Glottalic: T(h), T', D(h)
  • Thai-like: Th, T, D

Turning to grammar, PIE was very different from English and much like Latin, Greek, and Sanskrit. It had eight noun cases and complete sets of personal verb endings, though it was rather short on verb tenses. It had three aspects, imperfective (incomplete action), perfective (complete action), and stative (constant state). It had two main verb voices, active and mediopassive (reflexive + passive).

Its noun cases: vocative (for addressing someone), nominative (subject), accusative (object), genitive (of-case), dative (to-case), instrumental (with-case), locative (in-case), and ablative (from-case).

Its basic word order was subject-object-verb, unlike English subject-verb-object. It had no definite article, no word for "the". It indicated possession much like how Russian does, with "at me is something" instead of "I have something".

The earliest dialects of PIE likely had two grammatical genders, common and neuter, with common quickly getting split into masculine and feminine, making three. PIE had a dual number in addition to singular and plural; dual is a plural for two things.

PIE had lots of vowel shifts or "ablaut", and some of it survives, like in the past tenses and past participles of "strong" verbs in English and other Germanic languages.
Sanskrit éka dvá trí catúr páñca ṣaṣ saptá aṣṭá náva dáśa

With this reconstruction of their ancestral forms:
Proto-Indo-European *oynos / *sem *duwō *treyes *kwetwores *penkwe *sweks *septṃ *oktō *newṇ *dekṃ
That also shows that a decimal system perhaps existed in PIE.
Yeah, returning after a long time. The forum structure here is different. :)
Yes, indeed PIE had a decimal system. A word for "hundred" can also be reconstructed, but "thousand" varies among the dialects.

This leaves vocabulary, and one can make a lot of cultural inferences from what a language's speakers have words for.

The most stable sorts of vocabulary are, however, not very good for cultural inferences, because they are very commonplace. Indo-European vocabulary includes words for "Sun", "Moon", "fire", "water", "name", "eye", "ear", "tongue", "tooth", "foot", "to be", "to go", "to come", "big", "young", "old", "red", ...

But some words are for things that are not as commonplace, and these words have been used to try to locate the place and time where the Proto-Indo-European speakers lived.

For a long time, words for various kinds of trees were used, because different species have different ranges, and that would presumably be helpful. For example:

English birch ~ German Birke ~ Swedish björk ~ Latin fraxinus ("ash tree") ~ Russian beryoza ~ Sanskrit bhurja ("Himalayan birch") ~ PIE *bherHgos
English beech ~ German Buche ~ Swedish bok ~ Latin fâgus ~ Greek phêgos ("oak") ~ Russian buk ~ PIE *bheh2gos

So one must look for birches and beeches. But these trees grow over a wide area, so that is not very helpful.

Some words are more variable, like words for oak trees:
English oak ~ German Eiche ~ Proto-Germanic *aiks
Latin quercus ~ English fir ~ PIE *perkus
Russian dub ~ Proto-Slavic *dobu

So we must look elsewhere.
