• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Language as a Clue to Prehistory

Proto-Amerind Numerals by Merritt Ruhlen
The Amerind language family includes all the aboriginal languages of North and South America, except for those belonging to the Eskimo-Aleut and Na-Dene families. Comparative linguistic evidence from extant (or attested) Amerind languages indicates that Proto-Amerind - the language from which all Amerind languages derive - used a system of counting in which an obligatory numeral prefix, *ne-, preceded the numeral root. The first three numerals in Proto-Amerind seem to have been *ne-kwe '1,' *ne-pale '2,' and *ne-kwatlas '3.' A fourth numeral, Proto-Amerind *ta-pale '4,' combined a reflexive prefix with the Proto-Amerind root for '2' in order to express the number '4.'
MR could find a sizable amount of evidence for his reconstruction of "1" and "2", but "3" mostly in Almosan (N America W Coast) and Andean (S America W Coast). For "4", he found 2+2 and (reflexive)-2 ("2 with itself"). He was unable to find any separate word for "5", noting that it is often derived from "hand", something common cross-linguistically, like "finger" > "1" and "human being" > "20".
 
TYPOLOGY OF NUMERAL SYSTEMS by Bernard Comrie

"Restricted systems, with little or no internal structure"
Pirahã has none
Most of them go up to 3 or 4 or at most 5 (often < "hand").

Then mentioning "subitizing", the ability to estimate a number of things at a glance without counting them. We typically go up to 4.

"Simple systems with addition only"
Often addition of 2's and 1's.

"More complex systems using multiplication and addition applied to a base"
Listing bases 6, 8, 10, 12, 20, 32, 60

All of them:
2, 3, 4, 5, 6, 8, 9, 10, 12, 20, 32, 40, 60, 80

Somatic origins:
  • 10 - fingers
  • 20 - fingers and toes; each finger twice (two phalanges/knuckles)
  • 8 - spaces between fingers (attested for some California languages)
  • 12 - phalanges or knuckles of fingers (excluding thumbs)
Phalange - finger segment
Knuckle - finger joint

If the lowest multiplicative base is higher than about 12, and the system is not an extended body-part counting system, then the higher numbers below the lowest multiple base make use of an additive base, ... some languages use this even for a smaller lowest multiplicative base, ...
like 5 then 10 and 10 then 20.
 
"Idiosyncrasies relating to bases" - lots of them

"Exponentiation and other higher bases" - Indo-European has powers 1 to 3 of 10, Greek adds 4: murias "myriad, 10,000", and Sanskrit and Chinese have even higher powers.

Classical Nahuatl: 20 ~ "to count", 400 = 20^2 ~ "hair", 8000 = 20^3 ~ "bag, sack"

English has short and long counts of high powers: million: 10^6, billion: 10^9 or 10^12, trillion: 10^12 or 10^18, ... replacing the m- with a numeral prefix.
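The two scales can be written as simple formulas. Here is a minimal Python sketch, assuming n = 1 for million, n = 2 for billion, n = 3 for trillion. The function name `illion` and the closed-form exponents are my own illustration, chosen to match the values listed above.

```python
def illion(n: int, scale: str = "short") -> int:
    """Value of the n-th "-illion" word (1 = million, 2 = billion, ...)."""
    if scale == "short":
        return 10 ** (3 * n + 3)   # each step up multiplies by 1,000
    if scale == "long":
        return 10 ** (6 * n)       # each step up multiplies by 1,000,000
    raise ValueError("scale must be 'short' or 'long'")


for n, name in enumerate(["million", "billion", "trillion"], start=1):
    print(name, illion(n, "short"), illion(n, "long"))
```

Both scales agree on million and diverge from billion onward.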

Sometimes powers are from "big", like the origin of "million": Italian -one (augmentative, "big" something): mille "1000" milione "million" < "big 1000"

That likely happened in PIE to form *kmtom 100 from *dekm 10 and *tuHsont- 1000 from *tewH- "to swell" and *kmtom
 
Languages | Free Full-Text | Ancient Connections of Sinitic
Considering various hypotheses, then deciding on a "linkage" between Sino-Tibetan and Dene-Yeniseian.

Sino-Tibetan has two main subfamilies: Sinitic (Chinese) and Tibeto-Burman, though some phylogenies put Sinitic inside of "Tibeto-Burman" as an early brancher.
  1. Sino-Tibetan includes Kra-Dai and Miao-Yao as branches close to Sinitic
  2. Austric: Austronesian and Austroasiatic
  3. Austro-Tai: AN and KD
  4. East Asian / Trans-Himalayan: ST, (Yangzian: MY, AA), AT
  5. Sino-Austronesian or STAN: ST, AT
  6. ST related to DY, more broadly, Sino-Caucasian or Dene-Caucasian

About (1),
However, nearly all of the lexical evidence for this connection is loanwords from Sinitic into Tai-Kadai and Miao-Yao, as Li and others have shown. Shafer and others suggest that Sinitic, Tai-Kadai, and Miao-Yao form a subgroup within ST, but actually, Tai-Kadai and Miao-Yao each have distinct basic vocabulary.
Chinese cultural prominence? Like for Korea and Japan and Vietnam, all three of which have numerous words with Chinese origin, like Chinese number words alongside their native ones. That went further in Proto-Tai, its speakers dropping their inherited number words for Chinese ones.

About (2), "Diffloth (1994) shows that the Austric linkage hypothesis is unsustainable."

About (3), "Much more solidly, Benedict (1942) showed (3) a very close relationship with a large number of cognates between Tai-Kadai and Austronesian in a family which he called Austro-Thai; this connection is fairly widely accepted, and Tai-Kadai is subsumed within (4) Starosta’s East Asian and (5) Sagart’s STAN as part of Austronesian."

Unfortunately, he does not mention Weera Ostapirat's work on Austro-Tai.

Author David Bradley rejects (4) and (5) and then considers (6).
As we have seen, there is substantial evidence of syntactic, morphological, phonological, and lexical similarities between ST, Yeniseian, and ND. These are particularly strong in stable areas such as basic structural features, including negation, prohibition, valency increase, and so on; also in SAP pronouns, lower numerals, basic kinship terms, and so on.
Valency increase: causatives and the like, which add another verb argument, as it might be called.

SAP = Speech Act Participant: first and second person pronouns.

The linkage (5) of Proto-ST with Yeniseian and ND, which cannot be attributed to contact, is supported by various evidence briefly summarised above. The lexical evidence suggests a pre-Neolithic linkage, sharing only the domestic dog. The Yeniseian languages are to the northwest of Proto-ST in central Siberia; the ND groups later migrated from northeast Siberia into northwest North America. The linkage is also supported by genomic evidence presented in Bradley (2023, forthcoming). The shared linguistic retentions of ST and Dene-Yeniseian languages have persisted over great geographical distances, despite many millennia of lack of contact.
I concede that this paper is somewhat disappointing in having no statistical analyses of highly-conserved vocabulary, Kassian-Starostin and Ostapirat sorts of analyses.
The generally-agreed location for the origins of Sinitic is the upper Yellow River valley. In the early Neolithic period corresponding to Proto-ST, cultivation of Setaria and Panicum millets, Glycine (soybean), and the domestic pig started in this area and later diffused more widely. Etyma for these crops and this animal are reconstructed for Proto-ST and attested in Sinitic and nearly every branch of TB across East, Southeast, and South Asia (Bradley 2011, 2016, 2022). The chronology of the subsequent dispersal of Sinitic and the TB languages across this wide area can be traced through regular sound and morphosyntactic changes, as well as lexical innovation, including new vocabulary for new crops and new domestic animals over the period from 5.6K YBP to the present. For more discussion of the phylogeny and spread of Proto-ST, see Bradley (2022, 2023, forthcoming); Bradley et al. (forthcoming) and many other sources.

This chronology, along with archaeological and genomic findings summarised in Bradley et al. (forthcoming) and the early cognate etyma within Proto-ST, suggest that Proto-ST was possibly spoken during the Peiligang Culture and certainly during early to mid-Yangshao Culture in the upper Yellow River valley and that Sinitic was spoken during the late Yangshao and Longshan cultures, spreading downriver into northeast China, where Sinitic speakers took up the cultivation of rice and developed a high culture which they later spread and diffused across the rest of China (Bradley et al., forthcoming).
 Peiligang culture - 7000 - 5000 BCE - Neolithic - Yi-Luo river (Henan Province)
 Cishan culture - 6500 - 5000 BCE - Neolithic - E foothills of Taihang mountains
 Yangshao culture - 5000 - 3000 BCE - Neolithic - middle Yellow River
 Longshan culture - 3000 - 1900 BCE - Neolithic - lower and middle Yellow River
Longshan = "Dragon Mountain" in Chinese

Dated language phylogenies shed light on the ancestry of Sino-Tibetan | PNAS - "Our findings point to Sino-Tibetan originating with north Chinese millet farmers around 7200 B.P. and suggest a link to the late Cishan and the early Yangshao cultures." - 5200 BCE
Dated phylogeny suggests early Neolithic origin of Sino-Tibetan languages | Scientific Reports - "But we find that the initial divergence of this group occurred earlier than previously suggested, at approximately 8000 years before the present, coinciding with the onset of millet-based agriculture and significant environmental changes in the Yellow River region." - 6000 BCE
 
Prehistory is history before writing or before some starting point. So I'll consider the history of writing.

Before full-scale writing, writing that can fully represent spoken language, there was proto-writing, possibly extending well into the Pleistocene. Writing was independently invented only a few times, with most other forms of writing being descended from these inventions or else inspired by learning about writing: stimulus diffusion.

The four independent inventions and their descendants:

Egyptian hieroglyphics
  • Hieratic -> Demotic -> Meroitic
  • Proto-Sinaitic
    • Ugaritic
    • South Arabian -> Ge'ez
Phoenician had oodles of descendants, and this is not a complete list:
  • Paleo-Hebrew
  • Aramaic
    • Brahmi -> numerous South Asian writing systems like Devanagari and Tibetan
    • Square Hebrew
    • Nabataean -> Arabic
    • Syriac -> Central Asian writing systems like Mongolian
    • Mandaic
  • Greek
    • Etruscan -> Roman
    • Coptic
    • Gothic
    • Armenian
    • Georgian
    • Glagolitic
    • Cyrillic

Cuneiform writing

Chinese writing -> Japanese hiragana, katakana

Central American writing
 
Now for kinds of writing.
  • Pictographic writing - of pictures
  • Ideographic writing - for concepts
  • Logographic writing - for words or word parts (morphemes)
  • Syllabary - for each syllable
  • Alphabet - for each speech sound
  • Abjad - for consonants, with vowels usually omitted
  • Abugida - for consonants, with vowels indicated if other than some default one

I remember an article about road signs from my childhood saying that they seem to have regressed by showing pictures instead of text, like a picture of a leaping buck rather than the words "DEER XING". But it's easier to recognize at a glance.

In road signs, possible or permitted directions are presented with arrows that may be straight or curved or multiple. That seems to me to be only quasi-pictographic, because it is an abstraction, thus making it ideographic.

A recent form of pictographic and quasi-pictographic ideographic writing is variously called smilies or emoticons or emojis. Though some people hate them, I like them.

Another form of ideographic writing is representation of mathematics, including numbers. Writing numbers ideographically is very old and very common, with the Egyptian and Mesopotamian numerals almost as old as their writing systems.

Writing numbers ideographically rather than as words sometimes causes trouble in historical-linguistics research: Hittite Grammar - Numbers - "The pronunciation of most numbers is unknown since numbers are generally written with cuneiform ideograms."
 
I remember an article about road signs from my childhood saying that they seem to have regressed by showing pictures instead of text, like a picture of a leaping buck rather than the words "DEER XING". But it's easier to recognize at a glance.
And is effective even for motorists who aren't fluent in the local language.

Using internationally recognised standard road signs with as little text as possible is a major benefit to road safety.

When the FIFA Women's World Cup was being hosted here in Australia, we had a match in Brisbane between Germany and South Korea. A lot of German tourists took the opportunity to visit our city, and a significant proportion did not comprehend the signs that read "Authorised Buses Only" and ended up driving hire cars on the busway (occasionally on the wrong side of the road).
 
Some ancient texts did not have much ideographic writing of numbers. Consider the Bible.

2 Chronicles 7:5 - interlinear with the original Hebrew

John 21:11 - interlinear with the original Greek

The originals are written out, and the King James Version also uses written-out forms:

2 Chr 7:5 - And king Solomon offered a sacrifice of twenty and two thousand oxen, and an hundred and twenty thousand sheep: so the king and all the people dedicated the house of God

John 21:11 - Simon Peter went up, and drew the net to land full of great fishes, an hundred and fifty and three: and for all there were so many, yet was not the net broken.

However, some modern-English translations use ideographic writing of numbers, like the New English Translation:

2 Chr 7:5 - King Solomon sacrificed 22,000 cattle and 120,000 sheep. Then the king and all the people dedicated God’s temple.

John 21:11 - So Simon Peter went aboard and pulled the net to shore. It was full of large fish, 153, but although there were so many, the net was not torn.

Yes, (Hindu-)Arabic numerals are ideographic.
 
The first writing systems were pictographic, made logographic by using pictures for similar-sounding words in the fashion of a rebus puzzle. That name is short for Latin "non verbis sed rebus": "not with words but with things", about making pictures to represent names in heraldry.

I say logographic and not ideographic because each symbol represents a word or a word part (morpheme), and not necessarily a concept.

Trying to invent logograms for everything is very difficult, and in most systems, some logograms became used for their syllable sounds - syllabograms - making a logosyllabic system, a logographic system with a syllabary. Egyptian went the same way, but only for initial consonants, making a logoconsonantal system, a logographic system with an abjad.

Chinese speakers went the farthest in trying to make a logogram for every word, and they invented many compound characters, often with one part specifying some meaning and the other part specifying some sound. For instance, the Chinese character for mother is the characters for woman and horse, a word for a kind of woman, a word which sounds like the word for horse. But much of this analysis was done for Chinese pronunciation of some 2,000 - 3,000 years ago, and such details are often obscured.

The result is a very difficult writing system -- one has to learn a character for every one-syllable word or word part.

Even so, some Chinese characters are used as a syllabary for foreign names and the like.

Japanese continue to use about 2,000 - 3,000 Chinese characters, kanji, alongside two syllabaries, kana, both derived from Chinese characters: hiragana and katakana. Hiragana is used for native words and grammatical parts, and katakana for non-Chinese borrowings and names.

Koreans, however, use Chinese characters, hanja, much less than in the past, mostly using their alphabet, hangul. The letters of each syllable are grouped into a square block, giving hangul a vaguely Chinese-like appearance.

All the other logographic systems fell out of use long ago - Egyptian, Mesopotamian, Anatolian, Cretan, Central American - all replaced by alphabets, abjads, or abugidas ultimately derived from Proto-Sinaitic, a descendant of Egyptian hieroglyphics. Like what you are reading right now.
 
 Numeral (linguistics)
Base 80: octogesimal
Supyire:
Bases: 1, 5, 10, 20, 80, 400
Ratios: 5, 2, 2, 4, 5

Some other mixed ones:
Sumerian sexagesimal: 1, 10, 60 -- 10, 6
1, 5, 10 -- 5, 2
1, 5, 20 -- 5, 4
1, 10, 20 -- 10, 2
1, 5, 10, 20 -- 5, 2, 2
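The ratios in these lists are just each base divided by the one before it, and a number can be split greedily across the bases from largest to smallest. Here is a rough Python sketch. The helper names `ratios` and `decompose` are my own, and the greedy split is only arithmetic: it makes no claim about how the actual Supyire or Sumerian number words are formed.

```python
def ratios(bases):
    """Ratio of each base to the one before it."""
    return [hi // lo for lo, hi in zip(bases, bases[1:])]


def decompose(n, bases):
    """Greedy split of n into (count, base) pairs, largest base first."""
    counts = []
    for b in sorted(bases, reverse=True):
        q, n = divmod(n, b)
        counts.append((q, b))
    return counts


supyire = [1, 5, 10, 20, 80, 400]
sumerian = [1, 10, 60]

print(ratios(supyire))           # [5, 2, 2, 4, 5]
print(ratios(sumerian))          # [10, 6]
print(decompose(399, supyire))   # 4*80 + 3*20 + 1*10 + 1*5 + 4*1
```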

For writing numbers, Western (Hindu-)Arabic numerals, as they might be called, have almost completely taken over, with alternatives mainly persisting for numbers in sequence and other special uses.

Among the systems displaced were Roman numerals.
Usage varied greatly in ancient Rome and became thoroughly chaotic in medieval times. Even the more recent restoration of a largely "classical" notation has failed to produce total consistency: variant forms are even defended by some modern writers as offering improved "flexibility".
1 I, 2 II, 3 III, 4 IV, 5 V, 6 VI, 7 VII, 8 VIII, 9 IX, 10 X, 11 XI, 12 XII, 13 XIII, 14 XIV, 15 XV, 16 XVI, 17 XVII, 18 XVIII, 19 XIX, 20 XX, 30 XXX, 40 XL, 50 L, 60 LX, 70 LXX, 80 LXXX, 90 XC, 100 C, 500 D, 1000 M
with alternatives like IIII for IV
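The "classical" notation can be generated greedily from a table of values. Here is a minimal Python sketch, limited to 1 through 3999. The `subtractive` switch is my own addition, to show the purely additive alternative such as IIII, and the table includes CD and CM, which the list above does not spell out.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int, subtractive: bool = True) -> str:
    """Greedy conversion; with subtractive=False only single letters are used."""
    if not 0 < n < 4000:
        raise ValueError("n must be between 1 and 3999")
    pairs = ROMAN if subtractive else [p for p in ROMAN if len(p[1]) == 1]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


print(to_roman(14))                     # XIV
print(to_roman(4, subtractive=False))   # IIII
```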

Another system displaced was Greek numerals. It uses letters of the Greek alphabet, the first nine for 1 to 9, the second nine for 10, 20, ..., 90, and the third nine for 100, 200, ..., 900. Since the Greek alphabet has only 24 letters, three disused letters filled the remaining slots. This is an alphabetic numeral system, and a Roman-alphabet version would be

1 to 9: A B C . D E F . G H I
10 to 90: J K L . M N O . P Q R
100 to 900: S T U . V W X . Y Z &

Isaac Asimov mentioned & as the extra letter in one of his essay books, but I can't find a source for that.
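To see how such a system behaves, here is a small Python sketch of the Roman-alphabet version in the table above, with & as the 27th symbol. The function name `encode` is my own, and the scheme only reaches 999, since there is one symbol per digit per place.

```python
import string

LETTERS = string.ascii_uppercase + "&"   # 27 symbols: A..Z plus &

# A..I = 1..9, J..R = 10..90, S..& = 100..900
VALUES = {}
for i, ch in enumerate(LETTERS):
    place, digit = divmod(i, 9)
    VALUES[ch] = (digit + 1) * 10 ** place

BY_VALUE = {v: k for k, v in VALUES.items()}


def encode(n: int) -> str:
    """Write n (1..999) with one letter per nonzero decimal digit."""
    if not 0 < n < 1000:
        raise ValueError("n must be between 1 and 999")
    out = []
    for place in (100, 10, 1):
        digit = (n // place) % 10
        if digit:
            out.append(BY_VALUE[digit * place])
    return "".join(out)


print(encode(42))    # MB
print(encode(153))   # SNC
```

No zero is needed: a missing digit is simply a missing letter.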

In antiquity, isopsephy (Greek) and gematria (Hebrew) were the practice of adding up the numerical values of a word's letters and then trying to interpret the meaning of that number.

That's what's behind 666 being the Number of the Beast in the Book of Revelation. It's often interpreted as a gematria version of "Neron Caesar".
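The sum can be checked in a few lines of Python. This sketch assumes the commonly cited Hebrew spelling nun-resh-vav-nun qof-samekh-resh, which I have not given above, and it uses transliterated letter names rather than Hebrew characters. Only the letters needed for this one name are in the table.

```python
# Standard Hebrew letter values, restricted to the letters used here.
HEBREW_VALUES = {"nun": 50, "resh": 200, "vav": 6, "qof": 100, "samekh": 60}

# Assumed spelling of "Neron Caesar" (nrwn qsr), letter by letter.
NERON_QESAR = ["nun", "resh", "vav", "nun", "qof", "samekh", "resh"]


def gematria(letters):
    """Add up the numerical values of a word's letters."""
    return sum(HEBREW_VALUES[name] for name in letters)


print(gematria(NERON_QESAR))   # 666
```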
 
Turning to other systems, the Egyptian one had a symbol for each power of 10 and repeated each symbol as needed. Thus, 42 is 10 10 10 10 1 1
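A short Python sketch of that additive scheme, with the powers of ten standing in for the actual hieroglyphs. The function name `egyptian` is my own.

```python
def egyptian(n: int):
    """List the powers of ten, each repeated as many times as its digit."""
    parts = []
    power = 10 ** (len(str(n)) - 1)   # largest power of ten not above n
    while power >= 1:
        digit, n = divmod(n, power)
        parts.extend([power] * digit)
        power //= 10
    return parts


print(" ".join(map(str, egyptian(42))))   # 10 10 10 10 1 1
```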

The Babylonian one had a symbol for 1 and a symbol for 10 repeated as needed for each base-60 digit. It also had a place system, with zero indicated as a space, and later as a symbol.
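As a rough Python sketch, each base-60 digit can be split into a count of the 10 symbol and a count of the 1 symbol. The function name `babylonian` and the (tens, ones) pair format are my own choices for illustration.

```python
def babylonian(n: int):
    """Base-60 digits of n, each given as (count of tens, count of ones)."""
    digits = []
    while n > 0:
        n, d = divmod(n, 60)
        digits.append(d)
    digits.reverse()                      # most significant place first
    return [divmod(d, 10) for d in digits]


print(babylonian(12345))   # [(0, 3), (2, 5), (4, 5)], i.e. 3, 25, 45
print(babylonian(3600))    # [(0, 1), (0, 0), (0, 0)], two empty places
```

The (0, 0) pairs are the empty places that were first left as a space and later got their own symbol.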

It's rather obvious that these systems for writing numbers are all ideographic, corresponding to the mathematics rather than to their users' words.

Consider 12,345.

Written out in English, it is "twelve thousand three hundred forty-five" - somewhat irregular.

A more regular example is Chinese: 一万二千三百四十五
yī wàn èr qiān sān bǎi sì shí wǔ

Word for word, it is
one - ten thousand - two - thousand - three - hundred - four - ten - five
or
1 - 10,000 - 2 - 1,000 - 3 - 100 - 4 - 10 - 5

But Chinese, like English, has separate words for 1, 10, 100, 1,000, and Chinese also has 10,000 "myriad".
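The digit-then-unit pattern is regular enough to generate mechanically. Here is a minimal Python sketch for numbers below 100,000. It is deliberately simplified: it just skips zero digits, whereas real usage inserts 零 for internal zeros and usually drops the 一 in 10 to 19. The function name `chinese` is my own.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千", "万"]   # 1, 10, 100, 1,000, 10,000


def chinese(n: int) -> str:
    """Digit followed by its unit word, for each nonzero digit of n."""
    if not 0 < n < 100000:
        raise ValueError("n must be between 1 and 99999")
    digits = [int(c) for c in str(n)]
    out = []
    for i, d in enumerate(digits):
        if d:
            out.append(DIGITS[d] + UNITS[len(digits) - 1 - i])
    return "".join(out)


print(chinese(12345))   # 一万二千三百四十五
```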

Though English, like many other Indo-European languages, is somewhat irregular from 11 to 99,  Hindustani numerals are a champion, with just about every one in that range being irregular. But I must note that much of this irregularity is due to various contractions.
 
I've been discussing cardinal numbers, the numbers for counting members of a set, but there are several other types of number words. For instance, ordinal numbers are numbers in sequence: first, second, third, fourth, ...

To keep the clutter down for other kinds of numbers, I will only do 4.
  • Cardinal number: four
  • Ordinal number: fourth, quaternary
  • Adverbial number: four times
  • Multiplier: fourfold, quadruple
  • Collective: set of four, foursome, quadruplet, tetrad, quadri-, tetra-
  • Distributive: four at a time, four of each, in fours, in groups of four, four of something in a group, quadruply
  • Fractional: quarter, fourth
Nearly all of these various kinds of numbers have words that are derived from the corresponding cardinal numbers, whether native or Latin or Greek.

English "second" is borrowed from Old French, in turn descended from Latin secundus, literally "following". Also used in Latin was alter "other". Romance languages have descendants of secundus, though many French speakers nowadays use deuxième, the regularly formed ordinal: deux 2 -ième.

English "first" and most other Indo-European langs' words contain PIE *per- "before, in front, first".

Elsewhere, Hebrew rishon "first" < rosh "head".

However, the Turkic langs are completely regular, with "first" and "second" formed from 1 and 2 with the Turkic ordinal suffix. Thus, Turkish birinci < bir -inci and ikinci < iki -inci.

Korean also has an ordinal suffix for all numbers, 째 -jjae. Though 첫째 cheotjjae "first" has that suffix, it is otherwise irregular.

Chinese has an ordinal prefix for all numbers, 第 dì-, Japanese, 第 dai-, Thai, ที่ tîi-, Vietnamese, thứ.
 
(PDF) Seven Dene-Caucasian Etymologies by John Bengtson
Seems rather paltry, but I decided to read it anyway.
Therefore it is not imagined that the discussion of these seven etymologies is sufficient to “prove” the Dene-Caucasian macro-family.
Just to illustrate it, I think.

Then noting that Basque has a causative prefix -ra-, West Caucasian causative -r-, Tibetan valency increaser r-, Na-Dene *tl-, also Tibetan s-, Haida s-, and the Burushaski transitivizer -s-.

Then addressing the counterargument of the extent of DC, from SW Europe to W North America. That's not really a problem, because of its time depth and how far people have traveled over similar time depths. Imagine a group of people who travel a day's walk each generation. That's roughly 1 kilometer per year, and that is fast enough for dispersal over most of the time that our species has existed. Speeding up to a day's walk each year, some 30 kilometers, one gets much faster dispersion. The total extent of DC is roughly 20,000 km, and one could travel that entire distance in only 700 years.
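The arithmetic behind those figures, as a small Python sketch. The 30-year generation is my assumption: it is not stated above, but it is what makes a day's walk per generation come out to about 1 kilometer per year.

```python
DAY_WALK_KM = 30          # one day's walk, from the text
GENERATION_YEARS = 30     # assumption, implied by "roughly 1 kilometer per year"
EXTENT_KM = 20_000        # rough total extent of Dene-Caucasian


def years_to_cross(km_per_year: float) -> float:
    """Years needed to cover the whole extent at a steady rate."""
    return EXTENT_KM / km_per_year


slow = years_to_cross(DAY_WALK_KM / GENERATION_YEARS)   # a day's walk per generation
fast = years_to_cross(DAY_WALK_KM)                      # a day's walk per year

print(round(slow))   # 20000 years
print(round(fast))   # 667 years, i.e. roughly 700
```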

Furthermore, it seems to me that wide dispersion strengthens the case for shared ancestry rather than for borrowing. Ancestry vs. borrowing has been a big problem for Altaic, because the homelands of the three Core Altaic subfamilies are rather close, in or near present-day Mongolia.

Then getting to the question of Haida. Is it a member of Na-Dene or is it an isolate? JB argues that one can ignore that question, since one can work from Tlingit, Eyak, and Athabaskan.

His list:
  1. stomach, vomit -- Bsq, NC, Bur, ST, ND: A E H
  2. fire, smoke -- Bsq, NC, Bur, Yen, ST, ND: A E
  3. gum, wax -- Bsq, NC, Bur, Yen, ND: A T
  4. limb, bone -- Bsq, NC, Bur, ST, ND: A E T H
  5. liver -- Bsq, NC, Bur, Yen, ST, ND: A E
  6. finger, thumb -- Bsq, NC, ND: A E T H
  7. water -- Bsq, NC, Bur, Yen, ST, ND: A T H
Some of the semantics are a bit stretchy, it seems to me.

The article ended with some sound correspondences.
 