
Language as a Clue to Prehistory

There is an assertion about the Hopi language, that it only has "rain" as a verb, not as a noun, but I've yet to be able to test that assertion with what I could find online about that language.

But if Hopi has a way of forming nouns from verbs, then a Hopi speaker could make "rain" (noun) as "a raining" or something similar. Something like English "fall" (noun) from "fall" (verb).

I got to thinking about that when I checked on English "wind (in air)" and Russian veter. Both are from *h2weh1- > *(a)wê- listed as "to blow (of wind)". So "wind (in air)" is primarily a verb in PIE, much like "rain" supposedly is in Hopi.

But those nouns were formed separately.

*h2weh1-nt- (present participle) > Proto-Germanic *windaz > English "wind" ... Also > Latin ventus

*h2weh1-tro- (agent noun) > Proto-Slavic *vetru > Russian veter

This root is also present in Celtic, Armenian, Baltic, Indo-Iranian, and Anatolian with these or other verb-to-noun shifts.
 
Turning to "rain" itself, English "rain (noun)" and "rain (verb)" have cognates across Germanic, derived from Proto-Germanic *regnan (noun) and *regnônan (verb) with *regnônan < *regnan + -ônan (suffix that makes verbs from nouns and adjectives). Its origin is obscure. So the noun is primary.

Romance words come from Latin pluvia (noun) < pluvius "rainy" < pluere (verb) < PIE *plew- "to fly, flow, run". So the verb is primary.

Celtic:
Scottish Gaelic uisge (noun) primarily means "water", fras (verb) "to rain, scatter, shower, drip"
Irish báisteach (noun) < baiste "baptism", fearthainn (noun) < Old Irish feraid “to grant, afford, supply”, cuir (verb) "to put, send, sow, plant, bury, rain"
Both Goidelic langs have lexically separate words for the noun and the verb.
Proto-Brythonic *glaw (noun)

Proto-Slavic *duzhdzhi (noun) < ? PIE *dus-dyu- "bad sky"

Lithuanian lietus (noun) < lyti (verb) < lieti "to pour"

Old Armenian anjrew (noun)

Greek brekhô (verb), huô (verb), ombros (noun)

Persian bârân (noun)

PIE *h₁wers- (verb) > Sanskrit nouns, verb

So over Indo-European, words for rain are derived from both nouns and verbs.
 
Tocharian B swese (noun) and su- (verb), along with Greek huei (verb, 3sg), go back to PIE *sh2ew- > *saw- (verb)

The Irish verb is cuir (báisteach, fearthainn) -- (rain) falling

Outside of Indo-European,

Proto-Uralic *sada- (verb) "to fall"

Proto-Turkic *yag- (verb)

Mongolian borô (noun)

Korean bi (noun)

Japanese ame (noun)

Proto-Georgian-Zan *ts'wim- (verb)

Proto-South-Dravidian *maz'ay (noun)

Arabic m-t-r (verb, noun) - has cognates in other Semitic langs

Basque euri (noun)

Sino-Tibetan:
Chinese yu (noun)
Burmese mui (noun) "sky"

Na-Dene:
Navajo -tsaa (verb)

Quechua para (noun)

Proto-Austronesian *quzaN (noun)

Yoruba ojo (noun)

Proto-Bantu *mbúdà (noun)

Khoisan:
ǃXóõ ǃqhàa (noun) "water", ǃáa (verb)
 
"Rain (noun)" is often derived from "water" or "cloud" or something similar.
"Rain (verb)" is often derived from "to fall" or something similar.
There are plenty of examples of both kinds of semantic shift.

But when a new word is introduced for this effect, it can be introduced as either a noun or a verb, with the other part of speech then formed from the new word. So one should not read too much into English "rain" being primarily a noun and its Hopi counterpart being primarily a verb.


Returning to "wind (in air)" I notice its much greater stability in Indo-European. I checked on some other language families.

Proto-Finno-Permic *towle
Proto-Turkic *yel
Proto-Chukchi-Kamchatkan *zhohwu
Proto-Eskimo *anuqa
Proto-Kartvelian *kar-
Proto-Dravidian *kâl-, *gâl-

Proto-West-Semitic *rûhh-

Proto-North-Caucasian:
- Nakh, Avaro-Andian, Lezgian: *Htlwimâ
- Avaro-Andian, Tsezian, Lak, Lezgian: *miltswa
- Lak, Lezgian: *tlHiba
Proto-Sino-Tibetan *g-lëy

Proto-Polynesian *matangi < *mata "eye, point" + *angi "to blow (of wind)"
Proto-Malayo-Polynesian *hangin
Proto-Austronesian *bali

Proto-Kra-Dai *R-lum.A
Proto-Mon-Khmer *kjaal

As far as I could tell, these are roughly as stable as in Indo-European or somewhat less in some cases.
 
Repeating my link - (PDF) The "Nostratic" roots of Indo-European: From Illich-Svitych to Dolgopolsky to future horizons - by Alexei Kassian, George Starostin, Mikhail Zhivlov
Ultimately, it is our firm belief that Nostratic linguistics, while currently in a state of mild stagnation, may overcome this state by means of important methodological reforms – even if many of these reforms might not be for the liking of conservative supporters of the hypothesis who believe that the “classic” comparative-historical method, good enough for Indo-European, would be just as good for Nostratic without additional restrictions. We also believe that these reforms, in the long run, will be useful not only for all the other promising hypotheses of long-distance relationship as well, but also for further research on uncontroversial families of smaller time depth, including Indo-European itself.
This method has had some success with some of the shallower long-range hypotheses: Indo-Uralic, Core Altaic, and Austro-Tai.

I'd mentioned this earlier - 400-item basic lexicon wordlist for potentially "Nostratic" languages of Eurasia - and it IMO is the way to go. Increasing the length of one's word list past about 40 reportedly does not give much improvement, but I suspect that that's for well-established families, those that are relatively easy to recognize. But it's hard to recognize relatively distant relationships with such a small number of meanings, so increasing the number will make possible better statistics.
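A rough way to see why a longer list should help: if each meaning is treated as an independent yes/no trial for "these two languages share a cognate here", then the standard error of the observed match rate shrinks like 1/sqrt(n). That independence is a simplifying assumption of mine, not something the wordlist authors claim, and the match rate of 0.10 below is made up for illustration.

```python
import math


def match_rate_standard_error(p, n):
    """Standard error of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical true match rate between two distantly related languages.
HYPOTHETICAL_MATCH_RATE = 0.10

for n in (40, 100, 200, 400):
    se = match_rate_standard_error(HYPOTHETICAL_MATCH_RATE, n)
    print(f"{n:4d} meanings: standard error of match rate ~ {se:.3f}")
```

Under these assumptions, going from 40 to 400 meanings cuts the uncertainty by a factor of sqrt(10), or about 3.2.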
 
The data: 400-item basic wordlist for potentially "Nostratic" languages

The authors: George Starostin, Ilya Egorov, Alexei Kassian, Artem Trofimov, Mikhail Zhivlov. All familiar names to me. Some of them wrote what I recently posted on ("The Nostratic Roots of Indo-European"), and also about applying statistical tests to Indo-European, Indo-Uralic, Altaic, and circumpolar langs.

They themselves concede that it is a work in progress.

The language families in it: Indo-European, U-Y (Uralic, Yukaghir), Altaic (Turkic, Mongolic, Tungusic, Korean, Japanese), Dravidian, Kartvelian, E-A (Eskimo, Aleut), Chukotko-Kamchatkan (Chukotkan, Itelmen), Nivkh/Gilyak.

They used subfamilies of some families because that's what they could find good reconstructions for. They also want to test Uralic-Yukaghir and Altaic.

This list covers what are nowadays considered at least possible members of Nostratic. Afroasiatic was omitted as most likely too distant and in a very incomplete state of reconstruction. Judging from the Tower of Babel Afroasiatic database, they would split it up into something like this:

Semitic, Egyptian, Berber, Chadic (West, Central, East), Cushitic (North: Beja, Central: Agaw, South, Lowland East, Highland East, Saho-Afar, Dahalo, Yaaku: Mogogodo, Dullay: Warazi), Omotic.
 
The 400-word-list collectors note such common semantic shifts as eye ~ to see, black ~ night, earth ~ dust ~ sand, and for each entry they list related entries along with its part of speech.

For instance, "name" is a noun, with related entries "call", "word". "Black" is an adjective with related entries "coal", "dark", "dirt", "night". "Two" is a quantifier with related entries "four", "other", "pair", "twins". "Hear" is a verb with related entries "ask", "ear", "feel", "know", "listen", "understand". "Who" is a pronoun with related entries "person", "what". "Not" is a "clitic / particle" with two related forms: not-1: indicative, not-2: prohibitive (negative imperative).

Indicative: you are not reading my forum post.
Prohibitive: don't read my forum post.

Though in that context, I'd call "not" an adverb.

They currently have 186 nouns (N), 132 verbs (V), 63 adjectives / descriptive verbs (A), 13 pronouns (P), 5 quantifiers (Q), and one particle / clitic (C): "not".
 
How Many Is Enough?—Statistical Principles for Lexicostatistics - PMC

How big a list? 15, 33, 35, 40, 100, 200, 300 to 500
Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics.

Open Problems in Computational Historical Linguistics | Open Research Europe

Mentioning 10 unsolved problems in computational historical linguistics, and assessing the progress in solving them.
  1. Automated morpheme segmentation (i) -
  2. Automated borrowing detection (i) + contact layer detection
  3. Automated sound law induction (i) -
  4. Automated phonological reconstruction (i) +
  5. Simulating lexical change (m) -
  6. Simulating sound change (m) -
  7. Statistical proof of language relatedness (m) (+)
  8. Typology of semantic change (a) -
  9. Typology of sound change (a) -
  10. Typology of semantic promiscuity (a) (+) typology of lexical motivation
i = inference, m = modeling, a = analysis

Semantic promiscuity: "the degree to which certain words, due to their original meanings, are re-used or re-cycled in the human lexicon." -- semantic productivity?

Lexical motivation: attraction and expansion.
Attraction refers to cases where a given concept “attracts” different words to express it. In theory, we might be able to measure the attractivity of concepts, that is, their propensity to be expressed by multiple words. Expansion refers to cases where a word receives new meanings. If one agrees that the expansivity of words typically depends on the meaning they express originally, one could take this idea one step further and measure the expansivity of concepts and compare it across languages. Taking it one additional step further, one could then ask not only which concepts are good at triggering the extension of a word’s meaning, but also which concepts are good at triggering the reuse of a word in word formation processes, which is what I meant to denote with the term “semantic promiscuity”.
 
2012a_The_Arabic_origins_of_numeral_wor.pdf
"The Arabic Origins of Numeral Words in English and European Languages"
by
Zaidan Ali Jassem
Department of English Language and Translation, Qassim University,
P. O. Box 6611, Buraidah, KSA
Essentially denying the Indo-European origins of these words for numbers.

1 to 10 in Arabic (his transcription): waa2id, ithnan, thalathat, arba3at, khamsat, sittat, sab3at, thamaniat, tis3at, 3ashrat

2 = throaty h-like sound, 3 = voiced version

Like
khams → khamf → famf → fanf (German: fünf) → faf (English: five)
khams → kams → kamk → kank (Latin: quinque) → sank (French: cinque)
khamsat → kham(th/f)at → famfat → fampat → pampat → panat (Greek: pente)
I've only edited the parts in parentheses.

Lots of ad hoc sound shifts, nothing systematic.
 
On the Numerals as Evidence of the Progress of Civilization on JSTOR
On the Numerals as Evidence of the Progress of Civilization
John Crawfurd
Transactions of the Ethnological Society of London, Vol. 2 (1863), pp. 84-111 (28 pages)
Noting a correlation between highest named numbers and what we might call level of technology.

He seemed rather naive about historical linguistics, apparently thinking that European numerals were borrowed from Sanskrit.

He also believed that different peoples' languages were invented separately. I found John Crawfurd, and he thought that the ancestors of different groups of people were separately poofed into existence: Polygenism. He got nicknamed "the inventor of forty Adams", and Charles Darwin stated that Crawfurd believed that humanity was some 60 separately created populations.
 
I checked John Crawfurd's transcriptions against Mark Rosenfelder's The Numbers List and I couldn't find them there. But they have patterns that are common in MR's list:
3 = 2+1, 4 = 2+2, 5 = 2+2+1, 6 = 2+2+2
Unary on binary: base 1 on base 2 -- base 1 meaning that anything greater than 1 is repeated 1's.

I checked the European ones, and "Biscayan" is Basque.

I then checked on whether the lower Thai numerals are borrowed from Chinese, something I vaguely remembered from somewhere.

Chinese ones: MR's list only - I didn't bother to check with Wiktionary.
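| Language | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Proto-Chinese | *skət·ʔ | *’nejs·ʔ | *súm | *s·lih | *hŋá` | *gruk | *sñəs | *prjat | *t·ku` | *gip |
| Old Chinese | *ʔjit | *njis | *sum | *s(p)jij/ts | *ngaʔ | *C-rjuk | *tshjit | *pret | *kwjuʔ | *gjip |
| Middle Chinese | ʔjit | nyì | sam | sìj | ngú | ljuwk | tshit | peat | kjúw | dzyip |
| (Karlgren) | ’iêt8 | ñzhi6 | sâm2 | si6 | nguo4 | lyuk8 | ts’yet7 | pwat7 | kyeu4 | zhyep7 |
| Mandarin (Běijīng) | yī | èr | sān | sì | wǔ | liù | qī | bā | jiǔ | shí |
| (Běijīng IPA) | i 55 | ɹ 51 | san 55 | sz 51 | u 213 | lioʊ 51 | tɕʰi 55 | pa 55 | tɕioʊ 213 | ʂɹ 35 |
| Cantonese | jɐt 5 | i 22 | sam 53 | sei 33 | ŋ 23 | lok 2 | tsʰɐt 5 | pat 3 | kɐu 35 | sɐp 2 |
| Sino-Korean | il | i | sam | sa | o | yuk | ch’il | p’al | ku | sip |
| Sino-Japanese | ichi | ni | san | shi | go | roku | shichi | hachi | kyū | jū |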

Proto-Austronesian and Kra-Dai: also using Wiktionary and Alexander Smith's paper on vowel correspondences for the numerals of Buyang, a KD language that he used as an example.
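| Language | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Proto-Austronesian | *esa/isa | *duSa | *telu | *Sepat | *lima | *enem | *pitu | *walu | *Siwa | *sa-puluq |
| Buyang | tsam 45 | θa 322 | tu 322 | pa 322 | ma 33 | nam 33 | tu 33 | mu 31 | ða 33 | va 55 |
| Buyang (AS) | | ɕa-A1 | tu-A1 | pa-A1 | ma-A2 | nam-C1 | tu-A2 | ðu-A2 | va-A1 | |
| Ch. src (Wik) | | MC sraewng | MC sam | MC sijH | OC *ŋaːʔ | OC *ruɡ | MC tshit | MC peat | MC kjuwX | MC dzyip |
| Proto-Tai | at | ji:h | sa:m | si:h | xe:` | xok | čet | pɛ:t | kao | džip |
| Proto-Tai (Wik) | nɯːŋᴮ | soːŋᴬ | saːm | siːᴮ | haː | krokᴰ | cetᴰ | peːtᴰ | kɤw | (sip) |
| Thai | neung 22 | sawng 15 | saam 15 | sii 22 | haa 51 | hok 22 | jet 22 | paet 22 | kao 51 | sip 22 |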
Buyang's forms are rather obviously Austronesian without the first syllables.

Thai's forms, however, are borrowed from Chinese from 3 to 10, though 2 may be from Proto-Austro-Tai. My sources differ on 1 and 2.

I didn't bother to try to compare MR's whole list for KD.
 
Uralic_Numerals_2003-libre.pdf - "Uralic numerals: is the evolution of numeral system reconstructable? (reading new Václav Blažek’s book on numerals in Eurasia)" - by Vladimir Napolskikh
... the core of Blažek’s interest is not the simple reconstruction of development of the numerals in concerned languages, but the origin of the reconstructed numerals (mainly of first decade) of the big language families enlisted above. The main authour’s task seems to be not simply to trace the development of the word, but to give an etymology in the original meaning of the term, to show the very way of “creating” the numeral, to find the source proto-form, which later gave birth to the numeral stem.

...
Dr. Blažek’s position on this subject is partly revealed in the chapter devoted to IE ‘seven’: “Studying the systems of numerals in various language families, I am convinced that it is almost always possible to determine an original motivation of all higher numerals beginning with “5”.
Then about numerals possibly being borrowed.

VN then says
It is general Nostratists’ mistake to think, that all the language contacts producing the loanwords had begun after the disintegration of daughter languages of the «Proto-Nostratic» and all the parallels between, e.g., Indo-European and Turkic must be considered as relics of «Proto-Nostratic» but not as traces of ancient borrowings.
After going into a lot of detail,
As it can be seen, most of these Blažek’s Proto-Uralic recontructions are too optimistic if not to say simply bad. Therefore his conclusion: “the internal evidence and external parallels allow us to reconstruct the proto-Uralic numeral system consisting of the numerals 1-5."
Toward the end, he has his own hypothesis:

Proto-Uralic-Yukaghir had only 1 and 2: *ket -- 1, 2, 2+1, 2+2, ...

Proto-Uralic had only 1, 2 *kakta, 5 *witte, and 20 *koje-Se "man" from our having 20 fingers and toes.

His assessment assumes a split between Finno-Ugric and Samoyedic, but if Uralic phylogeny is more of a lawn, with Samoyedic coequal to Finnic, Saami, Permic, Volgaic (Mordvinic, Mari), and Ugric, then Proto-Finno-Ugric = Proto-Uralic in most cases.
 
Using Wiktionary, I find:

1 *ükte - all but Khanty it, Hungarian egy, Samoyedic *op
2 *kakta, *käktä - all
3 *kolme, *korme - all but Samoyedic *nakur
4 *neljä - all but Samoyedic *tettə < Bulgaric (Chuvash tăvat) < Proto-Turkic *tört
5 *witte - all but Samoyedic *səmpəläŋkə < ? *səmpə- "hand"
6 *kutte < ? "ridge, back (of body)" - all but Samoyedic *məktut < *məka "back (of body)"
7 *ćäjćemä < ? Indo-Iranian - all but Proto-Ugric *säptɜ < Indo-Iranian
8 *kakteksa "something - 2" - Finnic, Saamic, Mordvinic; - Proto-Ugric *ńalɜ
9 *ükteksä "something - 1" - Finnic, Saamic, Mordvinic - Hungarian kilenc
10 *luka < *luke- "to count" - Saamic, Mari, Ugric - Proto-Finnic *kümmen, Proto-Mordvinic *keməń, Proto-Samoyedic *wüət < Proto-Uralic *witte "5"
20 *kuśɜ, *kuuśi - Permic, Ugric - Finnic is literally "2 10's"
 
Papuan-Austronesian contact and the spread of numeral systems in Melanesia | John Benjamins
Proto-Austronesian had a base-10 system, as did Proto-Austro-Tai.

But many Austronesian languages of Melanesia (Papua New Guinea, Nauru, Fiji, New Caledonia) have base 5.

But among non-Austronesian speakers of New Guinea, base 2 is common, or more precisely base-1-2: 1, 2, 2+1, 2+2, 2+2+1, 2+2+2, ... and the authors conclude that Papuan speakers got base 5 and base 10 from imitating AN speakers that used those bases, even if not necessarily the actual words. Some New Guinean AN speakers likewise ended up acquiring base 2.

Their coding of number systems:
  • Binary: base 1-2: 1, 2, 2+1, 2+2, 2+2+1, 2+2+2, ...
  • Binary+3: 1, 2, 3, 2+2, 2+2+1, 2+2+2, ... (simple form of 3 that's not used later)
  • Binary+4: 1, 2, 2+1, 4, 2+2+1, 2+2+2, ... (simple form of 4 but not 3, a form that's not used later)
  • Quinary: base 5: 1, 2, 3, 4, 5, 5+1, 5+2, 5+3, 5+4, 2*5, 2*5+1, ...
  • Decimal proper: base 10: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10+1, 10+2, ...
  • Decimal modified: 1, 2, 3, 4, 5, 6, 7, 10−2, 10−1, 10, 10+1, 10+2, ...
  • Quaternary: base 4: 1, 2, 3, 4, 4+1, 4+2, 4+3, 2*4, 2*4+1, ...
  • Senary: base 6: 1, 2, 3, 4, 5, 6, 6+1, 6+2, 6+3, 6+4, 6+5, 2*6, 2*6+1, ...
  • Unknown
Mixed quinary-decimal was coded as quinary: 1, 2, 3, 4, 5, 5+1, 5+2, 5+3, 5+4, 10, 10+1, ...
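
To make those codings concrete, here is a small Python sketch that spells out 1 to 11 in three of them. The function names are mine, not the paper's, and it only covers the plain binary (base 1-2), quinary, and decimal patterns: no Binary+3, Binary+4, subtractive, or mixed quinary-decimal forms.

```python
def binary_12(n):
    """Base 1-2: repeated 2's, plus a final 1 for odd numbers."""
    if n < 1:
        raise ValueError("counting starts at 1")
    parts = ["2"] * (n // 2) + ["1"] * (n % 2)
    return "+".join(parts)


def with_base(n, base):
    """Simple numerals up to the base, then multiples of the base plus a remainder."""
    if n < 1:
        raise ValueError("counting starts at 1")
    if n <= base:
        return str(n)
    multiple, remainder = divmod(n, base)
    head = str(base) if multiple == 1 else f"{multiple}*{base}"
    return head if remainder == 0 else f"{head}+{remainder}"


systems = (
    ("binary", binary_12),
    ("quinary", lambda n: with_base(n, 5)),
    ("decimal", lambda n: with_base(n, 10)),
)
for name, render in systems:
    print(name + ":", ", ".join(render(n) for n in range(1, 12)))
```

Calling with_base(n, 4) or with_base(n, 6) gives the quaternary and senary patterns in the same way.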

Looking at words for 3 and 4 in base-5 Papuan langs, they were either borrowed from some AN lang or else they were separately invented -- and invented with a lot of variety.
 
There is a section on "Limited conventionalization in Papuan quinary systems" - "Moreover, even among the contemporary Papuan languages that do exhibit quinary systems (with monomorphemic terms for, at least, ‘one’ through ‘four’), many display a rather limited degree of conventionalization in the formation of higher numerals." - meaning a lot of variation.

What does it mean to be completely conventionalized? Exact values, used for every countable entity, in every social circumstance, by all the speakers.

Exact means no "few" or "some" or "many". Every countable entity? Japanese and Korean have native and Chinese numerals, used in different contexts.

Back to the paper. Some Papuan quinary users make 10 either with a simple form, or 2*5 or 5+5 or "two hands" -- variation by speakers of the same language, or even by the same speaker at different times.

First, although the presence of conventionalized quinary numeral systems in the Austronesian languages of Melanesia may ultimately be due to interactions between Austronesian-speaking and Papuan-speaking groups, this does not necessarily mean that such systems were borrowed. Rather, Austronesian quinary systems may have emerged due to contact with Papuan groups whose languages lacked simplex terms for numbers greater than 2 or 3, but who engaged in the cultural practice of counting on their fingers and toes. That is, it may have been this physical cultural practice, and not a particular linguistic system, that was adopted by the Austronesian groups who first came into contact with Papuan groups. The widespread pattern of Austronesian languages in Melanesia using an expression indicating a (completed) ‘man’ or ‘person’ for the numeral 20 further points to the influence of digit-tallying practices on Austronesian numeral systems.
Because we human beings have 20 fingers and toes. Digit-tallying: counting on fingers and toes.
Ross (forthcoming a) suggests that digit-tallying methods were likely used within early Oceanic linguistic communities alongside the inherited decimal system, and that the inherited decimal system was often mostly restricted to ceremonial purposes, namely for counting gifts, such as food, at customary feasts.
 
That is, quinary number systems are an "areal" feature in Melanesia and New Guinea, and binary ones areal in New Guinea, with counting in that system borrowed rather than the words for that counting.

Vigesimal (base-20) systems are also areal, like in Northwestern

Numeral systems in Mande languages - the Mande family is a West African family whose membership in Joseph Greenberg's Niger-Congo macrofamily is often considered doubtful, alongside Ijo (W Africa), Dogon (W Africa), and Kordofanian (N C Africa).
The counting systems and systems of numerals found in Mande languages are rather heterogeneous, and some of these systems display unique or, at least, typologically rare features.

...
Typologically, all numeral systems in the languages of the world are either restricted with no arithmetic base or non-restricted.

We assume that the emergence of numeral systems which have an arithmetic base is a relatively recent phenomenon. According to the World Atlas of Language Structures (Comrie 2005), there still exist “restricted” systems, which use three or four numerals. Larger numbers here are expressed by combinations of these few elements. Within the framework of such a system, the expression of numbers larger than 20 is extremely cumbersome, and as a result, there are limits of the application of exact counting (traces of words with meanings like “the final number” are attested in many languages whose systems are no longer restricted). Such systems are typical of small closed communities, which, until recently, were unconcerned by commercial relations, had no monetary system and conducted a traditional way of life based on hunting and gathering.
Like how humanity was for most of its existence. So for most of our existence, we did not count very far, with words for anything more than 2 or 3 or 4 being very unstable, and coexisting with counting on body parts like fingers, usually in some canonical order.

I must note that such instability may appear among speakers with more advanced technologies: in the Indo-European langs, 2 to 10 and 100 are rock-solid, 1 and 1000 are somewhat unstable, and in-between numerals are more unstable.

Certain terms for numbers in Mande languages can be traced back to body parts: hand, foot, mouth, head, human being, a fact that can be explained in terms of original body-part counting. At the same time, there are terms for higher numerals which go back to the names of certain sets: string of cowries, basket of cola nuts.
Then noting that all modern Mande numeral systems use number bases.
 
Then going into detail. Irreducible numerals:
  • Old Bamana: 1 to 10 (traces of quinary), 20, 40, 60, 80, 800
  • New Bamana: 1 to 10, 20, 100 (old 80) -- decimal
  • Boko: 1 to 5, 10, 20, 200
  • Dzuungoo: 1 to 10, 20, 40, 60, 80, 800
  • Mwan: 1 to 5, 10, 20, 100, 1000
  • Dan-Gweetaa: 1 to 5, 10, 100, 1000
  • San-Maka: 1 to 10, 80, 800
  • Soninke: 1 to 10, 100, 1000
By comparison, Proto-Indo-European had 1 to 10, 100, 1000 -- decimal
20 ~ "human being" is common, from our having 20 fingers & toes.
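
As a toy illustration of what a set of irreducible numerals implies for everything in between, here is a greedy decomposition in Python: take the largest irreducible numeral below the number, multiply it if it fits more than once, and add the remainder. This is my own simplification, not the authors' analysis, and real Mande numerals need not follow it (connectors, word order, and subtractive forms are all ignored). The sets are copied from the list above.

```python
OLD_BAMANA = set(range(1, 11)) | {20, 40, 60, 80, 800}
BOKO = set(range(1, 6)) | {10, 20, 200}
SONINKE = set(range(1, 11)) | {100, 1000}


def decompose(n, irreducibles):
    """Write n using only irreducible numerals, greedily from the largest down."""
    if n < 1:
        raise ValueError("counting starts at 1")
    if n in irreducibles:
        return str(n)
    base = max(b for b in irreducibles if b < n)
    if base == 1:
        raise ValueError("need an irreducible numeral greater than 1")
    multiple, remainder = divmod(n, base)
    if multiple == 1:
        head = str(base)
    else:
        factor = decompose(multiple, irreducibles)
        # Parenthesize compound multipliers such as (5+4)*20.
        head = f"({factor})*{base}" if "+" in factor else f"{factor}*{base}"
    if remainder == 0:
        return head
    return f"{head}+{decompose(remainder, irreducibles)}"


for name, numerals in (("Old Bamana", OLD_BAMANA), ("Boko", BOKO), ("Soninke", SONINKE)):
    print(name + ":", ", ".join(f"{n} = {decompose(n, numerals)}" for n in (7, 30, 100, 250)))
```

With the Old Bamana set, 100 comes out as 80+20, while with the Soninke set 30 comes out as 3*10.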

The authors then reconstruct a sequence of emergence:
  • Quinary system
  • Decimal, pentadecimal (base 15), and vigesimal systems
  • Octogesimal systems from vigesimal ones
  • Decimal systems force out older non-decimal ones
Octogesimal systems were likely provoked by trade in cowrie shells (pre-colonial money) and kola nuts (a common pre-colonial trade item).
The dynamics of the evolution of numerical systems is also reflected in the use of connectors. In languages spoken by communities involved in large-scale trade for many centuries, it is typical to have one universal connector (normally a coordinative/comitative conjunction/preposition), while languages spoken in the forest zone, not used until very recently for large-scale trade, tend to employ various connectors for different orders.
Connector: "and", "with", ... - more standardized for lots of trade, less standardized for more isolated communities.

Some additional connectors:
Russian оди́ннадцать odinnadtsat' 11 lit. "one on ten" - na "on"
Latin ūndēvīgintī 19 lit. "one from twenty" - dē "from" > Western Romance de, di "of"
 
Number Systems of the North American Indians - Eels-NumberSystemsNorth-1913.pdf
Number Systems of the North American Indians
Author(s): W. C. Eells
Source: The American Mathematical Monthly, Dec., 1913, Vol. 20, No. 10 (Dec., 1913), pp. 293-299
Published by: Taylor & Francis, Ltd. on behalf of the Mathematical Association of America
Stable URL: https://www.jstor.org/stable/2972526

"Counting cannot be carried far by the use of successive unrelated terms or symbols for each number." That's as true of writing as it is of speech. One eventually has to make a compound word or compound symbol.

WCE first mentions decimal (base 10), a common one:

100 = "completed", "stock of 10's", 10*10
1000 = 10*10*10, 10*100, "big 100", "old man 100", "large stock of 10's"
10^6 = 1000*1000, "big 1000", "too many to count"

In Eurasia,
Indo-European:
100 = "super 10" (?)
1000 = "heap" (<"full hand"), "swollen 100" (?)
Semitic:
1000 = "herd"

Then quinary (base 5) and quinary-decimal (5, 10).

Then vigesimal (base 20) and quinary-vigesimal (5, 20) and decimal-vigesimal (10, 20) and quinary-decimal-vigesimal (5, 10, 20). Often, 20 = "human being" or "human being completed", because of our 20 fingers & toes.

Greenlandic Inuit: 7 = other hand two, 12 = first foot two, 17 = other foot two
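
The pattern behind those three Greenlandic examples is counting digits limb by limb in blocks of five. Here is a minimal Python sketch of it. The label "first hand" is my guess by analogy with "first foot", and the actual Greenlandic words for other numbers (multiples of 5 especially) may not follow this scheme.

```python
LIMBS = ("first hand", "other hand", "first foot", "other foot")


def tally(n):
    """Map 1..20 to (limb, digit on that limb), five digits per limb."""
    if not 1 <= n <= 20:
        raise ValueError("digit tallying on one person covers 1 to 20")
    limb_index, digit_index = divmod(n - 1, 5)
    return LIMBS[limb_index], digit_index + 1


for n in (7, 12, 17):
    limb, digit = tally(n)
    print(f"{n} = {limb} {digit}")
```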

Unalit has 20 = "human being ended", but for 40 and above, "2 sets of animal paws", ...

Wintun has 20 = "Indian", 30 = 3*10, 40 = "2 Indians", 50 = 5*10, ...

Common variations: duplication, like 8 = 2*4, and subtraction, 9 = 10-1. In Eurasia, IE has the first one and possibly the second one also.

Then quaternary (base 4) systems.

Omitting thumbs? Counting only spaces between fingers? 4 is often "stick" or "middle" or "body".

4 as a special number? Sometimes "complete" or "right" or "perfect". But that is not usually associated with base 4.

Then some rare ones: ternary (base 3), octal (base 8), bits of binary (base 2), senary (base 6), nonary (base 9), quadragesimal (base 40), 40 = "stick", sexagesimal (base 60)
 
Then under "Conclusions",
The most striking feature of the systems which have been studied is their diversity, even in languages of the same family, and much more marked when the country as a whole is considered. For instance in the closely related languages of the Yukian family in California, although the numerals from one to four are quite similar, yet two of the systems are quinary-decimal, a third is quinary-vigesimal, while the fourth is octonary; or in the Pujunan family in which one system is decimal, eight quinary-decimal and two quinary-vigesimal.
Then discussing how far one can count with these people's number systems. Very large numbers they named with "leaves on the trees," "stars of the heavens," "blades of grass on the prairie," "sand on the lake shore."
It is probable that before coming in contact with European civilization the Indians had little occasion to use numbers beyond a thousand. But the systems of many of them were such as to admit of indefinite and easy extension when needed.
Such limits are evident elsewhere. Proto-Indo-European only went up to 1000.

However, Central Americans could count very high: Mesoamerican Long Count calendar
 
Then noting numeral classifiers. Often, a classifier prefix or suffix is added to the numeral, like Haida having 15 classifiers. Tsimshian has classifiers for:
  • Abstract counting
  • Flat objects or animals
  • Round objects or time
  • People
  • Long objects
  • Canoes
  • Measures
These classifiers are common on the North Pacific coast and rare elsewhere.

Also noting number words conjugated as verbs in a few langs: "to be <number> of them".
 