• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Language as a Clue to Prehistory

In Semitic languages [4], a hypothetical transition from biconsonantal (2c) to triconsonantal (3c) language morphology was debated for quite some time [5]. Semitic lexemes are derived from roots consisting of predominantly three radicals (i.e., root consonants), termed 3c. However, there is a small corpus of 2c roots (defined in Methods), responsible for most of the irregular Semitic verbs. Are these remnants from a more archaic linguistic phase? One observation favoring this is the relative abundance of 2c body parts and, particularly, facial features (“eye”, “tooth”, etc.). If this semantic field originated early in language development then so did the 2c morphology. But how can we know this?

Further progress can be made by correlating linguistic and archeological innovations. Selecting an archeologically dateable semantic field (e.g., materials), we have shown [6] that, in the reconstructed Proto-Semitic (PS) language [7,8], names of materials known to and utilized by early hunter-gatherers (wood, reed, stone, flint, lime, gravel, sand, mud, clay, cloth, skin and water) are overwhelmingly (85%) of 2c morphology, while materials introduced as of the Neolithic period in W. Asia (bitumen, sulfur, salt, charcoal, pottery, brick, wool, lead, antimony, copper, silver and gold) were all given 3c names. This non-uniform distribution of 2c vs. 3c lexemes in these two semantic fields suggests that a 2c > 3c language morphology change accompanied the transition to agriculture in the Early Neolithic, ca. 11,000 years Before Present (BP).
Appendix:Afroasiatic Swadesh lists - Wiktionary, the free dictionary

For instance, "head" is Arabic ra`s, Hebrew rôsh, Syriac rêsha, Akkadian rêshu, Ge'ez rəʾəs

Likewise, "name" is Arabic ism, Hebrew shem, Syriac shma, Akkadian shumu, Ge'ez səm

One can nowadays reconstruct PS rather reliably [8] thanks to the extensive Akkadian (Akk.) texts [9], which go back 2.5–4.5 kyr. PS was supposedly spoken during the Chalcolithic period, sometime between 5,750 BP [10] and 6,300 BP [11].
That's the time of the dispersion of the Proto-Semitic speakers, when different subsets of them went in different directions.

This age is roughly the age of Indo-European and other relatively old generally-accepted language families.
 
The authors then discuss word-frequency analysis, using Zipf's law and an extension, the Zipf–Mandelbrot law.

For word rank r, the frequency f is
(Zipf) f = f0 / r
(Zipf-Mandelbrot) f = f0 / (r + b)^a

where a ~ 1 and b ~ 2.7
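
Here is a minimal Python sketch of the two laws, just to make the formulas concrete. The function names and the f0 value are my own placeholders, and I've given plain Zipf an optional exponent a (defaulting to 1) so that it lines up with the exponent discussed in the paper; that generalization is an assumption, not something from the formula as written above.

```python
def zipf(rank, f0, a=1.0):
    """Plain Zipf law: frequency falls off as a power of the rank."""
    return f0 / rank ** a


def zipf_mandelbrot(rank, f0, a=1.0, b=2.7):
    """Zipf-Mandelbrot law: the offset b flattens the curve at low ranks."""
    return f0 / (rank + b) ** a


# f0 = 1000.0 is an arbitrary scale chosen for illustration only.
for r in (1, 2, 10, 100):
    print(r, zipf(r, 1000.0), round(zipf_mandelbrot(r, 1000.0), 1))
```

With b = 0 the second function reduces to the first, and for large ranks the two curves converge, so the offset only matters for the most frequent words.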

As recently shown [29], frequently used words (actually, meanings) are replaced (by other words of the same meaning) less often than the less frequent ones. Thus if the 2c stratum indeed predated the 3c one, the frequently used 2c lexemes may have simply survived replacement during the subsequent 3c era. This is supported by their frequency-rank dependence in Figure 1. As opposed to the total BH verbs with α ≈ 1, the 2c/BH verbs (collected in Table S5) exhibit an observably larger Zipf exponent (α2c = 1.28), whereas the high frequency 3c verbs have a smaller α3c = 0.82 (linear fits not shown). This might be explainable by the 2c > 3c transition: while the 2c language was alive, 2c words of a given meaning were depleted at the same rate as alternate 2c lexemes were generated, and the language maintained its steady-state with the usual exponent α ≈ 1. After the 2c era has ended, 2c roots were no longer created, only eliminated. Because less frequently used words decay faster, α2c increased with time.
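
The quoted passage mentions linear fits without showing them. As a rough sketch of how one could estimate a Zipf exponent, here is an ordinary least-squares fit of log(frequency) against log(rank). That this is the authors' actual fitting procedure is an assumption on my part, and the data below are synthetic, generated from an exact power law rather than taken from the paper.

```python
import math


def fit_zipf_exponent(frequencies):
    """Estimate alpha as minus the least-squares slope of log f against log rank."""
    ordered = sorted(frequencies, reverse=True)  # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(f) for f in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic frequencies following f = 1000 / rank^1.28 (illustrative only).
synthetic = [1000.0 / rank ** 1.28 for rank in range(1, 51)]
print(round(fit_zipf_exponent(synthetic), 2))
```

On real word counts the fit would be noisier, and the result depends on which range of ranks one includes.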

Noting Frequency of word-use predicts rates of lexical evolution throughout Indo-European history | Nature - "Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly."

The authors cited "tail" and "two", with the former being very variable and the latter being well-conserved across Indo-European: *dwôu
 
The Swadesh wordlist. An attempt at semantic specification

Some languages lack words for some members of the list: "‘horn’ for languages of Oceania or ‘(domesticated) dog’ for the greater part of pre-Neolithic cultures" -- from Oceanians' lack of horned animals and from hunting-gathering people usually lacking domestic dogs.

This wordlist is defined using single English words, and that causes problems. For "all", Latin distinguishes between omnis, for collections of objects, and tôtus, for completeness of individual objects. Some Turkic languages distinguish between 2D and 3D roundness. Etc.

"In all cases, as is prescribed by the standard procedure, we advocate the choice of the most stylistically basic, neutral, and unmarked word." Thus, English "belly" instead of "stomach" or "paunch" or "tummy".

"2) The word also has to be encountered in unbound use, not constrained exclusively to specific idiomatic constructions or compound forms."

"3) A frequent source of synonymity and, therefore, confusion concerning the Swadesh wordlist is the phenomenon of suppletion within a paradigm." One has to use the least marked form, like "good" instead of "better" or "best" or "well".

"4) The list is generally anthropocentric, meaning that most of the anatomic terms and action verbs (such as ‘sit’, ‘lie’, ‘eat’, etc.) are to be taken as human body parts and actions;" Thus, for "to eat", German distinguishes essen (human eating) and fressen (animal eating, gluttonous or undignified human eating), so one should use the former. Also, "(human) (finger)nail" instead of "claw". There are exceptions for body parts and actions with no human counterparts, like "feather", "horn", "tail" and "to fly".

"5) Many of the items on the Swadesh list form functional antonymous pairs, e.g., ‘big : small’, ‘black : white’, ‘man : woman’, or groups of two or three elements with tight semantic connections, e.g., ‘sun : moon’, ‘eat : drink’, ‘sit : stand : lie’ etc." So look for uses that contrast these words, like "I need to eat and drink to stay alive".

"6) One of the cases where synonymity is unavoidable is when we deal with a transitional stage in language history, during which an older word is gradually being “ushered out” by a more recent replacement (e.g., the relation between ‘stone’ and ‘rock’ in American English)." In that case, use both words.

"7) Synonymity must also be considered (but can sometimes be avoided) in dealing with the phenomenon of compound forms found on the list, consisting of two or more root morphemes;" which ones does one compare?
 
"As a scientifically inferior, but pragmatically more helpful, alternative to formal semantic definitions we currently suggest the method of diagnostic contexts for the Swadesh items."

Then going into a lot of detail for each member of the 100-word Swadesh list and 10 additional words, with both English and Russian versions of the words, examples, and notes.

For "what" and "who" it specified the interrogative pronouns, not relative ones.

For "we", if a language has both inclusive and exclusive versions ("you and I" and "we without you"), then use both together, as synonyms.
 
newglot.doc - Starostin_Glotto.pdf
COMPARATIVE-HISTORICAL LINGUISTICS AND LEXICOSTATISTICS
Sergei Starostin
The elder Starostin of macro-linguistics work.

One has to watch out for
(a) accidental resemblances, such as English woman and Old Japanese womina 'woman'.
(b) ideophones, such as Russian kukushka and English cuckoo.
(c) loans, e.g. proto-Japanese *kui and proto-Austronesian *kaju' 'tree, wood'.
Present-day Japanese has onna for "woman", close to Italian "donna".

Then saying that many American linguists like to use binary comparisons, something especially vulnerable to confusion.

Then discussing Morris Swadesh's 100-word list.
Every comparativist who has worked with glottochronology knows that closely related dialects usually have a cognacy rate of 90% or more on Swadesh's 100-word list: closely related languages (such as those within the Slavic, Romance, Germanic or Turkic groups, that is, those which diverged around one and a half to two thousand years ago) share from 70 to 80% of items on this list, and language families such as Indo-European which split up five or six thousand years ago have a rate of 25 to 30%. Once we start to talk about more ancient families such as Uralic or Altaic we find a rate of at most 10 or 20 percent. Finally, the cognacy rate for modern languages belonging to different branches of such a macro-family as Nostratic is even less, around 5 to 9%.
 
It is well-known that similar rates of cognacy are found between languages related at an equivalent level in other language families, such as Austronesian, Uralic, Sino-Tibetan etc.
However, this may be a side effect of which families are readily recognizable.
All these considerations give us some indication that the rate of change in the lexicon (in some of its domains at least) really might be steady and universal.
That would need to be tested with archeological and historical correlates. One can do that with the Indo-European and Austronesian families and their subfamilies, for instance.

He then gets into a radioactive-decay-like equation for vocabulary resemblance, with the hypothesis that we tend to replace word forms at constant rates. Then noting variations in replacement rate, like for Icelandic and literary Norwegian (Riksmal), with Icelandic having much less replacement than Norwegian over the last millennium.

After discussing the odd fate of IE *gweru- "heavy", represented a lot in the older IE langs (Latin gravis, Greek barus, Sanskrit guru), but not as much in present-day ones, "Comparativists are familiar with many examples of this kind, namely the wide distribution of some words or roots in ancient languages and their almost total absence from the modern languages of the same family, which seem to be connected to the 'lifetime' of given words."

That seems silly, because it would be hard for people to have much acquaintance with how long their ancestors have been using some word form. There are other possibilities, like social upheaval leading to language change -- that would explain why Norwegian has changed much more than Icelandic, for instance.

So SS proposed an alternative to exp(-r*t) -- exp(-r*t^2) with the square of the time.
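
To see how the two decay laws differ, here is a small Python sketch. It only encodes the two expressions as I've written them above, exp(-r*t) and exp(-r*t^2), and inverts each to get an age from a retained fraction. The rate value and the time unit (millennia) are placeholders I made up for illustration, and SS's full treatment has more to it than this.

```python
import math


def retained_linear(rate, t):
    """Constant-rate model: fraction of word forms retained after time t."""
    return math.exp(-rate * t)


def retained_squared(rate, t):
    """Alternative model with the square of the time in the exponent."""
    return math.exp(-rate * t ** 2)


def age_linear(rate, retained):
    """Invert the constant-rate model to get elapsed time."""
    return -math.log(retained) / rate


def age_squared(rate, retained):
    """Invert the squared-time model to get elapsed time."""
    return math.sqrt(-math.log(retained) / rate)


RATE = 0.1  # hypothetical rate per millennium, not a fitted value
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, round(retained_linear(RATE, t), 3), round(retained_squared(RATE, t), 3))
```

With the same rate, the squared-time model loses less than the constant-rate one before t = 1 and more after it, so it describes words that are stable when young and get replaced faster as they age.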
 
SS then gets into the possibility of different words having different replacement rates.

"For example, such words as 'cloud' and 'tail' are very stable in the Turkic languages but unstable in Germanic; the word 'belly' is very stable in Romance but unstable in Slavic and so forth."

So one needs to find word-form retention rate as a function of time.
For example, we date the disintegration of Belorussian and Ukrainian (97% correspondences) to be the fourteenth century A.D.; for different pairs of Germanic, Romance, Slavic or Turkic languages we get datings in the first millennium; for the disintegration of the Balto-Slavonic languages we get a dating around the end of the second millennium B.C.

Then he finds what fraction of non-borrowed cognates various langs share. Numbers are percentages.
  • English: German 94, Lithuanian 58.5, Russian 60, French 58 (from the text of the article)
  • English: German 95, Lithuanian 51, Russian 52, French 54 (from the Swadesh list with 87 out of 100 used)
  • Russian: Polish 95 +- 3, Lithuanian 74 +- 3, German 54 +- 3, French 51 +- 1 (collected for several texts)
From these numbers, it's rather easy to find the grouping (English, German), French, (Lithuanian, (Polish, Russian))

Thus supporting Germanic, Slavic, and Balto-Slavic.
 
Preliminary Lexicostatistics as a Basis for Language Classification: a New Approach by George Starostin
The younger Starostin

He distinguishes between classic lexicostatistics, done on families with well-understood word histories, and preliminary lexicostatistics, done on families without. The latter has risks of false positives, due to borrowing and coincidences, and false negatives, due to sound changes. He then gets into a lot of the complexities of doing historical linguistics.

Macro-comparative linguistics in the 21st century: state of the art and perspectives
The paper represents an attempt to explicitly summarize most of the major theoretical and methodological problems that, as of today, hinder significant progress in the field of macro-comparative linguistics (research on distant relationships between the various language families of the world). Among these problems are such issues as: the amount, quality, and nature of linguistic data that is necessary to establish long-distance relationship; methodological priorities of the etymologization process; and the complex interdependencies of “objective” (automated) and “subjective” (manual) data comparison. Partial solutions and/or recommendations of a general character are offered for each of the specified issues.

"Problem 1: Quantity or quality?"

Some macro-linguists go the quantity route, with huge etymological dictionaries, like 2,800 for Altaic and Nostratic, and 3,000 for Afroasiatic.
A common criticism of such “etymological mastodonts” is that these huge numbers are only possible since there are so many languages to choose from — implying that most, if not all, of the individual etymologies simply reflect chance resemblances that accumulate in daughter languages as time passes.
Using only a small number of meanings also has problems, like too few of them for good statistics.
Proposed solution: Size does not matter. A long-range genetic relationship hypothesis may operate with as many comparisons as it needs, no more and no less — but only provided there is some sort of objective methodology that helps range these comparisons and measure their degree of correspondence to historical and typological expectations.

...
What really does matter is our ability to arrange the evidence in a sort of pyramid, where the strongest comparisons (phonetically, semantically, distributionally) should be clearly positioned at the top, and then propped up by as much supportive evidence as is necessary at the bottom.
So if one finds a possible correspondence in the very good comparisons, one can check it on the lower-quality ones.
 
"Problem 2: Grammar or lexicon?"

Morphology - word inflections
or
Lexicon - vocabulary

Though word inflections are not often borrowed, especially if they are not affixes, they are nevertheless more vulnerable to phonetic erosion, like dropping final sounds. That's what happened to the noun cases of Latin in the Romance languages, Latin's descendants. It's also happened to most of the Germanic languages.

Inflections also form a more convenient conceptual package than vocabulary.

His solution:
Morphology need not matter. (Note that this is quite different from “morphology does not matter”, which would be completely wrong).
It's good if one can have it, but one often doesn't. There's very little that's shared between English and Russian, for instance, as GS points out.
... The reverse, however, is not true: basic lexicon always matters ...
George Starostin proposes that there is no recognized genetic relationship that cannot be demonstrated with basic lexicon alone.
 
"Problem 3: Internal solutions or external evidence?"
This issue is a rich source for endless debates that usually take place between “narrow” specialists in particular language groups or families and “broader” specialists in macro-comparative studies. The model of the debate is always the same: a Vasconist / Japanologist / Sinologist / Indo-Europeanist / etc. criticizes select long-range etymologies (“Dene-Caucasian”, “Altaic”, “Nostratic”, etc.), concluding that the Basque / Japanese / Chinese / Indo-European / etc. part of the etymology is explainable as secondary on internal grounds, implicitly assuming or explicitly stating that internal etymologization should always take precedence over far-flung attempts at external explanation.
So the two sides portray each other as closed-minded conservatives and reckless enthusiasts.

What is really necessary in such cases is an elaborate standard against which the alternatives could be weighed objectively — for instance, a general database of the various types of language change, against the data of which (including statistical data) it would be possible to test the conflicting solutions.
We don't have a lot of good data on semantic shifts, for instance.

In other words, it is necessary, first and foremost, to recognize that most of the debates over internal vs. external etymologization do not so much reveal the personal flaws and biases of the participants (although these things happen, too) as they highlight the weak spots of general comparative methodology; inasmuch as they stimulate us to think of the possible ways to improve the method, they are quite useful, but the important thing is to not let oneself get carried away by ideological, let alone personal, motives.
 
"Problem 4: Formal objectivity or subjective judgement?"

Noting how a lot of people have entered linguistics from various other fields, especially relatively hard sciences.
What puts them all together is the employment of formal probabilistic methods, usually based on Bayesian principles, to hunt for automatically generated optimal scenarios of language classification (Gray & Atkinson 2003), models of language evolution (Pagel et al. 2007), and, most recently, even protolanguage reconstruction (Bouchard-Côté et al. 2013).
Such methods have problems of their own, however.
(a) formal methods may give the impression of filtering out the subjective factor ... However, even in the strictest procedures of this kind subjectivity is never really ruled out completely.
In effect, GIGO: garbage in, garbage out.
(b) despite the steady flow of works describing automated procedures, most of them cover relatively “safe” territory ...
Mainly Indo-European and Austronesian, sometimes Bantu, Semitic, and Turkic. They are all well-studied, and for the most part, with the exception of some IE langs, they have relatively simple phonetic changes.

So one should use both methods, using each one to check the other.
(1) Work first and foremost with evidence that may be quantified and statistically assessed
(2) Build up reference corpora of typological evidence
(3) Try to develop and apply universal standards and reference frames

Toward the end, he states this comparison of quality of hypothesis:
Nostratic > Austric > Nilo-Saharan, Amerind
 
Proto-Indo-European-Uralic Comparison from the Probabilistic Point of View

Alexei Kassian, Mikhail Zhivlov, and George Starostin

In this paper we discuss the results of an automated comparison between two 50-item groups of the most generally stable elements on the so-called Swadesh wordlist as reconstructed for Proto-Indo-European and Proto-Uralic. ...

Altogether we have counted 7 pairs where Proto-Indo-European and Proto-Uralic share the same biconsonantal skeleton (the exact same pairs are regarded as cognates in traditional hypotheses of Indo-Uralic relationship).
The 7 matches (IE, Ur):
  • "to hear" ... *klew- ... *kuwli ... KL
  • "I/me" ... *me- ... *min ... MH
  • "name" ... *nomn ... *nimi ... NM
  • "thou" ... *ti ... *tin ... TH
  • "water" ... *wed- ... *weti ... WT
  • "who" ... *kwi- ... *ku- ... KH
  • "to drink" ... *eghw- ... *ighi- ... HK
The word forms were matched by turning them into two-consonant sequences, with each consonant assigned to a consonant class by place of articulation (where in the mouth the sound is made).

The authors then repeatedly scrambled the word lists and counted the matches for each scrambled state. They found the probability of getting at least 7 matches by chance to be 1.9% or 0.5%, depending on how fine-grained the consonant classes were.
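
The scrambling procedure is a permutation test, and a bare-bones version is easy to sketch in Python. This is not the authors' code: the function names are mine, and the two short lists below are toy data (a few skeletons borrowed from the matches above plus made-up non-matching ones), not the paper's 50-item lists.

```python
import random


def count_matches(skeletons_a, skeletons_b):
    """Count the meanings whose consonant-class skeletons are identical."""
    return sum(a == b for a, b in zip(skeletons_a, skeletons_b))


def permutation_p_value(skeletons_a, skeletons_b, trials=10000, seed=0):
    """Estimate how often a random re-pairing matches at least as well as the real one."""
    rng = random.Random(seed)
    observed = count_matches(skeletons_a, skeletons_b)
    shuffled = list(skeletons_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the meaning-to-meaning pairing
        if count_matches(skeletons_a, shuffled) >= observed:
            hits += 1
    return observed, hits / trials


# Toy lists, one skeleton per meaning, same meaning order in both (hypothetical data).
TOY_A = ["KL", "MH", "NM", "TH", "WT"]
TOY_B = ["KL", "MH", "NM", "PS", "RK"]
print(permutation_p_value(TOY_A, TOY_B))
```

The estimate can't resolve probabilities much smaller than 1/trials, so a rare outcome needs more shuffles. Coarser consonant classes make chance matches more common, which fits with the two different percentages reported.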

So that gives a little bit of support for Indo-Uralic.
 
Analyzing Genetic Connections between Languages by Matching Consonant Classes
Confirming Indo-European and Semitic, and finding that Turkic, Mongolic, Tungusic, Japanese, and Eskimo are recognizably related.

I'll collect the various consonant classes. Some authors lump some of them together, and those I will note.

Proto-Indo-European-Uralic Comparison from the Probabilistic Point of View
Alexei Kassian, Mikhail Zhivlov, and George Starostin
has the most split one:

P (p, f, ...), T (t, th, ...), S (s, sh, ...), C (ts, tsh, ...), K (k, kh, ...), M (m, ...), N (n, ng, ...), R (r, ...), L (l, ...), Q (tl, ...), W (w, ...), Y (y, ...), H ((none), h, ...)

The tl is a "lateral affricate", t followed by l.

The Global Lexicostatistical Database's Consonant Classes

P, T, (S, C), K, M, N, (R, L), Q, W, Y, H

Significance testing of the Altaic family | John Benjamins and Diachronica-Ceolin.pdf
has Aharon Dolgopolsky's list:

P, T, S, (K, C), M, N, (R, L), W, Y, H

Y may be written J, C as a Z-like symbol, and H as 0-slash or #.

The lumped ones have R and L together, and C with either S or K. Dolgopolsky's one doesn't have Q.
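
Here is a rough Python sketch of turning a transcribed word into a two-consonant class skeleton, with the lumping as an option. The table holds only the sounds I listed above, so anything else (vowels included) is skipped and a real table would need filling out. The function names, the merge tables, and the rule of padding short words with H are my own guesses at a workable procedure, not the authors' exact method.

```python
# Sounds listed above, mapped to the most finely split classes.
SOUND_TO_CLASS = {
    "p": "P", "f": "P",
    "t": "T", "th": "T",
    "s": "S", "sh": "S",
    "ts": "C", "tsh": "C",
    "k": "K", "kh": "K",
    "m": "M",
    "n": "N", "ng": "N",
    "r": "R",
    "l": "L",
    "tl": "Q",
    "w": "W",
    "y": "Y",
    "h": "H",
}

# Lumping schemes: each maps a class onto the class it is merged with.
GLD_MERGE = {"C": "S", "L": "R"}
DOLGOPOLSKY_MERGE = {"C": "K", "L": "R"}  # this list has no Q; "tl" is left as-is here


def consonant_classes(word, merge=None):
    """Scan left to right, preferring the longest listed sound at each position."""
    merge = merge or {}
    classes = []
    i = 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if chunk in SOUND_TO_CLASS:
                cls = SOUND_TO_CLASS[chunk]
                classes.append(merge.get(cls, cls))
                i += size
                break
        else:
            i += 1  # vowel or unlisted symbol: skip it
    return classes


def skeleton(word, merge=None):
    """First two consonant classes, padded with H (none) if the word is short."""
    classes = consonant_classes(word, merge)[:2]
    while len(classes) < 2:
        classes.append("H")
    return "".join(classes)


for word in ("me", "ti", "nimi", "weti"):
    print(word, skeleton(word))
```

For these four forms it gives MH, TH, NM and WT, the same skeletons as in the Indo-Uralic list. It won't reproduce every entry there, since the authors evidently work from root consonants in ways that a plain left-to-right scan doesn't capture.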
 
Proto Polynesian *-CIA - PL-519.193.pdf

While Polynesian languages have mostly isolating morphology, they do have a bit of inflection of their verbs, a suffix that is usually -Cia, and less often -a, -na, -Cina, collectively -CIA.
Individual Polynesian languages generally have between six and eleven alternants of -CIA. For example, Maori has eleven: -a, -ia, -hia, -kia, -mia, -ŋia, -ria, -tia, -ina, -kina and -whina, Samoan has ten: -a, -ia, -fia, -lia, -mia, -ŋia, -sia, -tia, -'ia, -ina, Tongan also has ten: -a, -ia, -fia, -hia, -mia, -ŋia, -tia (sia), -'ia, -ina, -kina, Niuean has nine: -a, -ia, -hia, -kia, -mia, -nia, -tia, -ina and -na, and Hawaiian has eight: -a, -ia, -hia, -kia, -lia, -mia, -nia and -na. In most Polynesian languages it is possible to consider all the -CIA suffixes as being say, a particular base (verb or other) selects a particular alternant but the choice cannot be predicted from the form of the base.

It is a straightforward matter to reconstruct for Proto-Polynesian many pairings of a particular base plus a particular alternant of *-CIA
Like *inu, *inumia "drink, drunk".

It has a variety of uses in the langs:
(i) marks imperative mood,
(ii) derives a passive verb from a transitive verb,
(iii) derives a stative verb from a noun or an active intransitive or transitive verb,
(iv) derives a stative verb from another stative verb with a change of meaning,
(v) derives a transitive verb that takes ergative case-marking from an intransitive verb or an accusatively marked transitive verb,
(vi) changes the meaning of an ergative verb,
(vii) added to an ergative verb has no semantic effect
The -C- part is well-understood. It is the final consonant of the root that was lost when the root was not suffixed.
After the breakup of Proto Oceanic several daughter languages regularly lost final consonants in absolute word-final position. However, the original final consonant of bases was preserved when a suffix followed. Such suffix-supported stem-final consonants are generally known as 'thematic consonants'.

One of the branches of Oceanic which lost final consonants was that ancestral to Polynesian, Fijian and Rotuman, which together form the Central Pacific subgroup. Proto-Polynesian preserved some POc final consonants not only in the *-CIA forms but also in three other suffixes: *-Ci, *-Caki and *-Canga. PPn *-Ci and *-Caki derived a verb from another verb (with meaning change) or from a noun; PPn *-Canga derived a noun from a verb. Most present-day Polynesian languages retain at least a few reflexes of all four suffixes, although *-Caki reflexes are no longer fully productive in any contemporary language and reflexes of *-Ci are fully productive only in Tongan, and then only in a realisation -'i, which does preserve original thematic consonants. In contemporary languages a particular thematic consonant is often retained in several suffixes.

What the -i- is: "The vowel -i- in the -CIA suffixes continues the Proto Oceanic 'short' transitive suffix *-i"

The article listed 4 hypotheses for the origin of the -a, and seems to support this sequence:

Suffixed 3rd-person object pronoun > transitivizer > passivizer
 
I thought of posting something on Columbus Day / Indigenous Peoples Day, but I neglected to do so. Either on its three-day-weekend date of Oct 9 or its traditional date of Oct 12. So I'll post something today.

Classification of the Indigenous languages of the Americas gives several classifications, including Joseph Greenberg's classification.

Zamponi_R_2017_First_person_n_and_second.pdf -- another location of that document:
DOI: 10.26346/1120-2726-113
Italian Journal of Linguistics, 29.2 (2017), p. 189-230 (received March 2017)
First-person n and second-person m in Native America: a fresh look
Raoul Zamponi

I'd posted on his work earlier, so I'll try to collect it and Greenberg's classification. I'll also try to fill in a gap: RZ didn't list the departures from n-m, and those may have patterns of their own.

Northern Amerind first.

Almosan-Keresiouan:
  • Almosan
    • Algic: n-k
    • Kutenai: u-i
    • Mosan
      • Wakashan: n-s
      • Salish: n-'an
  • Caddoan: t-s
  • Siouan (Western):
    • Sioux: (active subj) wa-ya, (active obj, stative) ma-ni
    • Crow: (active subj) baa-daa, (active obj, stative) bii-dii
    • Hidatsa: ma-da
    • Ho-Chunk: (intr active, trans subj) ha-ra, (intr stative, trans obj) hi-ni
    • Tutelo: m-y
    • Biloxi: nk-ay
    • Ofo: miti-tSiti
  • Iroquoian: k-hs
Hard to find any patterns here. Western Siouan pronouns jump around like crazy. Siouan has active-stative alignment:
  • Transitive-verb subject: marked like the subject of an active intransitive verb
  • Transitive-verb object: marked like the subject of a stative intransitive verb, like an adjective used as a verb

Hokan, Penutian: n-m
 
Central Amerind
  • Kiowa-Tanoan: (Jemez) nii-uwa
  • Oto-Manguean: na-(l,n,r)
  • Uto-Aztecan: n-m
Sort of fits n-m

Chibchan-Paezan:
  • Chibchan: da-ba
Looks like n>d and m>b

Andean:
  • Aymara: naya-huma
  • Quechua: ñuqa-qam (yaqa-qam)
  • Cahuapana–Zaparo: kwa-ki
  • Yahgan/Yamana: hai-san
  • Mapuche/Mapudungun: inche-eymi. (with verbs) n-imi
Looks like partially-preserved n-m.
 
Equatorial–Tucanoan:
  • Equatorial
    • Macro-Arawakan: n-p
    • Cayubaba: ai-a
    • Camsa: atSe-aka
    • Trumai: ha-hi
    • Tupi: on-en
    • Guarani: che-nde
    • Cofan: na-ke
    • Guahiboan: xani-xami
  • Tucanoan (Greenberg)
    • Tequiraca: kun-(?)
    • Kanoe: ai-mi
    • Kwaza: si-xyi
    • Nambikwaran: t'ai-w'ain
    • Pankararu: Se-pene
    • Maku: tene-ene
    • Tucanoan: ji-mi (?)
One can see a little bit of n-m, but not much.

Ge–Pano–Carib:
  • Macro-Ge: i-a
  • Carib: o-a
  • Pano-Tacanan: i-mi
  • Toba Qom: ayen-ahan
  • Mataco–Guaicuru: y-a
  • Andoque: o'o-ha'o
  • Boran: uu-ii
Even less n-m.

I've used Wikipedia articles on individual languages, families, and protolanguages, and Je–Tupi–Carib languages as sources. There seems to be a second pronoun pattern in South America: i-a.
 
Is French moving towards polysynthesis? : linguistics
The result of all these changes is that the sequence subject clitic + object clitic + verb stem has become a fused unit within which other elements cannot intervene, and no other combination is possible. Put at its simplest, we may regard, for example, tu l’aimes? /tylem/ with rising intonation ‘you love him/her?’ as one polymorphemic word (subject-prefix + object-prefix + stem).
PandaTickler:
Another interesting example would be *chtelédi (je te l'ai dit). 1st person singular subject prefix, 2nd person singular (indirect) object prefix, 3rd person singular (direct) object prefix, preterite 1st person singular marker, verb stem. To be fair this can be interrupted by an adverb like bien, so only the chtelé part seems unsplittable.
NateSquirrel:
and one example would be conjugation of verbs being less and less common with the main form being the (often homonymous) infinitive/past participle/présent except for some verbs that are used very often with other verbs kinda like english style auxiliaries (eg "faire")

the fact that so many inflected forms of adjectives/nouns/verbs etc. sound the same and that the difference between them seems to be less and less understood by some parts of the population.

subject reduplication (eg "c'est moi qui l'ai fait", or even "c'est moi qui l'a fait" in French that I subjectively find very ugly) which may have the long term effect that verbs will always be used in the 3rd singular person...
So French seems like it's moving from the ancestral Indo-European sort of personal verb conjugation to a new one with subject and object prefixes, much like the Bantu languages. That ancestral sort is evident in Latin and somewhat less evident in the other Romance languages.

French speakers have dropped many final sounds, with the result that singular and plural nouns and adjectives almost always sound the same, with singular and plural being distinguished by determiner words like articles and possessives.

French verb conjugations are also reduced. In pronunciation, the endings are -, -, -; -oN, -e, -. Also, the indefinite pronoun on seems to be taking the place of the first person plural, making the endings effectively -, -, -; -, -e, -
 
I'll now review the work that got me interested in macro-linguistics: Vitaly Viktorovich Shevoroshkin's collections. Internet Archive: Shevoroshkin

The first one was "Typology, Relationship, and Time" (1986), by VVS and TL Markey. Its intro starts off with noting that Soviet macro-linguistic work is not very well-known in the Western world, and that many Western linguists are very skeptical of claims of long-range relationships, whether justified or not.

Toward the end of that intro, the editors mention how macro-linguistic comparisons could resolve such issues as the voicing of the stop consonants of Proto-Indo-European. Traditionally, they are *T, *D, *Dh -- *treyes (three), *duwô (two), *dhwer- (door). That has the problem of having plenty of words with *p and *bh but hardly any with *b, when across languages it's almost always /p/ that drops out.

Tamaz Gamkrelidze and Vyacheslav Ivanov have proposed their glottalic theory, T(h), T', D(h), where T' is glottalic or ejective, pronounced with a short pause between the consonant and the voicing.

VVS and TLM propose Th, T, D, much like Thai, originating from Nostratic T', T, D

Laryngeals are very difficult, but the velar stops have a remarkable correspondence:
  • Ky (palatovelar) -- K + a
  • K (plain velar) -- K + ä,e,i -- front vowels
  • Kw (labiovelar) -- K + o,u -- back vowels

I've seen a theory elsewhere that Indo-European ablaut developed from pre-IE vowel harmony, like what Uralic, Turkic, and Mongolian have.

So someone should extend Miriam Robbeets's work on Transeurasian / Macro-Altaic to include Uralic and Indo-European.
 
VVS cites this example:
American linguists have debated whether or not there is such a thing as a genetic unity for "Hokan languages", and most of them have asserted that there is not, while Soviet linguists have proceeded to reconstruct Proto-Hokan.
Then discussing Penutian, stating that there is a well-defined family that contains Miwok-Costanoan, Maidu-Nisenan, Yokuts, Wintu-Patwin, Klamath-Sahaptin, Takelma-Kalapuyan, Chinook, and Tsimshian, but not Coos, Siuslaw, or Alsea.

Then on how Proto-Hokan and Proto-Penutian have very similar phonology and most stable elements, like personal pronouns. From Amerind languages: Joseph Greenberg has Penutian–Hokan.

But he says that Algonquian-Wiyot-Yurok, Yuchi-Siouan, Caddoan, Iroquoian, Salishan, Wakashan -- in JG's Almosan-Keresiouan -- belong to a different macrofamily, and from other publications, Dene-Caucasian.
 