... I would be very surprised if the current crop did not rely heavily on some of the fairly decent shallow parsing systems that are out there. I can't imagine how they would work without them.
Like this?
Link Grammar with
Parse a sentence
Some source code is at
GitHub - opencog/link-grammar: The CMU Link Grammar natural language parser
English should be fairly easy for an LLM, since it doesn't have much word morphology (variation in word forms) in its grammar, making it close to analytic or isolating.
Nouns have only two forms: singular and plural. Adjectives may have one form or three forms: plain, comparative, and superlative (good, better, best). Verbs may have four forms (parse, parses, parsed, parsing) or five forms (see, sees, saw, seen, seeing), with only one verb having more (am, is, are, was, were, be, been, being). Pronouns are more complicated than nouns and adjectives, but not by much. (I, me, my, mine; we, us, our, ours; you, your, yours; he, him, his; she, her, hers; it, its; they, them, their, theirs; this, these; that, those).
But English has oodles of compound verb constructions in its grammar, though they are very regular. It also has compound comparatives and superlatives (regular, more regular, most regular).
Languages like Chinese (for the most part), Vietnamese, and Thai have only one form per word, and everything a verb conjugation would express is done with compound (multi-word) forms.
But some languages have large numbers of forms for some of their words, and one might have to give the model those inflected forms or else hint that it should look for them.
Spanish verb 'amar' conjugated: amo, amas, ama, amamos, amáis, aman; amaba, amabas, amábamos, amabais, amaban; amé, amaste, amó, amasteis, amaron; amaré, amarás, amará, amaremos, amaréis, amarán; ame, ames, amemos, améis, amen; amara, amaras, amáramos, amarais, amaran; amare, amares, amáremos, amareis, amaren; amaría, amarías, amaríamos, amaríais, amarían; amad; amar, amando, amado (-a, -os, -as). I'm leaving out the compound forms and the alternative -se imperfect subjunctive (amase, amases, amásemos, amaseis, amasen). That's 46 distinct forms as listed, with the participle counted only once despite its four adjective endings. Fortunately, Spanish nouns have only 2 forms and Spanish adjectives up to 4 forms.
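If anyone wants to check the count, here is a rough Python sketch that builds the forms from the stem plus the regular -ar endings and counts the distinct spellings. The names (`STEM`, `ENDINGS`, `SE_ENDINGS`, `distinct_forms`) are just mine for illustration. The participle is counted once, as in the list, and the -se imperfect subjunctive, which the list doesn't include, is kept separate so you can see what it would add.

```python
STEM = "am"

# Regular -ar endings, in the order yo, tú, él, nosotros, vosotros, ellos.
# Identical spellings (e.g. "amamos" in present and preterite) collapse in the set.
ENDINGS = {
    "present": ["o", "as", "a", "amos", "áis", "an"],
    "imperfect": ["aba", "abas", "aba", "ábamos", "abais", "aban"],
    "preterite": ["é", "aste", "ó", "amos", "asteis", "aron"],
    "future": ["aré", "arás", "ará", "aremos", "aréis", "arán"],
    "conditional": ["aría", "arías", "aría", "aríamos", "aríais", "arían"],
    "present_subjunctive": ["e", "es", "e", "emos", "éis", "en"],
    "imperfect_subjunctive_ra": ["ara", "aras", "ara", "áramos", "arais", "aran"],
    "future_subjunctive": ["are", "ares", "are", "áremos", "areis", "aren"],
    "imperative": ["a", "ad"],
    "nonfinite": ["ar", "ando", "ado"],  # participle counted once
}

# Alternative imperfect subjunctive, not part of the list in the post.
SE_ENDINGS = ["ase", "ases", "ase", "ásemos", "aseis", "asen"]


def distinct_forms(stem, ending_groups):
    """Return the set of distinct spellings produced by stem + ending."""
    return {stem + ending for endings in ending_groups for ending in endings}


listed = distinct_forms(STEM, ENDINGS.values())
with_se = distinct_forms(STEM, list(ENDINGS.values()) + [SE_ENDINGS])

print(len(listed))   # 46
print(len(with_se))  # 51
```

Using a set does the "count each spelling once" bookkeeping automatically, which is where most of the hand-counting mistakes tend to creep in.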
Russian nouns typically have 10 forms, Russian adjectives 13 forms, and for an imperfective-perfective pair, Russian verbs around 32 forms, counting all 5 adjective forms only once.
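As a quick sanity check on the "typically 10" figure for nouns, here is a small Python sketch that lays out the 12 case/number cells for one noun and counts the distinct spellings. The noun стол ("table", a regular inanimate masculine) is my own choice of example, and stress is ignored since it isn't written.

```python
# Declension of стол: 6 cases x 2 numbers = 12 cells.
PARADIGM = {
    ("nominative", "sg"): "стол",
    ("genitive", "sg"): "стола",
    ("dative", "sg"): "столу",
    ("accusative", "sg"): "стол",      # same as nominative for inanimates
    ("instrumental", "sg"): "столом",
    ("prepositional", "sg"): "столе",
    ("nominative", "pl"): "столы",
    ("genitive", "pl"): "столов",
    ("dative", "pl"): "столам",
    ("accusative", "pl"): "столы",     # same as nominative for inanimates
    ("instrumental", "pl"): "столами",
    ("prepositional", "pl"): "столах",
}

distinct = set(PARADIGM.values())

print(len(PARADIGM))  # 12
print(len(distinct))  # 10
```

Other declension classes merge different cells, so the count of distinct forms varies a bit from noun to noun.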