I don't believe they are 'massively impacted' by SES or any of those other things, inasmuch as I think most of the variability in scores is indeed attributable to the academic aptitude of the test taker.
Well, "massive" is vague enough.
You don't think a 16-percentile difference between a first and third attempt counts as "massive"; is that what you're trying to say?
What percent of the total variability in SAT scores do you think is actually attributable to academic aptitude? 10%? 50%? 90%?
The way it has been discussed by some, they seem to have no problem uttering obvious falsehoods such as 'SAT scores measure SES,' which they do not.
I noticed you didn't use real quotes. Did anyone actually say that?
Also, your question is too broad to be meaningful. With any psychological or aptitude test, how much a measure captures X and X alone is completely context dependent. My prior link to the research on Raven's Matrices, often considered one of the most "pure" measures of IQ, illustrates this. If two people being compared both have zero experience with that test, then a much larger % of the difference in their scores is a result of general "aptitude" differences, although interest and motivation to take the test under the given conditions are still factors. But if one person has already taken it twice, they may score about 1/4 of a standard deviation higher (which is pretty sizable), which can be the opposite of the actual difference in the general abilities the test is supposed to measure. IOW, a person whose true general aptitude is somewhat higher than another's can nonetheless score lower if that other person has experience taking the Raven's. So, in that comparison, 0% of the variance in their scores reflects true variance in aptitude. In fact, it's analogous to negative 100%, if such a thing were possible, since the practice effect reverses the true ordering and operates as a negation when added into the computation of total variance explained in a larger population.
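The sign-reversal point can be sketched numerically. The specific magnitudes here (a 1/4 SD practice boost, a 0.2 SD true-aptitude gap) are illustrative assumptions for the sake of the example, not figures from the Raven's research:

```python
# Illustrative sketch (made-up numbers): how a practice effect can
# reverse the observed ordering of two test takers.
SD = 1.0                    # work in standard-deviation units
PRACTICE_BOOST = 0.25 * SD  # assumed gain from prior test attempts

aptitude_a = 0.20 * SD      # person A: higher true aptitude, no practice
aptitude_b = 0.00 * SD      # person B: lower true aptitude, practiced

score_a = aptitude_a                    # score reflects aptitude only
score_b = aptitude_b + PRACTICE_BOOST   # score boosted by practice

# True ordering: A above B. Observed ordering: B above A.
assert aptitude_a > aptitude_b
assert score_b > score_a
print("true gap (A - B):", aptitude_a - aptitude_b)
print("observed gap (A - B):", score_a - score_b)
```

In this comparison the observed score difference has the opposite sign from the true aptitude difference, which is the "negative 100%" situation described above.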
So, the answer to your question is that it changes with contextual factors that vary both across time and between subpopulations, and it is not constant across the full range of scores. That last point is important, because most practice/training designed to improve scores on a test winds up having a "rich get richer" effect, the standard term for the pattern in which those who already have higher scores are the ones most helped by that practice and training. This is usually because people whose current skill level is low (for either innate or developmental reasons) lack the general skills necessary to gain test-specific skills from the opportunities that practice and training afford. So, paying for extra training or test attempts doesn't account for much of the difference between those at the 30th vs. 40th percentiles. The fact that most of them are lower SES and can't afford such opportunities anyway makes that even more true. But those test takers mostly don't wind up applying to college, so they aren't central to the issue.
In contrast, for people whose true aptitude is already in the top 30 percent, who comprise the vast majority of college applicants, training/practice opportunities can make a big difference, boosting them from, say, a 75th-percentile score to a 91st-percentile score, which would be the deciding difference at most schools. So, let's say that across the total variance in all SAT scores, just 15% is due to SES-related factors tied to paying for test-specific opportunities like tutors, courses, extra test attempts, etc. And suppose that 2/3 of that 15% is concentrated among the top 30 percent of test takers (due to "rich get richer" effects). That means that among the students for whom it matters, because they are the ones applying to college, these testing-specific SES factors account for a much larger share of the score variance, on the order of a third or more. It means that a sizable fraction of the time, using SAT scores will favor the wrong student.
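The hypothetical numbers above can be checked with a quick simulation. Everything here is assumed for illustration: scores are modeled as aptitude plus an independent "prep" component, prep effects are set so that roughly 15% of total score variance is prep-related with about 2/3 of it concentrated in the top ~30% of the aptitude distribution, and the z-cutoff 0.52 marks the top ~30% of a normal distribution:

```python
# Illustrative simulation of the hypothetical variance decomposition:
# ~15% of total score variance from test-prep/SES factors, with about
# two-thirds of it concentrated in the top ~30% of test takers.
import random
import statistics

random.seed(0)
CUTOFF = 0.52  # z-score above which lie roughly the top 30% of a normal

people = []
for _ in range(200_000):
    aptitude = random.gauss(0, 1)
    # "rich get richer": prep effects assumed much larger at the top
    prep_sd = 0.63 if aptitude > CUTOFF else 0.29
    prep = random.gauss(0, prep_sd)
    people.append((aptitude, prep, aptitude + prep))

def prep_share(rows):
    """Fraction of score variance attributable to the prep component."""
    return (statistics.variance([r[1] for r in rows])
            / statistics.variance([r[2] for r in rows]))

applicants = [r for r in people if r[0] > CUTOFF]  # the college-bound slice
print(f"prep share of variance, all test takers: {prep_share(people):.0%}")
print(f"prep share of variance, applicants only: {prep_share(applicants):.0%}")
```

Under these assumptions the prep share among applicants comes out far higher than the ~15% figure for the whole population, because restricting to the top of the distribution shrinks the aptitude variance while the concentrated prep variance remains.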
Now, beyond knowing that coaching and retesting have large effects on scores, we don't know the exact % of total variance they account for. But my point is to illustrate why your question isn't meaningful: even if the total variance they account for is rather low, the share they account for is likely much higher at the high end of the distribution, where all the stakes are.
Finally (I know, right), I kept saying "test-specific SES factors" to distinguish them from the huge influence we know SES has on the development of real academic aptitude: it determines the quality of teachers, classroom materials, computers at school and home, reduced external stressors, not having to work in high school, etc. All of those impact the very academic skills that do determine college success. That's a problem, but not one to be fixed at college admissions, because it's too late by then. My arguments about the SAT and ACT are not about those SES influences, but about the influence of wealth in paying for opportunities aimed directly at boosting admissions test scores.