
You're measuring it wrong or How dinosaurs can help child malnutrition

Perspicuo

GatesBlog: How Dinosaurs Could Help Us Fight Malnutrition
http://www.gatesnotes.com/Health/How-Dinosaurs-Could-Help-Us-Fight-Malnutrition

For example, some researchers recently looked at the relationship between gross domestic product and childhood stunting and, to everyone’s surprise, they found no correlation—until Nathan pointed out that they were using the wrong statistical methods to analyze the information. The methods he suggested instead—based on his work on dinosaurs—showed that the relationship was actually even stronger than many people in the field had thought. And that could have a big impact on how policymakers and health-care workers approach the problem of childhood nutrition.

Nice, heartwarming piece from one of my favorite Bills. But what caught my attention is this: if we're measuring such a mundane and uncontroversial subject wrong, what else are we measuring wrong?

I'm thinking of eye-opening studies about yoga being as good as exercise, and meditation helping all sorts of things. I'm not against these facts... I'm against non-"facts", because I'm an empiricist.

Can what is now being recommended by experts in respected fields be trusted?

My statistics professors never told me their field was such a minefield.
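
The blog doesn't spell out which methods Nathan actually suggested, so the following is only a toy sketch with made-up numbers (Python/scipy), not his analysis: it just shows how much the choice of method alone can matter, with the same invented GDP-versus-stunting relationship looking weak or strong depending on whether you correlate the raw, heavily skewed GDP values or work on a log or rank scale.

# Toy illustration only: invented data, and not the analysis from the blog post.
# The same simulated relationship looks very different depending on the method.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60
log_gdp = rng.normal(8.5, 2.0, size=n)              # hypothetical log GDP per capita
gdp = np.exp(log_gdp)                               # raw scale: a few rich outliers dominate
stunting = 70 - 5 * log_gdp + rng.normal(0, 3, n)   # hypothetical % of children stunted

r_raw, p_raw = stats.pearsonr(gdp, stunting)        # linear correlation on raw, skewed GDP
r_log, p_log = stats.pearsonr(log_gdp, stunting)    # same test on the log scale
rho, p_rho = stats.spearmanr(gdp, stunting)         # rank-based, scale-free

print(f"Pearson, raw GDP : r = {r_raw:+.2f}, p = {p_raw:.3f}")   # typically much weaker
print(f"Pearson, log GDP : r = {r_log:+.2f}, p = {p_log:.2g}")   # strong
print(f"Spearman (ranks) : rho = {rho:+.2f}, p = {p_rho:.2g}")   # strong

The particular numbers are invented; the point is just that a real relationship can hide behind the wrong summary statistic.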
 
God, do I hate statistics. The field, not the plural of 'statistic'. I analyze lab data and compare it to computer predictions for my job, and the number of ways to statistically polish a turd is impressive and also terrifying.

The old standby of biology researchers is the p-value. If it's below 0.05, publish. If not, do more experiments until it gets there. If that still doesn't work, cut it in half by using a one-tailed distribution in your statistical test instead of a two-tailed distribution. After all, the difference between the two comes down to the private mental state of the investigator:

GraphPad Statistics Guide said:
When is it appropriate to use a one-sided P value?

A one-tailed test is appropriate when previous data, physical limitations, or common sense tells you that the difference, if any, can only go in one direction. You should only choose a one-tail P value when both of the following are true.

• You predicted which group will have the larger mean (or proportion) before you collected any data.
• If the other group had ended up with the larger mean – even if it is quite a bit larger – you would have attributed that difference to chance and called the difference 'not statistically significant'.

This stupid little condition has opened the floodgates for so much abuse in academia, and probably industry as well if we're talking pharmaceuticals (we knew beforehand that the pill would work, therefore the p-value is 0.04 instead of 0.08).
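
For what it's worth, here is how small that trick is in code. A minimal sketch with invented measurements (Python/scipy), where the only change between the two calls is declaring, after the fact, that the effect could only go one way:

# Made-up data: the same samples and the same test statistic, reported two ways.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
placebo = rng.normal(10.0, 2.0, size=15)    # hypothetical control measurements
drug = rng.normal(11.0, 2.0, size=15)       # hypothetical treated measurements

t, p_two = stats.ttest_ind(drug, placebo)                          # two-tailed
_, p_one = stats.ttest_ind(drug, placebo, alternative='greater')   # "we knew it would work"

print(f"t statistic  = {t:.2f}")
print(f"two-tailed p = {p_two:.3f}")
print(f"one-tailed p = {p_one:.3f}")   # half of the two-tailed value whenever t > 0

Same data, same test, half the reported p-value.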

/rant
 
À propos, this came out yesterday:

Psychology Journal Bans Significance Testing
http://www.sciencebasedmedicine.org/psychology-journal-bans-significance-testing/

This is good news, as I understand it.
It goes:

However, the p-value was never meant to be the sole measure of whether or not a particular hypothesis is true. Rather it was meant only as a measure of whether or not the data should be taken seriously. Further, the p-value is widely misunderstood. The precise definition is:

The p value is the probability to obtain an effect equal to or more extreme than the one observed presuming the null hypothesis of no effect is true.

In other words, it is the probability of the data given the null hypothesis. However, it is often misunderstood to be the probability of the hypothesis given the data. The editors understand that the journey from data to hypothesis is a statistical inference, and one that in practice has turned out to be more misleading than informative. It encourages lazy thinking – if you reach the magical p-value then your hypothesis is true.
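
That definition can be checked by brute force. A quick sketch with made-up numbers (Python/scipy): run one fake experiment, then re-run it thousands of times in a world where the null hypothesis really is true, and count how often chance alone gives an effect at least as extreme as the one observed. That count is the p-value, nothing more:

# Made-up example: the p-value as a literal count of "how often would chance alone
# produce an effect at least this extreme, if there were truly no effect".
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 20
group_a = rng.normal(0.0, 1.0, n)                      # one invented experiment...
group_b = rng.normal(0.5, 1.0, n)                      # ...with a built-in effect of 0.5
t_obs, p_analytic = stats.ttest_ind(group_b, group_a)

# Re-run the experiment 20,000 times in a world where H0 is true (no effect at all).
null_t = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).statistic
    for _ in range(20_000)
])
p_simulated = np.mean(np.abs(null_t) >= abs(t_obs))

print(f"analytic p-value                            : {p_analytic:.4f}")
print(f"fraction of null worlds at least as extreme : {p_simulated:.4f}")   # agrees closely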
 
So true... I wish more people would understand it. I work with a post-doc who routinely says "the p-value is less than 0.001, so it means there is only a 0.001% chance that the hypothesis is false!"
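
To see how far off that reading can be, here is a small simulation with invented numbers (Python/scipy). In a field where only 1 in 10 tested ideas is actually real, a large share of the results that clear p < 0.05 still come from true-null effects, far more than the "0.001%" or even 5% that reading would suggest:

# Made-up field: 10% of tested hypotheses are real, the rest are null. Among the
# "significant" results, the share that are actually null is much bigger than alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, n_studies, true_rate, effect = 20, 5000, 0.10, 0.8

false_sig = true_sig = 0
for _ in range(n_studies):
    is_real = rng.random() < true_rate
    shift = effect if is_real else 0.0
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    if stats.ttest_ind(b, a).pvalue < 0.05:
        if is_real:
            true_sig += 1
        else:
            false_sig += 1

print(f"significant results          : {true_sig + false_sig}")
print(f"share of those where H0 held : {false_sig / (true_sig + false_sig):.2f}")   # far above 0.05

How big that share is depends on the base rate of true hypotheses, which the p-value knows nothing about.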
 
P-values are abused and misunderstood, but they are still a useful piece of information. What a p-value tells us is how often, if there were no effect at all, a similar or bigger effect would be observed due merely to sampling error. If sampling error alone will often produce the same results, then the results provide very little evidence for the experimental hypothesis. A single result of mean differences with a p-value of .001 is much, much more likely to hold up and be replicated than the same mean difference with a p-value of .5.

The bigger problem is the "alpha" level itself and the practice of treating .05 as a magical threshold that determines which effect differences are treated as "real" and which are not. And the practice of refusing to publish studies unless they reject the null hypothesis at that alpha level. For example, a paper with 2 experiments that show the same result at p = .06 will be treated as "not real" and likely rejected in favor of a paper with a single experiment at p = .04. Yet the former set of experiments is objectively stronger evidence of a "real" effect: replication is stronger evidence against a mere sampling-error explanation than a small difference in p-value is.
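
One way to put a rough number on that point, using Fisher's method for combining independent p-values (a standard textbook tool, not something from this journal or any particular paper):

# Fisher's method: two independent experiments at p = .06 are, taken together,
# stronger evidence against the null than a single experiment at p = .04.
from scipy import stats

_, p_combined = stats.combine_pvalues([0.06, 0.06], method='fisher')

print(f"two replications at p = .06 -> combined p = {p_combined:.3f}")   # about 0.024
print("one experiment at p = .04   -> p = 0.040")

Yet the usual .05 filter publishes the single .04 result and rejects the replicated .06 pair.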

This journal is being unreasonably reactionary. In fact, they are allowing the p-values to stay in the manuscript during review, but they must be removed when published. That is the exact opposite of what they should do. The biggest problem is reviewer bias in making publish-or-not decisions based on p-values. Only the methods should matter for publication, not the results. They should strip the paper of all results before review, then put the stats back in for publication.
 
Agreed. Many publications I have been part of have been shuffled down to lower impact-factor journals for this reason. In my field (immunology), getting consistent results across multiple experiments is so much harder to do than getting a significant result in one experiment, especially when the samples are from human tissue. But some studies in the really prestigious journals just have an experiment on one dude's cells with a p value of <0.0001... all that means is that the null hypothesis is very unlikely to randomly produce the observation in that particular person's cells, but some people have the crazy idea that immunological data should be applicable to more than just one person. Anyway.

That's probably true about the journal being too reactionary. It's easy to go overboard either way.
 