
Regression toward the mean



Thanks for making assumptions about what I do or don't understand.

866936[/snapback]

 

No need for assumptions. Reading every one of the 60+ pages in these few threads will give thousands of examples of things you don't understand. I can list them again, if you'd like.

 

Looks like johnny pinko commie coli took you to school tho. I think he needs an award for probably being the thousandth poster to prove you are wrong.


The problem is that there's no useful information in this example from which to form a reasoned response.

-What is the mean of the population? 

-How was that mean determined? 

-If it was determined using the same imperfect test, how do you know you have a true mean? 

-What is the data range? 

-Are there any scores above 140? 

-Why are you choosing 140? 

 

Another huge problem is that you're testing according to what you want to see. You are setting it up to get what you want. If any of the people who scored 140 retest below 140, will you test them again, or will you be happy that they scored lower because that's what you want to see? What if they score higher upon a third test? Hell, what if they all test higher because they're getting better at taking the test?

 

That's why I said you're not even approaching the problem from a scientific standpoint.  You're already totally biased.  Why the hell am I even involved in this thread?

866877[/snapback]

Mostly this is a debate about whether a specific statistical phenomenon does or does not exist. I'll avoid answering some of your questions--such as how the mean was determined--to focus on the underlying statistical question.

 

Suppose an underlying, normally distributed population with a mean of 100, and a standard deviation of 10. (This could be a normal distribution for height, I.Q., running speed, whatever. Doesn't matter.) Measure each member of the population with an imperfect test. If someone were to take the test 1000 times, the actual results would be normally distributed around this person's true score, with a standard deviation of 2.5.

 

Give the whole population the test one time. Arbitrarily define a subset of the population, based strictly on their test scores the first time they took the test. For example, you might choose as your subset those who scored between 1 and 1.5 standard deviations above the mean. Or you might choose only those who scored more than two standard deviations below the mean. The only restriction on the subset is that it cannot contain both members who scored above the mean and members who scored below it.

 

Administer a second test to the members of your subset, with the same mean error term of zero and the same standard deviation of error of 2.5. I contend that, as a group, the members of the subset are expected to obtain scores that are closer to the population mean upon retaking the test than the group averaged when it first took the test. This is because any subset whose members scored below the mean will contain more people who got "unlucky" than "lucky" on the first test; whereas any subset whose members scored above the population mean will conversely contain more people who got "lucky" than "unlucky." When you retest the subset, good luck will presumably balance out bad, and the group as a whole will obtain scores somewhat closer to the population's mean.

 

Whatever view you have of me or of my ideas, you cannot deny that the above is valid statistical reasoning.
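
For anyone who would rather check the claim numerically than argue about it, here is a minimal Monte Carlo sketch (my own illustration, not part of the original post), using the numbers above: true scores drawn from N(100, 10), test error drawn from N(0, 2.5), and a subset picked by first-test score between 1 and 1.5 standard deviations above the mean. The variable names are made up.

import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

true_scores = rng.normal(100, 10, N)            # underlying "true" values
test1 = true_scores + rng.normal(0, 2.5, N)     # first imperfect measurement
test2 = true_scores + rng.normal(0, 2.5, N)     # independent second measurement

# Subset picked strictly by first-test score: 1 to 1.5 SD above the mean (110 to 115).
subset = (test1 >= 110) & (test1 <= 115)

print("subset mean on test 1:", test1[subset].mean())   # about 112.3
print("subset mean on test 2:", test2[subset].mean())   # about 111.5, i.e. closer to 100

With these numbers the expected drop is small, roughly 0.7 points, which matches the "somewhat closer" wording above.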


Mostly this is a debate about whether a specific statistical phenomenon does or does not exist. I'll avoid answering some of your questions--such as how the mean was determined--to focus on the underlying statistical question.

 

Suppose an underlying, normally distributed population with a mean of 100, and a standard deviation of 10. (This could be a normal distribution for height, I.Q., running speed, whatever. Doesn't matter.) Measure each member of the population with an imperfect test. If someone were to take the test 1000 times, the actual results would be normally distributed around this person's true score, with a standard deviation of 2.5.

 

Give the whole population the test one time. Arbitrarily define a subset of the population, based strictly on their test scores the first time they took the test. For example, you might choose as your subset those who scored between 1 and 1.5 standard deviations above the mean. Or you might choose only those who scored more than two standard deviations below the mean. The only restriction on the subset is that it cannot contain both members who scored above the mean and members who scored below it.

 

Administer a second test to the members of your subset, with the same mean error term of zero and the same standard deviation of error of 2.5. I contend that, as a group, the members of the subset are expected to obtain scores that are closer to the population mean upon retaking the test than the group averaged when it first took the test. This is because any subset whose members scored below the mean will contain more people who got "unlucky" than "lucky" on the first test; whereas any subset whose members scored above the population mean will conversely contain more people who got "lucky" than "unlucky." When you retest the subset, good luck will presumably balance out bad, and the group as a whole will obtain scores somewhat closer to the population's mean.

 

Whatever view you have of me or of my ideas, you cannot deny that the above is valid statistical reasoning.

866973[/snapback]

No. You're assuming the entire subset tested above the population mean because of "luck." That assumption is entirely wrong. Retesting the subset will give you a more accurate mean for the subset. The members of the subset have just as much chance of doing better than of doing worse upon further testing. The group's mean should not shift. There is no way you could definitively make the case that the subset's mean would move towards the population mean. If the entire group shifted lower then red flags and alarms should be going off that either your test is garbage, or the data from the first test for the whole population is garbage.

 

To put it in simple terms, you can not take all the smart people in the room and expect that they will, as a group, test dumber the second time because they were lucky the first time.

 

I'm done with this. Leave me out of your discussion.

 

Ignore button, from hell's heart I stab at thee.


No.  You're assuming the entire subset tested above the population mean because of "luck."  That assumption is entirely wrong.  Retesting the subset will give you a more accurate mean for the subset.  The members of the subset have just as much chance of doing better than of doing worse upon further testing.  The group's mean should not shift. There is no way you could definitively make the case that the subset's mean would move towards the population mean.  If the entire group shifted lower then red flags and alarms should be going off that either your test is garbage, or the data from the first test for the whole population is garbage.

 

To put it in simple terms, you can not take all the smart people in the room and expect that they will, as a group, test dumber the second time because they were lucky the first time. 

 

I'm done with this.  Leave me out of your discussion.

 

Ignore button, from hell's heart I stab at thee.

867076[/snapback]

You're wrong. Suppose your subset consisted of those who obtained exceptionally high scores. Given that actual scores are due in part to luck, the subset you selected got disproportionately lucky on the first test. Retest them, and on average their score will be somewhat closer to the population mean. But I'm the one who doesn't even have a rudimentary understanding of statistics. :oops:


To summarize three threads and 50 pages in a nutshell: HA thinks everyone else here doesn't know squat about statistics and insists that they are wrong by bringing out his logic. Everyone else thinks HA is butchering statistics and the usage of regression towards the mean.

I'll give HA credit for taking 50 pages of abuse, and still plugging along. However, I feel dumber every time I read these threads :oops:, especially knowing that I have contributed to this cluster!@#$.


No.  You're assuming the entire subset tested above the population mean because of "luck."  That assumption is entirely wrong.  Retesting the subset will give you a more accurate mean for the subset.  The members of the subset have just as much chance of doing better than of doing worse upon further testing.  The group's mean should not shift. There is no way you could definitively make the case that the subset's mean would move towards the population mean.  If the entire group shifted lower then red flags and alarms should be going off that either your test is garbage, or the data from the first test for the whole population is garbage.

 

To put it in simple terms, you can not take all the smart people in the room and expect that they will, as a group, test dumber the second time because they were lucky the first time. 

 

I'm done with this.  Leave me out of your discussion.

 

Ignore button, from hell's heart I stab at thee.

867076[/snapback]

 

The way he set it up, you can. It's just meaningless: you test everyone, discard the low-scoring ones, and the ones you retest score closer to their true test score.

 

Which is fine, if you have a priori knowledge of their "true score". But how do you know what their true test score is? Well, you have to test them...

 

I'm sure everyone on the planet except for HA sees what a crock of sh-- that is... :oops:


You're wrong. Suppose your subset consisted of those who obtained exceptionally high scores. Given that actual scores are due in part to luck, the subset you selected got disproportionately lucky on the first test. Retest them, and on average their score will be somewhat closer to the population mean. But I'm the one who doesn't even have a rudimentary understanding of statistics.  :oops:

867108[/snapback]

Ah, now I see how these threads get so long. You change everything when someone points out that you're completely wrong. But in this case, you're still wrong. You have no idea who got "lucky" in the high-score subset. You're also assuming that the majority of those with high scores got them because they were lucky. They can't all be lucky, therefore the majority of high scores are correct, and the mean of the subset should not shift towards the population mean. Also, if that much data was skewed in one direction and then shifts, it stands to reason that the population mean would also be wrong because they all took the same test.


The way he set it up, you can.  It's just meaningless: you test everyone, discard the low-scoring ones, and the ones you retest score closer to their true test score.

 

Which is fine, if you have a priori knowledge of their "true score". But how do you know what their true test score is? Well, you have to test them...

 

I'm sure everyone on the planet except for HA sees what a crock of sh-- that is...  :oops:

867118[/snapback]

That's not what he said, though. Maybe the individuals test closer to their true mean and in some cases their true mean would shift towards the population mean, but the entire subset's mean should not shift. He was arguing that the group's mean would shift towards the population mean upon further testing. That doesn't make sense. That would argue for the majority of the subset having tested wrong to begin with. That's insane.


That's not what he said, though. Maybe the individuals test closer to their true mean and in some cases their true mean would shift towards the population mean, but the entire subset's mean should not shift. He was arguing that the group's mean would shift towards the population mean upon further testing. That doesn't make sense. That would argue for the majority of the subset having tested wrong to begin with. That's insane.

867133[/snapback]

 

Actually, if you think about it...it does. It's because of the completely arbitrary nature of the cut-off: picture the Gaussian curve. Cut it sharply at 140 (hell - the three-sigma limit, let's keep it measurement-neutral). If you test that subset again, there will be SOME tests that are less than that arbitrary three-sigma limit, simply because of the natural distribution of error - the cut-off is no longer sharp, but tails off (very quickly, as it's dictated by the Gaussian distribution of the error and not the population, but it does tail off). Therefore, the mean of EVERYONE who scores three sigmas or above does drop SLIGHTLY.
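
A quick numerical illustration of that tail-off (a sketch added here, not part of the post; it uses the thread's mean of 100 and SD of 10, so the three-sigma limit is 130, with error SD 2.5):

import numpy as np

rng = np.random.default_rng(1)
N = 2_000_000
true = rng.normal(100, 10, N)
test1 = true + rng.normal(0, 2.5, N)
test2 = true + rng.normal(0, 2.5, N)

top = test1 >= 130                                   # sharp cut at the three-sigma limit
print("mean on test 1:", test1[top].mean())          # about 132.8
print("mean on test 2:", test2[top].mean())          # about 130.9: a slight drop
print("share retesting under 130:", (test2[top] < 130).mean())  # the cut is no longer sharp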

 

HA's utter and complete stupidity...well, it exists on several levels:

- he doesn't understand that there's NO earthly reason to even do anything like that.

- He's so ignorant about basic math that it takes him several HUNDRED posts to get his point across, regardless of whether his point is correct or not. He doesn't have the vocabulary to discuss it intelligently.

- he doesn't realize that, in the limit of the entire population, that tail-off is counter-balanced by a similar effect in the larger subset he's ignoring...so that the net effect over the population is zero.

- he keeps vacillating on the parameters of the arbitrary cut-off ("Take everyone who scores 140...no, wait, take everyone who scores ABOVE 140...no, wait, I was right the first time.")

- he can't distinguish between discrete and continuous variables. In particular, the numbskull keeps discussing IQ scores in increments of ten with errors of plus or minus ten. OF COURSE you're going to see some effect at that granularity. He keeps talking about scores of 130, 140, 150...what about the people at 134-146? Which brings us to:

- he can't do simple math. He can do simple bull sh-- math, in increments of tens. Nothing more complex.

- he can't distinguish between error and variance. Yes, I already said he doesn't have the vocabulary, but this is more basic: he can't differentiate between a normal population distribution and the normal distribution of error in a single test. Basically, he's equating the error in my taking an IQ test to everyone's error in every IQ test ever taken.

 

And that's just the math errors. Never mind the topics of genetics and psychology.


That's not what he said, though. Maybe the individuals test closer to their true mean and in some cases their true mean would shift towards the population mean, but the entire subset's mean should not shift. He was arguing that the group's mean would shift towards the population mean upon further testing. That doesn't make sense. That would argue for the majority of the subset having tested wrong to begin with. That's insane.

867133[/snapback]

It's an effect that happens at the margin of the cutoff. Suppose you were to retest everyone who scored above the population mean. Some of the people you're retesting will be people with true I.Q.s below the population mean who got lucky on that first test. What about the people who should be balancing them out--those with true I.Q.s above the mean who got unlucky and scored below the mean? Those people didn't make the cutoff, and aren't being retested.

 

Hence, more than 50% of the people who made the cutoff got lucky on that first I.Q. test. When you retest the subgroup, its average score will decline somewhat.
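
A sketch of that "more than 50% got lucky" claim, using the thread's numbers (my illustration; here "lucky" simply means a positive measurement error on the first test):

import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
true = rng.normal(100, 10, N)
err1 = rng.normal(0, 2.5, N)
test1 = true + err1

above = test1 > 100                    # everyone who made the above-the-mean cutoff
lucky = err1 > 0                       # "lucky" = positive error on the first test
print("share of the cutoff group that got lucky:", lucky[above].mean())   # roughly 0.58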


Actually, if you think about it...it does. It's because of the completely arbitrary nature of the cut-off: picture the Gaussian curve. Cut it sharply at 140 (hell - the three-sigma limit, let's keep it measurement-neutral). If you test that subset again, there will be SOME tests that are less than that arbitrary three-sigma limit, simply because of the natural distribution of error - the cut-off is no longer sharp, but tails off (very quickly, as it's dictated by the Gaussian distribution of the error and not the population, but it does tail off). Therefore, the mean of EVERYONE who scores three sigmas or above does drop SLIGHTLY.

I agree with the above paragraph, and I was amused by your long list of excuses to keep calling me an idiot. You clinging to that idea even after it's been disproven is like . . . well, a lot of stuff that happens on these boards.

 

I'd like to add to your above paragraph by pointing out that the effect I'm describing happens even if you have two cutoffs. Suppose you defined your subset as being only those people who scored between 1 and 2 standard deviations above the population mean. In addition to those who ought to be in the subset, and are, you have four groups of people:

1. Lucky people bleeding in from the left cutoff.

2. Unlucky people bleeding in from the right cutoff.

3. Unlucky people who bled out through the left cutoff.

4. Lucky people who bled out through the right cutoff.

 

The number of lucky people who bled in through the left cutoff exceeds the number of unlucky people who bled out through that cutoff. This is because the number of people at, say, 0.9 - 1 standard deviations (available for getting lucky) exceeds the number of people at 1 - 1.1 standard deviations (available for getting unlucky). The existence of this cutoff increases the percentage of lucky people in your subset.

 

What happens at the other cutoff? Clearly there are more people at 1.9 - 2 standard deviations available for getting lucky, than there are people at 2 - 2.1 standard deviations, available for getting unlucky. The existence of this cutoff decreases the percentage of lucky people in your subset. However, its effect is much smaller than the other cutoff. This is because everything is happening on a smaller scale as you drift on out toward the right hand tail of the distribution. When the effects of the two cutoffs are averaged out, you're still left with a subset who got disproportionately lucky on that first test. Test them again, and their average will be somewhat closer to the population mean.
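
Those four groups can be counted directly. Here is a simulation sketch (mine, not from the post), with the band at 110 to 120, i.e. 1 to 2 SD above a mean of 100, and error SD 2.5:

import numpy as np

rng = np.random.default_rng(3)
N = 2_000_000
true = rng.normal(100, 10, N)
test1 = true + rng.normal(0, 2.5, N)

in_band = (test1 >= 110) & (test1 <= 120)       # selected by first-test score
belongs = (true >= 110) & (true <= 120)         # where the true value actually sits

lucky_in_left    = (in_band & (true < 110)).sum()   # bled in across the left cutoff
unlucky_in_right = (in_band & (true > 120)).sum()   # bled in across the right cutoff
unlucky_out_left = (belongs & (test1 < 110)).sum()  # bled out across the left cutoff
lucky_out_right  = (belongs & (test1 > 120)).sum()  # bled out across the right cutoff

print(lucky_in_left, unlucky_out_left)    # left cutoff: more lucky in-bleed than unlucky out-bleed
print(unlucky_in_right, lucky_out_right)  # right cutoff: the direction reverses, but the counts are smaller
print("first-test mean:", test1[in_band].mean())        # about 113.9
print("true mean, same people:", true[in_band].mean())  # a bit lower, closer to 100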


I agree with the above paragraph, and I was amused by your long list of excuses to keep calling me an idiot. You clinging to that idea even after it's been disproven is like . . . well, a lot of stuff that happens on these boards.

 

I'd like to add to your above paragraph by pointing out that the effect I'm describing happens even if you have two cutoffs. Suppose you defined your subset as being only those people who scored between 1 and 2 standard deviations above the population mean. In addition to those who ought to be in the subset, and are, you have four groups of people:

1. Lucky people bleeding in from the left cutoff.

2. Unlucky people bleeding in from the right cutoff.

3. Unlucky people who bled out through the left cutoff.

4. Lucky people who bled out through the right cutoff.

 

The number of lucky people who bled in through the left cutoff exceeds the number of unlucky people who bled out through that cutoff. This is because the number of people at, say, 0.9 - 1 standard deviations (available for getting lucky) exceeds the number of people at 1 - 1.1 standard deviations (available for getting unlucky). The existence of this cutoff increases the percentage of lucky people in your subset.

 

What happens at the other cutoff? Clearly there are more people at 1.9 - 2 standard deviations available for getting lucky, than there are people at 2 - 2.1 standard deviations, available for getting unlucky. The existence of this cutoff decreases the percentage of lucky people in your subset. However, its effect is much smaller than the other cutoff. This is because everything is happening on a smaller scale as you drift on out toward the right hand tail of the distribution. When the effects of the two cutoffs are averaged out, you're still left with a subset who got disproportionately lucky on that first test. Test them again, and their average will be somewhat closer to the population mean.

867218[/snapback]

 

But - and here's the key point here - THAT'S NOT REGRESSION TOWARD THE MEAN. You keep insisting it is. This is because you can't distinguish between the error in the test and the normal distribution of the population. This is because you're a !@#$ing idiot.


But - and here's the key point here - THAT'S NOT REGRESSION TOWARD THE MEAN.  You keep insisting it is.  This is because you can't distinguish between the error in the test and the normal distribution of the population.  This is because you're a !@#$ing idiot.

867286[/snapback]

I got a chuckle out of your post. I can't distinguish between the error in the test and the normal distribution of the population? That's rich.

 

In my earlier post, I wrote that the +1 to +2 SD subset contains a higher percentage of lucky people than of unlucky people. Let's expand on that by considering a very thin slice of the measured distribution, as well as an imperfect test.

 

Consider the group of people who scored between 1.99 and 2.01 SDs above the mean. The test is error-prone, so each person 1 SD above the mean has a chance X to get lucky and be scored inside this small slice. Each person 3 SD above the mean also has a chance X to get unlucky and be scored inside this small slice. The number of lucky people from +1 SD = X * the number of people 1 SD above the mean. The number of unlucky people from +3 SD = X * the number of people 3 SD above the mean. Because there are more people at 1 SD above the mean than at 3 SDs, there will be more lucky 1 SDs present in your thin slice than unlucky 3 SDs. The same logic also holds if you're comparing the number of lucky 1.5 SDs to the number of unlucky 2.5 SDs. Indeed, it applies to any two points on the normal distribution that are equally far from the center of your thin slice. The point that's closer to the population mean will contribute more people to your slice than the point that's farther away. This means that the average member of your thin slice is closer to the population mean than his or her test score would indicate. Retest the members of your thin slice, and they will, on average, score somewhat closer to the population mean the second time around.
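
Two small calculations to go with that thin-slice argument (my addition; the second one uses the standard normal-to-normal shrinkage formula, which is not spelled out anywhere in the thread, plugged with the thread's numbers):

from math import exp, sqrt, pi

def normal_pdf(z):
    """Standard normal density at z standard deviations from the mean."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

# 1) People at +1 SD and +3 SD are equally far from a thin slice centred at +2 SD,
#    but there are far more of the former available to land in it by luck:
print(normal_pdf(1.0) / normal_pdf(3.0))          # about 54.6 to 1

# 2) Expected true score given an observed score, for true ~ N(100, 10^2) and
#    test error ~ N(0, 2.5^2):
shrink = 10**2 / (10**2 + 2.5**2)                 # about 0.941
observed = 120.0                                  # a first-test score 2 SD above the mean
print(100 + shrink * (observed - 100))            # about 118.8, the expected retest score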


I got a chuckle out of your post. I can't distinguish between the error in the test and the normal distribution of the population? That's rich.

 

In my earlier post, I wrote that the +1 to +2 SD subset contains a higher percentage of lucky people than of unlucky people. Let's expand on that by considering a very thin slice of the measured distribution, as well as an imperfect test.

 

Consider the group of people who scored between 1.99 and 2.01 SDs above the mean. The test is error-prone, so each person 1 SD above the mean has a chance X to get lucky and be scored inside this small slice. Each person 3 SD above the mean also has a chance X to get unlucky and be scored inside this small slice. The number of lucky people from +1 SD = X * the number of people 1 SD above the mean. The number of unlucky people from +3 SD = X * the number of people 3 SD above the mean. Because there are more people at 1 SD above the mean than at 3 SDs, there will be more lucky 1 SDs present in your thin slice than unlucky 3 SDs. The same logic also holds if you're comparing the number of lucky 1.5 SDs to the number of unlucky 2.5 SDs. Indeed, it applies to any two points on the normal distribution that are equally far from the center of your thin slice. The point that's closer to the population mean will contribute more people to your slice than the point that's farther away. This means that the average member of your thin slice is closer to the population mean than his or her test score would indicate. Retest the members of your thin slice, and they will, on average, score somewhat closer to the population mean the second time around.

867708[/snapback]

 

And again, for the millionth time, you're seeing the regression of the error towards the mean of the error. Which is not regression of the tested person to the population mean.

 

One is not the other. One might look like the other...but only to a complete tool like yourself who doesn't know what he's talking about.


And again, for the millionth time, you're seeing the regression of the error towards the mean of the error.  Which is not regression of the tested person to the population mean. 

 

One is not the other.  One might look like the other...but only to a complete tool like yourself who doesn't know what he's talking about.

867718[/snapback]

My example makes it perfectly obvious that even in a thin slice, the average person is closer to the population mean than his or her test score would indicate. (Assuming an imperfect test, of course.) Retest the people in the thin slice, and their test scores will, on average, be somewhat closer to the population's mean. Is there anything in this paragraph with which you disagree?


My example makes it perfectly obvious that even in a thin slice, the average person is closer to the population mean than his or her test score would indicate. (Assuming an imperfect test, of course.) Retest the people in the thin slice, and their test scores will, on average, be somewhat closer to the population's mean. Is there anything in this paragraph with which you disagree?

867726[/snapback]

 

Start using proper terminology, and I'll discuss it with you. The "average person" is, by definition, AT THE POPULATION MEAN, YOU MORON. THAT'S WHAT "AVERAGE" MEANS.

 

It's impossible to discuss when you abuse the vocabulary.

