by ohthatpatrick Wed May 10, 2017 2:20 pm
If we wanted to compare the age of smokers' 1st attack to nonsmokers' 1st, we would need to know when everyone's 1st heart attack was.
But we're only asking people who SURVIVED a 1st heart attack.
We also need the data on people who DIED during their 1st (and only) heart attack.
Consider a smaller (manageable) data set of 5 people each:
Age of 1st attack for nonsmoker survivors
(50, 61, 62, 64, 67) - median age of 62
Age of 1st attack for smoker survivors
(48, 49, 51, 53, 58) - median age of 51
Now consider the ages of people who DIED during their 1st attack.
Age of 1st for nonsmokers who died during attack
(39, 42, 44, 47, 49)
Age of 1st for smokers who died during attack
(45, 49, 56, 60, 61)
When you combine those data sets, you get
AGE OF 1ST ATTACK FOR NONSMOKERS
(39, 42, 44, 47, 49, 50, 61, 62, 64, 67) - median age of 49.5
AGE OF 1ST ATTACK FOR SMOKERS
(45, 48, 49, 49, 51, 53, 56, 58, 60, 61) - median age of 52
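With this data, nonsmokers DO NOT tend to have their 1st attack eleven years later. In fact, the median nonsmoker's 1st attack comes 2.5 years EARLIER than the median smoker's (49.5 vs. 52).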
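If you want to check the arithmetic yourself, here's a quick Python sketch using the same made-up numbers from above. The variable names are just labels I picked, and the ages for people who died are invented for illustration, not real data.

```python
from statistics import median

# Ages at 1st heart attack, copied from the made-up lists above.
nonsmoker_survivors = [50, 61, 62, 64, 67]
smoker_survivors = [48, 49, 51, 53, 58]
nonsmoker_died = [39, 42, 44, 47, 49]  # invented for illustration
smoker_died = [45, 49, 56, 60, 61]     # invented for illustration

# Gap in medians when we only look at survivors (what the argument does).
survivor_gap = median(nonsmoker_survivors) - median(smoker_survivors)

# Gap in medians once everyone who had a 1st attack is included.
nonsmokers_all = nonsmoker_survivors + nonsmoker_died
smokers_all = smoker_survivors + smoker_died
full_gap = median(nonsmokers_all) - median(smokers_all)

print("Survivors only:", survivor_gap)  # 11
print("Everyone:", full_gap)            # -2.5
```

A positive gap means nonsmokers had their 1st attack later. Among survivors only, the gap is 11 years, but with the full group it flips to -2.5, which is exactly the kind of reversal (E) is warning about.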
Obviously I created my own data for the 1st attack for dead smokers/nonsmokers, but the point of (E) is that the missing data could actually make the conclusion dead wrong (no pun intended).
Here's an analogous argument:
Of 2,500 people who were admitted to Harvard Law and made it to their third year, the boys had a median LSAT score of 168 and the girls had a median LSAT score of 173. Thus, girls who were admitted to Harvard Law tend to have an LSAT score that's five points higher than that of boys admitted to Harvard Law.
This argument is missing information about the median LSAT scores of boys/girls who were admitted to Harvard Law but dropped out during their first or second years.