We're Not Running Any Faster!
Right from the horses' mouths they tell us "We're
not running any faster!" My friend Marvin points this out, and asks, Why
not? In the Triple Crown races, Kentucky Derby and Belmont winning times
have stayed flat for 50 years, apart from perhaps a second or so on average,
and Secretariat's 1973 record times still stand in both races. The Preakness
winners show a slightly larger improvement, one to two percent, from the
1:57s to the 1:55s, and some say that a timing malfunction cost Secretariat
that course record as well (his unofficial "record" time has been equaled
twice since). In contrast, the men's world record for the mile has dropped
from Bannister's historic 3:59.4 in 1954 to El Guerrouj's 3:43.13 in 1999,
an improvement of 6.8% that has accrued almost linearly over the years.
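The percentages are simple arithmetic on race times converted to seconds. A minimal sketch (the helper names are our own) reproduces the mile figure from Bannister's 3:59.4 and El Guerrouj's 3:43.13, and the size of the Preakness gain from the 1:57s to the 1:55s:

```python
def to_seconds(t: str) -> float:
    """Convert an 'm:ss.ss' race time to seconds."""
    minutes, seconds = t.split(":")
    return 60 * int(minutes) + float(seconds)


def pct_improvement(old: str, new: str) -> float:
    """Percentage reduction from an old time to a newer, faster one."""
    before, after = to_seconds(old), to_seconds(new)
    return 100 * (before - after) / before


mile = pct_improvement("3:59.4", "3:43.13")   # Bannister 1954 -> El Guerrouj 1999
preakness = pct_improvement("1:57", "1:55")   # rough range of Preakness winning times
print(f"mile: {mile:.1f}%, Preakness: {preakness:.1f}%")
```

Percentages rather than raw seconds keep races of different lengths comparable.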
Possible reasons for the difference: 1) Milers have benefited from
equipment changes, notably better shoes and high-tech synthetic tracks replacing
rutted cinder tracks. 2) Outstanding milers have many years to make their
mark, with four runners (Snell, Ryun, Ovett and Coe) accounting for nine
of the 19 records of the past 50 years, while a Secretariat or Smarty Jones
gets only one shot at the Triple Crown, as a three-year-old. 3) Milers have
benefited from improvements in diet, training, surgery, and - dare we say?
- perhaps drugs designed to foil detection, advantages less available to
horses. 4) Milers have benefited from motivational techniques, mind games,
appearance fees and well-paid pacers for a half-mile or more. You can't
psychoanalyze a horse or prepay it $50,000. 5) Perhaps most of all, an
apples-and-oranges consideration: milers get many opportunities each year
to attack world records against strong competition, while horses have only
the Triple Crown races, run at three different distances, with no other events
as well known to the general public. Without a standardized distance, world
records are far less lucrative as an objective.
Do both sports simply lack great performers today? Note that in six
of the last eight years a horse has won both the Derby and the Preakness
but failed at the Belmont, and the mile record has not dropped in five years.
Even so, the case for ever-faster mile times looks stronger: no change in
five years is not unusual, and twice since 1954 almost eight years slipped
by with no change. Roger Bannister himself says, "Improvement in the mile
record will go on continually. By 2050 I can see the record at 3:30. Someone
will always want to beat the record as it stands then." In contrast to the
enormous motivation of a world record as a human goal, how is a horse to
know any greater context than his current race? Can he/she expect any reward
beyond a fatter bag of premium oats? No doubt my horse sense fell at the
starting gate, but it seems clear that a horse's incentives to clock faster
times are much weaker.
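Bannister's 3:30 can be set beside a naive straight line through the two mile records cited above (1954 and 1999). The sketch below is only that, two points and a ruler, not a forecast; under that assumption the line would reach about 3:24.7 by 2050, so Bannister's figure is, if anything, the cautious one. The helper names are our own.

```python
def to_seconds(t: str) -> float:
    """Convert an 'm:ss.ss' time to seconds."""
    minutes, seconds = t.split(":")
    return 60 * int(minutes) + float(seconds)


def fmt(seconds: float) -> str:
    """Format seconds as 'm:ss.s'."""
    m, s = divmod(seconds, 60)
    return f"{int(m)}:{s:04.1f}"


# Two endpoints only: Bannister (1954) and El Guerrouj (1999).
(y0, t0), (y1, t1) = (1954, to_seconds("3:59.4")), (1999, to_seconds("3:43.13"))
slope = (t1 - t0) / (y1 - y0)            # seconds per year; negative means faster
projected_2050 = t1 + slope * (2050 - y1)
print(f"{slope:.3f} s/year; a straight line gives {fmt(projected_2050)} in 2050")
```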
Love Those Rankings, but Stick with the Leaders: Differences Further Down Are Meaningless!
On April 12 Austan Goolsbee recalled his search for a paper on
Bulgaria's regulatory laws (NYTimes.com):
"The demand for technical working papers on Bulgaria being
what it is, the publication's Amazon sales rank was extremely low (about
2.5 million down on the list). Yet after I bought it, a most amazing thing
happened. My one purchase moved this working paper past almost one million
other books. This happened because once buyers get out of the best sellers
- where the difference in sales can be enormous - almost everything else
is basically tied. The differences in rank are statistically meaningless,
and small blips cause big changes. The same is true of rankings of all sorts
of things. In the New York City Zagat guide, for example, a restaurant that
raised its food rating by three points would pass 50 restaurants if it went
from 25 to 28 (from excellent to extraordinary) but 439 establishments if
it went from 19 to 22."
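Goolsbee's explanation, that almost everything outside the best sellers is tied, can be tried on a toy catalog. The sketch assumes a hypothetical Zipf-like sales pattern (the k-th best seller sells 1,000,000 // k copies, across 2.5 million titles) and breaks ties against the book being ranked, so it sits at the bottom of its tie group; neither assumption describes Amazon's actual ranking method. A title with no sales starts near the very bottom, and one purchase lifts it past the entire block of zero-sellers.

```python
from collections import Counter


def zipf_sales(n_books: int, top_sales: int) -> list[int]:
    """Hypothetical Zipf-like catalog: the k-th best seller sells top_sales // k copies."""
    return [top_sales // k for k in range(1, n_books + 1)]


def rank_before_and_after(levels: Counter, sales: int) -> tuple[int, int]:
    """Rank of one book with `sales` copies before and after a single extra sale.

    `levels` maps a sales count to how many books in the catalog (this one
    included) have it. Ties are broken against the book, so its rank is the
    number of books selling at least as many copies as it does.
    """
    before = sum(n for s, n in levels.items() if s >= sales)
    # After the purchase: every other book at sales + 1 or more, plus the book itself.
    after = 1 + sum(n for s, n in levels.items() if s >= sales + 1)
    return before, after


levels = Counter(zipf_sales(n_books=2_500_000, top_sales=1_000_000))
before, after = rank_before_and_after(levels, sales=0)
print(f"rank {before:,} -> {after:,} after one purchase")
```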
Consider Richard Florida's best seller, The Rise of the Creative
Class, which describes a new social milieu:
"The distinguishing characteristic of the creative class
is that its members engage in work whose function is to 'create meaningful
new forms'. The super-creative core of this new class includes scientists
and engineers, university professors, poets and novelists, artists, entertainers,
actors, designers, and architects, as well as the 'thought leadership'
of modern society: nonfiction writers, editors, cultural figures, think-tank
researchers, analysts, and other opinion-makers. Members of this super-creative
core produce new forms or designs that are readily transferable and broadly
useful - such as designing a product that can be widely made, sold, and
used; coming up with a theorem or strategy that can be applied in many cases;
or composing music that can be performed again and again. Beyond this core
group, the creative class also includes 'creative professionals' who work
in a wide range of knowledge-intensive industries such as high-tech sectors,
financial services, the legal and healthcare professions, and business management.
These people engage in creative problem-solving, drawing on complex bodies
of knowledge to solve specific problems."
Florida and others apply data from the US Census and Bureau of Labor
Statistics to what their friends would call massive statistical modeling
efforts to produce creativity rankings of US cities. The slightly different
models used in the 2002 book and the 2004 paperback, labeled 1999 and 2001
in the table, provide a glimpse of what might be termed inherent city-rank
variability. Each model uses somewhat different sub-rankings of the percentage
of creative workers in the work force and of measures of technology, innovation
and diversity.
Note that 15 cities suffice to cover the Top Ten for both models, while
10 cities are needed for the two Bottom Fives since they do not overlap at
all; the 2001 model ranks 276 cities, eight more than the 268 of 1999.
Although the top cities remain mostly the same, the exceptions (e.g., San
Diego, tied for 3 in 1999 and 19 in 2001, and Burlington, 37 then 4) show
that small changes in model assumptions, and still smaller changes in the
databases, can yield sizeable differences in rank, just as with slow-selling
books and restaurants. Clearly, such rankings are highly up-and-down creatures,
and small cities find high rankings almost beyond reach.
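The overlap counts can be read straight off the table. This sketch enters each city's 1999 and 2001 ranks as printed there (tied cities at their shared rank), then counts the union and intersection of the two Top Tens and lists the three largest rank moves among these 15 cities.

```python
# (1999 rank, 2001 rank) for the upper block of the table; ties entered at their shared rank.
ranks = {
    "San Francisco": (1, 2),
    "Austin": (2, 1),
    "San Diego": (3, 19),
    "Boston": (3, 5),
    "Seattle": (5, 3),
    "Raleigh-Durham-Chapel Hill": (6, 6),
    "Houston": (7, 37),
    "Albuquerque": (8, 11),
    "Washington-Baltimore": (9, 11),
    "New York": (10, 20),
    "Burlington, VT": (37, 4),
    "Portland, OR": (18, 7),
    "Madison, WI": (20, 8),
    "Boise City, ID": (30, 9),
    "Minneapolis-St Paul": (11, 10),
}

top_1999 = {city for city, (r99, _) in ranks.items() if r99 <= 10}
top_2001 = {city for city, (_, r01) in ranks.items() if r01 <= 10}
# Sort cities by the size of their rank change, largest first.
movers = sorted(ranks, key=lambda city: abs(ranks[city][0] - ranks[city][1]), reverse=True)

print(len(top_1999 | top_2001), "cities cover both Top Tens;",
      len(top_1999 & top_2001), "appear in both")
print("largest moves:", movers[:3])
```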
What can we conclude about creativity? The estimated share of creative
workers ranges from 40% down to 17%, and the size of the super-creative core
(not shown in the table) from 23% (Corvallis, College Station and Champaign
highest) down to 3% or so. Florida avoids faulting many "drudge" cities,
reserving most of his critical comments for Pittsburgh, where he has lived
since 1987. Though he has worked to foster a more creative image for the
city, Pittsburgh in his account suffers from its decades as a corporate
headquarters city, a perception of bias against young women, minority groups,
new immigrants and gays, and an older populace. His CMU graduates are often
eager to leave quickly for exciting, hip cities with more creative opportunities.
As noted, small changes jolt the ranks. In 2001 Springfield, IL, counted
about 42,000 Creative workers, 38.5% of its work force (rank 5), with relatively
small Working and Service class counts. If small shifts in job definitions
turned up 1,500 more Creatives, Springfield would rank first among all 276
cities with a 39.9% creative work force. How many of us are poorly classified
by the US agencies? Are all scientists and engineers more creative than all
clever (non-creative?) machinists and food-service workers? Surely not.
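The Springfield figures can be checked in a few lines. The sketch assumes the 1,500 reclassified workers were already in the work force, so only the creative count changes, not the total:

```python
creative_workers = 42_000   # Springfield, IL, 2001 (approximate)
creative_share = 0.385      # 38.5% of the work force

work_force = creative_workers / creative_share   # implied total, about 109,000 workers
reclassified = 1_500                             # hypothetical shift in job definitions
# Reclassification moves workers between classes; the work force total is unchanged.
new_share = (creative_workers + reclassified) / work_force
print(f"work force ~{work_force:,.0f}; creative share {creative_share:.1%} -> {new_share:.1%}")
```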
| City | 1999 Creativity Index | 1999 Rank | 2001 Creativity Index | 2001 Rank |
|---|---|---|---|---|
| San Francisco | 1057 | 1 | 0.956 | 2 |
| Austin | 1028 | 2 | 0.963 | 1 |
| San Diego | 1015 | 3 (tie) | 0.865 | 19 |
| Boston | 1015 | 3 (tie) | 0.934 | 5 |
| Seattle | 1008 | 5 | 0.955 | 3 |
| Raleigh - Durham - Chapel Hill | 996 | 6 | 0.932 | 6 |
| Houston | 980 | 7 | 0.772 | 37 |
| Albuquerque | 965 | 8 | 0.897 | 11 (tie) |
| Washington - Baltimore | 964 | 9 | 0.897 | 11 (tie) |
| New York | 962 | 10 | 0.848 | 20 |
| Burlington, VT | 809 | 37 | 0.942 | 4 |
| Portland, OR | 929 | 18 | 0.926 | 7 |
| Madison, WI | 925 | 20 | 0.918 | 8 |
| Boise City, ID | 854 | 30 | 0.914 | 9 |
| Minneapolis - St Paul | 960 | 11 | 0.900 | 10 |
| ... | | | | |
| Lawton, OK | 107 | 264 | 0.302 | 216 |
| Jacksonville, NC | 105 | 265 | 0.158 | 265 |
| Owensboro, KY | 91 | 266 | 0.315 | 211 |
| Cumberland, MD | 83 | 267 | 0.287 | 222 |
| Enid, OK | 73 | 268 | 0.914 | 257 |
| Youngstown, OH | 253 | 239 | 0.130 | 272 |
| Lima, OH | 222 | 250 | 0.128 | 273 |
| Sumter, SC | 294 | 226 | 0.116 | 274 |
| Joplin, MO | 183 | 254 | 0.095 | 275 |
| Gadsden, AL | 188 | 253 | 0.058 | 276 |
Clinical Trials: Huge Challenges, Long-Term Benefits for All
Cancer therapy often takes a two-stage approach: an initial treatment
given in the hope of inducing remission, and a second treatment given "to
prolong the period before relapse and disease progression," often only to
patients who show remission after the initial treatment.
Protocol 8923 was a double-blind, placebo-controlled two-stage
trial run by the Cancer and Leukemia Group B to examine the effects of
infusions of granulocyte-macrophage colony-stimulating factor (GM-CSF)
after initial chemotherapy in 388 elderly patients with acute myelogenous
leukemia (AML). Standard chemotherapy for AML has a myelosuppressive effect,
placing patients at increased risk of death due to infection or bleeding-related
complications. … Later, patients meeting the complete remission criteria
and consenting to further participation in the study were offered a second
randomization to one of two intensification treatments.
Of the 193 GM-CSF patients, 79 (41%), and of the 195 placebo patients, 90
(a larger 46%!) achieved remission and agreed to one of the two intensification
treatments. Of these, 37 GM-CSF and 45 placebo patients received intensification
therapy I, and the others (42 and 45) received therapy II. Among the resulting
2 x 2 = 4 treatment groups, little evidence was found of differences in
average survival times; though not statistically significant, the results
suggest that GM-CSF infusion leads to decreased survival time. All 388
patients have been tracked for at least 6.9 years; many were ailing at the
outset, and 356 of them have died. Will these treatments be studied further?
The authors don't say. Many factors will bear on the decision.
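How surprising is the placebo group's higher remission-and-consent rate? A quick pooled two-proportion z-test (normal approximation) on the counts above, our own check rather than an analysis from the cited paper, puts the 41% vs. 46% gap well within chance variation, with a two-sided p of roughly 0.3. The final line confirms that the second-stage split accounts for all 79 and 90 patients.

```python
from math import erfc, sqrt

# Remission-and-consent counts from the text: (responders, randomized)
gm_csf = (79, 193)
placebo = (90, 195)

p1, p2 = gm_csf[0] / gm_csf[1], placebo[0] / placebo[1]
pooled = (gm_csf[0] + placebo[0]) / (gm_csf[1] + placebo[1])
se = sqrt(pooled * (1 - pooled) * (1 / gm_csf[1] + 1 / placebo[1]))
z = (p1 - p2) / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value, normal approximation

print(f"GM-CSF {p1:.0%}, placebo {p2:.0%}, z = {z:.2f}, two-sided p = {p_value:.2f}")

# Therapy I + therapy II counts match each arm's second-stage total.
assert 37 + 42 == gm_csf[0] and 45 + 45 == placebo[0]
```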
--- Wahed and Tsiatis, Biometrics 60: 124-133 (March 2004)
Consider two papers published side-by-side in the New England Journal
of Medicine in 1985, dealing with the incidence of cardiovascular diseases
in women taking or not taking post-menopausal hormones.
"One study found that the incidence in women taking hormones
was about twice as high as in the control group; the other found that it
was only half as high. The estimate of relative risk in each paper had rather
narrow confidence limits that excluded the no-effect point. Both papers
seemed to be technically sound. Both had been reviewed and approved for
publication, and they even had one reviewer in common. Neither study assigned
treatments to patients by randomization, but both included substantial and
careful consideration of differences in known risk factors between subjects
and controls."
--- Bailar and Mosteller, eds., Medical Uses of Statistics, 1992, p. 28.
Either study by itself would have been convincing. Why did they disagree
that much? We do not know, beyond the fact that study-to-study variation
often turns out to be far greater than expected at the outset.
These examples illustrate several challenges in performing clinical trials.
- "Double-blind" means that neither doctor nor patient knows
whether the test drug or the placebo is being taken, with "doctor blind"
variously referring to the oncologist, the pathologist or both. The hope
is that blinding eliminates bias: the lack of hope among placebo patients,
the desire to make the drug look effective or ineffective, and in general
individual viewpoints intruding on the decision process.
- If test and placebo patients differ before treatment (in age,
blood sugar or disease stage, say), these covariables may influence the
results. Pairs of patients can be matched on likely covariables, with one
patient of each pair assigned at random to the treatment and the other to
the placebo.
- Perhaps the greatest ethical challenge, given increasing
though not quite significant evidence that the treatment is indeed effective,
is to create procedures for making the treatment available immediately to
placebo patients and others in dire need while the clinical trial continues.
Conversely, given strong evidence that the treatment is harmful or ineffective,
we may need to consider stopping a trial early, also for ethical reasons.
Major breakthroughs in cancer therapy are relatively rare, and both the
disease and possible adverse events can be severe.
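Matching followed by a coin flip within each pair can be sketched in a few lines. This is a minimal illustration assuming a single numeric covariable (age) and hypothetical patient records; matching on several covariables would need a distance measure, omitted here. The function name is our own, and the fixed seed only makes the example repeatable.

```python
import random


def matched_pair_randomize(patients, covariate, seed=None):
    """Sort patients on one covariable, pair neighbours, and flip a coin within each pair.

    `patients` is a list of dicts with an "id" key; `covariate` names the key to match on.
    Returns {patient_id: "treatment" or "placebo"}; with an odd count the last
    patient after sorting is left unassigned.
    """
    rng = random.Random(seed)
    ordered = sorted(patients, key=lambda p: p[covariate])
    assignment = {}
    for a, b in zip(ordered[0::2], ordered[1::2]):
        if rng.random() < 0.5:   # coin flip decides which member gets the treatment
            a, b = b, a
        assignment[a["id"]] = "treatment"
        assignment[b["id"]] = "placebo"
    return assignment


# Hypothetical patients, matched on age only.
patients = [
    {"id": "P1", "age": 71}, {"id": "P2", "age": 64},
    {"id": "P3", "age": 78}, {"id": "P4", "age": 66},
    {"id": "P5", "age": 73}, {"id": "P6", "age": 80},
]
assignment = matched_pair_randomize(patients, "age", seed=8923)
print(assignment)
```

Here sorting pairs P2 with P4, P1 with P5 and P3 with P6, so each pair is close in age and contributes one patient to each arm.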
From 1932 to 1972, 399 poor black sharecroppers in Alabama were denied treatment
for syphilis and deceived by physicians of the U.S. Public Health Service.
As part of the Tuskegee Syphilis Study these men were told that they were
being treated for "bad blood." In fact, government officials went to extreme
lengths to ensure that they received no therapy from any source. As reported
by the New York Times on 26 July 1972, the Study was called "the longest
nontherapeutic experiment on human beings in medical history." It continues
to cast a long shadow over the relationship between African Americans and
the biomedical professions; some argue that the Study is a leading cause
of their low participation in clinical trials, organ donation efforts,
and routine preventive care. President Clinton recalled the huge injustice
done to the study participants and formally apologized to survivors in 1997.
--- Abstract of the Syphilis Study Legacy Committee, Final Report of May 20, 1996
Hot Hands: Fact or Fiction?
Various analyses over the years find no evidence
of the so-called Hot Hand in baseball hitters, basketball players,
or almost anyone else whose performance statistics can be logged. A hitter
with a batting average of around .333, for example, can be expected to kick
off a string of five or more consecutive hits about once every 243 at-bats
on average (since (1/3)^5 = 1/243), or roughly twice a season. Hence hitting
streaks that occur that often are no evidence of a streaky or hot bat;
they are simply what ought to be expected from someone with a .333
batting average. Similar "no evidence" findings have been published for
basketball players.
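The 1-in-243 figure is easy to check by simulation. The sketch below assumes independent at-bats with a fixed hit probability of 1/3, which is exactly the no-hot-hand model, and counts two things: every stretch of five straight hits, overlapping ones included, which is what the 1/243 rate measures; and distinct streaks of five or more, which come a little less often, about once per 364.5 at-bats, since (2/3)(1/3)^5 = 2/729 when a streak must begin after an out. The function name and seed are our own choices.

```python
import random


def streak_counts(at_bats: int, avg: float, length: int = 5, seed: int = 0):
    """Simulate independent at-bats (a hit with probability `avg`) and count
    (a) windows of `length` consecutive hits, overlapping windows included, and
    (b) distinct streaks of at least `length` hits."""
    rng = random.Random(seed)
    run = 0                      # current run of consecutive hits
    windows = streaks = 0
    for _ in range(at_bats):
        if rng.random() < avg:
            run += 1
            if run >= length:
                windows += 1     # the last `length` at-bats were all hits
            if run == length:
                streaks += 1     # a new streak has just reached `length`
        else:
            run = 0
    return windows, streaks


N = 1_000_000
w, s = streak_counts(N, 1 / 3)
print(f"one 5-hit window per {N / w:.0f} at-bats; one distinct streak per {N / s:.0f} at-bats")
```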
The article "Bowlers' Hot Hands" by Dorsey-Palmateer and Smith in
the February American Statistician presents strong evidence for a hot hand,
however, from Professional Bowlers Association (PBA) tournaments in 2002-03.
From a specified collection of match-play rounds, counting runs within
games only, 43 different bowlers had at least 10 runs of four consecutive
strikes and at least 10 runs of four consecutive non-strikes, and more PBA
bowlers met the same threshold for runs of three, two and one consecutive
strikes and non-strikes. The data are as follows:
| Number of consecutive strikes/non-strikes | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Number of bowlers | 134 | 111 | 81 | 43 |
| Strike proportion immediately after strikes | .571 | .582 | .593 | .612 |
| Strike proportion after non-strikes | .560 | .546 | .533 | .492 |
| Difference | .011 | .036 | .060 | .120 |
Clearly, the longer the run of strikes, the higher the ensuing strike
proportion (.571 up to .612), and the longer the non-strike run, the lower
the ensuing strike proportion (.560 down to .492). After four consecutive
strikes the strike proportion is 24% higher than after four consecutive
non-strikes: .612 vs. .492. Using various statistical tests the authors
confirm the high statistical significance of these findings.
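The conditional proportions in the table come from a simple tabulation: look at every frame that follows a run of k strikes (or k non-strikes) within the same game, and record whether it was a strike. The sketch below is our own, not the authors' code; it assumes "after a run of k" means the k preceding frames in the same game were all strikes (or all non-strikes), so a run of five also counts as a run of four, and the tiny two-game input is made up purely to exercise the function. The last lines redo the 24% comparison from the table.

```python
def conditional_strike_rates(games, k):
    """For each game (a list of frames, True = strike, False = non-strike), find every
    frame whose k preceding frames in the same game were all strikes or all non-strikes.
    Return (strike proportion after k strikes, strike proportion after k non-strikes).
    Requires k >= 1; returns nan for a case with no qualifying frames."""
    hits = {True: 0, False: 0}     # strikes that followed a qualifying run
    totals = {True: 0, False: 0}   # frames that followed a qualifying run
    for game in games:
        for i in range(k, len(game)):
            window = game[i - k:i]
            if all(window) or not any(window):
                kind = window[0]   # True: run of strikes; False: run of non-strikes
                totals[kind] += 1
                hits[kind] += int(game[i])

    def rate(kind):
        return hits[kind] / totals[kind] if totals[kind] else float("nan")

    return rate(True), rate(False)


# Made-up frames, just to exercise the function (runs never cross game boundaries).
games = [
    [True, True, False, True, True, True, False],
    [False, False, True, False, False, False],
]
after_streak, after_slump = conditional_strike_rates(games, k=2)
print(f"after 2 strikes: {after_streak:.3f}, after 2 non-strikes: {after_slump:.3f}")

# The table's four-frame comparison: .612 vs .492.
gain = 0.612 / 0.492 - 1
print(f"{gain:.0%} higher after four strikes")
```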
Do you enjoy a hot hand in some aspect of your life, say, missing
irritating red lights or waiting less than two minutes in recurring lines?
The authors note that bowling data are almost completely free of complicating
effects: exactly the same release motion, repeated on alternating lanes,
with no defensive hand altering your shots, no breaks of 7-8 minutes between
some shots versus only a few seconds between others, and no difference between
a fastball and a slider. Your life, no doubt, is full of complicating effects!
Many of us dare not trust our own data gathering, say, running a just-turned-red
light to keep the green streak going. Perhaps you need an appointment with
your friendly statistician.