Win With Statistics

Keith Kussmaul, Ph.D.
Statistical Consultant



email: keith@winwithstatistics.com
web: www.winwithstatistics.com
phone: 512-388-7803
address: 9 Oakmoore Drive, Round Rock, TX 78664


Overview

Win with Statistics seeks to foster an intelligent and informed citizenry, by the important means of encouraging and promoting the widest possible application of statistical theory and methods to solve problems and meet needs.

I am a self-employed statistical consultant. I was associated with Westinghouse for 27 years as a statistician devoted to research, development and manufacturing efforts. I received my Ph.D. from North Carolina State, and taught eight years in college. I am a Senior Member of the American Society for Quality.

The essays below are from my quarterly newsletter. Please call or email me today if you have any comments and/or exciting ideas for future topics.

--- Keith Kussmaul

Articles

Running | Rankings | Clinical Trials | Hot Hands

We're Not Running Any Faster!

Right from the horses' mouths they tell us "We're not running any faster!" My friend Marvin points this out, and asks Why Not? Among the Triple Crown races, the Kentucky Derby and Belmont winning times have stayed flat for 50 years except possibly for a second or so on the average, with Secretariat's record times of 1973 still standing in both races. The Preakness winners show a slightly larger improvement, one percent or so from the 1:57s to the 1:55s, and some say that a timing malfunction cost Secretariat that course record as well (his unofficial "record" time has been equaled twice since). In contrast, the men's world record for the mile has dropped from Bannister's historic 3:59.4 in 1954 to El Guerrouj's 3:43.13 in 1999, an improvement of 6.8% that has accrued almost linearly over the years.
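The 6.8% figure for the mile is easy to check. The sketch below assumes Bannister's 1954 mark of 3:59.4 and El Guerrouj's 1999 mark of 3:43.13; the helper name `to_seconds` is only illustrative.

```python
def to_seconds(mark: str) -> float:
    """Convert a 'minutes:seconds' mark such as '3:59.4' to seconds."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + float(seconds)

bannister_1954 = to_seconds("3:59.4")      # assumed 1954 record
el_guerrouj_1999 = to_seconds("3:43.13")   # 1999 record quoted in the text

drop = bannister_1954 - el_guerrouj_1999           # seconds gained
pct_improvement = 100 * drop / bannister_1954      # relative to the 1954 mark
per_year = drop / (1999 - 1954)                    # average gain per year

print(f"Record dropped {drop:.2f} s, or {pct_improvement:.1f}%")
print(f"Average drop: {per_year:.2f} s per year")
```

Under these assumptions the record fell by a little over 16 seconds in 45 years, roughly a third of a second per year.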

Possible reasons for the difference: 1) Milers have benefited from equipment changes, notably better shoes and hi-tech synthetic tracks replacing rutted cinder tracks. 2) Outstanding milers have many years to make their mark, with four runners Snell, Ryun, Ovett and Coe accounting for nine of the 19 records of the past 50 years, while a Secretariat or Smarty Jones gets only one shot as a three-year-old at the Triple Crown. 3) Milers have benefited from improvements in diet, training, surgery, and - dare we say? - perhaps drugs designed to foil detection, advantages less available to horses. 4) Milers have benefited from motivational techniques, mind games, appearance fees and well-paid pacers for a half-mile or more. You can't psychoanalyze or prepay $50,000 to a horse. 5) Perhaps most of all, an apples-and-oranges consideration: milers get many opportunities each year to run world records against strong competition, while horses have only the Triple Crown races run at three different distances with no other events as well-known to the general public. Without a standardized distance, world records are far less lucrative as an objective.

Do both sports simply lack great performers today? Note that in six of the last eight years a horse has won both the Derby and the Preakness but failed at the Belmont, and the mile running record has not dropped in five years. Once more the case for faster and faster mile times appears stronger. No change in five years is not unusual; twice since 1954 almost eight years slipped by with no change. Roger Bannister himself says "Improvement in the mile record will go on continually. By 2050 I can see the record at 3:30. Someone will always want to beat the record as it stands then." In contrast to the enormous motivation of a world record as a human goal, how is a horse to know any greater context than his current winning race? Can he/she expect any reward beyond a fatter bag of premium oats? It's obvious, no doubt, that my horse sense fell at the starting gate, but it seems clear that a horse's incentives to clock faster times are much weaker.
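Bannister's forecast can be set beside a plain straight-line extrapolation of the record. This is only a sketch: it assumes the two anchor marks 3:59.4 (1954) and 3:43.13 (1999) and assumes the past linear trend simply continues, which nobody can guarantee. The function names are illustrative.

```python
# Assumed anchor points, in seconds: 3:59.4 (1954) and 3:43.13 (1999).
START_YEAR, START_TIME = 1954, 239.4
END_YEAR, END_TIME = 1999, 223.13

# Seconds per year; negative because the record is falling.
slope = (END_TIME - START_TIME) / (END_YEAR - START_YEAR)

def projected_record(year: int) -> float:
    """Straight-line projection of the mile record, in seconds."""
    return END_TIME + slope * (year - END_YEAR)

def as_mark(seconds: float) -> str:
    """Format seconds as 'm:ss.s', rounding to the nearest tenth."""
    tenths = round(seconds * 10)
    minutes, rest = divmod(tenths, 600)
    return f"{minutes}:{rest / 10:04.1f}"

print("Projected 2050 record:", as_mark(projected_record(2050)))
print("Bannister's forecast: ", as_mark(210.0))
```

On these assumptions the straight line reaches about 3:24.7 by 2050, somewhat faster than Bannister's 3:30, so his forecast already allows for some slowing of the trend.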

Love Those Rankings, but Stick with the Leaders:
differences further down are meaningless!

On April 12 Austan Goolsbee recalled his search for a paper on Bulgaria's regulatory laws (NYTimes.com):
"The demand for technical working papers on Bulgaria being what it is, the publication's Amazon sales rank was extremely low (about 2.5 million down on the list). Yet after I bought it, a most amazing thing happened. My one purchase moved this working paper past almost one million other books. This happened because once buyers get out of the best sellers - where the difference in sales can be enormous - almost everything else is basically tied. The differences in rank are statistically meaningless, and small blips cause big changes. The same is true of rankings of all sorts of things. In the New York City Zagat guide, for example, a restaurant that raised its food rating by three points would pass 50 restaurants if it went from 25 to 28 (from excellent to extraordinary) but 439 establishments if it went from 19 to 22."
Consider Richard Florida's best seller The Rise of the Creative Class, which describes a new social milieu:
"The distinguishing characteristic of the creative class is that its members engage in work whose function is to 'create meaningful new forms'. The super-creative core of this new class includes scientists and engineers, university professors, poets and novelists, artists, entertainers, actors, designers, and architects, as well as the 'thought leadership' of modern society: nonfiction writers, editors, cultural figures, think-tank researchers, analysts, and other opinion-makers. Members of this super-creative core produce new forms or designs that are readily transferable and broadly useful - such as designing a product that can be widely made, sold, and used; coming up with a theorem or strategy that can be applied in many cases; or composing music that can be performed again and again. Beyond this core group, the creative class also includes 'creative professionals' who work in a wide range of knowledge-intensive industries such as high-tech sectors, financial services, the legal and healthcare professions, and business management. These people engage in creative problem-solving, drawing on complex bodies of knowledge to solve specific problems."
Florida and others apply data from the US Census and Bureau of Labor Statistics to what their friends would call massive statistical modeling efforts to produce creativity rankings of US cities. Use of slightly different models in the 2002 book and the 2004 paperback, cited as 1999 and 2001 in the table, provides a glimpse of what might be termed inherent city-rank variability. Each model uses somewhat differing sub-rankings of the percentage of creative workers in the work force and measures of technology, innovation and diversity.

Note that 15 cities suffice to pick up the Top Ten for each model, 10 cities are required to pick up the Bottom Five for each since there is zero overlap, and the 276 cities of 2001 are eight more than in 1999. Although the top cities remain mostly the same, the exceptions (e.g., San Diego from a tie for 3rd to 19th and Burlington from 37th to 4th) show that small changes in model assumptions and still smaller changes in the databases will surely yield sizeable differences in the ranks, the same as with rare books and restaurants. Clearly, such rankings are highly up-and-down creatures, and small cities obviously find high rankings almost beyond reach.

What can we conclude about creativity? The estimated number of creative workers varies from 40% to 17%, and the size of the super-creative core (not shown in table) varies from 23% (Corvallis, College Station and Champaign highest) down to 3% or so. Florida avoids faulting many "drudge" cities, reserving most of his critical comments for Pittsburgh where he's lived since 1987. Though he's worked to foster a more creative image, Pittsburgh suffers from its decades as a corporate headquarters city, a perception of bias against young women, minority groups, new immigrants and gays, and an older populace. His CMU graduates are often eager to leave quickly, seeking exciting, hip cities with more creative opportunities.

As noted, small changes jolt the ranks. In 2001 Springfield IL numbered about 42,000 Creative types, 38.5% of its work force, Rank 5, with relatively small Working and Service class counts. If small job definition shifts dug up 1,500 more Creatives, Springfield would rank first among all 276 cities with a 39.9% creative work force. How many of us are poorly classified by the US agencies? Are all scientists and engineers more creative than all clever (non-creative?) machinists and food-service workers? Surely not.
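The Springfield arithmetic can be reproduced in a few lines. The sketch assumes the rounded figures quoted above (42,000 Creatives at 38.5% of the work force), so the implied work force is itself only approximate.

```python
creatives = 42_000        # approximate 2001 count quoted in the text
creative_share = 0.385    # 38.5% of the work force

# Total work force implied by the two quoted figures (about 109,000).
work_force = creatives / creative_share

# Reclassify 1,500 existing workers as Creatives; the total stays the same.
new_share = (creatives + 1_500) / work_force

print(f"Implied work force: {work_force:,.0f}")
print(f"Creative share after the shift: {new_share:.1%}")
```

A shift of 1,500 workers is under 1.4% of the implied work force, yet it moves the share from 38.5% to about 39.9%.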

City                            1999 Index  1999 Rank  2001 Index  2001 Rank
San Francisco                   1057        1          0.956       2
Austin                          1028        2          0.963       1
San Diego                       1015        3 tie      0.865       19
Boston                          1015        3 tie      0.934       5
Seattle                         1008        5          0.955       3
Raleigh - Durham - Chapel Hill  996         6          0.932       6
Houston                         980         7          0.772       37
Albuquerque                     965         8          0.897       11 tie
Washington - Baltimore          964         9          0.897       11 tie
New York                        962         10         0.848       20
Burlington, VT                  809         37         0.942       4
Portland, OR                    929         18         0.926       7
Madison, WI                     925         20         0.918       8
Boise City, ID                  854         30         0.914       9
Minneapolis - St Paul           960         11         0.900       10

Lawton, OK                      107         264        0.302       216
Jacksonville, NC                105         265        0.158       265
Owensboro, KY                   91          266        0.315       211
Cumberland, MD                  83          267        0.287       222
Enid, OK                        73          268        0.914       257
Youngstown, OH                  253         239        0.130       272
Lima, OH                        222         250        0.128       273
Sumter, SC                      294         226        0.116       274
Joplin, MO                      183         254        0.095       275
Gadsden, AL                     188         253        0.058       276
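To see how far the ranks move between the two models, the top part of the table can be tabulated directly. This is a sketch over the 15 top-listed cities only; tied ranks are entered as the shared rank number, and the variable names are illustrative.

```python
# (city, 1999 rank, 2001 rank) for the 15 cities in the top part of the table.
top_cities = [
    ("San Francisco", 1, 2), ("Austin", 2, 1), ("San Diego", 3, 19),
    ("Boston", 3, 5), ("Seattle", 5, 3), ("Raleigh-Durham-Chapel Hill", 6, 6),
    ("Houston", 7, 37), ("Albuquerque", 8, 11), ("Washington-Baltimore", 9, 11),
    ("New York", 10, 20), ("Burlington, VT", 37, 4), ("Portland, OR", 18, 7),
    ("Madison, WI", 20, 8), ("Boise City, ID", 30, 9),
    ("Minneapolis-St Paul", 11, 10),
]

# Positive shift = the city fell in the 2001 ranking; negative = it rose.
shifts = {city: r2001 - r1999 for city, r1999, r2001 in top_cities}

# Cities that made the Top Ten under both models.
top_both = [city for city, r1999, r2001 in top_cities
            if r1999 <= 10 and r2001 <= 10]

biggest = sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
for city, shift in biggest:
    print(f"{city:<30}{shift:+d}")
print("In both Top Tens:", len(top_both))
```

Only five cities appear in both Top Tens, which is why 15 cities are needed to cover the two lists.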

Clinical Trials: Huge Challenges, Long-Term Benefits for All

Cancer therapy is often a two-stage approach, with an initial treatment given with the hope of inducing remission and a second treatment given "to prolong the period before relapse and disease progression," with the second often given only to patients who show remission after the initial treatment.
Protocol 8923 was a double-blind, placebo controlled two-stage trial run by the Cancer and Leukemia Group B to examine the effects of infusions of granulocyte-macrophage colony-stimulating factor (GM-CSF) after initial chemotherapy in 388 elderly patients with acute myelogenous leukemia (AML). Standard chemotherapy for AML has a myelosuppressive effect, placing patients at increased risk of death due to infection or bleeding-related complications. … Later, patients meeting the complete remission criteria and consenting to further participation in the study were offered a second randomization to one of two intensification treatments.

79 of the 193 GM-CSF patients (41%) and 90 of 195 placebo patients (a larger 46%!) achieved remission and agreed to one of the two intensification treatments. Of these, 37 GM-CSF and 45 placebo patients received intensification therapy I, with the others (42 and 45) receiving therapy II. Among the resulting 2 x 2 = 4 treatment groups little evidence was found of differences in average survival times, and though not statistically significant, the results suggest that GM-CSF infusion leads to decreased survival time. All 388 patients have been tracked for at least 6.9 years; with many ailing at the outset, 356 of them have died. Will these treatments be studied further? The authors don't say. Many factors will bear on the decision.
--- Wahed and Tsiatis, Biometrics 60: 124-133 (March 2004)
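The first-stage counts above invite a quick check of whether 41% versus 46% is a large gap for groups of this size. The sketch below applies a standard pooled two-proportion z-test to the quoted counts. It is not the authors' analysis, which concerns survival times, and the function name is illustrative.

```python
from math import erf, sqrt

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (p1, p2, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return p1, p2, z, 2 * (1 - cdf)

# Counts quoted in the text: 79 of 193 (GM-CSF) and 90 of 195 (placebo).
p_gm, p_placebo, z, p_value = two_proportion_z(79, 193, 90, 195)

print(f"GM-CSF {p_gm:.1%}, placebo {p_placebo:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

With these counts z is close to -1, so a gap of this size could easily arise by chance.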
Consider two papers published side-by-side in the New England Journal of Medicine in 1985, dealing with the incidence of cardiovascular diseases in women taking or not taking post-menopausal hormones.
"One study found that the incidence in women taking hormones was about twice as high as in the control group; the other found that it was only half as high. The estimate of relative risk in each paper had rather narrow confidence limits that excluded the no-effect point. Both papers seemed to be technically sound. Both had been reviewed and approved for publication, and they even had one reviewer in common. Neither study assigned treatments to patients by randomization, but both included substantial and careful consideration of differences in known risk factors between subjects and controls."   
--- Bailar and Mosteller, ed., Medical Uses of Statistics, 1992, p. 28.
Either study by itself would have been convincing. Why did they disagree that much? We do not know, beyond the fact that study-to-study variation often turns out to be far greater than expected at the outset.

These examples illustrate several challenges in performing clinical trials.
  1. "Double-blind" means that neither doctor nor patient knows if the test drug or the placebo is being taken, with "doctor blind" variously referring to oncologist or pathologist or both. The hope is that blinding eliminates bias: the lack of hope for placebo patients, the desire to make the drug look effective or ineffective, in general individual viewpoints intruding on the decision process.
  2. If test and placebo patients differ before treatment - age, blood sugar, disease stage, say - then these covariables may influence the results. Pairs of patients can be matched on probable covariables, with one patient assigned randomly to the treatment and the other to the placebo.
  3. Perhaps the greatest ethical challenge, given increasing though not quite significant evidence that the treatment is indeed effective, is to create procedures, while the clinical trial continues, for making the treatment available immediately to the placebo patients and others in dire need. Alternatively, given strong evidence that the treatment is harmful or ineffective, we may need to consider stopping a trial early, also for ethical reasons. Major breakthroughs in cancer therapy are relatively rare, and both the disease and possible adverse events can be severe.
Over 1932-72, 399 poor black sharecroppers in Alabama were denied treatment for syphilis and deceived by physicians of the U.S. Public Health Service. As part of the Tuskegee Syphilis Study these men were told that they were being treated for "bad blood." In fact, government officials went to extreme lengths to insure that they received no therapy from any source. As reported by the New York Times on 26 July 1972, the Study was called "the longest nontherapeutic experiment on human beings in medical history." It continues to cast a long shadow over the relationship between African Americans and the biomedical professions; it is argued that the Study is a leading cause for their low participation in clinical trials, organ donation efforts, and routine preventive care. President Clinton recalled the huge injustice done to the study participants and formally apologized to survivors in 1997.   
--- Abstract of the Syphilis Study Legacy Committee, Final Report of May 20, 1996
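The matched-pair assignment described in item 2 of the list above can be sketched briefly. The patient IDs and the function name are hypothetical, and the sketch assumes the pairs have already been matched on the likely covariables; a real trial would use a documented, audited randomization procedure.

```python
import random

def randomize_pairs(pairs, seed=None):
    """For each matched pair, assign one member to treatment at random
    and the other to placebo, with equal probability."""
    rng = random.Random(seed)
    assignments = []
    for first, second in pairs:
        if rng.random() < 0.5:
            treated, control = first, second
        else:
            treated, control = second, first
        assignments.append({"treatment": treated, "placebo": control})
    return assignments

# Hypothetical patient IDs, already matched on age, disease stage, etc.
pairs = [("P01", "P02"), ("P03", "P04"), ("P05", "P06")]

for row in randomize_pairs(pairs, seed=2004):
    print(row)
```

Fixing the seed makes the allocation reproducible for auditing. Because each pair contributes one patient to each arm, the matched covariables are balanced by design.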

Hot Hands: Fact or Fiction?

Various analyses over the years find no evidence of the so-called Hot Hand by baseball hitters, basketball players, or most anyone else on whom performance statistics can be logged. A hitter with a batting average of around .333, for example, can be expected to kick off a string of five consecutive hits or more once every 243 at-bats (since (1/3)^5 = 1/243) on the average, or roughly twice a season. Hence, hitting streaks that occur that often do not convey evidence of a streaky or hot bat; they are simply what ought to be expected from someone with a .333 batting average. Similar "no evidence" findings have been published for basketball players.
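The streak arithmetic can be checked exactly and by simulation. The sketch assumes independent at-bats with a constant hit probability of 1/3 and a season of 550 at-bats (an assumed figure, not from the text). It counts every position where five straight hits begin, so a six-hit streak counts twice.

```python
import random
from fractions import Fraction

def window_probability(avg: Fraction, k: int) -> Fraction:
    """Chance that k given consecutive at-bats are all hits (independence assumed)."""
    return avg ** k

def count_windows(outcomes, k: int) -> int:
    """Number of positions at which k consecutive hits begin."""
    return sum(all(outcomes[i:i + k]) for i in range(len(outcomes) - k + 1))

p = window_probability(Fraction(1, 3), 5)    # exactly 1/243
AT_BATS = 550                                # assumed season length
expected = float((AT_BATS - 4) * p)          # expected windows per season

rng = random.Random(0)                       # fixed seed for repeatability
SEASONS = 2000
total = 0
for _ in range(SEASONS):
    season = [rng.random() < 1 / 3 for _ in range(AT_BATS)]
    total += count_windows(season, 5)
simulated = total / SEASONS

print(f"P(five straight hits) = {p}")
print(f"Expected per season: {expected:.2f}, simulated: {simulated:.2f}")
```

Both the exact and the simulated figures land a little above two per season, matching the "roughly twice a season" estimate for a purely random .333 hitter.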

The article Bowlers' Hot Hands by Dorsey-Palmateer and Smith in the February American Statistician presents strong evidence, however, from Professional Bowlers Association (PBA) tournaments in 2002-03. From a specified collection of match-play rounds using within-game bowling only, 43 different bowlers had at least 10 runs of four consecutive strikes and at least 10 runs of four consecutive non-strikes, and clearly many more PBA bowlers had at least 10 runs of three, two and one consecutive strikes and non-strikes. The data are as follows:

Number of consecutive strikes/non-strikes    1      2      3      4
Number of bowlers                            134    111    81     43
Strike proportion immediately after strikes  .571   .582   .593   .612
Strike proportion after non-strikes          .560   .546   .533   .492
Difference                                   .011   .036   .060   .120

Clearly, the longer the run of strikes the higher the ensuing strike proportion - .571 to .612 - and the longer the non-strike run the lower the ensuing strike proportion. After four consecutive strikes the strike proportion is 24% higher than after four consecutive non-strikes - .612 vs. .492. Using various statistical tests the authors confirm the high statistical significance of these findings.
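The differences and the 24% figure follow directly from the tabulated proportions. This sketch simply recomputes them; the list names are illustrative.

```python
run_lengths = [1, 2, 3, 4]
after_strikes = [0.571, 0.582, 0.593, 0.612]       # strike proportion after k strikes
after_non_strikes = [0.560, 0.546, 0.533, 0.492]   # strike proportion after k non-strikes

# Absolute gap and relative increase for each run length.
diffs = [hot - cold for hot, cold in zip(after_strikes, after_non_strikes)]
relative = [hot / cold - 1 for hot, cold in zip(after_strikes, after_non_strikes)]

for k, d, r in zip(run_lengths, diffs, relative):
    print(f"run of {k}: difference {d:.3f}, relative increase {r:.1%}")
```

Note that the 24% is a relative increase (.612 divided by .492); the absolute gap after runs of four is 12 percentage points.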

Do you enjoy a hot hand in some aspect of your life, say, missing an irritating red light or waiting in recurring lines less than two minutes? The authors note that bowling data are almost completely clean of complicating effects: exactly the same release motion repeatedly on alternating lanes, with no defensive hand altering your shots, no breaks of 7-8 minutes between shots versus only a few seconds, no differences in the fastball or slider. Your life is full, no doubt, of complicating effects! Many of us dare not trust our own data gathering, say running a just-now-red light to keep the green streak going. Perhaps you need an appointment with your friendly statistician.