
National Center for Ecological Analysis and Synthesis

1. Introduction

A grocery store in Santa Barbara has its own line of milk. The label on the milk container provides some 'nutrition facts', one of which is:

'This milk does not contain the growth hormone rBST'.

The first time I read the label, I wondered what rBST could be. Should I check every bottle of milk I buy? The cap on the milk bottle says something different:

'Our Farmers Guarantee
MILK from cows not treated with rbst

No significant difference has been shown between milk derived from cows treated with artificial hormones and those not treated with artificial hormones'.

rBST (or rBGH, as it is sometimes labeled) is a genetically engineered growth hormone that can increase milk production from cows. My search on the web revealed that the US FDA concluded in 1985 that milk from rBST-supplemented cows is safe. In November, 1993, the FDA approved an rBST product to be used for milk production in dairy cows. It also appears that trade in products related to rBST has been a contentious issue between the United States and Europe.

Interestingly, I found a reasonably compelling argument that contradicts some of the small print on the milk cap. Samuel S. Epstein, M.D., Professor of Environmental Medicine at the University of Illinois School of Public Health, issued a press release on January 18, 2000, in Vienna saying:

'GM milk, produced by injecting cows with the hormone rBST, is qualitatively and quantitatively different from natural milk. These differences include: contamination of milk by the GM hormone rBST; contamination by pus and antibiotics resulting from the high incidence of mastitis in rBST injected cows; contamination with illegal antibiotics and drugs used to treat mastitis and other rBST induced disease; increased concentration of the thyroid hormone enzyme thyroxin-5'-monodeiodinase; increased concentration of long chain and decreased concentration of short chain fatty acids; reduction in casein levels; and major excess levels of the naturally-occurring Insulin-like Growth Factor, IGF-1, including its highly potent variant.'

Epstein is, according to his web page, an internationally recognized authority on the mechanisms of carcinogenesis, the causes and prevention of cancer, and the toxic and carcinogenic effects of environmental pollutants in air, water, soil and the workplace, and of ingredients and contaminants in consumer products: food, cosmetics, toiletries and household products. He has published some 260 peer-reviewed scientific articles, and has authored or co-authored ten books. But of course, his is just one opinion.

My first reading of the milk label suggested to me that someone in authority (it doesn't say who) thinks rBST milk is okay. There are a few things the label doesn't say. It doesn't say there are no differences between the cows, just between their milk. And it doesn't even say there are no differences, but that there are no 'significant' differences. And it doesn't even say that there are no 'significant' differences, but that no significant differences 'have been shown'. All the authority is willing to admit is that they haven't seen a 'significant' difference.

I have absolutely no idea whether rBST milk is harmful to people or beasts. The day I write this is the first time I've heard about it. And I have nothing against genetically modified organisms, in principle. But a shred of common sense would lead anyone (other than a scientist) to ask, well, how hard have you looked for a difference? And what do you mean by 'significant'?

In practice, science doesn't have answers to these questions. In theory it does, but very few working scientists have the faintest idea of how to come up with the answers.

This failing may be best characterized as a disease. It is present to varying extents in the scientific community, is spread by textbooks and editorial conventions, has readily identifiable symptoms, and results in costly and debilitating outcomes.

 

2. Scientists are human

In the 1970s, Kahneman and Tversky began to look at the reasons why people decide to do things. They found that behind the illusion of rational thought lay a psychological pathology of unexpected proportions. People, it turns out, are barely rational, even in life-and-death circumstances. The observations of Kahneman and Tversky led a cohort of cognitive psychologists to explore the vagaries of decision-making over the next few decades (Zeckhauser and Viscusi 1990, Fischhoff 1995, Morgan et al. 1996). Their work has produced some wonderful generalizations. Some are funny. Others are plain depressing.

The mistakes that people make can be summarized under headings that form a kind of pathology that is identifiable and predictable (and perhaps even treatable). Not everyone reacts like this, and not in all circumstances, but most people do, most of the time. Three of the primary symptoms are:

  • Insensitivity to sample size: people ascribe inferences to their samples that are only possible from much larger samples (a small simulation after this list makes the point concrete).
  • Judgment bias: people tend to be optimistic about their ability to predict, and to make 'predictions' that are, in fact, the product of hindsight.
  • Anchoring: people tend to stick close to the number they first thought of, or that someone else said, for fear of seeming unconvincing or capricious.
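
To make the first symptom concrete, here is a minimal simulation of my own (the numbers are illustrative, not taken from Kahneman and Tversky): when the true proportion is exactly 0.5, small samples throw up apparently 'extreme' outcomes far more often than large ones, yet people tend to draw equally confident conclusions from both.

    # Illustrative simulation: how often does a fair coin look 'biased'
    # (more than 60% heads) in small versus large samples?
    import random

    random.seed(1)

    def extreme_fraction(sample_size, trials=10000, threshold=0.6):
        """Fraction of trials in which a fair coin gives more than
        `threshold` heads in a sample of `sample_size` flips."""
        count = 0
        for _ in range(trials):
            heads = sum(random.random() < 0.5 for _ in range(sample_size))
            if heads / sample_size > threshold:
                count += 1
        return count / trials

    print(extreme_fraction(15))    # roughly 0.15: extreme outcomes are common
    print(extreme_fraction(150))   # well under 0.01: they become rare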

Kahneman, Tversky and their colleagues found that a great deal of the apparent arbitrariness of decisions that people make could be explained by the way the circumstances surrounding the decision were framed. Kammen and Hassenzahl (1999) describe a beautiful contradiction that illustrates the importance of context. Two artificial substances were in our food in the 1980s: saccharin (used to sweeten products such as diet soda) and Alar (a pesticide used on apple and pear crops).

During the 1970s, the US FDA banned saccharin because it was a potential human carcinogen. After considerable public outcry, Congress passed specific legislation to make saccharin legal again.

In contrast, in the 1980s, the EPA concluded that the amount of Alar reaching consumers was too small to warrant banning it. A public interest group then released a report arguing that children are particularly at risk because they weigh less, consume a lot of apples and apple juice, and are more susceptible to toxins than adults. The public outcry that followed the release of the report convinced Uniroyal, the maker of Alar, to withdraw it from the market.

Take the checklist of evidence for saccharin first:

  • In high doses, it causes cancer in animals such as rats.
  • The US FDA concluded that studies indicated cancer-forming potential in humans, giving a 'very small' (10⁻⁵ to 10⁻⁷) additional lifetime risk of dying from cancer at 'normal' (average) consumption levels.
  • There is an additional 4.6 × 10⁻⁴ risk for someone with high exposure, such as someone who drinks a can of diet soda each day.

Next, look at the checklist of evidence for Alar:

  • In high doses, it causes cancer in animals such as rats.
  • The US EPA concluded that studies indicated cancer-forming potential in humans, giving a 'very small' (10⁻⁵ to 10⁻⁷) additional lifetime risk of dying from cancer at 'normal' (average) consumption levels.
  • There is an additional 3 × 10⁻⁴ risk for someone with high exposure, such as someone (a child, for example) who consumes a lot of apple products.
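
As a quick arithmetic check that the two sets of numbers really are comparable, here is a short sketch; the risk values are the ones quoted in the checklists above, but the per-million framing is my own illustration.

    # Compare the quoted excess lifetime risks for highly exposed people.
    saccharin_high_exposure = 4.6e-4   # heavy diet-soda drinker
    alar_high_exposure = 3.0e-4        # heavy consumer of apple products

    ratio = saccharin_high_exposure / alar_high_exposure
    print(f"saccharin risk / Alar risk = {ratio:.1f}")   # about 1.5

    # Expressed as expected excess cancers per million highly exposed people:
    for name, risk in (("saccharin", saccharin_high_exposure),
                       ("Alar", alar_high_exposure)):
        print(f"{name}: about {risk * 1e6:.0f} per million")   # 460 and 300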

The two chemicals have nearly the same potential to cause cancer, and the general population is exposed to them at similar levels. Yet the extrapolations from high doses to low doses for saccharin were ridiculed, whereas the same extrapolations for Alar were accepted. One chemical was banned and people demanded its return. The other was deemed safe and people demanded that it be withdrawn.

Why?

Cognitive psychologists and risk analysts like Adams (1995) and Morgan et al. (1996) take delight in explaining these apparent contradictions. People were used to saccharin and could take on the risk knowingly (you don't have to buy diet soda). In contrast, people could only avoid Alar by avoiding apple products altogether. In addition, saccharin has benefits, such as reducing problems for diabetics and reducing the risk of heart disease in overweight people. Alar didn't improve apples, except perhaps by making them cheaper. Some growers claimed their apples were Alar-free, and those claims turned out to be false, so such claims were not trusted afterwards. Finally, the most susceptible group was children, for whom the risks were entirely involuntary (for more on this topic, see Kammen and Hassenzahl 1999).

Scientists, like other people, are very poor judges of risky and uncertain circumstances. Yet scientists feel that they are immune to the failings that plague ordinary humans. They've been trained to believe they are objective, usually without much training in how to achieve objectivity.

Symptoms of the disease

The pathology of the scientific method surfaces in scientists in some peculiar ways.

Technical myopia

Take the case of saccharin and Alar. The decision by the public to accept one and not the other is contradictory only if you are myopic enough to look just at the technical risks. Very little else was the same: the context, the framing, the degree of control, the prospects of benefits, and the groups affected were completely different.

Most scientists suffer from technical myopia. They are surprised and bemused when people don't do what they tell them to do. Having examined the technical risks in great detail, they say: if you accept A, you must accept B. If you don't, they call you irrational. And their solution to problems like these is to argue that if people just understood the technical details, they would then be rational.

This train of thought turns ugly when scientists take it upon themselves to provide people with not just the information, but also the decisions. They rationalize that people should be rational (like them), and that, to save time and trouble, they'll do the thinking and decide what's safe and what isn't. They create panels of experts (composed largely of people like themselves), and the priesthood of scientists then decides the questions, collects the data, interprets them, and makes the decisions. The rest of us don't have to worry about a thing. Great, huh? In my view, what happens is that we have imposed on us the values of a bunch of (mostly) middle-aged, middle-class men and women who stand to gain a great deal more than the rest of us from the acceptance or avoidance of the risk.

Professional paranoia

Scientists, like most people, don't like to be criticized. This attitude persists despite the view from the philosophy of science that the person who criticizes you most is helping you most (in the spirit of Karl Popper, a strong test is better than a weak test). Scientists take particular exception to any general criticism of their discipline. They become thin-skinned and defensive. They take refuge behind the ramparts of rational thought, arguing that if those who criticize them only understood the technical detail, they would agree. Scientists often fail to see that the benefits (perceived or real) to the broader population of some branch of scientific research or technical progress may not be the same as the benefits that accrue to the scientists involved in that research. The symptom is expressed perhaps most forcefully in arguments between scientists themselves. They often take adversarial positions, ignore collateral data, claim unfounded generalities and deny uncertainty. Success in such debates is as much a product of influence networks among people as it is a product of reason.

Optimism

Scientists are heroically optimistic about their ability to predict. Plous (1993) and Fischhoff (1995) document numerous examples of circumstances in which experts are wildly and unjustifiably confident about their ability to guess parameters, even within their field of technical expertise. And it is very difficult to distinguish between a reliable expert and a crank. Krinitzsky (1993), in a study on the use of expert opinion in assessments of earthquake risks, said experts may be '…fee-hungry knaves, time servers, dodderers in their dotage…Yet, these and all sorts of other characters can pass inspections, especially when their most serious deficiencies are submerged in tepid douches of banality'.

Blind monitoring

The purpose of environmental monitoring systems is to protect the environment, society and the economy. They are supposed to tell us (i) that there is a serious problem when one exists (thus avoiding overconfidence, or 'false negatives') and (ii) that there is not a serious problem when there isn't one (thus avoiding false alarms, or 'false positives'). The first is crucial for detecting serious damage to environmental and social values, the second for ensuring that the economy is not damaged by unnecessary environmental regulations.

Unfortunately, standard procedures implicitly assume that, if no problem is observed, none exists; that is, they ignore the possibility of overconfidence. But detecting important environmental damage against a background of natural variation, measurement error, and poorly understood biological processes is often difficult. Furthermore, standard monitoring procedures do not attempt to determine whether the intensity of monitoring is excessive, sometimes laying an unnecessary burden upon a proponent. Environmental assessment involves answering some key questions: How likely is it that a survey will detect a threatened or invasive species? Will existing monitoring data reliably detect important trends in water quality, given the observational uncertainty and substantial environmental variation? Can the monitoring protocols be made less stringent without running an excessive risk of undetected serious damage to the environment? Yet these questions are simply not addressed directly in most current systems. Blindness to false negatives, together with a psychological propensity for overconfidence, results in a system that is designed to fail.
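
The first of those questions can be made concrete with a very simple calculation. The sketch below assumes, purely for illustration, that each independent survey visit has a fixed probability of detecting a species that is actually present; neither the per-visit probability nor the number of visits comes from the essay.

    # Probability that at least one of n independent survey visits detects
    # a species that is present, for an assumed per-visit detection
    # probability p (illustrative values only).
    def detection_probability(p_per_visit, n_visits):
        return 1.0 - (1.0 - p_per_visit) ** n_visits

    for n in (1, 3, 5, 10):
        print(n, round(detection_probability(0.3, n), 2))
    # 1 visit: 0.30, 3: 0.66, 5: 0.83, 10: 0.97

A monitoring protocol that reports a number like this makes its false-negative rate explicit, instead of leaving it unexamined.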

Denial of linguistic uncertainty

A great deal of uncertainty exists because scientists communicate with words, and language is inexact. This uncertainty is problematic because scientists have no training in how to deal with it, and they do not acknowledge that it exists. Regan et al. (2002) outlined a taxonomy of uncertainty that distinguishes between epistemic uncertainty, in which there is some determinate fact but we are uncertain about its status, and linguistic uncertainty, in which there is no specific fact. Linguistic uncertainty may be decomposed into ambiguity, vagueness (where terms allow borderline cases), indeterminacy (in which theory does not sufficiently define a term), and context dependence. Yet scientific training does little to prepare scientists to deal with these issues. Language-based scientific methods, commonplace in applied ecology, typically assume linguistic uncertainties are trivial or non-existent. The applied arms of ecology, particularly conservation biology, often work in isolation from methods that have arisen and been applied successfully for decades in companion disciplines such as engineering and psychology (e.g., Dubois and Prade 1986, Klir and Wierman 1998).
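
As one small illustration of the kind of tool available in those companion disciplines, the sketch below defines a fuzzy membership function for a vague term; the term and its breakpoints are invented for illustration and are not drawn from Regan et al. (2002) or from the fuzzy-set references cited above.

    # A toy fuzzy membership function for the vague term 'small population'.
    # Instead of forcing a sharp yes/no boundary, borderline cases receive
    # intermediate degrees of membership between 0 and 1.
    def membership_small_population(n_individuals,
                                    clearly_small=50, clearly_not_small=500):
        if n_individuals <= clearly_small:
            return 1.0
        if n_individuals >= clearly_not_small:
            return 0.0
        # Linear interpolation across the borderline region
        return ((clearly_not_small - n_individuals)
                / (clearly_not_small - clearly_small))

    for n in (30, 100, 275, 600):
        print(n, round(membership_small_population(n), 2))
    # 30 -> 1.0, 100 -> 0.89, 275 -> 0.5, 600 -> 0.0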

 

3. Should we ban null-hypothesis testing?

Karl Popper led a revolution in thinking that led to the current acceptance of hypothesis testing in science. There are other philosophies, such as Thomas Kuhn's idea that science moves by adopting paradigms that are overturned periodically, so that it lurches from one world-view to another. But Popper's view dominates current scientific thinking. Students are exhorted to find a hypothesis to test. A thesis that fails to present and test a stark hypothesis is likely to fail. Publishable papers hinge on the result of a test.

A nasty linguistic ambiguity slipped into the picture between about 1930 and 1960, when R. A. Fisher invented much of the mathematical machinery that underlies modern statistical methods. Fisher got into a fight with two other statisticians, Neyman and Pearson, about how to do a statistical test. Unfortunately, the procedure they argued over came to be called a null-hypothesis test (Hunt 1997, Johnson 1999). Science has adopted the machinery of Fisherian statistics over the last five decades and, in doing so, has equated R. A. Fisher's null-hypothesis test with Karl Popper's hypothesis testing.

The curious phenomenon of one-sided logic

Scientists are trained to deal with measurement errors, observational bias, and natural variation. P-values relate exclusively to Type I errors, but most scientists believe intuitively that they say something about Type II errors as well (Johnson 1999). Scientists will defend the 'fact' that there is no difference between samples, when there must be a difference. The only sensible question is: how big an effect is there? Yet scientific conventions and formal training are sufficiently powerful that the most gifted and insightful scientists trip over their own feet when confronted with this problem. Arguments about effects that are really consequences of sample size and statistical power are commonplace. Many papers that use statistical tests misinterpret the lack of a statistically significant effect as evidence that there is no 'real' effect. This one-eyed view of evidence is only moderately damaging to the progress of science, but it becomes especially important in environmental science, where the costs of Type II errors are counted in damage to the environment (Mapstone 1995).
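
A small simulation, written as an illustration rather than drawn from the essay, makes the point about power and sample size: exactly the same real difference is declared 'not significant' most of the time with small samples and is almost always detected with large ones.

    # Two populations differ by half a standard deviation. With small
    # samples, a conventional t-test usually fails to reach p < 0.05,
    # which is routinely misread as 'no effect'.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    def type_ii_error_rate(n_per_group, true_difference=0.5, trials=2000):
        """Fraction of experiments in which a real difference is declared
        'not significant' at the 0.05 level (a Type II error)."""
        misses = 0
        for _ in range(trials):
            a = rng.normal(0.0, 1.0, n_per_group)
            b = rng.normal(true_difference, 1.0, n_per_group)
            if ttest_ind(a, b).pvalue >= 0.05:
                misses += 1
        return misses / trials

    print(type_ii_error_rate(10))    # roughly 0.8: the effect is usually missed
    print(type_ii_error_rate(100))   # roughly 0.06: the effect is usually found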

We have created a system in which human activities are considered to be benign until we find out otherwise. There are many examples of this propensity. Species are considered to be extant until we are reasonably sure they are extinct. The first time I mentioned the existence of No Observed Adverse Effect Levels to some colleagues in a Philosophy of Science department, they laughed out loud. The bias that results from one-sided inference emerges particularly strongly in monitoring programs (see 'Blind monitoring' above).

One of the saddest expressions of the interaction between the psychological pathology and one-sided logic is that new research, and new minds, are stultified by the need to 'test' a hypothesis. And they'd better find something that is falsified, or they'll spend another two or three years looking. The consequence is that people doing Master's and PhD research choose questions that generate guaranteed results, that are risk-free, and for which they already know the answer. Where's the excitement, the thrill of discovery, in that? The best remedy may be to ban null-hypothesis testing altogether.

 

4. The remedies: common sense, caution, and pictures

People are bad at interpreting uncertain information and at deciding what's best in the face of it. A blanket of ambiguous and vague language overlies this disability. In science, a curious, one-sided logic has been invented and added to the mix. Together, these make a cocktail that leads to irrational interpretations of evidence.

The many symptoms of the pathology reflect a common cause. It can be treated, but the first step in treatment is to admit the problem. We can learn from other disciplines that are further down the road than ecology, such as psychology and medicine. Changes in editorial policy and student training may be adopted to alleviate it.

The remedy for overconfidence is to design monitoring and auditing protocols that report the probability that they will detect important changes, if such changes exist, for identified significant aspects of management. A monitoring system should thus demonstrate that it would be reasonably certain of detecting unacceptable impacts (for a defined set of indicators, at an agreed level of reliability).
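
In the same spirit, the sketch below estimates the kind of number such a protocol could report: the probability that a fixed run of annual surveys detects a steady decline against year-to-year noise. All the parameter values are hypothetical.

    # Probability that n_years of annual surveys detect a decline,
    # estimated by simulation (illustrative parameters only).
    import numpy as np
    from scipy.stats import linregress

    rng = np.random.default_rng(42)

    def trend_detection_probability(n_years, annual_decline=0.03,
                                    noise_sd=0.15, trials=2000):
        years = np.arange(n_years)
        detected = 0
        for _ in range(trials):
            true_abundance = (1.0 - annual_decline) ** years
            observed = true_abundance + rng.normal(0.0, noise_sd, n_years)
            fit = linregress(years, observed)
            if fit.pvalue < 0.05 and fit.slope < 0:
                detected += 1
        return detected / trials

    for n in (5, 10, 20):
        # Detection probability rises as the number of survey years grows.
        print(n, trend_detection_probability(n))

A design that fails this kind of check is blind in exactly the sense described above: it cannot tell us whether silence means safety or simply a lack of power.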

To deal with unacknowledged linguistic uncertainty, we need a new suite of methods that lead us to better decisions. Many of these methods take explicit account of non-statistical uncertainties and are structured to deal openly with Type I and Type II errors.

The precautionary principle is an appeal to common sense, emerging from the broader population and meant to be taken to heart by scientists. It is an informal suggestion to be aware of Type II errors and to weigh them against the costs and benefits of Type I errors. One of the best protections against the irrational interpretation of evidence engendered by the system we have inherited is to use figures, rather than bare numbers, to represent data, in the form of scatterplots, histograms and confidence intervals, and to interpret the images rather than the numbers.
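
A minimal sketch of the kind of picture this suggests, using invented data: plot group means with confidence intervals and read the figure, rather than reducing the comparison to a single p-value.

    # Plot two group means with approximate 95% confidence intervals
    # (invented data, for illustration only).
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(7)
    groups = {"control": rng.normal(10.0, 2.0, 12),
              "treated": rng.normal(11.5, 2.0, 12)}

    labels, means, half_widths = [], [], []
    for name, values in groups.items():
        labels.append(name)
        means.append(values.mean())
        # 95% interval half-width, approximated as 1.96 standard errors
        half_widths.append(1.96 * values.std(ddof=1) / np.sqrt(len(values)))

    x = np.arange(len(labels))
    plt.errorbar(x, means, yerr=half_widths, fmt="o", capsize=5)
    plt.xticks(x, labels)
    plt.ylabel("response")
    plt.title("Group means with approximate 95% confidence intervals")
    plt.show()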

It may be possible to treat the symptoms one by one. But the underlying cause lies in the logical tools that we rely on to deal with evidence and to make inferences. They are the wrong tools for the tasks confronting applied ecology. We need new ones, and lots of them.

Acknowledgments

All these opinions are my own, and the people I thank do not necessarily agree with all (or any) of the dialogue. Nevertheless, for their generosity, ideas and comments, I thank Neil Thomason, Fiona Fidler, Rob Buttrose, Geoff Cumming, Jim Reichman, Sandy Andelman and Helen Regan.

Citation format
Burgman, Mark. 2002. Remedies for the Scientific Disease. EcoEssay Series Number 4. National Center for Ecological Analysis and Synthesis. Santa Barbara, CA.

References

Adams, J. 1995. Risk. UCL Press, London.

Dubois, D. and Prade, H. 1986. A set-theoretic view on belief functions: logical operations and approximations by fuzzy sets. International Journal of General Systems 12, 193-226.

Epstein, S. S. 2001. Got (Genetically Engineered) Milk! The Monsanto rBGH/BST Milk Wars Handbook. Seven Stories Press. www.sevenstories.com/catalog/index.cfm.

Fischhoff, B. 1995. Risk perception and communication unplugged: twenty years of progress. Risk Analysis 15, 137-145.

Hunt, M. 1997. How science takes stock: the story of meta-analysis. Russell Sage Foundation, New York.

Johnson, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63, 763-772.

Kammen, D. M. and Hassenzahl, D. M. 1999. Should we risk it? Exploring environmental, health, and technological problem solving. Princeton University Press, Princeton.

Klir, G. and Wierman, M. J. 1998. Uncertainty-based information: elements of generalized information theory. Physica-Verlag, Heidelberg.

Krinitzsky, E. L. 1993. Earthquake probability in engineering - Part 1: the use and misuse of expert opinion. Engineering Geology 33, 257-288.

Mapstone, B. D. 1995. Scaleable decision rules for environmental impact studies: Effect size, Type I and Type II Errors. Ecological Applications 5, 401-410.

Morgan, M. G., Fischhoff, B., Lave, L. and Fischbeck, P. 1996. A proposal for ranking risk within Federal agencies. In: Davies, J. C. (ed.) Comparing environmental risks. pp. 111-147. Resources for the Future, Washington, DC.

Popper, K. R. 1959. The logic of scientific discovery. Basic Books, New York.

Plous, S. 1993. The psychology of judgment and decision making. McGraw-Hill, New York.

Regan, H. M., Colyvan, M. and Burgman, M. A. 2002. A taxonomy and treatment of uncertainty for ecology and conservation biology. Ecological Applications 12, 618-628.

Vose, D. 1996. Quantitative risk analysis: a guide to Monte Carlo simulation modelling. Wiley, Chichester.

Zeckhauser, R. J. and Viscusi, W. K. 1990. Risk within reason. Science 248, 559-564.