Future, Present, & Past:

Speculative~~ Giving itself latitude and leisure to take any premise or inquiry to its furthest associative conclusion.
Critical~~ Ready to apply, to itself and its object, the canons of reason, evidence, style, and ethics, up to their limits.
Traditional~~ At home and at large in the ecosystem of practice and memory that radically nourishes the whole person.

Οὐδεὶς ἄμουσος εἰσίτω

Thursday, January 20, 2011

The decline of science / the science of decline

Bruno Latour is sometimes derided, sometimes praised, for having made science itself an object of study, and for pointing out the inextricably human politics that go into deciding what gets science's imprimatur. Two recent articles have me thinking about some aspects of these politics.

Depending on whom you ask, the publication of a peer-reviewed parapsychological study is either a scandal or a refreshing example of free inquiry. The Journal of Personality and Social Psychology, an academic journal with a good reputation (or it was), has printed a paper by Daryl Bem of Cornell University, a name and an institution with some respectability. The study reports two different experiments that, Bem claims, give reason to think that events in the future could impact the human mind. One experiment showed results 3.1% better than chance when participants were asked to predict on which of two screens a picture with erotic content would appear. (The non-erotic control pictures produced results that stayed within the margin of chance alone.) The other experiment asked volunteers to look at a series of words, then gave them a surprise quiz asking them to type in the words they recalled. After this, the computer randomly selected 24 words from the series and asked the subjects to type them again. The words that subjects re-typed (after the recall test) tended to be the words they had done better at recalling.
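That 3.1% edge is small enough to be worth a back-of-envelope check. Here is a minimal sketch, in Python, of the kind of binomial test critics have in mind; the sample size below is a hypothetical stand-in for illustration, not Bem's actual design:

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """Exact one-sided tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 1,000 binary guesses with a 53.1% hit rate
# (531 hits) against the 50% chance baseline.
n, hits = 1000, 531
p_value = binom_sf(hits, n)
print(f"one-sided p-value for {hits}/{n}: {p_value:.4f}")
```

At this made-up sample size the edge clears the conventional 0.05 threshold, but only barely, which is one reason so much of the criticism of Bem has centered on the statistics.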

Now, eerily, even before I read about the critical reaction to Bem's paper, I somehow just knew (you know?) that the Committee for Skeptical Inquiry would have some thoughts on this. I also knew that the CSI would refer to the experiments of J.B. Rhine in the 1930's. It's eerie.

But perhaps my thoughts on Rhine were triggered by a recent New Yorker article, by Jonah Lehrer, on scientific inexactitude. This article is about the "decline effect," the tendency of a number of well-established experimental results across scientific disciplines to trail off with repeated investigation. That is: very well-designed experiments which seem to show robust correlations tend, on repetition, to yield less and less impressive conclusions. Rather than becoming more and more secure,
all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology.
So scientists attempting to replicate results are coming up short; so what, you might say--this happens all the time in science. Failure to replicate is probably the norm, which keeps one-off flukes or unintentionally engineered results from getting widely accepted:
The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
But this phenomenon is different: this is the failure, on replication, of already well-established research, research that had already passed the hurdles of scientific respectability, including peer review and, well, replication.

Among the many complaints that Daryl Bem's results have occasioned is that there must be some problem with the design of the experiment. One comment on The Last Psychiatrist's post on this subject puts it succinctly:
There is a common problem that peer review is specifically designed to avoid. There are often results that seem strange or unexplainable to newcomers to a field that are actually well-known problems of experimental design (i.e. you're not testing what you think you're testing). This is where the experts come in; they have seen these errors before and can point them out before they propagate.
The problem is that Bem's results are not those of a wet-behind-the-ears grad student. One can still say that he (or the Journal of Personality and Social Psychology) should have asked different experts (The Last Psychiatrist thinks it should've been physicists; a lot of commenters have suggested statisticians). See too NPR's Robert Krulwich's musings on this.

The scientific process in a nutshell: you notice a phenomenon that you want to account for. You frame a hypothesis. You construct an artificial circumstance in which the only variable is the mechanism of your hypothesis. If your phenomenon is unchanged when your mechanism changes, and you have rigorously screened out all other possible changes, your hypothesis is disproven. If, on the other hand, your phenomenon changes as you alter your chosen mechanism and nothing else, you may consider your hypothesis validated.

This little synopsis will be modified and stretched and clipped and spun by different philosophers of science, but in essence this is the scientific method, a wonder of parsimony, elegance, and indifference.

Of course there is a snag: the little word "only". How possible is it to alter only one circumstance? This is at least part of what the commenter meant by "you're not testing what you think you're testing." And now it seems to turn out that all sorts of random effects might squeeze into an experiment, be it never so hermetically sealed. This is at least one possible reading of the experiment, mentioned in the New Yorker article, which reproduced as minutely as possible the circumstances of a test of the effects of cocaine on mice. Same cocaine. Same dose. Same breed and age of mice. Same time in captivity, same dealer. Same cages. Same bedding material. Same etc., etc., etc. The only difference was location: in Portland and Albany the coked-up mice moved six or seven hundred centimeters more than usual; in Edmonton, Alberta, they moved over five thousand centimeters more. And on other tests, it was other labs' mice whose numbers landed in outlier territory. In other words, it might just be noise, but noise you can't screen out.
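A toy simulation makes it easy to see how this could happen. Suppose each lab, despite an identical per-mouse protocol, carries one hidden site-level factor (handler, acoustics, water, who knows) drawn once per lab. All numbers below are invented for illustration:

```python
import random

random.seed(1)

def run_lab(n_mice=20):
    """Run the 'same' protocol at one site. The site effect is drawn once
    per lab, not per mouse, so it cannot be averaged away within a lab."""
    site_effect = random.gauss(0, 1500)   # hidden, in cm: the unscreenable noise
    true_effect = 700                     # the 'real' extra movement, in cm
    return [true_effect + site_effect + random.gauss(0, 300)
            for _ in range(n_mice)]

for city in ["Portland", "Albany", "Edmonton"]:
    mice = run_lab()
    mean = sum(mice) / len(mice)
    print(f"{city}: mean extra movement = {mean:7.0f} cm")
```

Within any one lab the numbers look tight and reproducible; across labs they diverge wildly, and no amount of standardizing the per-mouse protocol touches the problem.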

Or then again, maybe reality just wants to play tricks. Maybe it adjusts to your findings, in a kind of reversal of Rupert Sheldrake's morphic fields, so that rather than spreading, a breakthrough insight gets canceled out. Or maybe, as per Quentin Meillassoux's hyperchaos, whereby the laws of nature could change at any moment, the laws of nature are in fact changing at every moment. Or maybe what you can't screen out is fundamentally relevant: not noise at all, but either something you can't correct for, or something you'd never think to correct for. Maybe, as Heraclitus said, "Nature loves to hide."

The "decline effect" has been getting attention from Jonathan Schooler, who was frustrated by the difficulty he was having in replicating his own results, results which had made him famous in the world of cognitive psychology in 1990: what he called "verbal overshadowing," the notion that having described a face in words actually makes it harder, not easier, to recognize visually. Schooler's initial results were striking: subjects who had watched a video of a bank robbery and then written a description of the robber later identified the robber from photos with an accuracy of about 38%, as opposed to 64% accuracy among those who had not written such a description. This is a significant result, and (assuming the experiment was well-designed in the first place) it ought to be replicable. But Schooler himself found his results dwindling; the effect would be there, but less starkly. It dwindled by 30%, then by another 30%.

A profoundly troubled Schooler looked into the work of a predecessor: the aforementioned J.B. Rhine, whose investigations into E.S.P. in the 1930's found one test subject who was astoundingly good at guessing (or "seeing," depending on what you believe) the faces of Zener cards. At least, he was good for a while; whereas most of Rhine's subjects guessed rightly at about the 20% chance rate (the deck uses five symbols), for a while Rhine's star subject, Adam Linzmayer, would sometimes guess at a shocking near-50% rate. In fact, initially, Linzmayer guessed two different nine-card runs at 100%, and for a long while his average remained in the upper 30s. Critics like to pooh-pooh Rhine's results with the claim that his experiments were sloppy (and some were), but what is really interesting is that Linzmayer's high results did exactly what other results do, results no one has dreamed of calling fraudulent: they declined over time. Eventually, Rhine postulated that Linzmayer was bored or distracted; in any case, something was interfering.
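There is also a mundane statistical mechanism that produces exactly this trajectory without fraud, boredom, or psi: selection plus regression to the mean. Screen enough pure guessers and someone will post a spectacular early run that later retests cannot sustain. A toy sketch (illustrative numbers only; this is one candidate explanation, not a verdict on Rhine's data):

```python
import random

random.seed(0)
N_SYMBOLS = 5   # Zener deck: five symbols, so pure chance is a 20% hit rate
RUN = 25        # one pass through a 25-card deck

def score_run():
    """Hit rate for one run by a subject with no ability beyond chance."""
    return sum(random.random() < 1 / N_SYMBOLS for _ in range(RUN)) / RUN

# Screen 500 pure guessers on a single run and keep the 'star' ...
star_first = max(score_run() for _ in range(500))
# ... then retest the star at length: the streak melts back toward chance.
star_retest = sum(score_run() for _ in range(100)) / 100

print(f"star's first run: {star_first:.0%}, retest average: {star_retest:.0%}")
```

This does not explain everything Lehrer describes (his puzzle is the decline of findings that had already survived replication), but it shows how a genuine-looking outlier can be manufactured by nothing more than a large enough pool of candidates.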

In 2004, Schooler designed an experiment "testing" for precognition, but his real quarry was the decline effect. His experiment is structurally very like Bem's. Schooler asked test subjects to identify visual images flashed momentarily before them. The images were shown so quickly that they usually did not register consciously; subjects could not often give a description, but sometimes they could. Half of the images were then randomly chosen to be shown again. The question Schooler asked was: would the images that chanced to be seen twice be more likely to have been consciously seen the first time around? Could later exposure have retroactively "influenced" the initial successes?

The difference between Schooler's and Bem's experiments is not in the design, but in the aim. Schooler "knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect."
“At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”
Bem, according to the New York Times, has received hundreds of requests for the materials to replicate his study. Since the materials included a good stack of erotic pictures, we must exercise some charity in surmising the motives of these researchers. Now here is my prediction: Bem's results will decline just as Schooler's did, and this will tend to validate critics' dismissal of his initial study; they will not ask themselves about the initial findings, just as none of them asked about Schooler's. And we will still not know why the results flatline.
If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe?


  1. skholiast:

    This is a beautifully composed post, as are all of your posts I have read. Well done.

    Your invocation of Meillassoux makes me wonder, and, I'm just stumbling around in the dark here, bumping into statues, but I wonder if this decline effect might not have something vaguely to do with Cantor. At its heart, might the very problem with the scientific method itself be the inherent impossibility of accurately measuring the probable and improbable at all? This is the most disturbing aspect of Meillassoux's use of Cantor, for me, but it's also a kind of giddy thought to contemplate, akin to Kandinsky's experience after the decomposition of the atom. It would be a strange world where both the sciences and philosophy would relinquish, not reality, but predictability and probability of that reality in any really accurate way -- a scary, but thrilling, thought, all at once. Though Meillassoux's thought applies to the universe as a whole, I still think it might work, as we may not have any accurate idea just what *this* universe is capable of, itself.

    In any case, great food for thought (as well as all of your posts)!

  2. Interesting post. One awaits the immanent arrival of a purveyor of luncheon meat.

    Cher Maître Bergson's ideas in Matter and Memory provide an ontological underpinning for fugitive psi phenomena. His influence on Meillassoux and Deleuze is well known. Essentially he is trying to steer between the Scylla and Charybdis of Idealism and Realism as traditionally understood. If in fact our normal conceptual schema involves the concepts of time and space, how are we to devise experiments for, or understand, phenomena which flout that Kantian fact? My own slight personal experiences of precognition have been associated with liminal states in which the power of the waking schema is abated.

    You may recall that Flew, in an early book, A New Approach to Psychical Research, was of the opinion that unless we accept that there is a worldwide conspiracy to fool the public, there is an x-factor. (Cue tuneless whistle.) He started his investigation as a sceptic in the Humean tradition.

  3. The same thing might be happening in art and love. We first encounter something that totally blows us away, then as time goes on, and we encounter it more and more, we become less and less enchanted and finally it seems so ordinary. It’s rather depressing.

  4. Joseph, welcome to the comments.

    I am pleased you picked up on my mention of Meillassoux, which I sort of buried in there, because this is of course the metaphysical heart of the question. All the psi-stuff, and the case-of-the-leaking-statistics, is interesting, but the real issue is, what is reality?

    Now you are quite right to note that M's considerations have to do, as you put it, with the universe as a whole -- though I think this formulation is problematic, as M's invocation of Cantor (via Badiou) actually makes "whole" a very difficult concept to use. Leaving that aside (though it may be of the essence), the question you raise -- "might the very problem with the scientific method itself be the inherent impossibility of accurately measuring the probable and improbable at all?" -- is one of the issues I have in mind. It might in fact impact the very coherence of the idea of probability.

    This would really be the next step beyond quantum mechanics. Robert Laughlin has made some suggestions in this direction in his book A Different Universe.

  5. ombhurbhuva~~
    Yes, this post is of course rank with the smell of the deli.

    My attitude is the sort of thing that sets people at CSI loading their revolvers. People doing psychic research often are heard complaining that the skeptical attitude itself is somehow inimical to certain phenomena manifesting. This is of course seen as an invitation to every table-bumping fraudster to invade the lab. But there is something to it. Scientific skepticism is a very particular mode of inquiry, a sieving consciousness that selects objects of belief of a very particular kind. And the question I have is, is this the best way to be (or at the very least, to be all the time)? One can ask this without selling one's soul to sleight-of-hand con-artists.

    Thanks for reminding me of Flew's Psychical Research book.

  6. Gary, you make a very apt observation. And in fact, if we could see it in politics (& surely we can, in e.g. the way political "common sense" gets established), that would be all four of Badiou's occasions of the event. I think you are on to something.

  7. Skholiast,

    I appreciate your insightful comparison between Lehrer's article on the decline effect (which somehow I had missed) and Bem's recent work on psi, even if you don't draw the same conclusions I would from said comparison (namely, that Bem's psi effect will decline because it probably isn't there).  Specifically, I'd like to push back against the hyperchaos-like thesis that the decline effect is real.  I should also state at the outset that I'm a physicist, thus my gut instinct is, put bluntly, to cheer on team Science and boo the Other Side.  Nevertheless, I try as much as possible to keep an open mind about these things.

    A problem with the thesis that the decline effect is real (à la Meillassoux's hyperchaos) is that the strength of the decline effect differs depending on the complexity of the object of study. That is, studies on complex things like mice and humans which purport to show the X effect are often later overturned, whereas experiments on simple stuff like protons and electrons which prove the Y effect usually are not. It is true that, as Lehrer briefly mentions, there are physics experiments where the strength of gravity is anomalously different in Nevada or the value of the weak coupling ratio changes unexpectedly in time. However, this is a mere change in the numerical value of a constant over time (or space), not the overturning of the "gravity effect" or the "neutron-decay effect."

    Simply put, one can envision two competing explanations for the decline effect's existence: (1) it is an inherent aspect of nature, or (2) it is an inherently sociological phenomenon among scientists. The advantage of (2) is that it naturally explains why the findings of social and medical science display more of a decline effect than the findings of physics. Mice are clearly more complex than particles, since the phenomena associated with the former are often "by-products of variables we don't understand" (Lehrer), and therefore it's that much more difficult to, as you put it, "construct an artificial circumstance in which the only variable is the mechanism of your hypothesis." Thus, on the one hand, the inherent variability of mice and humans makes such fields of study ripe for the operation of the sociological forces that give rise to the decline effect. On the other hand, the limited range of "behaviors" that electrons can exhibit constrains said sociological forces, such that the decline effect is limited to, for example, a change in the value of the electron's charge rather than the existence of the Coulomb force.

    Finally, buttressing (2) are the many different sociological effects Lehrer names in his New Yorker piece: the desire to avoid reporting null results (this literally happened to me yesterday), confirmation bias, an illogical reliance on Fisher's significance test (see, e.g., this Science article), the worsening of scholarship in fashionable subjects, and the different results researchers in the East and the West get from studies on acupuncture.

  8. Grad Student,

    Apologies if your comment took longer to appear than you expected; for some unaccountable reason, it was classified as spam by Blogger. (Whereas, as Ombhurbhuva noted, the post itself is more likely to draw comparisons with baloney).

    My instinct, being that of a good 21st-century westerner, is that the decline effect is almost always a sociological one. I raised the hyperchaos possibility as an interesting intersection between contemporary metaphysical debates and recent scientific disputes, because, well, I think it's interesting to raise the question. But I also am willing to be a little blurry about the exact place we draw the distinction between sociological and 'natural' effects. It's not that this distinction has no pertinence, but the interesting thing about the decline effect is that it forces us to rethink just where the distinction lies. We were very confident it lay well on the other side of a lot of pharmaceutical studies, for instance (and that's not even considering questions about profits to be made). But if noise turned out to be so universal as to be impossible to screen out, this might mean a practical limit on scientific certitude itself.

    On the hypothesis that the decline effect is a sociological phenomenon pure and simple, it seems reasonable to respond that the cure for limping science is better, more robust, more stern-with-itself science. But there is another possibility: that given phenomena of sufficient complexity, it will always be impossible to establish "laboratory conditions." In this case, "better science" will mean simply chastened science.

    A question: I grant that the change in a physical constant is not the overturning of the effect associated with the constant. But what, pray tell, can the meaning of "constant" be if the constant proves, well, inconstant?

    Lastly: strictly speaking, it seems to me that any story applying Meillassoux's hyperchaos to events within the universe must border on the non-scientific. The idea of the laws of nature changing, be it ever so sophisticated, seems still to approximate the scenario of everything in the universe simultaneously doubling in size. (Coming from me, this isn't meant as a damning objection, obviously).

  9. Yes, psi is shy; yet if something exists it ought to be manifest, and experiments ought to be able to discover it. The problem, however, is that we are attempting to establish a paradox, a violation of the principle of non-contradiction. It cannot be the case that we are at one and the same time and place at another time and place, unless our conception of time and space is inadequate to a universe in which such anomalies are legal. May I offer the vertiginous suggestion that there is what the existentialist Sartre would call the nausea factor, a fear of falling into the viscous absurd. This affect must get stronger as testing progresses, and one would expect a decline in the unacceptable. There may be, and here I speculate, a falling-off greater than chance before the happy medium is reached.

  10. Skholiast,

    Sorry for the delay. You asked:

    what, pray tell, can the meaning of "constant" be if the constant proves, well, inconstant?

    You're right of course, if a measured quantity thought to be constant was, in fact, not constant, then that would bring about some change to physical theory. This change could range from a paradigm shift (e.g. constant Newtonian space-time to relativistic space-time) to merely a slight modification to existing theory (e.g. neutrino oscillations, though this phenomenon may still prove to be more radical than currently thought). In either of these scenarios, however, the ensuing change does not mean that the previous theory is useless. For the previous theory to have been accepted in the first place, it must have accurately described nature within appropriate limits. For example, in astrophysics, in the vast majority of cases, Newtonian gravity is still used over general relativity.

    Regarding your last comment, in which you state that the laws of physics changing is analogous to everything in the universe simultaneously doubling in size, I disagree. Many physicists have specifically looked at the possibility of dimensionless constants changing, and I can assure you that the consequences can be detectable. See, for example, the latter part of Sean Carroll's recent post.