Gastric Freezing — Froze 15,000 Stomachs, Then a Sham Freeze Worked Just as Well

In May 1962, University of Minnesota surgical chairman Owen H. Wangensteen announced in JAMA that a duodenal ulcer could be cured without an operation by swallowing a balloon and chilling the stomach to roughly minus 10 degrees Celsius, a bloodless “physiological gastrectomy.” Within two years thousands of Americans had been frozen on refrigeration machines that had never passed a single controlled trial. The gap between that announcement and the 1969 finding that a fake freeze worked exactly as well is the entire case. Gastric freezing was not fringe quackery. It was launched by one of the most decorated academic surgeons in the United States, published in the country’s leading medical journal, and adopted at scale before anyone tested it against a placebo.

The promise rested on a plausible mechanism and a flattering measure. Wangensteen reasoned that supercooling the gastric mucosa would knock out the acid-secreting cells that drove ulcer disease, achieving by cold what surgeons then achieved by cutting out half the stomach. Early uncontrolled series were spectacular: investigators reported that on the order of 85 percent of patients had prompt relief of pain and apparent healing of ulcer craters. That surrogate — short-term symptom relief, the most placebo-responsive endpoint in all of medicine — was mistaken for cure. The acid suppression was real but transient, returning to baseline within weeks to months, and the symptom relief was, it later emerged, almost entirely the patient’s own expectation.

The reckoning came from the design that the launch had skipped. By 1964 controlled and double-blind studies were appearing, and in July 1969 a multi-institution cooperative trial led by Julian Ruffin reported in the New England Journal of Medicine that patients given a genuine gastric freeze did no better than patients given a sham freeze in which the same balloon circulated fluid that was never chilled. The treatment effect, against a proper control, was zero. Gastric freezing collapsed almost as fast as it had spread. It was never banned and never recalled; it was abandoned — and it survives in textbooks as the canonical demonstration of why a new procedure must be tested against a sham before, not after, it is sold to thousands.

Vertebroplasty for Spinal Fractures — the Cement Fix Two Sham Trials Killed in 2009

In 1987 a French team led by the neurosurgeon Pierre Galibert and the neuroradiologist Hervé Deramond reported injecting acrylic bone cement into a single cervical vertebra eaten away by a hemangioma. By the mid-1990s that salvage technique had been repurposed into a booming outpatient business, percutaneous vertebroplasty for the painful spinal fractures of ordinary osteoporosis, on the strength of nothing but uncontrolled case series in which patients reported feeling better. The gap between the operators’ near-universal conviction that the procedure plainly worked and what a blinded comparison actually showed is the entire case. When the procedure was finally tested against a credible fake in 2009, the cement turned out to do nothing the placebo did not do.

The clinical claim was seductive and mechanically intuitive: drive a needle through the back under imaging, inject polymethyl methacrylate (PMMA) into the collapsed vertebral body, stabilize the fracture, and abolish pain — often, operators said, on the table. By the mid-2000s the operation and its cousin kyphoplasty were a multibillion-dollar global market; U.S. Medicare alone was paying for vertebral augmentation in roughly a fifth to a quarter of compression-fracture patients, on the order of tens of thousands of procedures a year. The evidence underneath was almost entirely uncontrolled. Pain from an acute vertebral fracture improves substantially on its own over weeks, and a needle in the back is a powerful theatrical placebo — two facts the case series could not separate from any true effect of the cement.

On 6 August 2009 the New England Journal of Medicine published, in a single issue, two independent randomized double-blind sham-controlled trials. David Kallmes’s multicenter INVEST trial (131 patients) and Rachelle Buchbinder’s Australian trial (78 patients) both gave control patients the full ritual — the same positioning, the same local anesthetic, the same room, the smell of mixed cement — but no PMMA. Both found no meaningful difference. Pain and disability fell sharply in both arms, by roughly the same amount, at every follow-up point.

No agency banned vertebroplasty and no court enjoined it. It was discredited by its own pivotal trials and then slowly throttled by guidelines, payers, and a 2018 Cochrane review, surviving today only as a restricted, narrow-indication option rather than the routine fracture treatment it had been. It stands as the textbook modern lesson that a procedure can feel like it works to every operator and every patient and still be a placebo — and that the only way to know is the sham control almost nobody wanted to run.

Fetal Nigral Cell Transplants for Parkinson’s — the Brain Graft That Triggered Unswitchable Dyskinesias

In 1987 a team led by neurologist Olle Lindvall and neuroscientist Anders Björklund at Lund University, Sweden, began implanting dopamine-producing cells dissected from aborted human fetuses into the brains of Parkinson’s patients; the open-label results of the 1990s — surviving grafts on PET, patients walking who had been frozen — were celebrated as the first biological cure for a neurodegenerative disease. The gap between that promise and the controlled evidence is the case. Tested the way a drug would be — against sham brain surgery, double-blind — the graft did not beat placebo on its primary endpoint and inflicted a new, largely untreatable harm: persistent involuntary movements that ran on after every drop of levodopa was withdrawn.

Both trials that ended the era were funded by the U.S. National Institutes of Health and built around a placebo arm earlier enthusiasts had called unnecessary. In Curt Freed’s Denver–Columbia trial, published in The New England Journal of Medicine on March 8, 2001, 40 patients aged 34 to 75 were randomized to a fetal-tissue graft or to sham surgery — burr holes drilled, no cells implanted. The graft showed no benefit on the pre-specified global rating; a positive signal appeared only in a post-hoc subgroup aged 60 or younger. Then came the harm: dystonia and dyskinesias in roughly 15 percent of grafted patients (5 of 33), persisting after levodopa was reduced or stopped. The second NIH trial, run by neurologist C. Warren Olanow and published in Annals of Neurology in September 2003, deepened the failure: across 34 patients, no significant effect on the motor UPDRS (p = 0.244) at 24 months, 56 percent with off-medication dyskinesia, and a conclusion that transplantation “currently cannot be recommended as a therapy.”

The case is exemplary because the grafts worked biologically and failed clinically. Fluorodopa uptake rose; dopamine neurons survived robustly and were confirmed at autopsy. The cells lived — but thriving grafts drove a runaway, unregulated release of dopamine the brain could not modulate, leaving a procedure that could not be titrated, withdrawn, or reversed: a worse failure mode than the disease it meant to cure. The field abandoned routine fetal grafting and turned to the problem it had skipped — proving, against placebo, that putting cells in a brain helps the person attached to it.

Insulin Coma Therapy — Dozens of Near-Fatal Comas, Beaten by a Sleeping Pill

In 1933 the Austrian psychiatrist Manfred Sakel announced from Vienna that he could break schizophrenia by injecting patients with enough insulin to crash their blood sugar into deep coma. On the basis of uncontrolled case series with no comparison group, he reported recovery rates of 70 to 80 percent. The distance between that claim and what a controlled trial eventually found is the whole of this case. The procedure that resulted held patients in repeated, deliberately induced hypoglycemic comas, typically a course of 20 to 60 sessions, each lasting up to an hour and terminated by glucose. It was practiced across asylums in Europe and North America for two decades and killed on the order of one to two patients in every hundred treated, with some series running higher.

Insulin coma therapy (ICT) did not survive on evidence. It survived on enthusiasm and on a flattering selection effect. As the British psychiatrist Harold Bourne argued in his 1953 Lancet paper “The Insulin Myth,” ICT units selected younger, recently-ill, better-prognosis patients, lavished them with intensive nursing in dedicated wards, and then credited insulin for an improvement that the selection and the attention had largely produced. Bourne’s verdict — that insulin patients were “an elite group sharing common privileges and perils,” and that the coma added nothing specific — was a theoretical demolition that the field initially refused to print; the Journal of Mental Science sat on his manuscript for a year and rejected it, telling him to “get more experience.”

The empirical reckoning arrived in 1957, when Brian Ackner, Arthur Harris and A.J. Oldham published in The Lancet one of the first randomized controlled trials in psychiatric history. Fifty schizophrenia patients were randomly allocated either to insulin coma or to an identical regimen in which the unconsciousness was produced by barbiturates instead — same ward, same nursing, same coma, different agent. There was no difference in outcome. Whatever the regimen achieved, insulin was not the active ingredient. Within a few years ICT had collapsed, helped over the edge by chlorpromazine, which by the late 1950s delivered comparable results without driving anyone into a coma. It was never recalled and never banned; it was simply discredited and abandoned, and it is now taught as the first major therapy retired by a randomized controlled trial — and a textbook illustration of how a selection effect can masquerade as a cure.

Therapeutic Bloodletting — 2,000 Years of Care Pierre Louis Disproved by Counting Corpses

For more than two thousand years, from the Hippocratic physicians of the fifth century BCE through the lancets of Benjamin Rush in 1793 and the basins that drained George Washington in 1799, the deliberate opening of a vein to let blood run was not fringe quackery but the central, prestige-laden therapy of Western medicine. The gap between that universal confidence and the documented absence of benefit, together with frequent harm, is the entire case. Bloodletting was prescribed for fever, pneumonia, inflammation, apoplexy, melancholy, and almost everything else, on the strength of a humoral theory that mistook a visible physiological effect (a weaker pulse, a quieted patient) for a cure, and on the authority of an unbroken chain of celebrated practitioners who never once measured whether it worked.

The body count is impossible to total but enormous. In the “age of heroic medicine,” roughly 1780 to 1850, physicians escalated the dose: Rush bled Philadelphia’s yellow-fever patients to syncope and beyond during the 1793 epidemic, and Washington lost on the order of 80 ounces — about 40 percent of his blood volume, roughly 2.4 litres — across multiple bleedings in a single day before dying on December 14, 1799. The intervention reliably produced its surrogate endpoint, a slowed pulse and a sedated patient, while delivering anemia, hypovolemic shock, and accelerated death to people already weakened by disease.
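The figures quoted for Washington can be cross-checked with a few lines of arithmetic. The sketch below assumes the 80 ounces are US fluid ounces (period accounts may have used a different measure) and simply asks what total blood volume the 40 percent figure implies. The function names are illustrative only.

```python
# Assumption: "ounces" means US fluid ounces (29.5735295625 mL each).
ML_PER_US_FL_OZ = 29.5735295625


def ounces_to_litres(ounces: float) -> float:
    """Convert US fluid ounces to litres."""
    return ounces * ML_PER_US_FL_OZ / 1000.0


def implied_total_volume(lost_litres: float, fraction_lost: float) -> float:
    """Total blood volume implied by a loss and the fraction it represents."""
    return lost_litres / fraction_lost


lost_litres = ounces_to_litres(80)                       # volume drawn, in litres
total_litres = implied_total_volume(lost_litres, 0.40)   # volume implied by "40 percent"

print(f"Blood drawn: {lost_litres:.2f} L")
print(f"Implied total blood volume: {total_litres:.2f} L")
```

Under that assumption the loss comes to a little under 2.4 litres, and the 40 percent figure implies a total blood volume of just under 6 litres.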

The reckoning came not from a regulator but from arithmetic. In 1835 the Paris physician Pierre-Charles-Alexandre Louis published Recherches sur les effets de la saignée, applying what he called the “numerical method” to 77 pneumonia patients at La Charité. He compared those bled early (days 1–4 of illness) against those bled later (days 5–9) and found the result he himself called “startling and apparently absurd”: 44 percent of the early-bled died, against 25 percent of the late-bled, and any apparent shortening of the disease did not survive statistical scrutiny. The therapy that had defined medicine since antiquity was shown, by counting, to confer no benefit and probable harm.
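Louis's numerical method amounts to tabulating deaths per group and comparing the proportions. The sketch below is a minimal illustration, not a reconstruction of his tables. The group sizes (41 early, 36 late) and death counts (18 and 9) are assumptions chosen to match the 77 patients and the 44 and 25 percent mortality quoted above, and the class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    label: str
    patients: int
    deaths: int

    @property
    def mortality(self) -> float:
        """Proportion of the group that died."""
        return self.deaths / self.patients


# Assumed counts, consistent with 77 patients and roughly 44% vs 25% mortality.
early = Group("bled on days 1-4", patients=41, deaths=18)
late = Group("bled on days 5-9", patients=36, deaths=9)

# Two simple summaries of the comparison.
risk_difference = early.mortality - late.mortality
risk_ratio = early.mortality / late.mortality

for group in (early, late):
    print(f"{group.label}: {group.deaths}/{group.patients} died "
          f"({group.mortality:.0%})")
print(f"Risk difference: {risk_difference:.1%}")
print(f"Risk ratio: {risk_ratio:.2f}")
```

The comparison is observational: patients were not assigned to early or late bleeding at random, so the difference in mortality could reflect who was bled early rather than the bleeding itself. What the count shows is the absence of any visible benefit.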

No statute banned bloodletting. It was retired by evidence, by the rise of “therapeutic skepticism,” and by the gradual recognition that most fevers were self-limited and recovered despite, not because of, the lancet. It faded across the second half of the nineteenth century into obsolescence and is now the founding parable of evidence-based medicine: the case that proves the oldest, most universal, most authority-backed treatment can still be worthless once someone bothers to count.