In July 2002, orthopedic surgeon J. Bruce Moseley and a Houston Veterans Affairs team reported in the New England Journal of Medicine that 180 patients with osteoarthritis of the knee, randomized double-blind to arthroscopic débridement, arthroscopic lavage, or a sham operation in which surgeons made skin incisions but inserted no instrument, had identical outcomes — and the gap between that finding and a decade of confident practice is the entire case. By 2002 the scope-and-clean operation for the arthritic knee was being performed on the order of 650,000 times a year in the United States at roughly $5,000 apiece, a multi-billion-dollar standard of care, on the mechanistic premise that flushing out debris and trimming frayed cartilage relieved pain. The trial showed it relieved nothing the placebo did not.
The harm here was measured not in deaths but in unnecessary operations: hundreds of thousands of patients each year underwent a real surgery, with its anesthesia, incisions, infection risk, recovery, and deductibles, to obtain a benefit indistinguishable from being wheeled into an operating room, cut, and sewn shut. At no point over two years of follow-up did either intervention group report less pain or better function than the sham group; the 95 percent confidence intervals excluded any clinically meaningful difference. The wonder of arthroscopy had been real for torn menisci and loose bodies, but for arthritis pain it was theater.
What makes the episode an exemplar of withdrawal is that it was killed by the right kind of evidence. Surgery had long been treated as exempt from the placebo-controlled standard demanded of drugs, on the assumption that an operation cannot ethically be faked. Moseley’s team did precisely that — and the result was so clean that the Centers for Medicare and Medicaid Services moved within a year to defund the procedure for osteoarthritis. A 2008 Canadian trial led by Alexandra Kirkley confirmed that arthroscopy added nothing to optimized physical and medical therapy, and by 2017 international guideline panels were issuing strong recommendations against it. The operation was never recalled or banned. It was disconfirmed, defunded, and abandoned — a textbook demonstration that a popular surgery can be a placebo, and that without a sham control no one would have known.
In 1987 the French neurosurgeon Pierre Galibert and the neuroradiologist Hervé Deramond reported injecting acrylic bone cement into a single cervical vertebra eaten away by a hemangioma. By the mid-1990s that salvage technique had been repurposed into a booming outpatient business, percutaneous vertebroplasty for the painful spinal fractures of ordinary osteoporosis, on the strength of nothing but uncontrolled case series in which patients reported feeling better. The gap between operators' near-universal conviction that the procedure plainly worked and what a blinded comparison actually showed is the entire case. When the procedure was finally tested against a credible fake in 2009, the cement turned out to do nothing the placebo did not do.
The clinical claim was seductive and mechanically intuitive: drive a needle through the back under imaging, inject polymethyl methacrylate (PMMA) into the collapsed vertebral body, stabilize the fracture, and abolish pain — often, operators said, on the table. By the mid-2000s the operation and its cousin kyphoplasty were a multibillion-dollar global market; U.S. Medicare alone was paying for vertebral augmentation in roughly a fifth to a quarter of compression-fracture patients, on the order of tens of thousands of procedures a year. The evidence underneath was almost entirely uncontrolled. Pain from an acute vertebral fracture improves substantially on its own over weeks, and a needle in the back is a powerful theatrical placebo — two facts the case series could not separate from any true effect of the cement.
On 6 August 2009 the New England Journal of Medicine published, in a single issue, two independent randomized double-blind sham-controlled trials. David Kallmes’s multicenter INVEST trial (131 patients) and Rachelle Buchbinder’s Australian trial (78 patients) both gave control patients the full ritual — the same positioning, the same local anesthetic, the same room, the smell of mixed cement — but no PMMA. Both found no meaningful difference. Pain and disability fell sharply in both arms, by roughly the same amount, at every follow-up point.
No agency banned vertebroplasty and no court enjoined it. It was discredited by its own pivotal trials and then slowly throttled by guidelines, payers, and a 2018 Cochrane review, surviving today only as a restricted, narrow-indication option rather than the routine fracture treatment it had been. It stands as the textbook modern lesson that a procedure can feel like it works to every operator and every patient and still be a placebo — and that the only way to know is the sham control almost nobody wanted to run.
In 1920 the Chicago obstetrician Joseph Bolivar DeLee, in a paper titled “The Prophylactic Forceps Operation,” urged physicians to cut the perineum of laboring women as a routine to spare them the worse damage of a ragged spontaneous tear. The gap between that protective promise and the eventual evidence is the entire case. By the late twentieth century the operation DeLee reasoned his way into was one of the most common surgical procedures performed on American women: at its peak it was done in well over half of all vaginal deliveries (60.9 percent in 1979) and on an even larger share of first-time mothers, almost none of them told there was no trial behind it.
The justification was intuitive: a clean, controlled incision must heal better than a jagged laceration, and a pre-emptive cut must protect the pelvic floor against future prolapse and incontinence. The intuition was wrong in the most consequential way. When the procedure was finally tested against the comparator it had skipped for decades — selective use, cutting only on indication — the routine cut did not prevent severe trauma. A midline episiotomy extended the wound straight toward the anal sphincter and rectum, so the prophylactic incision was itself causally linked to the third- and fourth-degree tears it was meant to forestall.
The reckoning was slow because the practice was entrenched, not because the data were ambiguous. A 1983 interpretive review of more than 350 sources spanning 1860 to 1980 found no defensible evidence for routine use; the 1993 Argentine Episiotomy Trial, a randomized study of 2,606 women, showed routine use conferred no benefit and more harm; and the 2005 AHRQ-commissioned systematic review in JAMA closed the question, finding routine episiotomy improved no immediate outcome and prevented no incontinence or prolapse. In April 2006 the American College of Obstetricians and Gynecologists issued Practice Bulletin No. 71, recommending restrictive rather than routine use. The procedure was not banned, and it retains narrow, evidence-based indications, but its eighty-year career as a default was abandoned. It stands as obstetrics’ cleanest case of a plausible, near-universal intervention adopted on reasoning and reversed only by the trial that should have come first.
For roughly the first three-quarters of the twentieth century, tonsillectomy was the most frequently performed operation in the United States: a near-compulsory rite of childhood, scheduled on the order of a million-plus times a year for sore throats, “mouth breathing,” poor appetite, and the vague proposition that a child would simply be healthier without tonsils. The gap between that universal promise and the evidence is the entire case. The operation was never shown to deliver the broad benefits claimed, it killed a measurable number of the children it was sold to protect, and for a period in the 1940s it demonstrably raised the risk of paralytic polio. At its 1959 peak roughly 1.4 million tonsillectomies were performed annually in the U.S., the overwhelming majority on children, and by mid-century an estimated 30 percent of American children had lost their tonsils, many to surgeons who, examined honestly, could not say why.
The procedure rode the “focal infection” theory: the early-twentieth-century belief that lurking pockets of chronic infection in the tonsils seeded disease throughout the body and were best excised pre-emptively. On that theory the indication became, in practice, the mere possession of tonsils. The surrogate that justified the knife was not a measured health outcome but a clinical impression (the tonsils “looked enlarged”), and impressions, it turned out, were nearly random. In 1934 the American Child Health Association sent 1,000 New York schoolchildren through successive examinations and found 61 percent had already been tonsillectomized. Physicians examining the remaining 389 recommended surgery for 45 percent of them; a second group of physicians, seeing only the children just cleared, recommended it for 46 percent; a third group, seeing only those cleared twice, recommended it for 44 percent. After three rounds just 65 children remained without a recommendation, a recursive demonstration that the indication lived in the examiner, not the child.
The disconfirming evidence accumulated for forty years before the practice yielded. James Alison Glover’s 1938 study showed English tonsillectomy rates varying by an order of magnitude between districts with no relation to disease, the founding observation of “unwarranted variation,” still called the Glover phenomenon. From 1942 onward, epidemiologists documented that children tonsillectomized shortly before exposure to poliovirus suffered the deadly bulbar form at multiples of the background rate. And in 1984 the first rigorous randomized trial, by Jack Paradise in the New England Journal of Medicine, found a real but narrow benefit only for the most severely and frequently infected children, a tiny slice of those who had been operated on for decades. The operation was not banned. It was restricted instead: its indications were tightened, its volume fell by more than half, and it was retired from routine use by evidence that arrived long after the harm.
On October 30, 1967, in Zurich, the neurosurgeon M. Gazi Yaşargil sutured a scalp artery to a cortical branch of the middle cerebral artery under the operating microscope, rerouting blood around a blocked vessel to feed a starving brain; the operation was elegant, technically dazzling, and — for the prevention of stroke in patients with carotid and middle-cerebral disease — almost entirely unproven, and that gap between surgical beauty and clinical benefit is the entire case. For nearly two decades the extracranial-intracranial (EC-IC) arterial bypass spread on the strength of its own plausibility and on case series reporting open grafts, until a single randomized trial showed it prevented nothing it claimed to prevent.
The operation was never a fraud and never a mass killer in the lobotomy sense. It killed and disabled quietly, at the margins: the trial recorded a thirty-day surgical mortality of 0.6 percent and a major-stroke rate of 2.5 percent, a cost imposed up front on patients who, the same trial would show, were no better protected afterward. The surrogate that sustained it was graft patency: the bypass stayed open in about 96 percent of cases, a number surgeons and angiograms could see and celebrate. A patent vessel looked like a prevented stroke. It was not the same thing, and conflating the two is the mechanism that kept the operation alive.
The reckoning arrived not from a regulator or a court but from an eight-year, NIH-funded randomized controlled trial led by the Canadian neurologist Henry J. M. Barnett of London, Ontario. Published in the New England Journal of Medicine on November 7, 1985, the International EC/IC Bypass Study randomized 1,377 patients at 71 centers in 14 countries and found that surgery added to best medical care did not reduce fatal or nonfatal stroke. Two subgroups, patients with severe middle-cerebral stenosis and those with persisting symptoms after carotid occlusion, actually fared worse with the operation. Within a few years the procedure collapsed from a flourishing subspecialty to a narrow, rarely indicated salvage technique. It was not banned. It was disconfirmed, and it became one of medicine’s foundational lessons in why a trial must precede an operation, not follow it.