Routine Tonsillectomy — a Million Needless Operations a Year, Some Fatal
Summary
For roughly the first three-quarters of the twentieth century, tonsillectomy was the most frequently performed operation in the United States. It was a near-compulsory rite of childhood, performed on the order of a million times a year for sore throats, "mouth breathing," poor appetite, and the vague proposition that a child would simply be healthier without tonsils. The gap between that universal promise and the evidence is the entire case. The operation was never shown to deliver the broad benefits claimed, it killed a measurable number of the children it was sold to protect, and during the polio epidemics of the 1940s and early 1950s it demonstrably raised the risk of paralytic polio. At its 1959 peak roughly 1.4 million tonsillectomies were performed annually in the U.S., the overwhelming majority on children, and by mid-century an estimated 30 percent of American children had lost their tonsils, many of them to surgeons who, pressed honestly, could not have said why.
The procedure rode the "focal infection" theory: the early-twentieth-century belief that lurking pockets of chronic infection in the tonsils seeded disease throughout the body and were best excised pre-emptively. On that theory the indication became, in practice, the mere possession of tonsils. The surrogate that justified the knife was not a measured health outcome but a clinical impression, tonsils that "looked enlarged," and impressions, it turned out, were nearly random. In 1934 the American Child Health Association examined 1,000 New York schoolchildren and found that 61 percent had already been tonsillectomized. Physicians recommended surgery for nearly half of the remaining 39 percent; a second group of physicians, examining only the children just cleared, recommended surgery for nearly half of those; a third group did the same with the residue. After three rounds only 65 of the original 1,000 children were left without a recommendation, a recursive demonstration that the indication lived in the examiner, not the child.
The disconfirming evidence accumulated for nearly half a century before the practice yielded. James Alison Glover's 1938 study showed English tonsillectomy rates varying by an order of magnitude between districts with no relation to disease; it became the founding observation of "unwarranted variation" and is still known as the Glover phenomenon. From 1942 onward, epidemiologists documented that children tonsillectomized shortly before exposure to poliovirus suffered the deadly bulbar form at multiples of the background rate. And in 1984 the first rigorous randomized trial, by Jack Paradise in the New England Journal of Medicine, found a real but narrow benefit only for the most severely and frequently infected children, a tiny slice of those who had been operated on for decades. The operation was not banned. It was restricted: its indications were tightened, its volume was cut by more than half, and its routine use was retired by evidence that arrived long after the harm.
Timeline
The Organ as Suspect: How Focal Infection Made Every Child a Candidate
Tonsillectomy scaled because it answered a theory rather than a complaint. The focal-infection doctrine held that the tonsils were chronic reservoirs of infection silently poisoning the body, so the prudent course was removal before trouble announced itself. Under that logic the threshold for surgery collapsed: a child did not need a documented illness, only tonsils that a clinician judged "enlarged," "boggy," or "diseased-looking." Because no objective measure governed the call, the call became a habit of the eye. The 1934 American Child Health Association study made the consequence unmistakable. Three successive rounds of examination, each recommending surgery for roughly 45 percent of the children the previous round had cleared, left only 65 of the original 1,000 without a recommendation; continued long enough, the process would have left almost no child with his tonsils. The indication was not in the patient; it was a reflex of the system, reinforced by the convenience of a quick, lucrative school-and-clinic operation that parents had been taught to expect. The surrogate, tonsils that looked the part, could always be found, while the true endpoint, a healthier child, was never measured at all.
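The arithmetic of the recursive examination can be checked in a few lines. The minimal Python sketch below assumes, as a simplification, that every round recommended surgery for the same fraction (45 percent) of the children the previous round had cleared; the function name `children_left_without_recommendation` is illustrative and not drawn from the study. Starting from the 39 percent of 1,000 children who still had their tonsils, the compounding lands on roughly the study's figure of 65.

```python
def children_left_without_recommendation(total: int,
                                         already_operated: float,
                                         recommend_rate: float,
                                         rounds: int) -> float:
    """Expected number of children with no surgery recommended after
    `rounds` successive examinations.

    Simplifying assumption (hypothetical model, not the study's method):
    each round recommends surgery for the same fraction of the children
    that the previous round cleared.
    """
    # Children who had not already been tonsillectomized before the study.
    remaining = total * (1 - already_operated)
    for _ in range(rounds):
        # Each new panel condemns `recommend_rate` of those just cleared.
        remaining *= 1 - recommend_rate
    return remaining


for r in range(4):
    left = children_left_without_recommendation(1000, 0.61, 0.45, r)
    print(f"after {r} round(s): {left:.1f} children without a recommendation")
```

Run as written, the loop prints 390.0, 214.5, 118.0 and 64.9. The point of the model is that a constant recommendation rate, applied recursively, shrinks the "healthy" group geometrically: nothing about the children needs to change for each new examiner to find fresh candidates.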
The Polio Years: When the Routine Operation Became a Vector
The most damning turn was not inefficacy but active harm. Beginning in 1942, epidemiologists studying polio epidemics noticed that children tonsillectomized in the weeks before exposure were stricken with bulbar poliomyelitis, the form that paralyzes the muscles of breathing and swallowing and can kill, at rates far above expectation. The fresh surgical wound in the throat appeared to give the virus a route to the brainstem. In the 1943 Utah outbreak, roughly 43 percent of bulbar and bulbospinal cases had undergone tonsillectomy within the prior thirty days, and recently operated children carried about 2.6 times the general incidence of polio. This was the surgical promise inverted: an operation marketed as protecting a child's health was, during epidemic season, converting survivable infections into fatal ones. The danger was real and documented, though it should not be overstated: it was a seasonal, exposure-dependent multiplier, not a universal effect, and it receded once vaccination against polio began in 1955. But for more than a decade it stood as proof that the routine operation could be lethal in ways its boosters had never weighed.
The Long Reckoning: Variation, Trials, and a Boundary Finally Drawn
Tonsillectomy was not stopped by a scandal; it was eroded by measurement. Glover's 1938 finding of tenfold variation between districts with no link to disease supplied the conceptual key: if equally sick populations are operated on at wildly different rates, the difference is the doctor, not the diagnosis. That insight, the Glover phenomenon, became the founding case of small-area analysis and the modern study of unwarranted variation. By the 1970s the efficacy literature had caught up, and the 1978 NIH assessment declared the benefit unproven. The decisive instrument was Jack Paradise's 1984 randomized trial in the New England Journal of Medicine, which compared surgery against watchful waiting in children with documented recurrent throat infection and found a genuine but modest advantage confined to the most severely affected: children meeting strict, counted criteria, such as seven or more documented episodes in the preceding year. The trial did not abolish the operation; it drew a line, and the vast routine population fell on the wrong side of it. Volume more than halved, and the dominant indication shifted to obstructive sleep-disordered breathing, a structurally different and evidence-backed reason to operate. The demythologization was quiet but total: the universal childhood rite was exposed as a habit dressed as medicine, its broad benefits unproven, some of its patients killed by it, and its true value confined to a narrow band that decades of indiscriminate cutting had drowned out.
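Glover's comparison reduces to a ratio of per-capita rates. One common summary statistic in small-area analysis, the extremal quotient, divides the highest area's rate by the lowest. The minimal Python sketch below computes it; the district names and counts are hypothetical, invented only to show what a tenfold spread looks like, and the sketch assumes the districts are comparable in underlying illness, which is exactly the condition Glover's argument turns on.

```python
from typing import Mapping


def extremal_quotient(operations: Mapping[str, int],
                      population: Mapping[str, int]) -> float:
    """Highest per-capita operation rate divided by the lowest, across areas."""
    rates = [operations[area] / population[area] for area in operations]
    lowest = min(rates)
    if lowest == 0:
        # A district with no operations makes the ratio undefined.
        raise ValueError("an area with zero operations makes the ratio undefined")
    return max(rates) / lowest


# Hypothetical districts: equal child populations, assumed equally ill.
ops = {"District A": 40, "District B": 150, "District C": 400}
kids = {"District A": 10_000, "District B": 10_000, "District C": 10_000}

eq = extremal_quotient(ops, kids)
print(f"extremal quotient: {eq:.1f}")
```

With these made-up counts the sketch prints a quotient of 10.0. A value near 1 would mean comparable populations are treated alike; a large value is the signature of provider-driven variation. The statistic depends only on the two extreme areas, so it is unstable when areas are small, and in practice it is best read alongside measures of the whole distribution.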
Contributing Factors
Aftermath
The material consequence is a vast, largely undocumented toll: tens of millions of children operated on across the century for indications that would not survive a trial, an unknown but non-trivial number dead from anesthesia and hemorrhage, and a cohort of bulbar-polio cases that recent surgery helped create. The durable ripple is intellectual. Glover's variation finding and the American Child Health Association's recursive-examination study became founding texts of health-services research and the modern campaigns against low-value care; "the Glover phenomenon" still names the problem of treatment driven by provider habit. The 1978 NIH assessment and the 1984 Paradise trial reset the standard from impression to evidence, halving volume and re-pointing the operation at sleep-disordered breathing, where it earns its place. Tonsillectomy was never banned, and for the right child it remains good medicine, which is precisely the lesson. It is the textbook case of a useful procedure ruined by indiscriminate use, the operation that taught medicine that the question is not whether a treatment ever helps but whether the particular patient presenting is one of the few it helps. In the literature it survives as the original byword for overtreatment: the assembly-line cure for a disease most of its patients did not have.
Lessons
- Demand an indication you can measure in the patient, not a theory you can apply to everyone. When the justification for an intervention is a general doctrine about a suspect organ rather than a documented deficit in the individual, the threshold collapses and everyone qualifies. Require a counted, falsifiable reason before each act, or you are treating the theory, not the person.
- Distrust any endpoint defined by the operator's eye. If success or indication rests on a subjective impression that no outcome can contradict, the intervention will always look justified and never look like failure. Anchor the decision to an objective, pre-specified measure that can prove you wrong.
- Measure across providers, because variation exposes what individual judgment hides. A tenfold difference in rates between comparable populations is overuse made visible, even when each local decision seems sound. Audit the spread of practice, not just the defensibility of any single case.
- Weigh the harms an enthusiastic default conceals, including the rare and seasonal ones. A routine operation can quietly kill — through anesthesia, hemorrhage, or, as here, by opening a portal to a circulating virus. Account for the low-probability catastrophe before scale makes it a body count.
- Restrict to the few it helps rather than abolish or universalize. The error was not the operation but its indiscriminate use; the fix was a strict, evidence-defined boundary. When a treatment helps a narrow band, find that band and hold the line there — do not let the band's existence justify treating everyone.