PR-014 Otolaryngology

Routine Tonsillectomy — a Million Needless Operations a Year, Some Fatal

Patients treated
~1.4 million U.S. procedures/yr at the 1959 peak; ~30% of American children operated on by mid-century
Era performed
c.1900s–1970s routine overuse (peak 1930s–1950s)
Disconfirming trial
Glover variation study (1938); Paradise NEJM randomized trial (1984)
Status
Restricted

Summary

For most of the twentieth century tonsillectomy was a near-compulsory rite of American childhood, and from about 1915 to 1960 it was the most frequently performed operation in the United States. Children were scheduled for it because of sore throats, "mouth breathing," poor appetite, or the vague proposition that a child would simply be healthier without tonsils. The gap between that universal promise and the evidence is the entire case: the operation was never shown to deliver the broad benefits claimed, it killed a measurable number of the children it was sold to protect, and during the polio epidemics of the 1940s it demonstrably raised the risk of the deadly bulbar form of the disease. At its 1959 peak roughly 1.4 million tonsillectomies were performed annually in the U.S., the overwhelming majority on children, and by mid-century an estimated 30 percent of American children had lost their tonsils, many of them to surgeons who, pressed honestly, could not have said why.

The procedure rode the "focal infection" theory: the early-twentieth-century belief that lurking pockets of chronic infection in the tonsils seeded disease throughout the body and were best excised pre-emptively. On that theory the indication became, in practice, the mere possession of tonsils. The surrogate that justified the knife was not a measured health outcome but a clinical impression that the tonsils "looked enlarged," and such impressions, it turned out, were nearly random. In 1934 the American Child Health Association surveyed 1,000 New York schoolchildren and found that 61 percent had already been tonsillectomized. The remaining 39 percent then went through successive rounds of examination by different physicians: the first round recommended surgery for roughly 45 percent of them, a second group recommended it for nearly half of the children the first had cleared, and a third did the same for nearly half of that residue, until only 65 children had escaped a recommendation. It was a recursive demonstration that the indication lived in the examiner, not the child.

The disconfirming evidence accumulated for more than forty years before the practice yielded. James Alison Glover's 1938 study showed tonsillectomy rates in England and Wales varying by an order of magnitude between districts with no relation to disease, the founding observation of what is now called "unwarranted variation" and still known as the Glover phenomenon. From 1942 onward, epidemiologists documented that children tonsillectomized shortly before exposure to poliovirus suffered the deadly bulbar form at multiples of the background rate. And in 1984 the first rigorous randomized trial, by Jack Paradise in the New England Journal of Medicine, found a real but narrow benefit only for the most severely and frequently infected children, a tiny slice of those who had been operated on for decades. The operation was not banned. It was restricted: its indications were tightened, its volume was cut by more than half, and it was retired from routine use by evidence that arrived long after the harm.

Timeline

c.1900–1910
Focal-infection theory licenses pre-emptive removal
The doctrine that chronic foci in the tonsils seed systemic disease takes hold; removing healthy-but-"suspect" tonsils becomes prophylaxis, not treatment, and the indication effectively becomes the organ's existence.
1915–1960
Tonsillectomy becomes the most common U.S. operation
For more than four decades it is the single most frequently performed surgical procedure in the country, done overwhelmingly on children in clinics, in schools, and on mass "tonsil days."
1934
The American Child Health Association exposes the indication's randomness
Of 1,000 New York schoolchildren, 61% are already tonsillectomized. Three successive rounds of physicians then examine the rest, each recommending surgery for roughly 45% of the children the previous round cleared, until only 65 remain unmarked: the recommendation tracks the examiner, not the disease.
1938
Glover documents unwarranted variation
James Alison Glover shows tonsillectomy rates across England and Wales vary roughly tenfold between districts, unrelated to tonsil disease — and notes girls have more tonsillitis while boys are cut more often. The "Glover phenomenon" is born.
1942
Aycock links recent tonsillectomy to bulbar polio
Epidemiologic work (Aycock and others) reports that children operated on shortly before poliovirus exposure develop the often-fatal bulbar form far more often than expected, as though the operation had stripped away a protective barrier at the worst possible moment.
1943
The Utah epidemic confirms the danger
During the ascending phase of a Utah polio outbreak, about 43% of bulbar and bulbospinal cases are preceded by tonsillectomy within 30 days; recently operated children show roughly 2.6 times the general polio incidence.
1945
Bakwin frames the pattern as a pathology of medicine
Pediatrician Harry Bakwin cites the ACHA findings as a "convincing demonstration of the absurdity of indiscriminate tonsillectomy," framing the operation as a disorder of the profession.
1959
The American peak
U.S. tonsillectomy volume reaches roughly 1.4 million procedures per year, near-saturation among children, even as the rationale is already crumbling in the literature.
1970s
Rates begin to fall as efficacy evidence accrues
British rates decline through the decade as studies conclude the operation is far less effective for sore throats and other indications than believed; U.S. practice follows.
1978
NIH finds the benefit unproven
A U.S. National Institutes of Health assessment determines there is insufficient evidence that the benefits of routine tonsillectomy outweigh its risks, accelerating tighter guidelines.
1984
Paradise's randomized trial draws the new boundary
In the New England Journal of Medicine, Jack Paradise reports that surgery helps only severely, frequently infected children meeting stringent criteria — validating restriction, not abolition, and indicting the decades of routine use.
1980s–present
Volume more than halves; indications shift
Annual U.S. tonsillectomies fall to roughly 500,000, and the leading indication migrates from infection to obstructive sleep-disordered breathing — a different, evidence-supported reason to operate.

The Organ as Suspect: How Focal Infection Made Every Child a Candidate

Tonsillectomy scaled because it answered a theory rather than a complaint. The focal-infection doctrine held that the tonsils were chronic reservoirs of infection silently poisoning the body, so that the prudent course was removal before trouble announced itself. Under that logic the threshold for surgery collapsed: a child did not need a documented illness, only tonsils that a clinician judged "enlarged," "boggy," or "diseased-looking." Because no objective measure governed the call, the call became a habit of the eye. The 1934 American Child Health Association study made the consequence unmistakable. Three successive rounds of examination, each recommending surgery for roughly 45 percent of the children the previous round had cleared, left only 65 of the original 1,000 children with their tonsils and no recommendation to remove them; had the rounds continued, almost none would have escaped. The indication was not in the patient; it was a reflex of the system, reinforced by the convenience of a quick, lucrative operation, performed in schools and clinics, that parents had been taught to expect. The surrogate endpoint, tonsils that looked the part, was met every time, while the true endpoint, a healthier child, was never measured at all.
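The recursion in the ACHA figures can be checked with a few lines of arithmetic. This is a back-of-envelope sketch, not the survey's own method: it assumes a flat 45 percent recommendation rate in every round (the source reports the rounds only as "roughly 45 percent" or "nearly half") and starts from the roughly 390 children that the 61 percent figure leaves. The function name `children_left_unmarked` is hypothetical.

```python
def children_left_unmarked(start: int, rate: float, rounds: int) -> float:
    """Children still not recommended for surgery after `rounds` passes.

    Hypothetical model: each round of examiners recommends the same
    fraction `rate` of the children the previous round cleared.
    """
    remaining = float(start)
    for _ in range(rounds):
        remaining *= 1.0 - rate  # each pass clears only (1 - rate) of its input
    return remaining


total = 1000
already_operated = round(0.61 * total)  # 61% had already lost their tonsils
examined = total - already_operated     # about 390 children still had tonsils

for n in range(1, 4):
    left = children_left_unmarked(examined, 0.45, n)
    print(f"after round {n}: {left:.0f} children not yet recommended")
```

After the third round the model lands at about 65 children (390 × 0.55³ ≈ 64.9), which matches the survey's final count; the geometric decay is why a fourth or fifth round would have left almost no one.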

The Polio Years: When the Routine Operation Became a Vector

The most damning turn was not inefficacy but active harm. Beginning in 1942, epidemiologists studying polio epidemics noticed that children tonsillectomized in the weeks before exposure were stricken with bulbar poliomyelitis, the form that paralyzes the muscles of breathing and swallowing and often kills, at rates far above expectation. The fresh surgical wound in the throat appeared to open a route for the virus to the brainstem. In the 1943 Utah outbreak, about 43 percent of bulbar and bulbospinal cases had undergone tonsillectomy within the prior thirty days, and recently operated children carried roughly 2.6 times the general incidence of polio. This was the surgical promise inverted: an operation marketed as protecting a child's health was, during epidemic season, converting survivable infections into fatal ones. The danger was real and documented, though it should not be overstated. It was a seasonal, exposure-dependent multiplier rather than a universal effect, and it receded once mass polio vaccination began in 1955. But for more than a decade it stood as proof that the routine operation could be lethal in ways its boosters had never weighed.

The Long Reckoning: Variation, Trials, and a Boundary Finally Drawn

Tonsillectomy was not stopped by a scandal; it was eroded by measurement. Glover's 1938 finding of tenfold variation between districts with no link to disease supplied the conceptual key: if equally sick populations are operated on at wildly different rates, the difference is the doctor, not the diagnosis. That insight, the Glover phenomenon, became the founding case of small-area analysis and the modern study of unwarranted variation. By the 1970s the efficacy literature had caught up, and the 1978 NIH assessment declared the benefit unproven. The decisive instrument was Jack Paradise's 1984 randomized trial in the New England Journal of Medicine, which compared surgery against watchful waiting in children with documented recurrent throat infection and found a genuine but modest advantage confined to the most severely affected: the children meeting strict, counted criteria. The trial did not abolish the operation; it drew a line, and the vast routine population fell on the wrong side of it. Volume more than halved, and the dominant indication shifted to obstructive sleep-disordered breathing, a structurally different and evidence-backed reason to operate. The demythologizing was quiet but total: the universal childhood rite was exposed as a habit dressed as medicine, its broad benefits unproven, its harms occasionally fatal, and its true value confined to a narrow band that decades of indiscriminate cutting had drowned out.
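The "strict, counted criteria" are what turn an impression into an indication. As a sketch only: the entry thresholds commonly cited as the Paradise criteria are at least 7 documented throat infections in the preceding year, at least 5 in each of the preceding 2 years, or at least 3 in each of the preceding 3 years. The hypothetical helper `meets_paradise_counts` below encodes just those counts; it assumes each episode has already been clinically documented, which the trial also required, and it does not reproduce the trial's full protocol.

```python
from typing import Sequence


def meets_paradise_counts(episodes_by_year: Sequence[int]) -> bool:
    """Return True if documented episode counts meet the 7/5/3 thresholds.

    `episodes_by_year[0]` is the most recent year, `[1]` the year before,
    and so on. Hypothetical helper: it checks counts only and assumes every
    episode was already documented as a qualifying throat infection.
    """
    counts = list(episodes_by_year)
    # (number of consecutive recent years, minimum episodes in each of them)
    thresholds = [(1, 7), (2, 5), (3, 3)]
    for years, minimum in thresholds:
        if len(counts) >= years and all(c >= minimum for c in counts[:years]):
            return True
    return False
```

For example, a child with 6 documented episodes last year and 5 the year before qualifies on the two-year rule (`meets_paradise_counts([6, 5])`), while a child with 6 episodes in a single year does not (`meets_paradise_counts([6])`). The point of the design is that the answer can be wrong in a checkable way: anyone holding the same counts reaches the same decision.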

Contributing Factors

01
A theory substituted for an indication
Focal-infection doctrine declared the tonsils inherently dangerous, converting the mere possession of the organ into grounds for its removal. When the rationale for an intervention is a sweeping theory rather than a measured deficit in the individual, the threshold for acting collapses to zero and everyone becomes a candidate.
02
A subjective surrogate that could not be falsified
"Enlarged" or "diseased-looking" tonsils were a clinical impression with no objective anchor, so the operation always appeared justified and never appeared to fail. An endpoint that the operator defines by eye, and that no outcome can contradict, is a license for unlimited intervention.
03
Unwarranted variation hidden in plain sight
Tenfold differences in rates between comparable districts proved the surgery tracked local custom, not disease — but variation is invisible until someone measures across providers. A practice whose volume depends on which provider a patient sees rather than what disease the patient has is overuse by definition, even when each individual decision looks reasonable.
04
Convenience and incentive engineered for mass scale
Quick, repeatable, paid, and culturally expected — performed on "tonsil days" in clinics and schools — tonsillectomy had every structural feature that multiplies a procedure faster than evidence can govern it. Low friction plus aligned financial and social incentives turns an unproven operation into a default.
05
Harm and inefficacy both required external disconfirmation
Neither the polio signal nor the absence of benefit emerged from the operating specialty's own gatekeeping; they came from epidemiologists, child-health surveyors, and eventually a randomized trial. A field with no internal mechanism to retire a discredited default will keep performing it until outside measurement forces the boundary.

Aftermath

The material consequence is a vast, largely undocumented toll: tens of millions of children operated on across the century for indications that would not survive a trial, an unknown but non-trivial number dead from anesthesia and hemorrhage, and a cohort of bulbar-polio cases that recent surgery helped create. The durable ripple is intellectual. Glover's variation finding and the ACHA's recursive-examination study became founding texts of health-services research and the modern campaigns against low-value care; "the Glover phenomenon" still names the problem of treatment driven by provider habit. The 1978 NIH assessment and the 1984 Paradise trial reset the standard from impression to evidence, halving volume and re-pointing the operation at sleep-disordered breathing, where it earns its place. Tonsillectomy was never banned, and for the right child it remains good medicine — which is precisely the lesson. It is the textbook case of a useful procedure ruined by indiscriminate use, the operation that taught medicine that the question is not whether a treatment ever helps but whether the particular patient presenting is one of the few it helps. In the literature it survives as the original byword for overtreatment: the assembly-line cure for a disease most of its patients did not have.

Lessons

  1. Demand an indication you can measure in the patient, not a theory you can apply to everyone. When the justification for an intervention is a general doctrine about a suspect organ rather than a documented deficit in the individual, the threshold collapses and everyone qualifies. Require a counted, falsifiable reason before each act, or you are treating the theory, not the person.
  2. Distrust any endpoint defined by the operator's eye. If success or indication rests on a subjective impression that no outcome can contradict, the intervention will always look justified and never look like failure. Anchor the decision to an objective, pre-specified measure that can prove you wrong.
  3. Measure across providers, because variation exposes what individual judgment hides. A tenfold difference in rates between comparable populations is overuse made visible, even when each local decision seems sound. Audit the spread of practice, not just the defensibility of any single case.
  4. Weigh the harms an enthusiastic default conceals, including the rare and seasonal ones. A routine operation can quietly kill — through anesthesia, hemorrhage, or, as here, by opening a portal to a circulating virus. Account for the low-probability catastrophe before scale makes it a body count.
  5. Restrict to the few it helps rather than abolish or universalize. The error was not the operation but its indiscriminate use; the fix was a strict, evidence-defined boundary. When a treatment helps a narrow band, find that band and hold the line there — do not let the band's existence justify treating everyone.
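The audit in lesson 3 can start with a single number. Small-area analysis commonly summarizes spread with the extremal quotient, the highest rate divided by the lowest; a value near 10 is the tenfold spread Glover found. This is a minimal sketch: the function name `extremal_quotient` is hypothetical, and the district names and rates below are invented for illustration and are not Glover's data.

```python
def extremal_quotient(rates: dict[str, float]) -> float:
    """Highest rate divided by lowest rate across comparable areas.

    Assumes every rate is positive and expressed in the same units
    (for example, operations per 1,000 children per year).
    """
    if not rates:
        raise ValueError("need at least one area")
    values = list(rates.values())
    lowest = min(values)
    if lowest <= 0:
        raise ValueError("rates must be positive")
    return max(values) / lowest


# Hypothetical districts with similar disease burden; rates are invented.
district_rates = {"District A": 2.0, "District B": 6.5, "District C": 20.0}
print(f"extremal quotient: {extremal_quotient(district_rates):.1f}")  # prints "extremal quotient: 10.0"
```

The quotient is easy to read but sensitive to a single extreme area, so variation studies often pair it with a fuller spread measure such as the coefficient of variation before concluding that practice, not disease, drives the difference.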

References