Meta-analysis (the formal combination of the research results from multiple studies) is widely used, but with little general understanding of its limitations and uncertainties. There is something quite appealing about collecting all the available research on some question and reducing it to a single figure or a single confidence interval. When properly used, this approach can be useful. However, there is broad evidence that the results of meta-analyses are often not very reliable. LeLorier et al. (1) have shown that many meta-analyses do not agree with the results of subsequent large, randomized trials, and there is little reason to believe that those trials are consistently wrong.
In a review published a few years ago (2) I cited five meta-analyses that produced conclusions that were questionable for a variety of reasons. These included lack of understanding on the part of the meta-analysts of the scientific subject in question or, conversely, lack of understanding on the part of the experts in the scientific subject of the procedure for meta-analysis; failure to consider a host of relevant covariates; and frank bias on the part of the meta-analysis team. Another common problem is lack of homogeneity. When an effect exists, its size may vary substantially from one population to another, such that no combined estimate can have much meaning. (For example, if the rate of some disease is 5 percent among men and 1 percent among women, does it make sense to find that the rate is 3 percent for a person of "average" sex?)
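The arithmetic behind the parenthetical example can be sketched in a few lines of Python. The equal weighting of the two groups is an assumption made only for illustration, and the function name is hypothetical.

```python
def pooled_rate(rates, weights):
    """Weighted average of subgroup rates (weights need not sum to 1)."""
    total = sum(weights)
    return sum(r * w for r, w in zip(rates, weights)) / total

# Rates from the example in the text: 5 percent in men, 1 percent in women.
rate_men = 0.05
rate_women = 0.01

# Assumption: the two groups are the same size, so they get equal weight.
pooled = pooled_rate([rate_men, rate_women], [1, 1])

# How far the single combined figure sits from each subgroup's own rate.
ratio_to_men = pooled / rate_men      # pooled rate as a fraction of the men's rate
ratio_to_women = pooled / rate_women  # pooled rate as a multiple of the women's rate

print(f"pooled rate: {pooled:.3f}")
print(f"pooled / men: {ratio_to_men:.2f}, pooled / women: {ratio_to_women:.2f}")
```

Under these assumptions the combined figure understates the rate for men and overstates the rate for women, so it describes neither group.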
Finally, research studies are not all of high quality, and there is no good way to adjust meta-analyses for variations in quality. Some authors have prepared checklists that can be reduced to a quality score. Studies are commonly weighted according to their quality scores, but the practice is not universal, and even when formal scoring systems are used, poor studies are often weighted too heavily. If some reports are given a quality score of 95 or 100 (of a possible 100), does it make sense for a meta-analysis to include studies scored as 50 and give them 50 percent of the weight given to a nearly perfect study?
Meta-analysis is commonly designed as a series of operations. First, the problem must be stated in terms that can be studied (this sometimes is the hardest step). Second, all the available sources of potentially relevant data must be found and the reports collected. Third, each report is evaluated and an individual summary measure derived (for example, the incidence rate of a disease or an odds ratio). Fourth, the collection of summary measures is interpreted, and a single "best estimate" is derived. Finally, the findings of the meta-analysis are presented. Of these, the fourth step is the most controversial, and because of its limitations, it is sometimes omitted.
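As a rough illustration of the fourth step, the sketch below pools study-level odds ratios by inverse-variance weighting on the log scale, which is one common fixed-effect approach. Nothing in the text says that this is the method used in any particular meta-analysis; the odds ratios, standard errors, and function name are all hypothetical.

```python
import math

def inverse_variance_pool(log_estimates, standard_errors):
    """Fixed-effect pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so precise studies dominate.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical summary measures from three studies (not real data).
odds_ratios = [1.2, 1.5, 1.1]
standard_errors = [0.20, 0.40, 0.10]  # standard errors of the log odds ratios

log_ors = [math.log(x) for x in odds_ratios]
pooled_log, pooled_se = inverse_variance_pool(log_ors, standard_errors)

# Back-transform to the odds-ratio scale with an approximate 95% interval.
pooled_or = math.exp(pooled_log)
ci_low = math.exp(pooled_log - 1.96 * pooled_se)
ci_high = math.exp(pooled_log + 1.96 * pooled_se)

print(f"pooled OR {pooled_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The weights depend only on each study's precision. Quality, bias, and differences among populations do not enter the calculation at all, which is part of why this step is controversial.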
In this issue of the Journal, He et al. (3) report a meta-analysis of epidemiologic studies of the relation between coronary heart disease and passive smoking (also known as exposure to environmental tobacco smoke). With regard to this important subject, there is no reliable substitute for epidemiologic research, for several reasons: responses in animals may not be like those in humans, laboratory studies involving human subjects must necessarily be of short duration, and reports of clinical series are subject to a range of serious biases. Can meta-analysis of epidemiologic studies on this topic provide a more reliable conclusion than a thoughtful review of the usual type? There are reasons to think that it cannot.
The first reason is the quality of the data. He et al. (3) found an association between coronary heart disease and environmental tobacco smoke. Most studies of lung cancer and this risk factor have likewise reported a positive association, but those findings have been received with some skepticism because of concern about the quality of the data. Among the reasons for concern are a possible tendency of nonsmokers with lung cancer to look for some external reason (for instance, smoking by a spouse or coworker) for an otherwise inexplicable disease, inaccuracies in the reporting of exposure to environmental tobacco smoke, and reluctance to report a personal history of smoking. He et al. gave little consideration to such possible problems with the quality of the studies they analyzed. Surely not all those studies were perfect.
A second reason for concern is the procedure for meta-analysis itself. The published literature on some topics may reflect the greater likelihood of publication of positive results than of negative results. When study-to-study randomness is considered, the lack of publication of negative studies can sometimes be inferred by analyzing the probability distribution of the results of the studies that have been published. If only the positive part of the probability distribution is represented in the literature, it can be inferred that small negative studies may not have been reported. He et al. (3) examined this matter and obtained a P value that did not indicate statistical significance but that did not exclude the possibility of publication bias. The absence of proof of such bias is not proof of its absence. Analysis of a total of 18 studies, as in this case, can hardly provide much statistical power to detect publication bias.
The authors do not comment on the remarkable uniformity of the findings of the 18 studies, despite the large variations in study design, methods, and populations. For example, if environmental tobacco smoke causes coronary heart disease, why are estimates of this effect from studies that include exposure in the workplace about the same as those from studies that do not? Figure 1 in the report by He et al. shows that study-by-study "best estimates" of the relative risk of coronary heart disease associated with environmental tobacco smoke ranged from slightly over 1.0 to about 2.2. This seems to be a very small range, considering the random variations present in the samples, most of which were small; the large differences in both the methods and the populations examined; the likelihood of confounding, for which there was no adjustment; and the failure to consider the "dosage" of environmental tobacco smoke. A great deal of uniformity among the results of independent studies of a particular phenomenon is not necessarily good: it can suggest consistency in bias rather than consistency in real effects.
Interpretation of Figure 2 in the article is difficult because the reported "linear trend" apparently included analysis of data from persons with zero exposure to environmental tobacco smoke. In view of the potential sources of bias noted above, and in view of the possibility that the never-exposed group had a disproportionately high percentage of persons from population segments generally more careful about health-related behavior (including some religious groups), these data would be more convincing if they showed a significant trend of higher risk with higher degrees of exposure, without including the never-exposed groups.
The authors compared the risk of coronary heart disease in exposed and nonexposed persons in terms of relative risks, but they did not defend their use of that statistical measure or show that it is compatible with their findings. This approach implies a multiplicative model (in which risk factors are multiplied rather than, say, added), but why should we expect a complex biologic relation to follow this type of model rather than a model that is linear, or otherwise not multiplicative? In general, mathematical convenience is a common but weak reason for studying relative risks (or odds ratios, their surrogates) or any other specific mathematical model.
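The difference between multiplying and adding risk factors can be made concrete with a small Python sketch. The baseline risk, the two relative risks, and the function names are hypothetical values chosen only to show how the two models diverge.

```python
def joint_risk_multiplicative(baseline, rr_a, rr_b):
    """Risk with both factors present when relative risks multiply."""
    return baseline * rr_a * rr_b

def joint_risk_additive(baseline, rr_a, rr_b):
    """Risk with both factors present when excess risks add."""
    excess_a = baseline * (rr_a - 1.0)  # extra risk due to factor A alone
    excess_b = baseline * (rr_b - 1.0)  # extra risk due to factor B alone
    return baseline + excess_a + excess_b

# Hypothetical inputs: 1 percent baseline risk, relative risks of 2 and 3.
baseline, rr_a, rr_b = 0.01, 2.0, 3.0

risk_mult = joint_risk_multiplicative(baseline, rr_a, rr_b)
risk_add = joint_risk_additive(baseline, rr_a, rr_b)

# Under the additive model, the relative risk for A is no longer constant:
# among people already exposed to B it shrinks below rr_a.
rr_a_given_b_additive = risk_add / (baseline * rr_b)

print(f"multiplicative joint risk: {risk_mult:.3f}")
print(f"additive joint risk:       {risk_add:.3f}")
print(f"RR for A among B-exposed (additive model): {rr_a_given_b_additive:.2f}")
```

If the underlying relation is closer to additive, a single relative risk does not summarize the effect of a factor, because its value changes with the level of the other factors.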
Perhaps the most troubling aspect of these results is the size of the effect reported. Is an increase in the incidence of coronary heart disease of 25 percent associated with passive smoking compatible with the generally reported increase of about 75 percent among active smokers (a threefold difference)? I find it hard to understand how environmental tobacco smoke, which is far more dilute than actively inhaled smoke, could have an effect that is such a large fraction of the added risk of coronary heart disease among active smokers. Some estimates of the increase in the risk of lung cancer in association with environmental tobacco smoke are also about 25 percent, but the risk among active smokers is increased by about 1200 percent over that among nonsmokers. This finding leads to the more plausible conclusion that the added risk of lung cancer that is due to environmental tobacco smoke may be about 2 percent of the risk associated with active smoking.
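The comparison rests on simple ratios of excess risks, which can be checked with a short Python sketch. The percentages are the ones quoted in the paragraph above; the function name is hypothetical.

```python
def excess_fraction(passive_excess_pct, active_excess_pct):
    """Passive-smoking excess risk as a fraction of the active-smoking excess."""
    return passive_excess_pct / active_excess_pct

# Coronary heart disease: 25 percent excess (passive) vs about 75 percent (active).
chd_fraction = excess_fraction(25, 75)

# Lung cancer: 25 percent excess (passive) vs about 1200 percent (active).
lung_fraction = excess_fraction(25, 1200)

print(f"coronary heart disease: {chd_fraction:.0%} of the active-smoking excess")
print(f"lung cancer:            {lung_fraction:.1%} of the active-smoking excess")
```

On these figures the passive-smoking effect is about one third of the active-smoking effect for coronary heart disease, but only about 2 percent for lung cancer.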
The clear effects of active smoking on coronary heart disease give us good reason to think that passive smoking might have a similar but much smaller effect. The meta-analysis reported by He et al. (3) meets the accepted technical criteria for meta-analysis, but it suffers from problems inherent in the method, such as deficiencies in the data analyzed. Therefore, I regretfully conclude that we still do not know, with accuracy, how much or even whether exposure to environmental tobacco smoke increases the risk of coronary heart disease.
John C. Bailar III, M.D., Ph.D.
University of Chicago
Chicago, IL 60637