Equivocal equivalence
15/07/98
For the past year we have been submitting the attached paper to all major medical journals.
We have not received specific criticism, but only kind replies making us understand that the issue
is not really interesting.
We respect the peer review system and are ready to acknowledge that our approach to the
topic may have made it seem unattractive. Nevertheless we believe that the matter has important
implications. Therefore, although we would have preferred to see our opinion challenged and a
debate opened in its more natural context, we are making this paper public on the Internet in the
hope of arousing comments and suggestions.
Vittorio Bertele', MD
Valter Torri, MD
Silvio Garattini, MD
Vittorio Bertele’, MD, Valter Torri, MD, Silvio Garattini, MD.
"Mario Negri" Institute, Milan, and Consorzio "Mario Negri Sud", Santa Maria Imbaro, Italy.
Corresponding author
Silvio Garattini, "Mario Negri" Institute, 20157 Milan, via Eritrea 62
Tel. +39.0239014.350 - Fax: +39.023546277 - E-mail: Garattini@marionegri.it
Running title
equivalence trials
Key words
randomised clinical trials, equivalence, efficacy, safety, ethics
Progress in certain areas of medicine has reduced morbidity and mortality to such an extent
as to require increasingly large numbers of patients to obtain proof of a better treatment. Thus,
it becomes more and more difficult and expensive to design experiments to demonstrate that new
drugs are better than the ones already available. New drugs - mostly me-too drugs - are
consequently registered on the basis of supposed equivalence, claiming, but generally not proving,
a better profile of adverse reactions.
Equivalent interventions may be useful to patients, who may indeed benefit from the
possibility of choosing among drugs as safe and effective as the standard comparator, but more
tolerable, more practical, or just cheaper. Such treatments would be worthwhile for public health
services too and should therefore merit careful attention from regulatory authorities. Concern has
recently been expressed (1) about the difficulty of demonstrating equivalence between treatments.
However, equivalence mostly raises doubts about its own adequacy as a clinical criterion, about the
appropriateness of its measures, and about its applicability.
How can equivalence be assessed?
Principles and methodological requirements of equivalence trials have been carefully defined
(2). As absolute equivalence cannot be demonstrated, we can at least assess whether an observed
difference does not exceed a predefined limit, that is, whether the confidence interval of the
difference lies within a given range (from -Δ to +Δ). This difference, established as the equivalence
criterion, must be small enough to ensure that the effects of the new treatment and the standard
one are clinically indistinguishable.
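The criterion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited papers: it uses a normal (Wald) approximation for the difference between two proportions, and the function names, the example counts and the margin of one percentage point are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_ci(events_new, n_new, events_std, n_std, confidence=0.95):
    """Wald confidence interval for the difference in event rates (new - standard)."""
    p_new = events_new / n_new
    p_std = events_std / n_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_new - p_std
    return diff - z * se, diff + z * se


def is_equivalent(ci, margin):
    """Equivalence is claimed only if the whole interval lies inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


# Hypothetical trial: 5.0% mortality in both arms, margin of 1 percentage point.
large_trial = difference_ci(500, 10_000, 500, 10_000)
small_trial = difference_ci(50, 1_000, 50, 1_000)
print(is_equivalent(large_trial, margin=0.01))
print(is_equivalent(small_trial, margin=0.01))
```

With identical observed rates, the larger hypothetical trial satisfies the criterion while the smaller one does not, because its interval is too wide to fit within the range.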
Assessing such an effect needs sample sizes large enough to limit the risk either of assuming
an equivalence that in fact does not exist (α error), or of not seeing it when in fact it does
exist (β error). Generally accepted limits of α and β error are 2.5-5.0% (α=0.025-0.05) and 10-20%
(1-β=0.90-0.80).
The small error - that is, the narrow confidence interval - required for equivalence trials
theoretically involves populations as large as, or even larger than, those enrolled in efficacy
trials. In practice, however, this is not so, for several reasons. One is that the first
documentation of efficacy often requires additional confirmatory studies.
For instance, the test of hypothesis (a) illustrated in the Figure - concerning the potential
of a current standard treatment to reduce mortality from 6.5 to 5.0% - might have involved 20 or 30
thousand patients instead of the 10,000 strictly needed. Also, equivalence trials need smaller
samples since their objective is merely to verify that the new treatment is not worse than the
standard one, not to establish that it is similar to or better (compare the samples required for
the one sided and the two sided tests of hypotheses (b) to (e) in the Figure).
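As a rough check on the order of magnitude quoted for hypothesis (a), the following sketch applies the standard normal-approximation formula for comparing two proportions. The choice of a two-sided α of 0.05 and a power of 0.90 is an assumption (the Figure's actual inputs are not reproduced here), and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def superiority_n_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Patients per arm to detect p_control vs p_treated (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Mortality reduced from 6.5% to 5.0%, as in hypothesis (a).
n = superiority_n_per_arm(0.065, 0.050)
print(n, 2 * n)  # per arm, total
```

Under these assumptions the result is a little over 5,000 patients per arm, or about 10,000 in total, in line with the figure given in the text.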
Moreover, if one envisages (or claims) some advantage of the new drug over the old one, the
sample needed becomes even smaller. Compare, for example, hypotheses (b) to (e) in the Figure: the
greater the hoped-for benefit of the test drug the fewer the patients needed to prove that it is
"as effective as" or even "not worse than" the old one.
Under these circumstances the formal test of the alleged superiority would require ten times
more patients (see (f) vs (e) in the Figure).
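The dependence of sample size on the hoped-for benefit can be illustrated with the usual normal-approximation formula for a one-sided "not worse than" test. This is a sketch under assumptions, not a reconstruction of the Figure: the one-sided α of 0.025, the power of 0.90, the assumed mortality rates for the new drug and the function name are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_arm(p_std, p_new, margin, alpha=0.025, power=0.90):
    """Patients per arm to show p_new < p_std + margin (one-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_std * (1 - p_std) + p_new * (1 - p_new)
    distance = (p_std + margin) - p_new  # gap between the tolerated limit and the assumed true rate
    return ceil((z_alpha + z_beta) ** 2 * variance / distance ** 2)


# Standard therapy: 5.0% mortality; upper limit tolerated for the new drug: 6.0%.
for assumed_new_rate in (0.050, 0.048, 0.045, 0.040):
    n = noninferiority_n_per_arm(0.050, assumed_new_rate, margin=0.010)
    print(f"assumed mortality {assumed_new_rate:.1%}: {n} patients per arm")

# With margin=0 the same formula becomes a one-sided superiority test.
ratio = (noninferiority_n_per_arm(0.050, 0.045, margin=0.0)
         / noninferiority_n_per_arm(0.050, 0.045, margin=0.010))
print(f"superiority needs about {ratio:.0f} times as many patients")
```

The sample size falls steadily as the assumed mortality with the new drug decreases. For an assumed rate of 4.5%, the superiority test needs roughly nine times as many patients as the "not worse than" test, the same order of magnitude as the tenfold figure quoted above.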
How far does equivalence assure efficacy?
Most recommended criteria have already been adopted by regulatory authorities. Still,
sometimes their practical application discloses areas of uncertainty and produces questionable
findings.
Let us assume, for instance, that a new drug becomes available for the treatment of our
theoretical disease, which carries 5% mortality with the standard treatment. The new drug is
supposed to be a valuable alternative to standard therapy because of its ease of administration
and its alleged clinical equivalence.
So, since standard treatment on its own has reduced absolute mortality by 1.5%, with an upper
limit of mortality approaching 6.5% (i.e., with conventional statistical significance), the upper limit of
mortality with the new drug is conservatively set at 6%, that is 0.5% less than the worst result
expected with the standard therapy (see (b) to (e) in the Figure). What more could one want?
However, such a hypothesis considers it acceptable that the new treatment, which at most will offer
the same protection as current therapies, could actually throw away 2/3 of the advantage (1% out of
1.5%) so far ensured by them (see in the Figure the upper limit of (b) to (e) approaching 6%
mortality compared to the reduction - from 6.5 to 5.0% - observed in (a)).
Once approved by the regulatory agency and administered - say - to one million people each
year, the new drug could still save 15,000 lives, reducing deaths from 65,000 to 50,000 (that is,
from 6.5 to 5.0%), like the old therapy. It might even save more lives than the comparator, but we
shall never know for sure.
What we must realize is that the new treatment might avoid only 5,000 deaths (reducing them
from 65,000 to 60,000, i.e. from 6.5 to 6.0%, or the upper limit of (b) to (e) in the Figure). This
would still be an advantage compared to the worst result with the standard treatment (nearly 65,000
deaths, i.e. the upper limit of (a) approaching 6.5% mortality), but it would nevertheless be
10,000 more deaths than were actually observed in the formal trial of the latter (50,000, i.e. the
5.0% mortality observed in (a)) and as many as 25,000 more deaths compared to the best estimated
benefit attributable to it (deaths limited to 35,000, that is the 3.5% mortality or the lower limit
of (a)).
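The arithmetic of the last two paragraphs can be laid out explicitly. The sketch below uses only the rates and the one-million-patient population given in the text; the variable and function names are hypothetical.

```python
PATIENTS_PER_YEAR = 1_000_000


def deaths(mortality_rate, patients=PATIENTS_PER_YEAR):
    """Expected number of deaths per year at a given mortality rate."""
    return round(mortality_rate * patients)


untreated = deaths(0.065)          # also the worst case for standard therapy, upper limit of (a)
standard_observed = deaths(0.050)  # mortality observed with standard therapy in (a)
standard_best = deaths(0.035)      # lower limit of (a)
new_worst = deaths(0.060)          # upper limit of (b) to (e)

saved_by_standard = untreated - standard_observed
saved_by_new_worst = untreated - new_worst
excess_vs_observed = new_worst - standard_observed
excess_vs_best = new_worst - standard_best
benefit_lost = excess_vs_observed / saved_by_standard

print(saved_by_standard, saved_by_new_worst, excess_vs_observed, excess_vs_best)
print(f"fraction of the standard benefit lost in the worst case: {benefit_lost:.2f}")
```

The worst case accepted by the equivalence hypothesis thus corresponds to 5,000 deaths avoided instead of 15,000, that is, two thirds of the benefit of standard therapy given up.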
In addition, sometimes mortality in itself is hardly a satisfactory parameter of equivalence.
If we were dealing, for instance, with a new antithrombotic agent in the treatment of acute
myocardial infarction, an excess of disabling strokes associated with the test drug could unbalance
its equivalence in avoiding deaths; likewise, an excess of rescue coronary angioplasties could
nullify the prospect of cost savings. It can be argued that such effectiveness criteria would
better apply to most comparative trials as well, nowadays; but in equivalence trials they are
essential.
Different patterns of causes of death should be taken into consideration as well, since they
can serve as indicators of morbidity specifically related to the new treatment. For instance, the
death rate with the equivalent treatment may be the same as with the standard one, but it is
important to know whether infection-, stroke- or tumor-related deaths replace (some of) the
coronary deaths occurring with the standard treatment.
How far does equivalence ensure safety?
Even among drugs belonging to the same therapeutic class, small changes in the chemical
structure may change the profile of adverse reactions. But how can equivalence trials - smaller
than those investigating efficacy - detect adverse events usually rarer than those selected as
outcome measures of the study? In the sample trial illustrated in the Figure - take as examples
hypotheses (d) and (e) - adverse events occurring twice as frequently with the test drug as with
the standard one (e.g. 1.0 vs 0.5%) would appear in 30 instead of 15 patients (d) or in 8 instead
of 4 (e).
Such differences might easily be inappropriately attributed to chance, but they are still a
major burden for the trial population, especially in the absence of clinical advantages, and all the
more so once applied to the general population. If given a marketing authorisation and administered
to one million people each year, the new drug could cause episodes such as bleeding, hypotension,
hepatic or renal dysfunction, etc, in 5,000 more patients with - at best - no proven advantage on
the index pathology.
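The event counts quoted above follow directly from the rates. In the sketch below the per-arm sample sizes of 3,000 for (d) and 800 for (e) are not stated in the text; they are inferred from the counts it gives (15 and 4 events at a rate of 0.5%) and should be read as assumptions, as should the function name.

```python
STANDARD_RATE = 0.005  # adverse events with the standard drug (0.5%)
TEST_RATE = 0.010      # adverse events with the test drug (1.0%)


def expected_events(rate, patients):
    """Expected number of patients with an adverse event."""
    return round(rate * patients)


# Per-arm sizes inferred from the counts quoted in the text (assumption).
for label, n_per_arm in (("d", 3_000), ("e", 800)):
    test = expected_events(TEST_RATE, n_per_arm)
    standard = expected_events(STANDARD_RATE, n_per_arm)
    print(f"hypothesis ({label}): {test} vs {standard} events per arm")

# Same rates applied to one million treated patients per year.
population = 1_000_000
excess = expected_events(TEST_RATE, population) - expected_events(STANDARD_RATE, population)
print(f"excess adverse events per year: {excess}")
```

A difference of a few events per arm in the trial becomes several thousand additional patients affected once the drug reaches the general population.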
How far is equivalence acceptable?
Such an unsettled context makes us wonder whether equivalence trials are ethical, in that
they randomize half the patients to a treatment which by definition is assumed to be somewhat worse
than the comparator (the null hypothesis of equivalence trials) in order - it is hoped - to show at
last that this is not the case.
In substance the agreement with patients enrolled in such studies would run as follows:
"Let us treat you with something that at best is the same as what you would have had before, but
might also reduce - though this is unlikely - most of the advantages previously attained in your
condition. It might even benefit you more than any current therapy but, should that actually
happen, we shall not be able to prove it. Nor do we have much chance of letting you know whether
the new treatment may somehow bother or even harm you more than the standard one".
Patients would be understandably worried at these prospects and could quite rightly refuse
informed consent, and so should ethics committees.
Who wants equivalence?
Besides any clinical consideration, experience shows that new drugs - even the me-too ones -
are very seldom less costly than available therapies.
Briefly, therefore, equivalence apparently offers scant advantages - if any - to patients and
health services. Presumably, however, equivalence serves the pharmaceutical industry more than
anyone else. Indeed, the right to a slice of the market is just what is needed to sell ideas that
took too long to develop or were even conceived too late. It seems illogical that companies
arriving on the market late should have the right to benefits that by definition cannot be enjoyed
by the company that developed the original innovative drug. By rewarding companies that take
advantage of the fact that less research is needed to develop their drug and to document its
clinical efficacy, equivalence will probably achieve the double negative goal of encouraging
strategies not in the interest of public health and, unavoidably, of discouraging any effort for
innovative research. Thanks to this policy, the market could shortly be flooded with hundreds of
"equivalent" pharmaceutical products but only a few more of the drugs that patients really need.
In conclusion, achieving and testing equivalence is going to take up time and resources
which might well be better devoted to patients’ interests. Although several methodological issues
have been carefully addressed, many questions still puzzle clinicians and regulators. A vigorous
debate is needed among all parties concerned - patients, clinical investigators, pharmaceutical
industry, regulatory agencies - in order to stimulate everybody’s awareness of the potential and
limits of this fashionable criterion of clinical efficacy.
References
1. Ware JH, Antman EM. Equivalence trials. N Engl J Med 1997;337:1159-61.
2. Jones B, Jarvis P, Lewis JA et al. Trials to assess equivalence: the importance of rigorous
methods. Br Med J 1996;313:36-39.
Figure
Schematic representation of hypotheses addressing equivalence (b) to (e) or superiority (f)
of a new treatment with respect to the standard therapy (a). The sample size required to assess each
hypothesis is shown. Meanings and implications of different approaches to equivalence documentation
are discussed in the text.