When is randomization used?
Every subject is as likely as any other to be assigned to the treatment or control group. Randomization is generally achieved by employing a computer program containing a random number generator. Randomization procedures differ based upon the research design of the experiment. Individuals or groups may be randomly assigned to treatment or control groups. Some research designs stratify subjects by geographic, demographic, or other factors prior to random assignment in order to maximize the statistical power of the estimated effect of the treatment.
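As an illustration, computerized simple randomization can be sketched in a few lines of Python; the function name, group labels, and seed below are our own illustrative choices, not taken from any particular trial system:

```python
import random

def simple_randomize(n_subjects, seed=2024):
    # Each subject is equally likely to land in either group,
    # mimicking a computer program with a random number generator.
    # Fixing the seed makes the allocation list reproducible and auditable.
    rng = random.Random(seed)
    return [rng.choice(["treatment", "control"]) for _ in range(n_subjects)]

assignments = simple_randomize(10)
```

Re-running the function with the same seed reproduces the same allocation list, which is why seeded generators are preferred over ad hoc sources of randomness.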
Information about the randomization procedure is included in each experiment summary on the site. What are the advantages of randomized experimental designs? Randomized experimental design yields the most accurate analysis of the effect of an intervention. By randomly assigning subjects to be in the group that receives the treatment or to be in the control group, researchers can measure the effect of the mobilization method regardless of other factors that may make some people or groups more likely to participate in the political process.
Kalish and Begg [ 57 ] state that the major objective of a comparative clinical trial is to provide a precise and valid comparison.
To achieve this, the trial design should be such that it: (1) prevents bias; (2) ensures an efficient treatment comparison; and (3) is simple to implement to minimize operational errors. Table 1 elaborates on these considerations, focusing on restricted randomization procedures for randomized trials.
Before delving into a detailed discussion, let us introduce some important definitions. Treatment balance and allocation randomness are two competing requirements in the design of an RCT. Restricted randomization procedures that provide a good tradeoff between these two criteria are desirable in practice.
Greater forcing of balance implies lack of randomness. A procedure that lacks randomness may be susceptible to selection bias [ 16 ], which is a prominent issue in open-label trials with a single center or with randomization stratified by center, where the investigator knows the sequence of all previous treatment assignments.
A classic approach to quantifying the degree of susceptibility of a procedure to selection bias is the Blackwell-Hodges model [ 28 ], which measures the expected bias an investigator can introduce by guessing upcoming assignments. This quantity is zero for CRD and positive for restricted randomization procedures; greater values indicate higher expected bias. Matts and Lachin [ 30 ] suggested taking the expected proportion of deterministic assignments in a sequence as another measure of lack of randomness. In the literature, various restricted randomization procedures have been compared in terms of balance and randomness [ 50 , 58 , 59 ].
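To make the correct-guess measure concrete, here is a small Monte Carlo sketch (our own illustrative code, not taken from the cited papers) estimating the expected proportion of correct guesses under the convergence guessing strategy, in which the guesser always picks the arm with fewer assignments so far and a tie is a coin-flip guess credited as correct half the time. For complete randomization the estimate stays at 0.5, while a permuted block design with block size 2 sits at 0.75:

```python
import random

def pbd_sequence(n, block_size, rng):
    # Permuted block design: each block is a random permutation
    # of equal numbers of A's and B's.
    seq = []
    while len(seq) < n:
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]

def expected_correct_guesses(gen, n, reps=2000, seed=1):
    # Convergence strategy: guess the under-allocated arm;
    # credit 0.5 for a tie (a random guess), 1.0 for a correct guess.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        a = b = 0
        for t in gen(n, rng):
            if a == b:
                total += 0.5
            elif (a < b and t == "A") or (b < a and t == "B"):
                total += 1.0
            if t == "A":
                a += 1
            else:
                b += 1
    return total / (reps * n)

crd = expected_correct_guesses(lambda n, r: [r.choice("AB") for _ in range(n)], 16)
pbd2 = expected_correct_guesses(lambda n, r: pbd_sequence(n, 2, r), 16)
```

With block size 2, every odd allocation is a coin flip and every even allocation is deterministic, which is exactly why such small blocks are considered highly predictable.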
For instance, Zhao et al. compared several restricted randomization procedures; the key criteria were the maximum absolute imbalance and the correct guess probability. Similar findings confirming the utility of the big stick design were recently reported by Hilgers et al. Validity of a statistical procedure essentially means that the procedure provides correct statistical inference following an RCT. In particular, a chosen statistical test is valid if it controls the chance of a false positive finding; that is, the pre-specified probability of a type I error of the test is achieved but not exceeded.
The strong control of the type I error rate is a major prerequisite for any confirmatory RCT. Efficiency means high statistical power for detecting meaningful treatment differences when they exist, and high accuracy of estimation of treatment effects.
Both validity and efficiency are major requirements of any RCT, and both of these aspects are intertwined with treatment balance and allocation randomness. Restricted randomization designs, when properly implemented, provide solid ground for valid and efficient statistical inference.
However, a careful consideration of different options can help an investigator optimize the choice of a randomization procedure for their clinical trial. Let us start with statistical efficiency. Equal allocation frequently maximizes power and estimation precision. When the primary outcome follows a more complex statistical model, optimal allocation may be unequal across the treatment groups; however, equal allocation is still nearly optimal for binary outcomes [ 62 , 63 ], survival outcomes [ 64 ], and possibly more complex data types [ 65 , 66 ].
Therefore, a randomization design that balances treatment numbers frequently promotes efficiency of the treatment comparison. As regards inferential validity, it is important to distinguish two approaches to statistical inference after the RCT: an invoked population model and a randomization model [ 10 ].
For a given randomization procedure, these two approaches generally produce similar results when the assumption of normal random sampling (and some other assumptions) is satisfied, but the randomization model may be more robust when model assumptions are violated.
Another important issue that may interfere with validity is selection bias. Some authors showed theoretically that PBDs with small block sizes may result in serious inflation of the type I error rate under a selection bias model [ 69 , 70 , 71 ]. However, for already completed studies with evidence of selection bias [ 72 ], special statistical adjustments are warranted to ensure validity of the results [ 73 , 74 , 75 ].
With the current state of information technology, implementation of randomization in RCTs should be straightforward. Validated randomization systems are emerging, and they can handle randomization designs of increasing complexity for clinical trials that are run globally. However, some important points merit consideration.
The first point has to do with how a randomization sequence is generated and implemented. One should distinguish between advance and adaptive randomization [ 16 ]. While enumeration of all possible sequences and their probabilities is feasible and may be useful for trials with small sample sizes, the task becomes computationally prohibitive and unnecessary for moderate or large samples.
In practice, Monte Carlo simulation can be used to approximate the probability distribution of the reference set of all randomization sequences for a chosen randomization procedure. A limitation of advance randomization is that the sequence of treatment assignments must be generated upfront, and proper security measures must be in place to protect the sequence. With adaptive randomization, by contrast, each assignment is generated at the time of allocation, and for many procedures it depends only on the current treatment imbalance rather than on the full history. This is referred to as the Markov property [ 77 ], which makes a procedure easy to implement sequentially.
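The Markov property can be sketched with the big stick design: the next assignment depends only on the current imbalance and a maximum tolerated imbalance (MTI) boundary, so the sequence can be generated one patient at a time. The code below is an illustrative sketch; the function and parameter names are ours:

```python
import random

def big_stick_assign(imbalance, mti, rng):
    # Next assignment depends only on the current imbalance N_A - N_B:
    # the Markov property that makes sequential implementation easy.
    if imbalance >= mti:
        return "B"          # boundary reached: force B to reduce imbalance
    if imbalance <= -mti:
        return "A"          # boundary reached: force A to reduce imbalance
    return rng.choice("AB") # otherwise a fair coin

def big_stick_sequence(n, mti=3, seed=7):
    rng = random.Random(seed)
    seq, imb = [], 0
    for _ in range(n):
        t = big_stick_assign(imb, mti, rng)
        seq.append(t)
        imb += 1 if t == "A" else -1
    return seq

seq = big_stick_sequence(50)
```

By construction the running imbalance never exceeds the MTI in absolute value, which is the design's balance guarantee.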
Some restricted randomization procedures lack this property and are more cumbersome to implement sequentially. The second point has to do with how the final data analysis is performed. With an invoked population model, the analysis is conditional on the design, and the randomization is ignored in the analysis. With a randomization model, the randomization itself forms the basis for statistical inference.
Reference [ 14 ] provides a contemporaneous overview of randomization-based inference in clinical trials. Several other papers provide important technical details on randomization-based tests, including justification for control of type I error rate with these tests [ 22 , 78 , 79 ].
In practice, Monte Carlo simulation can be used to estimate randomization-based p-values [ 10 ]. The design of any RCT starts with formulation of the trial objectives and research questions of interest [ 3 , 31 ]. The choice of a randomization procedure is an integral part of the study design. A structured approach for selecting an appropriate randomization procedure for an RCT was proposed by Hilgers et al. Here we outline the thinking process one may follow when evaluating different candidate randomization procedures.
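The Monte Carlo estimation of a randomization-based p-value mentioned above can be sketched as follows, here for simple randomization and an absolute mean-difference statistic; the data and function names are hypothetical illustrations, not from the cited references:

```python
import random

def randomization_p_value(outcomes, assignments, reps=5000, seed=11):
    # Re-randomization test: regenerate many assignment sequences under
    # the same (here: simple) randomization procedure, and compare the
    # observed mean difference against this reference distribution.
    rng = random.Random(seed)

    def mean_diff(assign):
        a = [y for y, g in zip(outcomes, assign) if g == "A"]
        b = [y for y, g in zip(outcomes, assign) if g == "B"]
        if not a or not b:
            return 0.0  # degenerate split: no between-group difference
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(assignments)
    hits = sum(
        mean_diff([rng.choice("AB") for _ in outcomes]) >= observed
        for _ in range(reps)
    )
    return (hits + 1) / (reps + 1)  # add-one correction avoids a zero p-value

y = [5.1, 4.8, 6.0, 5.5, 3.9, 4.2, 3.7, 4.0]  # made-up outcome data
g = ["A", "A", "A", "A", "B", "B", "B", "B"]
p = randomization_p_value(y, g)
```

For a restricted randomization procedure, the resampled sequences would be generated by that procedure rather than by independent coin flips, so the reference set matches the design actually used.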
We start with some general considerations which determine the study design. For small or moderate studies, exact attainment of the target numbers per group may be essential, because even slight imbalance may decrease study power.
Therefore, a randomization design in such studies should equalize the final treatment numbers well. For large trials, the risk of major imbalances is less of a concern, and more random procedures may be acceptable. The length of the recruitment period and the trial duration also matter: many studies are short-term and enroll participants quickly, whereas others are long-term and may have slow patient accrual.
In the latter case, there may be time drifts in patient characteristics, and it is important that the randomization design balances treatment assignments over time. The level of blinding (masking) is another consideration: double-blind, single-blind, or open-label. In double-blind studies with properly implemented allocation concealment, the risk of selection bias is low. By contrast, in open-label studies the risk of selection bias may be high, and the randomization design should provide strong encryption of the randomization sequence to minimize prediction of future allocations.
The number of study centers matters as well. Many modern RCTs are implemented globally at multiple research institutions, whereas some studies are conducted at a single institution. In the latter case, especially in single-institution open-label studies, the randomization design should be chosen very carefully to mitigate the risk of selection bias. An important point to consider is calibration of the design parameters. By fine-tuning these parameters, one can obtain designs with desirable statistical properties.
For instance, references [ 80 , 81 ] provide guidance on how to justify the block size in the PBD to mitigate the risk of selection bias or chronological bias.
The calibration of design parameters can be done using Monte Carlo simulations for the given trial setting. Another important consideration is the scope of randomization procedures to be evaluated. This should be done judiciously, on a case-by-case basis, focusing only on the most reasonable procedures. References [ 50 , 58 , 60 ] provide good examples of simulation studies to facilitate comparisons among various restricted randomization procedures for an RCT.
In parallel with the decision on the scope of randomization procedures to be assessed, one should decide upon the performance criteria against which these designs will be compared. Among others, one might think about the two competing considerations: treatment balance and allocation randomness.
These measures can be calculated either analytically, when formulae are available, or through Monte Carlo simulations. It is also helpful to visualize the selected criteria; this can be done in a number of ways. Such visualizations can help evaluate design characteristics, both overall and at intermediate allocation steps.
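For instance, the expected absolute imbalance at each intermediate allocation step can be estimated by simulation and then plotted as a curve per design; a minimal sketch for complete randomization (our own illustrative code, with illustrative names):

```python
import random

def expected_abs_imbalance(gen, n, reps=2000, seed=3):
    # Monte Carlo estimate of E|N_A(j) - N_B(j)| at every allocation
    # step j: the kind of curve one would plot to compare designs.
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        imb = 0
        for j, t in enumerate(gen(n, rng)):
            imb += 1 if t == "A" else -1
            sums[j] += abs(imb)
    return [s / reps for s in sums]

crd_curve = expected_abs_imbalance(
    lambda n, r: [r.choice("AB") for _ in range(n)], 20
)
```

For complete randomization this curve grows without bound (roughly like the square root of the step number), whereas a balanced design such as a permuted block design keeps it bounded.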
Another way to compare the merits of different randomization procedures is to study their inferential characteristics such as type I error rate and power under different experimental conditions.
Sometimes this can be done analytically, but a more practical approach is to use Monte Carlo simulation. The choice of the modeling and analysis strategy will be context-specific.
Here we outline some considerations that may be useful for this purpose. The data generating mechanism: to simulate individual outcome data, some plausible statistical model must be posited, and the form of the model will depend on the type of outcomes. The true treatment effects. The randomization designs to be compared: the choice of candidate randomization designs and their parameters must be made judiciously.
Data analytic strategy. For any study design, one should pre-specify the data analysis strategy to address the primary research question.
Statistical tests of significance to compare treatment effects may be parametric or nonparametric, with or without adjustment for covariates. The approach to statistical inference: population model-based or randomization-based. These two approaches are expected to yield similar results when the population model assumptions are met, but they may be different if some assumptions are violated.
Randomization-based tests following restricted randomization procedures will control the type I error at the chosen level if the distribution of the test statistic under the null hypothesis is fully specified by the randomization procedure that was used for patient allocation. This is always the case unless there is a major flaw in the design, such as selection bias, whereby the outcome of an individual participant depends on the treatment assignments of previous participants.
Overall, there should be a well-thought-out plan capturing the key questions to be answered, the strategy to address them, the choice of statistical software for simulation and visualization of the results, and other relevant details. In this section we present four examples that illustrate how one may approach evaluation of different randomization design options at the study planning stage.
These 12 procedures can be grouped into five major types. (I) Procedures 1, 2, 3, and 4 achieve exact final balance for a chosen sample size, provided the total sample size is a multiple of the block size. (III) Procedures 7 and 8 are biased coin designs that sequentially adjust randomization according to imbalance, measured as the difference in treatment numbers.
(V) Procedure 12 (CRD) is the most random procedure, achieving balance only for large samples. We first compare the procedures with respect to treatment balance and allocation randomness. At the other extreme, we have PBD(2), for which every odd allocation is made with probability 0.5 and every even allocation is deterministic. Different randomization procedures can be compared graphically. Figure 1 is a plot of expected absolute imbalance vs. allocation step.
Simulated expected absolute imbalance vs. allocation step. Figure 2 is a plot of the expected proportion of correct guesses vs. allocation step. One can observe that for CRD it is a flat pattern at 0.5. Rand exhibits an increasing pattern with overall fewer correct guesses compared to other randomization procedures. For the three GBCD procedures, there is a rapid initial increase followed by a gradual decrease; this makes good sense, because GBCD procedures force greater balance when the trial is small and become more random and less prone to correct guessing as the sample size increases.
Simulated expected proportion of correct guesses vs. allocation step. The other ten designs are closer to (0, 0). Simulated forcing index (x-axis) vs. allocation step. BSD(3) seems to provide the overall best tradeoff between randomness and balance throughout the study. The procedures are ordered by the value of d(50), with smaller values (more red) indicating more optimal performance.
Our next goal is to compare the chosen randomization procedures in terms of validity (control of the type I error rate) and efficiency (power). We shall explore the following four models. M1: a standard setup for a two-sample t-test under a population model. M2: a model in which the outcomes are affected by a linear trend over time [ 67 ]. M3: a setup with a misspecification of the distribution of measurement errors. M4: a setup in which, at each allocation step, the investigator attempts to intelligently guess the upcoming treatment assignment and selectively enrolls a patient who, in their view, would be most suitable for the upcoming treatment.
T3: Randomization-based test based on ranks: this test procedure follows the same logic as T2, except that the test statistic is calculated based on ranks. Figure 5 summarizes the results of a simulation study comparing the 12 randomization designs under 4 models for the outcome (M1, M2, M3, and M4), 4 scenarios for the mean treatment difference (Null, and Alternatives 1, 2, and 3), and 3 statistical tests (T1, T2, and T3). The operating characteristics of interest are the type I error rate under the Null scenario and the power under the Alternative scenarios.
Simulated type I error rate and power of 12 restricted randomization procedures. Four scenarios for the treatment mean difference (Null; Alternatives 1, 2, and 3). Three statistical tests (T1: two-sample t-test; T2: randomization-based test using the mean difference; T3: randomization-based test using ranks).
From Fig. 5, under the normal random sampling model all designs and tests maintain the type I error rate and achieve similar power. In other words, when population model assumptions are satisfied, any combination of design and analysis should work well and yield reliable and consistent results. These results are consistent with some previous findings in the literature [ 67 , 68 ]. Under the linear time trend model, power is reduced significantly compared to the normal random sampling scenario. The t-test seems to be most affected, and the randomization-based test using ranks is most robust for a majority of the designs.
Remarkably, for CRD the power is similar with all three tests. This signifies the usefulness of randomization-based inference in situations when outcome data are subject to a linear time trend, and the importance of applying randomization-based tests at least as supplemental analyses to likelihood-based test procedures.
As regards power, all designs show similar, consistently degraded performance: the t-test is least powerful, and the randomization-based test using ranks has the highest power. Overall, under misspecification of the error distribution a randomization-based test using ranks is most appropriate; yet one should acknowledge that its power is still lower than expected.
For the eleven other procedures, inflations of the type I error rate were observed. In general, the more random the design, the less it was affected by selection bias. These results are consistent with the theory of Blackwell and Hodges [ 28 ], which posits that TBD is least susceptible to selection bias within the class of restricted randomization designs that force exact balance.
Finally, under M4, statistical power is inflated by several percentage points compared to the normal random sampling scenario without selection bias. The magnitude of the type I error inflation differs across the restricted randomization designs. For the chosen experimental scenarios, we evaluated CRD and several restricted randomization procedures, some of which belonged to the same class but with different values of the design parameter.
Based on these criteria, we found that BSD(3) provides the overall best performance. We also evaluated the type I error rate and power of selected randomization procedures under several treatment response models. We have observed important links between balance, randomness, type I error rate, and power.
It is beneficial to consider all these criteria simultaneously as they may complement each other in characterizing statistical properties of randomization designs. In particular, we found that a design that lacks randomness, such as PBD with blocks of 2 or 4, may be vulnerable to selection bias and lead to inflations of the type I error.
Therefore, these designs should be avoided, especially in open-label studies. As regards statistical power, since all designs in this example targeted the 1:1 allocation ratio (which is optimal if the outcomes are normally distributed with constant between-group variance), they had very similar power in most scenarios, except for the one with chronological bias.
In the latter case, randomization-based tests were more robust and more powerful than the standard two-sample t-test under the population model assumption. Overall, while Example 1 is based on a hypothetical RCT, its true purpose is to showcase the thinking process in the application of our general roadmap. The following three examples are considered in the context of real RCTs. Selection bias can arise if the investigator can intelligently guess at least part of the randomization sequence yet to be allocated and, on that basis, preferentially and strategically assigns study subjects to treatments.
Although it is generally not possible to prove that a particular study has been affected by selection bias, there are examples of published RCTs that do show some evidence of having been affected by it.
Suspect trials are, for example, those with strong observed baseline covariate imbalances that consistently favor the active treatment group [ 16 ].
In what follows we describe an example of an RCT where the stratified block randomization procedure used was vulnerable to potential selection biases, and discuss potential alternatives that may reduce this vulnerability. Etanercept was studied in patients aged 4 to 17 years with polyarticular juvenile rheumatoid arthritis [ 85 ].
The trial consisted of two parts. During the first, open-label part of the trial, patients received etanercept twice weekly for up to three months. Responders from this initial part of the trial were then randomized, in the second, double-blind, placebo-controlled part of the trial, to receive etanercept or placebo for four months or until a flare of the disease occurred.
The primary efficacy outcome, the proportion of patients with disease flare, was evaluated in the double-blind part. Regulatory review by the Food and Drug Administration (FDA) identified vulnerability to selection bias in the study design of the double-blind part and potential issues in study conduct.
These findings were succinctly summarized in [ 16 ]. While this appears to be an attempt to improve treatment balance in this small trial, unblinding of one treatment assignment may lead to deterministic predictability of three upcoming assignments. While the double-blind nature of the trial alleviated this concern to some extent, it should be noted that all patients had received etanercept previously in the initial open-label part of the trial.
The chances of unblinding may not be negligible if etanercept and placebo have immediately evident differences in effects or side effects. The randomized withdrawal design was appropriate in this context to improve statistical power in identifying efficacious treatments, but the specific randomization procedure used in the trial increased vulnerability to selection bias if blinding could not be completely maintained.
There were also some patients randomized out of order.

We next illustrate the minimization technique with a virtual allocation, using the recruiting hospital (Sites 1 and 2), sex (male and female), and age band (under 20 years, 20—64 years, and 65 years or older) as prognostic factors. The study has two groups, a treatment group and a control group, and every factor score starts at 0.

Assume that the first subject (male, 52 years old) is recruited from Site 2. Because this subject is the first one, the allocation is determined by simple randomization; assume the subject is allocated to the treatment group. In this group, scores are added to Site 2, male sex, and the 20—64 years age band (Table 1).

Next, assume that the second subject (female, 25 years old) is recruited through Site 2. Calculate the total number of imbalances when this subject is allocated to the treatment group and to the control group: add the appropriate scores to the corresponding areas within each group, and sum the differences between the areas. If this patient is allocated to the control group, the total imbalance is 1; if allocated to the treatment group, the total imbalance is 5. Therefore, this patient is allocated to the control group, and Site 2, female sex, and the 20—64 years age band in the control group receive the score.

Next, the third subject (male, 17 years old) is recruited from Site 1. If this patient is allocated to the control group, the total imbalance is 2; if allocated to the treatment group, the total imbalance is 4. Therefore, this patient is also allocated to the control group, and the corresponding scores are added. If the total number of imbalances during the minimization technique is the same for both groups, the allocation is determined by simple randomization.

The subjects are allocated and scores added in this manner. Now, assume that the study continues and the 15th subject (female, 74 years old) is recruited from Site 2. If this patient is allocated to the control group, the total imbalance is 3.

Although minimization is designed to overcome the disadvantages of stratified randomization, this method also has drawbacks. A concern from a statistical point of view is that it does not satisfy randomness, which is the basic assumption of statistical inference [ 15 , 16 ].
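A minimization step of the kind walked through above can be sketched as follows. This is a simplified Pocock-Simon-style scheme with unit scores; the data structures and names are our own illustration, not the exact published algorithm:

```python
import random

# Prognostic factors from the worked example: site, sex, and age band.
FACTORS = {"site": ["1", "2"], "sex": ["M", "F"], "age": ["<20", "20-64", "65+"]}

def minimization_assign(counts, patient, rng):
    # For each candidate group, tentatively place the patient there and
    # sum, over the patient's own factor levels, the absolute score
    # differences between treatment and control. Allocate to the group
    # giving the smaller total imbalance; ties fall back to simple
    # randomization.
    def imbalance_if(group):
        total = 0
        for f, level in patient.items():
            t = counts["treatment"][f][level] + (1 if group == "treatment" else 0)
            c = counts["control"][f][level] + (1 if group == "control" else 0)
            total += abs(t - c)
        return total

    imb_t, imb_c = imbalance_if("treatment"), imbalance_if("control")
    if imb_t < imb_c:
        group = "treatment"
    elif imb_c < imb_t:
        group = "control"
    else:
        group = rng.choice(["treatment", "control"])
    for f, level in patient.items():
        counts[group][f][level] += 1
    return group

counts = {g: {f: {lvl: 0 for lvl in lvls} for f, lvls in FACTORS.items()}
          for g in ("treatment", "control")}
rng = random.Random(0)

# First subject (male, 52, Site 2): the scores tie, so the group is chosen
# by simple randomization; following the worked example, place the subject
# in the treatment group by hand.
for f, lvl in {"site": "2", "sex": "M", "age": "20-64"}.items():
    counts["treatment"][f][lvl] += 1

# Second subject (female, 25, Site 2): imbalance would be 1 in the control
# group vs 5 in the treatment group, so minimization chooses control.
g2 = minimization_assign(counts, {"site": "2", "sex": "F", "age": "20-64"}, rng)
```

Running the same function for the third subject (male, 17, Site 1) reproduces the imbalances 2 vs 4 from the example and again selects the control group.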
For this reason, analysis of covariance or permutation tests have been proposed [ 13 ]. The calculation process is complicated but can be carried out through various programs. So far, the randomization methods discussed have assumed that the variances of the treatment effects are equal in each group, and the number of subjects in both groups is determined under this assumption. However, when analyzing the data accruing as the study progresses, what happens if the variances of the treatment effects are not the same?
In that case, would keeping the number of subjects initially determined not reduce the statistical power? In other words, should the allocation probabilities determined prior to the study remain constant throughout the study? Alternatively, is it possible to change the allocation probability during the study by using the data accruing as the study progresses? If a treatment turns out to be inferior during the study, would it be advisable to reduce the number of subjects allocated to that group [ 17 , 18 ]?
An example of response-adaptive randomization is the randomized play-the-winner rule, in which a success with one treatment increases the probability that the next subject is allocated to that treatment, while a failure increases the probability of allocation to the other treatment. That is, this method is based on statistical reasoning that is not possible under a fixed allocation probability, and on the ethics of allowing more patients to be allocated to treatments that benefit them.
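A common formulation of the randomized play-the-winner rule is an urn model; the following minimal sketch (with made-up response rates, for illustration only) shows how successes shift allocation toward the better-performing arm:

```python
import random

def play_the_winner(n, success_prob, seed=5):
    # Urn model: start with one ball per arm. After each patient, a
    # success adds a ball for the arm received, a failure adds a ball
    # for the other arm, so a better arm attracts more future patients.
    rng = random.Random(seed)
    urn = ["A", "B"]
    assignments = []
    for _ in range(n):
        arm = rng.choice(urn)
        assignments.append(arm)
        if rng.random() < success_prob[arm]:
            urn.append(arm)                          # success: reinforce this arm
        else:
            urn.append("B" if arm == "A" else "A")   # failure: reinforce the other
    return assignments

assigned = play_the_winner(200, {"A": 0.8, "B": 0.2})
```

With such a large gap in response rates, arm A typically receives the majority of the 200 patients, which illustrates both the ethical appeal and the imbalance this rule can create.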
However, the method can lead to imbalances between the treatment groups. In addition, if a clinical study takes a very long time to obtain patient responses, this method cannot be recommended. As noted earlier, an RCT is a scientific study design that allocates subjects to treatment groups by probability in order to ensure comparability, form the basis of statistical inference, and identify the effects of treatment.
However, an ethical debate is needed on whether the treatment for the subjects, especially for patients, should be determined by probability rather than by the physician. Nonetheless, the decisions should preferably be made by probability, because clinical trials have the distinct goal of investigating the efficacy and safety of new medicines, medical devices, and procedures, rather than merely reaching therapeutic conclusions.
The purpose of the study is therefore to maintain objectivity, which is why prejudice and bias should be excluded. That is, only an unconstrained attitude during the study can confirm that a particular medicine, medical device, or procedure is effective or safe.
Consider this from another perspective. If the researcher maintains an unconstrained attitude, and the subject receives all the information, understands it, and decides to voluntarily participate, is the clinical study ethical? Unfortunately, this is not so easy to answer. Participation in a clinical study may provide the subject with the benefit of treatment, but it could be risky.
Furthermore, the subjects may be given a placebo rather than an active treatment. Randomization can be achieved by use of random number tables given in most statistical textbooks, or computers can be used to generate random numbers for us.
If neither of these is available, you can devise your own plan to perform randomization. For example, you can select the last digit of phone numbers given in a telephone directory. Suppose you have different varieties of rice grown in 10 small plots in a greenhouse, and you want to evaluate a certain fertilizer on 9 varieties of rice plants, keeping one plot as a control.
You can number each of the small plots 1 through 9 and then use a series of random digits such as 8 6 3 1 6 2 9 3 5 6 7 5 5 3 1, and so on. You can then allocate three doses of the fertilizer treatment (call them doses A, B, and C). Now you apply dose A to plot number 8, B to plot 6, and C to plot 3. Then you apply dose A to plot 1 and B to plot 2 (skipping the repeated 6, because plot 6 has already been used), and so on.
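The digit-series walk described above can be sketched in code; the helper function below is our own illustration, not a standard routine:

```python
def allocate_by_digits(digits, n_doses, n_plots):
    # Walk the random digit series, assigning each new unused plot
    # number to doses A, B, C, ... in rotation, skipping zeros,
    # out-of-range digits, and plot numbers already used.
    doses = [chr(ord("A") + i) for i in range(n_doses)]
    allocation, used, i = {}, set(), 0
    for d in digits:
        if d == 0 or d > n_plots or d in used:
            continue
        allocation[d] = doses[i % n_doses]
        used.add(d)
        i += 1
        if len(used) == n_plots:
            break
    return allocation

series = [8, 6, 3, 1, 6, 2, 9, 3, 5, 6, 7, 5, 5, 3, 1]
plan = allocate_by_digits(series, 3, 9)
# Reproduces the walk-through: plot 8 -> A, 6 -> B, 3 -> C, 1 -> A, 2 -> B, ...
```

Note that this short digit series runs out before plot 4 ever appears, so in practice one keeps reading digits (or a longer table) until every plot is assigned.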
Blinding is commonly employed in clinical research settings and is used to further eliminate bias. There are two types of blinding: single blinding, in which the subjects do not know which treatment they receive, and double blinding, in which neither the subjects nor the researchers know the treatment assignments. Bias is the most unwanted element in randomized controlled trials, and randomization gives researchers an excellent tool to reduce or eliminate it to the maximum extent.
The absence of bias makes the results of a study more reliable and gives legitimacy to both the research and the researchers. Retrieved Nov 11, from Explorable.