In randomized controlled clinical trials, covariates exist in addition to the treatment variable. Without effective control during the study design stage or without proper adjustment during statistical analysis, the covariates may lead to reduced testing power or a biased estimation of the treatment effect. Therefore, the handling of covariates in randomized controlled clinical trials should be carefully considered.

In this guideline, covariates are variables that are observed prior to the intervention (usually prior to randomization) and are expected to be associated with the primary endpoint. The aim of covariate adjustment is that, for any subject, the expected treatment effect (comparing assignment to the test group with assignment to the control group) should not depend on the observed value of the covariate. Due to randomization, the probability distribution of each covariate in a randomized controlled trial is the same in the control group and the test group; hence, any observed imbalance between treatment groups should be attributed to random sampling error. Therefore, the primary objective of covariate adjustment in randomized controlled trials is to remove variability in the endpoint variable that is unrelated to the treatment variable, thereby providing a more precise estimate of the treatment effect.

Covariates may be continuous, ordinal, or categorical. Demographic indicators (such as age or body weight), disease characteristics (such as disease progression or severity), prognostic factors, pathological results, physiological factors, genetic factors, sociological factors (such as economic status, occupation, or education level) and research centers or investigators may all be covariates. Additionally, the baseline value of the primary efficacy variable may be a very important covariate.

In clinical trials, to ensure the representativeness of enrolled subjects to the target population, the value of the covariates of trial subjects often varies within a certain range. In particular, when the covariate has a large effect on the endpoint variable, the degree of variability of the endpoint variable will increase, resulting in an increase in the error of the efficacy estimation and a decrease in the power of the corresponding tests of the hypothesis. Therefore, identifying and controlling potential covariates and scientifically and appropriately analyzing the relationship between the treatment variable and study endpoints are key issues in clinical trials.

This guideline aims to illustrate the principles for handling covariates in confirmatory randomized controlled clinical trials and provide recommendations on the handling and interpretation of important covariates in trial design, statistical analysis and clinical trial reporting.

2. CONSIDERATION OF COVARIATES IN TRIAL DESIGN

In clinical trials, the consideration of covariate adjustment starts at the trial design stage and needs to be specified in advance in the study protocol. A number of covariates may be associated with the primary study outcome in a clinical trial. Therefore, important and biologically and clinically meaningful covariates need to be identified during the study design, controlled at randomization, and adjusted during statistical analysis.

2.1 Common Important Covariates

2.1.1 Covariates Associated with the Outcome Measures

If there is a strong association between covariates and the primary endpoint, the variation and sampling error in covariates are more likely to impact the endpoint variable, resulting in an increase in the error of treatment effect estimation and a corresponding decrease in statistical power. Therefore, it is often necessary to include these covariates in the statistical model of efficacy analysis to improve the accuracy of estimation of the treatment effect. For example, an assessment of the disease condition is a continuous variable that reflects the severity of a participant's illness and is observed both at baseline and after the intervention. Whether the efficacy evaluation is based on the actual value of the assessment at the end of treatment or its change from baseline at the end of treatment, the result of evaluation will have a strong association with the baseline value. As such, the baseline value of the disease assessment should be included in the statistical analysis model to adjust for the corresponding covariates during the estimation of the treatment effect.
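The effect of adjusting for a baseline value can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a prescribed analysis: the function names, the simulated effect size, the baseline distribution and the noise level are all hypothetical, and a single continuous baseline covariate with a common slope in both groups is assumed. The adjusted estimate is the treatment coefficient from a least-squares fit of the outcome on treatment and baseline, computed here in closed form.

```python
import random
from statistics import mean, pstdev


def unadjusted_effect(y, t):
    """Difference in mean outcome: test group (t == 1) minus control (t == 0)."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    return mean(y1) - mean(y0)


def ancova_effect(y, x, t):
    """Treatment coefficient from the least-squares fit y ~ t + x (common slope)."""
    sxx = sxy = 0.0
    group_means = {}
    for g in (0, 1):
        xg = [xi for xi, ti in zip(x, t) if ti == g]
        yg = [yi for yi, ti in zip(y, t) if ti == g]
        mx, my = mean(xg), mean(yg)
        group_means[g] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi in xg)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(xg, yg))
    slope = sxy / sxx  # pooled within-group slope of outcome on baseline
    (mx0, my0), (mx1, my1) = group_means[0], group_means[1]
    # Difference in means, corrected for the chance imbalance in baseline.
    return (my1 - my0) - slope * (mx1 - mx0)


def simulate(n_trials=200, n=100, effect=2.0, seed=1):
    """Return the spread (SD) of unadjusted and adjusted estimates over many trials."""
    rng = random.Random(seed)
    unadj, adj = [], []
    for _ in range(n_trials):
        t = [i % 2 for i in range(n)]  # 1:1 allocation
        rng.shuffle(t)                 # simple randomization of the labels
        x = [rng.gauss(50, 10) for _ in range(n)]  # hypothetical baseline score
        y = [effect * ti + 0.8 * xi + rng.gauss(0, 5) for ti, xi in zip(t, x)]
        unadj.append(unadjusted_effect(y, t))
        adj.append(ancova_effect(y, x, t))
    return pstdev(unadj), pstdev(adj)
```

Calling `simulate()` returns the standard deviation of the unadjusted and of the adjusted estimates across the simulated trials. Under these assumptions both estimators target the same effect, and the adjusted one varies less from trial to trial because the baseline explains much of the outcome variability.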

In particular, when the covariates have large variability, the estimation of the treatment effect may be less precise or biased. Therefore, covariates that are expected to vary widely can be included in the statistical analysis model for treatment effect estimation to adjust for their potential impact on the efficacy evaluation.

2.1.2 Center Factors

In multicenter randomized controlled clinical trials, clinical practice, trial conditions and the baseline characteristics of enrolled subjects may differ to varying degrees among research centers. These factors may be related to the endpoint; therefore, center factors in multicenter clinical trials are often selected as covariates for adjustment. In particular, in international multiregional clinical trials, subjects in different regions may differ in race, culture, dietary habits, clinical practice, etc. Since region as a factor generally captures these characteristics comprehensively, country or region can be used as a center factor for adjustment. When the number of study centers is large, the number of subjects expected at a single center may be very limited. In this situation, using individual centers as a covariate for adjustment often introduces challenges in model estimation and in the interpretation of results. Therefore, it may be reasonable not to adjust for center factors or to predefine the pooling of centers (by country or region) for adjustment.

2.2 Stratification Factors for Randomization

In randomized controlled clinical trials, for covariates that are strongly associated with the primary endpoint variables, stratified randomization can be used to assign subjects into different treatment groups to further reduce the between-group imbalance of covariates and control bias. The number of stratification factors should not be large, and the stratification factors usually need to be adjusted for in the statistical analysis model.
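One common way to implement stratified randomization is permuted-block randomization within each stratum. The sketch below is an illustration only: the class name, arm labels, block size and stratum labels are hypothetical, and the guideline does not prescribe a particular allocation algorithm.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Permuted-block randomization carried out separately within each stratum."""

    def __init__(self, arms=("test", "control"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum -> assignments still unused in that stratum's current block
        self.pending = defaultdict(list)

    def assign(self, stratum):
        """Return the arm for the next subject enrolled in the given stratum."""
        block = self.pending[stratum]
        if not block:
            # Start a new balanced block for this stratum and shuffle it.
            block.extend(self.arms * (self.block_size // len(self.arms)))
            self.rng.shuffle(block)
        return block.pop()
```

A stratum can be any hashable label built from the stratification factors, for example `randomizer.assign(("severe", "region A"))`. Every combination of factor levels keeps its own sequence of blocks, so the number of strata grows multiplicatively with the number of factors; this is one reason the number of stratification factors should be kept small.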

2.3 Control of the Number of Covariates

If too many covariates are included in the statistical analysis model, especially those without a strong association with the endpoint variable or with high correlation among themselves, the sample size in certain combinations of covariate values can be small. In such cases, the covariate-adjusted estimate of the treatment effect may be biased, with reduced statistical power, and other issues may arise, such as model overfitting and singular information matrices. These issues challenge the scientific validity, reliability, and interpretability of the statistical analysis results. Therefore, at the trial design stage, key covariates with clinical meaning and a strong association with the trial endpoint variables should be identified to control the number of covariates included in the statistical analysis model. In fact, in randomized controlled clinical trials, other than covariates that are routinely adjusted for (e.g., the stratification factors), the number of covariates included in the statistical analysis model is recommended to be as small as possible.

3. ANALYSIS METHODS FOR COVARIATE ADJUSTMENTS

In randomized controlled clinical trials, different statistical methods are used for covariate adjustment depending on the type of endpoint variable. For continuous efficacy endpoints, linear models are usually used; for time-to-event endpoints, the Cox proportional hazards model is usually used. For binary efficacy endpoints (such as effective/ineffective), the summary statistic for each treatment group can be a rate (such as the effective rate), and the between-group difference can be expressed as the rate difference (RD), rate ratio (RR), or odds ratio (OR). Correspondingly, different statistical models for covariate adjustment may be used for different summary measures of the treatment effect. For example, logistic regression models can be used to adjust for covariates when the treatment effect is measured using the OR.
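The three summary measures for a binary endpoint can be computed directly from the group counts. The sketch below is illustrative: the function names and the input layout are hypothetical. It also includes a Mantel-Haenszel stratified odds ratio as a simple stand-in for adjustment by a single categorical covariate; this method is not named in the guideline, which refers to logistic regression for model-based adjustment of the OR.

```python
def binary_effect_measures(events_test, n_test, events_control, n_control):
    """Crude (unadjusted) rate difference, rate ratio and odds ratio."""
    p1 = events_test / n_test
    p0 = events_control / n_control
    return {
        "RD": p1 - p0,
        "RR": p1 / p0,
        "OR": (p1 / (1 - p1)) / (p0 / (1 - p0)),
    }


def mantel_haenszel_or(strata):
    """Stratified odds ratio.

    `strata` is a list of tuples
    (events_test, n_test, events_control, n_control), one per covariate level.
    """
    num = den = 0.0
    for events_test, n_test, events_control, n_control in strata:
        a = events_test                   # responders, test group
        b = n_test - events_test          # non-responders, test group
        c = events_control                # responders, control group
        d = n_control - events_control    # non-responders, control group
        n = n_test + n_control
        num += a * d / n
        den += b * c / n
    return num / den
```

For example, `binary_effect_measures(60, 100, 40, 100)` compares effective rates of 60% and 40%. The three measures answer different questions and are not interchangeable, so the chosen measure and the corresponding adjustment model should be specified in advance.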

Statistical models used for covariate adjustment usually rely on a set of assumptions; therefore, attention should be given to the conditions under which the model is applicable and to the validity of its assumptions. For example, an analysis of covariance (ANCOVA) model requires residual analysis and an assessment of homogeneity of variance, while a Cox model requires checking whether the proportional hazards assumption holds. Invalid assumptions for the analysis model may lead to incorrect estimation of the treatment effect.

4. REPORTING AND INTERPRETATION OF RESULTS

In addition to the use of stratified randomization to control covariates at the time of study design and the use of appropriate methods to adjust for covariates at the time of data analysis, thorough discussion should be considered in the study summary report to correctly interpret the impact of covariates on the results of the primary analyses and to evaluate the robustness of the primary conclusion.

4.1 Characteristics of Baseline Variables

In randomized controlled trials, it is generally necessary to report baseline variable characteristics for each treatment group. Due to randomization, the between-group difference in baseline variables should be due to random error. Therefore, the analysis and reporting of baseline data is generally based on descriptive statistics without hypothesis testing or statistical inference.

In the unexpected situation where a baseline variable is clearly unbalanced between treatment groups, the estimation of the treatment effect may be affected. In this case, supplementary analyses beyond those prespecified in the statistical analysis plan can be considered to adjust for that specific baseline variable and further assess the robustness of the conclusions of the primary analysis.

4.2 Impact of the Covariate Adjustment Method on Results Interpretation

Adjustment for covariates is based on specific statistical models; therefore, the conclusions of the analysis should be interpreted in conjunction with the validity of the model assumptions. If meaningful deviations from the assumptions of the model are observed, they should be described in the study summary report, and supplementary analyses should be performed with other models to assess the robustness of the conclusions of the primary analysis.

4.3 Analyses with and without Covariate Adjustment

In randomized controlled clinical trials, according to the trial objectives and the type of covariates, covariate-adjusted analysis methods are often predefined as the primary analysis method, whereas methods without adjusting for covariates may be considered for sensitivity analysis. When the results with and without covariate adjustment are inconsistent, further exploration will be necessary.

4.4 Exploration of Interactions Between Covariates and Treatment Variables

In general, the primary objective of confirmatory clinical trials is to measure the overall treatment effect in the entire target population. The covariate-by-treatment interaction term is not usually included in the primary analysis but could be considered in the sensitivity analysis.

In fact, unless this is specifically planned for in the study design, clinical trials often do not have sufficient power to test for an interaction between covariates and the treatment variable. Therefore, a lack of statistical significance in the test of such an interaction does not sufficiently demonstrate that the treatment effect is consistent across strata or subgroups. On the other hand, a statistically and clinically meaningful interaction between a covariate and the treatment variable suggests that the treatment effect may differ across strata of the population. In such cases, the potential sources of the interaction and its impact on the results of the primary analysis need to be assessed from the clinical perspective, and the conclusion based on the primary analysis needs to be interpreted carefully.
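As an illustration of what a covariate-by-treatment interaction means, consider a linear model with a single covariate. The notation is an assumption introduced here and does not appear in the guideline: $Y_i$ is the endpoint for subject $i$, $T_i$ is the treatment indicator (1 for test, 0 for control), $X_i$ is the covariate, and $\varepsilon_i$ is the error term.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 T_i + \beta_2 X_i + \beta_3 T_i X_i + \varepsilon_i, \\
\mathrm{E}[Y_i \mid T_i = 1, X_i = x] - \mathrm{E}[Y_i \mid T_i = 0, X_i = x] &= \beta_1 + \beta_3 x .
\end{aligned}
```

Under this model the treatment effect at covariate value $x$ is $\beta_1 + \beta_3 x$, so it is the same for all subjects only when $\beta_3 = 0$. A model without the interaction term, as typically used for the primary analysis, corresponds to setting $\beta_3 = 0$.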

Appendix 1: GLOSSARY

Covariate: A variable that is observed prior to an intervention (usually prior to randomization) and is expected to be associated with the primary study outcome.

Multiregional clinical trial (MRCT): A clinical trial conducted in multiple regions under a single protocol.

Stratified randomization: Randomization carried out separately within groups (strata) formed by classifying study subjects according to key factors (such as age, gender, ethnicity, disease status, etc.). Stratified randomization can effectively improve the balance between treatment groups in the distribution of key factors, or of study subjects in subgroups of special interest. The factors used to define the strata are called stratification factors.

Overfitting: Fitting a model so closely to a particular dataset that the results appear more accurate than is warranted, with the consequence that the model does not match additional observations or cannot reliably predict future observations.

Interaction: When the effect of a factor (covariate) on the outcome variable changes with another factor, these two factors are considered to have an interaction.

Appendix 2: Chinese-English Vocabulary

Chinese	English
过度拟合	Over-fitting
交互作用	Interaction
分层随机	Stratified randomization
协变量	Covariate
抽样误差	Sampling error
偏倚	Bias
敏感性分析	Sensitivity analysis
