Beibei Guo1* and Rui Zhang2
1Department of Experimental Statistics, Louisiana State University Baton Rouge, USA
2Department of Radiation Oncology, Baylor College of Medicine Houston, USA
*Corresponding author: Beibei Guo, Department of Experimental Statistics, Louisiana State University Baton Rouge, LA 70803, USA
Submitted: September 01, 2025; Published: September 23, 2025
ISSN: 2578-0247 Volume 4 Issue 1
Immunotherapy has transformed cancer treatment, providing durable responses and long-term survival benefits across a broad spectrum of malignancies. However, designing clinical trials for immunotherapeutic agents presents unique challenges distinct from those encountered with conventional chemotherapy and radiotherapy. These include atypical and delayed response patterns, low rates of dose-limiting toxicities, immune-related adverse events, and the critical role of immune biomarkers. As a result, traditional trial designs and endpoints may fall short in accurately evaluating immunotherapy. In response, a range of innovative trial designs has emerged to better capture the complex dynamics of immunotherapeutic agents. This review summarizes the evolving landscape of early-phase clinical trial design in immune-oncology, with a focus on toxicity assessment, endpoint selection, dose optimization, and biomarker integration. By addressing key methodological challenges and highlighting recent advances, we aim to guide researchers, clinicians, and trialists in the development of more efficient and informative trials that accelerate the safe and effective translation of immunotherapies into clinical practice. We conclude by discussing current limitations and outlining future directions for advancing immunotherapy trial design.
Keywords: Bayesian adaptive design; Biomarker; Immune response; Immunotherapy; Low-grade toxicities; Progression-free survival
The advent of immunotherapy has revolutionized oncology, offering durable responses and long-term survival for patients with various malignancies once considered intractable [1,2]. Unlike cytotoxic therapies that directly target tumor cells, immunotherapies harness the patient’s own immune system to recognize and eliminate cancer, with the potential for sustained control even after treatment ends. Approaches such as immune checkpoint inhibitors, adoptive cell therapies, and cancer vaccines have delivered outcomes rarely achieved with traditional cytotoxic therapies, reshaping the landscape of cancer care and accelerating the development of novel immunotherapeutic agents [3,4]. Despite these advances, the clinical evaluation of immunotherapy poses unique methodological and operational challenges. In contrast to chemotherapy, where Dose-Limiting Toxicities (DLTs) are common and serve as a primary guide for dose escalation and selection, immunotherapy typically induces Low-Grade Toxicities (LGTs), while DLTs are relatively rare [5]. As a result, traditional phase I trial designs that rely solely on DLTs and ignore LGTs may be ill-suited for determining appropriate dosing in this context. Moreover, the assumption that both efficacy and toxicity increase monotonically with dose, which underlies many traditional designs, often does not hold for immunotherapy [6]. Beyond a certain threshold, higher doses may not yield additional clinical benefit and may even increase the risk of adverse events. Accordingly, dose-finding efforts in immune-oncology increasingly focus on identifying the Optimal Biological Dose (OBD), the dose that achieves a favourable balance between clinical efficacy and safety, rather than simply the Maximum Tolerated Dose (MTD).
Adding to the complexity, immunotherapy often produces delayed and heterogeneous treatment effects. This heterogeneity is influenced by differences in tumor biology, immune microenvironment, and biomarker status (e.g., PD-L1 expression, tumor mutational burden, or other predictive signatures), which can markedly affect both efficacy and toxicity outcomes. Failure to account for such variability may obscure meaningful treatment effects in biomarker-defined subgroups and complicate dose selection. Unlike chemotherapeutics that typically produce rapid tumor shrinkage, immunotherapies often work by delaying disease progression and extending survival. Some patients may achieve long-term durable responses even in the absence of an immediate reduction in tumor burden [7,8]. As such, traditional endpoints like Objective Response Rate (ORR) may be insufficient for early-phase evaluation, and alternative endpoints, such as Progression-Free Survival (PFS) or immune-related response criteria, may offer a more accurate assessment of clinical benefit [9,10].
A further defining feature of immunotherapy is its ability to elicit specific immune responses that reflect the biological activity of the agent. Immune responses such as CD8+ and CD4+ T-cell proliferation or cytokine production can be measured early in treatment and are often predictive of downstream clinical outcomes [11-13]. In addition to these established correlates, emerging biomarkers may also provide valuable information for guiding safe and effective translation into clinical practice. For example, Sirtuin 1 (SIRT1) has been implicated in immune regulation, inflammation, and cell survival, with potential relevance to autoimmune and inflammatory disorders. Early evaluation of plasma SIRT1 levels may help anticipate treatment-related toxicities and adverse immune reactions, given its role in pathways associated with programmed cell death [14-16]. Incorporating both established immune responses and biomarkers into dose-finding strategies could enhance the precision of dose selection and support the development of more effective regimens.
Taken together, these unique characteristics challenge the applicability of traditional trial designs and endpoints, highlighting the need for innovative approaches tailored to the distinct properties of immunotherapy. In response, a variety of novel designs have been developed, emphasizing adaptive, flexible, and biomarker-driven strategies that can more effectively capture the complex therapeutic dynamics of immunotherapeutic agents. This review provides an overview of the evolving landscape of clinical trial designs in immune-oncology, with a primary focus on early-phase and exploratory designs where methodological innovation is most urgently needed. We examine phase I designs focused solely on toxicity, phase II designs centered on efficacy, phase I/II designs that jointly evaluate both toxicity and efficacy, designs spanning multiple indications, and marker-strategy designs for evaluating predictive biomarkers. By synthesizing recent methodological advances and identifying persistent gaps, this review aims to inform future directions in the design and implementation of immunotherapy trials.
Traditional phase I trial designs that rely solely on DLTs are often inadequate for immunotherapies, which frequently induce LGTs and rarely result in DLTs. To address this limitation, Jiang et al. [17] proposed the MC-Keyboard design, which incorporates both DLT and LGT information through multiple toxicity constraints to guide dose escalation, de-escalation, and MTD determination. Extending the original Keyboard design by Yan, Mandrekar, and Yuan [18], which was based solely on DLTs, the MC-Keyboard design integrates LGTs into its decision-making process to provide a more comprehensive safety assessment. At the end of the trial, the MTD is defined as the lower of two candidate doses: one with a DLT rate estimate closest to the prespecified target DLT rate, and the other with an LGT rate estimate closest to the target LGT rate. This conservative approach ensures that both severe and moderate toxicities are considered in dose selection, which is especially important for immunotherapies where LGTs may have clinical significance. As a model-assisted design, MC-Keyboard retains the simplicity and transparency of algorithm-based methods while incorporating the flexibility and superior performance of model-based inference. A key practical advantage is that it provides pretabulated, rule-based decision tables that can be fully specified before the trial begins. These tables link observed toxicity outcomes to dose escalation or de-escalation decisions, eliminating the need for complex real-time modeling during the trial and facilitating regulatory review, protocol writing, and on-site implementation. One limitation of such toxicity-only phase I designs is that they do not incorporate efficacy or immune response data, which are often critical for identifying the OBD. As a result, the selected MTD may not reflect the most therapeutically beneficial dose, especially in cases where efficacy does not increase monotonically with dose.
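The end-of-trial rule described above (take the lower of the DLT-based and LGT-based candidate doses) can be sketched in a few lines. This is a minimal illustration only: it assumes that toxicity-rate estimates for each dose are already available, and the function name `select_mtd`, the example estimates, and the target rates are hypothetical rather than taken from Jiang et al. [17].

```python
def select_mtd(dlt_est, lgt_est, target_dlt, target_lgt):
    """Return the 0-based index of the lower of two candidate doses.

    dlt_est / lgt_est: estimated DLT and LGT rates per dose, ordered from
    lowest to highest dose (assumed to be supplied by the trial analysis).
    """
    def closest(estimates, target):
        # min() keeps the first minimum, so ties resolve to the lower dose.
        return min(range(len(estimates)), key=lambda d: abs(estimates[d] - target))

    dlt_candidate = closest(dlt_est, target_dlt)  # dose closest to target DLT rate
    lgt_candidate = closest(lgt_est, target_lgt)  # dose closest to target LGT rate
    return min(dlt_candidate, lgt_candidate)      # conservative: take the lower dose


# Hypothetical estimates for four dose levels.
dlt_est = [0.02, 0.05, 0.12, 0.28]
lgt_est = [0.20, 0.45, 0.62, 0.80]
mtd = select_mtd(dlt_est, lgt_est, target_dlt=0.30, target_lgt=0.50)
print(mtd)
```

In this hypothetical example the DLT criterion alone would point to the highest dose, while the LGT criterion points to the second dose, so the second dose is selected.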
Conventional phase II trial designs that rely on a single, rapidly ascertainable binary endpoint, such as objective response (OR), are often inadequate for immunotherapy, where treatment responses are typically delayed and multiple endpoints may be clinically justified. To address these challenges, Lin, Coleman, and Yuan [19] proposed the Time-to-Event Bayesian Optimal Phase II (TOP) design, which accommodates both simple and complex endpoints within a unified, flexible framework. The TOP design enables real-time “go/no-go” interim decisions by incorporating all available patient data, including partial and pending outcomes due to late-onset responses. It is statistically efficient, maximizing power to detect truly effective treatments while maintaining strict control of the type I error rate. The authors illustrate the versatility of the TOP design through three immunotherapy trial examples: (1) delayed binary response (e.g., OR), (2) co-primary efficacy endpoints (e.g., OR and PFS), and (3) joint modeling of efficacy and toxicity endpoints. Simulation studies demonstrate that, compared to other Bayesian designs, the TOP design can shorten trial duration by 4-10 months and increase power to detect effective treatments to as high as 90%.
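To make the idea of a Bayesian “go/no-go” interim decision concrete, the sketch below shows a generic posterior-probability futility check for a binary response. It is not the TOP decision rule: it ignores pending and partially observed outcomes, which are the defining feature of TOP, and the Beta(1, 1) prior, the null response rate `p0`, the cutoff, and both function names are assumptions made for illustration.

```python
from math import comb


def posterior_prob_below(p0, responses, n):
    """Pr(p <= p0 | data) for response rate p under a Beta(1, 1) prior."""
    a = 1 + responses          # posterior Beta shape for responders
    b = 1 + (n - responses)    # posterior Beta shape for non-responders
    m = a + b - 1
    # For integer a and b: Pr(Beta(a, b) <= p0) = Pr(Binomial(m, p0) >= a).
    return sum(comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(a, m + 1))


def interim_decision(p0, responses, n, cutoff):
    """Return 'no-go' when the data make p <= p0 sufficiently likely."""
    return "no-go" if posterior_prob_below(p0, responses, n) > cutoff else "go"


# Hypothetical interim looks with a null response rate of 20%.
print(interim_decision(p0=0.2, responses=0, n=10, cutoff=0.9))
print(interim_decision(p0=0.2, responses=5, n=10, cutoff=0.9))
```

A design such as TOP would additionally account for patients whose outcomes are still pending at the interim look, rather than waiting for them or treating them as non-responders.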
Incorporating immune response into phase II evaluation is also critical in immune-oncology, where early immune responses may predict long-term benefit. The BLITE design [20] introduces a Bayesian phase II framework that jointly models longitudinal immune response and time-to-event efficacy outcomes. Recognizing that a subset of patients may achieve durable benefit, the design incorporates a cure fraction into its modeling. Longitudinal immune responses are captured via hierarchical nonlinear mixed-effects models, with separate trajectory specifications for the cured and susceptible patients. For patients in the susceptible group, time-to-event efficacy outcomes are modelled conditional on the immune response trajectory via a time-dependent Cox-type regression model [21]. Treatment desirability is quantified through a physician-elicited utility function that integrates both immune response and clinical outcomes. A two-stage adaptive design is used to guide treatment allocation and decision-making. Simulation studies demonstrate that BLITE achieves favorable operating characteristics and outperforms alternative designs that do not incorporate immune response information, particularly in identifying effective treatments with durable benefit.
Despite important advances, these phase II designs for immunotherapy face several limitations. They prioritize the detection of efficacy signals but often overlook the trade-off between efficacy and toxicity, which is crucial for identifying the OBD rather than merely demonstrating activity. As a result, they may promote doses that are effective in the short term but poorly tolerated or inadequate for achieving sustained clinical benefit. Moreover, they typically fail to account for patient heterogeneity by biomarker status, potentially obscuring meaningful treatment effects within specific subgroups.
Unlike phase I designs that focus exclusively on toxicity and phase II designs that primarily assess efficacy, phase I/II designs simultaneously evaluate both outcomes to better inform dose selection. The BOIN12 design [22] is a phase I/II approach developed to identify the OBD by maximizing the risk-benefit trade-off. This design employs a utility-based framework, where utility scores quantify the clinical desirability of each possible toxicity-efficacy outcome. In the simplest case with binary toxicity and efficacy endpoints, there are four possible outcomes: no toxicity and efficacy; no toxicity and no efficacy; toxicity and efficacy; and toxicity and no efficacy. A utility score of 100 is assigned to the most desirable outcome (no toxicity and efficacy) and 0 to the least desirable outcome (toxicity and no efficacy). These two anchor points guide the elicitation of utility scores for the remaining intermediate outcomes. Based on these scores and the estimated probabilities of each outcome, the mean utility of each dose is calculated. The OBD is then defined as the dose with the highest mean utility, provided it also meets acceptable toxicity and efficacy criteria. During the trial, BOIN12 adaptively assigns patients to the dose with the most favorable utility-to-risk profile. Like the MC-Keyboard design, the decision rules for dose escalation and de-escalation in BOIN12 are pre-specified and can be fully incorporated into the trial protocol.
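The utility calculation described above can be illustrated with a short sketch. The anchor scores of 100 and 0 come from the text; the intermediate scores (40 and 60), the outcome probabilities, the admissibility thresholds, and the function names are hypothetical. The sketch also treats the outcome probabilities as known, whereas in an actual trial they would be estimated from accumulating data.

```python
# Outcome order: (no tox, eff), (no tox, no eff), (tox, eff), (tox, no eff).
# 100 and 0 are the anchors from the text; 40 and 60 are hypothetical
# stand-ins for clinician-elicited intermediate scores.
UTILITY = (100, 40, 60, 0)


def mean_utility(probs, utility=UTILITY):
    """Mean utility of a dose given its four outcome probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in zip(probs, utility))


def select_obd(dose_probs, max_tox, min_eff):
    """Index of the admissible dose with the highest mean utility, or None."""
    best, best_u = None, float("-inf")
    for d, probs in enumerate(dose_probs):
        tox = probs[2] + probs[3]  # marginal toxicity probability
        eff = probs[0] + probs[2]  # marginal efficacy probability
        if tox > max_tox or eff < min_eff:
            continue               # dose fails the acceptability criteria
        u = mean_utility(probs)
        if u > best_u:
            best, best_u = d, u
    return best


# Hypothetical outcome probabilities for three dose levels.
dose_probs = [
    (0.20, 0.75, 0.02, 0.03),
    (0.45, 0.40, 0.08, 0.07),
    (0.35, 0.20, 0.25, 0.20),
]
obd = select_obd(dose_probs, max_tox=0.35, min_eff=0.25)
print(obd, [round(mean_utility(p), 1) for p in dose_probs])
```

Here the middle dose has the highest mean utility and is also the only one that satisfies both hypothetical acceptability thresholds, which illustrates how the OBD can differ from the highest tolerable dose.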
The TSNP design [23] is a two-stage nonparametric phase I/II design to identify the OBD for immunotherapy. This design addresses two major limitations that have hindered the practical use of many existing designs. First, most existing designs rely on complex Bayesian modeling frameworks, which are often viewed as nontransparent or difficult to interpret by the clinical community. Second, many of these designs are based on parametric models that require strong assumptions about the dose-response relationship and the joint distribution of toxicity and efficacy. In the context of early-phase trials, where sample sizes are typically small, these assumptions are difficult to validate and may lead to unreliable results. The TSNP design overcomes these challenges by adopting a fully nonparametric strategy, providing closed-form estimates of joint toxicity-efficacy probabilities and a simple, transparent dose-finding algorithm. User-friendly software is available to facilitate simulations and real-time implementation.
The SCI design [24] is tailored for immunotherapy trials that jointly consider DLT and PFS. It aims to address two key challenges in such trials. First, disease progression often leads to treatment discontinuation, precluding further toxicity observation. This creates a semi-competing risks scenario in which progression precludes the observation of toxicity, but not vice versa. Second, PFS is typically a late-onset outcome requiring long follow-up. The SCI design tackles these complexities by factorizing the joint toxicity-efficacy probability into marginal and conditional components and reconstructing the likelihood function based on each patient’s actual follow-up time. A curve-free dose-finding algorithm is then applied to identify the OBD using a toxicity-efficacy trade-off function, without relying on parametric dose-response relationships.
To leverage immune response data, several phase I/II designs have been developed to determine the OBD by jointly accounting for immune response, toxicity, and efficacy. Liu & Yuan [25] employed an Emax model to characterize the marginal distribution of the immune response, and, conditional on this response, used a latent variable approach to jointly model binary toxicity and ordinal efficacy. The model incorporates a mechanistic assumption that severe immune-related toxicities occur only when the immune response exceeds a predefined threshold. The OBD is defined as the dose that maximizes a desirability function reflecting the risk-benefit trade-off. Building on this framework, the SPIRIT design [26] accommodates PFS as the efficacy endpoint. In SPIRIT, the OBD is defined based on the restricted mean survival time, with PFS serving as the primary endpoint for dose selection and the immune response used as an auxiliary marker to rapidly eliminate ineffective doses. Toxicity is continuously monitored throughout the trial to ensure patient safety.
A common limitation of the aforementioned designs is the assumption of a homogeneous patient population, following a “one-dose-fits-all” approach to dose assignment and OBD selection. This assumption is often unrealistic, as patient heterogeneity is common in clinical settings. For example, numerous studies have shown that PD-L1 expression is a predictive biomarker for response to checkpoint inhibitor-based immunotherapy. Patients with PD-L1-positive tumors tend to experience higher response rates and improved progression-free and overall survival compared to PD-L1-negative patients [27-32]. To address this, Guo & Zang [33,34] developed phase I/II designs to identify subgroup-specific OBDs. These models are deliberately parsimonious yet sufficiently flexible to support information borrowing across both outcomes and patient subgroups, an important consideration given the limited sample sizes typical of early-phase trials. More recently, Guo et al. [35] and Lin et al. [36] extended these designs to evaluate immunotherapy in combination with radiotherapy.
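For readers less familiar with the restricted mean survival time (RMST) mentioned above, it is the area under the survival curve up to a truncation time tau, i.e., the mean event-free time over the first tau time units. The sketch below computes it for a given step-shaped survival curve. It is a generic illustration, not the estimation procedure used in SPIRIT; the function name `rmst` and the example curve are hypothetical.

```python
def rmst(event_times, surv_probs, tau):
    """Area under a right-continuous step survival curve on [0, tau].

    event_times: increasing times at which the curve drops.
    surv_probs:  survival probability immediately after each drop.
    The curve is assumed to equal 1.0 before the first event time.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(event_times, surv_probs):
        if t >= tau:
            break                        # drops at or after tau add no area
        area += prev_s * (t - prev_t)    # rectangle up to this drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)      # final rectangle up to tau
    return area


# Hypothetical PFS curve (time in months) with drops at 3, 6, and 9 months.
times = [3.0, 6.0, 9.0]
surv = [0.8, 0.5, 0.3]
print(rmst(times, surv, tau=12.0))
print(rmst(times, surv, tau=6.0))
```

Because the RMST depends only on the curve up to tau, it remains well defined when some patients never progress, which is one reason it is attractive for therapies with durable responses.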
Despite their advantages, most current phase I/II designs for immunotherapy rely on simplified assumptions about the joint distribution of toxicity and efficacy or adopt fixed utility structures that may not generalize across different therapeutic contexts or patient populations. The elicitation of utility scores, while conceptually appealing, can be subjective and may not fully capture the nuances of clinical judgment.
Traditional clinical trials are typically designed to assess the safety and efficacy of an investigational drug within a single disease indication. In contrast, some immunotherapy trials now enrol patients across multiple indications simultaneously, reflecting a shift toward biomarker-driven, tissue-agnostic approaches. For instance, pembrolizumab, an anti-PD-1 therapy, has been approved by the U.S. Food and Drug Administration (FDA) for the treatment of unresectable or metastatic solid tumors that exhibit DNA mismatch repair deficiency or microsatellite instability-high, regardless of tumor origin. Similarly, larotrectinib received FDA approval for the treatment of patients with NTRK gene fusion-positive cancers, spanning a wide range of tumor types (Drilon et al., 2018).
The “shotgun” design [37] offers an efficient framework for such multi-indication trials. It begins with an all-comer dose-finding phase to identify the MTD or recommended phase II dose, followed seamlessly by indication-specific cohort expansions. Patients enrolled in the dose-finding phase are rolled over into the appropriate expansion cohorts, enhancing efficiency by contributing both safety and preliminary efficacy data. Meanwhile, patients enrolled into the cohort expansions continue to inform the evolving safety and tolerability profile. Interim analyses are conducted within each cohort to allow for early termination of indications with insufficient efficacy or unacceptable toxicity. To improve the efficiency and robustness of these interim decisions, a clustered Bayesian hierarchical model is used to adaptively borrow information across indications while preserving the integrity of indication-specific evaluations. This is achieved by first clustering indications into subgroups, then borrowing information within each subgroup to minimize potential bias and control type I error inflation.
Building on this framework, the “shotgun-2” design [38] introduces a utility-based, two-stage Bayesian basket trial design that targets identification of the OBD rather than the MTD. Unlike the original shotgun design, shotgun-2 allows for indication-specific dose optimization by constructing a utility function that balances efficacy and safety, guiding dose-finding and OBD selection within each indication. This approach enables more personalized and efficient dose selection across heterogeneous disease types, while retaining the flexibility to adapt to differing efficacy and toxicity profiles across indications.
While multi-indication trials offer increased efficiency and broaden patient access, they introduce statistical and logistical complexities. These may include controlling type I error across multiple cohorts, ensuring adequate power for each indication, and addressing the challenges of rare biomarker-defined subgroups, which often have limited sample sizes and high uncertainty.
Immunotherapy often benefits only a subset of patients, underscoring the importance of identifying predictive biomarkers, that is, biological features that can indicate which individuals are more likely to respond to treatment [39,40]. For example, pembrolizumab has been approved by the FDA for the treatment of advanced melanoma and metastatic squamous and non-squamous non-small cell lung cancer, but only for patients whose tumors express programmed death ligand-1 (PD-L1), that is, PD-L1-positive patients. It is important to distinguish predictive biomarkers, which indicate treatment response, from prognostic biomarkers, which are associated with overall disease outcomes regardless of therapy. Only predictive biomarkers can guide treatment selection, making their evaluation critical in personalized immunotherapy. The Marker-Strategy Design (MSD) is a classical trial framework for evaluating and validating predictive biomarkers [41]. In MSD, patients are randomized to one of two treatment strategies: a marker-based strategy, where treatment assignment depends on the patient’s biomarker status (e.g., biomarker-positive patients receive the experimental therapy, biomarker-negative patients receive the control), or a non-marker-based strategy, where treatment is assigned regardless of biomarker status, mimicking a standard randomized clinical trial. The effectiveness of the marker is typically assessed by comparing clinical outcomes (such as response rates or survival) between these two strategies. This approach allows investigators to assess whether using the biomarker to guide treatment leads to better patient outcomes than treating all patients the same way.
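The allocation scheme of a marker-strategy design can be sketched as a two-level randomization. The following is a minimal illustration under stated assumptions: the randomization probabilities, the 40% biomarker prevalence, the seed, and the function name `assign_msd` are all hypothetical, and within the non-marker-based strategy patients are simply randomized between the experimental therapy and control.

```python
import random


def assign_msd(marker_positive, rng, p_marker_strategy=0.5, p_experimental=0.5):
    """Allocate one patient under a marker-strategy design (illustrative)."""
    if rng.random() < p_marker_strategy:
        # Marker-based strategy: treatment is determined by biomarker status.
        strategy = "marker-based"
        treatment = "experimental" if marker_positive else "control"
    else:
        # Non-marker-based strategy: treatment ignores biomarker status.
        strategy = "non-marker-based"
        treatment = "experimental" if rng.random() < p_experimental else "control"
    return strategy, treatment


rng = random.Random(2025)                            # fixed seed for reproducibility
patients = [rng.random() < 0.4 for _ in range(200)]  # hypothetical 40% prevalence
assignments = [(pos, *assign_msd(pos, rng)) for pos in patients]

n_marker_based = sum(1 for _, strategy, _ in assignments if strategy == "marker-based")
print(n_marker_based, len(assignments) - n_marker_based)
```

The parameters `p_marker_strategy` and `p_experimental` correspond to the randomization ratios between strategies and between treatment arms, which are the quantities a design would need to specify in advance.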
However, Zang & Yuan [42] demonstrated that this between-strategy comparison often suffers from low statistical power to detect a true predictive effect and is only valid under the restrictive condition that the treatment allocation within the non-marker-based strategy mirrors the biomarker prevalence in the population. To address this limitation, they proposed an alternative Wald test that is valid under general conditions and achieves greater statistical power. They further developed an optimal MSD that selects the best randomization ratios between strategies and treatment arms to maximize power for detecting a predictive biomarker effect. Han et al. [43] further argued that the between-strategy effect estimated in MSD does not necessarily reflect the true predictive effect of the biomarker and can be misleading if used for that purpose. To better evaluate the predictive utility of a biomarker, they introduced novel testing procedures tailored specifically to this goal: one for binary response endpoints and another for time-to-event outcomes. Simulation studies showed that these tests are both statistically valid and substantially more powerful than traditional between-strategy comparisons. MSDs, especially when paired with improved statistical testing procedures, provide a valuable framework for evaluating the clinical utility of predictive biomarkers in immunotherapy. However, MSDs also present practical and ethical challenges. Randomizing patients to the non-marker-based strategy may assign some biomarker-positive individuals to treatments that are likely to be ineffective, raising ethical concerns, particularly when there is already strong preliminary evidence supporting the biomarker’s predictive role. In addition, MSDs can be logistically complex and require large sample sizes to achieve adequate power, especially when the biomarker is rare or when treatment effects differ only modestly across biomarker-defined subgroups. As such, MSDs are most appropriate in the early stages of biomarker validation, before a biomarker is fully established for treatment selection.
The rapid evolution of immunotherapy has catalyzed a parallel transformation in the design of early-phase oncology trials. Unlike conventional cytotoxic agents, immunotherapies exhibit delayed and variable responses, immune-related toxicities, and complex mechanisms of action, all of which challenge traditional trial designs. In response, numerous innovative designs have been proposed to better accommodate these unique features. These novel frameworks aim to improve the precision, efficiency, and ethical rigor of early-phase trials by leveraging modern statistical methods and clinical insights. This review highlights a range of such designs, including those for dose-finding, efficacy evaluation, biomarker validation, and multi-indication trials. Due to space constraints, many other promising approaches could not be covered in depth. Still, the examples discussed reflect key directions in the evolving landscape of immune-oncology trial methodology.
Despite recent advances, several important limitations and challenges remain. First, many novel trial designs are built upon complex Bayesian or hierarchical modeling frameworks. While these methods offer considerable statistical flexibility, greater efficiency, and the ability to incorporate prior knowledge, they are often perceived as opaque or nontransparent by clinical investigators and decision-makers. This perception can hinder interdisciplinary communication and slow clinical adoption. Additionally, regulatory agencies may be less familiar or comfortable with these methods, particularly when adaptive decisions are driven by partially observed or time-to-event outcomes.
Second, most existing designs assume a homogeneous patient population, relying on a “one-dose-fits-all” strategy. This approach neglects substantial inter-patient heterogeneity in drug metabolism, immune status, tumor biology, and genetic or molecular characteristics, factors that are especially relevant in immune-oncology. Such assumptions can lead to suboptimal dosing or treatment strategies for certain subgroups. To address this, future designs should explore stratified or personalized trial frameworks that allow for covariate-adjusted dose finding or subgroup-specific efficacy evaluations. Incorporating baseline biomarkers, immune profiling, or genomic data could enable more individualized treatment approaches, enhancing both efficacy and safety. Third, many designs that incorporate immune response assume that it is a valid surrogate for long-term clinical benefit, such as progression-free or overall survival. However, this assumption may not always hold. Immune response can show early activity without translating into durable benefit, and some patients with minimal initial immune response may later achieve meaningful outcomes. Relying too heavily on immune endpoints may thus yield misleading conclusions about treatment efficacy. To mitigate this risk, validation studies should be conducted to assess the strength and consistency of associations between immune responses and long-term outcomes. Trial designs can also include both immune and clinical endpoints, either as co-primary endpoints or within a hierarchical testing framework, to better capture treatment effects.
Fourth, while biomarker-driven dose-finding is a promising direction, its application remains limited and faces substantial practical hurdles. Biomarkers are often assumed to have strong, stable associations with clinical efficacy or toxicity, yet in practice, they can be noisy, platform-dependent, and highly context-specific. This variability raises concerns about reproducibility and external validity, particularly across different cancer types or trial settings. Addressing this challenge requires rigorous biomarker validation across multiple platforms and independent cohorts.
Despite these challenges, several promising opportunities exist to enhance the design, implementation, and impact of early-phase immunotherapy trials. First, there is an increasing demand for transparent, user-friendly software platforms to support the implementation of innovative trial designs. When paired with clear documentation and visual aids, these tools can bridge the gap between methodological innovation and clinical adoption by making advanced statistical methods accessible to investigators without extensive programming or modeling expertise. Integration with electronic data capture systems and trial management platforms could further streamline adoption in real-world trial settings. Second, the development of subgroup-specific OBD identification strategies represents a critical step toward more personalized immunotherapy. While most existing designs rely on a “one-size-fits-all” approach, emerging methods based on parsimonious Bayesian models or covariate-adjusted utility functions offer a way to tailor dose recommendations to patient-specific characteristics. Incorporating machine learning algorithms or dynamic treatment regimens into the dose-finding process may further refine these efforts by learning and adapting to individual patient responses over time, thereby improving both efficacy and tolerability.
Finally, the integration of Real-World Data (RWD) into early-phase immunotherapy trial design presents an emerging opportunity to enhance both efficiency and generalizability. RWD sources, such as electronic health records, patient registries, and claims databases, can be leveraged to inform prior distributions, refine patient eligibility criteria, and validate surrogate endpoints in a broader clinical context. When combined with adaptive and Bayesian designs, RWD can also help recalibrate interim decision rules or provide external control arms, especially in rare or heterogeneous indications. While regulatory and methodological challenges remain, careful use of RWD has the potential to complement traditional trials and accelerate the evaluation of promising immunotherapeutic strategies. In conclusion, the field of immuno-oncology trial design has made significant strides in adapting to the unique properties of immunotherapy. However, continued innovation is needed to address limitations in model complexity, patient heterogeneity, endpoint selection, and biomarker integration. By fostering collaboration between statisticians, clinicians, and regulators, future designs can become both scientifically rigorous and practically feasible, ultimately accelerating the safe and effective translation of immunotherapies into clinical care.
This work was supported by the American Cancer Society (RSG-22-030-01-CTPS) and the Louisiana Board of Regents (BoR) Research and Development (R&D) Research Competitiveness Subprogram.
© 2025 Beibei Guo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.