Insights and Analysis

FDA’s first Digital Health AdComm meeting mulls promises & perils of Generative AI


Amid continued rapid development of novel software products that present new opportunities in the health care space, FDA has yet to grant marketing authorization to a medical device incorporating artificial intelligence (AI) that continuously learns in the field to create derived audiovisual content, otherwise known as generative AI (GenAI). However, the technology industry does not stand still, and manufacturers are eager to find ways to introduce continually learning algorithms to the medical device space in the U.S. Recognizing the simultaneous existence of significant risks and notable potential associated with GenAI, FDA dedicated the inaugural meeting of its Digital Health Advisory Committee on November 20-21, 2024, to discussing Total Product Lifecycle (TPLC) considerations for GenAI-enabled devices. Consistent with the TPLC approach emphasized by the agency, the meeting was structured in three parts to address first, pre-market considerations associated with GenAI-enabled devices, then risk management, and finally post-market considerations.

In his opening statement, FDA Commissioner Dr. Robert Califf stressed the need to address “disparities rooted in race, ethnicity, education, and geography” that are leading the United States to fall behind other countries in terms of health outcomes, and asserted that AI – on the important condition that it be developed and maintained responsibly – could revolutionize health care delivery, access, and quality. The Director of FDA’s Center for Devices and Radiological Health (CDRH), Dr. Michelle Tarver, echoed the sentiment that there is enormous unmet clinical need and health inequity in this country that could be closed, at least in part, with the help of GenAI.

Unique challenges

Unique characteristics associated with GenAI technology introduce uncertainty around the application and applicability of existing regulatory frameworks for providing oversight and necessary guardrails during product development, market entry, and post-market surveillance. Such characteristics were addressed by the Committee and presenters throughout the discussion, with highlights noted below:

  1. Uncertainty in Output: The open-ended nature of large language models and other GenAI produces expansive possibilities for device output. Foundation models, for example, often operate across diverse tasks without being tailored to a single use case, making it difficult to obtain detailed information about their attributes, architecture, and training methodologies.
  2. Application Scope: Appropriately scoping GenAI to specific use cases or indications will be key to pre-market performance evaluation, particularly in the context of current FDA policies/guidances that identify specific uses as low risk. 
  3. Risk Management: As with any new technology, implementation of GenAI capabilities requires careful consideration of benefits and risks. Industry should implement effective risk management strategies throughout each GenAI-enabled product’s lifecycle, including:
  • Mechanisms for maintaining control over underlying models—even when third-party foundation models are utilized—to ensure compliance with safety standards.
  • Continuous post-market monitoring to evaluate real-world performance effectively.
  • Establishing feedback mechanisms through which users can report issues to help identify potential risks early.
  4. Evidence Generation: Traditional methods for evaluating safety and effectiveness may need adaptation given the open-ended input/output formats typical in GenAI applications and the unique risks associated with such designs. For example:
  • Erroneous outputs generated by AI models (hallucinations) can complicate understanding of device behavior and reliability. 
  • The inherent variability in outputs raises questions about consistent performance metrics over time and how to calibrate evaluation of pre-market performance to an algorithm that continues to evolve.
  • Because GenAI produces human-like content, there is a risk the outputs may be misinterpreted as human judgment. This necessitates clear communication regarding device limitations and proper use.
  • There is a risk of malicious prompts leading to unintended model outputs (jailbreaking).
  • Stakeholder biases affect interpretations and clinicians may over-rely on AI-generated outcomes, undermining their professional judgment.

Questions addressed

Much of the Digital Health Advisory Committee (DHAC) discussion centered on three key questions, summarized below: 

  1. Premarket Performance Evaluation: What specific information related to GenAI should be available to FDA – and included in a premarket submission – to evaluate the safety and effectiveness of GenAI-enabled devices, considering that foundation models leveraged by these devices will change over time and that limited information may be available on the underlying training data? What, if any, specific information must be conveyed to health care professionals, patients, and caregivers to help improve transparency and/or control unique risks associated with GenAI?
  2. Risk Management: What new opportunities, such as new intended uses or new applications in existing uses, have been enabled by GenAI for medical devices, and what new controls may be needed to mitigate associated risks?
  3. Post-Market Performance Monitoring: What aspects of post-market monitoring and evaluation will be critical to maintaining the safety and effectiveness of GenAI devices in particular (versus other AI-enabled devices)?

Premarket Performance Evaluation 

The Committee emphasized several critical aspects necessary for evaluating GenAI-enabled devices in premarket submissions. It was noted that clarifying specific indications for use, as well as target populations and usage settings, will be crucial for developing products in preparation for premarket performance evaluation. This development process should include careful consideration of limitations of the proposed use, including explicitly reporting cases that may introduce uncertainty, providing demographic information for the intended population, and specifying where devices will be used (e.g., decentralized health care settings). FDA will also expect to understand whether the GenAI is dynamic (adaptive) or static, whether the algorithms operate autonomously or require human intervention (human-in-the-loop vs. human-out-of-the-loop), and details about the modeling and training of the device. Performance review of GenAI devices will direct particular focus to the following requirements: 

  • Understanding usability risks compared to non-generative systems; 
  • Characterizing device performance across different populations/settings relevant to intended use; 
  • Assessing metrics related to sensitivity/specificity as well as repeatability, reproducibility, uncertainty estimates, hallucination rates, error rates; and 
  • Establishing performance metrics to monitor the risks of data drift, acknowledging that this could affect device safety/performance over time. 

The Committee discussed potential evaluation strategies for addressing these requirements, including benchmarking performance against standardized datasets, expert evaluation for clinical relevance, and model-based evaluation in which other models, with human oversight, evaluate the subject GenAI device. Data sheets were also explored as a potential tool for standardizing information presented to the agency. Much of the discussion also concentrated on performance from the perspective of the anticipated user of the GenAI device – e.g., how to enable users to make better informed clinical decisions and how to mitigate risks associated with user interfaces.

While GenAI presents a host of unique risks, some themes that emerged from Committee discussion echoed the agency’s thoughts on approaches to regulating AI/ML technologies more broadly. In line with the general emphasis on TPLC, the Committee suggested that post-market plans should be included in pre-market discussions. There was also a focus on how the risk of patient harm underscores governance needs, and the Committee stressed the shared responsibility among manufacturers, physicians, and health systems deploying GenAI-enabled devices.

Risk Management 

As noted previously, GenAI presents novel opportunities, such as enhanced diagnostic capabilities, but also introduces unique risks that require new types of monitoring. For example, the Committee discussed how the need to correct AI-generated content after the creation of an initial draft, particularly for text drafts or interpretations of images, will require the development of new control mechanisms. Auditing will also be a key aspect of risk management, and other potential targets for companies to track and monitor were mentioned as well.

Post-Market Performance Monitoring 

The discussion focused on the specific strategies and tools that can be implemented to monitor and manage the performance and accuracy of a GenAI-enabled device across multiple sites, ensuring consistency while addressing potential regional biases where local data vary from the data underlying the authorized device. Specific strategies discussed included:

  • Assessing the percentage of misinterpretations at local institutions.
  • Utilizing audits with synthetic data or quality reviews using locked datasets and varied prompts.
  • Tracking the number of queries or flagged outputs and identifying where they arise in local evaluations.
  • Conducting evaluations of local datasets in comparison to training datasets to assess similarity.
  • Considering variability in doctors' judgments and patient outcomes as an additional aspect to monitor.

Additionally, the Committee discussed the specific capabilities that should be included in post-market performance monitoring, which largely fell into three categories, as summarized below:

  • Human-AI Interaction:
    • There must be significant human oversight during the post-marketing period.
    • Watermarking should be implemented across all outputs to indicate that GenAI was involved, ensuring transparency.
    • Existing data standards should be utilized and expanded in collaboration with the Consumer Technology Association (CTA) and others for GenAI-enabled devices.
    • A low-cost monitoring mechanism is desirable, such as a registry or reporting system. 
    • Facilitation of error reporting to both manufacturers and end users (clinicians and patients) is essential, ideally through a centralized database that is low-cost and low effort. 
    • Some suggestions were made to modernize the Manufacturer and User Facility Device Experience (MAUDE) database to give industry transparent insight into GenAI development and risks.
  • Usability and Risk Mitigation:
    • Adverse event reporting should focus on accuracy (did it provide the expected output?), safety (did anyone get hurt?), and bounded use (was the answer significantly off from intention?).
    • The likelihood of user error reporting relates directly to usability and safety considerations.
    • Internal periodic reviews of interactions (e.g., chat transcripts) should document product or implementation failures.
    • Education regarding nomenclature related to data drift is necessary for effective risk reporting.
    • Resubmitting information to FDA is required if there are significant changes in the foundational model being used.
  • Reporting Changes from Baseline Functionality – Predetermined change control plans (PCCPs) can enable “pre-approval” of changes and should include (1) a description of planned modifications that specifies whether such changes are manual or automated and whether they will be implemented globally or locally; (2) a modification protocol that details how the proposed changes will be verified/validated, analyzed, implemented, and then monitored in the real world, with an emphasis on transparency regarding how users will be informed about updates; and (3) an impact assessment to evaluate the balance of benefits and risks, ensuring that risks have been identified and adequately mitigated before implementing proposed modifications. 

The Committee discussed some considerations for leveraging PCCPs for adaptive (GenAI-enabled) algorithms. Establishing boundaries or guardrails within PCCPs to define permissible ranges for automatic updates was one point of interest. It was also noted that monitoring post-market performance over time must ensure maintained or improved device effectiveness while accommodating local adaptations across multiple sites. Additionally, panelists stated that updating labeling when modifications occur is crucial, so users receive timely information about device functionality. Shortly after the Committee meeting, FDA issued its final guidance on PCCPs for AI-enabled device software functions; see our discussion of this guidance here.

Conclusion 

DHAC's discussions highlighted both the opportunities and challenges of integrating GenAI into health care. While these technologies have transformative potential — enhancing diagnostics and treatment options — the associated risks necessitate careful consideration from industry and regulators alike, and there are still many questions to be answered. Continued dialogue will help shape effective regulatory strategies that foster innovation while ensuring public trust in emerging technologies within health care settings.

Hogan Lovells has been assisting clients in navigating the FDA regulatory process for AI-enabled devices throughout its evolution over the last decade. If you have questions or would like us to help you evaluate an issue related to AI or GenAI in the context of medical device regulation, please contact one of the authors of this alert or the Hogan Lovells attorney with whom you normally work. 

 

Authored by Kelliann Payne, Suzanne Levy Friedman, and Evelyn Tsisin.
