The United States Food and Drug Administration (FDA), Health Canada and the United Kingdom's Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP). These guiding principles will help promote safe, effective and high-quality medical devices that use artificial intelligence and machine learning (AI/ML).
Artificial intelligence and machine learning technologies have the potential to transform healthcare by deriving new and important insights from the vast amount of data generated every day in the delivery of healthcare. They use software algorithms to learn from real-world use and, in some situations, may use this information to improve product performance. However, they also present unique considerations because of their complexity and the iterative, data-driven nature of their development.
These 10 guiding principles are intended to lay the foundation for developing Good Machine Learning Practice that addresses the unique nature of these products. They will also help cultivate future growth in this rapidly progressing field.
The 10 guiding principles identify areas where the International Medical Device Regulators Forum (IMDRF), international standards organizations and other collaborative bodies could work to advance GMLP. Areas of collaboration include research, the creation of educational tools and resources, international harmonization, and consensus standards, all of which can help inform regulatory policies and regulatory guidance.
We believe that these guiding principles can be used to:
- adopt good practices that have been proven in other sectors
- adapt practices from other sectors so that they are applicable to medical technology and the healthcare sector
- create new practices specific to medical technology and the healthcare industry
As the field of AI/ML medical devices evolves, so too must GMLP best practices and consensus standards. Strong partnerships with our international public health partners will be crucial if we are to empower stakeholders to advance responsible innovation in this area. We therefore expect that this initial collaborative work can inform our broader international engagements, including with the IMDRF.
We welcome your continued feedback through the public docket (FDA-2019-N-1185) on Regulations.gov and look forward to working with you on these efforts. The Digital Health Center of Excellence is spearheading this work for the FDA. Contact us directly at [email protected], [email protected] and [email protected]
The 10 guiding principles are:
- Multidisciplinary expertise is leveraged throughout the total product lifecycle: An in-depth understanding of a model's intended integration into the clinical workflow, and of the desired benefits and associated risks to patients, can help ensure that ML-enabled medical devices are safe and effective and address clinically meaningful needs over the lifecycle of the device.
- Good software engineering and security practices are implemented: Model design is implemented with attention to the “fundamentals”: good software engineering practices, data quality assurance, data management, and robust cybersecurity practices. These practices include methodical risk management and a design process that can appropriately capture and communicate design, implementation and risk management decisions and their rationale, as well as ensure data authenticity and integrity.
- Clinical study participants and data sets are representative of the target patient population: Data collection protocols should ensure that the relevant characteristics of the target patient population (for example, age, gender, sex, race and ethnicity), use, and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and in the training and test data sets, so that results can reasonably be generalized to the population of interest. This is important to manage any bias, promote appropriate and generalizable performance across the target patient population, assess usability, and identify circumstances where the model may underperform.
- Training data sets are independent of test sets: Training and test data sets are selected and maintained to be appropriately independent of one another. All potential sources of dependence, including patient, data acquisition and site factors, are considered and addressed to assure independence.
- Selected reference datasets are based upon the best available methods: Accepted, best available methods for developing a reference dataset (that is, a reference standard) ensure that clinically relevant and well-characterized data are collected and that the limitations of the reference are understood. If available, accepted reference datasets that promote and demonstrate model robustness and generalizability across the target patient population are used in model development and testing.
- Model design is tailored to the available data and reflects the intended use of the device: Model design is suited to the available data and supports the active mitigation of known risks, such as overfitting, performance degradation and security risks. The clinical benefits and risks related to the product are well understood, are used to derive clinically meaningful performance goals for testing, and support that the product can safely and effectively achieve its intended use. Considerations include the impact of both global and local performance and of uncertainty/variability in the device inputs and outputs, the intended patient populations, and the conditions of clinical use.
- Focus is placed on the performance of the Human-AI team: Where the model has a “human in the loop”, human factors considerations and the human interpretability of model outputs are addressed, with emphasis on the performance of the Human-AI team rather than just the performance of the model in isolation.
- Tests demonstrate device performance under clinically relevant conditions: Statistically sound test plans are developed and executed to generate clinically relevant device performance information independent of the training data set. Considerations include target patient population, important subgroups, clinical environment and use by the Human-AI team, measurement inputs, and potential confounders.
- Users are provided clear, essential information: Users are given ready access to clear, contextually relevant information that is appropriate for the intended audience (such as healthcare providers or patients), including: the product's intended use and indications for use, model performance for appropriate subgroups, characteristics of the data used to train and test the model, acceptable inputs, known limitations, user interface interpretation, and integration of the model into the clinical workflow. Users are also made aware of device modifications and updates resulting from real-world performance monitoring, of the basis for decision-making when available, and of a means to communicate product concerns to the developer.
- Deployed models are monitored for performance and re-training risks are managed: Deployed models can be monitored in “real world” use with a focus on maintained or improved safety and performance. Additionally, when models are periodically or continually trained after deployment, appropriate controls are in place to manage risks of overfitting, unintended bias, or degradation of the model (for example, dataset drift) that may impact the safety and performance of the model as it is used by the Human-AI team.
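The principle that training and test data sets are independent is easiest to violate when one patient contributes several records (for example, multiple images or visits). The Python sketch below is illustrative only and is not drawn from the guiding principles: it assumes each record carries an identifier for the source of dependence being controlled, and the names `grouped_split` and `shared_groups` and the tuple record layout are hypothetical. It assigns whole groups, rather than individual records, to either the training or the test set and then checks that no group appears in both.

```python
import random
from collections import defaultdict
from operator import itemgetter
from typing import Callable, Hashable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def grouped_split(
    records: Sequence[T],
    group_key: Callable[[T], Hashable],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split records so each group (e.g. one patient or
    one site) falls entirely in the training set or entirely in the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Collect records by group so that a group is never divided between sets.
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    if len(groups) < 2:
        raise ValueError("need at least two groups to form both sets")

    # Shuffle group identifiers, not individual records, with a fixed seed.
    group_ids = list(groups)
    random.Random(seed).shuffle(group_ids)

    # Reserve at least one group for each of the two sets.
    n_test = min(len(group_ids) - 1, max(1, round(len(group_ids) * test_fraction)))
    test_ids = set(group_ids[:n_test])

    train = [r for gid in group_ids if gid not in test_ids for r in groups[gid]]
    test = [r for gid in group_ids if gid in test_ids for r in groups[gid]]
    return train, test


def shared_groups(
    train: Sequence[T], test: Sequence[T], group_key: Callable[[T], Hashable]
) -> Set[Hashable]:
    """Return group identifiers present in both sets; empty means no overlap."""
    return {group_key(r) for r in train} & {group_key(r) for r in test}


if __name__ == "__main__":
    # Hypothetical records: (patient_id, site_id, image_id), 3 images per patient.
    records = [(f"P{p}", f"S{p % 3}", f"img{p}_{i}") for p in range(10) for i in range(3)]

    by_patient = itemgetter(0)
    train, test = grouped_split(records, by_patient, test_fraction=0.2, seed=42)
    print(len(train), len(test), shared_groups(train, test, by_patient))
```

Grouping by patient alone does not remove site or acquisition effects. Passing a site key instead (for example `itemgetter(1)` in this hypothetical layout) keeps entire sites out of training, which is a stricter check of generalization across sites.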
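For the principle on monitoring deployed models, one common way to watch for dataset drift is to compare the distribution of an input seen in deployment with its distribution in the reference data used during development. The sketch below uses the Population Stability Index (PSI), a metric chosen here for illustration that the guiding principles do not prescribe. The function names, the bin edges and the 0.2 alert threshold (a widely used rule of thumb) are assumptions; an actual monitoring plan would define its own metrics and thresholds.

```python
import bisect
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], bin_edges: Sequence[float], eps: float) -> List[float]:
    """Share of values in each bin defined by sorted edges (len(edges) + 1 bins)."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        counts[bisect.bisect_right(bin_edges, v)] += 1
    # Floor empty bins at eps so the logarithm in the PSI stays finite.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bin_edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    if list(bin_edges) != sorted(bin_edges):
        raise ValueError("bin_edges must be sorted")
    ref = bin_proportions(reference, bin_edges, eps)
    cur = bin_proportions(current, bin_edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_alert(psi: float, threshold: float = 0.2) -> bool:
    """Hypothetical rule: flag the input for review when PSI exceeds the threshold."""
    return psi > threshold


if __name__ == "__main__":
    edges = [25, 50, 75]
    reference = list(range(100))    # e.g. values seen during development
    stable = list(range(100))       # deployed data with the same distribution
    shifted = list(range(50, 150))  # deployed data shifted upward
    for name, deployed in [("stable", stable), ("shifted", shifted)]:
        psi = population_stability_index(reference, deployed, edges)
        print(name, round(psi, 3), drift_alert(psi))
```

PSI summarizes one input at a time and does not show whether clinical performance has actually changed, so in practice it would sit alongside direct monitoring of model performance in real-world use.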