Tomco USA · Technical Whitepaper · TW-2025-003ML Safety · Vol. 1, No. 3 · Sep 2025

Evidence Generation for Neural Networks in Safety-Critical Embedded Systems

A Dual-Compliance Strategy for IEC 62304:2006/AMD 1:2015 and EU AI Act High-Risk Requirements

¹Tomco USA, Detroit, MI
Abstract — Certifying neural networks under IEC 62304 Amendment 1 and the EU AI Act simultaneously requires a structured evidence generation strategy that satisfies both frameworks without doubling engineering effort. This paper presents a dual-compliance methodology covering dataset governance, V&V strategy, coverage criteria, and technical documentation, with worked examples from medical imaging and industrial quality inspection.
Index Terms — IEC 62304, EU AI Act, neural network verification, dataset governance, coverage criteria, medical device software, high-risk AI, conformity assessment, ML safety.

1. Introduction

Neural networks are being integrated into safety-critical embedded systems at a rate that outpaces the maturity of applicable certification guidance. Medical imaging systems, industrial inspection platforms, and autonomous robotic systems all rely on learned models to perform functions that directly affect patient safety, product quality, and physical integrity. Two regulatory frameworks impose concurrent obligations on these deployments: IEC 62304:2006/AMD 1:2015 [1], which governs medical device software lifecycle processes, and the EU AI Act (Regulation 2024/1689) [2], which classifies AI systems used in medical devices, critical infrastructure, and high-consequence industrial automation as high-risk.

IEC 62304 AMD 1 extended the original standard's scope to address Software of Unknown Provenance (SOUP) — a category that encompasses pre-trained neural network models imported from third-party repositories. The amendment requires manufacturers to document SOUP items, assess their functional suitability and known anomalies, and verify that their use does not introduce unacceptable risk. The EU AI Act's Article 10 mandates data governance practices for training, validation, and test datasets; Article 9 requires a documented risk management system; and Annex IV specifies technical documentation content that substantially overlaps with a 62304-compliant technical file [2].

2. Regulatory Overlap and Compliance Strategy

A systematic mapping of IEC 62304 AMD 1 and EU AI Act requirements reveals substantial overlap in four areas: (1) risk management — 62304 Clause 4.2 risk management activities align closely with EU AI Act Article 9; (2) software development planning — 62304 Clause 5.1 maps to EU AI Act Article 17 (quality management system); (3) configuration management and traceability — 62304 Clause 8 maps to EU AI Act Annex IV paragraph 2 (technical documentation); (4) problem resolution — 62304 Clause 9 maps to EU AI Act Article 72 (post-market monitoring) and Article 73 (serious incident reporting). Where the two frameworks diverge, the EU AI Act is more prescriptive about data governance and IEC 62304 is more prescriptive about software safety classification [1][2].

Table I. Compliance overlap matrix: IEC 62304 AMD 1 clauses mapped to EU AI Act articles. 'Direct' = same evidence satisfies both; 'Partial' = evidence addresses both but needs augmentation; 'Unique' = no overlap.
Topic | IEC 62304 AMD 1 | EU AI Act | Overlap Class
Risk management | Clause 4.2 | Article 9 | Direct — shared risk file
Software development planning | Clause 5.1 | Article 17 (QMS) | Direct — QMS covers both
Dataset governance | SOUP documentation (AMD 1) | Article 10 (mandatory) | Partial — 62304 covers SOUP; Act adds bias assessment
V&V strategy | Clause 5.7 (software system testing) | Article 9(6) V&V methods | Partial — 62304 V&V plan needs AI-specific extension
Configuration management | Clause 8 | Annex IV §2 (technical docs) | Direct — CM log satisfies both
Post-market monitoring | Clause 9 (problem resolution) | Article 72 (post-market monitoring) | Partial — 62304 is reactive; Act requires proactive monitoring
Explainability | Not addressed | Article 13 (transparency) | Unique to EU AI Act
Human oversight | Not addressed | Article 14 (mandatory for high-risk) | Unique to EU AI Act
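The overlap matrix in Table I lends itself to a machine-readable form that can drive compliance tooling — for example, listing the topics whose shared evidence still needs AI-specific augmentation. The sketch below is an illustrative Python encoding; the dictionary layout and the helper name `evidence_gaps` are assumptions for this paper, not part of either framework.

```python
# Illustrative machine-readable form of the Table I overlap matrix.
# Each entry: topic -> (62304 clause, EU AI Act article, overlap class).
OVERLAP_MATRIX = {
    "risk_management":      ("Clause 4.2", "Article 9",   "direct"),
    "development_planning": ("Clause 5.1", "Article 17",  "direct"),
    "dataset_governance":   ("SOUP documentation", "Article 10", "partial"),
    "vv_strategy":          ("Clause 5.7", "Article 9(6)", "partial"),
    "configuration_mgmt":   ("Clause 8",   "Annex IV §2", "direct"),
    "post_market":          ("Clause 9",   "Article 72",  "partial"),
    "explainability":       (None,         "Article 13",  "unique"),
    "human_oversight":      (None,         "Article 14",  "unique"),
}

def evidence_gaps(matrix):
    """Topics where existing 62304 evidence needs AI-specific augmentation
    or where the EU AI Act imposes obligations 62304 does not address."""
    return sorted(t for t, (_, _, cls) in matrix.items()
                  if cls in ("partial", "unique"))
```

A project planning review can then walk `evidence_gaps(OVERLAP_MATRIX)` and assign an owner and a target artefact to each gap.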

3. Dataset Governance

Dataset quality is the primary determinant of neural network safety in the absence of formal verification. EU AI Act Article 10 requires that training, validation, and test datasets be: (a) relevant and representative of the intended operational design domain (ODD); (b) free from known errors and complete with respect to the intended purpose; (c) assessed for biases that could affect safety-relevant outputs; (d) governed by documented data management practices. IEEE 2801-2022 [4] provides a recommended practice for medical AI datasets that is substantially compatible with Article 10 and can be adopted as a dual-compliance artefact.

For a medical imaging application (e.g., retinal OCT segmentation for diabetic macular oedema grading), the dataset governance record documents: dataset provenance (clinical site, scanner model, acquisition protocol), demographic distribution (age, sex, disease severity), ground truth labelling protocol (grader qualification, inter-rater agreement), known acquisition artefacts and their prevalence, and train/validation/test split rationale. The record is version-controlled alongside the model weights to maintain traceability throughout the software lifecycle.
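The governance record described above can be captured as a small, hash-addressable data structure so that it versions cleanly alongside the model weights. The following is a minimal sketch; the field names and the `record_digest` helper are illustrative assumptions modelled on the list above, not a normative schema.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

# Illustrative dataset governance record (field names are assumptions,
# mirroring the governance items listed in the text above).
@dataclass(frozen=True)
class DatasetGovernanceRecord:
    dataset_id: str
    provenance: dict           # clinical site, scanner model, acquisition protocol
    demographics: dict         # age, sex, disease-severity distributions
    labelling_protocol: dict   # grader qualification, inter-rater agreement
    known_artefacts: tuple     # acquisition artefacts and their prevalence
    split_rationale: str
    model_weights_sha256: str  # ties the record to one specific trained model

def record_digest(rec: DatasetGovernanceRecord) -> str:
    """Deterministic digest of the record, suitable for a CM log entry."""
    payload = json.dumps(asdict(rec), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()
```

Storing `record_digest(...)` in the configuration management log alongside the weight-file hash gives the bidirectional traceability that both the 62304 CM process and Annex IV §2 expect.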

4. Verification and Validation Strategy

IEC 62304 AMD 1 Clause 5.7 requires software system testing appropriate to the safety class. For Class C software (failure of which could result in death or serious injury), comprehensive coverage of software requirements must be demonstrated. For neural networks, traditional structural coverage criteria (statement, branch, MC/DC) are not meaningful at the source-code level — the 'code' is effectively the trained weights. Coverage must instead be defined at the model-behaviour level, drawing on neural network coverage criteria from the research literature [6][7].

4.1 Neuron Activation Coverage

Pei et al.'s DeepXplore [6] introduced neuron coverage (NC) as the fraction of neurons activated above a threshold across a test suite. NC has been extended to k-multisection neuron coverage (KMNC), neuron boundary coverage (NBC), and strong neuron activation coverage (SNAC), each targeting different regions of the activation space. For safety-critical applications, a minimum KMNC of 85 % across the test dataset — combined with adversarial testing covering the ODD boundary conditions defined in the safety contract — provides a defensible V&V claim.
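Both metrics reduce to simple reductions over a recorded activation matrix. The sketch below computes NC and KMNC in NumPy, assuming activations have already been extracted from the model for each test input and that per-neuron training-time bounds (`lo`, `hi`, with `hi > lo`) are available; the function names are ours, not from [6].

```python
import numpy as np

def neuron_coverage(acts: np.ndarray, threshold: float = 0.0) -> float:
    """NC: fraction of neurons activated above `threshold` by at least one
    test input. `acts` has shape (n_test_inputs, n_neurons)."""
    return float(np.mean((acts > threshold).any(axis=0)))

def kmnc(acts: np.ndarray, lo: np.ndarray, hi: np.ndarray, k: int = 10) -> float:
    """KMNC: split each neuron's training-time activation range [lo, hi]
    into k sections; return the fraction of (neuron, section) pairs hit.
    Assumes hi > lo element-wise; out-of-range activations are clipped."""
    sections = np.clip(((acts - lo) / (hi - lo) * k).astype(int), 0, k - 1)
    hit = np.zeros((acts.shape[1], k), dtype=bool)
    for n in range(acts.shape[1]):
        hit[n, np.unique(sections[:, n])] = True
    return float(hit.mean())
```

In a V&V pipeline these would run continuously over the growing test suite, with the KMNC figure checked against the 85 % target before a release candidate is frozen.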

4.2 Scenario-Based and Adversarial Testing

Scenario-based testing validates network performance across the ODD by constructing a test matrix covering all combinations of ODD parameters at their boundary values. For a retinal grading model, this matrix includes scanner models not seen in training, image quality grades (excellent / acceptable / degraded), demographic subgroups, and co-morbidity combinations. Adversarial testing using FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent) perturbations at magnitudes calibrated to the sensor noise floor provides evidence that the model is not sensitive to physically plausible input variations.
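A minimal sketch of both mechanisms follows: the scenario matrix is the cartesian product of ODD parameters, and FGSM perturbs an input by `eps` in the direction of the loss gradient's sign. The ODD parameter names and values are hypothetical, and the gradient is computed analytically for a toy linear scorer with a hinge-style loss — a real pipeline would backpropagate through the trained network instead.

```python
import itertools
import numpy as np

# Scenario matrix: cartesian product of ODD parameters at boundary values.
# Parameter names and values are illustrative, not from a real device ODD.
ODD = {
    "scanner":  ["seen_in_training", "unseen_model"],
    "quality":  ["excellent", "acceptable", "degraded"],
    "subgroup": ["under_40", "40_to_70", "over_70"],
}
scenarios = [dict(zip(ODD, combo)) for combo in itertools.product(*ODD.values())]

def fgsm(x: np.ndarray, w: np.ndarray, y: int, eps: float) -> np.ndarray:
    """FGSM step for a toy linear scorer f(x) = w @ x with label y in {-1, +1}.
    The hinge-style loss gradient w.r.t. x is -y * w (analytic), so the
    perturbation is eps * sign(-y * w), bounded by eps in the infinity norm."""
    return x + eps * np.sign(-y * w)
```

The key calibration step is choosing `eps`: set it at or just above the measured sensor noise floor, so a robustness failure corresponds to a physically plausible input variation rather than an artificial worst case.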

Table II. V&V evidence requirements by IEC 62304 software class and EU AI Act risk level. Class C / High-Risk requires the full evidence set.
Evidence Type | 62304 Class A | 62304 Class B | 62304 Class C / EU AI Act High-Risk
Unit testing (module level) | Required | Required | Required + coverage target documented
Neuron coverage report | Not required | Recommended | Required — KMNC ≥ 85 % recommended
Adversarial robustness testing | Not required | Recommended | Required — bounds defined in safety contract
Scenario matrix test report | Not required | Required | Required — full ODD coverage matrix
Bias and fairness assessment | Not required | Not required | Required — EU AI Act Art. 10(2)(f)
Human oversight verification | Not required | Not required | Required — EU AI Act Art. 14
Post-market monitoring plan | Reactive only | Reactive + periodic review | Proactive — continuous performance monitoring
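Because the evidence set in Table II is a pure function of the safety class, it can be encoded once and queried from release tooling. The sketch below is a hypothetical helper; the evidence identifiers and the "required" simplification (collapsing "recommended" entries) are our assumptions.

```python
# Illustrative mapping of evidence items to the 62304 classes that strictly
# require them (per Table II; "recommended" entries are omitted here).
EVIDENCE = {
    "unit_testing":        {"A", "B", "C"},
    "neuron_coverage":     {"C"},
    "adversarial_testing": {"C"},
    "scenario_matrix":     {"B", "C"},
    "bias_assessment":     {"C"},   # EU AI Act Art. 10(2)(f)
    "human_oversight":     {"C"},   # EU AI Act Art. 14
}

def required_evidence(software_class: str) -> list:
    """Evidence items strictly required for a 62304 class (A, B, or C)."""
    return sorted(e for e, classes in EVIDENCE.items()
                  if software_class in classes)
```

A release gate can then fail the build whenever an artefact in `required_evidence(cls)` is missing from the technical file index.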

5. Technical Documentation and Conformity Assessment

EU AI Act Annex IV specifies eight categories of technical documentation required for high-risk AI systems. For a medical device manufacturer also complying with IEC 62304, the existing technical file structure (design history file, software development plan, risk management file, V&V report) provides a foundation that covers approximately 70 % of Annex IV content. The remaining 30 % consists of AI-specific documentation: (a) a general description of the AI system including its intended purpose within the device (Annex IV §1), (b) a description of the elements of the AI system and of the process for its development including algorithm selection rationale (§2), (c) a detailed description of monitoring, functioning, and control of the AI system (§5), and (d) a description of the changes made to the system over its lifecycle (§8) [2].

Conformity assessment for high-risk AI systems in the EU requires involvement of a Notified Body. For medical devices, the Notified Body assessing the Medical Device Regulation (MDR) technical file is the natural choice for the AI Act conformity assessment — reducing duplication and enabling joint audit planning. Manufacturers should initiate pre-submission dialogue with their Notified Body at the architectural design phase, not at submission, to surface AI-specific evidence gaps early.

6. Conclusion

A dual-compliance strategy for IEC 62304 AMD 1 and EU AI Act high-risk requirements is achievable without doubling engineering effort, provided the compliance mapping is established at the project planning phase. The four highest-leverage actions are: (1) adopt IEEE 2801-2022 as the dataset governance standard — it satisfies both 62304 SOUP documentation and EU AI Act Article 10 simultaneously; (2) extend the 62304 V&V plan with neuron coverage and adversarial testing criteria; (3) structure the technical file to include Annex IV sections 1, 2, 5, and 8 as addenda; (4) engage the Notified Body early for joint audit planning [4][2].

As neural networks migrate from Class B to Class C medical device functions — and as the EU AI Act's full enforcement takes effect — the engineering burden of evidence generation will become a competitive differentiator. Organisations that build compliance evidence into their ML development pipeline from the outset — through automated dataset provenance tracking, continuous coverage measurement, and integrated risk management — will certify faster and maintain certification through model updates at a fraction of the cost incurred by organisations that treat compliance as a post-development activity.

References

[1] IEC 62304:2006/AMD 1:2015 – Medical device software – Software life cycle processes. International Electrotechnical Commission, Geneva, 2015.
[2] European Parliament. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (EU AI Act). Official Journal of the European Union, L 2024/1689, 2024.
[3] FDA. 'Artificial Intelligence and Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Action Plan.' U.S. Food and Drug Administration, Silver Spring MD, January 2021.
[4] IEEE 2801-2022 – Recommended Practice for the Quality Management of Datasets for Medical Artificial Intelligence. IEEE Standards Association, Piscataway NJ, 2022.
[5] ISO/IEC 25010:2023 – Systems and Software Engineering – Systems and Software Quality Requirements and Evaluation (SQuaRE) – Product quality model. ISO/IEC JTC 1/SC 7, 2023.
[6] Pei, K., Cao, Y., Yang, J., Jana, S. 'DeepXplore: Automated Whitebox Testing of Deep Learning Systems.' Proceedings of SOSP'17, ACM, 2017.
[7] Sun, Y., Huang, X., Kroening, D. 'Testing Deep Neural Networks.' arXiv:1803.04792v3, 2018.
[8] Wicker, M., Huang, X., Kwiatkowska, M. 'Feature-Guided Black-Box Safety Testing of Deep Neural Networks.' Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Springer, 2018.
[9] ISO/IEC TR 24028:2020 – Information technology – Artificial intelligence – Overview of trustworthiness in artificial intelligence. ISO/IEC JTC 1/SC 42, 2020.
[10] IMDRF. 'Machine Learning-enabled Medical Devices: Key Terms and Definitions.' International Medical Device Regulators Forum, Document N66, 2022.
[11] Kurd, Z., Kelly, T., McDermid, J. 'Establishing Safety Criteria for Artificial Neural Networks.' Proc. 11th Australian Workshop on Safety Critical Systems and Software (SCS), 2006.