
From AI Sprint to AI Live Testing – FCA Leads the Way

The UK Financial Conduct Authority (FCA) recently published its Proposal for AI Live Testing engagement paper, outlining a 12-month pilot inside the regulator’s AI Lab. The pilot is designed to let firms with production-ready AI models trial them in controlled live-market conditions, with close FCA technical and supervisory support, exploring output-driven validation, model-risk safeguards and other evaluation metrics. The engagement paper invites market comment on the pilot’s scope, eligibility criteria and safeguards, all aimed at accelerating safe, responsible AI adoption that delivers demonstrably positive outcomes for UK consumers and markets. The comment period closes in June and the pilot is expected to open later in the summer.

This latest proposal follows the two-day AI Sprint the FCA held in January that brought together 115 experts from financial services, technology, and academia with the objective of identifying barriers to safe, responsible, and scalable AI adoption across the regulated sector.

Four common themes emerged during the sprint (see AI Sprint Summary).

Regulatory clarity: Participants asked the FCA to map existing obligations (Consumer Duty, SM&CR, outsourcing, data protection, etc.) onto AI use-cases and publish targeted guidance so firms can innovate without second-guessing the rules. They proposed a single, publicly available “AI rulebook” or FAQ that pulls dispersed requirements into one place, cutting the time compliance teams spend interpreting fragmented mandates.

Trust & risk awareness: Adoption hinges on evidence of safety – boards and consumers need hard proof of testing, bias controls and human-in-the-loop escalation before green-lighting new AI products. Delegates urged the FCA to spell out minimum assurance metrics (fairness, explainability, robustness) and share exemplar documentation firms can reuse in audits.

Collaboration & coordination: Stakeholders called for a permanent multi-agency forum – regulators, firms, vendors, academia – to co-design data standards and circulate sandbox learnings. They also highlighted cross-border engagement with the Bank of England and international standard-setters as vital to avoid duplicative testing regimes and regulatory divergence.

Safe AI innovation through sandboxing: The Sprint endorsed a ‘supercharged sandbox’ offering compute, synthetic datasets and real-time supervisory feedback for production-ready models to be trialled in a ring-fenced setting.

The AI Sprint was a practical exercise focused on gaps. Data quality and explainability emerged as the key areas of concern because they underpin the viability of real-time AI deployments, the trust required for wide-scale adoption, and accountability under the Senior Managers & Certification Regime (SM&CR), among other obligations.

Fragmented data foundations

Participants agreed that scalable and safe AI innovation is not possible without robust foundational infrastructure. Key requirements include clean data pipelines, consistent lineage, interoperable cloud systems, and traceable model inputs and outputs. Several teams emphasised the need to “get the basics right” before launching more advanced use cases. Without these fundamentals in place, financial institutions risk building high-impact systems on unstable, opaque foundations.

The Sprint made it clear that poor data quality, weak lineage, and inconsistent schemas can quickly become a supervisory issue. These weaknesses compromise audit trails, undermine SM&CR accountability, and increase the likelihood of enforcement where model decisions result in poor outcomes.
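As a purely illustrative sketch (the record fields, hashing choice and function names below are assumptions, not FCA requirements), the following Python shows one way to capture a traceable lineage record for each model decision so that the audit trail survives to supervisory review:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """One auditable record tying a model decision back to its inputs."""
    model_id: str
    model_version: str
    input_hash: str       # fingerprint of the exact features the model saw
    output: dict          # the decision and score that were returned
    timestamp: str

def record_decision(model_id: str, model_version: str,
                    features: dict, output: dict) -> LineageRecord:
    # Hash the serialised inputs so the exact payload can be verified later
    # without storing sensitive raw data in the audit log itself.
    payload = json.dumps(features, sort_keys=True).encode()
    return LineageRecord(
        model_id=model_id,
        model_version=model_version,
        input_hash=hashlib.sha256(payload).hexdigest(),
        output=output,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Example: log a hypothetical credit-decision output with a verifiable input fingerprint.
record = record_decision(
    "affordability-model", "2.3.1",
    {"income": 42000, "monthly_commitments": 950},
    {"decision": "refer", "score": 0.47},
)
print(asdict(record))
```

Hashing the serialised inputs lets a firm show later exactly which payload produced a given decision, without retaining sensitive raw data inside the audit log itself.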

Lack of shared data standards

Sprint participants highlighted the absence of common data taxonomies as a major obstacle to collaboration and oversight. Without harmonised field definitions, industry-wide benchmarks become unreliable, and testing within sandboxes lacks comparability. Firms called for sector-level alignment on data standards to enable efficient validation, reduce duplication, and facilitate cross-firm comparisons during regulatory review.

Lack of shared data models slows everything – from internal testing to regulatory review. Standardisation reduces rework, lowers integration costs, and makes AI initiatives more portable and resilient.
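By way of illustration only, a shared field taxonomy can be expressed as a machine-readable schema that every team validates against at the point of ingest; the field names and types below are hypothetical:

```python
# A shared, machine-readable field taxonomy that every team validates against.
CUSTOMER_RECORD_SCHEMA = {
    "customer_id": str,
    "date_of_birth": str,      # ISO 8601 date
    "annual_income_gbp": float,
    "residency_status": str,
}

def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    extra = set(record) - set(schema)
    if extra:
        errors.append(f"undeclared fields: {sorted(extra)}")
    return errors

print(validate_record(
    {"customer_id": "C-1001", "date_of_birth": "1986-04-12",
     "annual_income_gbp": 42000.0, "residency_status": "UK"},
    CUSTOMER_RECORD_SCHEMA,
))  # -> []
```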

Bias & representativeness in training data

Another key theme was the risk of bias in historical datasets. If training data reflects discriminatory patterns, AI models can replicate or even amplify those harms. Participants flagged bias mitigation and synthetic data generation as essential areas for further testing. For firms subject to the Consumer Duty or anti-discrimination rules, this will be a critical area of model validation and documentation.
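One widely used representativeness check is the demographic parity gap, the difference in positive-decision rates between groups. The sketch below shows how such a metric might feed model validation documentation; the group labels and decisions are invented for illustration:

```python
def selection_rate(outcomes: list[int]) -> float:
    """Share of positive decisions (e.g. approvals) within one group."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def demographic_parity_gap(decisions: list[int], groups: list[str]) -> float:
    """Largest difference in approval rates between any two groups."""
    by_group: dict[str, list[int]] = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = [selection_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)

# Example: approvals (1) vs. declines (0) broken down by a protected characteristic.
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(f"Demographic parity gap: {demographic_parity_gap(decisions, groups):.2f}")
# -> 0.50 (group A approved at 0.75, group B at 0.25)
```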

Explainability and transparency

Explainability and transparency emerged as central concerns across all use cases. The consensus was that end-users – whether customers or compliance teams – must be able to understand how an AI model reaches its decisions. In cases involving financial products or eligibility outcomes, this may include the ability to escalate to a human decision-maker. The FCA is expected to assess not only the availability of explainability artefacts, but also the firm’s ability to demonstrate consistent internal understanding of the model’s logic.

Opaque AI logic and black-box models are no longer acceptable. Documentation, visualisation, and escalation procedures must be embedded into AI workflows from the outset. Compliance teams should stress-test whether model decisions can be explained to non-specialist stakeholders and to regulators on request.
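As a hedged illustration of embedding explanation and escalation into the workflow, the snippet below attaches reason codes to a simple additive scorecard and routes borderline scores to a human reviewer; the weights, feature names and escalation band are hypothetical:

```python
# Illustrative reason-code explanation for a simple additive scorecard,
# plus an escalation rule that routes borderline cases to a human reviewer.
WEIGHTS = {"income": 0.4, "debt_ratio": -0.5, "account_age_years": 0.1}

def explain_decision(features: dict) -> dict:
    contributions = {k: WEIGHTS[k] * v for k, v in features.items()}
    score = sum(contributions.values())
    return {
        "score": round(score, 3),
        # Factors ranked by influence, for the customer-facing explanation.
        "reason_codes": sorted(contributions,
                               key=lambda k: abs(contributions[k]), reverse=True),
        # Borderline scores are escalated to a human decision-maker.
        "escalate_to_human": 0.4 <= score <= 0.6,
    }

print(explain_decision({"income": 1.2, "debt_ratio": 0.8, "account_age_years": 3.5}))
# -> score 0.43, reason codes ranked by contribution, escalate_to_human True
```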

Input–output validation blind spots

The Sprint raised questions around input and output validation. How can firms prove that the data used to train and run AI systems is accurate, representative, and complete? What metrics demonstrate that model outputs meet regulatory expectations? Compliance teams need clearly defined thresholds, alerting mechanisms, and documentation to assure both internal stakeholders and external supervisors.
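A minimal sketch of what such thresholds and alerting might look like in practice follows; the limits, field names and logger are illustrative assumptions rather than regulatory expectations:

```python
import logging

logger = logging.getLogger("model_assurance")

# Thresholds a compliance team might define up front (values are illustrative).
MAX_MISSING_RATE = 0.02     # at most 2% missing values per input field
SCORE_RANGE = (0.0, 1.0)    # every output score must fall inside this range

def validate_batch(inputs: list[dict], scores: list[float], fields: list[str]) -> bool:
    """Check one batch of inputs and outputs against pre-agreed thresholds."""
    ok = True
    for field in fields:
        missing = sum(1 for row in inputs if row.get(field) is None) / len(inputs)
        if missing > MAX_MISSING_RATE:
            logger.warning("Input field %s missing in %.1f%% of rows", field, 100 * missing)
            ok = False
    out_of_range = [s for s in scores if not SCORE_RANGE[0] <= s <= SCORE_RANGE[1]]
    if out_of_range:
        logger.warning("%d scores outside expected range %s", len(out_of_range), SCORE_RANGE)
        ok = False
    return ok
```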

The FCA has signalled interest in output-driven validation, where firms monitor live behaviour using clearly defined performance and fairness metrics. Several Sprint teams recommended the use of dashboards, triggers, and automated stop-controls to manage drift or unexpected outputs. Assurance needs to happen at runtime, not just in retrospective audits or annual reviews.
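One common runtime drift metric is the Population Stability Index (PSI); the sketch below shows how it could drive an automated stop-control. The 0.25 threshold and the halt-and-alert behaviour are illustrative assumptions, not FCA-mandated values:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference score distribution and live scores."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)
    # A small floor avoids division by zero in sparse bins.
    exp_pct = np.clip(exp_counts / exp_counts.sum(), 1e-6, None)
    act_pct = np.clip(act_counts / act_counts.sum(), 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

PSI_STOP_THRESHOLD = 0.25   # a commonly cited "significant drift" level

def runtime_check(reference_scores: np.ndarray, live_scores: np.ndarray) -> None:
    psi = population_stability_index(reference_scores, live_scores)
    if psi > PSI_STOP_THRESHOLD:
        # Automated stop-control: halt decisioning and alert the model owner.
        raise RuntimeError(f"Model halted: PSI {psi:.3f} exceeds {PSI_STOP_THRESHOLD}")
```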

Finally, the Sprint confirmed that trust is not an abstract principle – it is a core operational enabler. Firms and consumers alike will struggle to adopt or approve AI systems without reliable assurances over data quality, model transparency, and governance. Senior managers under SM&CR cannot sign off on model use without confidence in the systems and controls that support them.
