The Saifr-sponsored whitepaper – From Caution to Action: How Advisory Firms are Integrating AI in Compliance – published in November, explored several key themes around the adoption of generative AI (GenAI)-enabled technologies for compliance functions at advisory and wealth management firms. We recently covered the theme of in-house versus vendor-supplied solutions in an interview with Saifr CEO and Co-founder Vall Herard, and examined reticence towards GenAI adoption in an interview with Strategic Risk Advisor Jon Elvin.
In this third interview with Harsh Pandya, Vice President of Product Management at Saifr, we explore some of the key implementation considerations and risks of adopting GenAI and large language models (LLMs) for compliance within advisory and wealth management firms.
RegTech Insight: Perhaps we could start by describing the attributes of an appropriate use case for large language models (LLMs) in compliance?
Harsh Pandya: Sure, I think it’s important to differentiate between developing GenAI and fine-tuning large language models for a specific use case versus using off-the-shelf models to accomplish tasks that are currently manual or effort-intensive. If you’re developing, especially in risk and compliance, you’re likely working in a sandbox environment. At this stage, you’re probably not ready to productize GenAI, because auditors, examiners, and supervisors need to be consulted before a new capability is announced.
You’re looking for low-hanging fruit: areas with rich historical data that can be leveraged to evaluate outcomes. Ideally, you want to focus on back-office or middle-office use cases, not customer-facing ones, because those introduce significant risks. These use cases should leave some room for creativity, since GenAI generates content; structured, deterministic tasks are more accurately handled by robotic process automation (RPA). GenAI is for complex tasks where personalization and creative output are possible.
When it comes to procuring GenAI and LLMs, firms are balancing being competitive with being conservative. They’re looking for low-risk opportunities where there’s significant cost reduction potential. Why take on the risk unless you’re saving millions or tens of millions? But I also think we need to be careful with the messaging around AI because people fear being replaced by technology.
One common but problematic use case I’ve seen is trying to replace first-line screening with AI, automating risk disposition and case management. The issue is that this approach assumes integrity in every step that precedes it. For example, if you’re using an off-the-shelf AI solution, it’s critical to understand how the data was generated before AI steps in. How was the policy designed to generate the data that triggers alerts? If you’re relying on a vendor that promises a 98% reduction in false positives, it’s important to ask how they achieved that.
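To see why that follow-up question matters, here is a minimal Python sketch (all figures are hypothetical) showing how two vendors could both truthfully claim a 98% reduction in false positives while delivering very different risk outcomes:

```python
# Why "98% fewer false positives" alone is not enough information.
# Hypothetical legacy queue: 10,000 alerts, of which 2% are true hits.
alerts, hit_rate = 10_000, 0.02
tp, fp = alerts * hit_rate, alerts * (1 - hit_rate)  # 200 true, 9,800 false

# The headline claim: false positives cut by 98%.
fp_new = fp * 0.02  # 196 false positives remain

# Two very different ways to get there, depending on how many true hits survive.
for kept_recall in (1.00, 0.50):
    tp_new = tp * kept_recall
    precision = tp_new / (tp_new + fp_new)
    print(f"recall kept = {kept_recall:.0%}: precision {precision:.0%}, "
          f"true hits missed {tp - tp_new:.0f}")
# Both scenarios satisfy the headline claim; only one of them is safe.
```

A false positive figure in isolation says nothing about what happened to recall, which is exactly the data generation question raised above.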
For AI to work effectively, you need to understand the entire data generation process, so it’s important to take a holistic view when deciding on the right use cases. Efficiency is great, but we also need to ensure the solution is effective.
RegTech Insight: It’ll be interesting to see how agentic AI evolves. GenAI could potentially elevate RPA to the next level by adding intelligence around business process management (BPM).
Harsh Pandya: That’s true, but there’s risk in stacking models on top of each other. You have input, process, and output. If the input is biased, that bias carries through to the output, which then becomes input for another model. This can cascade bias throughout the system. That’s why it’s crucial to work in data-rich environments where you can thoroughly assess the data’s representativeness and understand the attributes of your inputs.
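To make the cascade concrete, here is a small, purely illustrative simulation (all rates are hypothetical): a first-stage screen that over-flags one group hands a skewed population to whatever model sits behind it.

```python
import random
random.seed(0)

# Hypothetical population: two equal-sized groups with an identical 2% rate
# of true bad actors, so any downstream skew is created by the model itself.
population = [(g, random.random() < 0.02) for g in ("A",) * 5000 + ("B",) * 5000]

def stage1_flag(group, is_bad):
    """Biased first-stage screen: same recall for both groups, but a higher
    false positive rate for group B (illustrative numbers)."""
    if is_bad:
        return random.random() < 0.80
    return random.random() < (0.05 if group == "A" else 0.15)

flagged = [(g, bad) for g, bad in population if stage1_flag(g, bad)]
share_b = sum(g == "B" for g, _ in flagged) / len(flagged)
print(f"group B share of stage-2 input: {share_b:.0%} (50% upstream)")
# A second model trained or evaluated on these alerts never sees the unbiased
# population; the first stage's skew is baked into its 'ground truth'.
```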
By quantifying biases, whether human cognitive bias or selection bias, you can assess whether the output aligns with real-world expectations. If the data matches up, you’re in a good place. From there, model monitoring becomes crucial, tracking data drift and assessing whether you have a robust AI solution or something that might degrade over time.
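A common way to operationalize that monitoring is the population stability index (PSI), which compares the distribution of a model input or score between a baseline window and a live window. A minimal sketch, with illustrative data and the commonly cited rule-of-thumb thresholds:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a baseline sample and a live sample."""
    # Bin edges come from the baseline so both windows share the same grid.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    # Small floor avoids division by zero / log(0) for empty bins.
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Hypothetical example: live scores drift upward relative to the baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.3, 1.0, 10_000)
print(f"PSI = {psi(baseline, live):.3f}")
# Rule of thumb: below ~0.1 is stable, above ~0.25 signals significant drift.
```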
RegTech Insight: Given the nature of GenAI, should we think about setting goals and KPIs differently? How can we effectively measure the performance of a deployed solution?
Harsh Pandya: It really depends on your desired outcome. If your goal is to assess reasoning, that’s a tricky thing to measure. But the key is that reasoning should serve as a foundation for something—maybe content generation. As long as it helps the marketer, analyst, or customer-facing person build on that foundation, it’s doing its job.
For other use cases, like detection rates or case resolution, those are binary outcomes, and you can evaluate them using the same KPIs you’d use for machine learning and AI, such as confusion matrices and false positive rates. When evaluating GenAI or LLM solutions, it’s important to model the dependent variable correctly. If you’re running an evaluation, it should be designed to measure effectiveness relative to the dependent variable.
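For those binary outcomes, the standard machinery applies directly. A minimal sketch of the KPIs mentioned, computed on a small, made-up evaluation set:

```python
# Standard binary-classification KPIs from a hypothetical evaluation set.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # 1 = true bad actor
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]  # 1 = model raised an alert

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

print(f"confusion matrix: TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision (alert accuracy): {tp / (tp + fp):.2f}")
print(f"recall (bad actors found):  {tp / (tp + fn):.2f}")
print(f"false positive rate:        {fp / (fp + tn):.2f}")
```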
For example, when evaluating our adverse media screening technology, SaifrScreen, one test might focus on recall (how many bad actors we find), and another on precision (how accurate the alerts are). The second test is straightforward. But the recall test can be misleading because it ignores false positives. If a client gives us 200 samples, 100 of which are bad actors, and we identify 80, we don’t know if that’s good or bad without context. If the prevalence of bad actors is only 2% in the broader population, finding 80 of those 100 might actually be a success, indicating a significant improvement over the current process.
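A quick calculation shows how the 2% base rate reframes that recall number. In this sketch the figures come from the example above, except for the false positive rate, which is an added assumption (the balanced 200-sample test doesn’t report one):

```python
# Projecting a balanced-test result onto a production population.
recall = 80 / 100   # bad actors found in the balanced 200-sample test
fpr = 0.05          # hypothetical false positive rate on clean profiles
base_rate = 0.02    # prevalence of bad actors in the broader population

alert_rate = recall * base_rate + fpr * (1 - base_rate)
precision = (recall * base_rate) / alert_rate
print(f"alerts per 10,000 screened: {10_000 * alert_rate:.0f}")
print(f"expected production precision: {precision:.1%}")
# ~650 alerts per 10,000, of which roughly a quarter are real: the same 80%
# recall reads very differently once the base rate is factored in.
```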
This highlights how important data literacy is, even for sophisticated buyers. Setting up evaluations and comparing systems correctly is a complex task. It’s like giving someone a calculator—they need to know how to use it and interpret the results. Without that understanding, decisions can be based on qualitative assessments that may be great for a power user but not best for the organization.
RegTech Insight: Many firms are concerned about explainability, data security, and hallucinations with GenAI. What additional guardrails might be needed to help mitigate these concerns, and how does Saifr address them?
Harsh Pandya: GenAI is still a relatively new area in terms of holistic risk management frameworks. I see a desire among organizations to better understand the technology. AI decisions should be made with input from cybersecurity, data privacy, and data governance experts, as well as data scientists who understand model risks. A lack of transparency and interpretability is a common complaint, so it’s important to have the right people involved in the decision-making process.
On the risk side, concerns about data privacy—like personally identifiable information (PII) leaks or data being leaked into open-source models—are real, but they’re manageable. At Saifr, we offer dedicated instances of our SaaS solutions, or we can deploy models on a customer’s cloud. We’ve partnered with Microsoft Azure to offer model-as-a-service options, where system integrators can host our models locally. This ensures that customers maintain full control of their data, which helps build confidence and address security concerns.
RegTech Insight: Many smaller and medium-sized firms find the idea of a holistic regulatory risk management framework and self-assessment daunting. How do you recommend they get started, and what does a realistic roadmap look like?
Harsh Pandya: There are some broad frameworks for AI risk management that are fairly recent. The National Institute of Standards and Technology (NIST) published its AI Risk Management Framework and followed it with a generative AI profile in 2024, and financial institutions, along with others in compliance and risk management, are adopting versions of these. The AI-specific elements are incremental changes to what already exists for data science and machine learning, with added considerations for privacy, security, and data reversibility.
RegTech Insight: We’ve covered most of the key points, so is there anything else you’d like to share in closing?
Harsh Pandya: A solid understanding of the data generation process, and of where you rely on general-purpose models, is key. If you have a firm grasp of these areas, you’ve already covered roughly 60-70% of what you need for an AI risk management framework. If you’re using cloud-based solutions, much of the risk will be managed by your vendors. But you still need to do your due diligence and ensure the right stakeholders are involved to minimize risk.
RegTech Insight: That’s great, thanks for sharing your insights today.
Harsh Pandya: Happy to help.