Sanctions screening threshold calibration: Meeting regulatory expectations with evidence

Regulators increasingly expect their supervised entities to demonstrate not only that sanctions screening controls exist, but that threshold calibration decisions are documented, risk-based, and defensible. 

Why threshold calibration is now a regulatory expectation

For years, screening threshold decisions were made largely at the operational layer, where compliance teams adjusted sensitivity settings to manage alert volume with limited visibility into the downstream effects on detection capability. That approach is becoming increasingly difficult to defend. 

The European Banking Authority’s Guidelines on Restrictive Measures, which came into force at the end of 2025, are clear that the calibration of screening parameters is now a regulatory expectation. 

The EBA guidelines state that: 

“Calibration should be neither too sensitive, causing a high number of false positive matches, nor insufficiently sensitive, leading to designated persons, entities and bodies not being detected or free-format information not used for other restrictive measures.” 

They further require entities to “calibrate the degree of fuzzy matching in their screening system.” 

Entities must be able to:  

  • Document how threshold settings align with their stated risk appetite 
  • Demonstrate the methodology behind tuning decisions 
  • Show that those decisions have been reviewed independently of system vendors and operational teams 

The same expectations run through FATF guidance and have been a consistent theme in supervisory findings globally. Regulators want to understand how a screening system has been configured, and why. 

What makes threshold calibration so difficult to get right?

Alert thresholds tuned to operational capacity rather than documented risk tolerance remain one of the most commonly observed weaknesses in independent sanctions screening reviews.  

Raising a system threshold reduces the number of returned alerts, but increases the risk that a true match scores below the threshold and goes undetected; lowering it improves detection, but increases alert volume at a cost to analyst time. Somewhere between those two points lies a defensible optimum: the setting at which a financial institution achieves its maximum practical hit rate without generating noise that degrades review quality.  

Finding that optimum requires evidence. The compliance challenge is that the consequences of a poorly calibrated system are largely invisible until something goes wrong.  

Screening failures typically stem not from system errors, but from name variation that pushes similarity scores below detection thresholds. The sanctioned party exists in the dataset. The system operates as configured. No alert is generated. 
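To make that failure mode concrete, the sketch below scores a few spellings of one name against a list entry and checks each against a threshold. It is illustrative only: the name is made up, the 0-100 score uses Python's standard-library difflib rather than any particular vendor's matching algorithm, and the threshold of 90 is an assumed setting, not a recommendation.

```python
from difflib import SequenceMatcher


def similarity(candidate: str, listed_name: str) -> float:
    """Return a 0-100 similarity score (case-insensitive).

    Assumption: difflib's ratio stands in for a real fuzzy-matching engine.
    """
    ratio = SequenceMatcher(None, candidate.lower(), listed_name.lower()).ratio()
    return 100 * ratio


def raises_alert(candidate: str, listed_name: str, threshold: float) -> bool:
    """An alert is generated only when the score reaches the threshold."""
    return similarity(candidate, listed_name) >= threshold


LISTED_NAME = "Aleksandr Petrov"  # hypothetical list entry
THRESHOLD = 90.0                  # assumed setting for illustration

for variant in ["Aleksandr Petrov", "Alexander Petrov", "Petrov Aleksandr"]:
    score = similarity(variant, LISTED_NAME)
    alerted = raises_alert(variant, LISTED_NAME, THRESHOLD)
    print(f"{variant!r}: score={score:.1f}, alert={alerted}")
```

In this sketch the system behaves exactly as configured for every variant; whether a variant is detected depends entirely on where its score falls relative to the threshold.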

How Threshold Analyser provides the evidence regulators expect

AML Analytics’ Threshold Analyser is designed to make that evidence available, without requiring multiple rounds of system testing. 

Working from the output file of a previously tested screening system, Threshold Analyser: 

  • Links each test record back to the matched name and the threshold score at which the match was generated.  
  • Calculates match rates across control records, manipulated records, and Clean IDs at every threshold level. 
  • Delivers a complete picture of how system performance shifts across the threshold range, and how those shifts translate into missed detections, false positives, and analyst workload.  
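The kind of calculation described above can be sketched in a few lines. This is not Threshold Analyser's implementation: the record layout, category labels, scores, and function name are all hypothetical, and the sketch assumes that Clean IDs are records that should not generate a match, so their match rate serves as a proxy for false positives.

```python
from collections import defaultdict

# Hypothetical test output: (record category, best match score 0-100).
# A score of None means the screening system returned no candidate at all.
RECORDS = [
    ("control", 100.0), ("control", 100.0),
    ("manipulated", 93.0), ("manipulated", 88.0),
    ("manipulated", 79.0), ("manipulated", None),
    ("clean", 86.0), ("clean", 72.0), ("clean", None), ("clean", None),
]


def match_rates(records, thresholds):
    """Return {threshold: {category: share of records scoring at or above it}}."""
    scores_by_category = defaultdict(list)
    for category, score in records:
        scores_by_category[category].append(score)

    table = {}
    for threshold in thresholds:
        table[threshold] = {
            category: sum(1 for s in scores if s is not None and s >= threshold)
            / len(scores)
            for category, scores in scores_by_category.items()
        }
    return table


for threshold, rates in match_rates(RECORDS, [75, 85, 90]).items():
    print(threshold, rates)
```

Reading the table row by row shows the trade-off directly: as the threshold rises, the match rate on manipulated records (detection) and on clean records (noise) both fall, and the question becomes which row the institution can defend.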

Threshold Analyser also allows you to model resource implications: enter your own cost and capacity assumptions to estimate what a threshold change would mean in terms of analyst time and operational expenditure. 
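As a rough illustration of that kind of modelling, the sketch below converts an alert count into analyst hours, cost, and capacity utilisation. The class, field names, and figures are assumptions made for this example and do not describe how Threshold Analyser performs the calculation.

```python
from dataclasses import dataclass


@dataclass
class ResourceAssumptions:
    """User-supplied assumptions (all hypothetical for this sketch)."""
    minutes_per_alert: float        # average review time per alert
    hourly_cost: float              # loaded cost of one analyst hour
    analyst_hours_available: float  # review capacity for the period


def estimate_workload(alert_count: int, assumptions: ResourceAssumptions) -> dict:
    """Estimate analyst hours, cost, and capacity utilisation for an alert volume."""
    hours = alert_count * assumptions.minutes_per_alert / 60
    return {
        "hours": hours,
        "cost": hours * assumptions.hourly_cost,
        "utilisation": hours / assumptions.analyst_hours_available,
    }


assumptions = ResourceAssumptions(
    minutes_per_alert=6, hourly_cost=40, analyst_hours_available=160
)

# Compare the alert volumes expected at two candidate thresholds (made-up counts).
for label, alerts in [("current threshold", 1200), ("lower threshold", 2000)]:
    print(label, estimate_workload(alerts, assumptions))
```

A utilisation above 1.0 would indicate that the alert volume at a given threshold exceeds the review capacity assumed for the period.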

The result is an interactive, risk-based view of where the threshold should sit, and what the consequences of that choice look like across effectiveness, efficiency, and resource allocation. This is precisely the documentation regulators now require.  

Speak to the team

Find out how Threshold Analyser can support your next threshold calibration review and help you build the documented, evidence-based case your regulator expects.