In this article, we share how we built a global real-time anti-fraud system that significantly enhanced our fraud detection capabilities while reducing costs.
A real-time anti-fraud system is crucial for identifying and preventing fraud quickly, especially in fast-moving environments like food delivery. By combining machine learning, data analysis, and instant data processing, the system can immediately examine how users behave and what their transactions look like. It usually includes several parts: an orchestrator to manage data flow, a real-time feature store to calculate and maintain critical data points, an ML inference server to identify anomalies and predict fraud, and a rule engine to enforce predefined business rules. Together, these parts allow the system to decide quickly whether to approve, reject, or further investigate a transaction, helping protect against fraud while maintaining a smooth user experience.
The Challenge of Fraud in Food Delivery
Fraud in food delivery services can originate from various sources, each posing unique challenges:
- Customers: Some customers create multiple accounts to exploit new-customer vouchers, or falsely claim compensation for damaged or missing food.
- Vendors: Certain vendors may engage in fraudulent activities, such as falsely claiming “customer no-show” for cash-paid pick-up orders to avoid penalties.
- Riders: Riders might collect cash from customers and then cancel the order, pocketing the money and leaving the customer without their food.
- Internal Actors: Payout fraud can occur within the company, where insiders manipulate the system to issue unauthorized payments to riders or themselves.
Customer fraud in particular can broadly be grouped into the following categories:
- Identity Fraud: Creating new accounts to abuse new customer vouchers.
- Compensation Fraud: Falsely claiming compensation for damaged or spilled food.
- Payment Fraud: Placing Cash on Delivery (COD) orders without the intention of paying or using stolen online payment methods like credit cards or e-wallets to place orders.
When handling payment fraud, the system needs to quickly figure out the right action for each transaction. For online payments, it must decide whether to accept, reject, or challenge the transaction using 3D Secure (3DS) authentication. For cash transactions, the system should determine whether to approve or decline the order. Because there are so many transactions happening and decisions need to be made instantly, a real-time system is crucial. It must analyze and act within milliseconds to effectively spot and prevent fraud.
Background
At Delivery Hero, our Anti-Fraud Service initially relied on a third-party fraud prevention platform, which provided rule management, machine learning infrastructure, and fraud case investigation capabilities. However, through our experience with the platform, we identified several limitations that hindered our future growth plans.
Key challenges included:
- Lack of on-premise deployment options
- High hosting costs
- Reliance on third-party engineering for customization and investigations
- Product feature limitations
We realized that having full ownership of the platform would allow us to address these issues. By building our own solution, we could prioritize features based on our specific needs, gain greater flexibility, enhance scalability, and significantly reduce costs. Taking full control of the Anti-Fraud Service would enable us to create a more tailored and future-proof product.
Our Approach
With this in mind, our goal was to build a new Anti-Fraud system able to evaluate transactions for all fraud types, such as payment, voucher, refund, rider, and vendor fraud.
Some high-level technical requirements for the new system included:
- Real-time fraud evaluation with low latency
- Ability to calculate complex features on historical data
- Machine learning infrastructure for models created by Data Scientists
- Interfaces for creating and managing fraud rules and for investigating individual fraud cases
The diagram below illustrates the fraud prevention process that occurs when a customer attempts to pay for their order. The system is designed to evaluate the transaction in real-time, using a combination of machine learning models, rule-based logic, and behavioural data analysis to determine whether the payment should be approved, blocked, or subjected to additional verification through mechanisms like 3D Secure (3DS).
Orchestrator
In the fraud prevention system, the Orchestrator serves as the central hub that coordinates the entire transaction evaluation process. When a customer attempts to make a payment, the Orchestrator first receives the transaction request and initiates the fraud detection workflow. It pushes order-related events to the Feature Store, where predefined features – such as transaction history and customer behavior – are calculated and updated with the latest data. These features are crucial for assessing the risk of the transaction. Next, the Orchestrator calls the Machine Learning Engine, which uses these features to generate a fraud score, indicating the likelihood of fraudulent activity. This score is then passed to the Rule Engine, where it is evaluated against a set of predefined rules tailored to specific fraud scenarios. The Rule Engine considers both the machine learning score and the outcomes of these rule checks to determine the transaction’s risk level. Based on this assessment, the Orchestrator makes a final decision: it either approves the transaction, blocks it if deemed too risky, or routes it through 3D Secure (3DS) for additional verification. This decision-making process ensures that the platform balances security with a smooth user experience, preventing fraud while allowing legitimate transactions to proceed efficiently.
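The flow described above can be sketched in a few lines of Python. This is only an illustration of the sequence (features, then score, then rules), not our production code: the `Orchestrator` class, the `Decision` enum, the three toy functions, and all thresholds are hypothetical stand-ins for the real services.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Mapping


class Decision(Enum):
    APPROVE = "approve"
    CHALLENGE_3DS = "challenge_3ds"
    BLOCK = "block"


# Hypothetical interfaces: each component is modelled as a plain callable.
FeatureFn = Callable[[Mapping[str, object]], Dict[str, float]]
ScoreFn = Callable[[Dict[str, float]], float]
RuleFn = Callable[[float, Dict[str, float]], Decision]


@dataclass
class Orchestrator:
    update_features: FeatureFn  # stands in for the Feature Store
    score: ScoreFn              # stands in for the Machine Learning Engine
    apply_rules: RuleFn         # stands in for the Rule Engine

    def evaluate(self, order_event: Mapping[str, object]) -> Decision:
        # 1. Push the event and read back the refreshed features.
        features = self.update_features(order_event)
        # 2. Turn the features into a fraud score.
        fraud_score = self.score(features)
        # 3. Let the rules combine score and features into a decision.
        return self.apply_rules(fraud_score, features)


# Toy stand-ins with made-up formulas and thresholds.
def toy_features(event: Mapping[str, object]) -> Dict[str, float]:
    return {
        "amount": float(event["amount"]),
        "orders_last_hour": float(event.get("orders_last_hour", 0)),
    }


def toy_score(features: Dict[str, float]) -> float:
    raw = features["amount"] / 500.0 + 0.1 * features["orders_last_hour"]
    return min(1.0, raw)


def toy_rules(score: float, features: Dict[str, float]) -> Decision:
    if score >= 0.8:
        return Decision.BLOCK
    if score >= 0.4:
        return Decision.CHALLENGE_3DS
    return Decision.APPROVE


orchestrator = Orchestrator(toy_features, toy_score, toy_rules)
```

Keeping each component behind a narrow interface like this is what lets the feature logic, the model, and the rules change independently of the orchestration code.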
Feature Store
The Feature Store in the diagram below is a central component where data scientists define and manage features crucial for fraud detection. Using a user interface (UI), data scientists specify which features will be used to evaluate transactions – these could include metrics like the average transaction amount, frequency of orders, or device usage patterns. Once defined, the Feature Store calculates the real-time value of these features by integrating data from the current order and historical events, ensuring that the features reflect the most up-to-date and accurate information. These calculated features are then utilized by the Machine Learning Engine and Rule Engine to assess the transaction, helping to determine whether it should be approved, blocked, or escalated for further verification.
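To make the idea of a real-time feature concrete, here is a minimal in-memory sketch of two sliding-window features (order count and average order amount per customer). The class name, the feature names, and the assumption that events arrive in timestamp order are ours for illustration; the actual Feature Store persists its values rather than keeping them in process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindowFeatures:
    """Per-customer order features over a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # customer_id -> queue of (timestamp, amount), oldest first
        self._events: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)

    def add_order(self, customer_id: str, timestamp: float, amount: float) -> None:
        # Assumes events are added in non-decreasing timestamp order.
        self._events[customer_id].append((timestamp, amount))

    def features(self, customer_id: str, now: float) -> Dict[str, float]:
        events = self._events[customer_id]
        # Evict events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while events and events[0][0] <= cutoff:
            events.popleft()
        count = len(events)
        total = sum(amount for _, amount in events)
        return {
            "order_count": float(count),
            "avg_amount": total / count if count else 0.0,
        }


store = SlidingWindowFeatures(window_seconds=3600)
store.add_order("c1", timestamp=0, amount=10.0)
store.add_order("c1", timestamp=100, amount=30.0)
store.add_order("c1", timestamp=3500, amount=20.0)
```

Calling `store.features("c1", now=3600)` would drop the order at timestamp 0, since it is exactly one window old, and compute the features from the remaining two orders.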
Machine Learning Engine
The Machine Learning Engine in the diagram below is responsible for scoring transactions in real time, using features taken from the current transaction request together with features calculated by the Feature Store. When a payment attempt is made, the engine uses these features to assess the likelihood of fraud, generating a prediction or fraud score. This score is then stored in a database along with the corresponding real-time features from the Feature Store, creating a rich dataset that supports continuous model retraining.
Over time, the stored predictions are matched with feedback from the operations team – who may validate whether a transaction was actually fraudulent or legitimate. This feedback loop is crucial for measuring the model’s performance, allowing for ongoing adjustments and improvements to the model’s accuracy and effectiveness in detecting fraud.
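As a rough illustration of this feedback loop, the sketch below joins stored scores with operations feedback by transaction ID and computes precision and recall at a given threshold. The function name, the dictionary-based storage, and the choice of precision and recall as the metrics are assumptions made for the example; they are not a description of our actual evaluation pipeline.

```python
from typing import Dict, Tuple


def precision_recall(
    predictions: Dict[str, float],  # transaction_id -> stored fraud score
    feedback: Dict[str, bool],      # transaction_id -> confirmed fraud?
    threshold: float,
) -> Tuple[float, float]:
    """Precision and recall over transactions that have both a score and feedback."""
    true_pos = false_pos = false_neg = 0
    for txn_id, is_fraud in feedback.items():
        if txn_id not in predictions:
            continue  # no stored prediction to compare against
        flagged = predictions[txn_id] >= threshold
        if flagged and is_fraud:
            true_pos += 1
        elif flagged and not is_fraud:
            false_pos += 1
        elif not flagged and is_fraud:
            false_neg += 1
    flagged_total = true_pos + false_pos
    fraud_total = true_pos + false_neg
    precision = true_pos / flagged_total if flagged_total else 0.0
    recall = true_pos / fraud_total if fraud_total else 0.0
    return precision, recall


stored_scores = {"t1": 0.9, "t2": 0.8, "t3": 0.2, "t4": 0.1, "t5": 0.95}
ops_feedback = {"t1": True, "t2": False, "t3": True, "t4": False, "t6": True}
metrics = precision_recall(stored_scores, ops_feedback, threshold=0.5)
```

Only transactions present in both sources are counted, so `t5` (no feedback yet) and `t6` (no stored score) are ignored in this example.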
Rule Engine
The Rule Engine in the diagram below allows the Business Intelligence (BI) team to define specific business rules that determine how transactions should be handled based on the Machine Learning score and features provided by the Feature Store. These rules are crafted to align with the risk management policies and can dictate whether an order should be blocked, approved, or sent for further verification through 3D Secure (3DS). For instance, a rule might state that any transaction with a high fraud score and a certain set of feature criteria – such as an unusually high transaction amount or a flagged device – should be automatically blocked. Conversely, low-risk transactions might be approved immediately, while those with moderate risk are routed through 3DS for additional checks. This combination of real-time data, machine learning insights, and rule-based logic enables the platform to make nuanced, policy-driven decisions to protect against fraud while maintaining a seamless customer experience.
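The kind of rule described above can be pictured as a decision table that is evaluated top to bottom, with the first matching row deciding the outcome. The sketch below is plain Python rather than the rule engine's own format, and the rule names, feature names (`device_flagged`, `amount`), and thresholds are invented for illustration; it covers the online payment case where a 3DS challenge is available.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[float, Dict[str, float]], bool]
    action: str


# Hypothetical rules, ordered from most to least specific.
RULES: List[Rule] = [
    Rule(
        "high score and flagged device",
        lambda score, f: score >= 0.8 and f.get("device_flagged", 0.0) == 1.0,
        "block",
    ),
    Rule(
        "high score and large amount",
        lambda score, f: score >= 0.8 and f.get("amount", 0.0) > 200.0,
        "block",
    ),
    Rule("moderate or higher score", lambda score, f: score >= 0.4, "challenge_3ds"),
    Rule("default", lambda score, f: True, "approve"),
]


def decide(score: float, features: Dict[str, float], rules: List[Rule] = RULES) -> str:
    """Return the action of the first rule whose condition matches."""
    for rule in rules:
        if rule.condition(score, features):
            return rule.action
    raise ValueError("no rule matched; the table needs a default row")
```

Note that in this toy table a high score on its own is not enough to block: without a flagged device or a large amount, the transaction falls through to the 3DS challenge row.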
Monitoring and Alerting
Monitoring and Alerting in this system is crucial for ensuring real-time performance and security. The system uses OpenTelemetry data – such as metrics, traces, and logs – to capture essential performance indicators like requests per second (RPS), error rates, and latency. These metrics are continuously monitored, and any anomalies or performance degradation can trigger alerts, enabling engineers to respond quickly and mitigate potential issues before they impact the system. Additionally, this monitoring framework can detect significant shifts in transaction decisions, such as an unusually high share of transactions being blocked or approved, which may indicate a major fraud attack or a faulty rule. When such patterns are identified, alerts are triggered in real time, allowing engineers to take immediate action to address potential fraud or system failures and keep the platform secure and operational.
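The decision-trend alert can be reduced to a simple idea: track the share of blocked transactions over the most recent decisions and fire when it crosses a threshold. The sketch below shows only that logic in plain Python; the class name, window size, and threshold are hypothetical, and in practice this kind of check would be expressed as an alert rule on the collected metrics rather than in application code.

```python
from collections import deque
from typing import Deque


class BlockRateMonitor:
    """Tracks the share of blocked decisions over the last `window_size` decisions."""

    def __init__(self, window_size: int, max_block_rate: float) -> None:
        self.max_block_rate = max_block_rate
        # A bounded deque discards the oldest decision automatically.
        self._recent: Deque[bool] = deque(maxlen=window_size)

    def record(self, blocked: bool) -> bool:
        """Record one decision; return True if an alert should fire."""
        self._recent.append(blocked)
        # Wait for a full window so a handful of early decisions cannot alert.
        window_full = len(self._recent) == self._recent.maxlen
        block_rate = sum(self._recent) / len(self._recent)
        return window_full and block_rate > self.max_block_rate


monitor = BlockRateMonitor(window_size=4, max_block_rate=0.5)
alerts = [monitor.record(blocked) for blocked in (True, True, False, True, False)]
```

Here the fourth decision fills the window with three blocks out of four, which exceeds the 50% threshold; the fifth pushes out the oldest block and brings the rate back to exactly 50%, which does not.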
Tools and Technology
- AWS Kafka: It serves as the data backbone, streaming real-time events for order creation, updates, payment status changes, and chargebacks to specific topics. These topics are then consumed by other microservices, enabling seamless, real-time coordination across the platform. Its scalability and reliability ensure smooth operations even under high transaction volumes.
- AWS DynamoDB: It stores timestamp-based feature values calculated from Kafka events as new events are pushed. Its high performance enables fast, reliable access to both real-time and historical data, which can be used by the Machine Learning Engine and the Rule Engine.
- Camunda Rule Engine: It enables the definition and management of business rules based on real-time features from the Feature Store. This capability facilitates automated decision-making to determine actions such as blocking, approving and reviewing transactions based on specific criteria. By using the Decision Model and Notation (DMN), Business Ops can easily create and modify decision tables, ensuring that decisions align with business policies.
- Kubeflow: It enables continuous retraining of machine learning models in the Anti-Fraud system by automating the model development workflows, which are then scheduled to run at regular intervals. Each workflow generates model artifacts which are passed to the ML inference server.
- Grafana and OpsGenie: Grafana not only provides comprehensive visibility into application performance and infrastructure metrics but also enables real-time monitoring of transaction rules, tracking how transactions are blocked, reviewed, or approved based on fraud risk. If the rate of blocked transactions exceeds a predefined threshold, an alarm is triggered and sent via OpsGenie, ensuring that the relevant team members are notified immediately.
Achievements
Scalability
In 2024, we evaluated upwards of 150 million fraud detection requests every month, demonstrating the scalability and efficiency of our Anti-Fraud System.
PS: We are currently live on Foodpanda, Foodora, PedidosYa, Yemeksepeti, Hungerstation, Talabat and Glovo.