Fairness, transparency, integrity and competition are essential for managing public funds. We rely on departments to choose the best value from the private sector.

Efficient public procurement improves services, infrastructure, and the economy. It must also be accountable to the public by guarding against financial loss from fraud, waste, abuse, and error.

One thing is true: fraudsters target opportunities where a lack of oversight gives them the upper hand. Public procurement systems often lack rigorous internal controls, such as continuous risk monitoring backed by advanced analytics, that would identify and prevent financial loss. Fraud is easier to detect when you know what to look for. But how can it be detected when the transactions are rare or unusual, or perhaps hiding in plain sight?

AI can bring numerous benefits to public procurement processes. When it comes to identifying rare, unusual, or hidden suspicious events, two approaches can help: machine learning and synthetic data creation with generative AI (GenAI). These techniques will become essential for anomaly detection and fraud prevention.

AI’s evolution to GenAI and synthetic data

While AI has played a crucial role in detecting and preventing fraud through machine learning and natural language processing, GenAI adds the ability to generate new content, including synthetic data.

Gartner defines synthetic data as data generated by applying a sampling technique to real-world data or by creating simulation scenarios where models and processes interact to create completely new data not directly taken from the real world. Gartner also predicts that synthetic data will overshadow real data in AI models by 2030. Synthetic data can help overcome the challenges that real-world data presents.

One common challenge in analyzing procurement data that contains invoice and payment information is identifying rare suspicious behavior that is unknown or unprecedented. Examples of this type of behavior include the following scenarios:

  • An employee accepts two invoices from the same vendor on the same day, each just below the approval limit. This rare behavior may indicate an attempt to avoid detection.
  • Invoices are split into a larger number of smaller amounts, spread across more billing intervals, to cover up unexplained cash payments to a vendor.
  • A change order issued after contract award replaces the original price with an unrealistically low one.
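
The first scenario in the list lends itself to a simple rule-based check. The sketch below is a minimal illustration, not a production control: the field names (`vendor`, `approver`, `date`, `amount`), the approval limit of 10,000, and the reading of "just below" as "within 10% of the limit" are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

APPROVAL_LIMIT = 10_000.00  # hypothetical approval threshold
NEAR_LIMIT_RATIO = 0.90     # assumption: "just below" means within 10% of the limit


def flag_split_invoices(invoices, limit=APPROVAL_LIMIT, ratio=NEAR_LIMIT_RATIO):
    """Return (vendor, approver, date) groups holding 2+ invoices just under the limit."""
    groups = defaultdict(list)
    for inv in invoices:
        # Keep only invoices in the band [ratio * limit, limit).
        if ratio * limit <= inv["amount"] < limit:
            key = (inv["vendor"], inv["approver"], inv["date"])
            groups[key].append(inv["invoice_id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


# Made-up invoices for illustration only.
invoices = [
    {"invoice_id": "A1", "vendor": "V100", "approver": "E7", "date": date(2024, 3, 4), "amount": 9_850.00},
    {"invoice_id": "A2", "vendor": "V100", "approver": "E7", "date": date(2024, 3, 4), "amount": 9_920.00},
    {"invoice_id": "B1", "vendor": "V200", "approver": "E7", "date": date(2024, 3, 4), "amount": 4_300.00},
    {"invoice_id": "C1", "vendor": "V300", "approver": "E2", "date": date(2024, 3, 5), "amount": 9_990.00},
]

flagged = flag_split_invoices(invoices)
for (vendor, approver, day), ids in flagged.items():
    print(vendor, approver, day.isoformat(), ids)
```

A rule like this only catches patterns someone has already thought of, which is why the rest of the article turns to anomaly detection for behavior that is not known in advance.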

Rare events like these are hard to identify. Even when they are identified, there is usually not enough real data to train and validate the predictive machine learning models that would detect similar behavior in new transactions.

Synthetic data has a role in government procurement fraud detection because it amplifies the signals from these rare events. By enabling quicker and more accurate fraud detection, it has the potential to improve efficiency, reduce financial loss, and maintain public trust.

From anomalies to action

Machine learning enables anomaly detection, while synthetic data helps predict and model rare event behavior with greater precision.

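A common machine learning technique used to find these rare events is Isolation Forest, which repeatedly partitions the data with random splits and treats the observations that are isolated after only a few splits as anomalies. If you can identify behavior that is out of the norm and then validate that it should not be happening, you now have an example of data that you can “label” as suspicious behavior.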
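
To make the idea concrete, here is a minimal pure-Python sketch of the Isolation Forest scoring scheme. It is a simplified teaching version under stated assumptions, not the implementation used in any particular product (libraries such as scikit-learn provide a full `IsolationForest`). The two features (invoice amount and invoices per vendor per day), the generated data, and all function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Number of splits needed to reach the leaf that contains row."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(child, row, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per row; values closer to 1 are easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm))
            for r in rows]


# Made-up data: 50 ordinary transactions plus one planted outlier (index 50).
data_rng = random.Random(42)
transactions = [(data_rng.gauss(1000, 50), data_rng.gauss(2, 0.5)) for _ in range(50)]
transactions.append((9900.0, 9.0))

scores = isolation_scores(transactions)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print("highest-scoring row:", most_anomalous, transactions[most_anomalous])
```

The score alone does not say a transaction is fraudulent. It only ranks transactions for review, and the validation step described above is what turns a high score into a suspicious label.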

Once data elements are detected as anomalous and identified as suspicious, GenAI techniques can up-sample these data points and create a new synthetic data representation of the original data. This boosts the suspicious signals of anomalous transactions with more representation in the synthetic dataset.

The process is designed so that the synthetic data keeps the statistical properties and patterns of the original non-fraud data while adding more examples of suspicious behavior, which gives predictive models more to learn from. This bridge from anomaly detection to predictive modeling with GenAI synthetic data creation offers one way to increase the detection capability for rare events.

A technique for synthetic data creation

One established technique for generating synthetic data is the synthetic minority oversampling technique (SMOTE). SMOTE creates synthetic data points that represent the identified rare events by interpolating between an existing rare example and one of its nearest neighbors in the same class. The new data points are similar to the rare events but not identical, which increases the representation of the suspicious behavior, also known as the minority class. In this way, SMOTE balances the data set with an improved representation of suspicious and non-suspicious behavior.
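
The core of SMOTE is short enough to sketch directly. The example below is a minimal pure-Python version under stated assumptions: the two features (invoice amount and invoices per vendor per day), the four labeled suspicious rows, the count of 40 normal rows, and the function name `smote` are all hypothetical. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbors of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Made-up rows that an analyst has validated and labeled as suspicious.
suspicious = [(9850.0, 2.0), (9920.0, 2.0), (9700.0, 3.0), (9990.0, 2.0)]
normal_count = 40  # hypothetical size of the non-suspicious class

# Generate enough synthetic rows to match the size of the normal class.
new_points = smote(suspicious, n_synthetic=normal_count - len(suspicious))
balanced_minority = suspicious + new_points
print(len(suspicious), "suspicious rows ->", len(balanced_minority))
```

Because every synthetic row lies on a line segment between two real suspicious rows, the new rows stay inside the range of behavior that was actually observed. That also means SMOTE cannot invent a fraud pattern that is absent from the labeled examples.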

Synthetic data enhances the representation of rare events, which helps move from anomaly detection to predictive modeling. With more data labeled as suspicious, predictive modeling techniques have a more representative sample of the target variable to learn from. This allows for quicker deployment of predictive capability that can be used to identify similar behavior in new data. By using synthetic data, organizations can create robust models that identify potential fraud in real time, preventing financial losses and improving overall detection capabilities.
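
As an illustration of that last step, the sketch below trains a small classifier on a balanced data set and scores new transactions. Everything here is an assumption made for the example: the article does not name a modeling technique, so logistic regression is simply one plausible choice, and the features (invoice amount and invoices per vendor per day), the training rows, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on standardized features."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    X = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    w = [0.0] * len(means)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return {"w": w, "b": b, "means": means, "stds": stds}


def predict_proba(model, row):
    """Estimated probability that a transaction belongs to the suspicious class."""
    x = [(v - m) / s for v, m, s in zip(row, model["means"], model["stds"])]
    return sigmoid(model["b"] + sum(wi * xi for wi, xi in zip(model["w"], x)))


# Made-up training set, standing in for real rows plus synthetic suspicious rows.
normal = [(500.0 + 300.0 * i, 1.0) for i in range(20)]
suspicious = [(9700.0 + 15.0 * i, 2.0 + (i % 2)) for i in range(20)]
rows = normal + suspicious
labels = [0] * len(normal) + [1] * len(suspicious)

model = fit_logistic(rows, labels)
for new_txn in [(9900.0, 2.0), (1200.0, 1.0)]:
    print(new_txn, round(predict_proba(model, new_txn), 3))
```

A model trained this way should still be validated on real, held-out transactions only, since scoring it against synthetic rows would overstate how well it detects actual fraud.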

Future of fraud detection in procurement

GenAI and synthetic data are accelerating the improvement of fraud detection in public procurement. These technologies can create more robust, accurate and efficient ways to detect and prevent fraud. Ultimately, this will not only save public funds but also enhance trust in public procurement processes.

The synergy between synthetic data and emerging techniques will enable the creation of even more sophisticated and realistic synthetic data sets, further pushing the boundaries of what is possible.

Moreover, the advancement of AI to GenAI with synthetic data offers scalability and flexibility that allows for continuous improvement and adaptation to new fraud tactics.

About Author

John Stultz

Government Fraud Solutions Specialist, Security Intelligence, SAS

John Stultz is a Government Fraud Solutions Specialist for the Security Intelligence practice at SAS. His experience includes implementing architectures to deploy comprehensive data pre-processing, data validation and data management strategies that support the fusion of data and the ability to use advanced analytics to detect fraud. When he is not working with the Federal Government he can be found traipsing through the Blue Ridge Mountains with his wife and three boys.