
How do you combine quality and quantity to make the most of big data?

By Jean-Noël Ardouin

Partner, Consulting, Risk & Actuarial in Financial Services | EY Switzerland

Committed to delivering exceptional client service. Passionate about teaming and coaching. Husband, father and avid trail runner.

5 minute read 13 Jul 2021

Artificial intelligence is a strategic opportunity for financial institutions – but they need to invest in data quality.

In brief
  • Artificial intelligence techniques such as machine learning can help banks improve data quality – for better insights.
  • Data quality management can help at every stage of the data lifecycle.
  • Banks should seek to discover AI use cases across their operations.

As the volume of data explodes, big data remains a huge topic on banks’ board agendas. Data-driven technology promises greater efficiency, cost savings and more customized solutions. With this in mind, embracing AI technologies is no longer a choice but a strategic imperative for the financial industry. 

The paradigm shift in the way banks work is not without its challenges, though. Technological readiness is a key prerequisite for the applications that are transforming the banking landscape. CEOs need to invest generously now to prepare for the future, including in data quality management. After all, any added value from data – like a predictive model allowing a company to optimize its business parameters – hinges on the quality of the underlying data. Even the largest data set is useless if it’s riddled with unaddressed issues. Good data quality management is built on proper classification and understanding of the possible quality issues, meticulous inspection of data quality among the different data sources, and intelligent approaches to fix gaps and errors at their source.

Amid all the euphoria at the potential of data and analytics, it’s important to remember that quality is just as important as quantity.
Jean-Noël Ardouin
Partner, Financial Services Consulting, Risk & Actuarial | Switzerland

Let’s start with the basic question of what makes good-quality data. While the answer ultimately depends on the use case, all applications rely on data that is fit for purpose. To meet this requirement, data needs to be accurate, relevant, consistent and accessible.

  • Accurate

    Accuracy refers to intrinsic data quality. The data should be correct and complete.

  • Relevant

    Relevance is about contextual data quality. The data should be useful for the purpose, i.e., data quality must be considered within the context of the task at hand.

  • Consistent

    Consistency relates to representational data quality. In other words, the data should be presented in such a way that it can be interpreted, easily understood and standardized.

  • Accessible

    Accessibility means that the required data is available to the right users at the right time.
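
The first two of these dimensions lend themselves to automated checks. Below is a minimal sketch in Python (using pandas) of how accuracy and consistency might translate into concrete tests on a client table; the column names and example values are illustrative assumptions, not a standard. Relevance and accessibility are largely organizational questions and are better addressed through data governance than through code.

    import pandas as pd

    # Hypothetical client extract -- the column names are illustrative only.
    clients = pd.DataFrame({
        "client_id": [1, 2, 3, 4],
        "country": ["CH", "ch", "DE", None],
        "birth_date": ["1980-05-01", "1975-13-40", "1990-02-11", "1985-07-23"],
    })

    # Accuracy: the data should be correct and complete.
    completeness = clients.notna().mean()  # share of filled-in values per column
    valid_dates = pd.to_datetime(clients["birth_date"], errors="coerce").notna()

    # Consistency: representations should be standardized and easy to interpret.
    non_standard = (clients["country"].dropna() != clients["country"].dropna().str.upper()).sum()

    print(completeness)
    print(f"invalid birth dates: {(~valid_dates).sum()}, non-standard country codes: {non_standard}")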

Any breach of these data quality principles results in a data quality issue. Appropriate data quality management must detect and, where possible, remediate such issues as they appear at any stage in the data lifecycle.

Data quality issues are not always easy to differentiate from outliers that are attributable to a specific business reason. That’s why it’s important to have solid background knowledge of the business before attempting any data quality management exercise. In our experience, data quality issues tend to fall into the following categories:

  • Missing data, i.e., data gaps that are missing completely at random, missing at random or missing not at random
  • Global duplicates, i.e., double entries for what should be one distinct entry
  • Local (field-related) quasi-duplicates, i.e., unintentionally near-identical input in free-text fields
  • Local (field-related) outliers, i.e., values that deviate significantly from what we can reasonably expect
  • Global outliers/anomalies, i.e., atypical data points other than those with an underlying business cause
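
To make these categories concrete, the sketch below shows how each of them might be detected in practice, using pandas and scikit-learn (an assumed toolchain; the dataframe, its columns and the thresholds are hypothetical):

    import pandas as pd
    from sklearn.ensemble import IsolationForest

    df = pd.read_csv("transactions.csv")  # hypothetical dataset

    # Missing data: share of gaps per column.
    missing_share = df.isna().mean()

    # Global duplicates: fully identical rows entered more than once.
    global_duplicates = df[df.duplicated(keep=False)]

    # Local quasi-duplicates: free-text values that become identical after light normalization.
    normalized = df["counterparty"].str.lower().str.replace(r"\s+", " ", regex=True).str.strip()
    quasi_duplicates = df[normalized.duplicated(keep=False) & ~df["counterparty"].duplicated(keep=False)]

    # Local (field-related) outliers: values far outside the typical range of a single field.
    amount = df["amount"]
    local_outliers = df[(amount - amount.mean()).abs() > 3 * amount.std()]

    # Global outliers/anomalies: atypical combinations across several fields.
    features = df[["amount", "num_items"]].fillna(0)
    df["anomaly"] = IsolationForest(random_state=0).fit_predict(features)  # -1 marks anomalies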

Artificial intelligence (AI) promises exciting potential not only in mining data to deliver valuable insights but also in addressing the quality of data. One subset of AI is machine learning (ML), an algorithmic system that can recognize patterns and learn without the need for explicit programming. Machine learning is a useful way to address data quality problems and support active data quality management at every stage of the data lifecycle.

At the creation phase, for example, data quality machine learning (DQ-ML) methods can be applied to onboarding, when the basic data (i.e., client name, client type, client gender, address, country and other client-specific features) is collected and entered into the financial institution’s IT system. While the process should normally prevent required information from being left blank, machine learning methods can additionally detect inconsistencies in the client profile (e.g., between the provided addresses and countries), improving the quality of the data for every subsequent aspect of customer relationship management and flagging suspicious or erroneous client data.
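
As an illustration of what such a check could look like, the sketch below trains a simple text classifier on historical, verified records to predict the country from the free-text address, then flags new profiles where the prediction confidently disagrees with the country that was entered. The library (scikit-learn), field names and confidence threshold are assumptions for the example, not a prescribed implementation.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Historical, verified client records (illustrative data).
    addresses = ["Bahnhofstrasse 1, 8001 Zuerich", "10 Downing Street, London", "Rue de Rivoli 5, Paris"]
    countries = ["CH", "GB", "FR"]

    # Character n-grams are robust to typos and abbreviations in free-text addresses.
    model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                          LogisticRegression(max_iter=1000))
    model.fit(addresses, countries)

    def flag_inconsistent(address: str, entered_country: str, threshold: float = 0.5) -> bool:
        """Flag the profile if the model confidently predicts a different country."""
        probs = model.predict_proba([address])[0]
        predicted = model.classes_[probs.argmax()]
        return predicted != entered_country and probs.max() >= threshold

    # A Swiss-looking address entered with country 'DE' is a candidate for review.
    print(flag_inconsistent("Bahnhofstrasse 1, 8001 Zuerich", "DE"))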

Later in the data lifecycle, DQ-ML methods using various unsupervised and supervised techniques, as well as natural language processing, can be developed to detect inconsistencies, duplicates and missing values in transaction data. This enables more accurate monitoring of suspicious transactions, for example, and can help financial institutions flag potential compliance issues in areas like money laundering, market manipulation or insider trading.
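
For the duplicate-detection part specifically, one common unsupervised technique is pairwise text similarity. The sketch below uses TF-IDF character n-grams and cosine similarity from scikit-learn to surface near-identical transaction descriptions that exact matching would miss; the similarity cutoff of 0.85 is an assumption that would need tuning in practice.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    descriptions = [
        "Payment to ACME Corp invoice 4711",
        "Payment to ACME Corp. invoice 4711",  # quasi-duplicate (extra period)
        "Salary transfer June",
    ]

    # Vectorize on character n-grams so small typos barely change the representation.
    vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(descriptions)
    similarity = cosine_similarity(vectors)

    # Report pairs above the (assumed) cutoff; the upper triangle avoids self- and mirror-matches.
    for i, j in np.argwhere(np.triu(similarity, k=1) > 0.85):
        print(f"possible duplicate: {descriptions[i]!r} ~ {descriptions[j]!r}")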

Another example is the area of credit risk models, where the discriminatory power of a model computing the probabilities of default can be improved by applying DQ-ML methods. By detecting (and potentially also remediating) data quality issues, organizations benefit from significantly improved model performance and can also, for example, calculate their regulatory capital more accurately.
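
The improvement in discriminatory power can be quantified directly, for example with the AUC statistic. The sketch below fits the same simple model once on the raw data and once after remediation and compares the two scores; load_credit_data and remediate_quality_issues are hypothetical placeholders for the institution’s own data access and DQ-ML remediation steps.

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def pd_model_auc(X, y):
        """Fit a simple probability-of-default model and return its AUC on a held-out set."""
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    X_raw, y = load_credit_data()              # hypothetical data access
    X_clean = remediate_quality_issues(X_raw)  # hypothetical DQ-ML remediation
    auc_raw = pd_model_auc(X_raw.fillna(X_raw.median()), y)  # naive gap handling only
    auc_clean = pd_model_auc(X_clean, y)
    print(f"AUC on raw data: {auc_raw:.3f} vs. after remediation: {auc_clean:.3f}")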

AI enables computers to perceive, learn and reason. The prerequisite of good data quality remains, but the possibility of addressing quality issues directly at the source using AI is growing.

Summary

For all the euphoria at the potential of data and analytics, it’s important to remember that quality is just as important as quantity. In our era of big data, proper data quality management should be high on the board agenda. Many financial institutions still face various data quality issues in datasets across their front-to-back value chain. We explore the most common issues and look at how to tackle them.

Acknowledgements

We thank Thibaut Vernay for his valuable contribution to this article. 
