Tech Trends: Synthetic data: artificial data; real solutions
In this podcast, Alexy Thomas, EY India Technology Consulting Partner, talks about synthetic data, which is being considered the future of the artificial intelligence space.
Podcast host Silloo Jangalwala, Associate Director, BMC, speaks to Alexy Thomas from Tech Consulting at EY India, addressing predominant questions surrounding synthetic data, its potential to become the future of AI, its capability to solve privacy concerns and whether it is a one-stop solution for all AI data needs.
Background: industries need large amounts of high-quality data to train new AI models. Because of emerging data privacy concerns and stringent regulations on data sharing, gathering and accessing real, high-quality data is becoming difficult. Synthetic data is generated artificially, with or without the help of real datasets, for the purpose of training AI models. This may address some of the problems faced with real data.
Key takeaways
While actual data may lack quality, volume, or variety, synthetic data can overcome these limitations and be generated in all the permutations and combinations of any given condition. Real data may also be unavailable for unseen conditions and events.
Synthetic data can train AI models, test systems, and help build prototypes more effectively than real datasets.
It can also provide a faster turnaround for AI testing, which requires a large number of iterations and inputs. In the coming years, synthetic data is going to overshadow real data in AI models.
In sectors like financial services, synthetic data can help to evaluate market behavior and to develop new and innovative products, which is what large and small financial services organizations are trying to do.
Synthetic data comes with significant risks and limitations, since the quality of the synthetic data generated depends on the quality of the model that created it. So, if the input has errors or biases, the data generated from it will lead to false insights and, in turn, to erroneous decision-making.
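As a rough illustration of the last takeaway, the sketch below generates synthetic values by fitting a simple Gaussian to a small "real" column and sampling from it. This is an assumption made purely for illustration: the podcast does not name any generation technique, production generators are far more sophisticated, and the dataset, function names and numbers here are all hypothetical. It shows the synthetic data reproducing the mean of its source, and then inheriting a deliberate bias added to that source.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def generate_synthetic(values, n_samples, seed=0):
    """Draw n_samples artificial values from a Gaussian fitted to `values`."""
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]


# Hypothetical "real" dataset: 200 made-up transaction amounts.
source_rng = random.Random(42)
real = [source_rng.gauss(100.0, 15.0) for _ in range(200)]

# Far more synthetic rows than real ones can be produced on demand.
synthetic = generate_synthetic(real, n_samples=5000)
print("real mean:     ", round(statistics.mean(real), 1))
print("synthetic mean:", round(statistics.mean(synthetic), 1))

# Garbage in, garbage out: shift the source by +20 to imitate a biased
# collection process, and the synthetic data inherits the same shift.
biased_real = [v + 20.0 for v in real]
biased_synthetic = generate_synthetic(biased_real, n_samples=5000)
shift = statistics.mean(biased_synthetic) - statistics.mean(synthetic)
print("inherited bias:", round(shift, 1))
```

The generator has no way of knowing the source was skewed, so the skew passes straight through to everything trained or tested on its output.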
Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. It can be generated for unseen conditions and events. Where actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, as it is generated for unseen conditions.
Alexy Thomas
EY India Technology Consulting Partner
For your convenience, a full text transcript of this podcast is available below:
Silloo Jangalwala: Hello, this is Silloo welcoming you to a new episode of the EY Podcast series. As we have concluded the series on ESG, we are now moving to our next topic, technology. Here, we look at the most trending technology topics that India Inc. needs to know in its digitization journey. Today, we will look at synthetic data, which has become a favorite buzzword in the artificial intelligence space.
There is no doubt that the industry needs quality data to train new AI models. But with data privacy concerns and stringent regulations on data sharing, accessing real quality data is very difficult. Synthetic data tries to address these problems. Today, we are going to explore more about synthetic data. What is synthetic data, and why is it called the future of AI? Will it solve the privacy concerns? Is it a one-stop solution for all your AI data needs? To answer these questions, we have with us today Alexy Thomas, Technology Consulting Partner on data analytics and ESG at EY India.
Welcome to the podcast, Alexy. You are in the hot seat today.
Alexy: Thanks, Silloo, for the wonderful introduction. This is a very interesting subject, so I will try to answer your questions as much as I can.
Silloo: Great, Alexy. So, I would say that synthetic data is a relatively new term and not many are familiar with it. Can you briefly explain the concept and why we need synthetic data?
Alexy: Absolutely. With new AI-based technologies coming in, there is no doubt that our data needs are huge: we need to train these AI modules well to get the best benefits out of them. But accessing quality real data is tedious and time-consuming, and often it is a very costly affair. In many scenarios, for example self-driving automobiles, the use cases may be new and real data would not be available at all. Even if it is available, it might not be accessible because of data privacy rules. In such cases, synthetic data comes in very handy to train AI models. Synthetic data is, by definition, generated artificially, with or without the use of real data, for training AI modules. It serves the purpose of real data and is sometimes even better. And it has all the granular elements of the original dataset that you would want.
Silloo: So, tell me Alexy; what does synthetic data offer?
Alexy: Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. It can be generated for unseen conditions and events. When actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, as it can be generated for all the permutations and combinations of a given situation, including unseen conditions. Thus, it can train AI models, test systems, and build prototypes better than even actual datasets. Also, as it can be created for any specific data requirement, it does not involve the same privacy concerns as real data. It will also provide a faster turnaround for AI testing, where the number of iterations required is sometimes very large. In the coming years, synthetic data is going to overshadow real data in AI models.
Silloo: That is interesting, Alexy. In your view, which sectors can make use of synthetic data?
Alexy: Silloo, synthetic data can be very helpful in many sectors ranging from manufacturing and mobility to retail and natural-language processing where many of the use cases might be new – for example, training virtual AI assistants in Indian regional languages. It is also a savior in industries like healthcare and pharmaceuticals, where data privacy is a huge concern. In sectors like financial services, synthetic data can really help evaluate market behavior and develop new and innovative products, which is what large and small financial services organizations are trying to do.
Silloo: But will it solve all the data-related problems? Is it a silver bullet?
Alexy: No, it certainly is not a silver bullet, Silloo. While synthetic data provides many benefits, it is not really a one-stop solution for all the data-related problems. Just like any other technology, synthetic data comes with significant risks and limitations, since the quality of synthetic data generated largely depends on the quality of the model that created it. So, if the input has errors or biases, the data generated using it will lead to false insights. Just like they say – garbage in, garbage out.
Silloo: Thanks a lot, Alexy, for explaining all these concepts so clearly.
Alexy: Thanks, Silloo, I too really enjoyed the talk.