Tech Trends: Synthetic data: artificial data; real solutions

In this podcast, Alexy Thomas, EY India Technology Consulting Partner, talks about synthetic data, which is being considered the future of the artificial intelligence space.

Key takeaways

While actual data may lack quality, volume, or variety, synthetic data can overcome these limitations and be generated in all the permutations and combinations of any given condition. Real data may also be unavailable for unseen conditions and events.
Synthetic data can better train AI models and test systems and help build better prototypes than real data sets.
It can also provide a faster turnaround for AI testing, which requires a large number of iterations and inputs. In the coming years, synthetic data is going to overshadow real data in AI models.
In sectors like financial services, synthetic data can help to evaluate market behavior and to develop new and innovative products, which is what large and small financial services organizations are trying to do.
Synthetic data comes with significant risks and limitations since the quality of synthetic data generated depends on the quality of the model that created it. So, if the input has errors or biases, the data generated using it will lead to false insights and, in turn, to erroneous decision-making.

Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. It can be generated for unseen conditions and events. Where actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, as it is generated for unseen conditions.

Alexy Thomas

EY India Technology Consulting Partner

For your convenience, a full text transcript of this podcast is available below:

Silloo Jangalwala

Hello, this is Silloo welcoming you to a new episode of the EY Podcast series. As we have concluded the series on ESG, we are now moving to our next topic, technology. Here, we look at the most trending technology topics that India Inc. needs to know in its digitization journey. Today, we will look at synthetic data, which has become a favorite buzzword in the artificial intelligence space.

There is no doubt that the industry needs quality data to train new AI models. But with data privacy concerns and stringent regulations on data sharing, accessing quality real data is very difficult. Synthetic data tries to address these problems. Today, we are going to explore more about synthetic data. What is synthetic data, and why is it called the future of AI? Will it solve the privacy concerns? Is it a one-stop solution for all your AI data needs? To answer these questions, we have with us today Alexy Thomas, Technology Consulting Partner on data analytics and ESG at EY India.

Welcome to the podcast, Alexy. You are in the hot seat today.

Alexy

Thanks, Silloo, for the wonderful introduction. This is a very interesting subject, so I will try to answer your questions as much as I can.

Silloo

Great, Alexy. So, I would say that synthetic data is a relatively new term and not many are familiar with it. Can you briefly explain the concept and why we need synthetic data?

Alexy

Absolutely. With new AI-based technologies coming in, there is no doubt that our data needs are huge, and we need to train these AI models well to get the best benefits out of them. But accessing quality real data is very tedious and time-consuming, and often, it is a very costly affair. In many scenarios, for example, self-driving automobiles and so on, use cases may be new and real data would not be available at all. Even if it is available, it might not be accessible because of data privacy rules. In such a case, synthetic data comes in very handy to train AI models. Synthetic data is, by definition, generated artificially, with or without the use of real data, for training AI models. It serves a real purpose; it serves the purpose of real data and is sometimes even better. And it has all the granular elements of the original dataset that you would want.

Silloo

So, tell me, Alexy: what does synthetic data offer?

Alexy

Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. It can be generated for unseen conditions and events. When actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, as it is generated for all the different unseen conditions in all the permutations and combinations of a given situation. Thus, it can better train AI models, test systems better, and build better prototypes than even actual datasets. Also, as it can be created for any specific data requirement, it does not involve the same privacy concerns as real data. It will also provide a faster turnaround for AI testing. The number of iterations required is sometimes very large. In the coming years, synthetic data is going to overshadow real data in AI models.
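As an editorial illustration of what "replicates the statistical characteristics of the existing dataset" can mean in the simplest case, the sketch below fits the means, spreads, and correlation of two columns and then draws fresh rows from a Gaussian with those parameters. This is only one basic technique, not a method described in the podcast. The "real" dataset, the column meanings (income and spend), and the function names `fit_gaussian_2d` and `sample_synthetic` are all hypothetical.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Estimate means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        # Mix z1 into y so the synthetic columns keep the fitted correlation.
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Hypothetical stand-in for a real dataset: (income, spend) pairs
# where spend depends partly on income.
real = []
for _ in range(2000):
    income = rng.gauss(50.0, 10.0)
    spend = 0.6 * income + rng.gauss(0.0, 5.0)
    real.append((income, spend))

real_params = fit_gaussian_2d(real)
synthetic = sample_synthetic(real_params, 2000, rng)
synth_params = fit_gaussian_2d(synthetic)

print("real     :", [round(p, 2) for p in real_params])
print("synthetic:", [round(p, 2) for p in synth_params])
```

No synthetic row copies a real record, yet the summary statistics of the two sets should come out close. Practical generators model far richer structure than a single Gaussian.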

Silloo

That is interesting, Alexy. In your view, which sectors can make use of synthetic data?

Alexy

Silloo, synthetic data can be very helpful in many sectors ranging from manufacturing and mobility to retail and natural-language processing where many of the use cases might be new – for example, training virtual AI assistants in Indian regional languages. It is also a savior in industries like healthcare and pharmaceuticals, where data privacy is a huge concern. In sectors like financial services, synthetic data can really help evaluate market behavior and develop new and innovative products, which is what large and small financial services organizations are trying to do.

Silloo

But will it solve all the data-related problems? Is it a silver bullet?

Alexy

No, it certainly is not a silver bullet, Silloo. While synthetic data provides many benefits, it is not really a one-stop solution for all the data-related problems. Just like any other technology, synthetic data comes with significant risks and limitations, since the quality of synthetic data generated largely depends on the quality of the model that created it. So, if the input has errors or biases, the data generated using it will lead to false insights. Just like they say – garbage in, garbage out.
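The "garbage in, garbage out" point can be illustrated with a small editorial sketch, assuming the simplest possible generator: one that learns category shares from its input and samples new labels with those shares. The labels "urban" and "rural", the 90/10 skew, and the function names are hypothetical. The point is only that a skew in the input is carried into the output.

```python
import random
from collections import Counter


def fit_category_shares(labels):
    """Return the fraction of the input taken by each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: count / total for category, count in counts.items()}


def sample_labels(shares, n, rng):
    """Draw n synthetic labels using the learned category shares."""
    categories = sorted(shares)
    weights = [shares[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


rng = random.Random(1)

# Hypothetical biased input: one group is heavily over-represented.
biased_input = ["urban"] * 900 + ["rural"] * 100

shares = fit_category_shares(biased_input)
synthetic_labels = sample_labels(shares, 10_000, rng)
synthetic_shares = fit_category_shares(synthetic_labels)

print("input shares    :", shares)
print("synthetic shares:", synthetic_shares)
```

Generating more rows does not correct the imbalance. The generator only reproduces what it was shown, so the skew has to be addressed in the input or in the generator itself.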

Silloo

Thanks a lot, Alexy, for explaining all these concepts so clearly.

Alexy

Thanks, Silloo. I really enjoyed the talk too.

Presenters

Alexy Thomas

Partner, Technology Consulting, EY India

Podcast: Episode 03

Duration: 6m 24s

EY refers to the global organization, and may refer to one or more, of the member firms of Ernst & Young Global Limited, each of which is a separate legal entity. Ernst & Young Global Limited, a UK company limited by guarantee, does not provide services to clients.
