For this client, the raw data consisted of numerous state contracts written with little or no consistency in wording, topical organization or formatting. The words used to express contractual requirements may be similar, but they are often organized and combined differently.
EY teams used natural language processing (NLP) to custom-train an algorithm to extract requirements that would then populate a central database, enabling the company’s analysts to rapidly search keywords, phrases and characters by both state and business function. The searchable business areas include claims centers, call centers, appeals and grievances, and others with compliance implications.
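As a rough illustration of how such a central database could be searched by state and business function, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names and sample rows are assumptions made for this example; the article does not describe the actual schema or storage technology.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory table of extracted requirements (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requirements (state TEXT, business_function TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO requirements VALUES (?, ?, ?)", rows)
    return conn

def search(conn, keyword, state=None, business_function=None):
    """Return requirements containing `keyword`, optionally filtered by state and function."""
    sql = "SELECT state, business_function, text FROM requirements WHERE text LIKE ?"
    params = [f"%{keyword}%"]  # SQLite's LIKE is case-insensitive for ASCII text
    if state is not None:
        sql += " AND state = ?"
        params.append(state)
    if business_function is not None:
        sql += " AND business_function = ?"
        params.append(business_function)
    sql += " ORDER BY state, business_function"
    return conn.execute(sql, params).fetchall()

# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    ("TX", "call centers", "Calls must be answered within 30 seconds."),
    ("TX", "claims centers", "Claims shall be paid within 30 days."),
    ("OH", "call centers", "The call center shall operate 24 hours a day."),
]
```

Calling `search(build_db(SAMPLE_ROWS), "shall", state="TX")` would return only the Texas rows whose text contains "shall". A production system would more likely use a full-text index than a `LIKE` scan.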
In the first phase, the EY teams identified common, relevant paragraph and sentence structures, flagging phrases such as “In the case of” and “is required to.” They also classified structural characters such as bullets and Roman numerals. Because these characters vary from one state contract to another, the NLP algorithm had to be trained on different combinations of them to learn each contract and its requirements. Some state contracts were available only as PDFs, which required optical character recognition (OCR) software to convert images of words into machine-readable characters before the algorithm could be taught what to find.
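The first-phase idea of classifying list markers and flagging trigger phrases can be sketched with regular expressions. This is a simplified, rule-based illustration and not the custom-trained algorithm described here; the marker patterns and the function name `classify_line` are assumptions, and only the two phrases quoted above are used.

```python
import re

# The two trigger phrases quoted in the text; a real list would be much longer.
TRIGGER_PHRASES = ("in the case of", "is required to")

# Hypothetical marker patterns; real contracts would need per-state variants.
MARKER_PATTERNS = [
    ("bullet", re.compile(r"^\s*[•\-\*]\s+")),
    ("roman_numeral", re.compile(r"^\s*\(?[ivxlcdm]+[\.\)]\s+", re.IGNORECASE)),
    ("number", re.compile(r"^\s*\(?\d+[\.\)]\s+")),
]

def classify_line(line):
    """Identify the leading list marker (if any) and any trigger phrases in a line."""
    marker = None
    body = line
    for name, pattern in MARKER_PATTERNS:
        match = pattern.match(line)
        if match:
            marker = name
            body = line[match.end():]  # drop the marker, keep the requirement text
            break
    lowered = body.lower()
    phrases = [p for p in TRIGGER_PHRASES if p in lowered]
    return {"marker": marker, "phrases": phrases, "text": body.strip()}
```

Note that a marker such as “c.” is ambiguous, since it could be a letter or the Roman numeral for 100. This sketch treats it as a Roman numeral, which is one reason marker rules would need tuning for each state's contract.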
The teams then analyzed the syntax and semantics of the contract language to build a dictionary of about 450 critical search terms for the algorithm to target. The keywords are primarily related to formal obligations and requirements. Words like “required,” “shall” and “must” denote obligation, while “within” signals a quantitative requirement, as in “within 48 hours.” Verbs such as “comply,” “fulfill” and “reside” figured prominently in the NLP application dictionaries and libraries built specifically for the MCO’s state contracts.
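To make the dictionary idea concrete, the following sketch tags a sentence with obligation terms, action verbs and any “within N units” deadline. It uses only the handful of terms quoted above, since the full dictionary of about 450 terms is not given here. The function name `tag_sentence`, the set names and the list of time units are assumptions for illustration.

```python
import re

# Tiny illustrative dictionaries built from the terms quoted in the text.
OBLIGATION_TERMS = {"required", "shall", "must"}
ACTION_VERBS = {"comply", "fulfill", "reside"}

# "within" followed by a number and a time unit, e.g. "within 48 hours".
# The set of units is an assumption for this sketch.
DEADLINE = re.compile(r"\bwithin\s+(\d+)\s+(hours?|days?|months?)\b", re.IGNORECASE)

def tag_sentence(sentence):
    """Return the obligation terms, action verbs and deadline found in a sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    deadline = DEADLINE.search(sentence)
    return {
        "obligation": sorted(words & OBLIGATION_TERMS),
        "verbs": sorted(words & ACTION_VERBS),
        "deadline": (int(deadline.group(1)), deadline.group(2).lower())
        if deadline
        else None,
    }
```

Exact word matching like this misses inflected forms such as “complies” or “fulfilled”. A fuller version would need stemming or lemmatization, which may be part of why the dictionaries required repeated additions and clarifications.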
The NLP algorithm took about four months to develop and fine-tune, through a staged process that increased its accuracy from 50% to 84% to nearly 100%, with repeated testing, additions and clarifications to the searchable library. In total, some 170,000 compliance requirements were identified and classified by the words and characters that describe them in the state Medicaid contracts.
“Unstructured data is the largest type of data in organizations today, yet it remains largely untapped for insights and can be an ongoing burden to manage,” says Traci Gusher, EY Americas Data and Analytics Leader. “Application of modern artificial intelligence changes our ability to rapidly use this impactful type of information and drive value from it.”