From Chaos to Clarity: Using GenAI to Structure Unstructured Data in RWE Studies

DATE

October 16, 2024

AUTHOR

Dragan | Co-Founder & CTO

Introduction

Real-World Evidence (RWE) studies rely heavily on a diverse stream of data sources to draw meaningful conclusions. Despite its value, much of this data remains unstructured, creating major challenges for researchers and pharmaceutical companies. Unstructured data, like clinical notes, patient feedback and records from external databases, often exist in formats that are difficult to process using traditional methods such as relational databases, manual review and rule-based systems. As a result, these sources frequently go underutilized, reducing the potential of RWE to provide actionable insights.

This is where Generative Artificial Intelligence (GenAI) comes into play. In this blog post, we will explore how GenAI is revolutionizing the handling of unstructured data in RWE studies. We will dive into its ability to process, categorize and transform chaotic, unstructured information into well-organized datasets that are easier to analyze, monitor and derive insights from.

Furthermore, we will outline best practices for integrating GenAI into data management workflows. From data privacy and ethical considerations to maximizing return on investment, we’ll provide strategic guidelines to ensure your GenAI implementation is both successful and compliant. By leveraging the power of GenAI, researchers can gain clearer, more actionable insights, ultimately driving faster, more effective drug development and improving patient outcomes.

1. The Challenge of Unstructured Data in RWE Studies

RWE studies have become essential to understanding patient experiences, drug efficacy and long-term safety outcomes in routine healthcare settings. Unlike randomized controlled trials (RCTs), RWE studies rely on a wide variety of data sources, ranging from electronic health records (EHRs) and insurance claims to clinical notes, patient-reported outcomes and even social media or wearable devices. Much of this data, particularly clinical notes and patient feedback, is unstructured, making it challenging to analyze and derive actionable insights.

The Nature of Unstructured Data in RWE Studies

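Unstructured data refers to information that doesn’t fit neatly into traditional databases or spreadsheets. In healthcare, this includes free-text clinical notes, patient feedback, patient-reported outcomes and records from external databases.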

An estimated 80% of healthcare data is unstructured. The challenge lies in processing this information in a way that produces valuable, actionable insights. Traditional methods fall short because they are designed for structured data organized in rows and columns, such as lab results or billing codes, which limits their ability to deal with narrative text or qualitative data.

Image source: https://www.pecan.ai/blog/what-is-structured-data/

Limitations of Traditional Methods

Conventional data analysis tools, which work well with structured data, struggle with unstructured formats. For instance, relational databases and statistical software are excellent at handling numerical data but fail to accurately process and interpret free-text clinical notes or subjective patient reports.

While natural language processing (NLP) technologies have been applied to unstructured healthcare data, many NLP models encounter limitations when dealing with the complexities of medical jargon, multi-modal data formats or informal patient language. For example, clinical notes often contain abbreviations, medical acronyms and nuanced descriptions that are difficult to interpret without specialized algorithms. Even advanced NLP tools can miss important context, leading to inaccurate or incomplete data extraction.

Manual data review, while an option, is both time-consuming and prone to human error. This can negatively affect key performance indicators (KPIs) such as data processing time and data quality, leading to inefficient workflows and potentially flawed insights.

How GenAI Can Provide a Solution

GenAI offers a powerful solution to the challenges posed by unstructured data in RWE studies. Unlike traditional approaches, GenAI models, built on advanced machine learning algorithms, are capable of analyzing and interpreting vast amounts of unstructured text, images and other data types. These models use deep learning to recognize patterns and extract meaning from complex datasets, making them highly effective in structuring unstructured healthcare data.

Automating Data Structuring

GenAI models can automate the categorization and summarization of unstructured data, significantly reducing the time and effort needed to process complex datasets. For instance, a GenAI tool can analyze clinical notes, extract critical information (e.g., diagnoses, treatments and outcomes) and organize it into structured formats such as tables or charts for easier analysis. Tools like IBM Watson Health have demonstrated the ability to process large volumes of unstructured medical text with high accuracy, automatically transforming clinical narratives into structured data suitable for real-world research.
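The extraction step described above can be sketched roughly as follows. This is a minimal illustration and not a description of any specific product: the schema `NoteRecord`, the prompt wording and the stand-in `fake_model` function are all assumptions made for the example. The point it shows is that model output should be validated against a fixed schema before it enters a study dataset.

```python
import json
from dataclasses import dataclass, field

FIELDS = ("diagnoses", "treatments", "outcomes")


@dataclass
class NoteRecord:
    """Hypothetical target schema; the field names are illustrative."""
    diagnoses: list[str] = field(default_factory=list)
    treatments: list[str] = field(default_factory=list)
    outcomes: list[str] = field(default_factory=list)


def build_prompt(note: str) -> str:
    """Ask the model for a JSON object with exactly the schema fields."""
    return (
        "Extract information from the clinical note below. Answer with a JSON "
        f"object whose keys are {', '.join(FIELDS)}; each value must be a "
        "list of strings.\n\n"
        f"Note:\n{note}"
    )


def parse_response(raw: str) -> NoteRecord:
    """Validate the model output; raise ValueError if it does not fit the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    record = NoteRecord()
    for name in FIELDS:
        values = data.get(name, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"field {name!r} must be a list of strings")
        # Trim whitespace and drop empty entries.
        setattr(record, name, [v.strip() for v in values if v.strip()])
    return record


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned answer."""
    return json.dumps({
        "diagnoses": ["type 2 diabetes"],
        "treatments": ["metformin 500 mg"],
        "outcomes": ["HbA1c improved"],
    })


note = "Pt with T2DM started on metformin 500 mg; HbA1c improved at follow-up."
record = parse_response(fake_model(build_prompt(note)))
print(record)
```

In a real workflow, `fake_model` would be replaced by a call to whichever model is in use, and records that fail validation would be routed to manual review rather than silently dropped.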

Enhancing Data Quality and Completeness

In RWE studies, data completeness is a critical KPI, especially when evaluating drug safety and efficacy. Traditional tools often struggle to incorporate all available data sources, but GenAI can synthesize information from a variety of unstructured sources. By pulling insights from clinical notes, patient feedback and external databases, GenAI ensures a more complete dataset, leading to higher-quality outcomes. This improves the data quality KPI, as more relevant information is captured and analyzed.
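As a rough illustration of how a completeness KPI might be measured, the sketch below computes the share of required fields that are filled in, before and after gaps are filled from a second source such as values extracted from clinical notes. The field names in `REQUIRED_FIELDS` and the example records are hypothetical, and real studies will define completeness according to their own protocol.

```python
from typing import Optional

# Hypothetical set of fields a study requires for every patient record.
REQUIRED_FIELDS = ("diagnosis", "treatment", "outcome")

Record = dict[str, Optional[str]]


def completeness(records: list[Record]) -> float:
    """Share of required fields that are non-empty across all records."""
    if not records:
        return 0.0
    filled = sum(
        1
        for record in records
        for name in REQUIRED_FIELDS
        if (record.get(name) or "").strip()
    )
    return filled / (len(records) * len(REQUIRED_FIELDS))


def merge_sources(*sources: Record) -> Record:
    """Fill gaps from later sources without overwriting existing values."""
    merged: Record = {}
    for source in sources:
        for name, value in source.items():
            if not (merged.get(name) or "").strip() and value:
                merged[name] = value
    return merged


structured_only = {"diagnosis": "asthma", "treatment": None, "outcome": None}
from_notes = {"treatment": "inhaled corticosteroid", "outcome": None}

before = completeness([structured_only])
after = completeness([merge_sources(structured_only, from_notes)])
print(f"completeness before: {before:.2f}, after: {after:.2f}")
```

Giving earlier sources priority in `merge_sources` is a design choice for this example; a study might instead prefer the most recent or the most trusted source.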

Accelerating Insight Generation

Another key advantage of using GenAI is its ability to enhance insight generation speed. For example, GenAI can analyze large amounts of patient-reported outcomes or social media data in near real-time, identifying trends, potential adverse effects or treatment pathways. This allows pharmaceutical companies and other healthcare providers to make quicker decisions based on real-world data, improving both the efficiency and effectiveness of RWE studies.

2. GenAI in Action: Structuring Unstructured Data for Better Insights

GenAI is reshaping how healthcare organizations manage unstructured data in RWE studies, offering practical solutions for categorizing, summarizing and deriving actionable insights from complex datasets. These capabilities have transformative implications for drug safety and effectiveness monitoring, where timely data analysis is crucial. Below, we explore how GenAI technologies are used in real-world applications to address these challenges.

Source: https://www.mdpi.com/1660-4601/19/16/10159

2.1 Summarizing Patient Feedback

GenAI is highly effective at summarizing vast amounts of patient feedback from sources such as surveys, social media or wearable devices. In large clinical studies or post-market surveillance, gathering insights from this feedback is crucial for monitoring the real-world effectiveness of drugs or medical devices.

Manually reviewing data from thousands of patient-reported outcomes would be incredibly time-consuming. With a GenAI tool, by contrast, the feedback can be automatically summarized to highlight common side effects, patient adherence to the treatment and overall satisfaction levels. This capability helps researchers quickly identify critical trends, such as unexpected (serious) adverse events ((S)AEs) or variations in drug efficacy across different populations.
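Once individual responses have been tagged, the summary itself is simple aggregation. The sketch below assumes that a model has already turned each free-text response into a `TaggedFeedback` item; that class, its fields and the 1 to 5 satisfaction scale are assumptions made for the example, not a fixed format.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaggedFeedback:
    """One patient response after tagging (hypothetical structure)."""
    side_effects: list[str]
    adherent: bool
    satisfaction: int  # assumed scale: 1 (low) to 5 (high)


def summarize(items: list[TaggedFeedback], top_n: int = 3) -> dict:
    """Aggregate tagged responses into a few headline figures."""
    if not items:
        return {
            "responses": 0,
            "top_side_effects": [],
            "adherence_rate": None,
            "mean_satisfaction": None,
        }
    # Count each side effect at most once per response, ignoring case.
    counts = Counter(
        effect
        for item in items
        for effect in {e.strip().lower() for e in item.side_effects}
    )
    # Sort by frequency, then alphabetically, so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return {
        "responses": len(items),
        "top_side_effects": ranked[:top_n],
        "adherence_rate": sum(item.adherent for item in items) / len(items),
        "mean_satisfaction": mean(item.satisfaction for item in items),
    }


feedback = [
    TaggedFeedback(["Nausea", "headache"], adherent=True, satisfaction=4),
    TaggedFeedback(["nausea"], adherent=False, satisfaction=2),
    TaggedFeedback([], adherent=True, satisfaction=5),
]
summary = summarize(feedback)
print(summary)
```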

2.2 Real-Time Drug Safety Monitoring

A further vital application of GenAI is in pharmacovigilance – the process of monitoring drug safety in real time. Traditionally, drug safety monitoring relies on periodic reviews of adverse event reports submitted by healthcare providers, which can delay the identification of serious safety concerns. With GenAI, organizations can analyze unstructured data from diverse sources like clinical reports, patient forums and even social media in real time.

One real-world example is Pfizer’s use of AI to enhance its pharmacovigilance efforts. By leveraging GenAI models to process unstructured adverse event reports and social media mentions, Pfizer was able to flag safety concerns about a new drug in a fraction of the time required by manual methods. This use of GenAI for real-time monitoring ensured that the drug’s risk profile could be updated quickly, enabling timely interventions.
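To make the idea of continuous monitoring more concrete, the sketch below shows one simple way mentions could be tracked once a model has extracted drug and event pairs from incoming text. It is not a description of Pfizer’s system or of any established signal detection method: it is a plain sliding-window count, and the class `MentionMonitor`, the 24-hour window, the threshold of three and the name "DrugX" are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class MentionMonitor:
    """Flags a drug-event pair once mentions within a time window reach a threshold."""

    def __init__(self, window: timedelta, threshold: int) -> None:
        self.window = window
        self.threshold = threshold
        self._seen: dict[tuple[str, str], deque[datetime]] = defaultdict(deque)

    def add(self, drug: str, event: str, at: datetime) -> bool:
        """Record one mention; return True if the pair should go to human review."""
        times = self._seen[(drug.lower(), event.lower())]
        times.append(at)
        # Drop mentions older than the window (assumes time-ordered input).
        while times and at - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


monitor = MentionMonitor(window=timedelta(hours=24), threshold=3)
start = datetime(2024, 10, 16, 8, 0)
flags = [
    monitor.add("DrugX", "rash", start + timedelta(hours=h))
    for h in (0, 5, 30, 31, 32)
]
print(flags)  # [False, False, False, False, True]
```

The first two mentions have aged out of the window by the time the third arrives, so the flag is only raised once three mentions fall within the same 24 hours. A flag here is a prompt for expert review, not a confirmed safety signal.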

2.3 Structuring scattered secondary data

GenAI offers promising advancements in healthcare analytics by structuring scattered secondary data like EHRs and other clinical or operational sources. GenAI can efficiently organize and classify unstructured data, extracting meaningful patterns from EHRs, clinical trial data and insurance claims. This accelerates research and improves decision-making in healthcare settings.

McKinsey & Company reports a surge in AI-driven tools for managing big data in healthcare, projecting up to US$100 billion annually in efficiency gains through enhanced clinical trial designs and improved operational workflows. GenAI can bridge the gap between raw data and actionable insights, especially when dealing with EHRs and other unstructured datasets.

3. Implementing GenAI: Best Practices and Strategic Considerations

Successful implementation of GenAI into data management workflows requires careful planning and strategic considerations to address key challenges like data privacy, ethical compliance and maximizing return on investment (ROI). Be it for clinical trials or other healthcare settings, a robust framework is essential for leveraging GenAI effectively.

3.1 Compliance with regulatory frameworks

Data privacy and ethical compliance are crucial when implementing GenAI in sensitive fields like healthcare. The European Union Artificial Intelligence Act (EU AI Act), which focuses on regulating AI technologies across the EU, offers guidance on how AI should be responsibly deployed in healthcare and other critical sectors. This regulation emphasizes transparency, fairness and safety, and categorizes AI applications based on risk levels. For instance, high-risk applications, such as those used in healthcare diagnostics or treatment, must meet stringent standards regarding data quality, privacy protection and ethical governance.

To comply with these regulations, healthcare companies using GenAI for data management workflows must ensure their models are auditable and transparent. This can involve regular assessments, establishing human oversight and utilizing explainable AI technologies. Furthermore, strict adherence to GDPR (General Data Protection Regulation) is mandatory to safeguard personal health information when deploying AI in the EU.
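One practical building block for auditability and human oversight is a log entry for every model output. The sketch below is only an illustration of that idea and is not a statement of what the EU AI Act or GDPR require: the `AuditRecord` fields, the function names and the example values are assumptions. It records which model version produced an output, a hash of the input and whether a human has reviewed the result.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One log entry per model output (hypothetical structure)."""
    timestamp: str
    model_name: str
    model_version: str
    input_sha256: str  # hash of the input, so the raw note is not copied into the log
    output: str        # may itself contain personal data; protect the log accordingly
    reviewed_by: Optional[str] = None  # set once a human has checked the output


def make_record(model_name: str, model_version: str, source_text: str, output: str) -> AuditRecord:
    """Create an unreviewed audit entry for a single model output."""
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        model_name=model_name,
        model_version=model_version,
        input_sha256=hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        output=output,
    )


def sign_off(record: AuditRecord, reviewer: str) -> AuditRecord:
    """Return a copy of the entry marked as reviewed; the original stays unchanged."""
    return replace(record, reviewed_by=reviewer)


def to_log_line(record: AuditRecord) -> str:
    """Serialize the entry as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


entry = make_record("local-model", "1.0", "example note text", '{"diagnoses": []}')
reviewed = sign_off(entry, "reviewer-1")
print(to_log_line(reviewed))
```

Keeping records immutable and writing them to an append-only log makes it easier to show later which model version produced which output and who approved it.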

3.2 Leveraging open-source models with local hosting

Using locally hosted, open-source GenAI models is another best practice that enhances data security and reduces reliance on third-party services. Platforms like Hugging Face, which offers a vast repository of open-source AI models, allow organizations to deploy and fine-tune models on their own servers. This ensures that sensitive healthcare data is kept in-house, reducing the risk of data breaches and ensuring compliance with privacy regulations like GDPR and the Health Insurance Portability and Accountability Act (HIPAA) in the United States.

3.3 Selecting models specialized in medical knowledge

For healthcare settings, using GenAI models that are specifically trained in medical knowledge is vital to ensure accuracy and relevance. Over the past few years, AI systems have shown remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, a 22.6 percentage point increase over the highest score in 2022 (67.6%). Since the benchmark’s introduction in 2019, AI performance on MedQA has nearly tripled. Such models can perform very well in processing unstructured healthcare data and can help automate tasks like summarizing patient histories, identifying potential treatment options or flagging adverse events in real time, while maintaining high levels of accuracy.

Source: Stanford AI Index Report 2024, Page: 316

3.4 Maximizing Return on Investment (ROI)

To maximize ROI, organizations should focus on automating labor-intensive tasks with GenAI. Interpreting and extracting meaningful insights from unstructured data requires expertise, as it involves analyzing complex information and organizing it into a usable format. This poses a significant challenge for healthcare providers and researchers aiming to utilize such data to enhance patient care. Using GenAI models for such tasks can increase efficiency by reducing the need for manual intervention.

Tracking KPIs like cost reductions in data processing, increased speed of insight generation and improvements in patient outcomes can provide quantifiable measures of the impact of GenAI implementation. According to the Stanford AI Index Report 2024, 42% of organizations reported cost reductions from implementing AI and 59% reported revenue increases, showcasing AI’s ability to drive efficiency and business growth.

Finally, collaborating with companies or research institutions specializing in healthcare AI is a crucial step. These experts can customize GenAI tools to meet specific requirements, allowing you to benefit from their experience and avoid repeating past mistakes, thereby accelerating successful implementation. In McKinsey’s Q1 2024 survey, 59% of respondents from healthcare organizations – including payers, providers and healthcare services and technology (HST) groups – said that they partner with external vendors to develop or integrate customized GenAI solutions.

Conclusion

In conclusion, using GenAI to structure unstructured data in RWE studies offers groundbreaking potential for transforming how healthcare research is conducted. GenAI’s ability to intelligently categorize, summarize, and extract insights from unstructured data sources ensures that critical information, often overlooked due to complexity, is leveraged to its fullest extent. This capability can significantly enhance the depth and quality of RWE studies, ultimately driving better decision-making in drug safety, effectiveness and patient care.

Climedo stands at the forefront of integrating AI-driven technologies into RWE studies. Our team is equipped to guide clients through the challenges of unstructured data management, ensuring compliance with ethical standards and data privacy regulations. By partnering with Climedo, you can take full advantage of cutting-edge GenAI tools to unlock valuable insights, streamline workflows and contribute to improved patient outcomes as well as more effective healthcare solutions.

We look forward to telling you more in a personalized demo

Dragan | Co-Founder & CTO

Climedo

Digital health entrepreneur. Passionate about clean UX and travelling to exotic countries. Creates products with love at Climedo Health.
