Processing both medical and commercial sales data from multiple countries
- Digital Hive
- Jul 22
Industry: Pharma
The project
Our team was tasked with processing medical and commercial sales data from multiple countries to generate actionable recommendations for sales representatives. The recommendations specified which healthcare professionals (HCPs) to contact, their communication preferences, and the optimal frequency and topics for engagement. The goal was to enable targeted outreach and improve overall sales effectiveness.
The challenge
The project presented three main challenges, each of which required close collaboration across teams.
Regional Differences Between Countries
Pharmaceutical data varies widely between countries, with differences in HCP titles, product mappings, and therapeutic area classifications. To accommodate these regional nuances, we developed a flexible system where each country was assigned its own configuration file, layered over a shared global pipeline. This approach ensured that localized differences were captured without compromising the consistency of the data processing workflow.
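As an illustration of what "a configuration file per country, layered over a shared global pipeline" can mean in practice, here is a minimal sketch that resolves a country's effective settings by deep-merging its overrides onto global defaults. All keys, values, and function names below are hypothetical and are not taken from the project; the sketch uses plain Python and does not depend on any Kedro configuration feature.

```python
from copy import deepcopy
from typing import Any

# Hypothetical global defaults shared by every country (illustrative keys only).
GLOBAL_CONFIG: dict[str, Any] = {
    "hcp_titles": {"Dr.": "physician"},
    "therapeutic_areas": ["oncology", "cardiology"],
    "product_mapping": {},
}

# Hypothetical per-country overrides: each file lists only what differs locally.
COUNTRY_CONFIGS: dict[str, dict[str, Any]] = {
    "DE": {
        "hcp_titles": {"Prof. Dr.": "physician"},
        "product_mapping": {"PROD_A_DE": "PROD_A"},
    },
    "FR": {"therapeutic_areas": ["oncology"]},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts are merged key by key, other values are replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def config_for(country: str) -> dict[str, Any]:
    """Resolve the effective config: global defaults layered with the country's overrides."""
    return deep_merge(GLOBAL_CONFIG, COUNTRY_CONFIGS.get(country, {}))
```

With this layout, `config_for("DE")` keeps the global `"Dr."` title mapping and adds the German `"Prof. Dr."` entry, while a country without its own file simply inherits the global defaults. Keeping overrides minimal makes it easy to see exactly where a country deviates from the shared pipeline.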
Separation of Medical and Commercial Data
The project integrated data from two distinct sources: commercial data (such as sales figures and market share data purchased from third parties) and sensitive medical data. Given the stringent regulatory requirements in the pharma industry, we built separate pipelines for each type of data. This separation ensured that medical data was never inadvertently used in commercial applications, while the same processing steps were still applied to both datasets to keep quality and consistency aligned.
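One way to get "separate pipelines, equivalent processing" is to define the processing steps once and instantiate them per data domain under a namespace, then check that no commercial step ever reads a medical dataset. Kedro offers a comparable idea through modular pipelines with a `namespace`, but the sketch below is plain Python and does not reproduce Kedro's API; the `Node` class, step functions, and dataset names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    """Minimal stand-in for a pipeline node: a function plus named inputs and one output."""
    func: Callable[..., object]
    inputs: tuple[str, ...]
    output: str


def drop_missing_ids(rows: list[dict]) -> list[dict]:
    # Shared cleaning step: discard records without an HCP identifier.
    return [r for r in rows if r.get("hcp_id")]


def count_per_hcp(rows: list[dict]) -> dict[str, int]:
    # Shared aggregation step: number of records per HCP.
    counts: dict[str, int] = {}
    for r in rows:
        counts[r["hcp_id"]] = counts.get(r["hcp_id"], 0) + 1
    return counts


def processing_pipeline(namespace: str) -> list[Node]:
    """Build the same steps for one data domain, prefixing every dataset name."""
    def ns(name: str) -> str:
        return f"{namespace}.{name}"

    return [
        Node(drop_missing_ids, (ns("raw"),), ns("clean")),
        Node(count_per_hcp, (ns("clean"),), ns("hcp_counts")),
    ]


def assert_isolated(nodes: list[Node], forbidden_namespace: str) -> None:
    """Fail fast if any node reads a dataset belonging to the forbidden domain."""
    for node in nodes:
        for name in node.inputs:
            if name.startswith(forbidden_namespace + "."):
                raise ValueError(f"{node.func.__name__} reads forbidden dataset {name!r}")


def run(nodes: list[Node], catalog: dict[str, object]) -> dict[str, object]:
    """Execute nodes in order, storing each output back in the catalog."""
    for node in nodes:
        catalog[node.output] = node.func(*(catalog[name] for name in node.inputs))
    return catalog


commercial = processing_pipeline("commercial")
medical = processing_pipeline("medical")
assert_isolated(commercial, "medical")
assert_isolated(medical, "commercial")
```

Because both pipelines are generated from one definition, a fix to a shared step reaches both domains, while the namespace prefix plus the isolation check make an accidental cross-domain read fail at build time rather than surface later in a commercial output.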
Collaboration Between Data Engineering (DE) and Data Science (DS)
The success of this project hinged on strong collaboration between the Data Engineering and Data Science teams. We adopted Kedro, an open-source Python framework for building data science pipelines, and implemented the pipelines in Python and Spark. The pipeline was split into two segments: the DE team owned data ingestion and transformation, and the DS team owned modeling and recommendation generation. Keeping all code in a single repository streamlined development and helped each team understand the other's work. A key benefit of Kedro was its built-in lineage tracking, which improved transparency, reproducibility, and overall quality assurance.
(Example visualization available at: https://kedro.org)
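Kedro derives the dependencies between nodes from the dataset names each node declares as inputs and outputs, which is also what makes lineage traceable. The plain-Python sketch below illustrates that idea for a pipeline split into a DE segment and a DS segment: it finds every step upstream of a dataset and the "handoff" datasets where DE output becomes DS input. The `Step` class and all step and dataset names are hypothetical and do not describe the project's actual pipeline or Kedro's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical pipeline step: a name, the owning team, and its declared datasets."""
    name: str
    owner: str  # "DE" or "DS"
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


# Illustrative names only; not taken from the actual project.
DE_SEGMENT = [
    Step("ingest_sales", "DE", ("raw_sales",), ("sales",)),
    Step("ingest_hcps", "DE", ("raw_hcp_master",), ("hcps",)),
    Step("build_features", "DE", ("sales", "hcps"), ("hcp_features",)),
]
DS_SEGMENT = [
    Step("train_model", "DS", ("hcp_features",), ("model",)),
    Step("recommend", "DS", ("model", "hcp_features"), ("recommendations",)),
]
FULL_PIPELINE = DE_SEGMENT + DS_SEGMENT  # one repository, one combined pipeline


def upstream_steps(dataset: str, steps: list[Step]) -> set[str]:
    """Return the names of all steps the given dataset transitively depends on."""
    producers = {out: step for step in steps for out in step.outputs}
    found: set[str] = set()
    pending = [dataset]
    while pending:
        step = producers.get(pending.pop())
        if step is None or step.name in found:
            continue  # raw input with no producer, or a step already visited
        found.add(step.name)
        pending.extend(step.inputs)
    return found


def handoff_datasets(de: list[Step], ds: list[Step]) -> set[str]:
    """Datasets produced by the DE segment and consumed by the DS segment."""
    produced = {out for step in de for out in step.outputs}
    consumed = {name for step in ds for name in step.inputs}
    return produced & consumed
```

In this toy example, `handoff_datasets(DE_SEGMENT, DS_SEGMENT)` yields `{"hcp_features"}`, i.e. the explicit contract between the two teams, and `upstream_steps("recommendations", FULL_PIPELINE)` walks back through both segments to the raw inputs.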
Our contribution
We began our engagement on the project with a senior consultant integrated into the existing data engineering team. Our early efforts focused on optimizing and maintaining current pipelines, as well as designing new flows to enhance data processing efficiency.
Over time, as trust was established and our technical expertise recognized, we were entrusted with the end-to-end rollout of a new medical recommendation flow across the EMEA region. This flow generated AI-driven recommendations based on insights from medical publications and expert analyses to guide sales strategy. One of our senior consultants took full ownership of the rollout, handling sprint planning and stakeholder communication and keeping project objectives aligned with business goals. We also expanded the team, onboarding and managing five junior engineers, including additional consultants from Digital Hive, to support the increased scope of work.
Conclusion
The project was a success, leading to increased sales of the products covered by the recommendations.
After an initial ramp-up period, our team assumed full responsibility for the medical recommendation flow, leading to a successful rollout across 20 EMEA countries.
This project shows how robust data pipelines and strong DE-DS collaboration are critical to the success of AI-driven initiatives. By feeding the AI models high-quality, reliable data, we achieved operational excellence and directly supported strategic decision-making in the pharma sector. Our integrated approach to data engineering and data science enhanced HCP engagement and drove tangible improvements in business performance.