Formularies Data Aggregation Using Machine Learning, NER, and LLMs

AI-driven automation delivered 95%+ accuracy and 70% faster updates without increasing operational costs.

The project by numbers.

95%+ Unstructured data accuracy
70% Faster updates
0% Ops cost increase
2.5M+ Documents processed

Meet the business.

A UK-based healthcare intelligence provider offering critical data and insights to the NHS, private healthcare organisations, and pharmaceutical companies.

Their challenge.

Automate the collection and extraction of drug recommendation data from 300 UK NHS sources comprising over 2.5 million documents, including PDFs and other unstructured content, and map the extracted drug data to standard BNF codes.

What we delivered.

Data was collected with ScrapeX and custom spiders. Entities were then extracted by a Named Entity Recognition (NER) system powered by LLMs with user-defined labels. Data mapping used sentence embeddings and semantic similarity, with prompt-based resolution for edge cases and LLM inference for NICE tagging and BNF code alignment. A normalisation pipeline ensured accurate BNF mapping and presentation matching.
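
To illustrate the mapping step, here is a minimal sketch of normalisation followed by similarity matching with a fallback for edge cases. It is not the production pipeline. The reference codes (REF-001 and so on) are placeholders rather than real BNF codes, the character-trigram "embedding" is a dependency-free stand-in for a real sentence-embedding model, and the function names and the 0.6 threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical reference list. The codes are placeholders, NOT real BNF codes.
REFERENCE = {
    "REF-001": "paracetamol 500mg tablets",
    "REF-002": "amoxicillin 250mg capsules",
    "REF-003": "salbutamol 100 micrograms inhaler",
}


def normalise(text: str) -> str:
    """Lowercase, join dose and unit, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"(\d)\s+(mg|mcg|g|ml)\b", r"\1\2", text)  # "500 mg" -> "500mg"
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: character trigram counts of the normalised text.

    A real system would call a sentence-embedding model here instead.
    """
    padded = f" {normalise(text)} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def map_to_code(mention: str, threshold: float = 0.6):
    """Return (code, score) for the closest reference entry.

    When the best score is below the threshold, return (None, score) so the
    mention can be routed to a slower path such as prompt-based resolution.
    """
    query = embed(mention)
    scored = ((code, cosine(query, embed(label))) for code, label in REFERENCE.items())
    best_code, best_score = max(scored, key=lambda pair: pair[1])
    if best_score < threshold:
        return None, best_score
    return best_code, best_score


if __name__ == "__main__":
    print(map_to_code("Paracetamol 500 mg Tablets"))
    print(map_to_code("qqq zzz"))
```

Returning None below the threshold keeps low-confidence matches out of the automated path, so that only ambiguous mentions incur the cost of an LLM call or manual review.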

High Accuracy

Achieved over 95% accuracy across complex, unstructured document formats.

Faster Updates

Reduced data update turnaround time by 70% through automated workflows.

Operational Efficiency

Scaled operations with no increase in operational costs.

Global Scalability

Enabled expansion into new geographies with minimal retraining.

Ready to discuss your project?

Whatever the challenge, whatever the industry, our teams work side-by-side with clients to design systems that perform today and evolve for tomorrow. That’s why leading businesses trust us to turn their toughest data ambitions into reality.

