Saving £millions for the NHS with Pandas

Simon Hargreaves, Jun 22

Photo by Debbie Molle on Unsplash

Introduction

There has been an open data initiative in the UK since 2010, when data.gov.uk was created.

After 9 years we now have a huge number of browsable datasets on the website that can be downloaded and used for your own analysis.

One of the larger data sets is the GP Practice prescription data coming in at around 10 million rows of data every month.

This is a lot of data for your average spreadsheet to handle, so this is where tools like Pandas come in.

Pandas is a data analysis library for Python that can handle many millions of rows of data and run statistical analysis on them to extract useful information.
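As a rough illustration of how one monthly file might be loaded, here is a minimal sketch. The column names (`BNF CODE`, `ACT COST`, `QUANTITY`), the helper name `load_prescriptions` and the tiny inline sample are assumptions made for this example, not details confirmed by the article. With a real download you would pass the file path instead of the `StringIO` buffer.

```python
import io

import pandas as pd

# Tiny inline stand-in for one monthly file (the real ones have ~10 million rows).
# The column names are assumed for illustration.
SAMPLE_CSV = """PRACTICE,BNF CODE,BNF NAME,ITEMS,ACT COST,QUANTITY
A00001,040702040AAAMAM,Generic example,2,10.50,56
A00001,040702040BIACAM,Branded example,1,20.25,28
A00002,040702040AAAMAM,Generic example,3,15.75,84
"""


def load_prescriptions(source, chunksize=2):
    """Read only the columns needed, in chunks, to keep memory use modest."""
    chunks = pd.read_csv(
        source,
        usecols=["BNF CODE", "ACT COST", "QUANTITY"],
        dtype={"BNF CODE": str, "ACT COST": "float64", "QUANTITY": "float64"},
        chunksize=chunksize,  # a real file would use a much larger chunk size
    )
    return pd.concat(list(chunks), ignore_index=True)


df = load_prescriptions(io.StringIO(SAMPLE_CSV))
print(len(df), df["ACT COST"].sum())
```

Restricting the read to a few columns with explicit dtypes is one common way to keep a file of this size manageable; whether the original notebooks do this is not stated.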

So what did we find out?

The total prescription expenditure for the NHS across the UK is approximately £700 million a month.

It varies up and down, month to month, but it stays around the same value.

Of that £700 million, just over half is spent on generic drugs and the remainder is spent on branded drugs.
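A split like this can be computed with a single group-by. The sketch below uses made-up numbers and assumes a boolean `is_generic` column already exists; the column names are hypothetical.

```python
import pandas as pd

# Made-up rows: a cost per prescription line plus a precomputed generic flag.
df = pd.DataFrame({
    "act_cost": [400.0, 150.0, 300.0, 150.0],
    "is_generic": [True, False, False, True],
})

# Turn the flag into a readable label before grouping.
df["kind"] = df["is_generic"].map({True: "generic", False: "branded"})

total = df["act_cost"].sum()
# Fraction of total spend that falls under each label.
share = df.groupby("kind")["act_cost"].sum() / total
print(total, share.to_dict())
```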

The full month by month analysis is documented here: simonh10/nhs-notebooks, NHS Time Series Analysis (Jupyter Notebook)

To be clear, the average cost to the NHS of branded and generic drugs is very similar, and in a lot of cases drugs labelled as branded are cheaper than the equivalent generics.

But this average hides a great deal of detail.

Total prescription costs by month

If we look at the raw data, we can see how it is presented and what information we can extract from it.

The primary care trust and the individual prescribing practice are useful fields and will be used in future analysis of geographic distribution, but they are ignored for this analysis.

The principal information for this analysis is the BNF Code, actual cost and the quantity.

First few rows of the raw prescription data available.

BNF What?

The BNF, or British National Formulary, has been a foundational piece of medical information close at hand for doctors since the first edition in 1949.

It is published in book form twice a year and is available in multiple digital formats.

Part of this publication is a coding standard that describes every medicine delivered to patients through the National Health Service.

It follows a standard hierarchical structure, which can be useful for analysing the dispensing of drug groups. There is an excellent resource on this from the University of Oxford.

Prescribing Data: BNF Codes (ebmdatalab.net): OpenPrescribing takes open datasets from NHS Digital and NHS Business Services Authority, and makes it easy for people…

One of the features of the BNF coding is that every code tells you how to derive the BNF code of the equivalent generic drug.

The last two letters of the BNF code give a strength and formulation code for the generic equivalent.
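This rule can be sketched as a small function. The sketch assumes 15-character codes in which the first nine characters identify the chemical, characters 10 and 11 are the product code, and `AA` marks a generic; that layout matches the Tradorec example in this article (written without the parentheses that highlight its final letters), but shorter codes exist and are simply rejected here. The function names are hypothetical.

```python
def generic_equivalent(bnf_code: str) -> str:
    """Return the BNF code of the generic equivalent of a 15-character code."""
    if len(bnf_code) != 15:
        raise ValueError(f"expected a 15-character BNF code, got {bnf_code!r}")
    chemical = bnf_code[:9]  # chapter, section, paragraph, chemical substance
    suffix = bnf_code[13:]   # strength and formulation of the generic equivalent
    # 'AA' is the assumed product code for generics; the suffix appears twice.
    return chemical + "AA" + suffix + suffix


def is_generic(bnf_code: str) -> bool:
    """True when the product part of the code is the assumed generic marker."""
    return bnf_code[9:11] == "AA"


branded = "040702040BIACAM"
print(generic_equivalent(branded), is_generic(branded))
```

Applying `generic_equivalent` to a code that is already generic returns the same code, which makes it safe to run over every row.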


For example:

040702040BIAC(AM): Tradorec XL Tablets 300mg

040702040AA(AM)(AM): the equivalent generic version

The challenge we set ourselves

Can we save the NHS a significant amount of money by switching a handful of drugs to generics instead of the branded alternatives?

Our source information for calculating this is all contained in this single dataset.

Using this single dataset, we are limited to:

Considering only generics that have already been prescribed by at least one NHS prescribing practice.

Using the existing pricing for branded and generic drugs, as it has already been costed by the NHS.

How we did the analysis

We took a recent dataset selected at random, in this case October 2018, and processed the data in the following ways:

We generated the equivalent generic code for every item and flagged whether it was already a generic drug.

We created a sum of all the quantities and all the costs for each generic BNF code, one set of sums for the branded equivalents and another for the generic version.

We calculated the unit cost difference between the equivalent branded and generic drugs and multiplied it by the quantity of branded drug prescribed.

This created an excess cost calculation for every generic drug available where a branded drug was also being prescribed.
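The steps above can be sketched in Pandas roughly as follows. This is not the code from the notebooks: the column names, the sample rows (including the second, branded-only code) and the assumption that `AA` in positions 10 and 11 marks a generic are all illustrative.

```python
import pandas as pd

# Made-up sample rows; the column names are assumed for illustration.
df = pd.DataFrame({
    "bnf_code": ["040702040BIACAM", "040702040AAAMAM",
                 "040702040AAAMAM", "0212000B0BBABAB"],
    "act_cost": [30.0, 12.0, 8.0, 50.0],
    "quantity": [10.0, 12.0, 8.0, 25.0],
})

# Step 1: derive the generic code and flag rows that are already generic.
code = df["bnf_code"]
df["generic_code"] = code.str[:9] + "AA" + code.str[13:] + code.str[13:]
df["is_generic"] = code.str[9:11] == "AA"


# Step 2: sum cost and quantity per generic code, separately for each group.
def totals(frame, prefix):
    sums = frame.groupby("generic_code")[["act_cost", "quantity"]].sum()
    return sums.add_prefix(prefix)


branded = totals(df[~df["is_generic"]], "branded_")
generic = totals(df[df["is_generic"]], "generic_")
# The inner join keeps only codes prescribed in both branded and generic form.
both = branded.join(generic, how="inner")

# Step 3: unit cost difference multiplied by the branded quantity.
both["branded_unit"] = both["branded_act_cost"] / both["branded_quantity"]
both["generic_unit"] = both["generic_act_cost"] / both["generic_quantity"]
both["excess_cost"] = (
    (both["branded_unit"] - both["generic_unit"]) * both["branded_quantity"]
)
print(both[["branded_unit", "generic_unit", "excess_cost"]])
```

A negative `excess_cost` would mean the branded product is the cheaper one. How such rows are treated in the totals is a choice the article does not spell out, so the sketch leaves them as they are.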

The results

If all possible substitutions of generic for branded were made, it would save the NHS £11,347,859.11, just over £11 million a month.

This was extracting all possible savings for all possible products.

If we took just the 10 largest cost-saving drug substitutions and calculated a total for substituting only these branded items, we would save £5,469,671.30, or approximately £5.5 million a month.
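Picking out the largest substitutions is a short step once an excess cost per generic code exists. The sketch below uses a made-up Series of excess costs with hypothetical labels; the figures bear no relation to the real results.

```python
import pandas as pd

# Made-up excess costs in pounds, one per generic BNF code (labels are fake).
excess = pd.Series(
    {f"code_{i:02d}": float(i * 100) for i in range(1, 16)},
    name="excess_cost",
)

all_savings = excess.sum()    # every possible substitution
top10 = excess.nlargest(10)   # the ten largest individual savings
top10_savings = top10.sum()
print(all_savings, top10_savings)
```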


The cost savings we have outlined here are just a numerical analysis and have not taken into account any medical decisions that a prescribing or dispensing practice may have made.

Considering the amount the NHS spends on prescriptions every month, being able to find only a 1–2% saving makes me think that they have been doing a pretty good job of keeping drug costs down, and they should be highly commended for the work they have already done.
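The percentage can be checked directly from the figures quoted in this article. The monthly total is only approximate, so the results are approximate too: about 1.6% for all substitutions and about 0.8% for the ten largest.

```python
monthly_spend = 700_000_000        # approximate monthly total quoted above
all_substitutions = 11_347_859.11  # saving from every possible substitution
top_ten = 5_469_671.30             # saving from the ten largest substitutions

pct_all = 100 * all_substitutions / monthly_spend
pct_top = 100 * top_ten / monthly_spend
print(f"{pct_all:.1f}% of spend, {pct_top:.1f}% of spend")
```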

For the full analysis and commented code used for these calculations, the project can be found here: simonh10/nhs-notebooks, NHS Generics vs Branded cost saving analysis

Directions for further analysis

This is just scratching the surface of the prescribing dataset and there is plenty that can still be done.

Here are some of the future analyses I'm considering for this dataset:

Using the geographical information in the dataset to create a heat map of health condition prevalence, e.g. anti-depressants, statins, diabetes etc.

Identify the practices that consistently favour branded over generic drugs.

Using the US and SNOMED datasets to calculate how much more the US Medicare system is paying for equivalent drugs compared to the NHS.

Identify branded drugs where generics are available but not currently being used in the NHS.
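As one possible starting point for the practice-level idea in the list above, the sketch below ranks practices by the share of prescribed quantity that is branded. The `practice` column, the sample rows and the `AA` generic marker are assumptions. Judging whether a practice "consistently" favours branded drugs would also need several months of data, which this single-frame example does not cover.

```python
import pandas as pd

# Made-up rows for two hypothetical practices.
df = pd.DataFrame({
    "practice": ["P1", "P1", "P2", "P2", "P2"],
    "bnf_code": ["040702040BIACAM", "040702040AAAMAM",
                 "040702040BIACAM", "040702040AAAMAM", "040702040AAAMAM"],
    "quantity": [30.0, 10.0, 10.0, 40.0, 50.0],
})

# Assumed rule: product code 'AA' (characters 10-11) marks a generic.
df["is_branded"] = df["bnf_code"].str[9:11] != "AA"

# Branded quantity per practice divided by total quantity per practice.
branded_qty = df["quantity"].where(df["is_branded"], 0.0)
share = (
    branded_qty.groupby(df["practice"]).sum()
    / df.groupby("practice")["quantity"].sum()
)
ranked = share.sort_values(ascending=False)
print(ranked)
```

A fuller version would probably restrict the comparison to drugs where a generic is actually available, so that practices are not penalised for prescribing products with no generic alternative.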
