Introduction
Since it is common knowledge that it takes a very long time to discover and develop novel therpapeutic drugs. I’ve also often wondered about drugs for rare diseases and how they’re often for the minorities in the diseased populations. So here is the initial Python and R projects1 about rare disease drugs by using data extracted from FDA’s Orphan Drug Product designation database. I also have to acknowledge “Data Is Plural” website, which has inspired me to look into this dataset.
Unfortunately this particular dataset I’ve obtained does not contain information on incentives for pharmaceutical companies (from US’s Orphan Drug Act of 1983), which could mean that I might be able to draw some preliminary, basic or raw correlations between incentives and rare disease drug marketing approvals (this might lead to other controversial discussions about this area, which was not my initial aim for this project as I just wanted to explore a dataset on rare disease drugs at the moment).
The datatset was for the period from 1983 till present, for approved rare disease drugs only.
Python project
For Python project2, there was one question in mind to answer:
How long did it take on average for a rare disease drug to reach marketing approval?
Python version of the data analysis: link
Short summary of findings from this dataset using Python:
The orphan designation for rare disease drug that had the highest counts between 1983 till present was for the treatment of multiple myeloma
The highest counts of final approved indication for rare disease drugs spanned across several different clinical indications – it often ended up with more indication details than the initial orphan designation phase
The average time required for a rare disease drug to progress from the initial designation phase to the final approval for marketing was about 1932 days (~5 years)
The horizontal bar graph (access from link above) showed the top ten rare disease drugs with the longest time taken to reach the market Tiopronin was the one that took the longest time of 12,215 days (~33 years)
The data for Tiopronin appeared to be duplicates, but note that the two were formulated differently as one of them was the enteric-coated (EC) version (marketed as delayed-release tablets under the actual trade name of “Thiola EC”, but recorded in the dataset as “Thiola” only), while the other one was the immediate-release form (Thiola)
Image: Rawpixel.com
R project
For R project3, there were two questions in mind to answer:
What countries were involved in rare disease drug developments?
How would the time from designation to approval be displayed in timeline style for selected rare disease drugs?
R versions of the data analysis:
base R methods via Jupyter notebook with link
Tidyverse version via RStudio with link - done using RMarkdown
Short summary of findings from this dataset using R:
US was the country that had the most involvement in rare disease drug developments, which was followed by Ireland and the UK, and also a number of other countries
More work could possibly go into looking at the duplicates of brand names of the same generic drug e.g. cannabidiol with trade name as Epidiolex that had 5 repeated timelines (shown in link above), which appeared to be different clinical indications for each of these entries after further checks
The timelines have also implied that drug discovery and development is a very timely process, which could span many years, such as 10 – 20 years or more, before a drug actually reaches the market for public use
Footnotes
This work is under CC BY-SA 4.0 International License for anyone interested in exploring the topic further.↩︎
The published date of this project would be based on the last day I’ve worked on associated file, prior to the blog move↩︎
The R versions, base R and Tidyverse projects, were done after the Python one was completed↩︎