Shinylive app in Python

Data preparation for compounds in COVID-19 clinical trials

Categories: Python, Shiny, Polars, Plotly, PubChem, Cheminformatics

Author: Jennifer HY Lin

Published: May 8, 2023

Brief introduction

After having a lot of fun building a Shiny app in R last time, I decided to build another one, this time in Python. In this post, I’ll go through the data wrangling needed to prepare the final dataset for a Shinylive app in Python. Deploying and accessing the actual Shinylive app will be covered in a separate post after this one.


Source of data

The dataset used for this Shiny app in Python came from PubChem (link here). At the time I downloaded it as a .csv file, it contained 631 compounds along with their relevant compound data. I picked this dataset more or less at random, since the focus is on app building, but it was nice to see an interactive web app being built and used for a domain such as pharmaceutical research.


Import Polars

The Polars dataframe library was used again this time.

import polars as pl


Reading .csv file
pc = pl.read_csv("pubchem.csv")
pc.head()
shape: (5, 38)
cid cmpdname cmpdsynonym mw mf polararea complexity xlogp heavycnt hbonddonor hbondacc rotbonds inchi isosmiles canonicalsmiles inchikey iupacname exactmass monoisotopicmass charge covalentunitcnt isotopeatomcnt totalatomstereocnt definedatomstereocnt undefinedatomstereocnt totalbondstereocnt definedbondstereocnt undefinedbondstereocnt pclidcnt gpidcnt meshheadings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation
i64 str str f64 str f64 f64 str i64 i64 i64 i64 str str str str str f64 f64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 str str i64 str i64 str str str
5280453 "Calcitriol" "calcitriol|322... 416.6 "C27H44O3" 60.7 688.0 "5.100" 30 3 3 6 "InChI=1S/C27H4... "C[C@H](CCCC(C)... "CC(CCCC(C)(C)O... "GMRQFYUYWCNGIN... "(1R,3S,5Z)-5-[... 416.329 416.329 0 1 0 6 6 0 2 2 0 22311 46029 "Calcitriol" "Biological Tes... 12 "485|631|731|78... 20040916 "A2B Chem|AA BL... "Chemical Vendo... "COVID-19, COVI...
9962735 "Ubiquinol" "ubiquinol|992-... 865.4 "C59H92O4" 58.9 1600.0 "20.200" 63 2 4 31 "InChI=1S/C59H9... "CC1=C(C(=C(C(=... "CC1=C(C(=C(C(=... "QNTNKSLOFHEFPK... "2-[(2E,6E,10E,... 864.7 864.7 0 1 0 0 0 0 9 9 0 2732 21358 "NULL" "Chemical and P... 7 "NULL" 20061025 "001Chemical|A2... "Chemical Vendo... "COVID-19, COVI...
5961 "Glutamine" "L-glutamine|gl... 146.14 "C5H10N2O3" 106.0 146.0 "-3.100" 10 3 4 4 "InChI=1S/C5H10... "C(CC(=O)N)[C@@... "C(CC(=O)N)C(C(... "ZDXPYRJPNDTMRX... "(2S)-2,5-diami... 146.069 146.069 0 1 0 1 1 0 0 0 0 88218 399 "Glutamine" "Biological Tes... 12 "422|429|436|54... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
2244 "Aspirin" "aspirin|ACETYL... 180.16 "C9H8O4" 63.6 212.0 "1.200" 13 1 4 3 "InChI=1S/C9H8O... "CC(=O)OC1=CC=C... "CC(=O)OC1=CC=C... "BSYNRYMUTXBXSQ... "2-acetyloxyben... 180.042 180.042 0 1 0 0 0 0 0 0 0 127012 364455 "Aspirin" "Biological Tes... 12 "1|3|9|15|19|21... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
457 "1-Methylnicoti... "1-methylnicoti... 137.16 "C7H9N2O+" 47.0 136.0 "-0.100" 10 1 1 1 "InChI=1S/C7H8N... "C[N+]1=CC=CC(=... "C[N+]1=CC=CC(=... "LDHMAVIPBRSVRG... "1-methylpyridi... 137.071 137.071 1 1 0 0 0 0 0 0 0 310 674 "NULL" "Biological Tes... 8 "61001|61002|14... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...


Quick look at the data

I’ve commented out the code below to keep the post at a reasonable length, but these commands were very handy for a quick glimpse of the data content.

# Quick overview of the variables in each column in the dataset
# Uncomment line below if needed to run
#print(pc.glimpse())

# Quick look at all column names
# Uncomment line below if needed to run
#pc.columns


Check for nulls in dataset
pc.null_count()
shape: (1, 38)
cid cmpdname cmpdsynonym mw mf polararea complexity xlogp heavycnt hbonddonor hbondacc rotbonds inchi isosmiles canonicalsmiles inchikey iupacname exactmass monoisotopicmass charge covalentunitcnt isotopeatomcnt totalatomstereocnt definedatomstereocnt undefinedatomstereocnt totalbondstereocnt definedbondstereocnt undefinedbondstereocnt pclidcnt gpidcnt meshheadings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation
u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32 u32
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


Change column names as needed
# Change column names
pc_cov = pc.rename(
    {
        "cmpdname": "Compound name",
        "cmpdsynonym": "Synonyms",
        "mw": "Molecular weight",
        "mf": "Molecular formula",
        "polararea": "Polar surface area",
        "complexity": "Complexity",
        "xlogp": "Partition coefficients",
        "heavycnt": "Heavy atom count",
        "hbonddonor": "Hydrogen bond donor count",
        "hbondacc": "Hydrogen bond acceptor count",
        "rotbonds": "Rotatable bond count",
        "exactmass": "Exact mass",
        "monoisotopicmass": "Monoisotopic mass",
        "charge": "Formal charge",
        "covalentunitcnt": "Covalently-bonded unit count",
        "isotopeatomcnt": "Isotope atom count",
        "totalatomstereocnt": "Total atom stereocenter count",
        "definedatomstereocnt": "Defined atom stereocenter count",
        "undefinedatomstereocnt": "Undefined atoms stereocenter count",
        "totalbondstereocnt": "Total bond stereocenter count",
        "definedbondstereocnt": "Defined bond stereocenter count",
        "undefinedbondstereocnt": "Undefined bond stereocenter count",
        "meshheadings": "MeSH headings"
    }
)

pc_cov.head()
shape: (5, 38)
cid Compound name Synonyms Molecular weight Molecular formula Polar surface area Complexity Partition coefficients Heavy atom count Hydrogen bond donor count Hydrogen bond acceptor count Rotatable bond count inchi isosmiles canonicalsmiles inchikey iupacname Exact mass Monoisotopic mass Formal charge Covalently-bonded unit count Isotope atom count Total atom stereocenter count Defined atom stereocenter count Undefined atoms stereocenter count Total bond stereocenter count Defined bond stereocenter count Undefined bond stereocenter count pclidcnt gpidcnt MeSH headings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation
i64 str str f64 str f64 f64 str i64 i64 i64 i64 str str str str str f64 f64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 str str i64 str i64 str str str
5280453 "Calcitriol" "calcitriol|322... 416.6 "C27H44O3" 60.7 688.0 "5.100" 30 3 3 6 "InChI=1S/C27H4... "C[C@H](CCCC(C)... "CC(CCCC(C)(C)O... "GMRQFYUYWCNGIN... "(1R,3S,5Z)-5-[... 416.329 416.329 0 1 0 6 6 0 2 2 0 22311 46029 "Calcitriol" "Biological Tes... 12 "485|631|731|78... 20040916 "A2B Chem|AA BL... "Chemical Vendo... "COVID-19, COVI...
9962735 "Ubiquinol" "ubiquinol|992-... 865.4 "C59H92O4" 58.9 1600.0 "20.200" 63 2 4 31 "InChI=1S/C59H9... "CC1=C(C(=C(C(=... "CC1=C(C(=C(C(=... "QNTNKSLOFHEFPK... "2-[(2E,6E,10E,... 864.7 864.7 0 1 0 0 0 0 9 9 0 2732 21358 "NULL" "Chemical and P... 7 "NULL" 20061025 "001Chemical|A2... "Chemical Vendo... "COVID-19, COVI...
5961 "Glutamine" "L-glutamine|gl... 146.14 "C5H10N2O3" 106.0 146.0 "-3.100" 10 3 4 4 "InChI=1S/C5H10... "C(CC(=O)N)[C@@... "C(CC(=O)N)C(C(... "ZDXPYRJPNDTMRX... "(2S)-2,5-diami... 146.069 146.069 0 1 0 1 1 0 0 0 0 88218 399 "Glutamine" "Biological Tes... 12 "422|429|436|54... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
2244 "Aspirin" "aspirin|ACETYL... 180.16 "C9H8O4" 63.6 212.0 "1.200" 13 1 4 3 "InChI=1S/C9H8O... "CC(=O)OC1=CC=C... "CC(=O)OC1=CC=C... "BSYNRYMUTXBXSQ... "2-acetyloxyben... 180.042 180.042 0 1 0 0 0 0 0 0 0 127012 364455 "Aspirin" "Biological Tes... 12 "1|3|9|15|19|21... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
457 "1-Methylnicoti... "1-methylnicoti... 137.16 "C7H9N2O+" 47.0 136.0 "-0.100" 10 1 1 1 "InChI=1S/C7H8N... "C[N+]1=CC=CC(=... "C[N+]1=CC=CC(=... "LDHMAVIPBRSVRG... "1-methylpyridi... 137.071 137.071 1 1 0 0 0 0 0 0 0 310 674 "NULL" "Biological Tes... 8 "61001|61002|14... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...


Definitions of molecular properties in this PubChem dataset

Definitions for some of the column names are given below, mainly derived and adapted from PubChem.

Note: please refer to the PubChem documentation for the full definitions

  • Molecular weight - molecular mass of compounds measured in daltons

  • Polar surface area (topological) - an estimate of the polar surface area of a molecule (i.e. the surface sum over polar atoms in the molecule), in units of angstrom squared (Å²)

  • Complexity - complexity rating for compounds, based on the Bertz/Hendrickson/Ihlenfeldt formula, as a rough estimate of how complex a compound is structurally

  • Partition coefficients (xlogp) - predicted octanol-water partition coefficient as a measure of the hydrophilicity or hydrophobicity of a molecule

  • Heavy atom count - number of heavy atoms, i.e. non-hydrogen atoms, in the compound

  • Hydrogen bond donor count - number of hydrogen bond donors in the compound

  • Hydrogen bond acceptor count - number of hydrogen bond acceptors in the compound

  • Rotatable bond count - number of rotatable bonds, where a rotatable bond is any single-order non-ring bond whose atoms on either side are in turn bound to non-terminal heavy (i.e. non-hydrogen) atoms. Rotation around the bond axis changes the overall shape of the molecule and generates conformers that can be distinguished by standard spectroscopic methods

  • Exact mass - exact mass of an isotopic species, obtained by summing masses of individual isotopes of the molecule

  • Monoisotopic mass - sum of the masses of atoms in a molecule, using unbound, ground-state, rest mass of principal (or most abundant) isotope for each element instead of isotopic average mass

  • Formal charge - the difference between the number of valence electrons of each atom and the number of electrons the atom is associated with, assuming any shared electrons are shared equally between the two bonded atoms

  • Covalently-bonded unit count - the number of covalently-bonded units in the compound, where a unit is a group of atoms connected by covalent bonds, ignoring other bond types (or a single atom without covalent bonds)

  • Isotope atom count - number of isotopes that are not the most abundant for the corresponding chemical elements. Isotopes are variants of a chemical element that differ in neutron number

  • Defined atom stereocenter count - an atom stereocenter (or chiral center) is an atom attached to 4 different types of atoms or groups of atoms in a tetrahedral arrangement. It can have either the (R)- or the (S)- configuration. Some compounds, e.g. racemic mixtures, can have undefined atom stereocenters, where the (R/S)-configuration is not specifically defined. The defined atom stereocenter count is the number of atom stereocenters whose configurations are specifically defined

  • Undefined atoms stereocenter count - the number of atom stereocenters whose configurations are not specifically defined

  • Defined bond stereocenter count - a bond stereocenter (or non-rotatable bond) is a bond around which two atoms can have different arrangements, e.g. the cis- and trans- forms of butene around its double bond. Some compounds can have undefined bond stereocenters (stereochemistry not specifically defined). The defined bond stereocenter count is the number of bond stereocenters whose configurations are specifically defined

  • Undefined bond stereocenter count - the number of bond stereocenters whose configurations are not specifically defined
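To make the difference between molecular weight and monoisotopic mass in the definitions above more concrete, here is a small worked sketch for aspirin (C9H8O4, one of the compounds in the preview rows). The atomic masses are rounded reference values entered by hand as an assumption; they are not part of the PubChem download.

```python
# Aspirin, C9H8O4 (molecular formula taken from the dataset preview)
formula = {"C": 9, "H": 8, "O": 4}

# Standard (isotope-averaged) atomic weights, rounded
average_mass = {"C": 12.011, "H": 1.008, "O": 15.999}

# Masses of the most abundant isotopes: 12C, 1H, 16O (rounded)
principal_isotope_mass = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def total_mass(formula, masses):
    """Sum atom count times atomic mass over all elements in the formula."""
    return sum(count * masses[element] for element, count in formula.items())


molecular_weight = total_mass(formula, average_mass)
monoisotopic_mass = total_mass(formula, principal_isotope_mass)

print(round(molecular_weight, 2))   # 180.16
print(round(monoisotopic_mass, 3))  # 180.042
```

Both rounded values agree with the aspirin row shown earlier (molecular weight 180.16, monoisotopic mass 180.042).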


Convert data type for selected columns
# Convert data type - only for the partition coefficients column (rest were okay)
# strict = False turns values that can't be parsed as numbers into nulls
pc_cov = pc_cov.with_columns(pl.col("Partition coefficients").cast(pl.Float64, strict = False))
pc_cov.head()
shape: (5, 38)
cid Compound name Synonyms Molecular weight Molecular formula Polar surface area Complexity Partition coefficients Heavy atom count Hydrogen bond donor count Hydrogen bond acceptor count Rotatable bond count inchi isosmiles canonicalsmiles inchikey iupacname Exact mass Monoisotopic mass Formal charge Covalently-bonded unit count Isotope atom count Total atom stereocenter count Defined atom stereocenter count Undefined atoms stereocenter count Total bond stereocenter count Defined bond stereocenter count Undefined bond stereocenter count pclidcnt gpidcnt MeSH headings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation
i64 str str f64 str f64 f64 f64 i64 i64 i64 i64 str str str str str f64 f64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 str str i64 str i64 str str str
5280453 "Calcitriol" "calcitriol|322... 416.6 "C27H44O3" 60.7 688.0 5.1 30 3 3 6 "InChI=1S/C27H4... "C[C@H](CCCC(C)... "CC(CCCC(C)(C)O... "GMRQFYUYWCNGIN... "(1R,3S,5Z)-5-[... 416.329 416.329 0 1 0 6 6 0 2 2 0 22311 46029 "Calcitriol" "Biological Tes... 12 "485|631|731|78... 20040916 "A2B Chem|AA BL... "Chemical Vendo... "COVID-19, COVI...
9962735 "Ubiquinol" "ubiquinol|992-... 865.4 "C59H92O4" 58.9 1600.0 20.2 63 2 4 31 "InChI=1S/C59H9... "CC1=C(C(=C(C(=... "CC1=C(C(=C(C(=... "QNTNKSLOFHEFPK... "2-[(2E,6E,10E,... 864.7 864.7 0 1 0 0 0 0 9 9 0 2732 21358 "NULL" "Chemical and P... 7 "NULL" 20061025 "001Chemical|A2... "Chemical Vendo... "COVID-19, COVI...
5961 "Glutamine" "L-glutamine|gl... 146.14 "C5H10N2O3" 106.0 146.0 -3.1 10 3 4 4 "InChI=1S/C5H10... "C(CC(=O)N)[C@@... "C(CC(=O)N)C(C(... "ZDXPYRJPNDTMRX... "(2S)-2,5-diami... 146.069 146.069 0 1 0 1 1 0 0 0 0 88218 399 "Glutamine" "Biological Tes... 12 "422|429|436|54... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
2244 "Aspirin" "aspirin|ACETYL... 180.16 "C9H8O4" 63.6 212.0 1.2 13 1 4 3 "InChI=1S/C9H8O... "CC(=O)OC1=CC=C... "CC(=O)OC1=CC=C... "BSYNRYMUTXBXSQ... "2-acetyloxyben... 180.042 180.042 0 1 0 0 0 0 0 0 0 127012 364455 "Aspirin" "Biological Tes... 12 "1|3|9|15|19|21... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...
457 "1-Methylnicoti... "1-methylnicoti... 137.16 "C7H9N2O+" 47.0 136.0 -0.1 10 1 1 1 "InChI=1S/C7H8N... "C[N+]1=CC=CC(=... "C[N+]1=CC=CC(=... "LDHMAVIPBRSVRG... "1-methylpyridi... 137.071 137.071 1 1 0 0 0 0 0 0 0 310 674 "NULL" "Biological Tes... 8 "61001|61002|14... 20040916 "001Chemical|3B... "Chemical Vendo... "COVID-19, COVI...


Select columns for data visualisations

The idea was to keep only the compound names and the numerical property columns for data visualisations later. So I’ve dropped the other text (string-type) columns, along with a few identifier, count and date columns that weren’t needed.

# Drop unused columns in preparation for data visualisations
pc_cov = pc_cov.drop([
    "cid", 
    "Synonyms",
    "Molecular formula",
    "inchi",
    "isosmiles",
    "canonicalsmiles",
    "inchikey",
    "iupacname",
    "pclidcnt",
    "gpidcnt",
    "MeSH headings",
    "annothits",
    "annothitcnt",
    "aids",
    "cidcdate",
    "sidsrcname",
    "depcatg",
    "annotation"
])

pc_cov.head()
shape: (5, 20)
Compound name Molecular weight Polar surface area Complexity Partition coefficients Heavy atom count Hydrogen bond donor count Hydrogen bond acceptor count Rotatable bond count Exact mass Monoisotopic mass Formal charge Covalently-bonded unit count Isotope atom count Total atom stereocenter count Defined atom stereocenter count Undefined atoms stereocenter count Total bond stereocenter count Defined bond stereocenter count Undefined bond stereocenter count
str f64 f64 f64 f64 i64 i64 i64 i64 f64 f64 i64 i64 i64 i64 i64 i64 i64 i64 i64
"Calcitriol" 416.6 60.7 688.0 5.1 30 3 3 6 416.329 416.329 0 1 0 6 6 0 2 2 0
"Ubiquinol" 865.4 58.9 1600.0 20.2 63 2 4 31 864.7 864.7 0 1 0 0 0 0 9 9 0
"Glutamine" 146.14 106.0 146.0 -3.1 10 3 4 4 146.069 146.069 0 1 0 1 1 0 0 0 0
"Aspirin" 180.16 63.6 212.0 1.2 13 1 4 3 180.042 180.042 0 1 0 0 0 0 0 0 0
"1-Methylnicoti... 137.16 47.0 136.0 -0.1 10 1 1 1 137.071 137.071 1 1 0 0 0 0 0 0 0


Quick summary statistics of columns
# Overall descriptive statistics of kept columns
pc_cov.describe()
shape: (7, 21)
describe Compound name Molecular weight Polar surface area Complexity Partition coefficients Heavy atom count Hydrogen bond donor count Hydrogen bond acceptor count Rotatable bond count Exact mass Monoisotopic mass Formal charge Covalently-bonded unit count Isotope atom count Total atom stereocenter count Defined atom stereocenter count Undefined atoms stereocenter count Total bond stereocenter count Defined bond stereocenter count Undefined bond stereocenter count
str str f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64
"count" "631" 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0 631.0
"null_count" "0" 0.0 0.0 0.0 173.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
"mean" null 549.539675 163.915368 864.755626 2.25917 37.770206 4.066561 9.210777 9.518225 549.095022 549.06013 -0.004754 1.578447 0.006339 4.017433 3.551506 0.465927 0.381933 0.343899 0.038035
"std" null 455.236826 192.256415 1000.220379 3.926459 31.821967 6.348004 8.694184 15.393131 455.064211 454.958033 0.358537 1.610416 0.079429 6.128363 5.787792 2.364089 1.181171 1.107245 0.363159
"min" "(+)-Mefloquine... 103.1 0.0 0.0 -24.0 1.0 0.0 0.0 0.0 103.04 103.04 -6.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
"max" "sodium;8-amino... 4114.0 1650.0 9590.0 20.2 291.0 57.0 65.0 151.0 4112.12 4111.12 2.0 21.0 1.0 39.0 39.0 31.0 11.0 11.0 7.0
"median" null 435.9 110.0 635.0 2.5 30.0 3.0 7.0 6.0 435.227 435.227 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0


Conditional assignments in Polars

The longer I’ve used Polars, the more I like its style of chaining several expressions together to manipulate dataframes in one go. This often means we can avoid writing repetitive loops to achieve the same results. In the example below, I’d like to show how to chain “when-then-otherwise” expressions in Polars.


Chaining when-then-otherwise expressions - creating groups in data

I wanted to separate the data into three ranges of partition coefficients so that the groups could be shown visually in plots. One possible way (other than writing a loop), although really the long way, might be the code shown below:

```{python}
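# Each filter creates a separate dataframe for one range
# (rows with null partition coefficients are dropped by every filter)
part_coef_1 = pc_cov.filter(pl.col("Partition coefficients") <= -10)
part_coef_2 = pc_cov.filter((pl.col("Partition coefficients") >= -11) & (pl.col("Partition coefficients") <= 5))
part_coef_3 = pc_cov.filter(pl.col("Partition coefficients") >= 6)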
```

A shorter and probably more elegant way is to use the “when-then-otherwise” expression in Polars for conditional assignments (the following code snippet was adapted with thanks to the author of Polars, Ritchie Vink, and also the good old Stack Overflow):

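pc_cov = pc_cov.with_columns(
    # Conditions are checked in order; string labels are wrapped in pl.lit()
    # so that they are treated as literal values rather than column names
    pl.when(pl.col("Partition coefficients") <= -10)
    .then(pl.lit("Smaller than -10"))
    .when((pl.col("Partition coefficients") >= -11) & (pl.col("Partition coefficients") <= 5))
    .then(pl.lit("Between -11 and 5"))
    # Everything else, including values between 5 and 6 and nulls, lands here
    .otherwise(pl.lit("Larger than 6"))
    .alias("Part_coef_group")
)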

pc_cov.head(10)

# a new column would be added to the end of the dataframe 
# with a new column name, "Part_coef_group" 
# (scroll to the very right to see the added column)
shape: (10, 21)
Compound name Molecular weight Polar surface area Complexity Partition coefficients Heavy atom count Hydrogen bond donor count Hydrogen bond acceptor count Rotatable bond count Exact mass Monoisotopic mass Formal charge Covalently-bonded unit count Isotope atom count Total atom stereocenter count Defined atom stereocenter count Undefined atoms stereocenter count Total bond stereocenter count Defined bond stereocenter count Undefined bond stereocenter count Part_coef_group
str f64 f64 f64 f64 i64 i64 i64 i64 f64 f64 i64 i64 i64 i64 i64 i64 i64 i64 i64 str
"Calcitriol" 416.6 60.7 688.0 5.1 30 3 3 6 416.329 416.329 0 1 0 6 6 0 2 2 0 "Larger than 6"
"Ubiquinol" 865.4 58.9 1600.0 20.2 63 2 4 31 864.7 864.7 0 1 0 0 0 0 9 9 0 "Larger than 6"
"Glutamine" 146.14 106.0 146.0 -3.1 10 3 4 4 146.069 146.069 0 1 0 1 1 0 0 0 0 "Between -11 an...
"Aspirin" 180.16 63.6 212.0 1.2 13 1 4 3 180.042 180.042 0 1 0 0 0 0 0 0 0 "Between -11 an...
"1-Methylnicoti... 137.16 47.0 136.0 -0.1 10 1 1 1 137.071 137.071 1 1 0 0 0 0 0 0 0 "Between -11 an...
"Losartan" 422.9 92.5 520.0 4.3 30 2 5 8 422.162 422.162 0 1 0 0 0 0 0 0 0 "Between -11 an...
"Vitamin E" 430.7 29.5 503.0 10.7 31 1 2 12 430.381 430.381 0 1 0 3 3 0 0 0 0 "Larger than 6"
"Nicotinamide" 122.12 56.0 114.0 -0.4 9 1 2 1 122.048 122.048 0 1 0 0 0 0 0 0 0 "Between -11 an...
"Adenosine" 267.24 140.0 335.0 -1.1 19 4 8 2 267.097 267.097 0 1 0 4 4 0 0 0 0 "Between -11 an...
"Inosine" 268.23 129.0 405.0 -1.3 19 4 7 2 268.081 268.081 0 1 0 4 4 0 0 0 0 "Between -11 an...


Import Plotly

Time for some data vizzes - importing Plotly first.

import plotly.express as px


Some examples of data visualisations

Below are some examples of building plots with Plotly.

Partition coefficients vs. Molecular weights
fig = px.scatter(x = pc_cov["Partition coefficients"], 
                 y = pc_cov["Molecular weight"], 
                 hover_name = pc_cov["Compound name"],
                 color = pc_cov["Part_coef_group"],
                 width = 800, 
                 height = 400,
                 title = "Partition coefficients vs. molecular weights for compounds used in COVID-19 clinical trials")

fig.update_layout(
    title = dict(
        font = dict(
            size = 15)),
    title_x = 0.5,
    margin = dict(
        l = 20, r = 20, t = 40, b = 3),
    xaxis = dict(
        tickfont = dict(size = 9), 
        title = "Partition coefficients"
    ),
    yaxis = dict(
        tickfont = dict(size = 9), 
        title = "Molecular weights"
    ),
    legend = dict(
        font = dict(
            size = 9)))

fig.show()


Molecular weights vs. Complexity
fig = px.scatter(x = pc_cov["Molecular weight"], 
                 y = pc_cov["Complexity"], 
                 hover_name = pc_cov["Compound name"],
                 #color = pc_cov["Part_coef_group"],
                 width = 800, 
                 height = 400,
                 title = "Molecular weights vs. complexity for compounds used in COVID-19 clinical trials")

fig.update_layout(
    title = dict(
        font = dict(
            size = 15)),
    title_x = 0.5,
    margin = dict(
        l = 20, r = 20, t = 40, b = 3),
    xaxis = dict(
        tickfont = dict(size = 9), 
        title = "Molecular weights"
    ),
    yaxis = dict(
        tickfont = dict(size = 9), 
        title = "Complexity"
    ),
    legend = dict(
        font = dict(
            size = 9)))

fig.show()


Export prepared dataset

Two possible options for exporting the dataset for use in a Shiny app are:

  1. Convert the Polars dataframe into a Pandas dataframe so that it can be imported into the app (Polars was not directly supported in Shiny for Python at the time of writing, but its to_pandas() function converts a Polars dataframe into a Pandas dataframe).

  2. Save the Polars dataframe as a .csv file, then read this file in the app.py script with Pandas (the method I used for this particular app).

```{python}
# --If preferring to use Pandas--
# Convert the Polars df into a Pandas df if needed
df_pd = df_name.to_pandas()

# Write the Pandas df to a csv file using Pandas
# (index = False leaves out the Pandas row index as an extra column)
df_pd.to_csv("csv_file_name.csv", sep = ",", index = False)

# --If preferring to use Polars--
# Simply write the Polars dataframe into a .csv file
df_name.write_csv("csv_file_name.csv", separator = ",")
```