Shinylive app in Python

Brief introduction

Since I had a lot of fun building a Shiny app in R last time, I decided to build another Shiny app, this time in Python. In this post, I’ll talk about the data wrangling process used to prepare the final dataset for a Shinylive app in Python. Deploying and accessing the actual Shinylive app will be shown in a separate post after this one.

Source of data

The dataset used for this Shiny app in Python came from PubChem. There were 631 compounds in total when I downloaded them as a .csv file, along with their relevant compound data. I picked this dataset more or less at random, as the focus was on app building, but it was nice to see an interactive web app being built and used for a domain such as pharmaceutical research.

Import Polars

The Polars dataframe library was used again this time.

import polars as pl

Reading .csv file

pc = pl.read_csv("pubchem.csv")
pc.head()

shape: (5, 38)

cid	cmpdname	cmpdsynonym	mw	mf	polararea	complexity	xlogp	heavycnt	hbonddonor	hbondacc	rotbonds	inchi	isosmiles	canonicalsmiles	inchikey	iupacname	exactmass	monoisotopicmass	charge	covalentunitcnt	isotopeatomcnt	totalatomstereocnt	definedatomstereocnt	undefinedatomstereocnt	totalbondstereocnt	definedbondstereocnt	undefinedbondstereocnt	pclidcnt	gpidcnt	meshheadings	annothits	annothitcnt	aids	cidcdate	sidsrcname	depcatg	annotation
i64	str	str	f64	str	f64	f64	str	i64	i64	i64	i64	str	str	str	str	str	f64	f64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	str	str	i64	str	i64	str	str	str
5280453	"Calcitriol"	"calcitriol\|322...	416.6	"C27H44O3"	60.7	688.0	"5.100"	30	3	3	6	"InChI=1S/C27H4...	"C[C@H](CCCC(C)...	"CC(CCCC(C)(C)O...	"GMRQFYUYWCNGIN...	"(1R,3S,5Z)-5-[...	416.329	416.329	0	1	0	6	6	0	2	2	0	22311	46029	"Calcitriol"	"Biological Tes...	12	"485\|631\|731\|78...	20040916	"A2B Chem\|AA BL...	"Chemical Vendo...	"COVID-19, COVI...
9962735	"Ubiquinol"	"ubiquinol\|992-...	865.4	"C59H92O4"	58.9	1600.0	"20.200"	63	2	4	31	"InChI=1S/C59H9...	"CC1=C(C(=C(C(=...	"CC1=C(C(=C(C(=...	"QNTNKSLOFHEFPK...	"2-[(2E,6E,10E,...	864.7	864.7	0	1	0	0	0	0	9	9	0	2732	21358	"NULL"	"Chemical and P...	7	"NULL"	20061025	"001Chemical\|A2...	"Chemical Vendo...	"COVID-19, COVI...
5961	"Glutamine"	"L-glutamine\|gl...	146.14	"C5H10N2O3"	106.0	146.0	"-3.100"	10	3	4	4	"InChI=1S/C5H10...	"C(CC(=O)N)[C@@...	"C(CC(=O)N)C(C(...	"ZDXPYRJPNDTMRX...	"(2S)-2,5-diami...	146.069	146.069	0	1	0	1	1	0	0	0	0	88218	399	"Glutamine"	"Biological Tes...	12	"422\|429\|436\|54...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
2244	"Aspirin"	"aspirin\|ACETYL...	180.16	"C9H8O4"	63.6	212.0	"1.200"	13	1	4	3	"InChI=1S/C9H8O...	"CC(=O)OC1=CC=C...	"CC(=O)OC1=CC=C...	"BSYNRYMUTXBXSQ...	"2-acetyloxyben...	180.042	180.042	0	1	0	0	0	0	0	0	0	127012	364455	"Aspirin"	"Biological Tes...	12	"1\|3\|9\|15\|19\|21...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
457	"1-Methylnicoti...	"1-methylnicoti...	137.16	"C7H9N2O+"	47.0	136.0	"-0.100"	10	1	1	1	"InChI=1S/C7H8N...	"C[N+]1=CC=CC(=...	"C[N+]1=CC=CC(=...	"LDHMAVIPBRSVRG...	"1-methylpyridi...	137.071	137.071	1	1	0	0	0	0	0	0	0	310	674	"NULL"	"Biological Tes...	8	"61001\|61002\|14...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...

Quick look at the data

I decided to comment out the code below to keep the post at a reasonable length, but these commands were very handy for a quick glimpse of the data.

# Quick overview of the variables in each column in the dataset
# Uncomment line below if needed to run
#print(pc.glimpse())

# Quick look at all column names
# Uncomment line below if needed to run
#pc.columns

Check for nulls in dataset

pc.null_count()

shape: (1, 38)

cid	cmpdname	cmpdsynonym	mw	mf	polararea	complexity	xlogp	heavycnt	hbonddonor	hbondacc	rotbonds	inchi	isosmiles	canonicalsmiles	inchikey	iupacname	exactmass	monoisotopicmass	charge	covalentunitcnt	isotopeatomcnt	totalatomstereocnt	definedatomstereocnt	undefinedatomstereocnt	totalbondstereocnt	definedbondstereocnt	undefinedbondstereocnt	pclidcnt	gpidcnt	meshheadings	annothits	annothitcnt	aids	cidcdate	sidsrcname	depcatg	annotation
u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32	u32
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Change column names as needed

# Change column names
pc_cov = pc.rename(
    {
        "cmpdname": "Compound name",
        "cmpdsynonym": "Synonyms",
        "mw": "Molecular weight",
        "mf": "Molecular formula",
        "polararea": "Polar surface area",
        "complexity": "Complexity",
        "xlogp": "Partition coefficients",
        "heavycnt": "Heavy atom count",
        "hbonddonor": "Hydrogen bond donor count",
        "hbondacc": "Hydrogen bond acceptor count",
        "rotbonds": "Rotatable bond count",
        "exactmass": "Exact mass",
        "monoisotopicmass": "Monoisotopic mass",
        "charge": "Formal charge",
        "covalentunitcnt": "Covalently-bonded unit count",
        "isotopeatomcnt": "Isotope atom count",
        "totalatomstereocnt": "Total atom stereocenter count",
        "definedatomstereocnt": "Defined atom stereocenter count",
        "undefinedatomstereocnt": "Undefined atoms stereocenter count",
        "totalbondstereocnt": "Total bond stereocenter count",
        "definedbondstereocnt": "Defined bond stereocenter count",
        "undefinedbondstereocnt": "Undefined bond stereocenter count",
        "meshheadings": "MeSH headings"
    }
)

pc_cov.head()

shape: (5, 38)

cid	Compound name	Synonyms	Molecular weight	Molecular formula	Polar surface area	Complexity	Partition coefficients	Heavy atom count	Hydrogen bond donor count	Hydrogen bond acceptor count	Rotatable bond count	inchi	isosmiles	canonicalsmiles	inchikey	iupacname	Exact mass	Monoisotopic mass	Formal charge	Covalently-bonded unit count	Isotope atom count	Total atom stereocenter count	Defined atom stereocenter count	Undefined atoms stereocenter count	Total bond stereocenter count	Defined bond stereocenter count	Undefined bond stereocenter count	pclidcnt	gpidcnt	MeSH headings	annothits	annothitcnt	aids	cidcdate	sidsrcname	depcatg	annotation
i64	str	str	f64	str	f64	f64	str	i64	i64	i64	i64	str	str	str	str	str	f64	f64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	str	str	i64	str	i64	str	str	str
5280453	"Calcitriol"	"calcitriol\|322...	416.6	"C27H44O3"	60.7	688.0	"5.100"	30	3	3	6	"InChI=1S/C27H4...	"C[C@H](CCCC(C)...	"CC(CCCC(C)(C)O...	"GMRQFYUYWCNGIN...	"(1R,3S,5Z)-5-[...	416.329	416.329	0	1	0	6	6	0	2	2	0	22311	46029	"Calcitriol"	"Biological Tes...	12	"485\|631\|731\|78...	20040916	"A2B Chem\|AA BL...	"Chemical Vendo...	"COVID-19, COVI...
9962735	"Ubiquinol"	"ubiquinol\|992-...	865.4	"C59H92O4"	58.9	1600.0	"20.200"	63	2	4	31	"InChI=1S/C59H9...	"CC1=C(C(=C(C(=...	"CC1=C(C(=C(C(=...	"QNTNKSLOFHEFPK...	"2-[(2E,6E,10E,...	864.7	864.7	0	1	0	0	0	0	9	9	0	2732	21358	"NULL"	"Chemical and P...	7	"NULL"	20061025	"001Chemical\|A2...	"Chemical Vendo...	"COVID-19, COVI...
5961	"Glutamine"	"L-glutamine\|gl...	146.14	"C5H10N2O3"	106.0	146.0	"-3.100"	10	3	4	4	"InChI=1S/C5H10...	"C(CC(=O)N)[C@@...	"C(CC(=O)N)C(C(...	"ZDXPYRJPNDTMRX...	"(2S)-2,5-diami...	146.069	146.069	0	1	0	1	1	0	0	0	0	88218	399	"Glutamine"	"Biological Tes...	12	"422\|429\|436\|54...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
2244	"Aspirin"	"aspirin\|ACETYL...	180.16	"C9H8O4"	63.6	212.0	"1.200"	13	1	4	3	"InChI=1S/C9H8O...	"CC(=O)OC1=CC=C...	"CC(=O)OC1=CC=C...	"BSYNRYMUTXBXSQ...	"2-acetyloxyben...	180.042	180.042	0	1	0	0	0	0	0	0	0	127012	364455	"Aspirin"	"Biological Tes...	12	"1\|3\|9\|15\|19\|21...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
457	"1-Methylnicoti...	"1-methylnicoti...	137.16	"C7H9N2O+"	47.0	136.0	"-0.100"	10	1	1	1	"InChI=1S/C7H8N...	"C[N+]1=CC=CC(=...	"C[N+]1=CC=CC(=...	"LDHMAVIPBRSVRG...	"1-methylpyridi...	137.071	137.071	1	1	0	0	0	0	0	0	0	310	674	"NULL"	"Biological Tes...	8	"61001\|61002\|14...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...

Definitions of molecular properties in this PubChem dataset

The definitions of some of the column names are shown below. They were mainly derived and adapted from PubChem:

Note: please refer to the PubChem documentation for full definitions

Molecular weight - molecular mass of compounds measured in daltons
Topological polar surface area - measured as an estimate of polar surface area of a molecule (i.e. the surface sum over polar atoms in a molecule), with units in angstrom squared (Å²)
Complexity - complexity rating for compounds, based on Bertz/Hendrickson/Ihlenfeldt formula as a rough estimation of how complex a compound was structurally
Partition coefficients (xlogp) - predicted octanol-water partition coefficient as a measure of the hydrophilicity or hydrophobicity of a molecule
Heavy atom count - number of heavy atoms (i.e. non-hydrogen atoms) in the compound
Hydrogen bond donor count - number of hydrogen bond donors in the compound
Hydrogen bond acceptor count - number of hydrogen bond acceptors in the compound
Rotatable bond count - number of rotatable bonds, where a rotatable bond was defined as any single-order non-ring bond in which the atoms on either side of the bond were in turn bound to non-terminal heavy (i.e. non-hydrogen) atoms. Rotation around the bond axis would change the overall molecule shape and generate conformers which could be distinguished by standard spectroscopic methods
Exact mass - exact mass of an isotopic species, obtained by summing masses of individual isotopes of the molecule
Monoisotopic mass - sum of the masses of atoms in a molecule, using unbound, ground-state, rest mass of principal (or most abundant) isotope for each element instead of isotopic average mass
Formal charge - the difference between the number of valence electrons of each atom and the number of electrons the atom was associated with, assuming any shared electrons were equally shared between the two bonded atoms
Covalently-bonded unit count - number of covalently-bonded units in the compound, where a unit was a group of atoms connected by covalent bonds, ignoring other bond types (or a single atom without covalent bonds)
Isotope atom count - number of atoms that were not the most abundant isotope of the corresponding chemical element. Isotopes were variants of a chemical element that differed in neutron number
Defined atom stereocenter count - atom stereocenter (or chiral center) was where an atom was attached to 4 different types of atoms or groups of atoms in a tetrahedral arrangement. It could either be (R)- or (S)- configurations. Some of the compounds e.g. racemic mixtures, could have undefined atom stereocenter, where (R/S)-config was not specifically defined. Defined atom stereocenter count was the number of atom stereocenters where configurations were specifically defined
Undefined atoms stereocenter count - this was the undefined version of the atoms stereocenter count
Defined bond stereocenter count - bond stereocenter (or non-rotatable bond) was where two atoms could have different arrangement e.g. in cis- & trans- forms of butene around its double bond. Some compounds could have an undefined bond stereocenter (stereochemistry not specifically defined). Defined bond stereocenter count was the number of bond stereocenters where configurations were specifically defined.
Undefined bond stereocenter count - this was the undefined version of the bond stereocenter count

Convert data type for selected columns

# Convert data type - only for partition coefficients column (rest were okay)
# strict = False turns values that can't be parsed as numbers into nulls
pc_cov = pc_cov.with_columns(pl.col("Partition coefficients").cast(pl.Float64, strict = False))
pc_cov.head()

shape: (5, 38)

cid	Compound name	Synonyms	Molecular weight	Molecular formula	Polar surface area	Complexity	Partition coefficients	Heavy atom count	Hydrogen bond donor count	Hydrogen bond acceptor count	Rotatable bond count	inchi	isosmiles	canonicalsmiles	inchikey	iupacname	Exact mass	Monoisotopic mass	Formal charge	Covalently-bonded unit count	Isotope atom count	Total atom stereocenter count	Defined atom stereocenter count	Undefined atoms stereocenter count	Total bond stereocenter count	Defined bond stereocenter count	Undefined bond stereocenter count	pclidcnt	gpidcnt	MeSH headings	annothits	annothitcnt	aids	cidcdate	sidsrcname	depcatg	annotation
i64	str	str	f64	str	f64	f64	f64	i64	i64	i64	i64	str	str	str	str	str	f64	f64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	i64	str	str	i64	str	i64	str	str	str
5280453	"Calcitriol"	"calcitriol\|322...	416.6	"C27H44O3"	60.7	688.0	5.1	30	3	3	6	"InChI=1S/C27H4...	"C[C@H](CCCC(C)...	"CC(CCCC(C)(C)O...	"GMRQFYUYWCNGIN...	"(1R,3S,5Z)-5-[...	416.329	416.329	0	1	0	6	6	0	2	2	0	22311	46029	"Calcitriol"	"Biological Tes...	12	"485\|631\|731\|78...	20040916	"A2B Chem\|AA BL...	"Chemical Vendo...	"COVID-19, COVI...
9962735	"Ubiquinol"	"ubiquinol\|992-...	865.4	"C59H92O4"	58.9	1600.0	20.2	63	2	4	31	"InChI=1S/C59H9...	"CC1=C(C(=C(C(=...	"CC1=C(C(=C(C(=...	"QNTNKSLOFHEFPK...	"2-[(2E,6E,10E,...	864.7	864.7	0	1	0	0	0	0	9	9	0	2732	21358	"NULL"	"Chemical and P...	7	"NULL"	20061025	"001Chemical\|A2...	"Chemical Vendo...	"COVID-19, COVI...
5961	"Glutamine"	"L-glutamine\|gl...	146.14	"C5H10N2O3"	106.0	146.0	-3.1	10	3	4	4	"InChI=1S/C5H10...	"C(CC(=O)N)[C@@...	"C(CC(=O)N)C(C(...	"ZDXPYRJPNDTMRX...	"(2S)-2,5-diami...	146.069	146.069	0	1	0	1	1	0	0	0	0	88218	399	"Glutamine"	"Biological Tes...	12	"422\|429\|436\|54...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
2244	"Aspirin"	"aspirin\|ACETYL...	180.16	"C9H8O4"	63.6	212.0	1.2	13	1	4	3	"InChI=1S/C9H8O...	"CC(=O)OC1=CC=C...	"CC(=O)OC1=CC=C...	"BSYNRYMUTXBXSQ...	"2-acetyloxyben...	180.042	180.042	0	1	0	0	0	0	0	0	0	127012	364455	"Aspirin"	"Biological Tes...	12	"1\|3\|9\|15\|19\|21...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...
457	"1-Methylnicoti...	"1-methylnicoti...	137.16	"C7H9N2O+"	47.0	136.0	-0.1	10	1	1	1	"InChI=1S/C7H8N...	"C[N+]1=CC=CC(=...	"C[N+]1=CC=CC(=...	"LDHMAVIPBRSVRG...	"1-methylpyridi...	137.071	137.071	1	1	0	0	0	0	0	0	0	310	674	"NULL"	"Biological Tes...	8	"61001\|61002\|14...	20040916	"001Chemical\|3B...	"Chemical Vendo...	"COVID-19, COVI...

Select columns for data visualisations

The idea was to keep only the numerical columns (plus the compound names) for some data visualisations later, so I dropped all the other text or string-type columns.

# Drop unused columns in preparation for data visualisations
pc_cov = pc_cov.drop([
    "cid", 
    "Synonyms",
    "Molecular formula",
    "inchi",
    "isosmiles",
    "canonicalsmiles",
    "inchikey",
    "iupacname",
    "pclidcnt",
    "gpidcnt",
    "MeSH headings",
    "annothits",
    "annothitcnt",
    "aids",
    "cidcdate",
    "sidsrcname",
    "depcatg",
    "annotation"
])

pc_cov.head()

shape: (5, 20)

Compound name	Molecular weight	Polar surface area	Complexity	Partition coefficients	Heavy atom count	Hydrogen bond donor count	Hydrogen bond acceptor count	Rotatable bond count	Exact mass	Monoisotopic mass	Formal charge	Covalently-bonded unit count	Isotope atom count	Total atom stereocenter count	Defined atom stereocenter count	Undefined atoms stereocenter count	Total bond stereocenter count	Defined bond stereocenter count	Undefined bond stereocenter count
str	f64	f64	f64	f64	i64	i64	i64	i64	f64	f64	i64	i64	i64	i64	i64	i64	i64	i64	i64
"Calcitriol"	416.6	60.7	688.0	5.1	30	3	3	6	416.329	416.329	0	1	0	6	6	0	2	2	0
"Ubiquinol"	865.4	58.9	1600.0	20.2	63	2	4	31	864.7	864.7	0	1	0	0	0	0	9	9	0
"Glutamine"	146.14	106.0	146.0	-3.1	10	3	4	4	146.069	146.069	0	1	0	1	1	0	0	0	0
"Aspirin"	180.16	63.6	212.0	1.2	13	1	4	3	180.042	180.042	0	1	0	0	0	0	0	0	0
"1-Methylnicoti...	137.16	47.0	136.0	-0.1	10	1	1	1	137.071	137.071	1	1	0	0	0	0	0	0	0

Quick summary statistics of columns

# Overall descriptive statistics of kept columns
pc_cov.describe()

shape: (7, 21)

describe	Compound name	Molecular weight	Polar surface area	Complexity	Partition coefficients	Heavy atom count	Hydrogen bond donor count	Hydrogen bond acceptor count	Rotatable bond count	Exact mass	Monoisotopic mass	Formal charge	Covalently-bonded unit count	Isotope atom count	Total atom stereocenter count	Defined atom stereocenter count	Undefined atoms stereocenter count	Total bond stereocenter count	Defined bond stereocenter count	Undefined bond stereocenter count
str	str	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64	f64
"count"	"631"	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0	631.0
"null_count"	"0"	0.0	0.0	0.0	173.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
"mean"	null	549.539675	163.915368	864.755626	2.25917	37.770206	4.066561	9.210777	9.518225	549.095022	549.06013	-0.004754	1.578447	0.006339	4.017433	3.551506	0.465927	0.381933	0.343899	0.038035
"std"	null	455.236826	192.256415	1000.220379	3.926459	31.821967	6.348004	8.694184	15.393131	455.064211	454.958033	0.358537	1.610416	0.079429	6.128363	5.787792	2.364089	1.181171	1.107245	0.363159
"min"	"(+)-Mefloquine...	103.1	0.0	0.0	-24.0	1.0	0.0	0.0	0.0	103.04	103.04	-6.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
"max"	"sodium;8-amino...	4114.0	1650.0	9590.0	20.2	291.0	57.0	65.0	151.0	4112.12	4111.12	2.0	21.0	1.0	39.0	39.0	31.0	11.0	11.0	7.0
"median"	null	435.9	110.0	635.0	2.5	30.0	3.0	7.0	6.0	435.227	435.227	0.0	1.0	0.0	1.0	1.0	0.0	0.0	0.0	0.0

Conditional assignments in Polars

The longer I’ve used Polars, the more I like its style of chaining several expressions together to manipulate a dataframe in one go. This often means we can avoid writing repetitive loops to achieve the same result. In the example below, I’d like to show how to chain “when-then-otherwise” expressions in Polars.

Chaining when-then-otherwise expressions - creating groups in data

I had the idea of separating the data into 3 different ranges of partition coefficients, so that this could be shown visually in plots. One possible way (other than writing a loop), or really the long way, might look like the code shown below:

```{python}
# One filtered dataframe per range, using the renamed column
part_coef_1 = pc_cov.filter(pl.col("Partition coefficients") <= -10)
part_coef_2 = pc_cov.filter((pl.col("Partition coefficients") >= -11) & (pl.col("Partition coefficients") <= 5))
part_coef_3 = pc_cov.filter(pl.col("Partition coefficients") >= 6)
```

A shorter and probably more elegant way was to use the “when-then-otherwise” expression in Polars for conditional assignments (the following code snippet was adapted with thanks to the author of Polars, Ritchie Vink, and also good old Stack Overflow):

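# pl.lit() marks the labels as literal strings
# (recent Polars versions read bare strings in then()/otherwise() as column names)
pc_cov = pc_cov.with_columns(
    pl.when((pl.col("Partition coefficients") <= -10))
    .then(pl.lit("Smaller than -10"))
    .when((pl.col("Partition coefficients") >= -11) & (pl.col("Partition coefficients") <= 5))
    .then(pl.lit("Between -11 and 5"))
    .otherwise(pl.lit("Larger than 6"))
    .alias("Part_coef_group")
)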

pc_cov.head(10)

# a new column would be added to the end of the dataframe 
# with a new column name, "Part_coef_group" 
# (scroll to the very right to see the added column)

shape: (10, 21)

Compound name	Molecular weight	Polar surface area	Complexity	Partition coefficients	Heavy atom count	Hydrogen bond donor count	Hydrogen bond acceptor count	Rotatable bond count	Exact mass	Monoisotopic mass	Formal charge	Covalently-bonded unit count	Isotope atom count	Total atom stereocenter count	Defined atom stereocenter count	Undefined atoms stereocenter count	Total bond stereocenter count	Defined bond stereocenter count	Undefined bond stereocenter count	Part_coef_group
str	f64	f64	f64	f64	i64	i64	i64	i64	f64	f64	i64	i64	i64	i64	i64	i64	i64	i64	i64	str
"Calcitriol"	416.6	60.7	688.0	5.1	30	3	3	6	416.329	416.329	0	1	0	6	6	0	2	2	0	"Larger than 6"
"Ubiquinol"	865.4	58.9	1600.0	20.2	63	2	4	31	864.7	864.7	0	1	0	0	0	0	9	9	0	"Larger than 6"
"Glutamine"	146.14	106.0	146.0	-3.1	10	3	4	4	146.069	146.069	0	1	0	1	1	0	0	0	0	"Between -11 an...
"Aspirin"	180.16	63.6	212.0	1.2	13	1	4	3	180.042	180.042	0	1	0	0	0	0	0	0	0	"Between -11 an...
"1-Methylnicoti...	137.16	47.0	136.0	-0.1	10	1	1	1	137.071	137.071	1	1	0	0	0	0	0	0	0	"Between -11 an...
"Losartan"	422.9	92.5	520.0	4.3	30	2	5	8	422.162	422.162	0	1	0	0	0	0	0	0	0	"Between -11 an...
"Vitamin E"	430.7	29.5	503.0	10.7	31	1	2	12	430.381	430.381	0	1	0	3	3	0	0	0	0	"Larger than 6"
"Nicotinamide"	122.12	56.0	114.0	-0.4	9	1	2	1	122.048	122.048	0	1	0	0	0	0	0	0	0	"Between -11 an...
"Adenosine"	267.24	140.0	335.0	-1.1	19	4	8	2	267.097	267.097	0	1	0	4	4	0	0	0	0	"Between -11 an...
"Inosine"	268.23	129.0	405.0	-1.3	19	4	7	2	268.081	268.081	0	1	0	4	4	0	0	0	0	"Between -11 an...

Import Plotly

Time for some data vizzes - importing Plotly first.

import plotly.express as px

Some examples of data visualisations

Below are some examples of plots built with Plotly.

Partition coefficients vs. Molecular weights

fig = px.scatter(x = pc_cov["Partition coefficients"], 
                 y = pc_cov["Molecular weight"], 
                 hover_name = pc_cov["Compound name"],
                 color = pc_cov["Part_coef_group"],
                 width = 800, 
                 height = 400,
                 title = "Partition coefficients vs. molecular weights for compounds used in COVID-19 clinical trials")

fig.update_layout(
    title = dict(
        font = dict(
            size = 15)),
    title_x = 0.5,
    margin = dict(
        l = 20, r = 20, t = 40, b = 3),
    xaxis = dict(
        tickfont = dict(size = 9), 
        title = "Partition coefficients"
    ),
    yaxis = dict(
        tickfont = dict(size = 9), 
        title = "Molecular weights"
    ),
    legend = dict(
        font = dict(
            size = 9)))

fig.show()

Molecular weights vs. Complexity

fig = px.scatter(x = pc_cov["Molecular weight"], 
                 y = pc_cov["Complexity"], 
                 hover_name = pc_cov["Compound name"],
                 #color = pc_cov["Part_coef_group"],
                 width = 800, 
                 height = 400,
                 title = "Molecular weights vs. complexity for compounds used in COVID-19 clinical trials")

fig.update_layout(
    title = dict(
        font = dict(
            size = 15)),
    title_x = 0.5,
    margin = dict(
        l = 20, r = 20, t = 40, b = 3),
    xaxis = dict(
        tickfont = dict(size = 9), 
        title = "Molecular weights"
    ),
    yaxis = dict(
        tickfont = dict(size = 9), 
        title = "Complexity"
    ),
    legend = dict(
        font = dict(
            size = 9)))

fig.show()

Export prepared dataset

Two of the possible options to export the dataset for use in a Shiny app could be:

Convert the Polars dataframe into a Pandas dataframe with its to_pandas() function, so that it could be imported into the app (Polars was not directly supported in Shiny for Python at the time of writing).
Save the Polars dataframe as a .csv file, then read this file into the app.py script by using Pandas (this was the method I used for this particular app).

```{python}
# --If preferring to use Pandas--
# Convert Polars df into a Pandas df if needed
# (kept under a separate name so df_name stays a Polars dataframe)
df_pandas = df_name.to_pandas()

# Write the Pandas df to a .csv file using Pandas
df_pandas.to_csv("csv_file_name.csv", sep = ",")

# --If preferring to use Polars--
# Simply write the Polars dataframe to a .csv file
df_name.write_csv("csv_file_name.csv", separator = ",")
```