Python

Graphing in Python with seaborn

The data set used to illustrate the seaborn commands is the HELP study (data name HELPrct), a clinical trial of adult inpatients recruited from a detoxification unit. The variables used throughout this tutorial are depression score (cesd), homelessness status (homeless), primary substance of abuse (substance), patient's age (age), and patient's gender (sex).

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the HELPrct data; adjust the path to wherever your copy of the CSV is stored
df = pd.read_csv('HELPrct.csv')
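If the CSV file is not at hand, the plotting commands can still be tried on a synthetic stand-in. The sketch below is only an illustration: the helper name make_demo_helprct, the age range, and the category labels are assumptions, and the values are random numbers, not HELP study data.

```python
import numpy as np
import pandas as pd

def make_demo_helprct(n=300, seed=0):
    """Hypothetical helper: random data with the same column names as HELPrct."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # CES-D depression scores run from 0 to 60 (upper bound of integers() is exclusive)
        "cesd": rng.integers(0, 61, size=n),
        # Category labels below are assumed, check them against your real file
        "homeless": rng.choice(["homeless", "housed"], size=n),
        "substance": rng.choice(["alcohol", "cocaine", "heroin"], size=n),
        # Age range is an arbitrary assumption for the demo
        "age": rng.integers(19, 61, size=n),
        "sex": rng.choice(["female", "male"], size=n),
    })

demo_df = make_demo_helprct()
print(demo_df.head())
print(demo_df.dtypes)
```

Assigning demo_df to df lets the remaining examples run unchanged, although the resulting plots will show no real pattern because the columns are generated independently.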

Univariate Graphing

Bar plot for a single categorical variable (substance)

sns.countplot(data=df, x='substance')
plt.title("Primary abuse substance of subjects")
plt.show()

Histogram of a quantitative variable (cesd)

sns.histplot(data=df, x='cesd', bins=30)
plt.title("Depression Scores of Subjects")
plt.show()

Density plot of a quantitative variable (cesd)

sns.kdeplot(data=df, x='cesd')
plt.title("Depression Scores of Subjects")
plt.show()

Bivariate Graphing

Bar plot of means (grouped by substance)

mean_df = df.groupby('substance')['cesd'].mean().reset_index()
sns.barplot(data=mean_df, x='substance', y='cesd')
plt.ylabel("Depression")
plt.title("Mean Depression Scores by Primary Abuse Substance")
plt.show()

Boxplots

sns.boxplot(data=df, x='substance', y='cesd', hue='substance')
plt.ylabel("Depression")
plt.title("Depression Scores by Primary Abuse Substance")
plt.show()
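A boxplot summarizes each group by its median and quartiles rather than its mean. As a rough check on what the boxes represent, the same numbers can be computed with pandas. The toy values below are made up for illustration and are not HELP study data; with the real data, replace toy with df.

```python
import pandas as pd

# Made-up toy values, only to show the calculation
toy = pd.DataFrame({
    "substance": ["alcohol"] * 4 + ["cocaine"] * 4,
    "cesd": [30, 34, 38, 42, 20, 25, 30, 45],
})

# Quartiles per group: the box spans q1 to q3 and the line inside is the median
quartiles = (
    toy.groupby("substance")["cesd"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q1", "median", "q3"]

# Interquartile range, the height of each box
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles)
```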

Density plots by group

sns.kdeplot(data=df, x='cesd', hue='substance')
plt.xlabel("Depression")
plt.title("Distribution of Depression Scores by Primary Abuse Substance")
plt.show()

Mean with Error Bars

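# Per-group mean and standard error of the mean (pandas' built-in 'sem' aggregation)
summary_df = df.groupby('substance')['cesd'].agg(['mean', 'sem']).reset_index()
# fmt='o' draws markers only, so the group means are not joined by a line
plt.errorbar(x=summary_df['substance'], y=summary_df['mean'],
             yerr=summary_df['sem'], fmt='o', capsize=5)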
plt.xlabel("Substance")
plt.ylabel("Depression")
plt.title("Mean Depression Scores with Error Bars by Substance")
plt.show()
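The error bars in this plot are standard errors of the mean, that is, the sample standard deviation of each group divided by the square root of its size. A minimal sketch of that calculation follows, using made-up toy values rather than HELP study data, and comparing the hand computation with pandas' own sem method.

```python
import numpy as np
import pandas as pd

# Made-up toy values, only to show the calculation
toy = pd.DataFrame({
    "substance": ["alcohol"] * 4 + ["cocaine"] * 4,
    "cesd": [30, 34, 38, 42, 20, 25, 30, 45],
})
grouped = toy.groupby("substance")["cesd"]

# Standard error by hand: sample standard deviation (ddof=1) over sqrt(n)
manual_sem = grouped.std(ddof=1) / np.sqrt(grouped.count())

# pandas' built-in version, which also uses ddof=1 by default
builtin_sem = grouped.sem()

print(pd.DataFrame({"manual": manual_sem, "builtin": builtin_sem}))
```

Error bars of one standard error are fairly narrow. Multiplying the standard error by about 1.96 gives an approximate 95% confidence interval when group sizes are reasonably large.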