Suicide Prevention Insights with Data Science

Religion as a powerful anti-depressant

Dilyan Kovachev
Jan 5

I believe I was about eleven years old when I first experienced the strong emotional shock of finding out that people I knew, admired and deemed successful and happy had suddenly decided to end their own lives by committing suicide.

This happened to two of my childhood friends’ dads within the short span of a few years.

Both men seemed like loving parents and model citizens, so the news came as a big shock to me and to their respective communities.

The same tragic story repeated with three of the main heroes of my adolescent years — Kurt Cobain, Robin Williams, and Anthony Bourdain.

Over the years, I’ve spent sleepless nights trying to figure out what could possibly push people, whom I loved and admired and who, at first sight, appeared to have successful and happy lives, towards such extremes.

Driven by all of these unanswered questions, which have probably tortured anyone who has had a close encounter with a shocking suicide of a close friend, family member or star-idol, I decided to partner up with my classmate Brian Srebrenik and try to investigate the mystery of suicide with Data Science.

The goal of this project was to build a Machine Learning supervised regression model using a number of socioeconomic and mental health metrics of 200 Countries and try to determine which factors might have a statistically significant correlation with National Suicide Rates.

After doing some research into the previous work done on the subject, we collected data on the following independent variables which we believed would help to build our model.

We used data for the year 2016.

- Depression rate
- Schizophrenia rate
- Anxiety rate
- Bipolar Disorder
- Eating Disorder
- Alcohol Use Disorder
- Drug Use Disorder
- GDP
- Health Spending per capita
- Unemployment Rate
- Percentage of Population Living in Cities with > 100k People
- Percentage of Population with Internet Access
- Categorical data on the % of the population that is Religious

Note: If you are not interested in the technical details of how we built and optimized our model, please skip to the section “Model summary and final insights and conclusions” for a detailed summary of our findings.

Preparing the Data and Checking for Multicollinearity

First, we gathered the data from the World Health Organization API and a few other sites with annual socioeconomic data for 2016.

After cleaning all the data and merging it into one pd.DataFrame, we checked for multicollinearity among the independent variables using a Seaborn correlation heatmap:

Figure: Collinearity heatmap, drawn with sns.heatmap on the DataFrame’s corr() matrix (center = 0)

The heatmap informed us that Bipolar Disorder % has a very high correlation with three other mental health issues, so we decided to drop that variable before running our model to avoid extra noise caused by the heavy multicollinearity.
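A minimal sketch of this kind of check, assuming the merged data lives in a pandas DataFrame. The column names, the synthetic numbers and the 0.8 cutoff are illustrative assumptions rather than values from the project, and `highly_correlated_pairs` is a hypothetical helper, not the function the authors used.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    # Strongest correlations first
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Synthetic stand-in data (NOT the WHO data used in the post)
rng = np.random.default_rng(0)
n = 200
depression = rng.normal(size=n)
demo = pd.DataFrame({
    "depression": depression,
    "bipolar": depression + rng.normal(scale=0.1, size=n),  # nearly a copy
    "gdp": rng.normal(size=n),                              # unrelated
})

pairs = highly_correlated_pairs(demo, threshold=0.8)
print(pairs)

# With seaborn imported as sns, the same matrix can be drawn with:
#   sns.heatmap(demo.corr(), center=0)
```

A pair that shows up here is a candidate for dropping one of its two columns, which is the decision the heatmap supported for Bipolar Disorder.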

Checking for linear relationships between each independent variable and Suicide Rate

Next, we created paired scatterplots of each independent variable against our dependent variable, Suicide Rate. From the graphs we can see that Suicide Rate has a somewhat linear relationship with a country’s rates of depression, anxiety, schizophrenia and alcohol use, and with its level of religiousness.

So, our initial expectation was that our Machine Learning model would assign more weight to some of these categories.
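The visual check described above can be complemented with a number per variable. The sketch below ranks features by their Pearson correlation with the target using pandas `corrwith`; the data is synthetic and the column names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the project's dataset)
rng = np.random.default_rng(42)
n = 150
features = pd.DataFrame({
    "depression": rng.normal(size=n),
    "unrelated": rng.normal(size=n),
})
suicide_rate = pd.Series(
    10 + 3 * features["depression"] + rng.normal(scale=1.0, size=n),
    name="suicide_rate",
)

# Pearson r of every feature column with the target
r = features.corrwith(suicide_rate)

# Rank by absolute strength while keeping the sign of each correlation
ranked = r.reindex(r.abs().sort_values(ascending=False).index)
print(ranked)
```

A high absolute value only indicates a linear association; it says nothing about causation, and a curved relationship can still produce a low value.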

Scaling our independent variables and introducing variable combinations and polynomials to our model

Since some of our independent variables are measured in thousands and others are measured in percentages, we decided to scale all the independent variables to make sure our model doesn’t assign coefficients that are disproportionately large for certain variables.

Figure: Scaling variables using sklearn’s scale

The next step was to introduce interaction terms via second-degree polynomial features, to help our model with feature selection, boost its final prediction accuracy and reduce its Root Mean Squared Error.
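A minimal sketch of these two preprocessing steps. The post names only sklearn's `scale`; using `sklearn.preprocessing.scale` together with `PolynomialFeatures(degree=2)` is an assumption about how the steps were done, and the small matrix below is made-up data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, scale

# Made-up data: column 0 is in the thousands, column 1 is a percentage
X = np.array([
    [12000.0, 3.1],
    [45000.0, 4.7],
    [8000.0, 5.2],
    [30000.0, 2.9],
])

# Standardize each column to zero mean and unit variance
X_scaled = scale(X)

# Second-degree expansion: adds squares and pairwise interaction terms
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_scaled)

# Column order for two inputs: x0, x1, x0^2, x0*x1, x1^2
print(X_poly.shape)
```

With 12 inputs instead of 2, the same expansion produces 90 columns (12 originals, 12 squares and 66 pairwise products), which is why a feature-selection step follows.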

The Correlation Heatmap below helps us identify all the new variables that are correlated:

Applying RFE and Cross Validation to each regression model to pick the optimal model that minimizes our RMSE

Now that our data is clean, scaled and preprocessed, it is time for the final step: training our Machine Learning model on our train data in order to improve its predictive accuracy on the test data.

We used three model optimization sub-steps to create our own custom GridSearch function:

1. We ran Linear, Lasso and Ridge Regressions using the function below to determine which one generates the model with the smallest error. For this particular dataset, we found that Ridge Regression had the best results. We used an 80-20 split for the Train_data-Test_data ratio.
2. We used RFE (Recursive Feature Elimination) to feed the independent variables to each model and pick the number and combination of variables that minimize the RMSE.
3. We used K-Fold cross-validation to optimize each model’s ability to predict accurately on test data that it has not seen before.
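The three sub-steps above can be sketched with scikit-learn as follows. This is not the authors' function: `search_models` is a hypothetical name, the data comes from `make_regression`, and the `alpha` values, the 5 folds and the random seeds are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def search_models(X_train, y_train, models, n_splits=5):
    """Try every model with every RFE feature count.

    Returns (best, results); each entry is (cv_rmse, model_name, n_features).
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = []
    for name, model in models.items():
        for k in range(1, X_train.shape[1] + 1):
            # RFE keeps the k features the model ranks as most important
            selector = RFE(model, n_features_to_select=k)
            mse = -cross_val_score(
                selector, X_train, y_train,
                cv=cv, scoring="neg_mean_squared_error",
            )
            # RMSE taken as the square root of the mean fold MSE
            results.append((float(np.sqrt(mse.mean())), name, k))
    return min(results), results


# Synthetic stand-in data (NOT the country-level dataset)
X, y = make_regression(n_samples=150, n_features=8, n_informative=4,
                       noise=5.0, random_state=0)

# 80-20 train/test split; the test set is left out of the search entirely
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
}
best, results = search_models(X_train, y_train, models)
best_rmse, best_name, best_k = best
print(f"best: {best_name} with {best_k} features, CV RMSE {best_rmse:.2f}")
```

Keeping the test set out of the search means the final test RMSE is still an honest estimate after the model and feature count have been chosen.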

Figure: Our custom GridSearch algorithm (code listing)

The graph below shows our RMSE function and how we train our model to pick the number of independent variables that generates the lowest error:

Figure: 22 variables were chosen with an RMSE = 3.54

Model summary and final insights and conclusions

Now that most of the heavy work is done, it is time to analyze our results.

First, we plotted scatterplots with the model statistics for our Training Data versus our Test Data to ensure that there is not much train data bias or overfitting.

As we can see from the graphs below, the RMSE, z_scores, and R_squared of the test and train samples are pretty similar, which is what we were hoping for.

The R_squared score of 0.66 is not amazing, but given that we only had about 150 to 200 examples (Countries), we’re pretty happy with the model performance.

We’re confident that if we could feed 500+ observations into our training set, the model’s predictive accuracy would improve significantly.
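A small sketch of the train-versus-test comparison described above, using only NumPy. The data is synthetic and an ordinary least squares fit stands in for the Ridge model, so the printed numbers are illustrative and unrelated to the 0.66 reported in the post; `rmse` and `r_squared` are helper names introduced here.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=200)

# 80-20 split, then least squares with an intercept column
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]
A_train = np.column_stack([np.ones(len(X_train)), X_train])
A_test = np.column_stack([np.ones(len(X_test)), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

scores = {
    "train": (rmse(y_train, A_train @ coef), r_squared(y_train, A_train @ coef)),
    "test": (rmse(y_test, A_test @ coef), r_squared(y_test, A_test @ coef)),
}
for split, (err, r2) in scores.items():
    print(f"{split}: RMSE={err:.2f}, R^2={r2:.2f}")
```

A test RMSE far above the train RMSE, or a test R_squared far below the train one, would be the sign of overfitting that this comparison is meant to catch.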

Figure: Model performance on test_data versus train_data

We also wanted to look at the actual independent variables and their coefficients and try to draw some insights and conclusions from their distribution.

Below is the final model equation and a bar chart of each independent variable and its weight in the final model:

Suicide_Rate = 10.28
    + 3.58*Depression
    + 2.74*Alcohol_use
    - 1.87*Rel_High
    - 2.24*Schizophrenia*Depression
    + 0.65*Schizophrenia*health_spend_perca
    + 1.23*Schizophrenia*Rel_Low
    - 1.56*Eating_disorders*Drug_use
    + 1.04*Eating_disorders*Depression
    - 2.05*Eating_disorders*Alcohol_use
    + 1.49*Eating_disorders*urban_population
    + 0.72*Anxiety^2
    - 1.17*Anxiety*Depression
    - 1.37*Anxiety*GDP
    - 2.89*Anxiety*Rel_High
    + 1.67*Drug_use*Unem_rate
    + 1.58*Drug_use*GDP
    - 2.48*Depression*Rel_High
    - 1.76*Depression*Rel_Low
    - 0.92*Alcohol_use^2
    + 0.71*Alcohol_use*GDP
    + 2.16*Alcohol_use*urban_population
    - 1.42*Alcohol_use*Rel_High
    - 2.23*Unem_rate*GDP
    + 0.91*Unem_rate*health_spend_perca
    - 0.45*Unem_rate*Internet_access_percap
    - 0.82*GDP*urban_population
    + 1.02*GDP*Rel_High
    + 2.14*urban_population*Rel_Low

Conclusion:

From the equation coefficients and the bar chart below we can posit the following hypotheses, which can be tested further:

1. Variables which tend to increase a country’s Suicide Rate:
- Depression rate
- Alcohol use disorder rate
- Urban population % combined with a Low Level of Religiousness
- Urban population % combined with high Alcohol use
- The rate of Schizophrenia combined with a Low Level of Religiousness
- Drug use disorder combined with a high unemployment rate
- Drug use combined with GDP level

2. Variables which tend to decrease a country’s Suicide Rate:
- High level of population’s Religiousness
- Depression rate combined with High Religiousness
- Anxiety rate combined with High Religiousness
- Alcohol use disorder rate combined with High Religiousness
- Drug use rate combined with High Religiousness
- Eating disorder rate combined with Alcohol use rate
- Eating disorder rate combined with Drug use rate

Investigating the inverse relationship between a country’s religiousness and its suicide rate

I was really intrigued and surprised to find out that our model put a lot of weight on the level of religiousness of a country when it comes to its effect on suicide rates.

If we look closely at the coefficients chart above, we can see that most features that seem to have a very strong direct effect on high suicide rates such as Depression, Alcohol use and Schizophrenia, tend to lose their negative effect in countries with highly religious populations.
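To make the interaction effect concrete, the sketch below reads two slopes off the fitted equation. It rests on loud assumptions: Rel_High and Rel_Low are treated as plain 0/1 indicators (the post does not say whether they were scaled along with the other variables), and every other scaled variable is held at 0, so all remaining interaction terms drop out. `marginal_effect` is a hypothetical helper.

```python
# Coefficients copied from the fitted equation in the post
COEFS = {
    "Depression": {"main": 3.58, "Rel_High": -2.48, "Rel_Low": -1.76},
    "Alcohol_use": {"main": 2.74, "Rel_High": -1.42},
}


def marginal_effect(feature, rel_high=0, rel_low=0):
    """Slope of Suicide_Rate with respect to `feature`.

    ASSUMPTIONS: Rel_High / Rel_Low are 0/1 indicators and all other
    scaled variables are held at 0, so their interaction terms vanish.
    """
    c = COEFS[feature]
    return (c["main"]
            + c.get("Rel_High", 0.0) * rel_high
            + c.get("Rel_Low", 0.0) * rel_low)


for feature in COEFS:
    base = marginal_effect(feature)
    high = marginal_effect(feature, rel_high=1)
    print(f"{feature}: baseline {base:.2f}, highly religious {high:.2f}")
# Depression: baseline 3.58, highly religious 1.10
# Alcohol_use: baseline 2.74, highly religious 1.32
```

Under these assumptions the slopes shrink but stay positive, so "lose their effect" is best read as "are dampened" rather than "disappear".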

This interesting phenomenon is demonstrated in the two graphs below:

Figure: Most high suicide rates fall inside the range of countries where less than 50% of the population is religious

Figure: Countries with the lowest level of religiousness have a much higher average suicide rate

Final Personal Thoughts

Even though we didn’t analyze enough data to be able to draw any strong conclusions, our findings provoked me to think about the topic a bit further and try to come up with a valid explanation of how Religion can negate the effects of some of the factors that seem to cause high suicide rates, such as Depression, Anxiety and Drug use.

I did some research on the main symptoms of people suffering from Depression and suicidal thoughts, listed below:

- Feeling a deep sense of hopelessness about the future, with little expectation that circumstances can improve
- The feeling of loneliness and tendency to avoid friends or social activities
- The feeling of an emotional void, lack of love or personal connections
- Increased use of drugs and/or alcohol
- Having experienced a recent trauma or life crisis

How might Religion happen to counter those symptoms and act as a spiritual anti-depressant?

- Religion often revolves around faith and hope (tackles hopelessness)
- Religion often tries to assign meaning to suffering and tries to teach people how to overcome personal trauma (tackles the pain of trauma)
- Religious establishments tend to provide a support system and a sense of community (tackles the feeling of loneliness and anti-social behavior)
- Many religions have a conservative view on drugs and alcohol (tackles high drug/alcohol use disorders)
- Many religions teach that God is love and God loves all beings unconditionally (tackles the lack of love and fills the emotional void)

So, intentionally or by sheer randomness of circumstances, Religion seems to act as a powerful anti-depressant, which can fill many of the emotional and social voids that people with depression and suicidal thoughts seem to experience.

For me personally, this was the most interesting finding of our project and I can safely say that I will carry many of the other valuable lessons that I learned during this investigation and try to use them to help people in need in the future.

I sincerely hope that this article will also reach many people across the world who can benefit from our findings in one way or another! Peace and love to all beings ❤
