Exploratory analysis on suicide data

Luis Meazzini, Apr 25

Introduction

It seems that with each passing year, suicides are becoming more common.
I am only 25 years old, and I already had an acquaintance who committed suicide during college.
During the week, I was browsing Kaggle and found a data set on suicides.
At the time I thought about doing a little analysis to try to understand the subject better.
So now, in this article, I present this brief review.
If you also want to “play” with this data set, you can get it from Kaggle through the link.
I will not include all the code I used for the visualizations (it is on GitHub); instead, I will ask questions that the data set might answer.
But before trying to understand the data, I will briefly explain what is in it.

Data description

Each record in the data set represents a year, a country, a certain age range, and a sex.
For example, in Brazil in 1985, 129 men over 75 years old committed suicide.
The data set has 10 attributes:

Country: country of the record;
Year: year of the record;
Sex: sex (male or female);
Age: age range of the suicides, divided into six categories;
Suicides_no: number of suicides;
Population: population of this sex, in this age range, in this country and in this year;
Suicides / 100k pop: ratio between the number of suicides and the population, per 100k people;
GDP_for_year: GDP of the country in the year in question;
GDP_per_capita: ratio between the country’s GDP and its population;
Generation: generation of the people in question, with 6 possible categories.
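
To make the "Suicides / 100k pop" attribute concrete, here is a minimal sketch of how such a rate can be computed from the two count columns. The lowercase column names and the population figure are assumptions made for illustration; only the count of 129 comes from the example above.

```python
import pandas as pd

# Toy record. The count of 129 is the Brazil 1985 example from the text;
# the population value is hypothetical, chosen only to make the math easy.
df = pd.DataFrame({
    "country": ["Brazil"],
    "year": [1985],
    "sex": ["male"],
    "age": ["75+ years"],
    "suicides_no": [129],
    "population": [800_000],  # hypothetical value
})

# Rate per 100k inhabitants = suicides / population * 100,000
df["suicides_per_100k"] = df["suicides_no"] / df["population"] * 100_000

print(df[["country", "year", "suicides_per_100k"]])
```

Working with a rate instead of a raw count is what makes groups of very different sizes comparable.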

The possible age categories are:

    df['age'].unique()
    array(['15-24 years', '35-54 years', '75+ years', '25-34 years', '55-74 years', '5-14 years'], dtype=object)

And the possible generations are:

    df['generation'].unique()
    array(['Generation X', 'Silent', 'G.I. Generation', 'Boomers', 'Millenials', 'Generation Z'], dtype=object)

Originally, the data set included each country's HDI, but the vast majority of the values were null.
Since I want to analyze whether a country’s development influences the number of suicides, I added a column to the data.
I went to the site, took the names of all countries considered first and second world, and used this information to classify the records of our data set into three categories: first, second and third world.

    first_world = ['United States', 'Germany', 'Japan', 'Turkey', 'United Kingdom', 'France',
                   'Italy', 'South Korea', 'Spain', 'Canada', 'Australia', 'Netherlands',
                   'Belgium', 'Greece', 'Portugal', 'Sweden', 'Austria', 'Switzerland',
                   'Israel', 'Singapore', 'Denmark', 'Finland', 'Norway', 'Ireland',
                   'New Zealand', 'Slovenia', 'Estonia', 'Cyprus', 'Luxembourg', 'Iceland']
    second_world = ['Russian Federation', 'Ukraine', 'Poland', 'Uzbekistan', 'Romania',
                    'Kazakhstan', 'Azerbaijan', 'Czech Republic', 'Hungary', 'Belarus',
                    'Tajikistan', 'Serbia', 'Bulgaria', 'Slovakia', 'Croatia', 'Moldova',
                    'Georgia', 'Bosnia And Herzegovina', 'Albania', 'Armenia', 'Lithuania',
                    'Latvia', 'Brazil', 'Chile', 'Argentina', 'China', 'India', 'Bolivia']

    # 1 = first world, 2 = second world, 3 = everything else (third world)
    country_world = []
    for country in df['country']:
        if country in first_world:
            country_world.append(1)
        elif country in second_world:
            country_world.append(2)
        else:
            country_world.append(3)
    df['country_world'] = country_world

Now, finally, let’s start exploring.
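
As an aside on the classification step above, the same labelling can be done without an explicit loop. This is a sketch of an alternative, not the code used for the article; the lists are shortened and the data frame is a toy stand-in.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative lists; the article's lists are much longer.
first_world = ["United States", "Germany", "Japan"]
second_world = ["Russian Federation", "Ukraine", "Brazil"]

# Toy stand-in for the real data frame.
df = pd.DataFrame({"country": ["Japan", "Brazil", "Mexico", "Germany"]})

# np.select takes the first condition that is true for each row;
# rows matching neither list fall back to 3 (third world).
conditions = [
    df["country"].isin(first_world),
    df["country"].isin(second_world),
]
df["country_world"] = np.select(conditions, [1, 2], default=3)

print(df)
```

On a data set with tens of thousands of rows, this avoids a Python-level loop over every record.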

Exploring the Data

I will try to ask questions and answer them quantitatively, through charts.
Every analysis in this part refers to the whole world.

Over the years, has the number of suicides increased?

The data goes up to 2016, and this chart suggests that 2016 had only just begun when the data was collected.
We can also see that the number of suicides increased greatly from 1988 to 1990.
Beyond that, the number seems to have grown in an expected way and to have declined slightly in recent years, perhaps due to the many suicide prevention campaigns.
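
A yearly total like the one discussed above can be sketched as follows, assuming columns named `year` and `suicides_no`; the numbers are toy values, not the real data. Counting the records per year alongside the total is one way to spot an incomplete year such as 2016.

```python
import pandas as pd

# Toy data: two records per year, except the last year, which has only one.
df = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2016],
    "suicides_no": [10, 20, 15, 30, 4],
})

# Total number of suicides per year.
suicides_per_year = df.groupby("year")["suicides_no"].sum()

# Number of records per year: a year with far fewer records than its
# neighbours is probably incomplete rather than genuinely lower.
records_per_year = df.groupby("year").size()

print(pd.DataFrame({"suicides": suicides_per_year, "records": records_per_year}))
```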

Who tends to commit suicide more? Teens? Adults? Elders?

Well, it seems that adults definitely commit suicide more, but what is the reason for it? From the data, we have no information to answer that.
But I imagine it is a common age for depression and the like.

What about sex? Who commits suicide more, men or women?

Definitely, men.

Does this pattern repeat for all age groups?

Yes, men commit suicide considerably more than women, even before the age of 14.
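
The age and sex comparison above can be sketched with a pivot table. The column names `sex`, `age` and `suicides_no` are assumed, and the counts are toy values chosen only to show the mechanics.

```python
import pandas as pd

# Toy counts, not real data.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": ["35-54 years", "35-54 years", "5-14 years", "5-14 years"],
    "suicides_no": [40, 10, 3, 1],
})

# One row per age range, one column per sex, summing the counts.
by_age_sex = df.pivot_table(
    index="age", columns="sex", values="suicides_no", aggfunc="sum"
)

# How many male suicides there are for each female suicide, per age range.
by_age_sex["male_to_female"] = by_age_sex["male"] / by_age_sex["female"]

print(by_age_sex)
```

A ratio above 1 in every row would correspond to the pattern repeating across all age groups.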

Are there countries with more suicides?

Since more populous countries naturally tend to have more suicides, I used the number of suicides normalized by the country's population.
In this way, the number of suicides is measured per 100 thousand inhabitants.
Although the differences in the numbers of suicides are not so great, some countries stand out, such as Russia and Lithuania.
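
One way to get a single normalized figure per country is to sum suicides and population over all of a country's records and divide afterwards. This is a sketch under assumed column names with toy values; the article may instead have aggregated the precomputed per-record rate, which can give different numbers.

```python
import pandas as pd

# Toy data for two hypothetical countries, "A" and "B".
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "suicides_no": [10, 30, 5, 5],
    "population": [100_000, 100_000, 50_000, 150_000],
})

# Sum both columns per country first, then compute the rate, so that
# large groups weigh more than small ones.
totals = df.groupby("country")[["suicides_no", "population"]].sum()
totals["rate_per_100k"] = totals["suicides_no"] / totals["population"] * 100_000

# Countries ordered from the highest to the lowest rate.
ranking = totals["rate_per_100k"].sort_values(ascending=False)
print(ranking)
```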

And the generation, does it also have an influence?

The Boomers, Silent and X generations are made up of people born until 1976.
These are the people who most often fall in the age range where most suicides occur.
Just check the chart that deals with the age brackets.

However, what about the development of a country, does it change anything?

More developed countries have a higher suicide rate.
This may be explained by various theories, such as excessive work, religious issues, rates of psychiatric illness, etc.

As for the GDP per capita, is there an influence?

Apparently, in impoverished places, the suicide rate is high.
As income increases, suicide decreases.
However, from a certain point (~20k), suicide tends to increase again.
Apparently, the data has some flaws in the 60k range.
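
A relationship like this can be inspected by grouping records into GDP per capita bands and computing a rate per band. The sketch below assumes a `gdp_per_capita` column, uses hypothetical band edges and made-up values, and also counts the records per band, since a band with very few records (as may be the case around 60k) gives an unreliable rate.

```python
import pandas as pd

# Made-up values, for illustration only.
df = pd.DataFrame({
    "gdp_per_capita": [2_000, 8_000, 15_000, 25_000, 45_000, 70_000],
    "suicides_no": [30, 20, 10, 12, 18, 5],
    "population": [100_000] * 6,
})

# Hypothetical band edges; each band is open on the left, closed on the right.
bins = [0, 10_000, 20_000, 40_000, 80_000]
df["gdp_bin"] = pd.cut(df["gdp_per_capita"], bins=bins)

# Rate per band from summed counts, plus how many records back each band.
grouped = df.groupby("gdp_bin", observed=True)[["suicides_no", "population"]].sum()
grouped["rate_per_100k"] = grouped["suicides_no"] / grouped["population"] * 100_000
grouped["n_records"] = df.groupby("gdp_bin", observed=True).size()

print(grouped)
```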

Is there a correlation between data set attributes?

The highest correlations are between population and GDP, since rich countries are, in general, more populous.
There is also one between the number of suicides and the population: if there are more people, there are more suicides.
The correlation between GDP per capita and the world of the country is negative, since first world countries have a higher income and third world countries a lower one.
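
A correlation matrix of this kind can be sketched with pandas on the numeric columns. The column names are assumed and the values are toy numbers, built so that the signs match the relationships described above.

```python
import pandas as pd

# Toy values: suicides grow with population, while GDP per capita falls
# as the world category goes from 1 (first) to 3 (third).
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "suicides_no": [10, 20, 30, 40],
    "population": [100, 200, 300, 400],
    "gdp_per_capita": [40_000, 30_000, 20_000, 10_000],
    "country_world": [1, 1, 2, 3],
})

# Pearson correlation between every pair of numeric columns;
# the text column is dropped first.
corr = df.select_dtypes("number").corr()

print(corr.round(2))
```

Since `country_world` is only an ordinal code (1, 2, 3), its correlation values are best read for their sign rather than their exact size.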

Does the distribution of suicides vary when we change countries?

I have chosen some countries whose figures may reveal something interesting.
Note that I used the number of suicides per 100k inhabitants.
In this case, the distribution in Brazil looks better than in the other countries: it has few outliers and is concentrated in low values.
Russia, on the other hand, has a much more dispersed distribution and several points with a high number of suicides.
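
To put a number on "few outliers" versus "several high points", one option is the usual box plot rule, which flags values above the third quartile plus 1.5 times the interquartile range. This is a sketch of that rule, which the article does not say it used, with an assumed `rate_per_100k` column and toy values for two hypothetical countries.

```python
import pandas as pd

# Toy rates per 100k for two hypothetical countries.
df = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5,
    "rate_per_100k": [1, 2, 3, 4, 5, 1, 2, 3, 4, 50],
})


def count_high_outliers(rates: pd.Series) -> int:
    """Count values above Q3 + 1.5 * IQR (the upper box plot fence)."""
    q1, q3 = rates.quantile(0.25), rates.quantile(0.75)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return int((rates > upper_fence).sum())


# Number of high outliers and the spread (standard deviation) per country.
outliers = df.groupby("country")["rate_per_100k"].apply(count_high_outliers)
spread = df.groupby("country")["rate_per_100k"].std()

print(pd.DataFrame({"high_outliers": outliers, "std": spread}))
```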

Brazilian Data

As Brazil is in this data set, and since I am Brazilian, I have a particular interest in the suicide data for Brazil.
So I will try to analyze the specific rates in this country.

How does the number of suicides vary over time?

Apparently, quite differently from the rest of the world.
As Brazil is a developing country, its suicide rates, which used to look more like those of poorer countries, are rising toward those of more developed countries.

In Brazil, do adults also have higher suicide rates?

Apparently, the 35–54 age range, which is the record holder in suicides worldwide, is not the record holder in Brazil.
This generation is the Boomers.

Conclusion

In this article, the idea was a quantitative exploratory analysis of data on the number of suicides.
Overall, the data show what we see in newspapers, on television and the like. In my view, most of the conclusions I reached were already predictable; I only expected a more significant number of teenagers committing suicide, which was not seen in the data.
I tried not to justify the charts with demographic, social and economic reasons, to keep the article neutral.
However, there may be several explanations for the numbers shown.
One reflection, in the form of a popular and paradoxical saying, is:

The more suicides, the fewer suicidal people

I believe that, from the above, one can quantitatively ascertain the truth of this statement.
All the code used to generate the charts, the data set, and a few more things are available on GitHub.
lmeazzini/Suicidal-analysis (github.com): Analysis of suicidal data from 1985 to 2016.