Building an End-To-End Data Science Project

Learnings from my Data Scientist Ideal Profiles project

It is often said that most of a Data Scientist's work is not the actual analysis and modeling but the data wrangling and cleaning. As a result, full-cycle data science projects that include these stages are more valuable, since they prove the author's ability to work independently with real data rather than with a given, already-cleaned dataset.

Fully understanding the value of an end-to-end data science project, I had always wanted to build one but was never able to, until now :)

I have recently finished my Ideal Profiles project. Since it is a major project with many moving parts, I want to document the process and the lessons learned, which is a further learning opportunity (inspired by William Koehrsen's great post on the value of data science writing).

Stages

In my opinion, a full-cycle data science project should include the following stages:

The biggest argument against working on a Kaggle project is often that it focuses only on the second stage. Therefore, in this project, I made sure that all three stages are covered.

For the first stage, I did web scraping to get the data, and since the data was dirty, I had to wrangle it to make it ready for analysis.

The functions are then imported and called from the Notebook as needed, like this:

    from scrape_data import *
    from process_text import *
    from helper import *

Reproducibility

As many of the scraping scripts I found online didn't work, I was determined to make sure that my project is reproducible.
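The post does not show its scraping code at this point, so the following is only a minimal sketch of one common way to make a scraper reproducible, under assumptions of its own: each raw page is saved to disk the first time it is downloaded, so re-running the Notebook re-parses identical HTML even if the website later changes or goes offline. The names `cached_fetch`, `http_fetch`, and the `raw_pages` folder are hypothetical and are not taken from the project's `scrape_data` module.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import Request, urlopen

# Hypothetical cache folder name, not taken from the original project.
CACHE_DIR = Path("raw_pages")


def http_fetch(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode it using the charset the server reports."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def cached_fetch(
    url: str,
    fetch: Callable[[str], str] = http_fetch,
    cache_dir: Path = CACHE_DIR,
) -> str:
    """Return the page for `url`, hitting the network only on the first call.

    Each raw page is saved under a SHA-256 hash of its URL, so later runs
    re-parse exactly the same HTML even if the live site has changed.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        # Cache hit: reuse the stored copy instead of downloading again.
        return path.read_text(encoding="utf-8")
    body = fetch(url)
    # Only successful downloads reach this line, so failures are never cached.
    path.write_text(body, encoding="utf-8")
    return body
```

In a Notebook this would be called as `html = cached_fetch(url)`, and deleting the cache folder forces a fresh download. Passing `fetch` as a parameter also lets the caching logic be tested offline with a stub instead of a live website.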
