Simple Ways To Extract Features From Date Variable Using Python

Swetha Lakshmanan, Apr 30

Date variables are a special type of categorical variable.

At first glance, a date gives us nothing more than a specific point on a timeline, but when pre-processed properly, date variables can greatly enrich a dataset.

Common date formats contain numbers and sometimes text as well to specify months and days.

Getting dates into a friendly format and extracting features of dates into new variables can be useful preprocessing steps.

For example, from a date variable you can extract basic features such as:

Month
Quarter
Semester
Year
Day
Day of the week
Weekend or not
Time difference in years / months / days, etc.

Let’s go ahead and try it out.

Dataset:

I will use a dataset from a recent competition on the Analytics Vidhya website, the Loan Default Challenge.

You can download the dataset from here.

Now, let’s take the date variable ‘Date.of.Birth’ from the dataset and work on it.

Importing Packages and dataset:

    import pandas as pd
    import numpy as np
    import datetime

    # read only the two date columns and the first 100 rows
    cols = ['Date.of.Birth', 'DisbursalDate']
    data = pd.read_csv('train.csv', usecols=cols, nrows=100)

Conversion to DateTime Datatype:

The first step is to convert the data type of the column to DateTime. That can be done using the pd.to_datetime() function in pandas.
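As a minimal sketch of the call, the snippet below parses a few made-up strings. The sample values and the day-month-year layout are assumptions for illustration only and are not taken from the competition data; `format` and `errors` are optional parameters of `pd.to_datetime`.

```python
import pandas as pd

# Made-up date strings in day-month-year order; the last one is invalid on purpose
raw = pd.Series(["25-12-1990", "01-02-1984", "not a date"])

# format= states the layout explicitly, so 01-02-1984 is read as 1 February
# rather than 2 January; errors="coerce" turns unparseable values into NaT
parsed = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # number of values that failed to parse
```

Stating the format is worth considering whenever the day and month could be confused. On the loan dataset, the same function is applied to each date column in turn.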

    data['Date.of.Birth'] = pd.to_datetime(data['Date.of.Birth'])
    data['DisbursalDate'] = pd.to_datetime(data['DisbursalDate'])
    data.dtypes

Series.dt:

Series.dt is an accessor that exposes the datetime properties of the values in a datetime series. Each property is returned as a new pandas Series with the same index as the original.

Some of its attributes are:

pandas.Series.dt.year returns the year of the datetime.
pandas.Series.dt.month returns the month of the datetime.
pandas.Series.dt.day returns the day of the datetime.
pandas.Series.dt.quarter returns the quarter of the datetime.
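To see these attributes side by side before applying them to the dataset, here is a small sketch on made-up dates. The values and the names `dates` and `parts` are illustrative assumptions.

```python
import pandas as pd

# Three made-up dates, already converted to a datetime Series
dates = pd.Series(pd.to_datetime(["1990-12-25", "2001-02-03", "2018-08-31"]))

# Each .dt attribute is computed element-wise over the whole Series
parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day": dates.dt.day,
    "quarter": dates.dt.quarter,
})

print(parts)
print(type(dates.dt.year))  # each attribute comes back as a pandas Series
```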

Month from Date:

    data['month'] = data['Date.of.Birth'].dt.month
    data[['Date.of.Birth', 'month']].head()

Day from Date:

    data['day'] = data['Date.of.Birth'].dt.day
    data[['Date.of.Birth', 'day']].head()

Year from Date:

    data['year'] = data['Date.of.Birth'].dt.year
    data[['Date.of.Birth', 'year']].head()

Quarter from Date:

    data['quarter'] = data['Date.of.Birth'].dt.quarter
    data[['Date.of.Birth', 'quarter']].head()

Semester from Date:

We can calculate the semester by applying a simple condition with the isin function on the quarter variable:

Quarters 1 and 2 become Semester 1.
The rest, Quarters 3 and 4, become Semester 2.

    data['semester'] = np.where(data.quarter.isin([1, 2]), 1, 2)
    data[['Date.of.Birth', 'semester']].head()

Day of the week from Date:

The Series.dt.dayofweek attribute returns the day of the week of each value in the DateTime variable as an integer, with Monday = 0 and Sunday = 6.

    data['dayofweek'] = data['Date.of.Birth'].dt.dayofweek
    data[['Date.of.Birth', 'dayofweek']].head()

Similarly, the Series.dt.day_name() method returns the name of the day of the week. It replaces the older Series.dt.weekday_name attribute, which was removed in pandas 1.0.

    # day_name() replaces the weekday_name attribute, which was removed in pandas 1.0
    data['dayofweek_name'] = data['Date.of.Birth'].dt.day_name()
    data[['Date.of.Birth', 'dayofweek_name']].head()

Whether the day is a weekend or not:

We can determine whether a day falls on a weekend by again using the isin function, assigning 1 to ‘Saturday’ and ‘Sunday’ and 0 to the rest of the days.

    data['is_weekend'] = np.where(data['dayofweek_name'].isin(['Sunday', 'Saturday']), 1, 0)
    data[['Date.of.Birth', 'is_weekend']].head()

Difference between two dates:

The datetime.datetime.today function returns the current local date and time. Therefore, in the example below, the current date minus the date of birth gives us the age of the applicant, expressed as a time difference in days.

    (datetime.datetime.today() - data['Date.of.Birth']).head()

Similarly, if the variable had a time along with the date, we could calculate the time difference in hours as well.
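The time difference in years, months, days and hours can be sketched as follows. The values below are made up, and a fixed second column is used instead of today's date so that the result is reproducible. The column names `diff_days`, `diff_hours`, `diff_months` and `age_years` are hypothetical, and counting only completed calendar months is one possible convention, not the only one.

```python
import pandas as pd

# Made-up values standing in for the two date columns
df = pd.DataFrame({
    "Date.of.Birth": pd.to_datetime(["1984-01-01", "1990-06-15"]),
    "DisbursalDate": pd.to_datetime(["2018-08-03 14:30", "2018-09-26 09:00"]),
})

# Subtracting two datetime columns gives a Series of Timedelta values
diff = df["DisbursalDate"] - df["Date.of.Birth"]

df["diff_days"] = diff.dt.days                      # whole days
df["diff_hours"] = diff.dt.total_seconds() / 3600   # hours, including the time part

# Months and years are counted on the calendar, because a Timedelta
# has no fixed notion of how long a month or a year is
months = (
    (df["DisbursalDate"].dt.year - df["Date.of.Birth"].dt.year) * 12
    + (df["DisbursalDate"].dt.month - df["Date.of.Birth"].dt.month)
)
# subtract one month when the day of the month has not been reached yet
not_reached = df["DisbursalDate"].dt.day < df["Date.of.Birth"].dt.day
df["diff_months"] = months - not_reached.astype(int)
df["age_years"] = df["diff_months"] // 12           # completed years

print(df)
```

Replacing the `DisbursalDate` column with `pd.Timestamp.today()` would give the age as of the day the code is run.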

Apart from these, the Series.dt accessor has other attributes that you can try out:

pandas.Series.dt.hour
pandas.Series.dt.second
pandas.Series.dt.minute
pandas.Series.dt.dayofyear
pandas.Series.dt.date
pandas.Series.dt.time
pandas.Series.dt.freq
pandas.Series.dt.weekofyear (removed in pandas 2.0; use pandas.Series.dt.isocalendar().week instead)

Since no machine learning algorithm can look at a datetime object and automatically infer the above, simple steps like these open up the possibility of insight into a dataset.
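The individual steps in this article can also be collected into one helper. This is only a sketch: `add_date_features` and its `prefix` parameter are hypothetical names and not part of pandas, the toy column `dob` is made up, and the weekend flag is derived from the numeric day of the week rather than from the day name.

```python
import numpy as np
import pandas as pd

def add_date_features(df, col, prefix=None):
    """Return a copy of df with calendar features derived from df[col]."""
    prefix = prefix or col
    out = df.copy()
    d = out[col].dt  # datetime accessor of the source column
    out[f"{prefix}_year"] = d.year
    out[f"{prefix}_month"] = d.month
    out[f"{prefix}_day"] = d.day
    out[f"{prefix}_quarter"] = d.quarter
    out[f"{prefix}_semester"] = np.where(d.quarter.isin([1, 2]), 1, 2)
    out[f"{prefix}_dayofweek"] = d.dayofweek  # Monday = 0, Sunday = 6
    out[f"{prefix}_is_weekend"] = (d.dayofweek >= 5).astype(int)
    out[f"{prefix}_dayofyear"] = d.dayofyear
    return out

# Made-up dates of birth for illustration
toy = pd.DataFrame({"dob": pd.to_datetime(["1990-12-25", "2001-02-03"])})
features = add_date_features(toy, "dob")
print(features.T)
```

Returning a copy leaves the input frame untouched, so the helper can be called once per date column without side effects.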

I hope this article was helpful to you in some way.

Happy learning :).
