Mobile Ads Click-Through Rate (CTR) Prediction

Let’s see whether it is true.

```python
print(train.banner_pos.value_counts() / len(train))
```

Figure 16

```python
banner_pos = train.banner_pos.unique()
banner_pos.sort()
ctr_avg_list = []
for i in banner_pos:
    # Mean of the 0/1 click column for this banner position is its CTR.
    ctr_avg = train.loc[train.banner_pos == i, 'click'].mean()
    ctr_avg_list.append(ctr_avg)
    print("for banner position: {}, click through rate: {}".format(i, ctr_avg))
```

Figure 17

The important banner positions are:

position 0: 72% of the data and 0.16 CTR
position 1: 28% of the data and 0.18 CTR

```python
train.groupby(['banner_pos', 'click']).size().unstack().plot(kind='bar', figsize=(12,6), title='banner position histogram');
```

Figure 18

```python
# Impressions per banner position: every row in train is one impression.
df_banner = train[['banner_pos','click']].groupby(['banner_pos']).count().reset_index()
df_banner = df_banner.rename(columns={'click': 'impressions'})
# df_click comes from earlier in the post (assumed to be the rows of train with click == 1).
df_banner['clicks'] = df_click[['banner_pos','click']].groupby(['banner_pos']).count().reset_index()['click']
# CTR as a percentage.
df_banner['CTR'] = df_banner['clicks'] / df_banner['impressions'] * 100
sort_banners = df_banner.sort_values(by='CTR', ascending=False)['banner_pos'].tolist()

plt.figure(figsize=(12,6))
sns.barplot(y='CTR', x='banner_pos', data=df_banner, order=sort_banners)
plt.title('CTR by banner position');
```

Figure 19

Although banner position 0 has the highest number of impressions and clicks, banner position 7 enjoys the highest CTR. Increasing the number of ads placed on banner position 7 seems to be a good idea.

Device type

```python
print('The impressions by device types')
print(train.device_type.value_counts() / len(train))
```

Figure 20

```python
train[['device_type','click']].groupby(['device_type','click']).size().unstack().plot(kind='bar', title='device types');
```

Figure 21

Device type 1 gets the most impressions and clicks; the other device types get only a small share of both. We may want to look at device type 1 in more detail.

```python
df_click[df_click['device_type']==1].groupby(['hour_of_day', 'click']).size().unstack().plot(kind='bar', title="Clicks from device type 1 by hour of day", figsize=(12,6));
```

Figure 22

As expected, most clicks from device type 1 happened during business hours.

```python
# Clicks per device type (df_click holds the clicked rows only).
device_type_click = df_click.groupby('device_type').agg({'click':'sum'}).reset_index()
# Impressions per device type (all rows).
device_type_impression = train.groupby('device_type').agg({'click':'count'}).reset_index().rename(columns={'click': 'impressions'})
merged_device_type = pd.merge(left=device_type_click, right=device_type_impression, how='inner', on='device_type')
# CTR as a percentage.
merged_device_type['CTR'] = merged_device_type['click'] / merged_device_type['impressions'] * 100
merged_device_type
```

Figure 23

The highest CTR comes from device type 0.

In the same way, I explored all the other categorical features, such as the site features, the app features and the C14-C21 features. The approach is similar each time, so I will not repeat it here; the details can be found on Github.

Building Models

Introducing Hash

A hash function maps a set of objects to a set of integers: it takes a key of arbitrary length as input and outputs an integer in a specific range.

Our reduced dataset still contains 1M samples and ~2M feature values. The purpose of the hashing is to minimize the memory consumed by the features.

There is an excellent article on hashing tricks by Lucas Bernardi if you want to learn more.

Python has a built-in function called hash() that performs a hash. For the object-typed values in our data, it behaves as expected.

```python
def convert_obj_to_int(df):
    """Replace each object-dtype column with a '<name>_int' column of hash values."""
    # Collect the object columns first so the frame is not modified while scanning it.
    object_columns = [col for col in df.columns if df[col].dtype == object]
    for col in object_columns:
        df[col + '_int'] = df[col].map(hash)
        df.drop(columns=[col], inplace=True)
    return df

train = convert_obj_to_int(train)
```

LightGBM Model

[Embedded code: lightGBM_CTR]

The final output after training:

Figure 24

XGBoost Model

[Embedded code: Xgboost_CTR]

It will train until eval-logloss hasn’t improved in 20 rounds. And the final output:

Figure 25

Jupyter notebook can be found on Github. Have a great weekend!
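A note on the hashing step: for strings, Python 3 salts the built-in hash() per interpreter process unless the PYTHONHASHSEED environment variable is set, so the integer features produced by convert_obj_to_int can differ from one session to the next. Below is a minimal sketch of a process-stable alternative, assuming an MD5 digest from hashlib is acceptable for this purpose. It is not what the notebook does; stable_hash, n_buckets and the sample values are hypothetical names chosen for illustration.

```python
import hashlib


def stable_hash(value, n_buckets=2**20):
    """Map a value to an integer in [0, n_buckets), identical in every Python process.

    Hypothetical helper: n_buckets is an assumed bucket count, not a value
    taken from the original notebook.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then fold into the bucket range.
    return int.from_bytes(digest[:8], "big") % n_buckets


# Made-up category values, for illustration only.
for v in ["site_a", "app_b", "1005"]:
    print(v, stable_hash(v))
```

It could be dropped into the conversion step with something like df[col].map(stable_hash) in place of df[col].map(hash). A smaller n_buckets saves memory at the cost of more collisions between distinct category values.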
