Modeling Telecom Customer Churn with Variational Autoencoder

How to apply deep convolutional neural networks and auto-encoders for building a churn prediction model.

Photo credit: Pixabay

Susan Li, Feb 19

An autoencoder is deep learning’s answer to dimensionality reduction.

The idea is pretty simple: transform the input through a series of hidden layers but ensure that the final output layer is the same dimension as the input layer.

However, the intervening hidden layers have progressively fewer nodes (and hence reduce the dimension of the input matrix).

If the output reconstructs the input closely, then the activations of the smallest hidden layer can be taken as a valid dimension-reduced data set.
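To make the bottleneck idea concrete, here is a minimal NumPy sketch of the simplest possible case: a linear autoencoder whose optimal weights coincide with PCA directions. It is an illustration only, not the network used later in this post; the function name `linear_autoencoder` and the toy data are assumptions made for the example.

```python
import numpy as np

def linear_autoencoder(X, code_dim):
    """Encode X into code_dim dimensions and decode it back (PCA-style)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:code_dim].T            # (n_features, code_dim) encoder weights
    code = Xc @ W                  # bottleneck: the dimension-reduced data set
    recon = code @ W.T + mean      # decoder output, same shape as the input
    return code, recon

rng = np.random.default_rng(0)
# Toy data (made up): 200 rows, 6 features lying close to a 2-D subspace.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(200, 6))

code, recon = linear_autoencoder(X, code_dim=2)
mse = np.mean((X - recon) ** 2)
print(code.shape, recon.shape, mse)
```

Because the toy data is almost two-dimensional, the reconstruction error is small, which is the sense in which the two bottleneck columns can stand in for the six original ones.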

A variational autoencoder (VAE) resembles a classical autoencoder and is a neural network consisting of an encoder, a decoder and a loss function.

VAEs let us design complex generative models of data, and fit them to large data sets.

After reading an article on using convolutional networks and autoencoders to provide insights into user churn, I decided to apply a VAE to a telecom churn data set that can be downloaded from IBM Sample Data Sets.

It is a bit of overkill to apply a VAE to a relatively small data set like this, but for the sake of learning VAEs, I am going to do it anyway.

The Data

Each row represents a customer, and each column contains a customer attribute described in the column metadata.

The data set includes the following information:

- Customers who left within the last month: the column is called Churn
- Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information: how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers: gender, age range, and whether they have partners and dependents

    import collections

    import numpy as np
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn import preprocessing
    from sklearn.preprocessing import StandardScaler, MinMaxScaler
    from sklearn.model_selection import train_test_split
    # accuracy_score is needed by the prediction sections further down
    from sklearn.metrics import (confusion_matrix, precision_recall_curve, auc,
                                 roc_curve, recall_score, classification_report,
                                 f1_score, precision_recall_fscore_support,
                                 accuracy_score)
    from keras.layers import Input, Dense, Lambda
    from keras.models import Model
    from keras.objectives import binary_crossentropy
    from keras.callbacks import (Callback, EarlyStopping, LearningRateScheduler,
                                 ModelCheckpoint)
    from keras.utils.vis_utils import model_to_dot
    import keras.backend as K

    %matplotlib inline
    matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)

    df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
    df.info()

Figure 1

TotalCharges should be converted to numeric.

    df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

Most features in the data set are categorical.
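As a quick illustration of what `errors='coerce'` does, the sketch below runs the same conversion on a small made-up series (the values are hypothetical, not taken from the data set). Entries that cannot be parsed as numbers become NaN instead of raising an error, which is why a fill step is needed later.

```python
import pandas as pd

# Hypothetical stand-in for the TotalCharges column: numeric strings plus a blank entry.
raw = pd.Series(['29.85', '1889.5', ' ', '108.15'])

# Unparseable entries are turned into NaN rather than raising a ValueError.
converted = pd.to_numeric(raw, errors='coerce')

print(converted.dtype)
print(converted.isna().sum())
```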

We are going to visualize them first then create dummy variables.

Visualize and Analyze Categorical Features

Gender

    gender_plot = df.groupby(['gender', 'Churn']).size().reset_index().pivot(columns='Churn', index='gender', values=0)
    gender_plot.plot(x=gender_plot.index, kind='bar', stacked=True);
    print('Gender', collections.Counter(df['gender']))

Figure 2

Gender does not seem to have an effect on the churn.

Partner

    partner_plot = df.groupby(['Partner', 'Churn']).size().reset_index().pivot(columns='Churn', index='Partner', values=0)
    partner_plot.plot(x=partner_plot.index, kind='bar', stacked=True);
    print('Partner', collections.Counter(df['Partner']))

Figure 3

Whether the customer has a partner or not does seem to have some effect on the churn.

Dependents

    dependents_plot = df.groupby(['Dependents', 'Churn']).size().reset_index().pivot(columns='Churn', index='Dependents', values=0)
    dependents_plot.plot(x=dependents_plot.index, kind='bar', stacked=True);
    print('Dependents', collections.Counter(df['Dependents']))

Figure 4

Customers who have no dependents are more likely to churn than customers who have dependents.

PhoneService

    phoneservice_plot = df.groupby(['PhoneService', 'Churn']).size().reset_index().pivot(columns='Churn', index='PhoneService', values=0)
    phoneservice_plot.plot(x=phoneservice_plot.index, kind='bar', stacked=True);
    print('PhoneService', collections.Counter(df['PhoneService']))

Figure 5

Not many customers went without phone service, and whether a customer has phone service or not does not seem to have an effect on the churn.

MultipleLines

    multiplelines_plot = df.groupby(['MultipleLines', 'Churn']).size().reset_index().pivot(columns='Churn', index='MultipleLines', values=0)
    multiplelines_plot.plot(x=multiplelines_plot.index, kind='bar', stacked=True);
    print('MultipleLines', collections.Counter(df['MultipleLines']))

Figure 6

Whether a customer signed up for MultipleLines or not does not seem to have an effect on the churn.

InternetService

    internetservice_plot = df.groupby(['InternetService', 'Churn']).size().reset_index().pivot(columns='Churn', index='InternetService', values=0)
    internetservice_plot.plot(x=internetservice_plot.index, kind='bar', stacked=True);
    print('InternetService', collections.Counter(df['InternetService']))

Figure 7

It seems customers who signed up for Fiber optic are most likely to churn; almost 50% of them churned.

OnlineSecurity

    onlinesecurity_plot = df.groupby(['OnlineSecurity', 'Churn']).size().reset_index().pivot(columns='Churn', index='OnlineSecurity', values=0)
    onlinesecurity_plot.plot(x=onlinesecurity_plot.index, kind='bar', stacked=True);
    print('OnlineSecurity', collections.Counter(df['OnlineSecurity']))

Figure 8

Customers who did not sign up for OnlineSecurity are most likely to churn.

OnlineBackup

    onlinebackup_plot = df.groupby(['OnlineBackup', 'Churn']).size().reset_index().pivot(columns='Churn', index='OnlineBackup', values=0)
    onlinebackup_plot.plot(x=onlinebackup_plot.index, kind='bar', stacked=True);
    print('OnlineBackup', collections.Counter(df['OnlineBackup']))

Figure 9

Customers who did not sign up for OnlineBackup are most likely to churn.

DeviceProtection

    deviceprotection_plot = df.groupby(['DeviceProtection', 'Churn']).size().reset_index().pivot(columns='Churn', index='DeviceProtection', values=0)
    deviceprotection_plot.plot(x=deviceprotection_plot.index, kind='bar', stacked=True);
    print('DeviceProtection', collections.Counter(df['DeviceProtection']))

Figure 10

Customers who did not sign up for DeviceProtection are most likely to churn.

TechSupport

    techsupport_plot = df.groupby(['TechSupport', 'Churn']).size().reset_index().pivot(columns='Churn', index='TechSupport', values=0)
    techsupport_plot.plot(x=techsupport_plot.index, kind='bar', stacked=True);
    print('TechSupport', collections.Counter(df['TechSupport']))

Figure 11

Customers who did not sign up for TechSupport are most likely to churn.

StreamingTV

    streamingtv_plot = df.groupby(['StreamingTV', 'Churn']).size().reset_index().pivot(columns='Churn', index='StreamingTV', values=0)
    streamingtv_plot.plot(x=streamingtv_plot.index, kind='bar', stacked=True);
    print('StreamingTV', collections.Counter(df['StreamingTV']))

Figure 12

StreamingMovies

    streamingmovies_plot = df.groupby(['StreamingMovies', 'Churn']).size().reset_index().pivot(columns='Churn', index='StreamingMovies', values=0)
    streamingmovies_plot.plot(x=streamingmovies_plot.index, kind='bar', stacked=True);
    print('StreamingMovies', collections.Counter(df['StreamingMovies']))

Figure 13

From the above seven plots, we can see that customers without internet service have a very low churn rate.

Contract

    contract_plot = df.groupby(['Contract', 'Churn']).size().reset_index().pivot(columns='Churn', index='Contract', values=0)
    contract_plot.plot(x=contract_plot.index, kind='bar', stacked=True);
    print('Contract', collections.Counter(df['Contract']))

Figure 14

It is obvious that contract term does have an effect on churn.

Very few customers on a two-year contract churned.

Most churn occurred among customers with a month-to-month contract.

PaperlessBilling

    paperlessbilling_plot = df.groupby(['PaperlessBilling', 'Churn']).size().reset_index().pivot(columns='Churn', index='PaperlessBilling', values=0)
    paperlessbilling_plot.plot(x=paperlessbilling_plot.index, kind='bar', stacked=True);
    print('PaperlessBilling', collections.Counter(df['PaperlessBilling']))

Figure 15

PaymentMethod

    paymentmethod_plot = df.groupby(['PaymentMethod', 'Churn']).size().reset_index().pivot(columns='Churn', index='PaymentMethod', values=0)
    paymentmethod_plot.plot(x=paymentmethod_plot.index, kind='bar', stacked=True);
    print('PaymentMethod', collections.Counter(df['PaymentMethod']))

Figure 16

PaymentMethod does seem to have an effect on churn; in particular, customers who pay by electronic check have the highest churn rate.

SeniorCitizen

    seniorcitizen_plot = df.groupby(['SeniorCitizen', 'Churn']).size().reset_index().pivot(columns='Churn', index='SeniorCitizen', values=0)
    seniorcitizen_plot.plot(x=seniorcitizen_plot.index, kind='bar', stacked=True);
    print('SeniorCitizen', collections.Counter(df['SeniorCitizen']))

Figure 17

We do not have many senior citizens in the data.

It seems that whether customers are senior citizens or not does not have an effect on the churn rate.
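The stacked bar charts above show counts, which makes it hard to compare churn rates between groups of different sizes. One possible complement, sketched here under the assumption that Churn still holds the strings 'Yes' and 'No', is a small helper that returns the churn share per category. The helper name `churn_rate_by` and the toy rows are made up for the example.

```python
import pandas as pd

def churn_rate_by(df, feature, target='Churn'):
    """Return the share of churned customers within each level of `feature`."""
    # normalize='index' turns each row of the contingency table into proportions.
    table = pd.crosstab(df[feature], df[target], normalize='index')
    return table['Yes'].sort_values(ascending=False)

# Toy rows (made up) standing in for the real data frame.
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'Two year',
                 'Two year', 'One year', 'Month-to-month'],
    'Churn': ['Yes', 'No', 'No', 'No', 'No', 'Yes'],
})

rates = churn_rate_by(toy, 'Contract')
print(rates)
```

On the real data the same call could be looped over the categorical columns to rank features by how strongly churn varies across their levels.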

Explore Numeric Features

Tenure

    sns.kdeplot(df['tenure'].loc[df['Churn'] == 'No'], label='not churn', shade=True);
    sns.kdeplot(df['tenure'].loc[df['Churn'] == 'Yes'], label='churn', shade=True);

Figure 18

    df['tenure'].loc[df['Churn'] == 'No'].describe()

Figure 19

    df['tenure'].loc[df['Churn'] == 'Yes'].describe()

Figure 20

Customers who did not churn have a much longer average tenure (about 20 months longer) than the churned customers.

Monthly Charges

    sns.kdeplot(df['MonthlyCharges'].loc[df['Churn'] == 'No'], label='not churn', shade=True);
    sns.kdeplot(df['MonthlyCharges'].loc[df['Churn'] == 'Yes'], label='churn', shade=True);

Figure 21

    df['MonthlyCharges'].loc[df['Churn'] == 'No'].describe()

Figure 22

    df['MonthlyCharges'].loc[df['Churn'] == 'Yes'].describe()

Figure 23

On average, churned customers paid a monthly fee over 20% higher than that of not-churned customers.

TotalCharges

    sns.kdeplot(df['TotalCharges'].loc[df['Churn'] == 'No'], label='not churn', shade=True);
    sns.kdeplot(df['TotalCharges'].loc[df['Churn'] == 'Yes'], label='churn', shade=True);

Figure 24

Data Pre-processing

Encode the Churn label as 0 or 1.

    le = preprocessing.LabelEncoder()
    df['Churn'] = le.fit_transform(df.Churn.values)

Fill NaN values with the mean of the column.

    # numeric_only=True restricts the means to the numeric columns
    df = df.fillna(df.mean(numeric_only=True))

Encode categorical features.

    categorical = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
    for f in categorical:
        dummies = pd.get_dummies(df[f], prefix=f, prefix_sep='_')
        df = pd.concat([df, dummies], axis=1)

    # drop original categorical features
    df.drop(categorical, axis=1, inplace=True)

Split the data into train, validation and test sets, and create batches to send through our network.
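The splitting code itself is not reproduced in the text here, so the following is only a sketch of one common way to do it with scikit-learn, run on synthetic data. The 60/20/20 proportions, the random seed, the stratification and the choice of MinMaxScaler are all assumptions, not necessarily what the original notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
# Synthetic stand-ins for the encoded feature matrix and the Churn label.
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

# First carve off 40% of the rows, then halve that into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)

# Fit the scaler on the training rows only, then apply it to every split,
# so no information from validation or test data leaks into training.
scaler = MinMaxScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(a) for a in (X_train, X_val, X_test))

print(X_train.shape, X_val.shape, X_test.shape)
```

Scaling features into [0, 1] also suits a binary cross-entropy reconstruction loss, which expects targets in that range.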

[Code listing: autoencoder_preprocessing.py]

VAE Implementation in Keras

The following code scripts were largely taken from Agustinus Kristiadi’s blog post: Variational Autoencoder: Intuition and Implementation.

The implementation follows these steps:

- Define the input layer.
- Define the encoder layer.
- Define the encoder model, which encodes the input into the latent variable. We use the mean as the output, as it is the center point, the representative of the Gaussian.
- Sample from the output of the 2 dense layers.
- Define the decoder layer in the VAE model.
- Define the overall VAE model, for reconstruction and training.
- Define the generator model, which generates new data given the latent variable z.
- Translate our loss into Keras code.
- Start training.
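Two of the steps above, sampling from the two dense outputs and the loss, are where a VAE differs from a plain autoencoder. The sketch below writes them out in NumPy rather than Keras so they can be checked in isolation. It uses the standard VAE formulation with a mean and a log-variance and is not the post's actual Keras code; the names `sample_z` and `vae_loss` are chosen for this example.

```python
import numpy as np

def sample_z(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var, eps=1e-7):
    """Per-sample loss: binary cross-entropy reconstruction term plus KL term."""
    x_recon = np.clip(x_recon, eps, 1.0 - eps)  # avoid log(0)
    recon = -np.sum(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon), axis=1)
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return recon + kl

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))        # the two dense outputs for a batch of 4 rows:
log_var = np.zeros((4, 2))   # latent mean and latent log-variance
z = sample_z(mu, log_var, rng)

x = rng.uniform(size=(4, 5))           # inputs scaled to [0, 1]
base = vae_loss(x, x, mu, log_var)     # KL term is zero when mu = 0, log_var = 0
shifted = vae_loss(x, x, mu + 1.0, log_var)
print(z.shape, shifted - base)
```

Moving the latent mean away from zero raises the loss through the KL term alone, which is what pulls the encoded customers toward a compact region of the latent space.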

[Code listing: VAE.py]

The model stopped training after 55 epochs with a batch size of 100 samples.

Evaluation

    plt.plot(vae_history.history['loss'])
    plt.plot(vae_history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show();

Figure 25

From the above loss plot, we can see that the model has comparable performance on both train and validation data sets, and it seems to converge nicely at the end.

We use reconstruction error to measure how well the decoder is performing.

Autoencoders are trained to reduce reconstruction error, which we show below:

    x_train_encoded = encoder.predict(X_train)
    pred_train = decoder.predict(x_train_encoded)
    mse = np.mean(np.power(X_train - pred_train, 2), axis=1)
    error_df = pd.DataFrame({'recon_error': mse, 'churn': y_train})
    plt.figure(figsize=(10,6))
    sns.kdeplot(error_df.recon_error[error_df.churn==0], label='not churn', shade=True, clip=(0,10))
    sns.kdeplot(error_df.recon_error[error_df.churn==1], label='churn', shade=True, clip=(0,10))
    plt.xlabel('reconstruction error');
    plt.title('Reconstruction error - Train set');

Figure 26

    x_val_encoded = encoder.predict(X_val)
    pred = decoder.predict(x_val_encoded)
    mseV = np.mean(np.power(X_val - pred, 2), axis=1)
    error_df = pd.DataFrame({'recon_error': mseV, 'churn': y_val})
    plt.figure(figsize=(10,6))
    sns.kdeplot(error_df.recon_error[error_df.churn==0], label='not churn', shade=True, clip=(0,10))
    sns.kdeplot(error_df.recon_error[error_df.churn==1], label='churn', shade=True, clip=(0,10))
    plt.xlabel('reconstruction error');
    plt.title('Reconstruction error - Validation set');

Figure 27

Latent Space Visualization

We can plot customers in the 2D latent space and color them by churned and not-churned; if the two groups are separable in the latent space, they will show up as distinct clusters.

    x_train_encoded = encoder.predict(X_train)
    plt.scatter(x_train_encoded[:, 0], x_train_encoded[:, 1], c=y_train, alpha=0.6)
    plt.title('Train set in latent space')
    plt.show();

Figure 28

    x_val_encoded = encoder.predict(X_val)
    plt.scatter(x_val_encoded[:, 0], x_val_encoded[:, 1], c=y_val, alpha=0.6)
    plt.title('Validation set in latent space')
    plt.show();

Figure 29

Prediction on the validation set

    x_val_encoded = encoder.predict(X_val)
    fpr, tpr, thresholds = roc_curve(y_val, clf.predict(x_val_encoded))
    roc_auc = auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, label='AUC = %0.4f'% roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0,1],[0,1],'r--')
    plt.xlim([-0.001, 1])
    plt.ylim([0, 1.001])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show();

Figure 29

    print('Accuracy:')
    print(accuracy_score(y_val, clf.predict(x_val_encoded)))
    print("Confusion Matrix:")
    print(confusion_matrix(y_val,clf.predict(x_val_encoded)))
    print("Classification Report:")
    print(classification_report(y_val,clf.predict(x_val_encoded)))

Figure 30

Prediction on the test set

    x_test_encoded = encoder.predict(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, clf.predict(x_test_encoded))
    roc_auc = auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, label='AUC = %0.4f'% roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0,1],[0,1],'r--')
    plt.xlim([-0.001, 1])
    plt.ylim([0, 1.001])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show();

Figure 31

    print('Accuracy:')
    print(accuracy_score(y_test, clf.predict(x_test_encoded)))
    print("Confusion Matrix:")
    print(confusion_matrix(y_test,clf.predict(x_test_encoded)))
    print("Classification Report:")
    print(classification_report(y_test,clf.predict(x_test_encoded)))

Figure 32

That was it! The Jupyter notebook can be found on Github.
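A note on the prediction code above: it calls a classifier `clf` whose definition is not shown in this post. The sketch below is a hypothetical stand-in, assuming a scikit-learn logistic regression fitted on the 2D latent codes; the synthetic arrays replace `encoder.predict(...)` so the example runs on its own. It also passes predicted probabilities to `roc_curve` rather than hard 0/1 labels, which gives the curve more than a single threshold to work with.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
# Synthetic stand-ins for the labels and for encoder.predict(X_train) / encoder.predict(X_val).
y_train = rng.integers(0, 2, size=400)
y_val = rng.integers(0, 2, size=200)
z_train = rng.normal(size=(400, 2)) + 1.5 * y_train[:, None]
z_val = rng.normal(size=(200, 2)) + 1.5 * y_val[:, None]

# Hypothetical choice of classifier; the original notebook may use a different one.
clf = LogisticRegression().fit(z_train, y_train)

# Use the probability of the positive class as the score for the ROC curve.
scores = clf.predict_proba(z_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, scores)
roc_auc = auc(fpr, tpr)
print('AUC = %0.4f' % roc_auc)
```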

Happy Monday!