# Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)

Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)Benjamin LauBlockedUnblockFollowFollowingJan 12Machine Learning — Andrew NgThis is the last part of Andrew Ng’s Machine Learning Course python implementation and I am very excited to finally complete the series.

To give you guys some perspective, it took me a month to convert these codes to python and writes an article for each assignment.

If any of you were hesitating to do your own implementation, be it in Python, R or Java, I strongly recommend you to go for it.

Coding these algorithms from scratch not only reinforce the concepts taught, you will also get to practice your data science programming skills in the language you are comfortable with.

With that said, let’s dive into the last programming assignmentIn this part of the assignment, we will implement an anomaly detection algorithm using the Gaussian model to detect anomalous behavior in a 2D dataset first and then a high-dimensional dataset.

pyplot as pltfrom scipy.

mat")X = mat["X"]Xval = mat["Xval"]yval = mat["yval"]Visualizing the dataplt.

scatter(X[:,0],X[:,1],marker="x")plt.

xlim(0,30)plt.

ylim(0,30)plt.

xlabel("Latency (ms)")plt.

ylabel("Throughput (mb/s)")To estimate parameters (mean and variance) for the Gaussian modeldef estimateGaussian(X): """ This function estimates the parameters of a Gaussian distribution using the data in X """ m = X.

shape[0] #compute mean sum_ = np.

sum(X,axis=0) mu = 1/m *sum_ # compute variance var = 1/m * np.

sum((X – mu)**2,axis=0) return mu,varmu, sigma2 = estimateGaussian(X)Multivariate Gaussian Distribution is an optional lecture in the course and the code to compute the probability density is given to us.

However, in order for me to proceed on with the assignment, I need to write the multivariateGaussian function from scratch.

def multivariateGaussian(X, mu, sigma2): """ Computes the probability density function of the multivariate gaussian distribution.

""" k = len(mu) sigma2=np.

diag(sigma2) X = X – mu.

T p = 1/((2*np.

pi)**(k/2)*(np.

linalg.

det(sigma2)**0.

5))* np.

exp(-0.

5* np.

sum(X @ np.

linalg.

pinv(sigma2) * X,axis=1)) return pp = multivariateGaussian(X, mu, sigma2)Some of the interesting functions we had utilized here are from numpy linear algebra class.

The official documentation can be found here.

Once we estimate the Gaussian parameters and obtain the probability density of the data, we can visualize the fit.

plt.

figure(figsize=(8,6))plt.

scatter(X[:,0],X[:,1],marker="x")X1,X2 = np.

meshgrid(np.

linspace(0,35,num=70),np.

linspace(0,35,num=70))p2 = multivariateGaussian(np.

hstack((X1.

flatten()[:,np.

newaxis],X2.

flatten()[:,np.

newaxis])), mu, sigma2)contour_level = 10**np.

array([np.

arange(-20,0,3,dtype=np.

float)]).

Tplt.

contour(X1,X2,p2[:,np.

newaxis].

reshape(X1.

shape),contour_level)plt.

xlim(0,35)plt.

ylim(0,35)plt.

xlabel("Latency (ms)")plt.

ylabel("Throughput (mb/s)")I had not explained the process of creating a contour plot before as most are quite straight forward.

In simpler terms, we first create a meshgrid around the data region and compute the Z-axis.

plt.

contour then create the contour plot using the 3 axes (X, Y, Z).

Now to select a threshold that will flag an example as anomalies.

def selectThreshold(yval, pval): """ Find the best threshold (epsilon) to use for selecting outliers """ best_epi = 0 best_F1 = 0 stepsize = (max(pval) -min(pval))/1000 epi_range = np.

arange(pval.

min(),pval.

max(),stepsize) for epi in epi_range: predictions = (pval<epi)[:,np.

newaxis] tp = np.

sum(predictions[yval==1]==1) fp = np.

sum(predictions[yval==0]==1) fn = np.

sum(predictions[yval==1]==0) # compute precision, recall and F1 prec = tp/(tp+fp) rec = tp/(tp+fn) F1 = (2*prec*rec)/(prec+rec) if F1 > best_F1: best_F1 =F1 best_epi = epi return best_epi, best_F1pval = multivariateGaussian(Xval, mu, sigma2)epsilon, F1 = selectThreshold(yval, pval)print("Best epsilon found using cross-validation:",epsilon)print("Best F1 on Cross Validation Set:",F1)In case you have not noticed, F1 score is used here instead of accuracy as the dataset is highly unbalanced.

Visualizing the optimal thresholdplt.

figure(figsize=(8,6))# plot the dataplt.

scatter(X[:,0],X[:,1],marker="x")# potting of contourX1,X2 = np.

meshgrid(np.

linspace(0,35,num=70),np.

linspace(0,35,num=70))p2 = multivariateGaussian(np.

hstack((X1.

flatten()[:,np.

newaxis],X2.

flatten()[:,np.

newaxis])), mu, sigma2)contour_level = 10**np.

array([np.

arange(-20,0,3,dtype=np.

float)]).

Tplt.

contour(X1,X2,p2[:,np.

newaxis].

reshape(X1.

shape),contour_level)# Circling of anomaliesoutliers = np.

nonzero(p<epsilon)[0]plt.

scatter(X[outliers,0],X[outliers,1],marker ="o",facecolor="none",edgecolor="r",s=70)plt.

xlim(0,35)plt.

ylim(0,35)plt.

xlabel("Latency (ms)")plt.

ylabel("Throughput (mb/s)")As for high dimensional dataset, we just have to follow the exact same steps as beforemat2 = loadmat("ex8data2.

mat")X2 = mat2["X"]Xval2 = mat2["Xval"]yval2 = mat2["yval"]# compute the mean and variancemu2, sigma2_2 = estimateGaussian(X2)# Training setp3 = multivariateGaussian(X2, mu2, sigma2_2)# cross-validation setpval2 = multivariateGaussian(Xval2, mu2, sigma2_2)# Find the best thresholdepsilon2, F1_2 = selectThreshold(yval2, pval2)print("Best epsilon found using cross-validation:",epsilon2)print("Best F1 on Cross Validation Set:",F1_2)print("# Outliers found:",np.

sum(p3<epsilon2))The second part of the assignment involved implementing a collaborative filtering algorithm to build a recommender system for movie ratings.

mat")Y = mat3["Y"] # 1682 X 943 matrix, containing ratings (1-5) of 1682 movies on 943 userR = mat3["R"] # 1682 X 943 matrix, where R(i,j) = 1 if and only if user j give rating to movie iX = mat4["X"] # 1682 X 10 matrix , num_movies X num_features matrix of movie featuresTheta = mat4["Theta"] # 943 X 10 matrix, num_users X num_features matrix of user features# Compute average rating print("Average rating for movie 1 (Toy Story):",np.

sum(Y[0,:]*R[0,:])/np.

sum(R[0,:]),"/5")The print statement will print: Average rating for movie 1 (Toy Story): 3.

8783185840707963 /5plt.

figure(figsize=(8,16))plt.

imshow(Y)plt.

xlabel("Users")plt.

ylabel("Movies")Going into the algorithm proper, we start with computing the cost function and gradientdef cofiCostFunc(params, Y, R, num_users, num_movies, num_features, Lambda): """ Returns the cost and gradient for the collaborative filtering problem """ # Unfold the params X = params[:num_movies*num_features].

reshape(num_movies,num_features) Theta = params[num_movies*num_features:].

reshape(num_users,num_features) predictions = X @ Theta.

T err = (predictions – Y) J = 1/2 * np.

sum((err**2) * R) #compute regularized cost function reg_X = Lambda/2 * np.

sum(Theta**2) reg_Theta = Lambda/2 *np.

sum(X**2) reg_J = J + reg_X + reg_Theta # Compute gradient X_grad = err*R @ Theta Theta_grad = (err*R).

T @ X grad = np.

flatten()) return J, grad, reg_J, reg_gradSimilar to the previous approach, the assignment requires us to compute the cost function, gradient, regularized cost function and then regularized gradient in separate steps.

The code block above will allows you to follow the assignment step by step as long as you use the correct indexing.

To test our cost function,# Reduce the data set size to run fasternum_users, num_movies, num_features = 4,5,3X_test = X[:num_movies,:num_features]Theta_test= Theta[:num_users,:num_features]Y_test = Y[:num_movies,:num_users]R_test = R[:num_movies,:num_users]params = np.

append(X_test.

flatten(),Theta_test.

flatten())# Evaluate cost functionJ, grad = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 0)[:2]print("Cost at loaded parameters:",J)J2, grad2 = cofiCostFunc(params, Y_test, R_test, num_users, num_movies, num_features, 1.

5)[2:]print("Cost at loaded parameters (lambda = 1.

5):",J2)Once we get our cost function and gradient going, we can start training our algorithm.

txt","r").

split(".")[:-1]# see movie listnp.

set_printoptions(threshold=np.

nan)movieListYou can enter your own movie preference at this step but I used the exact same ratings as the assignment to keep it consistent.

# Initialize my ratingsmy_ratings = np.

zeros((1682,1))# Create own ratingsmy_ratings[0] = 4 my_ratings[97] = 2my_ratings[6] = 3my_ratings[11]= 5my_ratings[53] = 4my_ratings[63]= 5my_ratings[65]= 3my_ratings[68] = 5my_ratings[82]= 4my_ratings[225] = 5my_ratings[354]= 5print("New user ratings:.")for i in range(len(my_ratings)): if my_ratings[i]>0: print("Rated",int(my_ratings[i]),"for index",movieList[i])To prepare our data before inputting into the algorithm, we need to normalize the ratings, set some random initial parameters, and use an optimizing algorithm to update the parameters.

def normalizeRatings(Y, R): """ normalized Y so that each movie has a rating of 0 on average, and returns the mean rating in Ymean.

""" m,n = Y.

shape[0], Y.

shape[1] Ymean = np.

zeros((m,1)) Ynorm = np.

zeros((m,n)) for i in range(m): Ymean[i] = np.

sum(Y[i,:])/np.

count_nonzero(R[i,:]) Ynorm[i,R[i,:]==1] = Y[i,R[i,:]==1] – Ymean[i] return Ynorm, Ymeandef gradientDescent(initial_parameters,Y,R,num_users,num_movies,num_features,alpha,num_iters,Lambda): """ Optimize X and Theta """ # unfold the parameters X = initial_parameters[:num_movies*num_features].

reshape(num_movies,num_features) Theta = initial_parameters[num_movies*num_features:].

reshape(num_users,num_features) J_history =[] for i in range(num_iters): params = np.

append(X.

flatten(),Theta.

reshape(num_users,num_features) X = X – (alpha * X_grad) Theta = Theta – (alpha * Theta_grad) J_history.

append(cost) paramsFinal = np.

append(X.

flatten(),Theta.

flatten()) return paramsFinal , J_historyOnce again, I chose batch gradient descent as my optimizing algorithm.

One thing going through all the programming assignment in python taught me is that you would rarely go wrong with gradient descent.

At this point, the code for gradient descent should be fairly familiar to you.

Y = np.

hstack((my_ratings,Y))R =np.

hstack((my_ratings!=0,R))# Normalize RatingsYnorm, Ymean = normalizeRatings(Y, R)num_users = Y.

shape[1]num_movies = Y.

shape[0]num_features = 10# Set initial Parameters (Theta,X)X = np.

random.

randn(num_movies, num_features)Theta = np.

random.

randn(num_users, num_features)initial_parameters = np.

append(X.

flatten(),Theta.

001,400,Lambda)Plotting of cost function to ensure gradient descent is workingplt.

plot(J_history)plt.

xlabel("Iteration")plt.

ylabel("\$J(Theta)\$")plt.

title("Cost function using Gradient Descent")To make predictions on movies that you had not rated# unfold paramatersX = paramsFinal[:num_movies*num_features].

reshape(num_movies,num_features)Theta = paramsFinal[num_movies*num_features:].

reshape(num_users,num_features)# Predict ratingp = X @ Theta.

Tmy_predictions = p[:,0][:,np.

newaxis] + Ymeanimport pandas as pddf = pd.

DataFrame(np.

hstack((my_predictions,np.

array(movieList)[:,np.

newaxis])))df.

sort_values(by=[0],ascending=False,inplace=True)df.

reset_index(drop=True,inplace=True)print("Top recommendations for you:.")for i in range(10): print("Predicting rating",round(float(df[0][i]),1)," for index",df[1][i])Finally, I had come to an end for Andrew Ng’s Machine Learning Course in Python.

I hope this is as beneficial to you as it is for me writing it and I thank all of you for the supports.

The Jupyter notebook will be uploaded to my GitHub at (https://github.

com/Benlau93/Machine-Learning-by-Andrew-Ng-in-Python).

For other python implementation in the series,Linear RegressionLogistic RegressionRegularized Logistic RegressionNeural NetworksSupport Vector MachinesUnsupervised LearningThank you for reading.

.