Simple House Price Predictor using ML through TensorFlow in Python

The result here is input and output sets for both the training and test data.

    # Build the training set and the prediction set
    training_set = train[FEATURES + FEATURES_CAT]
    prediction_set = train.SalePrice

    # Split the training and prediction sets into train and test sets
    x_train, x_test, y_train, y_test = train_test_split(
        training_set[FEATURES + FEATURES_CAT], prediction_set,
        test_size=0.2, random_state=42)

    y_train = pd.DataFrame(y_train, columns=[LABEL])
    training_set = pd.DataFrame(x_train, columns=FEATURES + FEATURES_CAT).merge(
        y_train, left_index=True, right_index=True)

    y_test = pd.DataFrame(y_test, columns=[LABEL])
    testing_set = pd.DataFrame(x_test, columns=FEATURES + FEATURES_CAT).merge(
        y_test, left_index=True, right_index=True)

Now we can combine the continuous and categorical features back together and then construct the model framework by calling the DNNRegressor function and passing in the features, hidden layers, and desired activation function. Here we are using three layers, each with a decreasing number of nodes. The activation function is “relu”, but try using “leaky relu” or “tanh” to see if you get better results!

    training_set[FEATURES_CAT] = training_set[FEATURES_CAT].applymap(str)
    testing_set[FEATURES_CAT] = testing_set[FEATURES_CAT].applymap(str)

    def input_fn_new(data_set, training=True):
        continuous_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        categorical_cols = {
            k: tf.SparseTensor(
                indices=[[i, 0] for i in range(data_set[k].size)],
                values=data_set[k].values,
                dense_shape=[data_set[k].size, 1])
            for k in FEATURES_CAT}

        # Combines the dictionaries of the categorical and continuous features
        feature_cols = dict(list(continuous_cols.items()) + list(categorical_cols.items()))

        if training:
            # Converts the label column into a constant Tensor
            label = tf.constant(data_set[LABEL].values)
            # Outputs the feature columns and labels
            return feature_cols, label

        return feature_cols

    # Builds the model framework
    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=engineered_features,
        activation_fn=tf.nn.relu,
        hidden_units=[250, 100, 50])

    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i, 0] for i in range(training_set[k].size)],
            values=training_set[k].values,
            dense_shape=[training_set[k].size, 1])
        for k in FEATURES_CAT}

Executing the following function will begin the training process! It will take a few minutes, so get a stretch in!

Step 5: Training the Model

    regressor.fit(input_fn=lambda: input_fn_new(training_set), steps=10000)

Let’s visualize the results! This block of code will import our data visualization tool, calculate the predicted values, grab the actual values, and then plot them against each other.

Step 6: Evaluating the Model and Visualizing the Results

    import itertools
    import matplotlib
    import matplotlib.pyplot as plt

    ev = regressor.evaluate(input_fn=lambda: input_fn_new(testing_set, training=True), steps=1)
    loss_score = ev["loss"]
    print("Final Loss on the testing set: {0:f}".format(loss_score))

    # Actual sale prices, mapped back to their original scale
    reality = pd.DataFrame(
        prepro.inverse_transform(testing_set.select_dtypes(exclude=['object'])),
        columns=[COLUMNS]).SalePrice

    # Predicted sale prices, mapped back to their original scale
    y = regressor.predict(input_fn=lambda: input_fn_new(testing_set))
    predictions = list(itertools.islice(y, testing_set.shape[0]))
    predictions = pd.DataFrame(
        prepro_y.inverse_transform(np.array(predictions).reshape(-1, 1)))

    matplotlib.rc('xtick', labelsize=30)
    matplotlib.rc('ytick', labelsize=30)

    fig, ax = plt.subplots(figsize=(15, 12))
    plt.style.use('ggplot')
    plt.plot(predictions.values, reality.values, 'ro')
    plt.xlabel('Predictions', fontsize=30)
    plt.ylabel('Reality', fontsize=30)
    plt.title('Predictions x Reality on dataset Test', fontsize=30)
    # Dashed diagonal: points on this line are perfect predictions
    ax.plot([reality.min(), reality.max()], [reality.min(), reality.max()], 'k--', lw=4)
    plt.show()

Not bad! To get better results, try changing the activation function, the number of layers, or the size of the layers. Or perhaps try another model entirely. This is not a huge dataset, so we are limited by the amount of information we have, but these techniques and principles can be transferred to larger datasets or more complex problems.

Feel free to contact me regarding any questions, comments, concerns, or suggestions.

I would also like to give a shoutout to Julien Heiduk, whose model this is a reduction of. Go check out his Kaggle here: https://www.kaggle.com/zoupet.
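If you are curious what swapping the activation function actually changes, here is a minimal sketch in plain Python of the three activations mentioned in this post. It is only an illustration, not part of the model above: the helper names (`relu`, `leaky_relu`, `apply_activation`) and the `alpha=0.2` slope are assumptions chosen for the example. In TensorFlow you would instead pass a function such as `tf.nn.tanh` as `activation_fn`.

```python
import math

def relu(x):
    """Rectified linear unit: keeps positive inputs, maps the rest to 0."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.2):
    """Like relu, but negative inputs keep a small slope (alpha) instead of 0.

    alpha=0.2 is an assumed value for this illustration.
    """
    return x if x > 0 else alpha * x

def tanh(x):
    """Hyperbolic tangent: squashes any input into the range (-1, 1)."""
    return math.tanh(x)

# Hypothetical lookup table so an activation can be picked by name
ACTIVATIONS = {"relu": relu, "leaky_relu": leaky_relu, "tanh": tanh}

def apply_activation(name, values):
    """Applies the named activation to every value in a list."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in values]

# Compare how each activation treats the same negative and positive inputs
sample = [-2.0, -0.5, 0.0, 0.5, 2.0]
for name in ACTIVATIONS:
    print(name, [round(v, 3) for v in apply_activation(name, sample)])
```

The main difference to notice is on the negative inputs: relu discards them entirely, leaky relu lets a fraction through, and tanh keeps their sign but bounds their size. Which one works best for this dataset is something you would have to find out by retraining and comparing the loss.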
