Using Artificial Neural Networks for Regression in Python

Regression using Artificial Neural Networks

Artificial Neural Networks(ANN) can be used for a wide variety of tasks, from face recognition to self-driving cars to chatbots! To understand more about ANN in-depth please read this post.

ANN can be used for supervised ML regression problems as well.

In this post, I am going to show you how to implement a Deep Learning ANN for a Regression use case.

I am using the pre-processed data from a previous case study on predicting old car prices. You can check the data cleansing and feature selection steps there.

Data description

You can download the data pickle file required for this case study here.

The business meaning of each column in the data is as below

  • Price: The Price of the car in dollars
  • Age: The age of the car in months
  • KM: How many KMS did the car was used
  • HP: Horsepower of the car
  • MetColor: Whether the car has a metallic color or not
  • CC: The engine size of the car
  • Doors: The number of doors in the car
  • Weight: The weight of the car

Create an ML model which can predict the apt price of a second-hand car.

Defining the problem statement:

  • Target Variable: Price
  • Predictors: Age, KM, CC, etc.

Loading the data for regression

I am loading the preprocessed data ‘CarPricesData.pkl’. This data is the final list of features selected for ML.

Car prices data for ANN in Python

Splitting the Data into Training and Testing

We don’t use the full data for creating the model. Some data is randomly selected and kept aside for checking how good the model is. This is known as Testing Data and the remaining data is called Training data on which the model is built. Typically 70% of data is used as Training data and the rest 30% is used as Testing data.

In this same step, we are standardizing the data as well. This is important for Neural Networks because it improves the model training speed and helps to find global minima.

Data sampling step for regression in ML

Installing the required libraries

To implement deep learning ANNs, two libraries are required, ‘tensorflow‘ and ‘keras‘.

Creating Deep Learning- Artificial Neural Networks(ANN) model

The architecture of a Deep Learning ANN used in this case study is shown below

I am using two hidden layers with five neurons each and one output layer with one neuron. Can you change these numbers? Yes, you can change the number of hidden layers and the number of neurons in each layer.

Finally, choose the combination that produces the best possible accuracy. This is the process of tuning the ANN model.

ANN Architecture used in this case study
ANN Architecture used in this case study

In the below code snippet, the “Sequential” module from the Keras library is used to create a sequence of ANN layers stacked one after the other. Each layer is defined using the “Dense” module of Keras where we specify how many neurons would be there, which technique would be used to initialize the weights in the network. what will be the activation function for each neuron in that layer etc

Lets quickly understand the hyperparameters in below code snippets

  • units=5: This means we are creating a layer with five neurons in it. Each of these five neurons will be receiving the values of inputs, for example, the values of ‘Age’ will be passed to all five neurons, similarly all other columns.
  • input_dim=7: This means there are seven predictors in the input data which is expected by the first layer. If you see the second dense layer, we don’t specify this value, because the Sequential model passes this information further to the next layers.
  • kernel_initializer=’normal’: When the Neurons start their computation, some algorithm has to decide the value for each weight. This parameter specifies that. You can choose different values for it like ‘normal’ or ‘glorot_uniform’.
  • activation=’relu’: This specifies the activation function for the calculations inside each neuron. You can choose values like ‘relu’, ‘tanh’, ‘sigmoid’, etc.
  • batch_size=20: This specifies how many rows will be passed to the Network in one go after which the SSE calculation will begin and the neural network will start adjusting its weights based on the errors.
    When all the rows are passed in the batches of 20 rows each as specified in this parameter, then we call that 1-epoch. Or one full data cycle. This is also known as mini-batch gradient descent. A small value of batch_size will make the ANN look at the data slowly, like 2 rows at a time or 4 rows at a time which could lead to overfitting, as compared to a large value like 20 or 50 rows at a time, which will make the ANN look at the data fast which could lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning.
  • Epochs=50: The same activity of adjusting weights continues for 50 times, as specified by this parameter. In simple terms, the ANN looks at the full training data 50 times and adjusts its weights.

To understand more about these calculations which happen inside a neuron, refer to this post.

ANN regression model fit log output

Hyperparameter tuning of ANN

Finding the best values for batch_size and epoch is very important as it directly affects the model performance. Bad values can lead to overfitting or underfitting. I am showing two approaches for tuning the parameters of the ANN. Apart from epoch and batch_size, you can also choose to tune the optimal number of neurons, the optimal number of layers, etc.

There is no thumb rule which can help you to decide the number of layers/number of neurons etc. in the first look at data. You need to try different parameters and choose the combination which produces the highest accuracy.

Just keep in mind, that, the bigger the network, the more computationally intensive it is, hence it will take more time to run. So always to find the best accuracy with the minimum number of layers/neurons.

Finding best set of parameters using manual grid search

This is a simple for loop based approach. You can easily edit this and adapt it for more hyperparameters by simply adding another nested for-loop.

Hyperparameter trial results for ANN
Hyperparameter trial results for ANN

Plotting the parameter trial results

Visualizing the results of parameter trials for ANN
Visualizing the results of parameter trials for ANN

This graph shows that the best set of parameters are batch_size=15 and epochs=5. Next step is to train the model with these parameters.

Training the ANN model with the best parameters

Using the best set of parameters found above, training the model again and predicting the prices on testing data.

ANN prediction output

Finding the accuracy of the model

Using the final trained model, now we are generating the prediction error for each row in testing data as the Absolute Percentage Error. Taking the average for all the rows is known as Mean Absolute Percentage Error(MAPE).

The accuracy is calculated as 100-MAPE.

Error computation for the ANN predictions

Why the accuracy comes different every time I train ANN?

Even when you use the same hyperparameters, the result will be slightly different for each run of ANN. This happens because the initial step for ANN is the random initialization of weights. So every time you run the code, there are different values that get assigned to each neuron as weights and bias, hence the final outcome also differs slightly.

Finding best hyperparameters using GridSearchCV.

Apart from the manual search method shown above, you can also use the Grid Search Cross-validation method present in the sklearn library to find the best parameters of ANN.

The below snippet defines some parameter values to try and finds the best combination out of it.

Results of hyperparameter search for ANN using GridSearch CV

Conclusion

This template can be used to fit the Deep Learning ANN regression model on any given dataset.

You can take the pre-processing steps of raw data from any of the case studies here.

Deep ANNs work great when you have a good amount of data available for learning. For small datasets with less than 50K records, I will recommend using the supervised ML models like Random Forests, Adaboosts, XGBoosts, etc.

The simple reason behind this is the high complexity and large computations of ANN. It is not worth it, if you can achieve the same accuracy with a faster and simpler model.

You look at deep learning ANNs only when you have a large amount of data available and the other algorithms are failing or do not fit for the task.

In the next post, I will show how to fit an ANN model for any classification dataset.

Farukh is an innovator in solving industry problems using Artificial intelligence. His expertise is backed with 10 years of industry experience. Being a senior data scientist he is responsible for designing the AI/ML solution to provide maximum gains for the clients. As a thought leader, his focus is on solving the key business problems of the CPG Industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent systems. His passion to teach got him to start this blog!

6 thoughts on “Using Artificial Neural Networks for Regression in Python”

  1. Phakawat Lamchuan

    Thank you very much for this example. I will try to adapt your example with my data. I am trying to predict salinity in the river to early warning for tap water production in Thailand.

  2. MAPE = np.mean(100 * (np.abs(y_test-model.predict(X_test))/y_test))
    the above formula can lead to negative error, so I recommend to use the following code,
    from sklearn.metrics import mean_absolute_percentage_error
    MAPE = mean_absolute_percentage_error(y_test, model.predict(X_test))

    1. Farukh Hashmi

      Hi Ann,

      You are correct! Hence, whenever we are looking at regression results, we make sure to check median APE as well, as some of the predictions are bound to be bad, hence generating a large error and making the accuracy value negative. However, when a client looks into the dashboard, they tend to appreciate simple percentage differences instead of ML related computations like MAE, RMSE, etc. The MAPE implementation from sklearn does some massaging to the output and will require us to educate the users. That was the rationale behind evaluating the model in the same terms as the clients would see it using simple percentage differences between original and prediction. I hope that helps!

Leave a Reply!

Your email address will not be published. Required fields are marked *