Using Artificial Neural Networks for Regression in Python

Regression using Artificial Neural Networks

Artificial Neural Networks (ANN) can be used for a wide variety of tasks, from face recognition to self-driving cars to chatbots! To understand ANN in more depth, please read this post and watch the video below!

ANN can be used for supervised ML regression problems as well.

In this post, I am going to show you how to implement a Deep Learning ANN for a Regression use case.

I am using the pre-processed data from a previous case study on predicting old car prices. You can check the data cleansing and feature selection steps there.

Data description

You can download the data pickle file required for this case study here.

The business meaning of each column in the data is as below:

  • Price: The Price of the car in dollars
  • Age: The age of the car in months
  • KM: How many kilometers the car has been driven
  • HP: Horsepower of the car
  • MetColor: Whether the car has a metallic color or not
  • CC: The engine size of the car
  • Doors: The number of doors in the car
  • Weight: The weight of the car

The goal is to create an ML model that can predict the right price of a second-hand car.

Defining the problem statement:

  • Target Variable: Price
  • Predictors: Age, KM, CC, etc.

Loading the data for regression

I am loading the preprocessed data 'CarPricesData.pkl'. This data contains the final set of features selected for ML.
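A minimal sketch of the loading step, assuming the pickle file sits in the current working directory:

```python
import pandas as pd

# Read the pre-processed car prices data from the pickle file
# (assumes 'CarPricesData.pkl' is present in the working directory)
CarPricesDataNumeric = pd.read_pickle('CarPricesData.pkl')

# Look at the shape and first few rows to confirm the data loaded correctly
print(CarPricesDataNumeric.shape)
CarPricesDataNumeric.head()
```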

Car prices data for ANN in Python

Splitting the Data into Training and Testing

We don't use the full data for creating the model. Some data is randomly selected and kept aside for checking how good the model is. This is known as Testing Data, and the remaining data, on which the model is built, is called Training Data. Typically 70% of the data is used as Training Data and the remaining 30% as Testing Data.

In this same step, we are standardizing the data as well. This is important for Neural Networks because it improves the training speed and helps the optimizer converge.
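A sketch of the split-and-standardize step, assuming the column names listed above. The scaler fit objects are kept so the same transformation can be applied to the test data and inverted later:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate the Target Variable and the Predictors
TargetVariable = ['Price']
Predictors = ['Age', 'KM', 'HP', 'MetColor', 'CC', 'Doors', 'Weight']

X = CarPricesDataNumeric[Predictors].values
y = CarPricesDataNumeric[TargetVariable].values

# Standardize predictors and target; keep the fit objects so the
# predictions can be transformed back to the original price scale later
PredictorScaler = StandardScaler().fit(X)
TargetVarScaler = StandardScaler().fit(y)
X = PredictorScaler.transform(X)
y = TargetVarScaler.transform(y)

# 70% of the rows for training, 30% kept aside for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```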

Data sampling step for regression in ML

Installing the required libraries

To implement deep learning ANNs, two libraries are required: 'tensorflow' and 'keras'.
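If they are not already installed, both can be added from the command line. Note that with TensorFlow 2.x, Keras ships bundled as tensorflow.keras, so the separate keras install is mainly needed on older setups:

```
pip install tensorflow
pip install keras
```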

Creating the Deep Learning Artificial Neural Network (ANN) model

The architecture of a Deep Learning ANN used in this case study is shown below

I am using two hidden layers with five neurons each and one output layer with one neuron. Can you change these numbers? Yes, you can change the number of hidden layers and the number of neurons in each layer.

Finally, choose the combination that produces the best possible accuracy. This is the process of tuning the ANN model.

ANN Architecture used in this case study

In the below code snippet, the "Sequential" module from the Keras library is used to create a sequence of ANN layers stacked one after the other. Each layer is defined using the "Dense" module of Keras, where we specify how many neurons it will have, which technique will be used to initialize its weights, what the activation function for each neuron will be, etc.

Let's quickly understand the hyperparameters in the below code snippet:

  • units=5: This means we are creating a layer with five neurons in it. Each of these five neurons will receive the values of the inputs; for example, the values of 'Age' will be passed to all five neurons, and similarly for all other columns.
  • input_dim=7: This means there are seven predictors in the input data, which is expected by the first layer. If you look at the second Dense layer, we don't specify this value, because the Sequential model passes this information on to the next layers.
  • kernel_initializer='normal': When the neurons start their computation, some algorithm has to decide the initial value for each weight. This parameter specifies that algorithm. You can choose different values for it, like 'normal' or 'glorot_uniform'.
  • activation='relu': This specifies the activation function for the calculations inside each neuron. You can choose values like 'relu', 'tanh', 'sigmoid', etc.
  • batch_size=20: This specifies how many rows will be passed to the network in one go, after which the error (SSE) calculation will begin and the neural network will start adjusting its weights based on the errors.
    When all the rows have been passed in batches of 20 rows each, as specified by this parameter, we call that one epoch, or one full data cycle. This is also known as mini-batch gradient descent. A small batch_size makes the ANN look at the data slowly, like 2 or 4 rows at a time, which could lead to overfitting; a larger value like 20 or 50 rows at a time makes the ANN look at the data faster, which could lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning.
  • epochs=50: The same activity of adjusting weights continues 50 times, as specified by this parameter. In simple terms, the ANN looks at the full training data 50 times and adjusts its weights.
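Below is a sketch of what the model-creation snippet looks like, assuming the architecture described above. The mean squared error loss and the adam optimizer are common choices for regression and are assumptions here; on TensorFlow 2.x the imports would come from tensorflow.keras instead:

```python
from keras.models import Sequential
from keras.layers import Dense

# Sequential stacks the ANN layers one after the other
model = Sequential()

# First hidden layer: 5 neurons, expects the 7 predictor columns as input
model.add(Dense(units=5, input_dim=7, kernel_initializer='normal', activation='relu'))

# Second hidden layer: 5 neurons
model.add(Dense(units=5, kernel_initializer='normal', activation='relu'))

# Output layer: a single neuron that emits the predicted (standardized) price
model.add(Dense(1, kernel_initializer='normal'))

# Compile the network before training
model.compile(loss='mean_squared_error', optimizer='adam')

# Fit on the training data: 20 rows per batch, 50 passes over the full data
model.fit(X_train, y_train, batch_size=20, epochs=50, verbose=1)
```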

To understand more about these calculations which happen inside a neuron, refer to this post.

ANN regression model fit log output

Hyperparameter tuning of ANN

Finding the best values for batch_size and epochs is very important, as they directly affect the model performance. Bad values can lead to overfitting or underfitting. I am showing two approaches for tuning the parameters of the ANN. Apart from epochs and batch_size, you can also choose to tune the optimal number of neurons, the optimal number of layers, etc.

There is no rule of thumb that can help you decide the number of layers or the number of neurons at first look at the data. You need to try different parameters and choose the combination that produces the highest accuracy.

Just keep in mind that the bigger the network, the more computationally intensive it is, and hence the more time it will take to run. So always try to find the best accuracy with the minimum number of layers/neurons.

Finding the best set of parameters using manual grid search

This is a simple for-loop-based approach. You can easily adapt it for more hyperparameters by adding another nested for-loop, as shown in the sketch below.
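Here is a sketch of such a loop; the helper make_regression_ann and the SearchResultsData frame are illustrative names, and each trial is scored with the same 100 - MAPE measure used later in this post:

```python
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

def make_regression_ann():
    # Rebuild the same architecture for every trial
    model = Sequential()
    model.add(Dense(units=5, input_dim=7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(units=5, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# Collect one row of results per trial
SearchResultsData = pd.DataFrame(columns=['TrialNumber', 'Parameters', 'Accuracy'])

TrialNumber = 0
for batch_size_trial in [5, 10, 15, 20]:
    for epochs_trial in [5, 10, 50, 100]:
        TrialNumber += 1
        model = make_regression_ann()
        model.fit(X_train, y_train, batch_size=batch_size_trial, epochs=epochs_trial, verbose=0)

        # Accuracy on the testing data measured as 100 - MAPE
        MAPE = np.mean(100 * (np.abs(y_test - model.predict(X_test)) / y_test))
        params = 'batch_size:' + str(batch_size_trial) + '-epochs:' + str(epochs_trial)
        print(TrialNumber, 'Parameters:', params, 'Accuracy:', 100 - MAPE)
        SearchResultsData.loc[len(SearchResultsData)] = [TrialNumber, params, 100 - MAPE]
```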

Hyperparameter trial results for ANN

Plotting the parameter trial results
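Assuming the trial results were collected into the SearchResultsData frame from the sketch above, a simple line plot of accuracy against the parameter combinations does the job:

```python
import matplotlib.pyplot as plt

# Plot the accuracy obtained for each parameter combination tried above
SearchResultsData.plot(x='Parameters', y='Accuracy', figsize=(15, 4), kind='line')
plt.xticks(rotation=45)
plt.show()
```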

Visualizing the results of parameter trials for ANN

This graph shows that the best set of parameters is batch_size=15 and epochs=5. The next step is to train the model with these parameters.

Training the ANN model with the best parameters

Using the best set of parameters found above, we train the model again and predict the prices on the testing data.
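A sketch of this step, reusing the make_regression_ann helper from the manual search above and inverting the standardization so the predicted prices come back in dollars (the TestingData frame is an illustrative name):

```python
# Retrain with the best parameters found above
model = make_regression_ann()
model.fit(X_train, y_train, batch_size=15, epochs=5, verbose=1)

# Predict on the testing data and bring both predictions and actuals
# back to the original price scale
Predictions = TargetVarScaler.inverse_transform(model.predict(X_test))
y_test_orig = TargetVarScaler.inverse_transform(y_test)

# Assemble a results frame for inspection
TestingData = pd.DataFrame(data=PredictorScaler.inverse_transform(X_test), columns=Predictors)
TestingData['Price'] = y_test_orig.ravel()
TestingData['PredictedPrice'] = Predictions.ravel()
TestingData.head()
```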

ANN prediction output

Finding the accuracy of the model

Using the final trained model, we now generate the prediction error for each row in the testing data as the Absolute Percentage Error. The average over all the rows is known as the Mean Absolute Percentage Error (MAPE).

The accuracy is calculated as 100-MAPE.
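A sketch of this computation over the TestingData frame assembled above:

```python
import numpy as np

# Absolute Percentage Error for every row of the testing data
TestingData['APE'] = 100 * (np.abs(TestingData['Price'] - TestingData['PredictedPrice']) / TestingData['Price'])

# MAPE is the mean of the row-wise APE; accuracy is 100 - MAPE
MAPE = np.mean(TestingData['APE'])
print('The Accuracy of the ANN model is:', 100 - MAPE)
```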

Error computation for the ANN predictions

Why does the accuracy come out different every time I train the ANN?

Even when you use the same hyperparameters, the result will be slightly different for each run of the ANN. This happens because the initial step for an ANN is the random initialization of weights. So every time you run the code, different values get assigned to each neuron as weights and biases, and hence the final outcome also differs slightly.

Finding the best hyperparameters using GridSearchCV

Apart from the manual search method shown above, you can also use the Grid Search Cross-validation method present in the sklearn library to find the best parameters of ANN.

The below snippet defines some parameter values to try and finds the best combination out of them.
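A sketch of this approach is below. It relies on the scikit-learn wrapper for Keras models (keras.wrappers.scikit_learn.KerasRegressor on older versions; newer setups use scikeras.wrappers.KerasRegressor with a model= argument instead of build_fn=) and a custom 100 - MAPE scorer; treat the parameter grid and the scorer as assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from keras.wrappers.scikit_learn import KerasRegressor  # scikeras.wrappers on newer setups

# Score each candidate as 100 - MAPE so that higher is better
# (illustrative: here y is still standardized, so the values are relative)
def Accuracy_Score(orig, pred):
    MAPE = np.mean(100 * (np.abs(orig - pred) / orig))
    return 100 - MAPE

custom_Scoring = make_scorer(Accuracy_Score, greater_is_better=True)

# Wrap the model-building function so sklearn can treat the ANN as an estimator
RegModel = KerasRegressor(build_fn=make_regression_ann, verbose=0)

# Parameter values to try
Parameter_Trials = {'batch_size': [10, 20, 30], 'epochs': [10, 20]}

# 5-fold cross-validated search over all combinations
grid_search = GridSearchCV(estimator=RegModel, param_grid=Parameter_Trials,
                           scoring=custom_Scoring, cv=5)
grid_search.fit(X, y)

print('Best parameters:', grid_search.best_params_)
```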

Results of hyperparameter search for ANN using GridSearchCV

Conclusion

This template can be used to fit the Deep Learning ANN regression model on any given dataset.

You can take the pre-processing steps of raw data from any of the case studies here.

Deep ANNs work great when you have a good amount of data available for learning. For small datasets with less than 50K records, I recommend using supervised ML models like Random Forest, AdaBoost, XGBoost, etc.

The simple reason behind this is the high complexity and heavy computation of ANNs. It is not worth it if you can achieve the same accuracy with a faster and simpler model.

You look at deep learning ANNs only when you have a large amount of data available and the other algorithms are failing or do not fit the task.

In the next post, I will show how to fit an ANN model for any classification dataset.

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial Intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent Systems. His passion to teach inspired him to create this website!

49 thoughts on “Using Artificial Neural Networks for Regression in Python”

  1. Phakawat Lamchuan

    Thank you very much for this example. I will try to adapt your example to my data. I am trying to predict salinity in the river for early warning for tap water production in Thailand.

    1. Farukh Hashmi

      Hi Phakawat!
      Very happy to see that this post has helped you in your work!
      Keep it up 🙂

      1. Hi Farukh,
        Thanks very much for sharing your Python code and instructions for this case study on how to use ANN for multiple-variable regression.
        I've tried to follow your instructions and run your code. I have an issue with declaring the input parameter matrix X, as I get this error:
        " X=CarPricesDataNumeric[Predictors].values
        Traceback (most recent call last):
        File "", line 1, in
        File "C:\Program Files\Python311\Lib\site-packages\pandas\core\frame.py", line 3902, in __getitem__
        indexer = self.columns._get_indexer_strict(key, "columns")[1]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Program Files\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 6114, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
        File "C:\Program Files\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 6178, in _raise_if_missing
        raise KeyError(f"{not_found} not in index")
        KeyError: "['Age'] not in index"

        Can you advise how to fix this bug so that I can go on to test your ANN code?

        Many thanks and kind regards,
        Nino Ng

        1. Hi Nino,

          Thank you for the kind words!
          It looks like the Age column is missing from the input data. Can you just check that you have not dropped it in the previous steps?

          Regards,
          Farukh Hashmi

  2. MAPE = np.mean(100 * (np.abs(y_test-model.predict(X_test))/y_test))
    The above formula can lead to negative error, so I recommend using the following code instead:
    from sklearn.metrics import mean_absolute_percentage_error
    MAPE = mean_absolute_percentage_error(y_test, model.predict(X_test))

    1. Farukh Hashmi

      Hi Ann,

      You are correct! Hence, whenever we look at regression results, we make sure to check the median APE as well, as some of the predictions are bound to be bad, generating a large error and making the accuracy value negative. However, when a client looks at the dashboard, they tend to appreciate simple percentage differences instead of ML-related metrics like MAE, RMSE, etc. The MAPE implementation from sklearn does some massaging to the output and would require us to educate the users. That was the rationale behind evaluating the model in the same terms the clients would see, using simple percentage differences between original and prediction. I hope that helps!

      1. Hi, thanks for the article. How would you handle predictors (columns) with categorical data? For example what if a column has countries and another one has sectors? Thanks!

  3. Hi Farukh,
    Thanks for sharing the article. I was trying it and got the following error:
    cannot import name 'mirrored_strategy_with_two_gpus_no_merge_call' from 'tensorflow.python.distribute.strategy_combinations'
    I tried to search for some solutions on the internet but didn't find any. Can you help with this?
    I am using Python 3.9.7 (64 bit).
    Thanks
    Gill

    1. Hi Gill,

      It is very difficult to debug code like this. If you can share a screenshot with the full error, I might be able to help.

  4. Mohamad Darouich

    Hello Farukh, thank you for your effort.
    I implemented this template on a dataset named "Housing_Price" to predict house prices, which I downloaded from Kaggle.
    Everything is OK, but unfortunately I get zero accuracy when I train the model, and I don't really know what the problem is.

    I will be thankful if you could answer my question.

    1. Hi Mohamad!

      It is difficult to comment on your code error without seeing your code! Can you share a screenshot of the cell and code where you are getting the accuracy as zero? Maybe then I can help.

    1. Hi Pradeep!

      You can use the GridSearchCV option. It produces the average results based on k-fold CV.

      Regards,
      Farukh Hashmi

    1. Hi Rakesh,

      Thank you for your input!
      The testing data must be at the same level of encoding as the training data. Hence, using the same fit object is necessary for standardisation/normalisation.
      If you do a separate normalisation of the test data, the range will vary and the predictions will be incorrect, because the model didn't get the input at the same level it was trained on.
      Hope that helps!

      Regards,
      Farukh Hashmi

    1. Hi Alice,

      If I am understanding it correctly, you are referring to the cross-validation step. Since we are using cv=5, the data is sampled 5 times; hence, for each iteration there is one accuracy value.
      Hope that helps!

    1. Due to the usage of ".head()", which prints only the first 5 rows. Similarly, adding ".tail()" fetches the final 5 rows. [.head() and .tail() by default fetch the first 5 and last 5 rows; to fetch the first or last 5 columns, you can slice with .iloc[:, :5] or .iloc[:, -5:] instead.]
      (:

  5. Hi Sir, thank you for the guidance. One thing: can we use Hyperopt instead of grid search? Do you have any suggestions on that or a guide? Thank you

  6. Dear Farukh Hashmi, thank you very much for such a great effort. I appreciate your efforts on this post and the comprehensive theoretical background in the video.

    JazakaAllah Khair.

  7. Hi Farukh,

    Thank you for the article, but somehow I am unable to fit the X and y to the grid search. After Epoch 1 runs successfully, I get this error:
    NotFittedError: All estimators failed to fit

  8. Hi Farukh,
    Thank you so much for your efforts,
    I have one question:
    during the training process the loss = NaN.

    Epoch 1/50
    67/67 [==============================] - 0s 1ms/step - loss: nan
    Epoch 2/50
    67/67 [==============================] - 0s 1ms/step - loss: nan
    Could you help me please?

    1. Hi Mostafa,

      Is this happening for all the runs?
      Make sure your target variable does not contain any value that is zero, because it may generate a divide-by-zero issue or NaNs.

      Regards,
      Farukh Hashmi

  9. Hi Farukh! My data has already been split into training and testing data; what code should I write to reference this data?

  10. Hi,
    Can you please tell me what kind of neural network this is? Is it a feedforward neural network or a CNN? Also, can the accuracy values in the epoch calculation section surpass 100%, and if so, what does that mean for the model?

    1. Hi Mike!

      This is a feed-forward neural network.
      The accuracy should stay within 100%. However, when you use a custom metric like MAPE, you may see negative results if there are outliers in the errors.

      Regards,
      Farukh Hashmi

  11. Soumyajyoti Kabi

    Very helpful post. After checking several websites, I finally succeeded in fitting my data with the given code. Can you help me regarding tuning the number of neurons and layers?

    1. Hi Soumya,
      You start with a smaller number for each and then keep increasing them, noting the accuracy after every change.
      The idea is to find the sweet spot where you get the highest accuracy with the minimum number of neurons and layers, because after a point you will see the testing accuracy decrease, which means the model has started overfitting.

      Always remember: every neuron you add adds one more equation to the overall model!

      Hope this helps!

      Regards,
      Farukh Hashmi

  12. Dear Farukh Hashmi, thank you very much for such a great article. I have a follow-up question: once you have the final best model ready, how do we apply it to a new dataset to predict the second-hand car price?

  13. Thank you very much for this tutorial.

    But I am confused. I have read in different places that accuracy is not a good metric for regression. Can you please help me understand why you used accuracy?

    I would also like to see an example using, say, mean squared error or the coefficient of determination, and also what kind of plot I can make to confirm my results.

    Thank you, and sorry for so many questions.

  14. Greetings, all. I found this code very helpful. Can you tell me how to get the coefficients to write an empirical formula? How can I extend this to get the weights and biases?

  15. Dear Farukh Hashmi,
    Thank you for the article and video. It was very helpful.

    Could you please advise the reason for getting accuracy above 100?
    2/2 [==============================] - 0s 0s/step
    9 Parameters: batch_size: 15 - epochs: 5 Accuracy: 128.60392062417344
    2/2 [==============================] - 0s 0s/step
    5 Parameters: batch_size: 10 - epochs: 5 Accuracy: 137.93475685949863

  16. Hi! I am getting an error when I run this that says "incompatible shapes: [##] vs. [##,%%]" (the ## is the same number, and the %% is a different number). Do you have any idea why this might be happening or how I can fix it?

  17. Hi Farukh,
    Your work is helping me a lot, many thanks. However, could you please help me with this issue?
    I am trying to move from Matlab to Python, and I was wondering why you only split between Train and Test. In Matlab, we are used to splitting between Train, Validate, and Test. The Validation split helps to fine-tune the internal hyperparameters, as I am sure you know. Why can't we see this in your Python programs? And if it isn't necessary, can it be done, and how?
    Many thanks again.
