How to use Artificial Neural Networks for classification in Python?


In the previous post, I talked about how to use Artificial Neural Networks (ANNs) for regression use cases. In this post, I will show you how to use an ANN for classification.

There is a slight difference in the configuration of the output layer as listed below.

  • Regression: One neuron in the output layer
  • Classification(Binary): One neuron in the output layer with a sigmoid activation, giving the probability of class 1
  • Classification(Multi-class): The number of neurons in the output layer is equal to the number of unique classes, each neuron giving a 0/1 output for one class
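
To make the multi-class case concrete, here is a small sketch of how integer class labels can be turned into one 0/1 column per class, so that the target has as many columns as the output layer has neurons. The three-class label array is made up for illustration and is not part of the Titanic data.

```python
import numpy as np

# Hypothetical target with 3 unique classes (not from the Titanic data)
labels = np.array([0, 2, 1, 2])

n_classes = len(np.unique(labels))

# One row per sample, one 0/1 column per class
one_hot = np.eye(n_classes, dtype=int)[labels]
print(one_hot)
```

Each row contains a single 1, placed in the column of that sample's class.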

I am using the famous Titanic survival data set to illustrate the use of ANN for classification.

The pre-processing and feature selection have been done in this previous case study.

Data description

You can download the data required for this case study here.

The business meaning of each column in the data is as below:

  • Survived: Whether the passenger survived or not? 1=Survived, 0=Died
  • Pclass: The travel class of the passenger
  • Sex: The gender of the passenger
  • Age: The Age of the passenger
  • SibSp: Number of Siblings/Spouses Aboard
  • Parch: Number of Parents/Children Aboard
  • Fare: The amount of fare paid by the passenger
  • Embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

Loading the data

titanic survival data for ANN classification


Defining the problem statement:

Create a predictive model that can tell whether a passenger will survive the sinking of the Titanic or not.

  • Target Variable: Survived
  • Predictors: age, sex, passenger class, etc.

  • Survived=0 The passenger died
  • Survived=1 The passenger survived


Splitting the Data into Training and Testing
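
A minimal sketch of this step, assuming the prepared data is a numeric array with nine predictors. The synthetic array, the 70/30 split ratio, and the standardization step are assumptions for illustration, not necessarily what was used for the output shown in this post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared data: 200 rows, 9 numeric predictors
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 9))
y = rng.integers(0, 2, size=200)  # 0 = died, 1 = survived

# Hold out 30% of the rows for testing (the ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then apply it to both parts
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training rows only keeps information from the testing rows out of the training step.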

Sampling step output for ANN


Creating Deep Learning ANN model

Using the sampled data, we now create the ANN classification model. Please note that the output layer has one neuron here because this is a binary classification problem. If there were multiple classes, the output layer would need one neuron per class. For example, with 5 classes the output layer will have 5 neurons, each giving the probability of one class, and whichever class has the highest probability becomes the final answer.

Here I have used two hidden layers: the first has 10 neurons and the second has 6 neurons. The output layer has one neuron, which gives the probability of class “1”.
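
The architecture described above could be sketched in Keras roughly as follows. This is an illustration under assumptions: nine input predictors, 'relu' in the hidden layers, and a 'sigmoid' on the output neuron so that it returns a probability. The input size is declared with keras.Input, which plays the same role as passing input_dim=9 to the first Dense layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_predictors = 9  # assumed number of input columns

classifier = keras.Sequential([
    keras.Input(shape=(n_predictors,)),
    # First hidden layer: 10 neurons
    layers.Dense(units=10, kernel_initializer="uniform", activation="relu"),
    # Second hidden layer: 6 neurons
    layers.Dense(units=6, kernel_initializer="uniform", activation="relu"),
    # Output layer: 1 neuron, sigmoid gives the probability of class 1
    layers.Dense(units=1, kernel_initializer="uniform", activation="sigmoid"),
])

classifier.summary()
```

The summary lists the trainable weights per layer: (9 x 10 + 10) + (10 x 6 + 6) + (6 x 1 + 1) = 173 in total.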

How many neurons should you choose? How many hidden layers should you choose? This varies from data to data. You need to check the testing accuracy and decide which combination works best. This is why tuning an ANN is a difficult task: there are many parameters and configurations that can be changed.

Take a look at some of the important hyperparameters of ANN below

  • units=10: This means we are creating a layer with ten neurons in it. Each of these ten neurons will be receiving the values of inputs, for example, the values of ‘Age’ will be passed to all ten neurons, similarly all other columns.
  • input_dim=9: This means there are nine predictors in the input data which is expected by the first layer. If you see the second dense layer, we don’t specify this value, because the Sequential model passes this information further to the next layers.
  • kernel_initializer=’uniform’: When the Neurons start their computation, some algorithm has to decide the value for each weight. This parameter specifies that. You can choose different values for it like ‘normal’ or ‘glorot_uniform’.
  • activation=’relu’: This specifies the activation function for the calculations inside each neuron. You can choose values like ‘relu’, ‘tanh’, ‘sigmoid’, etc.
  • optimizer=’adam’: This parameter helps to find the optimum values of each weight in the neural network. ‘adam’ is one of the most useful optimizers, another one is ‘rmsprop’
  • batch_size=10: This specifies how many rows will be passed to the network in one go, after which the loss calculation will begin and the neural network will start adjusting its weights based on the errors.
    When all the rows have been passed in batches of 10 rows each, as specified by this parameter, we call that one epoch, or one full data cycle. This is also known as mini-batch gradient descent. A small batch_size, like 2 or 4 rows at a time, makes the ANN go through the data slowly with many small, noisy weight updates per epoch, which could lead to overfitting. A large value, like 20 or 50 rows at a time, makes the ANN go through the data fast with fewer updates per epoch, which could lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning.
  • epochs=10: The same activity of adjusting weights continues 10 times, as specified by this parameter. In simple terms, the ANN looks at the full training data 10 times and adjusts its weights.
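
Putting these hyperparameters together, compiling and training could look roughly like the sketch below. The random training data is a stand-in for the real Titanic predictors, and 'binary_crossentropy' is assumed as the loss because it is the usual choice for a single sigmoid output.

```python
import math
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for the training data: 100 rows, 9 predictors
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 9)).astype("float32")
y_train = rng.integers(0, 2, size=100)

classifier = keras.Sequential([
    keras.Input(shape=(9,)),
    layers.Dense(10, kernel_initializer="uniform", activation="relu"),
    layers.Dense(6, kernel_initializer="uniform", activation="relu"),
    layers.Dense(1, kernel_initializer="uniform", activation="sigmoid"),
])

# The optimizer adjusts the weights; the loss measures the error per batch
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])

batch_size, epochs = 10, 10
history = classifier.fit(X_train, y_train, batch_size=batch_size,
                         epochs=epochs, verbose=0)

# Weight updates per epoch = number of mini-batches in the training data
updates_per_epoch = math.ceil(len(X_train) / batch_size)
print(updates_per_epoch, "updates per epoch for",
      len(history.history["loss"]), "epochs")
```

With 100 rows and batch_size=10, the weights are adjusted 10 times in every epoch, so 100 times over the whole run.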

ANN model training output


Hyperparameter tuning of ANN

As mentioned above, the hyperparameter tuning for ANN is a big task! You can write your own function and iterate through the values to try, or use the GridSearchCV class from the sklearn library.

There is no rule of thumb that can help you decide the number of layers, number of neurons, etc. from a first look at the data. You need to try different parameters and choose the combination which produces the highest accuracy.

Just keep in mind that the bigger the network, the more computationally intensive it is, hence it will take more time to run. So always try to find the best accuracy with the minimum number of layers/neurons.


Hyperparameter tuning using Manual Grid Search

This method can be changed easily to suit your requirements. You can decide what you need to iterate and add another nested for-loop.

In the below snippet, I have searched for the best batch_size and epochs.
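
A manual grid search of this kind could be sketched as below. The helper names make_classifier and manual_grid_search are hypothetical, the data is synthetic, and the grid values are kept tiny so that the example runs quickly. Accuracy is measured on the testing rows, as described in this post; a separate validation split would be a stricter alternative.

```python
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers


def make_classifier(n_predictors):
    """Build and compile the 10-6-1 network described earlier."""
    model = keras.Sequential([
        keras.Input(shape=(n_predictors,)),
        layers.Dense(10, kernel_initializer="uniform", activation="relu"),
        layers.Dense(6, kernel_initializer="uniform", activation="relu"),
        layers.Dense(1, kernel_initializer="uniform", activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def manual_grid_search(X_train, y_train, X_test, y_test,
                       batch_sizes, epoch_values):
    """Train one fresh model per combination and record its test accuracy."""
    rows = []
    for batch_size in batch_sizes:
        for epochs in epoch_values:
            model = make_classifier(X_train.shape[1])
            model.fit(X_train, y_train, batch_size=batch_size,
                      epochs=epochs, verbose=0)
            _, accuracy = model.evaluate(X_test, y_test, verbose=0)
            rows.append({"batch_size": batch_size, "epochs": epochs,
                         "accuracy": accuracy})
    return pd.DataFrame(rows)


# Synthetic stand-in data: 120 rows, 9 predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 9)).astype("float32")
y = (X[:, 0] > 0).astype("int32")

results = manual_grid_search(X[:90], y[:90], X[90:], y[90:],
                             batch_sizes=[5, 10], epoch_values=[2, 4])
print(results)
```

To tune another hyperparameter, add one more nested loop and one more column in the results.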


Looking at the best hyperparameter

Simply sort the results data above by accuracy and pick the combination which has the highest accuracy.

Based on the below output, you can notice that the best parameters are batch_size=5 and epochs=100.
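
The sorting step itself is short. The sketch below uses a small results table with made-up accuracy values, purely to show the mechanics; these are not the numbers obtained in this post.

```python
import pandas as pd

# Illustrative values only, not the results from this post
results = pd.DataFrame({
    "batch_size": [5, 5, 10, 10],
    "epochs": [50, 100, 50, 100],
    "accuracy": [0.80, 0.83, 0.79, 0.81],
})

# Highest accuracy first; the top row is the best combination
ranked = results.sort_values("accuracy", ascending=False).reset_index(drop=True)
best = ranked.iloc[0]

print(ranked)
print("Best:", int(best["batch_size"]), int(best["epochs"]))
```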

Plotting the results of hyperparameter search


Training the model using best hyperparameters

Training the ANN model with best hyperparameters


Why is the accuracy different every time I train the ANN?

Even when you use the same hyperparameters, the result will be slightly different for each run of ANN. This happens because the initial step for ANN is the random initialization of weights. So every time you run the code, there are different values that get assigned to each neuron as weights and bias, hence the final outcome also differs slightly.
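
If you need repeatable runs, one option is to fix the random seed before building the model. The sketch below assumes a TensorFlow/Keras version that provides keras.utils.set_random_seed, and it only compares the starting weights of one layer. The helper name initial_weights is hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def initial_weights(seed):
    """Seed the random generators, build one layer, return its start weights."""
    keras.utils.set_random_seed(seed)  # seeds Python, NumPy and the backend
    model = keras.Sequential([
        keras.Input(shape=(9,)),
        layers.Dense(10, kernel_initializer="uniform", activation="relu"),
    ])
    return model.get_weights()[0]


w_a = initial_weights(42)
w_b = initial_weights(42)
w_c = initial_weights(7)

print("same seed, identical weights:", bool((w_a == w_b).all()))
print("different seed, identical weights:", bool((w_a == w_c).all()))
```

Even with a fixed seed, some operations on GPUs are not fully deterministic, so small differences between runs can remain.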

Checking model accuracy on Testing Data

Calculating the accuracy of the final trained model above on the testing data.
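
Because the output neuron returns a probability, it has to be converted into a 0/1 prediction before the accuracy can be measured. A minimal sketch follows, with made-up labels and probabilities standing in for y_test and the output of the trained model's predict call. The 0.5 cutoff is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical values: true labels and the probabilities that a sigmoid
# output neuron might return for six test rows
y_test = np.array([1, 0, 0, 1, 1, 0])
probabilities = np.array([0.91, 0.20, 0.65, 0.80, 0.35, 0.10])

# A probability above 0.5 is treated as class 1 (Survived)
predictions = (probabilities > 0.5).astype(int)

accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions)

print("Accuracy:", accuracy)
print(matrix)
```

In the confusion matrix, the rows are the actual classes and the columns are the predicted classes.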

Measuring the accuracy of ANN on testing data


Finding the best ANN hyperparameters using GridSearchCV.

Apart from the manual search method shown above, you can also use the Grid Search Cross-validation method present in the sklearn library to find the best parameters of ANN.

The below snippet defines some parameter values to try and finds the best combination among them.
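
GridSearchCV expects an estimator that follows the scikit-learn interface, so the Keras model needs a wrapper. Packages such as scikeras offer a ready-made KerasClassifier; the sketch below instead uses a hypothetical hand-written wrapper, KerasBinaryClassifier, to stay self-contained. The data is synthetic and the grid is deliberately tiny.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
from tensorflow import keras
from tensorflow.keras import layers


class KerasBinaryClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal scikit-learn wrapper around the 10-6-1 network."""

    def __init__(self, batch_size=10, epochs=10, optimizer="adam"):
        self.batch_size = batch_size
        self.epochs = epochs
        self.optimizer = optimizer

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.model_ = keras.Sequential([
            keras.Input(shape=(X.shape[1],)),
            layers.Dense(10, kernel_initializer="uniform", activation="relu"),
            layers.Dense(6, kernel_initializer="uniform", activation="relu"),
            layers.Dense(1, kernel_initializer="uniform", activation="sigmoid"),
        ])
        self.model_.compile(optimizer=self.optimizer,
                            loss="binary_crossentropy")
        self.model_.fit(X, y, batch_size=self.batch_size,
                        epochs=self.epochs, verbose=0)
        return self

    def predict(self, X):
        probabilities = self.model_.predict(X, verbose=0).ravel()
        return (probabilities > 0.5).astype(int)


# Synthetic stand-in data: 120 rows, 9 predictors
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 9)).astype("float32")
y = (X[:, 0] > 0).astype(int)

# Small grid and 2-fold cross-validation keep the demo fast
param_grid = {"batch_size": [10, 20], "epochs": [2, 4]}
search = GridSearchCV(KerasBinaryClassifier(), param_grid,
                      scoring="accuracy", cv=2)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Each combination is trained once per fold, so the run time grows quickly as more values are added to the grid.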

Finding best hyperparameters using GridSearchCV for ANN


Conclusion

This template can be used to fit the Deep Learning ANN classification model on any given dataset.

You can take the pre-processing steps of raw data from any of the case studies here.

Deep ANNs work great when you have a good amount of data available for learning. For small datasets with fewer than 50K records, I recommend using classical ML models like Random Forest, AdaBoost, XGBoost, etc.

The simple reason behind this is the high complexity and large computations of ANN. It is not worth it, if you can achieve the same accuracy with a faster and simpler model.

You look at deep learning ANNs only when you have a large amount of data available and the other algorithms are failing or are not fit for the task.

Farukh is an innovator in solving industry problems using Artificial intelligence. His expertise is backed with 10 years of industry experience. Being a senior data scientist he is responsible for designing the AI/ML solution to provide maximum gains for the clients. As a thought leader, his focus is on solving the key business problems of the CPG Industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent systems. His passion to teach got him to start this blog!
