What is a classification in machine learning?
Classification is a problem in which a machine learning algorithm learns how to assign a class to given data. Here, classes are also called targets, labels, or categories. A classification model attempts to draw conclusions from observed values: given one or more inputs, it predicts the value of one or more categorical outcomes.
For example, consider a bank lending problem where you have a set of features and must predict whether a customer is fraudulent or not using historical data. This is a binary classification problem because there are only two classes: fraud and non-fraud. The classifier uses the training data to learn how the given input variables relate to the "class." In this case, known fraud and non-fraud customers serve as the training data. Once the classifier is trained accurately, it can be used to classify an unknown customer.
Classification comes under supervised learning, where the target variable is provided along with the input data. Classifiers have applications in many domains, such as spam detection, fraud detection, medical diagnosis, and marketing.
There are many classification algorithms, and it is not possible to say that one is universally better than another; the choice depends on the application and on the size and type of the dataset. For example, if the dataset is small, a Support Vector Machine (SVM) tends to work well.
Decision Tree:
A decision tree is a supervised learning algorithm that is principally employed in classification problems, although it works for both categorical and continuous output variables. In this technique, we split the population or sample into two or more homogeneous sets based on the most significant splitter/differentiator among the input variables. A decision tree builds classification or regression models in the form of a tree structure, constructed in a top-down, recursive, divide-and-conquer manner. In the basic algorithm, all attributes are assumed to be categorical; continuous attributes must be discretized first.
Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes; each split should increase the homogeneity of the resulting sub-nodes. In other words, the purity of the nodes increases with respect to the target variable. The tree considers splits on all available variables and selects the one that yields the most homogeneous sub-nodes. The Gini index is based on the following idea: if we pick two items from a population at random, they should belong to the same class, and the probability of this is 1 if the population is pure.
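The Gini idea above can be sketched in a few lines of plain Python. This is an illustrative implementation, not taken from any particular library: the Gini impurity is 1 minus the probability that two randomly drawn items share a class, so it is 0 for a pure node.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the probability that two items drawn
    at random (with replacement) belong to the same class.
    0 means the node is pure."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node: every item belongs to one class, so impurity is 0.
print(gini_index(["fraud"] * 10))               # 0.0

# A 50/50 split is maximally impure for two classes.
print(gini_index(["fraud"] * 5 + ["ok"] * 5))   # 0.5
```

A split is chosen by computing this impurity for each candidate split and keeping the one whose weighted sub-node impurity is lowest.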
A decision tree can easily overfit by generating too many branches, for example because of noise or outliers. An overfitted model gives better performance on the training data but performs very poorly on new data. This can be avoided by pre-pruning, which stops tree construction early, or by post-pruning, which removes branches from the fully grown tree.
Naive Bayes:
The Naive Bayes classifier is one of the most popular classification methods; it is based on the Bayes theorem of probability together with the "naive" assumption that the attributes are independent of one another given the class. It is often applied to text, classifying documents by word probabilities, for example in sentiment analysis of content.
Bayes theorem gives a way of calculating the posterior probability P(A/B) from P(A), P(B), and P(B/A). Look at the equation below.

P(A/B) = P(B/A) × P(A) / P(B)
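A tiny worked example makes the equation concrete. The numbers below are purely hypothetical, chosen for the fraud scenario from earlier: A is "customer is fraudulent" and B is "transaction gets flagged".

```python
def bayes(p_a, p_b, p_b_given_a):
    """Posterior P(A/B) = P(B/A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 1% of customers are fraudulent (P(A)),
# 5% of transactions get flagged (P(B)), and 90% of fraudulent
# customers get flagged (P(B/A)).
p_fraud_given_flag = bayes(p_a=0.01, p_b=0.05, p_b_given_a=0.90)
print(round(p_fraud_given_flag, 4))  # 0.18
```

Note how the posterior (18%) is far higher than the 1% prior but still well below the 90% likelihood, because flags are much more common than fraud.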
The classification is conducted by deriving the maximum posterior, i.e., the class that maximizes P(A/B), by applying the independence assumption to the Bayes theorem. This assumption greatly reduces the computational cost, since only the class distribution and per-attribute counts are needed. Even though the assumption is not valid in most cases, since the attributes are usually dependent, Naive Bayes surprisingly performs well.
Naive Bayes is a straightforward algorithm to implement, and it obtains good results in most cases. Since it requires only linear time, it scales effectively to huge datasets, unlike classifiers that rely on expensive iterative estimation.
Naive Bayes has a drawback called the zero-probability problem: when the conditional probability for a particular attribute value is zero, because that value was never observed for a class in the training data, the whole product of probabilities collapses to zero and the classifier fails to provide a valid prediction. This is commonly fixed with Laplace (add-one) smoothing.
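The standard fix, Laplace smoothing, can be sketched as follows. The function and the payment-type example are illustrative, not from any specific library: adding a pseudo-count of alpha to every value guarantees that an unseen value gets a small non-zero probability.

```python
from collections import Counter

def smoothed_prob(value, observations, vocabulary_size, alpha=1.0):
    """Add-alpha (Laplace) smoothed estimate of P(value | class):
    unseen values get a small non-zero probability instead of 0."""
    counts = Counter(observations)
    return (counts[value] + alpha) / (len(observations) + alpha * vocabulary_size)

# "wire_transfer" was never seen among the fraud cases, but its
# smoothed probability is small rather than zero.
seen = ["cash", "cash", "card"]  # payment types among 3 fraud cases
p = smoothed_prob("wire_transfer", seen, vocabulary_size=3)
print(p)  # (0 + 1) / (3 + 3) = 1/6
```

With alpha = 0 the same call would return 0 and zero out the whole Naive Bayes product; alpha = 1 is the classic add-one variant.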
Artificial Neural Network:
An artificial neural network is a set of connected input/output units in which each connection has a weight associated with it. The approach was started by psychologists and neurobiologists seeking to develop and test computational analogues of neurons. During the training phase, the network learns by adjusting the weights so that it can predict the right class label of the input tuples.
Currently, several network architectures are available, such as feed-forward, convolutional, and recurrent networks. The suitable architecture depends on the application of the model. In most cases feed-forward models provide moderately accurate results, while for image-processing applications in particular, convolutional networks perform better.
There can be multiple hidden layers in the model, depending on the complexity of the function to be mapped by the model. Having more hidden layers, as in deep neural networks, enables the modelling of complicated relationships.
However, when there are many hidden layers, training and adjusting the weights is very time-consuming. The other disadvantage is the poor interpretability of the model compared with models like decision trees, because of the unknown symbolic meaning behind the learned weights.
Nevertheless, artificial neural networks have performed impressively in most real-world applications. They have a high tolerance to noisy data and are able to classify previously unseen patterns. In general, artificial neural networks perform well with continuous-valued inputs and outputs.
K-Nearest Neighbor (KNN):
K-Nearest Neighbor is a supervised learning algorithm that stores all instances corresponding to the training data points in an n-dimensional space. When unknown data is received, it examines the k closest saved instances (the nearest neighbors) and returns the most common category among them as the prediction for discrete targets, or the mean of the k nearest neighbors for real-valued targets.
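Because KNN has no training step beyond storing the data, the whole classifier fits in a few lines. This is a minimal illustrative sketch using Euclidean distance and toy 2-D data, not a production implementation:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs. Returns the
    majority label among the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters.
train = [((0, 0), "ok"), ((0, 1), "ok"), ((1, 0), "ok"),
         ((5, 5), "fraud"), ((5, 6), "fraud"), ((6, 5), "fraud")]
print(knn_predict(train, (0.5, 0.5)))  # ok
print(knn_predict(train, (5.5, 5.5)))  # fraud
```

For real-valued targets, the final line would return the mean of the k neighbors' values instead of the majority vote.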
Evaluating a Classifier:
After building a classifier model, you need to evaluate its accuracy and behaviour, which means how well the model predicts the outcome of new observations (test data) that have not been used to train the model.
In other words, you need to estimate the model's prediction accuracy and prediction errors using a new test dataset. Because we know the actual outcome of the observations in the test set, the accuracy of the predictive model can be assessed by comparing the predicted outcome values against the known outcome values.
The overall classification accuracy represents the proportion of correctly classified observations. The best way of evaluating a classifier in more detail is with the confusion matrix and its terminology.
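The confusion-matrix counts and the accuracy derived from them can be sketched as follows. The data here is a made-up toy example continuing the fraud scenario:

```python
def confusion_matrix(actual, predicted, positive="fraud"):
    """Tally true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1                      # correctly flagged fraud
        elif a != positive and p == positive:
            fp += 1                      # false alarm
        elif a == positive and p != positive:
            fn += 1                      # missed fraud
        else:
            tn += 1                      # correctly cleared
    return tp, fp, fn, tn

actual    = ["fraud", "ok", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "ok"]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(tp, fp, fn, tn)  # 1 1 1 3
print(accuracy)        # 4/6, about 0.667
```

Accuracy is only the start: the same four counts also yield precision, recall, and other metrics, which matter when the classes are imbalanced, as fraud typically is.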
Read more on “A confusion matrix fundamentals”.