Classification (Machine Learning)

Technical design: data processing pipeline in eHealth

Patrick Schneider, Fatos Xhafa, in Anomaly Detection and Complex Event Processing over IoT Data Streams, 2022

AdaBoost ensemble classifier

Ensemble classifiers play a prominent role in machine learning. In particular, they are used to address the problem of class imbalance in various applications [76,34]. The main goal of an ensemble classifier is to reduce the misclassification rate (error rate) of a weak classifier by aggregating multiple classifiers. The basic idea is to obtain predictions from multiple classifiers on the original data and combine these predictions into a strong classifier. The main strategies in ensemble learning are bootstrap aggregation (bagging) [17] and boosting [87], with decision tree-based learners widely used as base classifiers. The AdaBoost algorithm was developed by Freund [32].
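As a rough illustration of boosting a weak learner, the sketch below uses the open-source Weka toolkit's AdaBoostM1 implementation rather than the pipeline described in this chapter; the file name train.arff and the number of boosting rounds are placeholder assumptions.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AdaBoostSketch {
    public static void main(String[] args) throws Exception {
        // Load a labeled data set; "train.arff" is a placeholder file name.
        Instances data = DataSource.read("train.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Boost a weak learner (a one-level decision stump) for 50 rounds.
        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new DecisionStump());
        boost.setNumIterations(50);

        // Estimate the misclassification rate of the ensemble with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(boost, data, 10, new Random(1));
        System.out.println("Error rate: " + eval.errorRate());
    }
}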


URL:

https://www.sciencedirect.com/science/article/pii/B9780128238189000237

A computationally intelligent agent for detecting fake news using generative adversarial networks

Srinidhi Hiriyannaiah, ... K.G. Srinivasa, in Hybrid Computational Intelligence, 2020

4.5.2 Process

The classifier is the agent responsible for identifying the data as fake or real. Unlike the discriminator, the classifier is built with a much larger model capacity, which allows it to learn complex functions and results in much higher accuracy. The classifier is based on Google's BERT model [36]. It is trained to distinguish the fake articles generated by the generator from the real articles in the data set. Once the desired accuracy is obtained, the model is used for prediction. The classification model is trained using the GAN technique. With this technique, we do not need examples for both classes. Instead, using just the positive examples, we train a model that is capable of classifying the input into two classes. For the machine-generated content detection framework, we provide examples of real news articles to the GAN. The GAN then internally trains a fake content generator along with a classifier. The classifier is then trained to distinguish the real examples from the examples generated by the generator.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128186992000044

The Command-Line Interface

Ian H. Witten, ... Mark A. Hall, in Data Mining (Third Edition), 2011

The weka.classifiers Package

The classifiers package contains implementations of most of the algorithms for classification and numeric prediction described in this book. (Numeric prediction is included in classifiers: It is interpreted as prediction of a continuous class.) The most important class in this package is Classifier, which defines the general structure of any scheme for classification or numeric prediction. Classifier contains three methods: buildClassifier(), classifyInstance(), and distributionForInstance(). In the terminology of object-oriented programming, the learning algorithms are represented by subclasses of Classifier and therefore automatically inherit these three methods. Every scheme redefines them according to how it builds a classifier and how it classifies instances. This gives a uniform interface for building and using classifiers from other Java code. Thus, for example, the same evaluation module can be used to evaluate the performance of any classifier in Weka.

To see an example, click on weka.classifiers.trees and then on DecisionStump, which is a class for building a simple one-level binary decision tree (with an extra branch for missing values). Its documentation page, shown in Figure 14.2, shows the fully qualified name of this class, weka.classifiers.trees.DecisionStump, near the top. You have to use this rather lengthy name whenever you build a decision stump from the command line. The class name is sited in a small tree structure showing the relevant part of the class hierarchy. As you can see, DecisionStump is a subclass of weka.classifiers.Classifier, which is itself a subclass of java.lang.Object. The Object class is the most general one in Java: All classes are automatically subclasses of it.

FIGURE 14.2. DecisionStump, a class of the weka.classifiers.trees package.

After some generic information about the class—brief documentation, its version, and the author—Figure 14.2 gives an index of the constructors and methods of this class. A constructor is a special kind of method that is called whenever an object of that class is created, usually initializing the variables that collectively define its state. The index of methods lists the name of each one, the type of parameters it takes, and a short description of its functionality. Beneath those indexes, the web page gives more details about the constructors and methods. We return to these details later.

As you can see, DecisionStump overrides the distributionForInstance() method from Classifier; the default implementation of classifyInstance() in Classifier then uses this method to produce its classifications. In addition, it contains the methods getCapabilities(), getRevision(), globalInfo(), toSource(), toString(), and main(). We discuss getCapabilities() shortly. The getRevision() method simply returns the revision number of the classifier. There is a utility class in the weka.core package that prints it to the screen; it is used by Weka maintainers when diagnosing and debugging problems reported by users. The globalInfo() method returns a string describing the classifier, which, along with the scheme's options, is displayed by the More button in the generic object editor (see Figure 11.7(b)). The toString() method returns a textual representation of the classifier, used whenever it is printed on the screen, while the toSource() method is used to obtain a source code representation of the learned classifier. The main() method is called when you ask for a decision stump from the command line—in other words, every time you enter a command beginning with

java weka.classifiers.trees.DecisionStump

The presence of a main() method in a class indicates that it can be run from the command line: All learning methods and filter algorithms implement it.
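For instance, to build and evaluate a stump from the command line you supply a training file with the -t option; the file name weather.arff here is a placeholder.

java weka.classifiers.trees.DecisionStump -t weather.arff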

The getCapabilities() method is called by the generic object editor to provide information about the capabilities of a learning scheme (Figure 11.9(d)). The training data is checked against the learning scheme's capabilities when the buildClassifier() method is called, and an error is raised when the classifier's stated capabilities do not match the data's characteristics. The getCapabilities() method is present in the Classifier class and, by default, enables all capabilities (i.e., imposes no constraints). This makes it easier for new Weka programmers to get started because they need not learn about and specify capabilities initially. Capabilities are covered in more detail in Chapter 16 (page 555).
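Putting these pieces together, the short program below is a minimal sketch of building and using a DecisionStump from Java code; the ARFF file name is a placeholder.

import weka.classifiers.trees.DecisionStump;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StumpExample {
    public static void main(String[] args) throws Exception {
        // Load a data set and declare the last attribute as the class.
        Instances data = DataSource.read("weather.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // buildClassifier() learns the one-level tree from the training data
        // (the data is first checked against the scheme's stated capabilities).
        DecisionStump stump = new DecisionStump();
        stump.buildClassifier(data);

        // classifyInstance() returns the index of the predicted class value;
        // distributionForInstance() returns the estimated class probabilities.
        Instance first = data.instance(0);
        double predicted = stump.classifyInstance(first);
        double[] distribution = stump.distributionForInstance(first);

        System.out.println("Predicted class: " + data.classAttribute().value((int) predicted));
        System.out.println("Class distribution: " + java.util.Arrays.toString(distribution));
    }
}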


URL:

https://www.sciencedirect.com/science/article/pii/B9780123748560000146

Research Methodology

Oluwatobi Ayodeji Akanbi, ... Elahe Fazeldehkordi, in A Machine-Learning Approach to Phishing Detection and Defense, 2015

3.3.3 Phase 3a: Evaluation of Classifier Ensemble

Classifier ensembles were proposed to improve the classification performance of a single classifier (Kittler et al., 1998). The classifiers trained and tested in Phase 1 are used in this phase to determine the ensemble design. This phase is divided into two subphases, Phase 3a and Phase 3b.

Simple majority voting is used to combine the classifiers when determining detection accuracy. This is an iterative phase in which a threshold (an acceptable detection accuracy) is set and checked against the evaluation results until an optimum result is achieved. Equation (3.5) shows the formula for calculating detection accuracy.

(3.5)  \text{Detection Accuracy Result} = (a \times d_1) + (b \times d_2) + (c \times d_3)

where a + b + c = 1 and a, b, and c are weights in the range [0, 1].
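A minimal plain-Java sketch of this computation is shown below; the weights and the values of d1, d2, and d3 (taken here to be the detection accuracies of the three ensembled classifiers) are illustrative assumptions only.

public class EnsembleAccuracy {
    // Eq. (3.5): weighted combination of the individual detection accuracies.
    static double detectionAccuracy(double a, double b, double c,
                                    double d1, double d2, double d3) {
        return a * d1 + b * d2 + c * d3;
    }

    // Simple majority voting over the binary predictions of three classifiers.
    static int majorityVote(int p1, int p2, int p3) {
        return (p1 + p2 + p3) >= 2 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Illustrative values: equal weights (a + b + c = 1) and example accuracies.
        double result = detectionAccuracy(1.0 / 3, 1.0 / 3, 1.0 / 3, 0.95, 0.92, 0.90);
        System.out.println("Detection accuracy result: " + result);
        System.out.println("Majority vote for predictions (1, 0, 1): " + majorityVote(1, 0, 1));
    }
}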

Phase 3a is divided into two parts, namely design and decision. In the design part, four algorithms are considered for the ensemble, and a committee of three algorithms is used to form each ensemble, since majority voting requires an odd number of participants. On the basis of the output of Phase 2, all the individual algorithms are evaluated with the same metrics used in Phase 2 and then voted on. The decision part of Phase 3a relies on the output of the design part to decide which of the ensembles performs best; that ensemble is then passed to Phase 3b for comparison with the best of the four algorithms evaluated in Phase 2.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128029275000034

Classification and Analysis of Facebook Metrics Dataset Using Supervised Classifiers

Ranjit Panigrahi, Samarjeet Borah, in Social Network Analytics, 2019

2.3 Lazy Classifiers

All the classifiers in this group are termed lazy because, as the name suggests, generalization beyond the training data is delayed until a query is made to the system. That is, a lazy learner does not build a classifier until a new instance needs to be classified. For this reason, these classifiers are called instance-based, and they shift the computational cost from model building to the time at which an instance is classified. The classifiers considered under lazy classifiers are Kstar [37], RseslibKnn [38], and locally weighted learning (LWL) [39, 40].

KStar [37] is a K-nearest neighbors classifier with various distance measures, which implements fast neighbor search in large datasets and has a mode in which it works as the RIONA [41] algorithm. KStar has a significant impact on classification and prediction. Due to its wide application [42–45], KStar is a strong candidate classifier for this analysis.

LWL is another classifier that incorporates an instance-based mechanism to assign instance weights, which are then used by a specified weighted-instances handler for classification and prediction. Markines et al. [46] proposed a social spam detection method by evaluating many classification mechanisms; in their work, LWL achieved a high detection rate of 97.68%. A reality-mining-based social network analysis [47] was conducted using the LWL classifier along with other classifiers, where LWL achieved an accuracy of 86.67%.
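As an illustration, the sketch below builds Weka's KStar and LWL implementations on a placeholder data set; being lazy learners, they do most of their work only when classifyInstance() is called.

import weka.classifiers.lazy.KStar;
import weka.classifiers.lazy.LWL;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LazyClassifierSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("facebook_metrics.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // "Building" a lazy classifier mostly just stores the training instances.
        KStar kstar = new KStar();
        kstar.buildClassifier(data);

        LWL lwl = new LWL();
        lwl.buildClassifier(data);

        // The real work (finding and weighting neighbors) happens at query time.
        System.out.println("KStar prediction: " + kstar.classifyInstance(data.instance(0)));
        System.out.println("LWL prediction:   " + lwl.classifyInstance(data.instance(0)));
    }
}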


URL:

https://www.sciencedirect.com/science/article/pii/B9780128154588000013

Principles and Methods for Data Science

Kalidas Yeturu, in Handbook of Statistics, 2020

2.6.2 Gradient boosting algorithm

Gradient Boosting (Friedman, 2001) is another highly popular boosting algorithm, in which, over iterations, the error itself is predicted and subtracted from the output of the classifier. The word gradient reflects the fact that the error is proportional to the negative gradient of a loss function.

Let y = F(x) be the machine-learned function that predicts the y coordinate for the input x. Let L(F(x), y) be the loss function that computes the difference between predicted and actual outputs. Let L(F(x), y) = \tfrac{1}{2}(y - F(x))^2 be the squared loss function. Then the following derivation computes the new value of F in Eq. (40).

\begin{aligned}
L(F(x), y) &= \tfrac{1}{2}\,\big(y - F(x)\big)^2 \\
\frac{\partial L(F(x), y)}{\partial F(x)} &= \tfrac{1}{2} \cdot 2 \cdot \big(y - F(x)\big) \cdot (-1) \\
\nabla L(F(x), y) &= \frac{\partial L(F(x), y)}{\partial F(x)} = \big(F(x) - y\big) \\
F_{\mathrm{new}}(x) &= F_{\mathrm{old}}(x) - \nabla L(F(x), y)\,\Big|_{F(x) = F_{\mathrm{old}}(x)}
\end{aligned}

(40)  F_{\mathrm{new}}(x) = F_{\mathrm{old}}(x) - \mathrm{predicted}\big(F_{\mathrm{old}}(x) - y\big)

Building a sequence of classifiers: The steps in building a sequence of classifiers are as follows.

Let F_1(x) be the first classifier built over the data set.

Let e_1(x) = F_1(x) - y be the classifier or regressor for the error.

Let F_2(x) = F_1(x) - e_1(x) be the updated classifier.

Let e_2(x) = F_2(x) - y be the classifier or regressor for the error of the updated classifier.

Let F_3(x) = F_2(x) - e_2(x) be the updated classifier.

and so on

Let F_{M+1}(x) = F_M(x) - e_M(x).

Then we can expand F_{M+1}(x) = F_1(x) - \sum_{i=1}^{M} e_i(x).

For any other loss function of the form L(F(x) - y), if the function is not constant, the gradient term satisfies ∇L(F(x) - y) ∝ (F(x) - y). There are other variants of gradient boosting; one very popular technique, called XGBoost, combines features of random forests and gradient boosting.
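The following is a minimal sketch of the squared-loss residual-fitting recurrence above, using Weka's AdditiveRegression meta-learner as a stand-in; the data set name, number of iterations, and shrinkage value are placeholder assumptions.

import weka.classifiers.meta.AdditiveRegression;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GradientBoostingSketch {
    public static void main(String[] args) throws Exception {
        // Numeric-class (regression) data set; "housing.arff" is a placeholder.
        Instances data = DataSource.read("housing.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Each iteration fits a weak model to the current residuals e_m(x)
        // and adds a shrunken copy of it to the running prediction F_m(x).
        AdditiveRegression boosting = new AdditiveRegression();
        boosting.setClassifier(new DecisionStump()); // weak base regressor
        boosting.setNumIterations(100);              // M rounds of residual fitting
        boosting.setShrinkage(0.1);                  // step size applied to each e_m(x)
        boosting.buildClassifier(data);

        System.out.println(boosting); // prints the learned additive model
    }
}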

Though boosting algorithms reduce error, they are prone to overfitting. Unlike bagging, boosting algorithms can be ensembles of several weak classifiers. The focus in boosting is error reduction, whereas the focus of bagging is variance reduction.


URL:

https://www.sciencedirect.com/science/article/pii/S0169716120300225

Cognitive Research: Knowledge Representation and Learning

Vincent J. Kovarik Jr., in Cognitive Radio Technology (Second Edition), 2009

12.3.2 Classifiers

Classifiers refer to algorithms that analyze two or more units of data or knowledge and identify similarities or patterns in the structure and content of the data. Classifiers are applied in areas such as data mining and, in the case of declarative knowledge structures, provide a learning mechanism by extending an ontology with new concepts based on the similarity of the new concepts to those already within the system.

Similarities can be discovered between two distinct pieces of declarative knowledge by a number of methods. A classifier can compare the set of properties associated with each of the concepts and, based on the similarities of the two concepts, item B may be categorized as a subtype of item A. This would result in the insertion of item B into the ontology as a subtype or subclass of item A.

The process of quantitatively assessing the similarity between two concepts can be attributed to Tversky in his "Features of Similarity" paper [16]. A computational implementation of a common method for assessing the similarity between two objects is to take the set of properties associated with each of the objects and create an n-dimensional space, with each dimension representing a property, and then assign a number between 0 and 1 to represent the value of each property for each object.

Once this structure has been built for each object, the center-of-gravity point can be calculated as the sum of the distances between each of the property values in the N-dimensional space. Then, the relative similarity between the two concepts is the distance (i.e., difference in magnitude) between the two center-of-gravity points. The shorter the distance between the points, the more similar the concepts are.
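A minimal plain-Java sketch of this distance computation is given below; the property names and the values in [0, 1] are illustrative assumptions, not taken from the chapter.

public class ConceptSimilarity {
    // Euclidean distance between two property-value vectors in the n-dimensional
    // property space; a shorter distance means the two concepts are more similar.
    static double distance(double[] conceptA, double[] conceptB) {
        double sum = 0.0;
        for (int i = 0; i < conceptA.length; i++) {
            double diff = conceptA[i] - conceptB[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Illustrative property values in [0, 1], e.g., bandwidth, mobility, transmit power.
        double[] conceptA = {0.9, 0.2, 0.7};
        double[] conceptB = {0.8, 0.3, 0.6};
        System.out.println("Similarity distance: " + distance(conceptA, conceptB));
    }
}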

In addition to classifiers being applied to declarative knowledge structures in an ontology, they can be applied to identifying similarities and patterns in declarative structures representing temporal relationships. Learning as applied to temporal knowledge enables the radio system to enhance or extend its repertoire of known behavior patterns and then apply those patterns to new situations. For example, the system may observe a particular pattern of interference on a regular set of frequencies at a particular time of day. Or, a system may observe that a particular pattern of activities and events precedes the initiation of a communications activity that is of interest from a signal-analysis perspective. Repeated observation of the temporal patterns reinforces the validity of the set of observed temporal events. The stored temporal pattern can then be applied by matching observed events against the collection of known temporal patterns to predict a specific activity and initiate an appropriate action or countermeasure based on the temporal prediction.


URL:

https://www.sciencedirect.com/science/article/pii/B9780123745354000126

An integration of handcrafted features for violent event detection in videos

B.H. Lohithashva, ... D.S. Guru, in Recent Trends in Computational Intelligence Enabled Research, 2021

17.2.5 Classifier

The SVM classifier is a supervised classification algorithm based on the principle of structural risk minimization (SRM). Vladimir Vapnik (Cortes & Vapnik, 1995) introduced the SVM classifier, which was initially used for binary classification problems; later, many researchers used multiclass SVMs for their applications. If the feature vectors are not linearly separable, nonlinear kernel functions are used. The SVM attempts to maximize the margin of the separating boundary between violent and nonviolent events, that is, the distance of the separating hyperplane from the nearest feature vectors. In this work, the coarse Gaussian kernel function is used to discriminate the features.
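As a rough sketch, the code below uses Weka's SMO implementation with an RBF (Gaussian) kernel as a stand-in for the coarse Gaussian kernel SVM used in this work; the data set name and kernel width are placeholder assumptions.

import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ViolenceSvmSketch {
    public static void main(String[] args) throws Exception {
        // Feature vectors extracted from video clips; "violence_features.arff" is a placeholder.
        Instances data = DataSource.read("violence_features.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // RBF (Gaussian) kernel; a small gamma gives a "coarse", smoother decision boundary.
        RBFKernel kernel = new RBFKernel();
        kernel.setGamma(0.01);

        SMO svm = new SMO();
        svm.setKernel(kernel);
        svm.buildClassifier(data); // learns the maximum-margin separating boundary

        System.out.println(svm);
    }
}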


URL:

https://www.sciencedirect.com/science/article/pii/B9780128228449000396

Classification in theory and practice

In Classification in Theory and Practice (Second Edition), 2014

Determination of subject: 'what is this about?'

No classifier has unlimited time for subject analysis and classification of materials. In the workplace, classifiers have to develop skills that enable them to determine the subject of a new acquisition accurately and quickly, so that it is shelved in the correct place among existing items. Experienced classifiers no longer have to think about the complex analytical process they are engaging in. For inexperienced classifiers there are some simple procedures to follow.

The first place to look when classifying new material is the title of the work. In some cases this will tell the classifier everything they need to know. In a library using DDC, a work entitled An Introduction to Psychology will almost certainly be classified at 150. But even in apparently straightforward cases like this it is worthwhile to check the contents page. A scan of the contents may reveal that the text focuses upon educational psychology, in which case it would be classed at 370.15.

In many cases the title will not include information about the subject of the work. The title of Stephen Jay Gould's Eight Little Piggies 14 does not immediately suggest a work on evolution. Other title information, such as a subtitle, may help to determine the subject, after which chapter and section headings as listed on the contents page should be scanned. If the nature of the work is still unclear, the brief publisher's introduction on the cover may help in determining its subject. Further assistance or confirmation can be found in the author's or editor's foreword and the introductory chapter. Other sources of information that can be consulted if necessary include publishers' blurbs and reviews.


URL:

https://www.sciencedirect.com/science/article/pii/B9781843347859500016

Phishing, SMishing, and Vishing

In Mobile Malware Attacks and Defense, 2009

Experimental Setup

We optimize the classifiers' performance by testing them with different input parameters. To find the maximum AUC, we test the classifiers on the complete dataset, applying different input parameters. We also apply 10-fold cross-validation and average the estimates of all ten folds (subsamples) to evaluate the average error rate for each of the classifiers, using the 70 features and 6,561 e-mails. We do not perform any preliminary variable selection since most classifiers in the study can perform automatic variable selection. To be fair, we use L1-SVM and penalized LR, where variable selection is performed automatically. The optimum classifier parameters are summarized in Table 6.4.

Table 6.4. Optimized Input Parameters in Classifiers

Classifier: Input Parameters
CBART: number of trees = 100, power = 1
LR: λ = 1 × 10^-4
RF: number of trees = 50
SVM: γ = 0.1, cost (c) = 12
NNet: size (s) = 35, weight decay (w) = 0.7

Cross-validation divides the dataset into subsets. During a classifier's learning, some of these subsets are used for training, while the others are used for validation.
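A sketch of this procedure with an explicit 10-fold loop is shown below, using Weka; the feature file name and the SMO classifier used as a stand-in are placeholder assumptions rather than the exact toolchain of the study.

import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("emails.arff"); // placeholder feature file
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));

        int folds = 10;
        double errorSum = 0.0;
        for (int i = 0; i < folds; i++) {
            // Each fold: train on 9/10 of the data, validate on the remaining 1/10.
            Instances train = data.trainCV(folds, i);
            Instances test = data.testCV(folds, i);

            Classifier classifier = new SMO(); // stand-in for any classifier in Table 6.4
            classifier.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(classifier, test);
            errorSum += eval.errorRate();
        }
        System.out.println("Average 10-fold error rate: " + errorSum / folds);
    }
}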


URL:

https://www.sciencedirect.com/science/article/pii/B9781597492980000069