What are Classifiers?
A classifier is a machine learning model used for classification tasks, where the goal is to predict which category (or “class”) a given data point belongs to.
Example Problems:
- Determining if an email is “spam” or “not spam.”
- Predicting whether a patient has “disease” or “no disease” based on medical tests.
The classifier learns from labeled training data (input features paired with their corresponding classes) how to map input features to target labels. Geometrically, the learned model defines a decision boundary that separates the different classes in feature space; the classifier's output for a new point is the class on whose side of that boundary the point falls.
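As a minimal sketch of this fit-then-predict workflow (assuming scikit-learn is installed; the "spam" feature values below are invented purely for illustration):

```python
# Minimal sketch: fit a classifier on labeled data, then predict a new point.
from sklearn.linear_model import LogisticRegression

# Each row: [number of links, count of the word "free"]; label 1 = spam, 0 = not spam
X_train = [[8, 5], [6, 4], [0, 1], [1, 0]]
y_train = [1, 1, 0, 0]

clf = LogisticRegression()
clf.fit(X_train, y_train)      # learn the mapping from features to labels

print(clf.predict([[7, 3]]))   # e.g. [1] -> classified as spam
```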
Types of Classifiers
Linear Classifiers
- Use straight-line decision boundaries (or hyperplanes) to separate classes.
- Examples: Logistic Regression, Linear Support Vector Machine (SVM), Perceptron.
Nonlinear Classifiers
- Handle data that is not linearly separable using complex boundaries.
- Examples: Decision Trees, Random Forests, Neural Networks, Nonlinear SVM.
Probabilistic Classifiers
- Output probabilities of data points belonging to each class.
- Examples: Naive Bayes, Logistic Regression (interpreted probabilistically).
Ensemble Methods
- Combine multiple classifiers to improve performance.
- Examples: Bagging (e.g., Random Forest), Boosting (e.g., AdaBoost, Gradient Boosting).
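Each family above has a standard scikit-learn implementation. A quick sketch instantiating one representative per family (the class names are real scikit-learn estimators; default hyperparameters are used for brevity):

```python
from sklearn.linear_model import LogisticRegression    # linear / probabilistic
from sklearn.svm import SVC                            # nonlinear (RBF kernel by default)
from sklearn.naive_bayes import GaussianNB             # probabilistic
from sklearn.ensemble import RandomForestClassifier    # ensemble (bagging of trees)

classifiers = {
    "linear": LogisticRegression(),
    "nonlinear": SVC(kernel="rbf"),
    "probabilistic": GaussianNB(),
    "ensemble": RandomForestClassifier(n_estimators=100),
}
# Every estimator exposes the same fit(X, y) / predict(X) interface.
```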
Linear Classifiers
Linear classifiers create a decision boundary based on a linear function: $$ f(x) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b $$
- $x_1, x_2, \dots, x_n$: Input features.
- $w_1, w_2, \dots, w_n$: Weights (parameters defining feature importance).
- $b$: Bias (shifts the decision boundary).

The sign of $f(x)$ determines the class:
- $f(x) > 0$: Class A.
- $f(x) < 0$: Class B.
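This decision rule translates directly into code. A minimal sketch in plain Python (the weight and bias values here are made up for illustration):

```python
def linear_classify(x, w, b):
    """Classify a point by the sign of f(x) = w·x + b."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "Class A" if f > 0 else "Class B"   # f == 0 lies exactly on the boundary

# Example with two features, w = (1.0, -2.0), b = 0.5:
print(linear_classify([3.0, 1.0], [1.0, -2.0], 0.5))  # f = 1.5 > 0 -> Class A
```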
Examples of Linear Classifiers:
- Logistic Regression: Maps input to probabilities using the sigmoid function.
- Support Vector Machine (SVM): Maximizes the margin (distance) between classes.
- Perceptron: Simplest neural network that works as a linear classifier.
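As one concrete piece of that list, logistic regression converts the linear score $f(x)$ into a probability via the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$. A minimal sketch:

```python
import math

def sigmoid(z):
    """Squash a linear score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5   -> exactly on the decision boundary
print(sigmoid(2.0))   # ~0.88 -> confidently class 1
print(sigmoid(-2.0))  # ~0.12 -> confidently class 0
```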
Strengths:
- Simple, efficient, and interpretable.
- Works well on linearly separable data.
Weaknesses:
- Struggles with data that is not linearly separable.
- Sensitive to outliers.
Bias Term ($b$)
The bias term is a parameter in the linear function that shifts the decision boundary: $$ f(x) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b $$
Purpose:
- Shifts the decision boundary away from the origin.
- Provides more flexibility in separating data.
Example:
- Without bias ($b = 0$): The line passes through the origin.
- With bias ($b \neq 0$): The line shifts up, down, or elsewhere in the feature space.
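To make the shift concrete, solve $f(x) = 0$ for $x_2$ in two dimensions, which gives the boundary line $x_2 = -(w_1 x_1 + b)/w_2$. A small sketch with assumed weights $w_1 = w_2 = 1$:

```python
def boundary_x2(x1, w1, w2, b):
    """x2-coordinate of the boundary f(x) = 0 at a given x1 (requires w2 != 0)."""
    return -(w1 * x1 + b) / w2

# With w1 = w2 = 1, the boundary is x2 = -x1 - b.
print(boundary_x2(0.0, 1.0, 1.0, 0.0))   # 0.0 -> without bias, the line passes through the origin
print(boundary_x2(0.0, 1.0, 1.0, -2.0))  # 2.0 -> bias b = -2 shifts the line up by 2
```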
Classification for Multiple Classes
Binary Classification
- Involves two classes and a decision boundary dividing the feature space.
Multiclass Classification
- Handled using one of two common strategies (both sketched in the code after this list):
- One-vs-Rest (OvR): Creates one binary classifier per class.
- One-vs-One (OvO): Creates a binary classifier for every pair of classes.
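A sketch of both strategies using scikit-learn's wrappers (`OneVsRestClassifier` and `OneVsOneClassifier` are real scikit-learn classes; the three-class toy data is made up):

```python
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

X = [[1, 2], [2, 1], [5, 5], [6, 5], [1, 8], [2, 9]]  # toy points
y = [0, 0, 1, 1, 2, 2]                                # three classes

ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)  # 3 binary classifiers: one per class
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)   # 3 binary classifiers: one per pair

print(ovr.predict([[5, 6]]), ovo.predict([[5, 6]]))  # both should output class 1 here
```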
Higher Dimensions
- The decision boundary becomes a hyperplane:
- Line in 2D.
- Plane in 3D.
- Hyperplane in higher dimensions.
Numerical Example of Linear Classification
Dataset:
| Data Point | $x_1$ | $x_2$ | Target ($y$) |
|---|---|---|---|
| 1 | 2 | 3 | 1 |
| 2 | 1 | 5 | 1 |
| 3 | 2 | 1 | -1 |
| 4 | 3 | 2 | -1 |
Linear Classifier Equation:
$$ f(x) = w_1x_1 + w_2x_2 + b $$
- Initialize weights: $w_1 = 1$, $w_2 = 1$, and $b = 0$.
Predictions:
For each data point, compute $f(x) = x_1 + x_2$:
- For $x_1 = 2, x_2 = 3$: $f(x) = 5$ → Class 1 (correct).
- For $x_1 = 1, x_2 = 5$: $f(x) = 6$ → Class 1 (correct).
- For $x_1 = 2, x_2 = 1$: $f(x) = 3$ → Class 1 (incorrect).
- For $x_1 = 3, x_2 = 2$: $f(x) = 5$ → Class 1 (incorrect).
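These four scores can be checked in a few lines (a sketch assuming NumPy; it just reproduces the arithmetic above):

```python
import numpy as np

X = np.array([[2, 3], [1, 5], [2, 1], [3, 2]])  # the four data points
y = np.array([1, 1, -1, -1])                    # target labels
w, b = np.array([1.0, 1.0]), 0.0                # initial weights and bias

f = X @ w + b                  # [5. 6. 3. 5.]
pred = np.where(f > 0, 1, -1)  # [1 1 1 1] -> points 3 and 4 are misclassified
print(f, pred, pred == y)
```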
Decision Boundary After Training:
Update the weights and bias to eliminate these errors. Note that no bias alone can fix the initial weights: with $w_1 = w_2 = 1$, data points 1 and 4 receive the same score $f(x) = 5$ despite having opposite labels. The data is linearly separable, however; for example, $w_1 = -4$, $w_2 = 4$, $b = -2$ gives the decision boundary $x_2 = x_1 + 0.5$, which classifies all four points correctly.
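A minimal perceptron training loop recovers such a boundary from the initialization above (a sketch assuming NumPy; the learning rate and epoch cap are arbitrary choices, not from the text):

```python
import numpy as np

X = np.array([[2, 3], [1, 5], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, -1, -1])
w, b, lr = np.array([1.0, 1.0]), 0.0, 1.0    # start from the initial weights above

for epoch in range(10):                      # 10 passes is plenty for 4 points
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:           # misclassified (or on the boundary)
            w += lr * yi * xi                # perceptron update rule
            b += lr * yi
            errors += 1
    if errors == 0:                          # a full pass with no mistakes -> done
        break

print(w, b)                 # e.g. w = [-4.  4.], b = -2.0 -> boundary x2 = x1 + 0.5
print(np.sign(X @ w + b))   # [ 1.  1. -1. -1.] matches y
```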
Suggested Topics for Further Study
- Training Process: Understand gradient descent for optimizing weights and bias.
- Loss Functions: Explore error measurements like cross-entropy and hinge loss.
- Handling Nonlinear Data: Study kernel methods and nonlinear classifiers.
- Classifier Evaluation: Learn metrics like accuracy, precision, recall, and F1-score.