Introduction of Linear Discriminant Analysis (LDA)

November 15, 2017 Author: munishmishra04_3od47tgp
Print Friendly, PDF & Email

LDA is widely used to find linear combinations of features while preserving class separability. Unlike PCA, LDA tries to model the differences between classes. Classic LDA is designed to take into account only two classes. Specifically, it requires data points for different classes to be far from each other, while points from the same class are close. Consequently, LDA obtains differenced projection vectors for each class. Multi-class LDA algorithms which can manage more than two classes are more used. Linear Discriminant Analysis (LDA) is most commonly used as dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class-separability in order avoid over fitting (“curse of dimensionality”) and also reduce computational costs.

Summarizing the LDA approach in 5 steps

Listed below are the 5 general steps for performing a linear discriminant analysis; we will explore them in more detail in the following sections.

  1. Compute the d-dimensional mean vectors for the different classes from the dataset.
  2. Compute the scatter matrices (in-between-class and within-class scatter matrix).
  3. Compute the eigenvectors ​\( (e_1,e_2,…e_d) \)​ and corresponding eigenvalues ​\( (λ_1,λ_2,…λ_d) \)​ for the scatter matrices.
  4. Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with the largest eigenvalues to form  a d*k dimensional matrix (where every column represents an eigenvector).
  5. Use this d * k  eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication: Y = X * W (where X is a n * d dimensional matrix representing the nn samples, and y are the transformed n * k dimensional samples in the new subspace).

Different Approaches to LDA

Data sets can be transformed and test vectors can be classified in the transformed space by two different approaches.

Class-dependent transformation: This type of approach involves maximizing the ratio of between class variance to within class variance. The main objective is to maximize this ratio so that adequate class separability is obtained. The class-specific type approach involves using two optimizing criteria for transforming the data sets independently.

Class-independent transformation: This approach involves maximizing the ratio of overall variance to within class variance. This approach uses only one optimizing criterion to transform the data sets and hence all data points irrespective of their class identity are transformed using this transform. In this type of LDA, each class is considered as a separate class against all other classes.

LDA Example

Compute the Linear Discriminant projection for the following two-dimensional dataset:

two-dimensional dataset

\[ X_(1 )=(x_1,x_2 )={(4,1),(2,4),(2,3),(3,6),(4,4)} \]

\[ X_(2 )=(x_1,x_2 )={(9,10),(6,8),(9,5),(8,7),(10,8)} \]


  • The class statistics are:

class statistics

  • The within- and between- class scatter are:

within- and between- class

  • The LDA projection is then obtained as the solution of the generalized eigenvalue problem

generalized eigenvalue

  • Or directly by

directly by

Variants of LDA

Non-parametric LDA

NPLDA removes the unimodal Gaussian assumption by computing the between-class scatter matrix SB using local information and the K Nearest Neighbors rule. As a result of this:

  • The matrix ​\( S_B \)​ is full-rank, allowing us to extract more than (C-1) feature
  • The projections are able to preserve the structure of the data more closely

Orthonormal LDA

OLDA computes projections that maximize the Fisher criterion and, at the same time, are pair-wise orthonormal

The method used in OLDA combines the eigenvalue solution of ​\( S_W^{-1} S_B \)​ and the Gram-Schmidt orthonormalization procedure

  • OLDA sequentially finds axes that maximize the Fisher criterion in the subspace orthogonal to all features already extracted
  • OLDA is also capable of finding more than (C-1) features
  • OLDA computes projections that maximize the Fisher criterion and, at the same time, are pair-wise orthonormal

Generalized LDA

GLDA generalizes the Fisher criterion by incorporating a cost function similar to the one we used to compute the Bayes Risk

  • The effect of this generalized criterion is an LDA projection with a structure that is biased by the cost function
  • Classes with a higher cost ​\( C_{ij} \)​ will be placed further apart in the low-dimensional projection


[1] Guerreiro, Pedro Miguel Correia, “Linear Discriminant Analysis Algorithms”, Unpublished master’s thesis), Technical University of Lisbon, Portugal (2008)

[2] S. Balakrishnama and A. GanapathirajuLinear Discriminant Analysis – A Brief Tutorial, Institute for Signal and Information Processing

[3] “LECTURE 10: Linear Discriminant Analysis”, available online at:

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Insert math as
Additional settings
Formula color
Text color
Type math using LaTeX
Nothing to preview