Lesson – 2: Central Tendency

Unit – 1: Introduction to Descriptive Statistics

• Introduction

Statistics is defined by the American Statistical Association (ASA, 2023) as:

“The science of learning from data, and of measuring, controlling and communicating uncertainty.”

In psychology, statistics helps researchers to:

-Analyze and interpret data.

-Test research hypotheses.

-Establish relationships between variables.

-Draw valid conclusions and predictions.

Descriptive Statistics

-Descriptive statistics summarizes and describes the features of a dataset.

-It involves calculating measures of central tendency (mean, median and mode) and measures of variability (range, semi-interquartile range, variance and standard deviation).

Inferential Statistics

-Inferential statistics uses sample data to make inferences about the population from which the sample was drawn.

-It is also used to test hypotheses and make predictions about the population.

Measures of Central Tendency: Definition, Properties and Comparison

Measures of central tendency, also called measures of central location, are statistical values that indicate the center of a distribution of data.

The three major measures of central tendency are: Mean, Median, Mode.

Mean:

The mean (average) is the sum of all values divided by the number of values. It is denoted by μ (mu) for a population and by M (or x̄) for a sample.

Properties of Mean:

1. Sensitive to outliers – extreme values greatly affect it.

2. Suitable when data is normally distributed.

3. Takes into account all values in the dataset.

4. Rigidly defined and easy to calculate.

Median:

The median is the middle value in an ordered dataset.

If the number of values is:

Odd: middle value is the median.

Even: median = average of two middle values.

Also called: Positional Average

Properties of Median

1. Largely unaffected by outliers, because it depends only on the middle position of the ordered data.

2. Best for skewed distributions.

3. Easy to compute and understand.

Mode:

The mode is the most frequently occurring value in the dataset.

Properties of Mode

1. Not affected by outliers.

2. Useful for categorical and repeated values.

3. A dataset can be:

-Unimodal (1 mode),

-Bimodal (2 modes),

-Multimodal (more than 2),

-Or have no mode at all.
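The categories above can be sketched in a few lines of Python. This is only an illustration: the function name describe_modes is made up for this example, and treating a dataset in which every value occurs exactly once as having "no mode" is an assumed convention (some textbooks handle that case differently).

```python
from collections import Counter

def describe_modes(values):
    """Label a non-empty dataset as unimodal, bimodal, multimodal, or no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if no value repeats, report that there is no mode.
    if highest == 1 and len(counts) > 1:
        return "no mode", []
    # Keep every value that reaches the highest frequency, in order of first appearance.
    modes = [value for value, count in counts.items() if count == highest]
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

print(describe_modes([2, 3, 5, 5, 6, 8, 11]))   # ('unimodal', [5])
print(describe_modes([1, 1, 2, 2, 3]))          # ('bimodal', [1, 2])
print(describe_modes([1, 2, 3]))                # ('no mode', [])
```

Because the function only counts occurrences, it works equally well for categorical values such as strings.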

Comparison of Mean, Median, and Mode

The choice of using mean, median, or mode depends on the nature of the data and the purpose of the analysis.

Usefulness:

Mean: Most useful when data is symmetrical and lacks outliers.

Median: More suitable for skewed data or when outliers are present.

Mode: Best for categorical data or identifying the most frequent value.

Sensitivity to Outliers:

Mean: Highly sensitive to extreme values.

Median: Largely unaffected by outliers.

Mode: Not affected by outliers.

Mathematical Properties:

Mean: Can be used for further statistical calculations.

Median: Represents the exact middle value.

Mode: Simple to determine but may not exist or be unique.
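The difference in sensitivity to outliers can be seen in a small experiment. The sketch below uses Python's standard statistics module; the extreme score of 100 is a hypothetical value added only for illustration.

```python
import statistics

scores = [2, 3, 5, 5, 6, 8, 11]
with_outlier = scores + [100]   # hypothetical extreme score, added for illustration

# Mean, median and mode without the outlier.
print(round(statistics.mean(scores), 2),
      statistics.median(scores),
      statistics.mode(scores))          # 5.71 5 5

# The same three measures once the outlier is included.
print(round(statistics.mean(with_outlier), 2),
      statistics.median(with_outlier),
      statistics.mode(with_outlier))    # 17.5 5.5 5
```

In this case the mean roughly triples, the median moves only from 5 to 5.5, and the mode stays at 5.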

Calculation of Mode, Median and Mean from Raw Scores

Calculation of Mean

The mean (average) is the sum of all values divided by the total number of values.

Formula: μ = (Σx) / N (where Σx is the sum of all values, and N is the number of values).

Example: Raw scores: 8, 3, 6, 5, 11, 5, 2

Mean: (8 + 3 + 6 + 5 + 11 + 5 + 2) / 7 = 40 / 7 ≈ 5.71
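The same calculation can be checked with a short Python sketch that applies the formula directly to the raw scores from the example.

```python
# Raw scores from the example above.
scores = [8, 3, 6, 5, 11, 5, 2]

# Mean = sum of all values divided by the number of values.
total = sum(scores)          # 40
n = len(scores)              # 7
mean = total / n

print(round(mean, 2))        # 5.71
```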

Calculation of Median

-To find the median, arrange the data in ascending order.

-If there’s an odd number of values, the median is the middle value.

-If there’s an even number, the median is the average of the two middle values.

Example (Odd number of values):

Raw data: 8, 3, 6, 5, 11, 5, 2

Ordered set: 2, 3, 5, 5, 6, 8, 11

Median: 5

Example (Even number of values):

Raw data: 8, 3, 6, 11, 5, 2

Ordered set: 2, 3, 5, 6, 8, 11

Median: (5 + 6) / 2 = 5.5
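The odd and even cases can be written as one small function. This is a minimal sketch that assumes a non-empty list of numbers; the function name median is chosen for this example only.

```python
def median(values):
    """Return the middle value of the ordered data.

    With an even number of values, return the average of the two middle values.
    Assumes the list is not empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([8, 3, 6, 5, 11, 5, 2]))   # 5
print(median([8, 3, 6, 11, 5, 2]))      # 5.5
```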

Calculation of Mode

The mode is the value that appears most frequently in a dataset.

Example:

Raw scores: 8, 3, 6, 5, 11, 5, 2

Ordered set: 2, 3, 5, 5, 6, 8, 11

Mode: 5 (appears twice)
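As a quick check, Python's standard statistics module can find the mode of the same raw scores. The sketch uses statistics.multimode (available from Python 3.8), which returns a list so that datasets with several modes are also handled.

```python
import statistics

scores = [8, 3, 6, 5, 11, 5, 2]

# multimode returns every value that shares the highest frequency.
modes = statistics.multimode(scores)
frequency = scores.count(modes[0])

print(modes, frequency)   # [5] 2
```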

Effects of Linear Score Transformations on Measures of Central Tendency

◦ Linear score transformations involve changing all scores in a dataset by the same rule, using addition, subtraction, multiplication, or division.

◦ If you transform scores (X) into new scores (Y) using the formula Y = aX + b:

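-The new mean is calculated as: a × (mean of X) + b

-The median is transformed by the same rule: a × (median of X) + b

-The mode is also transformed by the same rule (for a ≠ 0): a × (mode of X) + b. It keeps its original value only when the transformation leaves that score unchanged, for example when a = 1 and b = 0.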

Linear Transformation Formula:

Y = aX + b

Breakdown of formula:

X = original score

Y = transformed score

a = scaling factor

b = shift factor

Effect:

-Mean and median both change according to the formula.

-The mode follows the same rule, so it changes whenever the score it corresponds to changes.

Example:

X = [1, 3, 4], a = 1, b = +4

Y = [5, 7, 8]

Mean changes from 2.667 → 6.667

Median: 3 → 7
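The example can be reproduced with a short Python sketch. The helper name transform is made up for this illustration, and the second case (a = 2, b = 1) is a hypothetical addition that is not part of the lesson's example.

```python
import math
import statistics

def transform(scores, a, b):
    """Apply Y = aX + b to every score."""
    return [a * x + b for x in scores]

x = [1, 3, 4]
y = transform(x, a=1, b=4)
print(y)                                             # [5, 7, 8]
print(statistics.median(x), statistics.median(y))    # 3 7

# Hypothetical second case (not from the lesson): a = 2, b = 1.
z = transform(x, a=2, b=1)                           # [3, 7, 9]
print(math.isclose(statistics.mean(z), 2 * statistics.mean(x) + 1))   # True
print(statistics.median(z) == 2 * statistics.median(x) + 1)           # True
```

The comparison of means uses math.isclose because the means are not whole numbers and floating-point results may differ in the last digits.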

Measures of Variability: Range; Semi-Interquartile Range; Variance; Standard Deviation (Properties and Comparison)

Measures of variability describe how spread out or dispersed the data points are in a dataset.

Range and Semi-Interquartile Range

• Range

-The range is the difference between the highest and lowest values in a dataset.

-It is simple to calculate but is greatly affected by extreme values.

• Semi-Interquartile Range

-The semi-interquartile range (SIQR) is a measure of variability that, unlike the range, is not sensitive to extreme scores.

-It is calculated as one-half of the interquartile range (IQR). The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1).

-SIQR = (Q3 – Q1) / 2

• Variance

-Variance measures the average squared deviation of each score from the mean.

-It indicates how much individual scores vary around the mean.

-A higher variance means greater variability in the dataset.

• Standard Deviation

-The standard deviation is the square root of the variance.

-It is a widely used measure of variability that represents the typical distance of scores from the mean.

-It is expressed in the same units as the original data, making it easier to interpret than variance.

• Quartile Deviation

-Quartile deviation is another name for the semi-interquartile range.

-It is calculated as one-half of the difference between the third quartile (Q3) and the first quartile (Q1).

-It is less sensitive to extreme values than the standard deviation.
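The range and the semi-interquartile range can be sketched as follows. The dataset is borrowed from the central tendency examples and is an assumption of this sketch. Quartiles can be computed by several conventions that give slightly different answers; the one used here is the "exclusive" method of Python's statistics.quantiles, chosen only for illustration.

```python
import statistics

scores = [2, 3, 5, 5, 6, 8, 11]

# Range: highest value minus lowest value.
data_range = max(scores) - min(scores)

# Quartiles under the "exclusive" convention (other conventions may differ).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

# Semi-interquartile range: half the distance between Q3 and Q1.
siqr = (q3 - q1) / 2

print(data_range)     # 9
print(q1, q3, siqr)   # 3.0 8.0 2.5
```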

Calculation of Variance and Standard Deviation

Calculation of Variance

Variance measures the spread of data points around the mean.

Formula: Variance = Σ (x – x̄)² / n

Breakdown of formula:

x = each individual value in the dataset

x̄ = the mean (average) of the dataset

n = the number of values in the dataset

Steps to calculate variance:

  1. Calculate the mean (x̄) of the dataset.
  2. For each value (x), subtract the mean (x̄) and square the result: (x – x̄)²
  3. Sum all the squared differences: Σ (x – x̄)²
  4. Divide the sum by the number of values (n).

Example:

Dataset: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

Mean = (1 + 2 + 3 + … + 15) / 15 = 8

Variance = ((1-8)² + (2-8)² + (3-8)² + (4-8)² + (5-8)² … + (15-8)²) / 15 = 280 / 15 ≈ 18.67
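The four steps can be followed one by one in Python. This is a sketch of the formula used in this lesson, which divides by n; the variable names are chosen for this example.

```python
import statistics

data = list(range(1, 16))          # 1, 2, ..., 15
n = len(data)

# Step 1: calculate the mean.
mean = sum(data) / n               # 8.0

# Step 2: subtract the mean from each value and square the result.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum the squared differences.
sum_of_squares = sum(squared_diffs)    # 280.0

# Step 4: divide by the number of values.
variance = sum_of_squares / n

print(round(variance, 2))                     # 18.67
print(round(statistics.pvariance(data), 2))   # 18.67
```

statistics.pvariance divides by n, as the formula above does. The related function statistics.variance divides by n – 1 instead, so it would return 20 for this dataset.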

Calculation of Standard Deviation

-Standard deviation measures the typical distance of data points from the mean.

-It is the square root of the variance.

-Formula: Standard Deviation = √Variance

Example (using the variance from the previous example):

Standard Deviation = √(280 / 15) = √18.67 ≈ 4.32
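A short Python sketch confirms the square-root relationship for the same dataset. It uses the divide-by-n variance from this lesson, which corresponds to statistics.pvariance and statistics.pstdev in the standard library.

```python
import math
import statistics

data = list(range(1, 16))              # 1, 2, ..., 15

variance = statistics.pvariance(data)  # divide-by-n variance, about 18.67
sd = math.sqrt(variance)               # standard deviation = square root of variance

print(round(sd, 4))                    # 4.3205
print(math.isclose(sd, statistics.pstdev(data)))   # True
```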

Effects of Linear Score Transformations on Measures of Variability

◦ A linear transformation changes each score in a dataset using the formula Y = aX + b.

◦ The new standard deviation (SDY) is equal to the original standard deviation (SDX) multiplied by the absolute value of the scaling factor ‘a’. The shift factor ‘b’ does not affect the standard deviation.

◦ The variance of the transformed scores is equal to the original variance multiplied by a².

◦ The range and semi-interquartile range are affected in the same way as the standard deviation: each is multiplied by the absolute value of ‘a’ and is unaffected by ‘b’.
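These rules can be checked numerically. In the sketch below the scores and the values a = –2 and b = 4 are hypothetical, chosen so that the role of the absolute value is visible; the helper names transform and value_range are made up for this example.

```python
import math
import statistics

def transform(scores, a, b):
    """Apply Y = aX + b to every score."""
    return [a * x + b for x in scores]

def value_range(scores):
    """Highest value minus lowest value."""
    return max(scores) - min(scores)

x = [1, 3, 4]
a, b = -2, 4                 # hypothetical scaling and shift factors
y = transform(x, a, b)       # [2, -2, -4]

# Standard deviation is multiplied by |a|; b has no effect.
print(math.isclose(statistics.pstdev(y), abs(a) * statistics.pstdev(x)))        # True

# Variance is multiplied by a squared.
print(math.isclose(statistics.pvariance(y), a ** 2 * statistics.pvariance(x)))  # True

# Range is multiplied by |a|.
print(value_range(x), value_range(y))   # 3 6
```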

Important names and dates

1. American Statistical Association (ASA): Cited in the introduction for its definition of statistics (2023).

2. Vetter T. R.: Cited in the definition of the mean.

3. Hurley & Tenny: Cited in the properties of mean.

4. Manikandan S.: Cited in the definition of median.

5. Sundaram et al.: Cited within Manikandan S.