Lesson 4 – Analysis of Relationships

Introduction

  • The analysis of relationships in statistics is crucial for understanding how two or more variables interact. In psychology, it helps determine whether variables such as intelligence and academic performance, or stress and health, are related and to what extent.
  • It covers correlation (measuring the strength and direction of a relationship) and regression (predicting one variable from another).

Understanding Correlation

Correlation helps us understand how two variables change together.

Scatter Diagram

  • A scatter diagram is a graph that visually represents the relationship between two variables.
  • Each point on the graph represents a pair of scores (X, Y).
  • Patterns in the plot indicate the type and strength of the relationship:
      ◦ Dots closely following a straight line = strong linear relationship.
      ◦ Widely scattered dots = weak or no relationship.

◦ To construct a scatter plot:

  1. Draw the x and y axes.
  2. Plot the values of the variables on the axes.
  3. Mark the intersection of corresponding values with a dot.
  4. Label each axis and title the graph.
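The steps above can be sketched in Python with matplotlib (assumed to be installed; the paired scores below are hypothetical, e.g. hours studied and exam marks):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display window needed
import matplotlib.pyplot as plt

# Hypothetical paired scores: X = hours studied, Y = exam mark
x = [2, 4, 5, 7, 8, 10]
y = [50, 55, 60, 68, 72, 80]

fig, ax = plt.subplots()
ax.scatter(x, y)                      # step 3: one dot per (X, Y) pair
ax.set_xlabel("Hours studied (X)")    # step 4: label each axis
ax.set_ylabel("Exam mark (Y)")
ax.set_title("Scatter diagram of study time vs. exam marks")
fig.savefig("scatter.png")
```

Dots that rise from left to right in a near-straight band, as here, suggest a strong positive linear relationship.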

Types of Relationships

a. Linear Relationship

  • Described by a straight line.
  • The change in one variable corresponds to a proportional change in another.

b. Non-linear (Curvilinear) Relationship

  • The relationship follows a curve, not a straight line.
  • Example: Stress and performance may rise together up to a point, after which performance declines.

Correlation

Correlation measures the degree and direction of a linear relationship between two variables. It is quantified using a statistic called the correlation coefficient (denoted as r).

Correlation Coefficient (r)

Values range from -1 to +1:

+1 = Perfect positive correlation.

-1 = Perfect negative correlation.

0 = No correlation.

Magnitude shows strength; sign shows direction.

Types of Correlation

  • Positive correlation: High values of one variable are associated with high values of another.
  • Negative correlation: High values of one variable are associated with low values of another.
  • Zero correlation: No association.

Calculating Pearson's Correlation

Pearson's correlation coefficient can be calculated using two methods:

1. Standard score or Z-score formula:

r = ΣZxZy / n

Breakdown of formula:

r = Pearson’s correlation coefficient

Zx = z-score for each value of variable X

Zy = z-score for each value of variable Y

Σ = Summation (add up all the values)

n = Number of pairs of data points

This formula emphasizes that correlation is about how much each variable deviates from its mean in standard units.
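A minimal sketch of the standard-score formula in pure Python (the function name and the sample scores are illustrative; `pstdev` is used because the formula divides by n, which pairs with the population standard deviation):

```python
import statistics

def pearson_r_zscores(x, y):
    """Pearson's r via the standard-score formula: r = (ΣZxZy) / n."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)  # divide by n
    zx = [(v - mx) / sx for v in x]  # z-score of each X value
    zy = [(v - my) / sy for v in y]  # z-score of each Y value
    return sum(a * b for a, b in zip(zx, zy)) / n

# Hypothetical scores: hours studied (X) and exam marks (Y)
r = pearson_r_zscores([2, 4, 5, 7, 8], [50, 55, 60, 68, 72])
```

When both variables sit above (or both below) their means at the same time, the z-score products are positive and r comes out positive.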

2. Deviation Score Formula:

r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² * Σ(Y – Ȳ)²]

Breakdown of formula:

r = Pearson’s correlation coefficient

X = Individual value of variable X

X̄ = Mean of variable X

Y = Individual value of variable Y

Ȳ = Mean of variable Y

Σ = Summation

This formula is often more practical for manual calculation as it works directly with the raw scores and their deviations from their respective means.
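The deviation-score formula translates directly into code; a sketch with an illustrative function name and hypothetical data:

```python
import math

def pearson_r_deviation(x, y):
    """Pearson's r via the deviation-score formula:
    r = Σ(X - X̄)(Y - Ȳ) / sqrt(Σ(X - X̄)² * Σ(Y - Ȳ)²)
    """
    mx = sum(x) / len(x)  # mean of X
    my = sum(y) / len(y)  # mean of Y
    dx = [v - mx for v in x]  # deviations of X from its mean
    dy = [v - my for v in y]  # deviations of Y from its mean
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

r = pearson_r_deviation([2, 4, 5, 7, 8], [50, 55, 60, 68, 72])
```

Both formulas give the same value of r; the deviation-score version simply skips the intermediate step of converting every score to a z-score.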

◦ Steps Involved (General Outline):

The specific steps depend on which formula you use, but the general process involves:

  1. Organizing your data into pairs of X and Y values.
  2. Calculating the necessary intermediate values (e.g., z-scores, deviations from the mean, squared deviations, etc.).
  3. Plugging those values into the chosen formula.
  4. Performing the calculations to arrive at the value of r.

◦ Important Considerations:

  • Pearson’s r measures linear relationships. If the relationship between the variables is curved, Pearson’s r may not accurately reflect the strength of the relationship.
  • Outliers can significantly influence the value of r.
  • It’s essential to visualize the data with a scatter plot to get a sense of the relationship before relying solely on the r value.

Correlation and Causation

  • Correlation does not imply causation: a strong correlation between two variables does not necessarily mean that one causes the other.
  • Correlation shows association, not cause and effect. Other factors may be involved, or the relationship could be coincidental.
  • Spurious correlations can occur due to lurking (third) variables.

Effects of Linear Score Transformations

  • Linear transformations (adding or subtracting a constant, or multiplying or dividing by a positive constant) do not change the correlation coefficient.
  • Multiplying or dividing by a negative constant reverses the sign of r but leaves its magnitude unchanged; the strength of the linear relationship remains the same.
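A quick numerical check of these properties (illustrative data; the helper simply applies the deviation-score formula for r):

```python
import math

def pearson_r(x, y):
    """Pearson's r via the deviation-score formula."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

x = [2, 4, 5, 7, 8]
y = [50, 55, 60, 68, 72]

r_original = pearson_r(x, y)
# Add 10 to every X score and double every Y score (positive constants):
r_transformed = pearson_r([v + 10 for v in x], [2 * v for v in y])
# Multiply one variable by a negative constant: sign of r flips
r_flipped = pearson_r(x, [-v for v in y])
```

Here `r_original` and `r_transformed` agree to machine precision, while `r_flipped` equals `-r_original`.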

Factors Influencing Correlation

Several factors can affect the correlation coefficient, including:

  • Restriction of range: If the range of data is limited, the correlation may be lower.
  • Outliers: Extreme values can disproportionately influence the correlation.
  • Non-linear relationships: Pearson’s r only measures linear relationships; if the relationship is curved, r may be low even if there is a strong relationship.

Spearman Rank Correlation Method

Spearman’s rank correlation coefficient (rho, ρ) measures the degree to which the ranks of two variables are related.

Unlike Pearson’s correlation, which measures linear relationships between continuous variables, Spearman’s rho is used when:

  • The data is ordinal (ranked).
  • The relationship between the variables is monotonic but not necessarily linear (monotonic means that as one variable increases, the other variable tends to increase or decrease, but not necessarily at a constant rate).

Spearman’s rho also ranges from -1.0 to +1.0:

  • +1.0: Perfect monotonic increasing relationship between the ranks.
  • -1.0: Perfect monotonic decreasing relationship between the ranks.
  • 0: No monotonic relationship between the ranks.

Formula for Spearman’s Rho: ρ = 1 – [ (6 * ΣD²) / (n * (n² – 1)) ]

Breakdown of formula:

ρ = Spearman’s rank correlation coefficient

D = The difference between the ranks of the paired observations (for each pair of X and Y values, subtract the rank of Y from the rank of X)

ΣD² = The sum of the squared differences of the ranks

n = The number of pairs of observations

◦ Steps to Calculate Spearman’s Rho:

  1. Rank the data:
  • Rank the observations for each variable separately.
  • If there are ties (two or more values are the same), assign the average rank to the tied values.
  2. Calculate the differences:
  • For each pair of observations, calculate the difference (D) between the rank of X and the rank of Y.
  3. Square the differences:
  • Square each of the differences (D²).
  4. Sum the squared differences:
  • Add up all the squared differences (ΣD²).
  5. Apply the formula:
  • Plug the values of ΣD² and n into the Spearman’s rho formula and calculate ρ.
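These steps can be sketched in pure Python (function names are illustrative; note the difference formula above is exact only when there are no ties, so with tied ranks the result is an approximation):

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(x, y):
    """Spearman's rho: rho = 1 - (6 * ΣD²) / (n * (n² - 1))."""
    rx, ry = ranks(x), ranks(y)                       # steps 1
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))    # steps 2-4
    n = len(x)
    return 1 - (6 * d2) / (n * (n ** 2 - 1))          # step 5

# Hypothetical ranked data: competition placement vs. creativity rank
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Identical rankings give ρ = +1.0, completely reversed rankings give ρ = -1.0.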

Example Scenario:

Suppose you want to find the relationship between students’ performance in a music competition (ranked) and their creativity scores (also ranked).

Spearman’s rho would be appropriate here because the data is ordinal (the students are ranked).

Linear Regression Analysis/Simple Regression

Regression analysis is used to predict the value of one variable (dependent variable) from another (independent variable).

Simple linear regression involves one independent and one dependent variable.

The goal is to find the “best-fitting” line that represents the relationship between the variables.

Equation of the regression line: Y = b₀ + b₁X

  • Y is the dependent variable.
  • X is the independent variable.
  • b₀ is the y-intercept.
  • b₁ is the slope.
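The least-squares line can be fitted with a short pure-Python sketch (function name and data are illustrative; the slope formula uses the same deviation sums as Pearson’s r):

```python
def fit_line(x, y):
    """Least-squares estimates of b0 (intercept) and b1 (slope)
    for the regression line Y = b0 + b1*X.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # slope: Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx  # the line always passes through (X̄, Ȳ)
    return b0, b1

# Hypothetical data: hours studied (X) predicting exam marks (Y)
hours = [2, 4, 5, 7, 8]
marks = [50, 55, 60, 68, 72]
b0, b1 = fit_line(hours, marks)
predicted = b0 + b1 * 6  # predicted mark for a student studying 6 hours
```

Once b₀ and b₁ are estimated, plugging any X into the equation yields the predicted Y, which is the point of regression analysis.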