The Kruskal-Wallis test is a non-parametric statistical test used in biostatistics to compare three or more independent samples.
It extends the Mann-Whitney U test to more than two groups and tests whether the samples come from the same distribution. When the groups have similarly shaped distributions, a significant result can be read as a difference between the group medians.
It's particularly useful when the data do not meet the normality assumption required for one-way ANOVA (Analysis of Variance).
Characteristics
Non-parametric: The Kruskal-Wallis test does not assume that the data follow a normal distribution.
Multiple Groups: Suitable for comparing three or more independent samples, unlike the Mann-Whitney U test which is used for two groups.
Ordinal or Continuous Data: Works well with ordinal data (ranked data) or continuous data.
Median Comparison: Works on the ranks of the data rather than the raw values, which in practice amounts to comparing the medians of the groups.
Steps to Perform the Kruskal-Wallis Test
1. Rank the Combined Data:
Combine all groups into a single dataset and rank all observations from smallest to largest, regardless of the group they belong to. Handle ties by assigning the average rank to tied values.
2. Calculate the Sum of Ranks for Each Group:
For each group, calculate the sum of the ranks of its observations.
3. Compute the Kruskal-Wallis Statistic 𝐻:
Let 𝑛 be the total number of observations across all groups, and 𝑛𝑖 be the number of observations in group 𝑖.
The Kruskal-Wallis statistic 𝐻 is calculated using:
𝐻 = 12 / (𝑛(𝑛 + 1)) × Σ (𝑅𝑖² / 𝑛𝑖) − 3(𝑛 + 1), with the sum taken over 𝑖 = 1, …, 𝑘
where 𝑅𝑖 is the sum of ranks for group 𝑖, and 𝑘 is the number of groups.
4. Determine the Degrees of Freedom and Calculate the P-value:
The degrees of freedom (df) for 𝐻 are 𝑘 − 1, where 𝑘 is the number of groups.
For large samples, 𝐻 follows a chi-square distribution with 𝑘 − 1 degrees of freedom. Use this distribution to calculate the p-value.
5. Interpret the Results:
Compare the p-value with your chosen significance level (commonly 0.05).
If the p-value is less than the significance level, reject the null hypothesis that all groups have the same median.
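The five steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name kruskal_wallis_h is made up for this example, and it computes 𝐻 exactly as in the formula above, without any correction for ties.

```python
from itertools import groupby


def kruskal_wallis_h(groups):
    """Return (H, df) for a list of samples, without a tie correction."""
    # Step 1: pool all observations, remembering which group each came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n = len(pooled)

    # Assign ranks 1..n, giving tied values the average of their positions.
    rank_sums = [0.0] * len(groups)
    position = 1
    for _, run in groupby(pooled, key=lambda pair: pair[0]):
        run = list(run)
        average_rank = position + (len(run) - 1) / 2
        # Step 2: add each observation's rank to its group's rank sum.
        for _, g in run:
            rank_sums[g] += average_rank
        position += len(run)

    # Step 3: H = 12 / (n(n + 1)) * sum(R_i^2 / n_i) - 3(n + 1).
    total = sum(r * r / len(sample) for r, sample in zip(rank_sums, groups))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Step 4: the degrees of freedom are k - 1.
    return h, len(groups) - 1
```

The function assumes every group contains at least one observation. It returns 𝐻 and the degrees of freedom; the p-value still has to be taken from the chi-square distribution as described in step 4.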
Calculation Example
Suppose you have three groups with the following data:
Group 1: [7, 15, 6, 9]
Group 2: [13, 22, 8, 12]
Group 3: [11, 9, 10, 18]
Step 1: Rank the Combined Data
Combine and sort all data: [6, 7, 8, 9, 9, 10, 11, 12, 13, 15, 18, 22]
Assign ranks: [1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11, 12]
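The ranking in this step, including the shared rank of 4.5 for the two 9s, can be reproduced with a short Python sketch. The helper name average_ranks is an illustrative choice, not part of any library.

```python
def average_ranks(values):
    """Map each distinct value to its 1-based rank, averaging over ties."""
    ordered = sorted(values)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


group1 = [7, 15, 6, 9]
group2 = [13, 22, 8, 12]
group3 = [11, 9, 10, 18]

combined = group1 + group2 + group3
rank_of = average_ranks(combined)
combined_ranks = [rank_of[v] for v in sorted(combined)]
print(combined_ranks)
# [1.0, 2.0, 3.0, 4.5, 4.5, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
```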
Step 2: Sum of Ranks for Each Group
Group 1 ranks: 2, 10, 1, 4.5 → Sum: 17.5
Group 2 ranks: 9, 12, 3, 8 → Sum: 32
Group 3 ranks: 7, 4.5, 6, 11 → Sum: 28.5
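As a quick check of these sums, the sketch below looks up each observation's rank and adds the ranks per group. The rank table is copied by hand from Step 1, and the variable names are illustrative.

```python
# Ranks copied from Step 1 of the example (the two 9s share rank 4.5).
rank_of = {6: 1, 7: 2, 8: 3, 9: 4.5, 10: 6, 11: 7, 12: 8, 13: 9, 15: 10, 18: 11, 22: 12}

groups = {
    "Group 1": [7, 15, 6, 9],
    "Group 2": [13, 22, 8, 12],
    "Group 3": [11, 9, 10, 18],
}

rank_sums = {name: sum(rank_of[v] for v in sample) for name, sample in groups.items()}
print(rank_sums)

# Sanity check: the rank sums must add up to n(n + 1) / 2.
n = sum(len(sample) for sample in groups.values())
assert sum(rank_sums.values()) == n * (n + 1) / 2
```

The final assertion is a useful habit when ranking by hand: here 17.5 + 32 + 28.5 = 78 = 12 × 13 / 2, so no rank was dropped or counted twice.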
Step 3: Compute 𝐻
Total 𝑛 = 12
𝑅1 = 17.5, 𝑅2 = 32, 𝑅3 = 28.5
𝑛1 = 4, 𝑛2 = 4, 𝑛3 = 4
𝐻 = 12 / (12 × 13) × (17.5²/4 + 32²/4 + 28.5²/4) − 3 × 13
𝐻 = 12/156 × (76.5625 + 256 + 203.0625) − 39
𝐻 = 12/156 × 535.625 − 39
𝐻 = 6427.5/156 − 39
𝐻 ≈ 41.2 − 39
𝐻 ≈ 2.2
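The arithmetic can be verified with a few lines of Python. The rank sums and group sizes are typed in from the values listed in this step, and the variable names are illustrative.

```python
rank_sums = [17.5, 32, 28.5]   # R1, R2, R3
sizes = [4, 4, 4]              # n1, n2, n3
n = sum(sizes)                 # total number of observations, 12

# Sum of R_i^2 / n_i over the three groups.
weighted = sum(r ** 2 / m for r, m in zip(rank_sums, sizes))

h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
print(weighted)        # 535.625
print(round(h, 4))     # 2.2019
```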
Step 4: Calculate P-value
𝑑𝑓 = 3 − 1 = 2
Look up or compute the p-value using the chi-square distribution with 𝐻 ≈ 2.2 and 𝑑𝑓 = 2. This gives a p-value of about 0.33.
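For this example the p-value can be computed without a table. The sketch below relies on a property that holds only for 2 degrees of freedom: the upper-tail probability of the chi-square distribution is exp(−𝑥/2). For other values of 𝑑𝑓, a statistics library or a chi-square table would be needed instead.

```python
import math

h = 2.2019   # H from Step 3, before rounding to 2.2
df = 2

# For df = 2 only, the chi-square survival function is exp(-x / 2).
p_value = math.exp(-h / 2)
print(round(p_value, 3))   # 0.333
```

Library routines such as scipy.stats.kruskal also apply a correction for ties, so the 𝐻 and p-value they report may differ slightly from the uncorrected hand calculation shown here.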
Step 5: Interpret the Results
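Compare the p-value with 0.05. Here the p-value (about 0.33) is higher, so you retain the null hypothesis: the data give no evidence of a difference between the three groups. Had the p-value been below 0.05, you would reject it.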
This simplified example walks through the Kruskal-Wallis test by hand. Because the chi-square distribution is a large-sample approximation, the p-value for groups of only four observations should be treated as approximate.