One way analysis of variance (1-way ANOVA) using R

Data

# Sales performance of 31 salesmen trained in three groups: 1 day (Group A), 2 days (Group B) and 3 days (Group C).
perform.A = c(63.3, 68.3, 86.7, 52.8, 75, 58, 69.5, 32.7, 60.9, 58.2)
perform.B = c(72.8, 88.2, 80.8, 71.3, 81.5, 47.6, 81, 81.4, 83, 76, 74.5)
perform.C = c(82.3, 89.7, 81, 85.1, 74.1, 75.9, 74.7, 81.1, 76.4, 81.8)
sales = c(perform.A, perform.B, perform.C)
group = c(rep(1, 10), rep(2, 11), rep(3, 10))
group = factor(group, levels = c(1, 2, 3), labels = c("A", "B", "C"))
anova.df = data.frame(group, sales)
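
A quick check that the data frame was built as intended (base R only):
# Group sizes and per-group descriptive statistics
table(anova.df$group)
tapply(anova.df$sales, anova.df$group, mean)
tapply(anova.df$sales, anova.df$group, sd)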

1. Normality assumption (Kolmogorov-Smirnov)
ks.test(perform.A, "pnorm", mean(perform.A), sd(perform.A))
ks.test(perform.B, "pnorm", mean(perform.B), sd(perform.B))
ks.test(perform.C, "pnorm", mean(perform.C), sd(perform.C))

The Kolmogorov-Smirnov normality test is not appropriate here, because the distribution parameters (mean and sd) are estimated from the same sample that is being tested. Thus, two options are available.

Either
  • the Shapiro-Wilk test (shapiro.test) should be applied, or
  • the Lilliefors correction should be computed for the K-S statistic.
ks.test.like.SPSS = function(var){
  ks.test.R = ks.test(var, "pnorm", mean(var), sd(var))
  print(ks.test.R)
  
  D = ks.test.R$statistic
  n = length(var)
  
  if(n > 100){
    # Dallal-Wilkinson adjustment for samples larger than 100
    D = D * (n / 100)^0.49
    n = 100
  }
  corrected.p = exp(-7.01256*D^2*(n + 2.78019) + 2.99587*D*(n + 2.78019)^(1/2) - 0.122119 + 0.974598/(n^(1/2)) + 1.67997/n)
  
  message = "p value for Lilliefors's significance correction\n"
  message = paste(message, round(corrected.p, 4), "\n")
  message = paste(message, "Notice:\n")
  message = paste(message, "Consider the correction as more reliable when mean and sd are estimated from the data.\n")
  message = paste(message, "For more details, please consider reading the original article:\n")
  message = paste(message, "Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.")
  noquote(strsplit(message, "\n")[[1]])
}
ks.test.like.SPSS(perform.A)
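
As a cross-check, the nortest package provides lillie.test(), which implements the same Lilliefors-corrected K-S test (a sketch, assuming nortest is installed):
library(nortest)
# Lilliefors (Kolmogorov-Smirnov) normality test for each group
lillie.test(perform.A)
lillie.test(perform.B)
lillie.test(perform.C)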

Note: The user should be aware that the K-S test has low power for small samples, while it becomes very sensitive (rejecting for minor deviations) in large samples.

 
Alternative normality test (Shapiro-Wilk) # Better option for small samples.
shapiro.test(perform.A)
shapiro.test(perform.B)
shapiro.test(perform.C)
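
The same test can be applied per group directly from the data frame; a short sketch using base R's by():
# Shapiro-Wilk test for each level of group
by(anova.df$sales, anova.df$group, shapiro.test)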

2. Homogeneity of variances test among the three groups # For non-symmetric distributions it is better to use center = median (see the sketch below)
library(car)
leveneTest(anova.df$sales, anova.df$group, center = mean)
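
For skewed data, the median-centered (Brown-Forsythe) variant mentioned above can be requested instead; a minimal sketch:
# Brown-Forsythe version of Levene's test (more robust for non-symmetric distributions)
leveneTest(sales ~ group, data = anova.df, center = median)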

Alternative option for the homogeneity test (Fligner-Killeen)
fligner.test(sales ~ group, data = anova.df)

3. Homogeneity of variance (hov) test and comparative plot of the three distributions (HH package)
library(HH)
hov(sales ~ group, data = anova.df)
hovPlot(sales ~ group, data = anova.df)

4. One-way ANOVA
fit = aov(sales ~ group, data = anova.df)
summary(fit)
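
The assumptions can also be inspected on the fitted model's residuals; a short sketch with base R:
# Normality of residuals and standard diagnostic plots
shapiro.test(residuals(fit))
plot(fit, which = 1)   # residuals vs fitted values
plot(fit, which = 2)   # normal Q-Q plot of the residuals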

5. Error bar plot (1st option)
library(gplots)
plotmeans(sales~group, xlab="Group", ylab="Sales", main="Sales by group (95% C.I.)")

Error bar plot (2nd option) # Bars show mean ± 1 sd
library(Rmisc)
library(ggplot2)
# summarySE() returns the group means together with sd, se and ci
tgc = summarySE(anova.df, measurevar="sales", groupvars=c("group"))
ggplot(data=tgc, aes(x=group, y=sales, group=1)) +
  geom_errorbar(aes(ymin=sales-sd, ymax=sales+sd), width=.1) +
  geom_line(linetype = "dashed") +
  geom_point() +
  labs(title="Sales by group",x="Group", y = "Sales") +
  theme_classic()

6. Post hoc separation into homogeneous groups (Fisher's LSD)
library(agricolae)
posthoc = LSD.test(fit, "group", alpha=0.05)
print(posthoc)
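
Tukey's HSD is a common alternative post hoc procedure that controls the family-wise error rate; a minimal sketch with base R:
# Pairwise comparisons with Tukey's Honest Significant Difference
TukeyHSD(fit, conf.level = 0.95)
plot(TukeyHSD(fit))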

7. Homogeneity is not assumed: Welch test
library(onewaytests)
welch.test(sales ~ group, anova.df)
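
The Welch-corrected F test is also available in base R through oneway.test(); a minimal sketch:
# Welch's ANOVA without assuming equal variances
oneway.test(sales ~ group, data = anova.df, var.equal = FALSE)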

8. Neither normality nor homogeneity is assumed
# Kruskal-Wallis test: non-parametric analysis of variance.
kruskal.test(sales ~ group, anova.df)

9. Homogeneous groups using a non-parametric method (ANOVA and LSD on the ranks)
anova.df$rank.sales = rank(anova.df$sales)
rank.fit = aov(rank.sales ~ group, data = anova.df)
summary(rank.fit)
library(agricolae)
posthoc = LSD.test(rank.fit, "group", alpha=0.05)
print(posthoc)
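
Alternatively, the agricolae package offers kruskal(), which combines the Kruskal-Wallis test with post hoc grouping in a single call; a sketch using the same data:
# Kruskal-Wallis test with post hoc grouping of the treatments
kruskal(anova.df$sales, anova.df$group, alpha = 0.05, console = TRUE)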

The above example is presented in Section 5.2 of the book "Στατιστική ανάλυση με τη γλώσσα R" (Statistical Analysis with the R Language; in Greek, ISBN: 978-960-93-9445-1), published in Thessaloniki, 2017.
