1. Fill-in-the-blanks:

    In a one-way ANOVA, SStotal is partitioned into 2 components:

    (1) _______________ (variance due to the treatment effect) and (2) _______________ (unexplained variance).

    We divide SS by their df to get ____________ which represent ________________________________________.

    We then calculate the ratio of those MS values and compare it to the _______________ distribution.

    Under the null hypothesis, we expect this ratio to equal _______________.

    If the null hypothesis is false, this ratio will be _________________________.


    The power of a statistical test refers to ____________________________________________________________.

    If we want to increase the power of our ANOVA, we could:

      (a) ______________________________ our sample size,

      (b) ______________________________ our alpha-level, or

      (c) ______________________________ the size of MSE.


    If MSE is small, our MSR will be ____________________ and we have a better chance of rejecting our null hypothesis.


  1. The following figures attempt to demonstrate how the total variation is partitioned under one-way, AxB, and AxS (repeated measures) ANOVA. Explain how the AxB and AxS work to increase the power of our test.

Drawing









Scenario: Optimism


Drawing


How optimistic are you? Are you more or less optimistic than you were when you were 16? To study this, 50 subjects were given an optimism test at age 16. These same subjects were give the same test at ages 20 and 24. A total of 18 subjects completed the survey at all 3 ages. The data for these 18 subjects is displayed below:

Drawing


# Load optimism data
optimism <- read.csv("http://www.bradthiessen.com/html5/data/opt.csv")

## Change subject and age to factor variables
optimism$age <-as.factor(optimism$age)
optimism$subject <-as.factor(optimism$subject)

## Summary of data
favstats(optimism ~ age, data=optimism)
##   age min    Q1 median    Q3 max     mean       sd  n missing
## 1  16  13 24.00     27 30.50  35 25.88889 6.578833 18       0
## 2  20  13 25.25     29 31.75  39 26.94444 7.487026 18       0
## 3  24  12 19.00     24 27.75  32 23.38889 5.922429 18       0
## Some summary plots
densityplot(~optimism | age, data=optimism, lwd=2,
            main="Optimism by age", layout=c(1,3))

## Levene's test
leveneTest(optimism ~ age, data=optimism)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.1673 0.8464
##       51


Incorrect analysis: one-way ANOVA

Let’s pretend as though the data in this study come from 3 independent groups of subjects (and not the same subjects at 3 different ages). With this erroneous assumption, let’s conduct a one-way ANOVA:

# Summarize ("anova") the analysis ("aov")
anova(aov(optimism ~ age, data=optimism))
## Analysis of Variance Table
## 
## Response: optimism
##           Df  Sum Sq Mean Sq F value Pr(>F)
## age        2  120.04  60.019  1.3396  0.271
## Residuals 51 2285.00  44.804
# Post-hoc tests with Holm's method
with(optimism, pairwise.t.test(optimism, age, p.adjust.method="holm"))
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  optimism and age 
## 
##    16   20  
## 20 0.64 -   
## 24 0.54 0.35
## 
## P value adjustment method: holm


  1. What conclusion would we make about optimism across the age groups? Calculate and interpret an effect size.










Somewhat correct analysis: AxS ANOVA

The data in this study did not come from 3 independent groups. The scores are from the same 18 subjects measured 3 different times (at ages 16, 20, and 24). This is called a repeated measures design.

  1. What are some advantages and disadvantages in designing this study for repeated measures?










Before we conduct an AxS (repeated measures) ANOVA, let’s calculate the mean optimism score for each of the 18 subjects in this study.

# Mean optimism for each subject
mean(optimism ~ subject, data=optimism)
##        1        2        3        4        5        6        7        8 
## 35.33333 32.66667 31.00000 31.00000 30.00000 29.33333 29.00000 27.66667 
##        9       10       11       12       13       14       15       16 
## 27.33333 26.33333 25.66667 25.33333 24.00000 22.66667 17.00000 16.33333 
##       17       18 
## 13.66667 13.00000


We can display those means alongside our data.

Drawing


  1. Let’s construct a model for our data. In our dataset, why do optimism scores differ from others?










The age effect is what we’re interested in estimating. When we analyzed the data with a one-way ANOVA, we treated all other sources of variation as unexplained error variance.

In an AxS design, we can partition this unexplained error variance further (by explaining some of it as individual subject differences).

We know some of the variation in optimism within and across the age groups is due to pre-existing individual differences. For example, subject #3 tends to be more optimistic than subject #18. If we can estimate these subject-effects, we can remove them from our unexplained error variance.

After estimating the age and subject effects, the only remaining variation would be due to the interaction between a subject and his or her age. Some people will become more optimistic over time, some will become less optimistic, and others will have other paths. This variation is still considered unexplained, because it includes other potential factors, too.


Let’s take a look at the paths of our subjects across the age groups:

# Profile plots (optimism by subject at each age)
interaction.plot(x.factor=optimism$age, trace.factor=optimism$subject,
                 response=optimism$optimism,
                 fun=mean, type="b", legend=F,
                 ylab = "Mean optimism score", xlab = "Age group")

# This plot looks a little nicer.  It uses the ggplot2 package
library(ggplot2)
# Here's the syntax
ggplot(data = optimism, aes(x = age, y = optimism, group = subject)) +
  ggtitle("Optimism by age") +
  ylab("optimism") + geom_line()


  1. Based on the profile plots, do you think we’ll find a significant subject effect? In other words, do the subjects seem to be consistent across age groups?










Formulas for AxS ANOVA

To calculate an AxS ANOVA by hand, you could use the following formulas:


Drawing


We can calculate everything in R:

# AxS ANOVA
# We specify the error term as "subjects nested within age groups"
axs <- aov(optimism ~ age + Error(subject/age), data=optimism)

# We have to use "summary" (and not "anova") to get the summary table
summary(axs)
## 
## Error: subject
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 17   2182   128.3               
## 
## Error: subject:age
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## age        2  120.0   60.02   19.75 2.03e-06 ***
## Residuals 34  103.3    3.04                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Here are the same results calculated by hand:


Drawing


  1. Which mean square ratios do we calculate? To help answer this, take a look at the expected value of our mean square terms. If we’re interested in the effect of age on optimism, which mean square ratio should we calculate?


Drawing









  1. Calculate the appropriate mean square ratio and compare it to the output from R. Explain why your conclusion from the AxS ANOVA differs from your conclusion from the one-way ANOVA. Then, calculate an effect size for the effect of age on optimism.









  1. Calculate an effect size for the subject effect. What proportion of variation in optimism is due to the individual subjects? Was it worthwhile to conduct this study as a repeated measures design? Explain.











  1. The models for one-way, AxB, and AxS ANOVA analyses are displayed below. Interpret each piece of the model for the AxS ANOVA.


Drawing










Other experimental designs (GwT)

Suppose we’re interested in the the effectiveness of statistics textbooks. If we sample 3 classrooms and randomly assign them to use one of 3 different textbooks, our data could look like this:


Drawing


  1. At first glance, it might seem as though we could analyze this data with a one-way ANOVA (or perhaps an AxB ANOVA). Explain why we should not use a one-way or AxB ANOVA to analyze this data. Which condition has been violated?









  1. How many independent observations do we have in our dataset: 9 or 150?










Let’s look at a different dataset. In Iowa, local governments can be organized in several ways (e.g., mayor and council with all officers elected at large; mayor elected at large with council members elected by district; council members elected by district and mayor elected from within the council; variations of these organizational types with and without a hired city manager; etc.).

A political scientist wanted to investigate whether some organizations of local governments result in more content citizens than others. He identified 4 types of government (A1, A2, A3, A4) and interviewed citizens regarding their satisfaction with the local government’s responsiveness to the needs and concerns of citizens.

5 communities were identified within each type of government organization. Within each community, 50 heads of households were randomly selected for interviews. The data are summarized below:


Drawing


  1. Notice the stratified random sampling technique used to select 5 communities and then 50 individuals within those communities. Do we have reason to believe the individuals (minor units) within each community (major unit) are similar to one another? In other words, do we believe there are dependencies within each community? Depending on your answers to the those questions, do we have 20 or 860 independent observations in this dataset? How many would we prefer to have?










This Groups within Treatments or (GwT ANOVA) design requires two separate analyses. First, we will begin under the hope that the 860 individuals in the study are independent observations. With this hope, we’ll test to see if we find significant dependencies within each community. If we find significant dependencies, we’ll need to re-run the analysis with only 20 independent observations.

Here are the formulas we’d use to run the first minor unit analysis:


Drawing


Our data would produce the following summary table. Note that weighted averages are used in all calculations (averages weighted by the number of individuals in each community):


Drawing


If that mean square ratio were not statistically significant, we could treat all 860 individuals as independent observations and run a one-way ANOVA (with SSE = SSw + SSgwa).

Since the mean square ratio in this scenario produces a p-value less than 0.05, we may conclude that individuals within communities are not independent. Then we’d have to run a major units analysis using unweighted means (ignoring the sample size within each community):


Drawing


Here are the results from this dataset:


Drawing


From this, we could conclude the types of governments are associated with different average levels of contentment.


Your turn

  1. Do some stores charge higher prices than others for the same items? To study this, the prices of 10 items were found at 4 different stores. Conduct an AxS ANOVA to determine if mean prices differ across stores. To do this, do the following:

    (a) Load the data and summarize it graphically and numerically.

    (b) Run a one-way ANOVA (ignoring the fact that we have repeated measures) and state your conclusion.

    (c) Sketch a profile plot of the prices across stores for each item.

    (d) Run the AxS ANOVA and state your conclusions.


Drawing




Publishing your solutions

When you’re ready to try the Your Turn exercises:


Sources:

This document is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.