Scenario: Ambiguous Prose

If the balloons popped, the sound wouldn’t be able to carry since everything would be too far away from the correct floor. A closed window would also prevent the sound from carrying, since most buildings tend to be well insulated. Since the whole operation depends on a steady flow of electricity, a break in the middle of the wire would also cause problems. Of course, the fellow could shout, but the human voice is not loud enough to carry that far. An additional problem is that a string could break on the instrument. Then there could be no accompaniment to the message. It is clear that the best situation would involve less distance. Then there would be fewer potential problems. With face to face contact, the least number of things could go wrong.

Wait… what?


In 1972, 57 high school students were asked to listen to, understand, and remember that ambiguous passage. The students were randomly assigned to 3 groups:

   • 19 students were shown a picture before reading the passage

   • 19 students were shown a picture after reading the passage

   • 19 students were not shown any picture before or after


Here’s the picture that was shown to two of the groups:

Drawing


The students were then asked:

   • How easy was it to comprehend the passage? Students responded on a 1-7 scale, where 1 = very difficult and 7 = very easy.

   • How well can you recall the passage? Students were asked to write out the passage, listing as many ideas as they could remember.


Does showing the picture before, after, or not at all influence a student’s ability to comprehend and recall the passage?

To answer this question, we’ve decided to compare the means from all 3 groups.



Data

We’ll download this data from my website.

# Download the ambiguousprose.csv file from my website
ambiguous <- read.csv("http://www.bradthiessen.com/html5/data/ambiguousprose.csv")

# Look at the first several rows to get a sense of the data
head(ambiguous)
##   Condition Comprehension
## 1     After             6
## 2     After             5
## 3     After             4
## 4     After             2
## 5     After             1
## 6     After             3


Using the favstats() command, I found the following summary statistics:

Group    n       mean       median    std. dev
None     19      3.368      3         1.25656
Before   19      4.947      5         1.31122
After    19      3.211      3         1.39758
Total    N = 57  M = 3.842  Mdn = 4   s = 1.52115
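
In case you'd like to reproduce the group rows of that table, here's a sketch of the command (favstats() comes from the mosaic package, which this lesson uses throughout):

# Load the mosaic package and compute summary statistics by group
library(mosaic)
favstats(Comprehension ~ Condition, data = ambiguous)
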
  1. What do the values M and s represent? Based on these summary statistics, do you think we’ll conclude the group means differ?









Let’s visualize our data:

# Dot plots to see the distribution of scores within each group
dotPlot( ~ Comprehension | Condition, data = ambiguous, 
         nint = 17, endpoints = c(-2, 10), layout = c(1,3), 
         aspect = .35, cex=.7, xlab = "Comprehension score (1-7)")

# Error bars showing the group means with bootstrapped confidence intervals
# I need to load the ggplot2 library to generate this plot
# (mean_cl_boot also requires the Hmisc package to be installed)
library(ggplot2)
ggplot(ambiguous, aes(Condition, Comprehension)) + 
  stat_summary(fun.y = mean, geom = "point") + 
  stat_summary(fun.y = mean, geom = "line", aes(group = 1),
               colour = "Blue", linetype = "dashed") +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  labs(x = "Treatment", y = "Mean Comprehension Score")

  1. Based on these visualizations, do you think we’ll conclude the group means differ? Explain.









  1. Write out null and alternative hypotheses regarding the population means of our 3 groups.









Now let’s develop and evaluate some methods we could use to compare 3 or more group means.



Idea #1: Multiple t-tests

We know how to conduct independent samples t-tests, so why not run a series of t-tests to compare all possible pairs of group means?

  1. How many t-tests would we need to conduct in order to test all possible pairs of group means in this example? How many t-tests would we need to conduct if we had G groups?











  1. Suppose we conduct 3 t-tests to compare all pairs of group means in this example. If we set \(\alpha =0.05\) for each test, what would be the probability of making at least one alpha error in our 3 tests? In other words, what’s \(P(\mathrm{at\ least\ one\ }\alpha\ \mathrm{error\ in\ 3\ tests})\)?











  1. Suppose we conduct t-tests to compare all possible pairs of means from G groups. If we have a Type I error rate of \(\alpha\) for each test, what would be the probability of making at least one alpha error in all our tests?











  1. Explain the problem with running multiple t-tests to compare pairs of group means from a single dataset.









To run multiple t-tests on our data, we could use the t.test() command on different subsets of our data. Even better, we can use the pairwise.t.test() command to run all possible t-tests for a dataset.

# We could use "with" to identify the dataset
# with(ambiguous, pairwise.t.test(Comprehension, Condition, p.adjust.method="none"))

# Or we could identify each variable as belonging to the ambiguous dataset
# p.adjust.method="none" just means I don't want R to modify the p-value
# In the next lesson, we'll see why we might want to modify the p-value
pairwise.t.test(ambiguous$Comprehension, ambiguous$Condition, p.adjust.method="none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  ambiguous$Comprehension and ambiguous$Condition 
## 
##        After   Before 
## Before 0.00017 -      
## None   0.71444 0.00054
## 
## P value adjustment method: none


  1. The output states the t-tests used a “pooled SD.” What does that mean?






  1. The output shows p-values from each of our 3 t-tests. From this, what could we conclude? Why are we uncertain about our conclusions? Is there anything we could do to increase our certainty?










Idea #2: Get MAD (or SAD)

It looks like running multiple t-tests is problematic. Could we use a randomization-based method to compare our 3 group means?


When we compared 2 means, our test statistic was:   \(\overline{X}_{1}-\overline{X}_{2}\). That statistic made sense. If the means were similar, the test statistic was small. If the means were further apart, the test statistic was larger.


  1. Can you think of a test statistic we could use to measure the difference among 3 group means? We want a single number that changes depending on whether the means are similar or different.











Here, once again, are the means of the 3 groups in this study:

mean(Comprehension ~ Condition, data=ambiguous)
##    After   Before     None 
## 3.210526 4.947368 3.368421


We can calculate the SAD (the sum of the absolute differences between all pairs of group means) or the MAD (the mean of those absolute pairwise differences) from our dataset:

# Calculate SAD of the group means
SAD( mean(Comprehension ~ Condition, data=ambiguous) )
## [1] 3.473684
# Calculate MAD of the group means
MAD( mean(Comprehension ~ Condition, data=ambiguous) )
## [1] 1.157895


  1. Verify these values for the SAD and MAD to ensure you understand what they represent. Under the null hypothesis, what value would we expect for these test statistics?









Now that we have our test statistic, we can run a randomization-based test. To do this, we need to state our randomization null hypothesis: the treatments in this study (picture before, picture after, or no picture) had no effect on how easy the ambiguous passage was to comprehend.

  1. Assume the null hypothesis is true. In the actual study, the first subject was shown the picture after hearing the passage and rated the passage a 6 in ease of comprehension. Suppose we go back in time and run this study again, randomly assigning subjects to treatment groups. If this same subject was now randomly assigned to the none group, what rating would the subject give to the passage for ease of comprehension?






  1. Suppose we keep going back in time to randomly assign subjects to groups. How many different ways could we randomly assign 57 subjects to 3 equally-sized groups?










We can have R calculate this number of possible randomizations:

# choose(n, k) calculates the number of ways to choose k objects from n
options(scipen=999) # Eliminates scientific notation

# Calculate the number of ways to split 57 subjects into 3 groups
(choose(n=57, k=19) * choose(n=38, k=19) * choose(n=19, k=19)) / factorial(3)
## [1] 3752394405341099206901760
options(scipen=5) # Allow scientific notation again for very wide numbers


That’s way too many randomizations to list (or compute), so we’ll have the computer take a random, representative sample of 10,000 randomizations. For each randomization, we’ll calculate our MAD test statistic. Then, once we have 10,000 test statistics, we’ll plot them and determine the likelihood that our test statistic came from this distribution.

# Let's store our observed MAD statistic (MAD = 1.1579) as MADobs
MADobs <- MAD(mean(Comprehension ~ Condition, data=ambiguous))

# Now let's randomize our data by shuffling the condition groups 10,000 times
MADrandomized <- do(10000) * MAD(mean(Comprehension ~ shuffle(Condition), data=ambiguous))

# Plot the distribution of MAD values from the randomized distribution
histogram(~MAD, data=MADrandomized,
          xlim=c(-.1,2),    # Set the x-axis range
          width=.1,         # Set the width of each bar
          xlab="Possible MAD values assuming null hypothesis is true", 
          groups=MAD >= MADobs,  # Highlight values greater than MADobs
          main=paste("mean =", round(mean(MADrandomized$MAD, na.rm=T),3),
                     "; std. dev =",round(sd(MADrandomized$MAD, na.rm=T),3),
                     "; p-value = ", prop(~MAD >= MADobs, data=MADrandomized)))
ladd(panel.abline(v=MADobs))   # Add vertical line at test statistic

# Find p-value
prop(~MAD >= MADobs, data=MADrandomized)
##   TRUE 
## 0.0006


Based on this p-value, it appears as though it would have been extremely unlikely to observe our data if, in fact, the treatment groups had no effect on comprehension ratings.


  1. When we used randomization-based methods to compare two means, our randomization distribution was centered at zero with a somewhat unimodal, symmetric appearance. It looks like the randomization distribution we just plotted is not centered at zero and is positively skewed. Explain why we should expect this.










It appears as though this randomization-based method (using MAD or SAD) allows us to compare 2+ group means. Does it have any drawbacks (aside from estimating a slightly different p-value each time we run the randomization test)?


To find out, take a look at the following figure:

Drawing


The 3 dotplots in the middle column represent the actual data in this study. The datasets in the left and right columns were simulated to have the same group means as the actual data.

The red arrows point to the mean comprehension score for each group. Since the means are identical across each dataset, each dataset would yield the same MAD value.


  1. Which dataset (A, B, or C) provides the strongest evidence against the null hypothesis? In other words, which dataset represents the greatest difference among group means? Explain.













Instead of using the MAD statistic as our test statistic, perhaps we should create a statistic that standardizes the mean differences by comparing the variability between group means to the variability within each group.

If we’re able to calculate these two measures of variability, we’ll simply need to compare them to see if they differ from one another.


  1. How do we compare two variances?










Idea #3: ANOVA (mean square ratio)

Our goal is to calculate:  \(F=\frac{\textrm{variance between group means}}{\textrm{variance within groups}}\)


Because we’ll be analyzing variances, we call this process analysis of variance or ANOVA. Despite its name, remember the purpose of ANOVA is to compare means, not variances.


If you remember from the previous lesson, ratios of variances follow an F-distribution if we assume:

   • The population (or expected) variances are equal

   • The populations follow normal distributions


It looks like we’ve already spotted a drawback with ANOVA – we’re going to need to make some assumptions. We’ll investigate those assumptions later. For now, let’s see how ANOVA works on a simplified dataset.


Scenario: Diet and cholesterol

To investigate the effect of diet on cholesterol, you randomly select 15 subjects and assign them to one of 3 diets: vegetarian, healthy, or regular. After six months, you measure the cholesterol level of each individual.

Let’s create a dataset for this study:

# Diets
diet <- c(rep("vegetarian",5), rep("healthy",5), rep("regular", 5))
cholesterol <- c(190, 202, 208, 216, 225, 
                 200, 205, 210, 215, 230,
                 245, 260, 265, 270, 280)
dietstudy <- data.frame(diet, cholesterol)
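
Before analyzing anything, it helps to glance at the group means and grand mean (a quick sketch, using the same mosaic formula interface we used earlier):

# Group means and grand mean for the diet study
mean(cholesterol ~ diet, data = dietstudy)
mean(dietstudy$cholesterol)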


Drawing


  1. Below, I’ve sketched hypothetical population distributions for each of our groups. On the left, I assumed the null hypothesis was true. On the right, I assumed it was false. Write out the null hypothesis and list the assumptions I made in creating these distributions.


Drawing










Our model

Let’s construct a model for this cholesterol study. Think about 3 subjects in this study:

     • A person randomly assigned to the vegetarian group who ended with a cholesterol level of 190.

     • Another person in the vegetarian group who ended with a cholesterol level of 225.

     • A third person, assigned to the regular diet group, who ended with a cholesterol level of 280.

  1. Identify several potential reasons why these 3 subjects ended the study with different cholesterol levels.









Our model, once again, is:   \(y_{ig}=\mu +\alpha _{g}+\epsilon_{ig}=\mu+(\mu _{g}-\mu )+(y_{ig}-\mu _{g})\)
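
As a concrete sketch, here's that decomposition for the first vegetarian subject, using the sample grand mean and group mean as estimates of \(\mu\) and \(\mu_{g}\):

# Decompose one observation: y = grand mean + treatment effect + error
y     <- 190                             # first vegetarian subject's cholesterol
grand <- mean(dietstudy$cholesterol)     # estimate of the grand mean
veg   <- mean(cholesterol ~ diet, data = dietstudy)["vegetarian"]   # group mean
grand + (veg - grand) + (y - veg)        # telescopes back to y = 190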


  1. Identify the components of this model and explain why this model must be true. What assumptions will we make about the error term?









In an ANOVA, we want to analyze the total variation in our data (the overall difference in cholesterol levels among all subjects in this study).

We’re then going to divide that total variation into two components:

     • Between-groups or treatment variation represents variation due to the treatments (variation in cholesterol due to diet).

     • Within-groups or error variation represents variation that is not due to the treatments (variation among subjects in same diet).


We can (roughly) visualize these sources of variation:

Drawing


  1. Suppose we calculated the variances between and within groups and calculated the ratio of those variances (variance between / variance within). If that ratio turns out to be relatively large, what does that say about our null hypothesis? What does it say if the ratio is relatively small?











Variance

Let’s derive the formulas for these variance components. First, recall the formula for the unbiased estimate of the population variance:


          \(s^{2}=\frac{\sum_{i=1}^{n}(x_{i}-\overline{X})^{2}}{n-1}=\frac{\textrm{sum of squared deviations from the mean}}{\textrm{degrees of freedom}}=\frac{SS}{df}=\textrm{mean square}\)
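
As a quick sanity check in R (with a small made-up vector):

# A mean square is just SS/df; var() computes exactly that
x <- c(2, 4, 9)
SS <- sum((x - mean(x))^2)    # sum of squared deviations from the mean: 26
SS / (length(x) - 1)          # mean square: 26/2 = 13
var(x)                        # returns the same value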


  1. Explain what SS and MS represent.











MStotal = Total variation

Let’s calculate the total variation in our cholesterol data.


  1. Fill-in-the-blanks in the following formula. Then, thinking about what MStotal represents, write out another formula for the numerator, SStotal. Finally, calculate MStotal for this cholesterol study and explain what it represents.


          \(\mathbf{\textrm{MS}_{\textrm{Total}}=\frac{\textrm{SS}_{\textrm{total}}}{\textrm{df}_{\textrm{total}}}=\frac{\sum_{i=1}^{n}(\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )^{2}}{(\ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ )}}\)














MSerror = Within-group variation

We just calculated the total variation in cholesterol to be approximately SST = 11633. We’re now going to partition that variation into two components.

The first component will represent the average variation within each group (or the variation that is not due to our treatments).


  1. Fill-in-the-blanks to derive formulas for SSE, dfE, and MSE. Explain what MSE represents. Given this explanation, write out another formula you could use to calculate SSE. Finally, calculate MSE for our cholesterol data.


          \(\mathbf{\textrm{MS}_{\textrm{Error}}=\frac{\textrm{SS}_{\textrm{error}}}{\textrm{df}_{\textrm{error}}}=\frac{\sum(\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )^{2}}{(\ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ )}}\)















MSA = Between-group variation

The other component will represent the variation between the groups (or the variation that is due to our treatments).


  1. Fill-in-the-blanks to derive formulas for SSA, dfA, and MSA. Explain what they represent. Then, calculate MSA for our data.


          \(\mathbf{\textrm{MS}_{\textrm{A}}=\mathbf{\textrm{MS}_{\textrm{treatment}}}=\frac{\textrm{SS}_{\textrm{A}}}{\textrm{df}_{\textrm{A}}}=\frac{\sum\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )^{2}}{(\ \ \ \ \ \ \ \ \ \ -\ \ \ \ \ \ \ \ \ \ )}}\)















ANOVA - summary of calculations

Drawing


Source     SS        df    MS       MSR
Treatment  9720.1    2     4860     ?
Error      1912.8    12    159.4    ?
Total      11632.9   14    830.9    ?


Notice that \(\mathbf{\textrm{SS}_{\textrm{total}}}=\mathbf{\textrm{SS}_{\textrm{A}}}+\mathbf{\textrm{SS}_{\textrm{E}}}\). Let’s see if we can generalize that result.
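
Before the algebra, here's a quick numerical check of that partition using our dietstudy data (a sketch; base R's ave() returns each subject's group mean):

# Verify SStotal = SSA + SSE for the diet study
grand <- mean(dietstudy$cholesterol)
group.means <- ave(dietstudy$cholesterol, dietstudy$diet)   # group mean per subject

SS.total <- sum((dietstudy$cholesterol - grand)^2)          # 11632.9
SS.A     <- sum((group.means - grand)^2)                    # 9720.1
SS.E     <- sum((dietstudy$cholesterol - group.means)^2)    # 1912.8
SS.A + SS.E   # matches SS.total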


Partitioning sums of squares


  1. Explain what is happening at each step of the following derivation:


\(\mathbf{\textrm{SS}_{\textrm{total}}}=\sum \sum (x_{i}-M)^{2}=\sum \sum \left [ \left ( x_{i}- \overline{X}_{a}\right )+\left ( \overline{X}_{a}-M \right ) \right ]^{2}=\sum \sum \left [ \left ( \overline{X}_{a}- M\right )+\left ( x_{i}-\overline{X}_{a} \right ) \right ]^{2}=\)


\(\mathbf{\textrm{SS}_{\textrm{total}}}=\sum \sum\left ( \overline{X}_{a}- M\right )^{2}+\sum \sum\left ( x_{i}-\overline{X}_{a} \right )^{2}+2\sum \sum\left ( \overline{X}_{a}- M\right )\left ( x_{i}-\overline{X}_{a} \right )\)


Since \(\sum \left ( x_{i}-\overline{X}_{a} \right )=0\) within each group, the cross-product term vanishes, leaving:


\(\mathbf{\textrm{SS}_{\textrm{total}}}=\sum n_{a}\left ( \overline{X}_{a}- M\right )^{2}+\sum \sum\left ( x_{i}-\overline{X}_{a} \right )^{2}=\)


\(\mathbf{\textrm{SS}_{\textrm{total}}}=\mathbf{\textrm{SS}_{\textrm{A}}}+\mathbf{\textrm{SS}_{\textrm{E}}}\)



One more look at MSE

Since mean square is another term for variance, MSE must represent error variance – the variance within a distribution (or the variance of a single group).

How can MSE represent the variance of a single group if we have 3 (or more) groups in our study? In other words, if we have multiple groups, how do we know which single group to “choose” when we calculate MSE?

Remember, because we’re going to ultimately take the ratio of two variances and compare it to an F-distribution, we’re going to assume equal population variances. This means we don’t have to choose a single group. Instead, we can calculate a pooled variance of all our groups. In this sense, MSE represents the average variance of each of our groups.


The concept of an average, or pooled, variance should sound familiar. We used \(s_{\textrm{pooled}}\) when we conducted independent samples t-tests (with an equal variance assumption).

If we take the formula for \(s_{\textrm{pooled}}^{2}\) and extend it to a scenario with 3 groups, we would have:


\(s_{\textrm{pooled}}^{2}=\frac{\left ( n_{1}-1 \right )s_{1}^{2}+\left ( n_{2}-1 \right )s_{2}^{2}+\left ( n_{3}-1 \right )s_{3}^{2}}{\left ( n_{1}-1 \right )+\left ( n_{2}-1 \right )+\left ( n_{3}-1 \right )}\)


\(s_{\textrm{pooled}}^{2}=\frac{\left ( n_{1}-1 \right )\frac{\sum\left ( x_{i1}-\overline{X}_{1} \right )^{2}}{n_{1}-1} + \left ( n_{2}-1 \right )\frac{\sum\left ( x_{i2}-\overline{X}_{2} \right )^{2}}{n_{2}-1} + \left ( n_{3}-1 \right )\frac{\sum\left ( x_{i3}-\overline{X}_{3} \right )^{2}}{n_{3}-1}}{\left ( n_{1}-1 \right )+\left ( n_{2}-1 \right )+\left ( n_{3}-1 \right )}\)


\(s_{\textrm{pooled}}^{2}=\frac{\sum \left ( x_{i1}-\overline{X}_{1} \right )^{2} + \sum \left ( x_{i2}-\overline{X}_{2} \right )^{2} + \sum \left ( x_{i3}-\overline{X}_{3} \right )^{2}}{\left ( n_{1}-1 \right )+\left ( n_{2}-1 \right )+\left ( n_{3}-1 \right )}\)


\(s_{\textrm{pooled}}^{2}=\frac{\sum \left ( x_{ia}-\overline{X}_{a} \right )^{2}}{N-a}=\textrm{MS}_{\textrm{E}}\)
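
Here's a sketch verifying that this pooled variance matches the MSE we calculated for the diet study:

# MSE as the pooled (weighted average) variance of the 3 diet groups
vars <- tapply(dietstudy$cholesterol, dietstudy$diet, var)      # group variances
ns   <- tapply(dietstudy$cholesterol, dietstudy$diet, length)   # group sizes
sum((ns - 1) * vars) / (sum(ns) - length(vars))                 # 159.4 = MSE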


  1. If the null hypothesis were true, what would the expected value of MSE be? What’s the expected value if the null hypothesis were false?









  1. If the null hypothesis were true, what would the expected value of MSA be? What’s the expected value if the null hypothesis were false?









This is the logic behind ANOVA. If the null hypothesis is true, both MSA and MSE provide unbiased estimates of the variance within a group. If the null hypothesis is false, MSA becomes larger.

All we need to do is compare these two variances to determine if MSA is significantly larger than MSE.

  1. How do we compare MSA to MSE? What sampling distribution do we use? How many degrees of freedom will our test statistic have? What’s the expected value of our test statistic if the null hypothesis is true?












Now that we know how to compare MSA to MSE, complete the following ANOVA summary table and estimate the p-value. To estimate the p-value, you may want to use this F-distribution applet.


Source     SS        df    MS       MSR
Treatment  9720.1    2     4860     MSR =
Error      1912.8    12    159.4    p =
Total      11632.9   14    830.9    eta-squared =


You may want to take a look at this F-distribution table to get a sense of what values of the MSR will yield low p-values.
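
If you prefer R to a table or applet, the pf() function returns F-distribution tail areas. For example, with an arbitrary value of 4 (not the answer):

# P(F > 4) for an F-distribution with 2 and 12 degrees of freedom
pf(4, df1 = 2, df2 = 12, lower.tail = FALSE)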


  1. What does that p-value tell us? Does it inform us about the magnitude of the differences among the group means?









Effect sizes: eta- and omega-squared

Since we’re partitioning the total variation (SStotal), we can define an effect size (eta-squared) as:   \(\eta ^{2}=\frac{\textrm{SS}_\textrm{A}}{\textrm{SS}_\textrm{T}}=\frac{9720.1}{11632.9}=0.8356\)


  1. Interpret the value of eta-squared.









One problem with eta-squared is the fact that it’s based entirely on sums of squares from the sample and no adjustment is made for the fact that we’re trying to estimate the effect size of the population. As we’ll learn later in the course, our models typically overfit our sample data.

As an adjusted effect size, we can calculate omega-squared:   \(\omega ^{2}=\frac{\textrm{SS}_\textrm{A}-\left (\textrm{df}_{\textrm{A}} \right )\left ( \textrm{MS}_\textrm{E} \right )}{\textrm{SS}_\textrm{T}+\textrm{MS}_\textrm{E}}\)


In this example, we would calculate:   \(\omega ^{2}=\frac{\textrm{SS}_\textrm{A}-\left (\textrm{df}_{\textrm{A}} \right )\left ( \textrm{MS}_\textrm{E} \right )}{\textrm{SS}_\textrm{T}+\textrm{MS}_\textrm{E}}=\frac{9720.1-\left ( 2 \right )\left ( 159.4 \right )}{11632.9+159.4}=0.7972\)
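
Here's a sketch of both effect size calculations in R, plugging in the values from our ANOVA summary table:

# Eta-squared and omega-squared from the ANOVA table values
SS.A <- 9720.1; SS.E <- 1912.8; SS.T <- SS.A + SS.E
df.A <- 2; MS.E <- 159.4

SS.A / SS.T                            # eta-squared = 0.8356
(SS.A - df.A * MS.E) / (SS.T + MS.E)   # omega-squared = 0.7972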

  1. Based on the p-value and the effect size, what conclusions can we make from this study? Can we conclude individuals on vegetarian diets have lower cholesterol levels than those on regular diets?











ANOVA in R

Now that we have a conceptual understanding of ANOVA, let’s see how to implement it in R. We can actually do this in several ways (as we’ll see later in the course). For now, we’ll use the aov() and anova() commands.

# Our dataset is dietstudy
# Our model is cholesterol ~ diet
# We use the aov() command to run the analysis of variance
# We'll store the results in something called "model"
model <- aov(cholesterol ~ diet, data=dietstudy)

# We can now use the anova() command to summarize this model
# We could also use summary(model)
anova(model)
## Analysis of Variance Table
## 
## Response: cholesterol
##           Df Sum Sq Mean Sq F value     Pr(>F)    
## diet       2 9720.1  4860.1   30.49 0.00001976 ***
## Residuals 12 1912.8   159.4                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



Back to our ambiguous prose study: ANOVA

Let’s run an ANOVA on our ambiguous prose study. Remember, we want to see if the mean comprehension rating given by the before, after, and none groups differ.

# aov() stored as ambiguousmodel
ambiguousmodel <- aov(Comprehension ~ Condition, data=ambiguous)

# anova() to summarize the model
anova(ambiguousmodel)
## Analysis of Variance Table
## 
## Response: Comprehension
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Condition  2 35.053 17.5263  10.012 0.0002002 ***
## Residuals 54 94.526  1.7505                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. Calculate and interpret an effect size for this study.











ANOVA assumptions

  1. List the assumptions (or conditions) necessary to get valid results from an ANOVA. Identify why we need each condition.












To assess the homogeneity of variances assumption, we can informally run an Fmax test:


           \(F_{\textrm{max}}=\frac{s_{\textrm{largest}}^{2}}{s_{\textrm{smallest}}^{2}}<4\textrm{ means the equal variances assumption is reasonable}\)
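
Here's a sketch of that informal check for our data, using favstats() from the mosaic package:

# F-max: ratio of largest to smallest group variance
group.sds <- favstats(Comprehension ~ Condition, data = ambiguous)$sd
max(group.sds^2) / min(group.sds^2)   # well under 4 for these data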


We could also run Levene’s Test or Bartlett’s test to assess the equality of variances across two or more groups.

I won’t go into the details of these tests. I’ll only mention that Levene’s test is typically preferred because Bartlett’s test requires a normality assumption.

# Levene's test (leveneTest() comes from the car package)
library(car)
leveneTest(Comprehension ~ Condition, data=ambiguous)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.0243  0.976
##       54
# Bartlett's test
bartlett.test(Comprehension~Condition, data=ambiguous)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Comprehension by Condition
## Bartlett's K-squared = 0.20255, df = 2, p-value = 0.9037
  1. Based on the p-values from Levene’s test and Bartlett’s test, what do you conclude about the homogeneity of variance assumption?









To assess the normality assumption, we can informally look at visualizations of our sample data (displayed at the beginning of this lesson). We could also examine Q-Q plots:

# Example of a Q-Q plot for the "Before" group in our study
qqnorm(ambiguous$Comprehension[ambiguous$Condition=="Before"], ylab = "Comprehension")
qqline(ambiguous$Comprehension[ambiguous$Condition=="Before"], lwd=3)


We could also run a test for normality, such as the Shapiro-Wilk test:

# Shapiro-Wilk test
shapiro.test(ambiguous$Comprehension[ambiguous$Condition=="Before"])
## 
##  Shapiro-Wilk normality test
## 
## data:  ambiguous$Comprehension[ambiguous$Condition == "Before"]
## W = 0.90423, p-value = 0.05805
  1. Based on the Q-Q plot and the Shapiro-Wilk test, does normality seem like a reasonable assumption (for the “before” group, at least)?










One question you may have is: What can we do if the assumptions don’t look reasonable?

We’ll deal with this issue throughout the semester. For now, I’ll list out a few options:

    • Transform your data (by taking, for example, the logarithm of each value in an attempt to force it to look more like a normal distribution)

    • Use a nonparametric test (like the randomization-based method we saw earlier or another technique like the Kruskal-Wallis test, sketched below)

    • Use a robust ANOVA procedure (we’ll see a couple examples of these methods later in the course)
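
As an example of the second option, the Kruskal-Wallis test is built into base R:

# Kruskal-Wallis test: a nonparametric alternative to one-way ANOVA
kruskal.test(Comprehension ~ Condition, data = ambiguous)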


Recall from the first lesson that if we wanted to conduct a t-test without making the equal variances assumption, we could use the Welch-Satterthwaite approximation. It modified the degrees of freedom of our test statistic.

We can do something similar with ANOVA. If we want to conduct an ANOVA without the equal variances assumption, we can use Welch's F:

# Welch's F (ANOVA without equal variances assumption)
oneway.test(Comprehension ~ Condition, data=ambiguous, var.equal=FALSE)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Comprehension and Condition
## F = 9.8552, num df = 2.000, denom df = 35.933, p-value = 0.0003871


From that output, you can see the degrees of freedom were modified. You can also see that in this example, the p-value was similar to the one obtained with the equal variances assumption.



Idea #4: Randomized mean-square ratios (F-statistics)

Each idea we’ve had to compare our group means has had drawbacks:

    • Multiple t-tests inflate our Type I error rate

    • Randomized MAD (or SAD) methods do not account for within-group variation

    • ANOVA requires normality and equal variance assumptions (or requires an understanding of robust alternatives)


Could we combine the relatively-assumption-free randomization methods with the within-groups-variance-accounting of ANOVA? Sure!


Let’s have R randomize our data 10,000 times (by randomly shuffling treatment groups among the subjects in the study). For each randomization, we’ll have R calculate the F-statistic (ratio of MSA/MSE). We can then estimate the likelihood of our observed F-statistic by comparing it to the distribution of F-statistics we’ll get from the 10,000 randomizations.

# Store our observed F=10.012 as "Fobserved"
# Don't worry about the syntax here, I could have used Fobserved <- 10.012
Fobserved = summary(lm(Comprehension~Condition, data=ambiguous))$fstatistic[1]

# Run 10,000 randomizations and store in "ANOVAs"
# Note I'm using "lm" instead of "aov" to run the ANOVAs
# It's virtually the same thing, so don't worry about it for now
ANOVAs <- do(10000) * lm(Comprehension ~ shuffle(Condition), data=ambiguous)

# Histogram of the randomized F-statistics
histogram(~F, data=ANOVAs,
          xlab="Possible F-statistics assuming null hypothesis is true", 
          groups=F >= Fobserved,   # Highlight values > test statistic
          main=paste("p-value = ", prop(~F >= Fobserved, data=ANOVAs)))
ladd(panel.abline(v=Fobserved))   # Add vertical line at test statistic

# Calculate p-value
prop(~F >= Fobserved, data=ANOVAs)
##   TRUE 
## 0.0043



ANOVA vs t-test

Let’s (finally) finish this lesson by looking at the relationship between the F-statistic in an ANOVA and our t-statistic in a t-test:


\(t_{\textrm{n}_1+\textrm{n}_2-2}=\frac{\overline{X}_{1}-\overline{X}_{2}}{\sqrt{\frac{1}{\textrm{n}_1}+\frac{1}{\textrm{n}_2}}\sqrt{\frac{\left ( \textrm{n}_1-1 \right )s_{1}^{2}+\left ( \textrm{n}_2-1 \right )s_{2}^{2}}{\textrm{n}_1+\textrm{n}_2-2}}}=\frac{\overline{X}_{1}-\overline{X}_{2}}{\sqrt{\frac{1}{\textrm{n}_1}+\frac{1}{\textrm{n}_2}}\sqrt{s_{pooled}^{2}}}\)


\(t_{{\textrm{n}_1+\textrm{n}_2-2}}^{2}=\frac{\left (\overline{X}_{1}-\overline{X}_{2} \right )^{2}}{\left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right )s_{pooled}^{2}}=\frac{\frac{n_{1}n_{2}}{n_{1}+n_{2}}\left (\overline{X}_{1}-\overline{X}_{2} \right )^{2}}{s_{pooled}^{2}}=\frac{n_{1}\left (\overline{X}_{1}-M \right )^{2}+n_{2}\left (\overline{X}_{2}-M \right )^{2}}{s_{pooled}^{2}}\)


\(t_{{\textrm{n}_1+\textrm{n}_2-2}}^{2}=\frac{\sum n_{a}\left (\overline{X}_{a}-M \right )^{2}/1}{\textrm{MS}_{\textrm{E}}}=\frac{\textrm{SS}_{\textrm{A}}/\textrm{df}_{\textrm{A}}}{\textrm{MS}_{\textrm{E}}}=\frac{\textrm{MS}_{\textrm{A}}}{\textrm{MS}_{\textrm{E}}}=F\)


  1. Explain the consequences of what was just derived. Also, verify this result with the following output.
# Recreate the spider dataset from lesson #1
group <- c(rep("photo", 12), rep("real", 12))
anxiety<-c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50, 
           40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
spider <- data.frame(group, anxiety)

# Run a t-test
t.test(anxiety ~ group, data=spider, alternative = c("less"), var.equal = TRUE, conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  anxiety by group
## t = -1.6813, df = 22, p-value = 0.05342
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf 0.1490421
## sample estimates:
## mean in group photo  mean in group real 
##                  40                  47
# Run an ANOVA on the same data
anova(aov(anxiety ~ group, data=spider))
## Analysis of Variance Table
## 
## Response: anxiety
##           Df Sum Sq Mean Sq F value Pr(>F)
## group      1    294     294  2.8269 0.1068
## Residuals 22   2288     104














Your turn

  1. If you want to lose weight, are some diets more effective than others? To investigate this, researchers randomly assigned 93 subjects to one of four diets:

      (a) Atkins, a low-carb diet,   (b) Zone, a 4-4-3 ratio of carbs-protein-fat,
      (c) Ornish, a low-fat diet, or (d) Weight Watchers.

    The 93 subjects were educated on their assigned diet and observed periodically as they stayed on the diet for one year. At the end of the year, the researchers calculated the changes in weight for each subject.

    (a) Suppose we wanted to compare all possible pairs of group means using t-tests with alpha=0.05 for each test. What would be the probability of making at least one Type I error across all these tests?

    (b) Check the conditions necessary to conduct an ANOVA. You may want to use Levene’s test to check for equal variances. You can check for normality graphically or with the Shapiro-Wilk test.

    (c) Conduct an ANOVA on this data (with or without the equal variance assumption) and write out any conclusions you can draw.

    (d) Calculate eta-squared and omega-squared.

    (e) Calculate the MAD (or SAD) for this dataset and then conduct a randomization-based test to estimate a p-value for this study.

    (f) Run the randomization-based test with the F-statistic to estimate a p-value.

    The data have been loaded into a data.frame called diet.
    The variables are Diet (a factor variable identifying which diet each subject is assigned to) and Weightloss (the number of pounds lost by each subject by the end of the study).

    A summary of the data is displayed below. Notice the variables begin with capital letters and that negative values of the Weightloss variable indicate subjects who gained weight.

Drawing




Publishing your solutions

When you’re ready to try the Your Turn exercises: