---
title: "Your turn - Lesson 3"
author: "Comparing 2+ means"
output:
html_document:
css: http://www.bradthiessen.com/batlab2.css
highlight: pygments
theme: spacelab
fig_width: 5.6
fig_height: 4
---
*****
**Author(s):** [Enter names of people working on these solutions]
*****
```{r message=FALSE, echo=FALSE}
# Above, type your name in the "Author(s)" section
# Load the mosaic package
library(mosaic)
# Load or install the ggplot2 package
require(ggplot2)
# Your lab report begins below
```
## Your turn
37. If you want to lose weight, are some diets more effective than others? To investigate this, researchers randomly assigned 93 subjects to one of four diets:
(a) **Atkins**, a low-carb diet, (b) **Zone**, a 4-4-3 ratio of carbs-protein-fat,
(c) **Ornish**, a low-fat diet, or (d) **Weight Watchers**.
The 93 subjects were educated on their assigned diet and observed periodically as they stayed on the diet for one year. At the end of the year, the researchers calculated the changes in weight for each subject.
(a) Suppose we wanted to compare all possible pairs of group means using t-tests with alpha=0.05 for each test. What would the probability of making at least one Type I error across all these tests?
(b) Check the conditions necessary to conduct an ANOVA. You may want to use Levene's test to check for equal variances. You can check for normality graphically or with Shapiro-Wilks' test.
(c) Conduct an ANOVA on this data (with or without the equal variance assumption) and write out any conclusions you can draw.
(d) Calculate eta-squared and omega-squared.
(e) Calculate the MAD (or SAD) for this dataset and then conduct a randomization-based test to estimate a p-value for this study.
(f) Run the randomization-based test with the F-statistic to estimate a p-value.
The data have been loaded into a data.frame called **diet**.
The variables are **Diet** (a factor variable identifying which diet each subject is assigned to) and **Weightloss** (the number of pounds lost by each subject by the end of the study).
A summary of the data is displayed below. Notice the variables begin with **Capital** letters and that negative values of the **Weightloss** variable indicate subjects who *gained* weight.
```{r}
# This section will load the data and display some visualizations
# The data are loaded into a data.frame called "diet"
diet <- read.csv("http://www.bradthiessen.com/html5/data/diet.csv")
## Notice the capital L in the WeightLoss variable. If you forget that upper-case L, you'll get an error message.
# Density plot to see the distribution of weightloss values within each group
densityplot(~WeightLoss|Diet, data=diet, lwd=3,
main="Weight loss by diet", xlab="Pounds lost", layout=c(2,2))
# Error bars showing mean and standard errors
ggplot(diet, aes(Diet, WeightLoss)) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line", aes(group = 1),
colour = "Blue", linetype = "dashed") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
labs(x = "Diet", y = "mean Weightloss")
```
**(a) Suppose we wanted to compare all possible pairs of group means using t-tests with alpha=0.05 for each test. What would the probability of making at least one Type I error across all these tests?**
(Type your answer here)
**(b) Check the conditions necessary to conduct an ANOVA. You may want to use Levene’s test to check for equal variances. You can check for normality graphically or with Shapiro-Wilks’ test.**
```{r}
# Levene's test.
# Replace the XXXXX with variable names and the YYYYY with the data.frame
leveneTest(XXXXX ~ XXXXX, data=YYYYY)
# Here's the code to run Shapiro-Wilks test for all 4 diets.
# The last line will print the p-values for each group
ww <- round(shapiro.test(diet$WeightLoss[diet$Diet=="WtWtchrs"])$p.value,3)
z <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Zone"])$p.value,3)
a <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Atkins"])$p.value,3)
o <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Ornish"])$p.value,3)
paste("WtWtchrs: p =", ww, ". Zone: p =", z, ". Atkins: p =", a, ". Ornish: p =", o)
```
(Here's where you can type your conclusions regarding the conditions needed to run an ANOVA. Are the assumptions reasonable to make?)
**(c) Conduct an ANOVA on this data (with or without the equal variance assumption) and write out any conclusions you can draw. ***
```{r}
# ANOVA model
model <- aov(XXXXX ~ XXXXX, data=YYYYY)
# Summarize the model
anova(XXXXX)
# No equal variance assumption
oneway.test(XXXXX ~ XXXXX, data=YYYYY, var.equal=FALSE)
```
(Write your conclusion from the ANOVA here)
**(d) Calculate eta-squared and omega-squared.***
(You can type your answers here)
**(e) Calculate the MAD (or SAD) for this dataset and then conduct a randomization-based test to estimate a p-value for this study.**
```{r, fig.keep="last"}
# Try to follow along with the code we saw in class
# Calculate SAD or MAD and store it in a variable
# The syntax will look like this:
# test.statistic <- MAD(mean(Y ~ X, data=Z))
# Now run 10,000 randomizations and store the result in a data.frame
# The syntax will look like this:
# df <- do(10000) * MAD(mean(Y ~ shuffle(X), data=Z))
# Plot the distribution of MAD values from the randomized distribution
# Replace XXX with either MAD or SAD (depending on which one you chose)
# Replace YYYYY with the name of the dataframe you created in the previous line of code
# Replace ZZZZZ with a label for your x-axis
# Replace QQQQQ with the name of your test.statistic
histogram(~XXX, data=YYYYY,
xlab="ZZZZZ",
groups=XXX >= QQQQQ)
# Find p-value
# Replace XXX with either MAD or SAD (depending on which one you chose)
# Replace YYYYY with the name of the dataframe you created in the previous line of code
# Replace QQQQQ with the name of your test.statistic
prop(~XXX >= QQQQQ, data=YYYYY)
```
**(f) Run the randomization-based test with the F-statistic to estimate a p-value.**
```{r}
# The ANOVA F-statistic is 0.5361475. I'll store it as "Fobserved"
# Don't change the next line
Fobserved = summary(lm(WeightLoss~Diet, data=diet))$fstatistic[1]
# Run 10,000 randomizations and store in "ANOVAs"
# Note I'm using "lm" instead of "aov" to run the ANOVAs
# Its virtually the same thing, so don't worry about it for now
ANOVAs <- do(10000) * lm(WeightLoss ~ shuffle(Diet), data=diet)
# Histogram of randomized F-statistics
# Histogram of the randomized sampling distribution
histogram(~F, data=ANOVAs,
xlab="Possible mean differences assuming null hypothesis is true",
groups=F >= Fobserved, # Highlight values > test statistic
main=paste("p-value = ", prop(~F >= Fobserved, data=ANOVAs)))
ladd(panel.abline(v=Fobserved)) # Add vertical line at test statistic
```