--- title: "Your turn - Lesson 3" author: "Comparing 2+ means" output: html_document: css: http://www.bradthiessen.com/batlab2.css highlight: pygments theme: spacelab fig_width: 5.6 fig_height: 4 --- ***** **Author(s):** [Enter names of people working on these solutions] ***** ```{r message=FALSE, echo=FALSE} # Above, type your name in the "Author(s)" section # Load the mosaic package library(mosaic) # Load or install the ggplot2 package require(ggplot2) # Your lab report begins below ```
## Your turn 37. If you want to lose weight, are some diets more effective than others? To investigate this, researchers randomly assigned 93 subjects to one of four diets:

  (a) **Atkins**, a low-carb diet,   (b) **Zone**, a 4-4-3 ratio of carbs-protein-fat,
  (c) **Ornish**, a low-fat diet, or (d) **Weight Watchers**.

The 93 subjects were educated on their assigned diet and observed periodically as they stayed on the diet for one year. At the end of the year, the researchers calculated the changes in weight for each subject.

(a) Suppose we wanted to compare all possible pairs of group means using t-tests with alpha=0.05 for each test. What would the probability of making at least one Type I error across all these tests?

(b) Check the conditions necessary to conduct an ANOVA. You may want to use Levene's test to check for equal variances. You can check for normality graphically or with Shapiro-Wilks' test.

(c) Conduct an ANOVA on this data (with or without the equal variance assumption) and write out any conclusions you can draw.

(d) Calculate eta-squared and omega-squared.

(e) Calculate the MAD (or SAD) for this dataset and then conduct a randomization-based test to estimate a p-value for this study.

(f) Run the randomization-based test with the F-statistic to estimate a p-value.

The data have been loaded into a data.frame called **diet**.
The variables are **Diet** (a factor variable identifying which diet each subject is assigned to) and **Weightloss** (the number of pounds lost by each subject by the end of the study).

A summary of the data is displayed below. Notice the variables begin with **Capital** letters and that negative values of the **Weightloss** variable indicate subjects who *gained* weight. Drawing
```{r} # This section will load the data and display some visualizations # The data are loaded into a data.frame called "diet" diet <- read.csv("http://www.bradthiessen.com/html5/data/diet.csv") ## Notice the capital L in the WeightLoss variable. If you forget that upper-case L, you'll get an error message. # Density plot to see the distribution of weightloss values within each group densityplot(~WeightLoss|Diet, data=diet, lwd=3, main="Weight loss by diet", xlab="Pounds lost", layout=c(2,2)) # Error bars showing mean and standard errors ggplot(diet, aes(Diet, WeightLoss)) + stat_summary(fun.y = mean, geom = "point") + stat_summary(fun.y = mean, geom = "line", aes(group = 1), colour = "Blue", linetype = "dashed") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + labs(x = "Diet", y = "mean Weightloss") ``` **(a) Suppose we wanted to compare all possible pairs of group means using t-tests with alpha=0.05 for each test. What would the probability of making at least one Type I error across all these tests?** (Type your answer here)

**(b) Check the conditions necessary to conduct an ANOVA. You may want to use Levene’s test to check for equal variances. You can check for normality graphically or with Shapiro-Wilks’ test.** ```{r} # Levene's test. # Replace the XXXXX with variable names and the YYYYY with the data.frame leveneTest(XXXXX ~ XXXXX, data=YYYYY) # Here's the code to run Shapiro-Wilks test for all 4 diets. # The last line will print the p-values for each group ww <- round(shapiro.test(diet$WeightLoss[diet$Diet=="WtWtchrs"])$p.value,3) z <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Zone"])$p.value,3) a <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Atkins"])$p.value,3) o <- round(shapiro.test(diet$WeightLoss[diet$Diet=="Ornish"])$p.value,3) paste("WtWtchrs: p =", ww, ". Zone: p =", z, ". Atkins: p =", a, ". Ornish: p =", o) ``` (Here's where you can type your conclusions regarding the conditions needed to run an ANOVA. Are the assumptions reasonable to make?)

**(c) Conduct an ANOVA on this data (with or without the equal variance assumption) and write out any conclusions you can draw. *** ```{r} # ANOVA model model <- aov(XXXXX ~ XXXXX, data=YYYYY) # Summarize the model anova(XXXXX) # No equal variance assumption oneway.test(XXXXX ~ XXXXX, data=YYYYY, var.equal=FALSE) ``` (Write your conclusion from the ANOVA here)

**(d) Calculate eta-squared and omega-squared.*** (You can type your answers here)

**(e) Calculate the MAD (or SAD) for this dataset and then conduct a randomization-based test to estimate a p-value for this study.** ```{r, fig.keep="last"} # Try to follow along with the code we saw in class # Calculate SAD or MAD and store it in a variable # The syntax will look like this: # test.statistic <- MAD(mean(Y ~ X, data=Z)) # Now run 10,000 randomizations and store the result in a data.frame # The syntax will look like this: # df <- do(10000) * MAD(mean(Y ~ shuffle(X), data=Z)) # Plot the distribution of MAD values from the randomized distribution # Replace XXX with either MAD or SAD (depending on which one you chose) # Replace YYYYY with the name of the dataframe you created in the previous line of code # Replace ZZZZZ with a label for your x-axis # Replace QQQQQ with the name of your test.statistic histogram(~XXX, data=YYYYY, xlab="ZZZZZ", groups=XXX >= QQQQQ) # Find p-value # Replace XXX with either MAD or SAD (depending on which one you chose) # Replace YYYYY with the name of the dataframe you created in the previous line of code # Replace QQQQQ with the name of your test.statistic prop(~XXX >= QQQQQ, data=YYYYY) ```

**(f) Run the randomization-based test with the F-statistic to estimate a p-value.** ```{r} # The ANOVA F-statistic is 0.5361475. I'll store it as "Fobserved" # Don't change the next line Fobserved = summary(lm(WeightLoss~Diet, data=diet))$fstatistic[1] # Run 10,000 randomizations and store in "ANOVAs" # Note I'm using "lm" instead of "aov" to run the ANOVAs # Its virtually the same thing, so don't worry about it for now ANOVAs <- do(10000) * lm(WeightLoss ~ shuffle(Diet), data=diet) # Histogram of randomized F-statistics # Histogram of the randomized sampling distribution histogram(~F, data=ANOVAs, xlab="Possible mean differences assuming null hypothesis is true", groups=F >= Fobserved, # Highlight values > test statistic main=paste("p-value = ", prop(~F >= Fobserved, data=ANOVAs))) ladd(panel.abline(v=Fobserved)) # Add vertical line at test statistic ```