---
title: "Your turn - Lesson 2"
author: "Comparing 2 variances"
output:
  html_document:
    css: http://www.bradthiessen.com/batlab2.css
    highlight: pygments
    theme: spacelab
    fig_width: 5.6
    fig_height: 4
---
*****
**Author(s):** [Enter names of people working on these solutions]
*****
```{r message=FALSE, echo=FALSE}
# Above, type your name in the "Author(s)" section
# Load the mosaic package
library(mosaic)
# Load the custom function to compare variances
# Function to calculate the ratio of variances between two groups
varianceratio <- function (x, ..., data = parent.frame(), only.2 = TRUE)
{
  v <- var(x, ..., data = data)
  res <- v/lag(v) # This calculates the 2nd variance / 1st variance
  res[2]
}
# Your lab report begins below
```
## Your turn
20. In 2013, 219 **freshmen** at St. Ambrose responded to a survey asking **how many hours they study per week**. In 2014, those same 219 students (as **sophomores**) responded to the same question. Load this `studyhours` dataset and examine the first several rows of data with the `head(studyhours)` command.
```{r}
# The data are stored on my website as a .csv (comma separated values) file
# We can load .csv files from the web with the "read.csv()" command
# I will store the data in a data.frame named "studyhours"
studyhours <- read.csv("http://www.bradthiessen.com/html5/data/studyhours.csv")
# Use the head() command to examine the first several rows of data
# Notice the variables in this dataset are: hours and class
```
21. Now that you have the `studyhours` dataset loaded, create a `densityplot` to examine the distribution of `hours` for each class (`Freshmen` and `Sophomores`). Then, calculate the variance in hours for each class. Finally, use the `varianceratio()` function to calculate the ratio of variances (and store it as `test.stat`).
```{r}
# Create a visualization of the data to check for normality
# Replace the XXXXX values below with the appropriate variables
# Replace YYYYY to label the x-axis
densityplot(~XXXXX | XXXXX, data=studyhours, lwd=4, col="steelblue", layout=c(1,2),
            cex=0.7, xlab="YYYYY")
# Calculate the variance of each class
# Replace XXXXX with appropriate variable names
# Replace YYYYY with the name of the data.frame
var(XXXXX ~ XXXXX, data=YYYYY)
# Calculate the ratio of variances and store it as test.stat
test.stat <- varianceratio(XXXXX ~ XXXXX, data=YYYYY)
```
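As a rough illustration of what the variance ratio measures, the sketch below computes it by hand in base R. The data are simulated stand-ins (the `demo` data frame and its values are hypothetical, not the `studyhours` dataset), and the ratio is taken as second group over first group, as in the comment inside `varianceratio()`.

```r
# Simulated stand-in data (hypothetical values, NOT the studyhours dataset)
set.seed(1)
demo <- data.frame(
  hours = c(rnorm(50, mean = 15, sd = 4), rnorm(50, mean = 15, sd = 6)),
  class = rep(c("Freshmen", "Sophomores"), each = 50)
)

# Variance of hours within each class (groups are ordered alphabetically)
group_vars <- sapply(split(demo$hours, demo$class), var)
group_vars

# Ratio: second group's variance divided by the first group's variance
demo_ratio <- group_vars[["Sophomores"]] / group_vars[["Freshmen"]]
demo_ratio
```

A ratio near 1 suggests the two groups vary by similar amounts; a ratio well above 1 suggests the second group is more spread out.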
22. Suppose we believe that freshmen take similar classes and, therefore, tend to study similar amounts. Sophomores, on the other hand, begin to take courses in their chosen majors and, therefore, have a larger variance in the amount of time they study each week. Run a randomization-based test of the null hypothesis that class (freshmen vs. sophomores) has no effect on the hours studied per week. Display the distribution of your randomized variance ratios and report a p-value.
```{r}
# Calculate the variance ratio in 10,000 randomizations (shuffling the class assignment)
# Replace all the XXXXX and YYYYY values
ratios <- do(XXXXX) * varianceratio(XXXXX ~ shuffle(XXXXX), data=YYYYY)
# Create a histogram (or density plot) of the randomized values of the test.stat
# Replace the XXXXX with a label for your x-axis
histogram(~ ratios, data=ratios, groups=ratios >= test.stat,
          xlim=c(0,2.5), width=.05,
          xlab="XXXXX")
# Calculate a p-value
prop(~ratios >= test.stat, data=ratios)
```
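The shuffling logic behind the randomization test can be sketched in base R, without `do()` or `shuffle()`. This is only an illustration under stated assumptions: the data are simulated, the helper `ratio_of_vars()` is a hypothetical stand-in for `varianceratio()`, and 2,000 shuffles are used just to keep it quick.

```r
# Simulated stand-in data (hypothetical; both groups share the same true variance)
set.seed(2)
hours <- c(rnorm(40, mean = 15, sd = 4), rnorm(40, mean = 15, sd = 4))
grp   <- rep(c("Freshmen", "Sophomores"), each = 40)

# Hypothetical helper: Sophomore variance divided by Freshman variance
ratio_of_vars <- function(y, g) {
  v <- sapply(split(y, g), var)
  v[["Sophomores"]] / v[["Freshmen"]]
}

# Observed ratio with the real group labels
observed <- ratio_of_vars(hours, grp)

# Shuffle the labels many times, recomputing the ratio each time
perm_ratios <- replicate(2000, ratio_of_vars(hours, sample(grp)))

# One-sided p-value: share of shuffled ratios at least as large as the observed one
p_value <- mean(perm_ratios >= observed)
p_value
```

Each shuffle breaks any link between group label and hours, so `perm_ratios` approximates the ratios we would expect if class made no difference.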
23. Use bootstrap methods to construct a 95% confidence interval for the ratio of variances. Based on the p-value you just reported, you should be able to predict whether the interval will contain the value 1.0.
```{r}
# Generate 10,000 bootstrap samples and calculate our variance ratio for each
# Replace all the XXXXX and YYYYY values
bootratios <- do(XXXXX) * varianceratio(XXXXX ~ XXXXX, data=resample(YYYYY))
# Create a plot of the bootstrap estimates of our test statistic
densityplot(~ratios, data=bootratios, plot.points = FALSE, col="steelblue", lwd=4)
# Get the 95% confidence interval
# Replace all the XXXXX values
confint(XXXXX, level = XXXXX, method = "quantile")
```
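For comparison, here is a minimal base-R sketch of a percentile bootstrap interval for a variance ratio. It assumes simulated data and a hypothetical helper, `boot_ratio()`, which resamples whole rows with replacement (similar in spirit to `resample()` on a data frame); it is not the only way to bootstrap this statistic.

```r
# Simulated stand-in data (hypothetical values, NOT the studyhours dataset)
set.seed(3)
demo <- data.frame(
  hours = c(rnorm(60, mean = 15, sd = 4), rnorm(60, mean = 15, sd = 6)),
  class = rep(c("Freshmen", "Sophomores"), each = 60)
)

# Hypothetical helper: resample rows with replacement, then recompute the ratio
boot_ratio <- function(d) {
  rows <- sample(nrow(d), replace = TRUE)
  v <- sapply(split(d$hours[rows], d$class[rows]), var)
  v[["Sophomores"]] / v[["Freshmen"]]
}

# Bootstrap distribution of the variance ratio
boot_ratios <- replicate(2000, boot_ratio(demo))

# 95% percentile interval: the middle 95% of the bootstrap ratios
ci <- quantile(boot_ratios, probs = c(0.025, 0.975))
ci
```

If the interval excludes 1.0, the data are hard to reconcile with equal variances, which is the same question the p-value addresses from another angle.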
24. The observed variance ratio in this example is approximately 1.397. It is the ratio of the variances of two groups, each with a sample size of n=219. If the two populations have equal variances and are both normally distributed, our observed variance ratio comes from an F-distribution. Sketch this F-distribution (with the correct degrees of freedom in the numerator and denominator).
```{r}
# Plot F distribution with the appropriate degrees of freedom
plotDist("f", df1=XXXXX, df2=XXXXX, lwd=5, col="steelblue")
```
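As a sketch of how the degrees of freedom are chosen, the base-R example below uses made-up sample sizes (25 and 30, not the sizes in this lab). Each sample variance carries one fewer degree of freedom than its sample size, and the numerator degrees of freedom belong to whichever group's variance is on top of the ratio.

```r
# Hypothetical sample sizes, chosen only for illustration
n_num <- 25   # group whose variance is in the numerator
n_den <- 30   # group whose variance is in the denominator

# Each sample variance has (n - 1) degrees of freedom
num_df <- n_num - 1
den_df <- n_den - 1

# Draw the F density with base R graphics
curve(df(x, df1 = num_df, df2 = den_df), from = 0, to = 4,
      lwd = 3, col = "steelblue", xlab = "F", ylab = "Density")

# Value that cuts off the upper 5% of this F-distribution
crit <- qf(0.95, df1 = num_df, df2 = den_df)
crit
```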
25. Use `var.test()` to conduct an F-test to test if the population variances are equal.
```{r}
# Run the F-test to compare two variances
var.test(XXXXX ~ XXXXX, data=YYYYY)
```
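To see what `var.test()` reports, the sketch below reproduces its F statistic and two-sided p-value by hand. The two vectors are simulated (hypothetical values, not the `studyhours` data), and the vector interface `var.test(x, y)` is used instead of the formula interface so the numerator group is explicit.

```r
# Simulated stand-in samples (hypothetical values, NOT the studyhours dataset)
set.seed(4)
fresh <- rnorm(30, mean = 15, sd = 4)
soph  <- rnorm(30, mean = 15, sd = 6)

# F statistic: ratio of the two sample variances
f_stat <- var(soph) / var(fresh)

# Degrees of freedom: one less than each sample size
num_df <- length(soph) - 1
den_df <- length(fresh) - 1

# Two-sided p-value: double the smaller tail area under the F-distribution
p_two_sided <- 2 * min(pf(f_stat, num_df, den_df),
                       pf(f_stat, num_df, den_df, lower.tail = FALSE))

# Built-in test for comparison (first argument's variance goes in the numerator)
check <- var.test(soph, fresh)
c(by_hand = p_two_sided, var_test = check$p.value)
```

The two p-values should agree, which shows that the F-test is the variance ratio compared against the F-distribution from the previous step.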
26. Considering everything you just did, what's your conclusion regarding the variance in hours studying for freshmen and sophomores?
**TYPE YOUR ANSWER HERE**
27. Complete [assignment 2](http://bradthiessen.com/html5/stats/m301/assign2.pdf)