---
title: "Your turn - Lesson 2"
author: "Comparing 2 variances"
output:
  html_document:
    css: http://www.bradthiessen.com/batlab2.css
    highlight: pygments
    theme: spacelab
    fig_width: 5.6
    fig_height: 4
---

*****

**Author(s):** [Enter names of people working on these solutions]

*****

```{r message=FALSE, echo=FALSE}
# Above, type your name in the "Author(s)" section

# Load the mosaic package
library(mosaic)

# Load the custom function to compare variances
# Function to calculate the ratio of variances between two groups
varianceratio <- function (x, ..., data = parent.frame(), only.2 = TRUE) {
  v <- var(x, ..., data = data)
  res <- v/lag(v)   # This calculates the 2nd variance / 1st variance
  res[2]
}

# Your lab report begins below
```

## Your turn

20. In 2013, 219 **freshmen** at St. Ambrose responded to a survey asking **how many hours they study per week**. In 2014, those same 219 students (as **sophomores**) responded to the same question. Load this `studyhours` dataset and examine the first several rows of data with the `head(studyhours)` command.

```{r}
# The data are stored on my website as a .csv (comma separated values) file
# We can load .csv files from the web with the "read.csv()" command
# I will store the data in a data.frame named "studyhours"
studyhours <- read.csv("http://www.bradthiessen.com/html5/data/studyhours.csv")

# Use the head() command to examine the first several rows of data
# Notice the variables in this dataset are: hours and class

```

21. Now that you have the `studyhours` dataset loaded, create a `densityplot` to examine the distribution of `hours` for each class (`Freshmen` and `Sophomores`). Then, calculate the variance in hours for each class. Finally, use the `varianceratio()` function to calculate the ratio of variances (and store it as `test.stat`).

```{r}
# Create a visualization of the data to check for normality
# Replace the XXXXX values below with the appropriate variables
# Replace YYYYY to label the x-axis
densityplot(~XXXXX | XXXXX, data=studyhours, lw=4,
            col="steelblue", layout=c(1,2), cex=0.7, xlab="YYYYY")

# Calculate the variance of each class
# Replace XXXXX with appropriate variable names
# Replace YYYYY with the name of the data.frame
var(XXXXX ~ XXXXX, data=YYYYY)

# Calculate the ratio of variances and store it as test.stat
test.stat <- varianceratio(XXXXX ~ XXXXX, data=YYYYY)
```

22. Suppose we believe that freshmen take similar classes and, therefore, tend to study the same amount. Sophomores, on the other hand, begin to take courses in their chosen majors and, therefore, have a larger variance in the amount of time they study each week. Run a randomization-based test to test the hypothesis that class (freshmen vs. sophomore) has no impact on hours studying per week. Display the distribution of your randomized variance ratio and report a p-value.

```{r}
# Calculate the variance ratio in 10,000 randomizations (shuffling the class assignment)
# Replace all the XXXXX and YYYYY values
ratios <- do(XXXXX) * varianceratio(XXXXX ~ shuffle(XXXXX), data=YYYYY)

# Create a histogram (or density plot) of the randomized values of the test.stat
# Replace the XXXXX with a label for your x-axis
histogram(~ ratios, data=ratios, groups=ratios >= test.stat,
          xlim=c(0,2.5), width=.05, xlab="XXXXX")

# Calculate a p-value
prop(~ratios >= test.stat, data=ratios)
```

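
If the shuffling logic in the randomization test feels opaque, the sketch below spells out the same idea in base R. It is only an illustration under stated assumptions: the data are simulated (they are not the `studyhours` data), and the names `demo`, `ratio_of_variances`, `demo_stat`, `demo_ratios`, and `demo_p` are made up for this example. It is not a substitute for the `do()` and `shuffle()` code you complete above.

```r
# Illustration only: simulated data, NOT the studyhours dataset
set.seed(1)
demo <- data.frame(
  hours = c(rnorm(50, mean = 15, sd = 4), rnorm(50, mean = 15, sd = 6)),
  group = rep(c("A", "B"), each = 50)
)

# Hypothetical helper: variance of the second group divided by
# the variance of the first group (groups are sorted alphabetically)
ratio_of_variances <- function(values, groups) {
  v <- tapply(values, groups, var)
  unname(v[2] / v[1])
}

# Observed ratio for the simulated data
demo_stat <- ratio_of_variances(demo$hours, demo$group)

# Shuffle the group labels 2,000 times and recompute the ratio each time
demo_ratios <- replicate(
  2000,
  ratio_of_variances(demo$hours, sample(demo$group))
)

# One-sided p-value: proportion of shuffled ratios at least as large
# as the observed ratio
demo_p <- mean(demo_ratios >= demo_stat)
demo_p
```

Shuffling the labels breaks any link between group and hours, so the shuffled ratios show what variance ratios look like when group membership does not matter.
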
23. Use bootstrap methods to construct a 95% confidence interval for the ratio of variances. Based on the p-value you just reported, you should be able to predict whether the interval will contain the value 1.0.

```{r}
# Generate 10,000 bootstrap samples and calculate our variance ratio for each
# Replace all the XXXXX and YYYYY values
bootratios <- do(XXXXX) * varianceratio(XXXXX ~ XXXXX, data=resample(YYYYY))

# Create a plot of the bootstrap estimates of our test statistic
densityplot(~ratios, data=bootratios, plot.points = FALSE, col="steelblue", lwd=4)

# Get the 95% confidence interval
# Replace all the XXXXX values
confint(XXXXX, level = XXXXX, method = "quantile")
```

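
As a rough picture of what a percentile bootstrap interval involves, here is a minimal base R sketch. It again uses simulated data rather than `studyhours`, and every name in it (`boot_demo`, `boot_ratio`, `boot_ratios`, `boot_ci`) is hypothetical. It resamples whole rows with replacement, which is the assumption made here about how the resampling should work.

```r
# Illustration only: simulated data, NOT the studyhours dataset
set.seed(2)
boot_demo <- data.frame(
  hours = c(rnorm(50, mean = 15, sd = 4), rnorm(50, mean = 15, sd = 6)),
  group = rep(c("A", "B"), each = 50)
)

# Hypothetical helper: variance of group B divided by variance of group A
boot_ratio <- function(d) {
  v <- tapply(d$hours, d$group, var)
  unname(v[2] / v[1])
}

# Resample rows with replacement 2,000 times and recompute the ratio
boot_ratios <- replicate(2000, {
  rows <- sample(nrow(boot_demo), replace = TRUE)
  boot_ratio(boot_demo[rows, ])
})

# Percentile interval: the middle 95% of the bootstrap ratios
boot_ci <- quantile(boot_ratios, probs = c(0.025, 0.975))
boot_ci
```

If the interval excludes 1.0, the resampled data consistently show one group with the larger variance.
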
24. The observed variance ratio in this example is approximately 1.397. It comes from taking the ratio of variances from two groups that each have a sample size of n=219. If the population distributions of each group have equal variances -- and if our data come from populations with normal distributions -- our observed variance ratio comes from an F-distribution. Sketch this F-distribution (with the correct degrees of freedom in the numerator and denominator).

```{r}
# Plot F distribution with the appropriate degrees of freedom
plotDist("f", df1=XXXXX, df2=XXXXX, lw=5, col="steelblue")
```

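
To see how an F-distribution turns a variance ratio into a probability, the sketch below uses base R's `pf()` and `qf()` with made-up numbers. The sample sizes (25 and 31) and the ratio (1.8) are hypothetical and deliberately differ from this lab's values. The degrees of freedom follow the usual rule of sample size minus one for each group, with the numerator group listed first.

```r
# Illustration only: hypothetical sample sizes and ratio, NOT this lab's values
n_denominator <- 25    # size of the group whose variance is in the denominator
n_numerator   <- 31    # size of the group whose variance is in the numerator
demo_ratio    <- 1.8   # hypothetical observed ratio of sample variances

# Degrees of freedom are each sample size minus one
df_num <- n_numerator - 1
df_den <- n_denominator - 1

# Probability of a ratio at least this large if the population variances are equal
upper_tail <- pf(demo_ratio, df1 = df_num, df2 = df_den, lower.tail = FALSE)

# Ratio that would cut off the top 5% of this F-distribution
critical_value <- qf(0.95, df1 = df_num, df2 = df_den)

upper_tail
critical_value
```
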
25. Use `var.test()` to conduct an F-test to test if the population variances are equal.

```{r}
# Run the F-test to compare two variances
var.test(XXXXX ~ XXXXX, data=YYYYY)
```

26. Considering everything you just did, what's your conclusion regarding the variance in hours studying for freshmen and sophomores?

**TYPE YOUR ANSWER HERE**

27. Complete [assignment 2](http://bradthiessen.com/html5/stats/m301/assign2.pdf)