```{r, echo=FALSE}
# If you want to load the data before publishing this report,
# highlight the next line and click the RUN button at the top of this panel
load(url("http://bradthiessen.com/html5/stats/m300/ames.RData"))
```

A dataset named `ames` has been loaded for you. It contains information about all 2,930 houses sold in Ames, Iowa between 2006 and 2010. The data includes 82 variables, including:

- `Lot.Area` = the size of the lot measured in square feet
- `Street` = whether the street is `Pave` (paved) or `Grvl` (gravel)
- `Year.Built` = the year in which the house was built
- `SalePrice` = the sale price of the house
- `Gr.Liv.Area` = the above-ground living area measured in square feet

If you wanted to access the `Street` variable in this `ames` dataframe, you'd refer to `ames$Street`.

1. Create a table of the `Street` variable to show how many houses are on paved and gravel roads. Then, use the `subset()` command to select only houses on gravel roads. Construct another table to verify you've selected the correct subset of data.

```{r}
#### Exercise 1a
# Below this line, construct a table of the street variable
# Replace the XX values
table(XX)

#### Exercise 1b
# Below this line, select only the houses on gravel streets
# Replace the XX values
amesgravel <- subset(XX, XX)

#### Exercise 1c
# Below this line, construct a table of the street variable
# Replace the XX values
table(amesgravel$XX)
```
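
As a reminder of the general pattern, here is a minimal sketch of `table()` and `subset()` on a small made-up data frame. The names `roads`, `surface`, `length`, and `gravel` are hypothetical and are not part of the `ames` data.

```r
# Hypothetical toy data, not the ames dataset
roads <- data.frame(
  surface = c("Pave", "Grvl", "Pave", "Pave", "Grvl"),
  length  = c(120, 45, 300, 80, 60)
)

# Count how many rows fall in each category
table(roads$surface)

# subset(data, condition) keeps only the rows where the condition is TRUE
gravel <- subset(roads, surface == "Grvl")

# Tabulate again to check that no "Pave" rows remain in the subset
table(gravel$surface)
```

Inside `subset()`, column names can be written without the `roads$` prefix.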

2. Use square brackets `[]` to select only houses on gravel roads.

```{r}
#### Exercise 2a
# Below this line, select only the houses on gravel streets
# Replace the XX values
amesgravel2 <- ames[ames$XX == XX,XX]

# This line will construct a table of the street variable to verify your subset
table(amesgravel2$Street)
```
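
For reference, bracket indexing on a data frame takes the form `data[rows, columns]`. The sketch below shows that form on a made-up data frame; `roads`, `gravel2`, and `gravel_lengths` are hypothetical names used only for illustration.

```r
# Hypothetical toy data, not the ames dataset
roads <- data.frame(
  surface = c("Pave", "Grvl", "Pave", "Pave", "Grvl"),
  length  = c(120, 45, 300, 80, 60)
)

# A logical vector in the row slot keeps the rows where it is TRUE.
# Leaving the column slot empty keeps every column.
gravel2 <- roads[roads$surface == "Grvl", ]

# A single column name in the column slot returns that column as a vector
gravel_lengths <- roads[roads$surface == "Grvl", "length"]
```

Note the comma: without it, R treats the index as a column selection rather than a row selection.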

3. Use the `filter` verb in the `dplyr` package to select only houses on gravel roads.

```{r}
#### Exercise 3a
## The following line will load the dplyr package
library(dplyr)

# Below this line, replace the XX values to select only houses on gravel roads
amesgravel3 <- ames %>% XX

# This line will construct a table of the street variable to verify your subset
table(amesgravel3$Street)
```

4. Use the `dplyr` package to (a) **select** the `Street`, `Year.Built`, `SalePrice`, and `Gr.Liv.Area` variables (the living area is needed to compute price per square foot); (b) **group** the data by `Street` type; (c) **filter** to keep only houses built after 1976; (d) **mutate** a new variable named `PricePerFoot` that measures price per square foot (of above-ground living area); (e) **summarize** the mean `PricePerFoot`.

```{r}
#### Exercise 4
# Replace all the XX values to complete the code
ames %>%                                  # Take our ames dataframe
  select(XX) %>%                          # Select variables of interest
  group_by(XX) %>%                        # Group data by variable of interest
  filter(XX) %>%                          # Filter out data as instructed
  mutate(PricePerFoot = XX) %>%           # Create new PricePerFoot variable
  summarize(mean = mean(PricePerFoot))    # Calculate mean of PricePerFoot
```
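
The same five verbs can be chained on any data frame. The sketch below runs the pattern on a made-up data frame; `trips`, its columns, and `result` are hypothetical names, and the numbers are invented for illustration. It also shows why `select()` has to keep every column that a later step uses.

```r
library(dplyr)

# Hypothetical toy data, not the ames dataset
trips <- data.frame(
  mode     = c("bus", "bus", "train", "train", "train"),
  year     = c(2001, 2015, 1999, 2012, 2018),
  cost     = c(10, 30, 50, 80, 120),
  distance = c(5, 10, 20, 40, 40)
)

result <- trips %>%
  select(mode, year, cost, distance) %>%      # keep every column a later step needs
  group_by(mode) %>%                          # later summaries are computed per mode
  filter(year > 2010) %>%                     # keep only rows meeting the condition
  mutate(cost_per_km = cost / distance) %>%   # add a derived column
  summarize(mean_cost_per_km = mean(cost_per_km))  # one row per group

result
```

If `distance` were left out of `select()`, the `mutate()` step would fail because the column would no longer exist.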

5. Look at the sampling distribution (in the lab) we obtained by repeatedly taking samples of size n = 30. Does it look as though the Central Limit Theorem (CLT) held? Explain.

```{r, eval=FALSE}
#### Exercise 5
# You can type your answer below or print out this lab and
# write your answer by hand.
```

To simplify things, I've extracted the `SalePrice` variable and put it into a vector named `price`. To do this, I used the command: `price <- ames$SalePrice`

```{r, echo=FALSE}
## Nothing to do here. I'm just creating that price vector
price <- ames$SalePrice
```

6. Calculate the mean and standard deviation of this `price` variable.

```{r}
#### Exercise 6
# Type your code below. Remember, you can use `price` -- you don't
# need to use ames$SalePrice
```

7. Construct a histogram of the `price` variable.

```{r}
#### Exercise 7
# Construct a histogram. You may want to look through the lab for example syntax.
# Set whatever options you want. Type your code below this line.
```

8. Take 10,000 samples of size n = 50 from this price distribution. Calculate the mean of each sample. Then, generate a plot of the sampling distribution (of those 10,000 means).

```{r}
#### Exercise 8
# Look through the activity to see example code.
# You'll want to create a dataframe to hold your sample means
# Something like XX <- do(XX) * mean(sample(XX))
# Type your code below this line.
```
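
The hint above uses the `do()` syntax from the lab. As an alternative illustration of the same idea, the sketch below simulates a sampling distribution with base R's `replicate()` instead. The population here is a made-up skewed variable, not `price`, and the names `population` and `sample_means`, the 1,000 repetitions, and the sample size of 25 are all assumptions chosen for the example.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical right-skewed population, not the price vector
population <- rexp(5000, rate = 1 / 100)

# Draw 1,000 samples of size 25 and store the mean of each sample
sample_means <- replicate(1000, mean(sample(population, size = 25)))

# Plot the simulated sampling distribution of the mean
hist(sample_means, breaks = 30,
     main = "Simulated sampling distribution of the mean")

# Compare the spread of the sample means with sigma / sqrt(n)
sd(sample_means)
sd(population) / sqrt(25)
```

The last two lines should print similar values, since the standard deviation of a sampling distribution of means is approximately the population standard deviation divided by the square root of the sample size.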

9. Use a graphical method to determine if your sampling distribution approximates a normal distribution.

```{r}
#### Exercise 9
# You can superimpose a normal curve or try a Q-Q plot
# Type your code below this line
```
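
Both graphical checks mentioned in the chunk above are available in base R. The sketch below applies them to a made-up vector; `vals`, `m`, and `s` are hypothetical names, and the simulated values only stand in for a vector of sample means.

```r
set.seed(2)  # make the example reproducible

# Hypothetical values standing in for a vector of sample means
vals <- rnorm(500, mean = 50, sd = 4)
m <- mean(vals)
s <- sd(vals)

# Option 1: histogram on the density scale with a normal curve on top
hist(vals, breaks = 30, freq = FALSE, main = "Histogram with normal curve")
curve(dnorm(x, mean = m, sd = s), add = TRUE)

# Option 2: normal Q-Q plot with a reference line
qqnorm(vals)
qqline(vals)
```

If the data are approximately normal, the histogram follows the curve closely and the Q-Q points fall near the reference line. Computing `m` and `s` before calling `curve()` matters, because `curve()` reserves the name `x` for its own plotting grid.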

10. Calculate the mean and standard deviation of your sampling distribution.

```{r}
#### Exercise 10
# Type your code below this line
```

End of Lab Report #5