Lab #3: Binomial & Sign Tests


Remember to download the report template for this lab and open it in RStudio. You can download the template by clicking this link: http://bradthiessen.com/html5/stats/m300/lab3report.Rmd


Simulating binomial random variables


No time to study

I’m typing this on June 15, 2015, but I’m willing to bet that no one earned the 1,000,000 extra credit points during activity #7. Let’s see if we can simulate the 4- and 10-question quizzes and estimate the probability of getting a perfect score on each.

Each question in our quizzes had 4 possible answers: A, B, C, or D. Of those 4 answers, only one was correct. Your task, as a student in the class, was to choose the one correct answer from the 4 possible choices.

Using this logic, each quiz question had 4 possible outcomes: 1, 0, 0, 0. Either you chose the correct (1) answer or you chose one of the three incorrect (0, 0, 0) answers.

Let’s construct a vector containing the possible outcomes for each quiz question:

choices <- c(1, 0, 0, 0)

To simulate the 4-question quiz, we need to sample 4 of these possible answers (choices) with replacement. Sampling with replacement treats each question as a fresh draw from all 4 choices, so our simulated student can choose the correct answer on more than one question (or on none of them).
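To see why replace=TRUE matters here, the following rough sketch compares it against sampling without replacement. The names with_repl and without_repl are made up for this illustration and are not part of the lab template. Without replacement, each draw is just a shuffle of the four values in choices, so every simulated score comes out as exactly 1.

```r
choices <- c(1, 0, 0, 0)

# Without replacement: each "quiz" is a permutation of 1, 0, 0, 0,
# so the score is always exactly 1.
without_repl <- replicate(1000, sum(sample(choices, 4, replace = FALSE)))
unique(without_repl)   # every score is 1

# With replacement: each question is an independent draw,
# so scores can range anywhere from 0 to 4.
with_repl <- replicate(1000, sum(sample(choices, 4, replace = TRUE)))
range(with_repl)
```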

Let’s simulate a single student taking the 4-question quiz:

set.seed(3141)
sample(choices, 4, replace=TRUE)
## [1] 0 0 1 0

Wait… what’s set.seed(3141)? Every time we run a randomized simulation, we get different results. That’s a good thing: we want a random sample. Unfortunately, it also means we can’t replicate our results. Setting a random number seed fixes that. If I run set.seed(3141) immediately before the sample() command, I get the same random sample every time. (If I run sample() again without resetting the seed, I get a new random sample.)

I arbitrarily chose the value 3141 in the seed. When I choose that number and run the command sample(choices, 4, replace=TRUE), the output shows a simulated student who answered the 3rd question correctly but missed questions 1, 2, and 4.
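Here is a small sketch of how the seed behaves. The variable names first, second, and again are invented for this illustration. The exact values you see may depend on your version of R, but the comparison at the end should hold regardless.

```r
choices <- c(1, 0, 0, 0)

set.seed(3141)
first  <- sample(choices, 4, replace = TRUE)
second <- sample(choices, 4, replace = TRUE)  # seed not reset: a new draw, may differ from first

set.seed(3141)                                # reset to the same seed
again  <- sample(choices, 4, replace = TRUE)

identical(first, again)   # TRUE: same seed, same sample
```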

If we only care about the simulated student’s score on the quiz (and we don’t care about which questions the student answered correctly), we can ask R to give us the sum of the student’s item scores. In this case, we should get a score of 1.

set.seed(3141)  # This seed ensures I get that same simulated student
sum(sample(choices, 4, replace=TRUE))
## [1] 1

Now that we know how to simulate a single student, let’s simulate a class of 10,000 students.

library(mosaic)  # Provides do(); skip this line if mosaic is already loaded
quiz4 <- do(10000) * sum(sample(choices, 4, replace=TRUE))
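The do() function comes from the mosaic package. For readers who want to see the same idea in base R, here is a rough equivalent using replicate(). The name scores is made up for this sketch, and the seed is reused only so the run is repeatable. Unlike do(), replicate() returns a plain numeric vector rather than a data frame.

```r
choices <- c(1, 0, 0, 0)

set.seed(3141)
# Repeat the single-student simulation 10,000 times and keep each score
scores <- replicate(10000, sum(sample(choices, 4, replace = TRUE)))

table(scores)       # How many simulated students earned each score
mean(scores == 4)   # Proportion of simulated students with a perfect score
```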

We can then visualize the results and estimate the probability of getting a perfect score:

histogram(~result,                  # Plot the results
          data=quiz4,               # Stored in quiz4 
          type="count",             # Show frequencies on the y-axis
          width=1,                  # Make the bars have width = 1
          col="grey", border="white", # Change the colors
          xlab = "4-question quiz scores")
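As a check on the simulation, we can also compute the exact probability of a perfect score. This sketch assumes each question is an independent guess with a 1-in-4 chance of being correct, which is what the simulation above does. The names p_correct, p_perfect_4, and p_perfect_10 are invented for this example. dbinom() is the binomial probability function in base R.

```r
p_correct <- 1 / 4   # Chance of guessing any one question correctly

# Probability of getting every question right on each quiz
p_perfect_4  <- dbinom(4,  size = 4,  prob = p_correct)
p_perfect_10 <- dbinom(10, size = 10, prob = p_correct)

p_perfect_4    # 0.00390625, which is (1/4)^4 = 1/256
p_perfect_10   # about 9.54e-07, which is (1/4)^10 = 1/1048576

# Expected number of perfect scores among 10,000 simulated students
10000 * p_perfect_4   # about 39
```

If the simulation is working, the bar at a score of 4 in the histogram should hold somewhere around 39 of the 10,000 students, give or take random variation.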