Loading the Mosaic Package

Let’s load the Mosaic package:

require(mosaic)



Examples from Activity #8

1. You’re looking for a free agent who can hit .300. You will sign the player if he gets at least 9 hits in 25 at-bats. If the free agent really does have a batting average of .300, what’s the probability you will make a mistake and NOT give him a contract?

We have a binomial distribution with: p = 0.300, n = 25, and x ≤ 8

# Binomial Distribution
plotDist('binom', params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial Probabilities with n=25, p=0.30")

plot of chunk unnamed-chunk-3

That’s the binomial distribution. We can also look at the cdf:

# Binomial CDF
plotDist('binom', kind="cdf", params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial Probabilities with n=25, p=0.30")

plot of chunk unnamed-chunk-4

# Calculate P(X≤8)
pbinom(8, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
## [1] 0.6769

If we wanted to reduce this probability, we can either give him more at-bats or lower our standard. Suppose we lower our standard so that they player only needs 7 or more hits to get a contract. The probability we make a mistake is nearly cut in half:

# P(X≤6)
pbinom(6, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
## [1] 0.3407

The problem with lowering our standard is that a player who cannot hit .300 may be able to get 7 hits in 25 at-bats. As an example, here’s the probability that a .250 hitter gets 7 or more hits in 25 at-bats:

# P(X≤6)
1-pbinom(6, 25, 0.250, lower.tail=TRUE, log.p=FALSE)
## [1] 0.4389

That gives us a 44% chance of making the mistake of signing a .250 hitter to a contract.


Generating data from the binomial distribution

rbinom(n, size, prob)

Let’s generate 5,000 numbers from a binomial distribution with p=0.3 and n=10. To make sure our code will work, let’s first try to get 5 numbers:

rbinom(5, 12, .3)
## [1] 5 7 1 3 4

Each of those values represents the number of successes in 12 trials. It looks like the code works, so let’s get our 5,000 values:

randomBinomial <- rbinom(5000, 12, .3)

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the binomial distribution (in blue):

histogram(~randomBinomial, width=1, col = "lightyellow", border="yellow")
plotDist('binom', params=list(size=12, prob=0.300), xlim=c(-1,13), lwt=5, col="blue", add=TRUE)

plot of chunk unnamed-chunk-9


5. Calculate the probability of the professor’s first winning game occurring on the 3rd or 4th attempt.

We know the professor has a .167 (74/444) chance of winning any game. We can call on the Geometric Distribution to calculate the probabilities of interest.

To see how to use the geometric distribution in R, we can ask for help: help(pgeom)

The help file shows lots of information, including these four commands:
dgeom(x, prob, log = FALSE)
pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)
qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)
rgeom(n, prob)

Let’s plot the Geometric Distribution with p=.167:

# Geometric Distribution with p=.167
plotDist('geom', params=list(prob=74/444), xlim=c(-1,21), main="Geometric Distribution with p=.167")

plot of chunk unnamed-chunk-10

# Geometric CDF
plotDist('geom', kind="cdf", params=list(prob=74/444), xlim=c(-1,20), main="Geometric CDF")

plot of chunk unnamed-chunk-10

To calculate the probability of the first win ocurring on the 3rd trial, we would use:

# The first input is the number of trials BEFORE the first win
# So P(X=1) would be...
# dgeom(0, 74/444, log=FALSE)
# P(X=3) would be...
dgeom(2, 74/444, log=FALSE)
## [1] 0.1157


Interactive Geometric Distribution Plot

Running this code in RStudio will give you an interactive plot of the Geometric Distribution:

require(manipulate)

manipulate(
  plotDist('geom',
    kind=type,
    params=list(prob=P),
    xlim=c(-1, N)),
  N=slider(5,50, step=1, initial=10),
  P=slider(0,1, step=0.1, initial=0.5),
  type=picker("density"="density", "cdf"="cdf", "qq"="qq", "histogram"="histogram"))


9) A student sneaks into a professor’s office in order to steal the answers to next week’s test. The student has only 10 minutes to get the answers, but finds they are safely hidden inside a safe. To open the safe, the student must guess a 4-digit code (using the digits 0-9). If the student only has time to guess 100 codes at random, what’s the probability the student will open the safe?

We want P(X≤100) for a Geometric Distribution with p=1/10000

# P(X≤100) would be...
pgeom(99, 1/10000, lower.tail = TRUE, log.p = FALSE)
## [1] 0.009951


Generating data from the geometric distribution

rgeom(n, prob)

Let’s generate 5,000 numbers from a geometric distribution with p=0.3:

randomGeometric <- rgeom(5000, .3)

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the geometric distribution (in blue):

histogram(~randomGeometric, width=1, xlim=c(-1,20), col = "lightyellow", border="yellow")
plotDist('geom', params=list(prob=0.300), xlim=c(-1,20), lwt=5, col="blue", add=TRUE)

plot of chunk unnamed-chunk-15


10) What’s the probability that the professor’s 2nd win is on his 3rd attempt? 3rd win on 4th attempt?

We need the Negative Binomial Distribution to answer this. To get help, I type: help(pnbinom)

That shows me two commands of interest: dnbinom(x, size, prob, mu, log = FALSE)
pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE)

Let’s plot the Negative Binomial Distribution with p=.167:

# NegBin Distribution with p=.167
# Note the x-axis represents trials before 3 successes
plotDist('nbinom', params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial Distribution with p=.167")

plot of chunk unnamed-chunk-16

# Geometric CDF
plotDist('nbinom', kind="cdf", params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial Distribution with p=.167")

plot of chunk unnamed-chunk-16

To calculate the probability of getting 2 wins on the 3rd trial, we would use:

# x = (Total number of trials - number of successes desired)
# size = number of successes desired
# So P(2nd win on 3rd trial) would be...
# P(1 trial before 2 wins)
dnbinom(1, 2, 74/444, log=FALSE)
## [1] 0.0463

To calculate the probability of getting 3 wins on the 4th trial, we would use:

dnbinom(1, size=3, prob=74/444, log=FALSE)
## [1] 0.01157


Generating data from the negative binomial distribution

rnbinom(n, size, prob)

Let’s generate 5,000 numbers from a negative binomial distribution looking for the 3rd win with p=0.3:

randomNegBin <- rnbinom(5000, 3, .3)

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the negative binomial distribution (in blue):

histogram(~randomNegBin, width=1, xlim=c(-1,23), col = "lightyellow", border="yellow")
plotDist('nbinom', params=list(size=3, prob=0.300), xlim=c(-1,23), lwt=5, col="blue", add=TRUE)

plot of chunk unnamed-chunk-20


13) Calculate the probability of selecting exactly 5 men and 7 women at random in this scenario.

We’ve seen the syntax for the Hypergeometric Distribution in Activity #3. Recall that the function is: dhyper(x, m, n, k, log = FALSE)

where: x = number of type 1 objects chosen
m = number of type 1 objects
n = number of type 2 objects
k = number of objects selected

For this question, we would use:

dhyper(7, 18, 16, 12, log = FALSE)
## [1] 0.2535


Generating data from the hypergeometric distribution

rhyper(nn, m, n, k)

Let’s generate 5,000 numbers from a hypergeometric distribution with:
20 objects of type 1
10 objects of type 2
8 objects chosen at random

randomHyper <- rhyper(5000, 20, 10, 8)

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the negative binomial distribution (in blue). The values represent the number of objects of type 1 selected:

histogram(~randomHyper, width=1, xlim=c(-1,9), col = "lightyellow", border="yellow")
plotDist('hyper', params=list(20, 10, 8), xlim=c(-1,9), lwt=5, col="blue", add=TRUE)

plot of chunk unnamed-chunk-23


16) Suppose there are 15 accidents each year on the corner of Gaines and Locust. Calculate the probability that we will have no accidents during a one week period.

We need the Poisson Distribution, so let’s see the commands: help(ppois)

dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)

In our scenario, our expected value is 15 per year. We’re interested in a single week, so we can convert our expected value to find lambda = 15/52.

Let’s look at a plot of our Poisson Distribution:

# Poisson distribution with lambda=15/52
plotDist('pois', params=list(lambda=15/52), xlim=c(-1,6), main="Poisson Distribution with Lambda=15/52")

plot of chunk unnamed-chunk-24

# Geometric CDF
plotDist('pois', kind="cdf", params=list(lambda=15/52), xlim=c(-1,6), main="Poisson Distribution with Lambda=15/52")