Let’s load the *Mosaic package*:

`require(mosaic)`

We have a binomial distribution with: `p = 0.300`

, `n = 25`

, and `x ≤ 8`

```
# Binomial Distribution
plotDist('binom', params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial Probabilities with n=25, p=0.30")
```

That’s the binomial distribution. We can also look at the cdf:

```
# Binomial CDF
plotDist('binom', kind="cdf", params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial Probabilities with n=25, p=0.30")
```

```
# Calculate P(X≤8)
pbinom(8, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
```

`## [1] 0.6769`

If we wanted to reduce this probability, we can either give him more at-bats or lower our standard. Suppose we lower our standard so that they player only needs 7 or more hits to get a contract. The probability we make a mistake is nearly cut in half:

```
# P(X≤6)
pbinom(6, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
```

`## [1] 0.3407`

The problem with lowering our standard is that a player who cannot hit .300 may be able to get 7 hits in 25 at-bats. As an example, here’s the probability that a .250 hitter gets 7 or more hits in 25 at-bats:

```
# P(X≤6)
1-pbinom(6, 25, 0.250, lower.tail=TRUE, log.p=FALSE)
```

`## [1] 0.4389`

That gives us a 44% chance of making the mistake of signing a .250 hitter to a contract.

rbinom(n, size, prob)

Let’s generate 5,000 numbers from a binomial distribution with p=0.3 and n=10. To make sure our code will work, let’s first try to get 5 numbers:

`rbinom(5, 12, .3)`

`## [1] 5 7 1 3 4`

Each of those values represents the number of successes in 12 trials. It looks like the code works, so let’s get our 5,000 values:

`randomBinomial <- rbinom(5000, 12, .3)`

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the binomial distribution (in blue):

```
histogram(~randomBinomial, width=1, col = "lightyellow", border="yellow")
plotDist('binom', params=list(size=12, prob=0.300), xlim=c(-1,13), lwt=5, col="blue", add=TRUE)
```

We know the professor has a .167 (74/444) chance of winning any game. We can call on the *Geometric Distribution* to calculate the probabilities of interest.

To see how to use the geometric distribution in R, we can ask for help: `help(pgeom)`

The help file shows lots of information, including these four commands:

`dgeom(x, prob, log = FALSE)`

`pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)`

`qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)`

`rgeom(n, prob)`

Let’s plot the Geometric Distribution with p=.167:

```
# Geometric Distribution with p=.167
plotDist('geom', params=list(prob=74/444), xlim=c(-1,21), main="Geometric Distribution with p=.167")
```

```
# Geometric CDF
plotDist('geom', kind="cdf", params=list(prob=74/444), xlim=c(-1,20), main="Geometric CDF")
```

To calculate the probability of the first win ocurring on the 3rd trial, we would use:

```
# The first input is the number of trials BEFORE the first win
# So P(X=1) would be...
# dgeom(0, 74/444, log=FALSE)
# P(X=3) would be...
dgeom(2, 74/444, log=FALSE)
```

`## [1] 0.1157`

Running this code in RStudio will give you an interactive plot of the Geometric Distribution:

```
require(manipulate)
manipulate(
plotDist('geom',
kind=type,
params=list(prob=P),
xlim=c(-1, N)),
N=slider(5,50, step=1, initial=10),
P=slider(0,1, step=0.1, initial=0.5),
type=picker("density"="density", "cdf"="cdf", "qq"="qq", "histogram"="histogram"))
```

rgeom(n, prob)

Let’s generate 5,000 numbers from a geometric distribution with p=0.3:

`randomGeometric <- rgeom(5000, .3)`

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the geometric distribution (in blue):

```
histogram(~randomGeometric, width=1, xlim=c(-1,20), col = "lightyellow", border="yellow")
plotDist('geom', params=list(prob=0.300), xlim=c(-1,20), lwt=5, col="blue", add=TRUE)
```

We need the *Negative Binomial Distribution* to answer this. To get help, I type: `help(pnbinom)`

That shows me two commands of interest: `dnbinom(x, size, prob, mu, log = FALSE)`

`pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE)`

Let’s plot the Negative Binomial Distribution with p=.167:

```
# NegBin Distribution with p=.167
# Note the x-axis represents trials before 3 successes
plotDist('nbinom', params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial Distribution with p=.167")
```

```
# Geometric CDF
plotDist('nbinom', kind="cdf", params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial Distribution with p=.167")
```

To calculate the probability of getting 2 wins on the 3rd trial, we would use:

```
# x = (Total number of trials - number of successes desired)
# size = number of successes desired
# So P(2nd win on 3rd trial) would be...
# P(1 trial before 2 wins)
dnbinom(1, 2, 74/444, log=FALSE)
```

`## [1] 0.0463`

To calculate the probability of getting 3 wins on the 4th trial, we would use:

`dnbinom(1, size=3, prob=74/444, log=FALSE)`

`## [1] 0.01157`

rnbinom(n, size, prob)

Let’s generate 5,000 numbers from a negative binomial distribution looking for the 3rd win with p=0.3:

`randomNegBin <- rnbinom(5000, 3, .3)`

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the negative binomial distribution (in blue):

```
histogram(~randomNegBin, width=1, xlim=c(-1,23), col = "lightyellow", border="yellow")
plotDist('nbinom', params=list(size=3, prob=0.300), xlim=c(-1,23), lwt=5, col="blue", add=TRUE)
```

We’ve seen the syntax for the *Hypergeometric Distribution* in Activity #3. Recall that the function is: `dhyper(x, m, n, k, log = FALSE)`

where: `x = number of type 1 objects chosen`

`m = number of type 1 objects`

`n = number of type 2 objects`

`k = number of objects selected`

For this question, we would use:

`dhyper(7, 18, 16, 12, log = FALSE)`

`## [1] 0.2535`

rhyper(nn, m, n, k)

Let’s generate 5,000 numbers from a hypergeometric distribution with:

`20 objects of type 1`

`10 objects of type 2`

`8 objects chosen at random`

`randomHyper <- rhyper(5000, 20, 10, 8)`

Let’s get a histogram of these values (in yellow) and compare it the actual probabilities from the negative binomial distribution (in blue). The values represent the number of objects of type 1 selected:

```
histogram(~randomHyper, width=1, xlim=c(-1,9), col = "lightyellow", border="yellow")
plotDist('hyper', params=list(20, 10, 8), xlim=c(-1,9), lwt=5, col="blue", add=TRUE)
```

We need the Poisson Distribution, so let’s see the commands: `help(ppois)`

`dpois(x, lambda, log = FALSE)`

`ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)`

In our scenario, our expected value is 15 per year. We’re interested in a single week, so we can convert our expected value to find lambda = 15/52.

Let’s look at a plot of our Poisson Distribution:

```
# Poisson distribution with lambda=15/52
plotDist('pois', params=list(lambda=15/52), xlim=c(-1,6), main="Poisson Distribution with Lambda=15/52")
```

```
# Geometric CDF
plotDist('pois', kind="cdf", params=list(lambda=15/52), xlim=c(-1,6), main="Poisson Distribution with Lambda=15/52")
```