require(mosaic)

#### Examples from Activity #15

##### 2) Sampling distributions of the mean age of sheep

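First, I’ll recreate the graphs we made in class. Our population is five sheep with ages 10 through 14. We’ll repeatedly (100,000 times) draw samples of 1, 2, 3, and 4 sheep from this population, without replacement (the default for sample()). For each sample, we’ll calculate the mean. Then, we’ll graph all the means.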

# Create sheep ages
sheep <- 10:14
# We can calculate the mean from a single sample with:
# mean(sample(sheep, n))   where n=sample size
# We can repeat this process with the do() command

means1<- do(100000) * mean(sample(sheep, 1))
means2<- do(100000) * mean(sample(sheep, 2))
means3<- do(100000) * mean(sample(sheep, 3))
means4<- do(100000) * mean(sample(sheep, 4))
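# histogram() makes lattice plots, which ignore par(mfrow), so arrange them in a 2x2 grid with print(..., split)
h1 <- histogram(~result, data = means1, col="grey", xlim=c(9, 15), xlab = "n=1", width=.25)
h2 <- histogram(~result, data = means2, col="grey", xlim=c(9, 15), xlab = "n=2", width=.25)
h3 <- histogram(~result, data = means3, col="grey", xlim=c(9, 15), xlab = "n=3", width=.25)
h4 <- histogram(~result, data = means4, col="grey", xlim=c(9, 15), xlab = "n=4", width=.25)
print(h1, split = c(1, 1, 2, 2), more = TRUE)
print(h2, split = c(2, 1, 2, 2), more = TRUE)
print(h3, split = c(1, 2, 2, 2), more = TRUE)
print(h4, split = c(2, 2, 2, 2), more = FALSE)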
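Because the population has only five sheep, the exact sampling distribution can also be listed instead of simulated. The sketch below is an addition to the activity, not part of it: it uses base R's combn() to enumerate every possible sample drawn without replacement (matching sample()'s default), and the helper exact_means() is a hypothetical name introduced here.

```r
# Population of sheep ages, as in the activity
sheep <- 10:14

# Hypothetical helper: the mean of every possible sample of size n drawn
# without replacement (combn() returns one sample per column)
exact_means <- function(n) {
  colMeans(combn(sheep, n))
}

# Exact sampling distribution of the mean for n = 2:
# each of the choose(5, 2) = 10 possible samples is equally likely
exact_dist2 <- table(exact_means(2)) / choose(length(sheep), 2)
exact_dist2
```

The ten possible means for n=2 run from 10.5 to 13.5, which are the only values the simulated n=2 histogram can show.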

##### 3) Mean of our means; Standard error of our means

The mean of our population of sheep is 12 (the average of 10, 11, 12, 13, and 14). Let’s see how the mean of each sampling distribution compares:

# Calculate the mean of each sampling distribution
mean(means1$result)
## [1] 12
mean(means2$result)
## [1] 12
mean(means3$result)
## [1] 12
mean(means4$result)
## [1] 12
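These values of 12 are what we should expect: the sample mean is an unbiased estimator of the population mean. As a quick check, the sketch below (base R's combn(), not the activity's simulation) averages every possible sample mean for each sample size, again assuming sampling without replacement.

```r
sheep <- 10:14
mean(sheep)  # population mean

# Average of all possible sample means for n = 1, 2, 3, 4
# (each column of combn() is one possible sample)
mean_of_means <- sapply(1:4, function(n) mean(colMeans(combn(sheep, n))))
mean_of_means  # each equals the population mean, 12
```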

The standard deviation of each sampling distribution (the standard error of the mean) should shrink as our sample size increases. Let’s see:

# Calculate the standard deviation (standard error) of each sampling distribution
sd(means1$result)
## [1] 1.413
sd(means2$result)
## [1] 0.8654
sd(means3$result)
## [1] 0.5757
sd(means4$result)
## [1] 0.354
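These standard errors shrink faster than the familiar σ/√n would predict (that formula gives 1.414, 1, 0.816, and 0.707) because each sample is drawn without replacement from a population of only N = 5 sheep. The standard finite population correction, sqrt((N - n)/(N - 1)), accounts for this. The sketch below is not part of the original activity; it computes the theoretical standard errors for comparison with the simulated ones.

```r
sheep <- 10:14
N <- length(sheep)

# Population standard deviation (divide by N, not N - 1): sqrt(2)
sigma <- sqrt(mean((sheep - mean(sheep))^2))

# Standard error of the mean for samples drawn WITHOUT replacement,
# including the finite population correction sqrt((N - n) / (N - 1))
n <- 1:4
se <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))
round(se, 3)  # 1.414 0.866 0.577 0.354
```

The theoretical values line up with the simulated 1.413, 0.8654, 0.5757, and 0.354.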

##### 4) Body temperatures

I have a confession to make. I didn’t write down the source of the body temperature dataset, so I don’t know where to find that data. Rather than searching for it, I’ll simulate 10,000 values from a normal distribution with a mean of 98.25 and a standard deviation of 0.733.

# Create a simulated dataset
bodytemp <- rnorm(10000, mean = 98.25, sd = 0.733)
mean(bodytemp)
## [1] 98.26
sd(bodytemp)
## [1] 0.7294

Close enough.

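To estimate P(97.517 < x < 98.983), the probability that a temperature falls within one standard deviation of the mean (98.25 ± 0.733), I can simply see what proportion of this dataset lies between those values: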

# Create a vector of all temperatures between 97.517 and 98.983
temp1sd <- subset(bodytemp, bodytemp>97.517 & bodytemp<98.983)
# Find the proportion of temperatures between those values
length(temp1sd)/length(bodytemp)
## [1] 0.6892

From that, we see that about 69% of the simulated values lie within one standard deviation of the mean, close to the 68% given by the empirical rule.
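Because the data were simulated from a known normal distribution, the simulated proportion can be checked against the exact normal probability. This is a short sketch using base R's pnorm() with the simulation's parameters (98.25 and 0.733), not the simulated dataset itself.

```r
mu <- 98.25
sigma <- 0.733

# P(97.517 < X < 98.983) for X ~ Normal(98.25, 0.733)
p_within_1sd <- pnorm(98.983, mean = mu, sd = sigma) - pnorm(97.517, mean = mu, sd = sigma)
round(p_within_1sd, 4)  # 0.6827, the 68% of the empirical rule
```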

We can also estimate the probability of selecting an individual with a body temperature less than 98:

# Create a vector of all temperatures below 98
temp98 <- subset(bodytemp, bodytemp<98)
# Find the proportion of temperatures below 98
length(temp98)/length(bodytemp)
## [1] 0.3582
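The same check works for a one-sided probability. Again this is a sketch with pnorm() and the simulation's parameters; small differences from 0.3582 are expected because that value comes from one random draw of 10,000 temperatures.

```r
# P(X < 98) for X ~ Normal(98.25, 0.733)
p_below_98 <- pnorm(98, mean = 98.25, sd = 0.733)
p_below_98  # about 0.37
```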

##### 7-8) If we sample 100 individuals and calculate their average body temperature, what’s the probability their average body temperature is between 97.517 and 98.983? less than 98?

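We already estimated P(97.517 < x < 98.983) ≈ 0.69 for a single individual. Since that’s a “usual” result for one person, it should be even more usual for the average of 100 people: the sampling distribution of the mean has a standard error of 0.733/√100 = 0.0733, so the interval extends about 10 standard errors on either side of 98.25. Let’s see: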

# Create the estimated sampling distribution (n=100) with 10,000 samples
tempmeans<- do(10000) * mean(sample(bodytemp, 100))
histogram(~result, data = tempmeans, col="grey", xlim=c(97.5, 99), xlab = "n=100")

# Create a vector of all mean temperatures between 97.517 and 98.983
tempmeans1sd <- subset(tempmeans, result>97.517 & result<98.983)
# Find the proportion of sample means between those values
nrow(tempmeans1sd)/nrow(tempmeans)
## [1] 1
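The normal model gives the same answer. By the Central Limit Theorem, the mean of 100 temperatures is approximately normal with mean 98.25 and standard error 0.733/√100 = 0.0733. This sketch uses the population values 98.25 and 0.733 rather than the simulated dataset's mean and SD, and it ignores the effect of sampling 100 values without replacement from 10,000 (which changes the standard error by less than 1%).

```r
mu <- 98.25
sigma <- 0.733
n <- 100
se <- sigma / sqrt(n)  # standard error of the mean, 0.0733

# Distance of each bound from the mean, in standard errors
(c(97.517, 98.983) - mu) / se  # about -10 and 10

# P(97.517 < sample mean < 98.983) under the normal approximation
p_mean_within <- pnorm(98.983, mean = mu, sd = se) - pnorm(97.517, mean = mu, sd = se)
p_mean_within  # essentially 1
```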

It was 100%: every one of the 10,000 simulated sample means fell in that interval. Now to find the probability that the average body temperature of 100 people is less than 98:

# Keep only the sample means below 98
tempmeans98 <- subset(tempmeans, result<98)
# Find the proportion of sample means below 98
nrow(tempmeans98)/nrow(tempmeans)
## [1] 1e-04

We estimate this probability to be 0.0001 (just 1 of the 10,000 simulated sample means fell below 98).
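For comparison, the normal approximation from the Central Limit Theorem can be computed directly. This is a sketch using the population values 98.25 and 0.733 (not the simulated dataset's mean and SD). Because the event is rare, an estimate built from only 10,000 simulated means is rough, so a count of 1 is consistent with a true probability of a few in 10,000.

```r
mu <- 98.25
se <- 0.733 / sqrt(100)  # standard error of the mean for n = 100

# P(sample mean < 98) under the normal approximation
p_mean_below_98 <- pnorm(98, mean = mu, sd = se)
p_mean_below_98  # roughly 0.0003
```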