Loading the Mosaic Package

Let’s load the Mosaic package:

require(mosaic)



Examples from Activity #22

1) Does the amount of time doctors spend with a patient depend on whether the patient is obese?

Let’s first load the dataset, examine the first several rows, and generate some plots:

doctors <- read.csv(url("http://bradthiessen.com/html5/stats/m300/doctors.csv"))
head(doctors)
##    weight time
## 1 average   15
## 2 average   15
## 3 average   45
## 4 average   40
## 5 average   45
## 6 average   20
favstats(time ~ weight, data=doctors)
##    .group min Q1 median Q3 max  mean    sd  n missing
## 1 average  15 25     30 40  50 31.36 9.864 33       0
## 2   obese   5 20     25 30  60 24.74 9.653 38       0
bwplot(time ~ weight, data = doctors)

[plot: side-by-side boxplots of time by weight group]

xyplot(jitter(time) ~ weight, data = doctors, alpha = 0.6, cex = 1.4)

[plot: jittered dotplot of time by weight group]

In our sample, doctors did report spending more time with non-obese patients. The observed standard deviations are similar, so we’ll probably be ok with our equal variances assumption. Let’s run a t-test:
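As a quick numerical check of that assumption, we can compare the two sample variances directly. This is just a sketch using the values reported by favstats above; the cutoff of 2 is a common rule of thumb, not a formal test:

```r
# Rule-of-thumb check of the equal-variance assumption:
# compare the sample variances via their ratio
s_average <- 9.864   # sd of the average-weight group (from favstats)
s_obese   <- 9.653   # sd of the obese group (from favstats)
ratio <- (s_average / s_obese)^2
ratio   # a ratio well under 2 suggests pooling the variances is reasonable
```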

# Two-sample t-test with pooled variances; one-sided alternative (average > obese)
t.test(time ~ weight, var.equal=TRUE, alternative="greater", data=doctors)
## 
##  Two Sample t-test
## 
## data:  time by weight
## t = 2.856, df = 69, p-value = 0.002831
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.758   Inf
## sample estimates:
## mean in group average   mean in group obese 
##                 31.36                 24.74

There’s our p-value of 0.002831 (which matches the Stata output from question #7 in Activity #22).

What would happen if we didn’t make the equal variances assumption?

# Welch two-sample t-test (unequal variances); one-sided alternative
t.test(time ~ weight, var.equal=FALSE, alternative="greater", data=doctors)
## 
##  Welch Two Sample t-test
## 
## data:  time by weight
## t = 2.852, df = 67.17, p-value = 0.002887
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.751   Inf
## sample estimates:
## mean in group average   mean in group obese 
##                 31.36                 24.74

Our p-value doesn’t change too much, but notice the degrees of freedom are now 67.17 (hopefully, I explained this in class).
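Those Welch degrees of freedom come from the Satterthwaite approximation. As a sketch, we can reproduce them from the summary statistics reported by favstats above (any small discrepancy comes from rounding those statistics):

```r
# Welch-Satterthwaite approximate degrees of freedom, computed by hand
s1 <- 9.864; n1 <- 33   # average-weight group sd and size (from favstats)
s2 <- 9.653; n2 <- 38   # obese group sd and size
v1 <- s1^2 / n1         # estimated variance of each sample mean
v2 <- s2^2 / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
round(df_welch, 2)   # close to the 67.17 reported by t.test()
```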

We can easily get a 90% confidence interval for the difference in means:

t.test(time ~ weight, var.equal=TRUE, conf.level=0.90, data=doctors)$conf.int
## [1]  2.758 10.495
## attr(,"conf.level")
## [1] 0.9

The 90% CI is (2.758, 10.495).


We can also compare the group means using a randomization (permutation) test (like we did in Activity 22 for other datasets):

# Find the mean of each group
mean(time ~ weight, data = doctors)
## average   obese 
##   31.36   24.74
# Create our test statistic (difference in means)
test.stat <- compareMean(time ~ weight, data = doctors)
# Randomly shuffle the "weight" groups 10,000 times and calculate our test statistic
rtest.stats = do(10000) * compareMean(time ~ shuffle(weight), data=doctors)
## Loading required package: parallel

Now that we’ve run 10,000 randomizations, let’s take a look at the distribution of mean differences:

histogram(~ result, xlim=c(-12,12), groups=result <= test.stat, data=rtest.stats, width=1)
ladd(panel.abline(v=test.stat))

[plot: histogram of the 10,000 randomized mean differences, with the observed test statistic marked]

prop(~ (result<= test.stat), data=rtest.stats)
##   TRUE 
## 0.0027

That last value, 0.0027, is our p-value. It’s similar to the p-value from our t-test (which provides evidence that our assumptions were reasonable).
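For readers without the mosaic package, the same shuffling logic can be written in base R. This sketch uses a small made-up dataset (not the doctors data), so its numbers are illustrative only:

```r
# Base-R permutation test sketch on hypothetical data
set.seed(1)
time  <- c(15, 15, 45, 40, 45, 20, 25, 30, 5, 20, 25, 30, 60, 10)
group <- rep(c("average", "obese"), each = 7)
# Observed test statistic: obese mean minus average mean (as compareMean does)
obs_diff <- mean(time[group == "obese"]) - mean(time[group == "average"])
perm_diffs <- replicate(10000, {
  g <- sample(group)   # shuffle the group labels
  mean(time[g == "obese"]) - mean(time[g == "average"])
})
p_value <- mean(perm_diffs <= obs_diff)   # one-sided, like prop() above
```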


2) Will a smiling person accused of a crime be treated more leniently than one who is not smiling?

Let’s first load the dataset, examine the first several rows, and generate some plots:

smile1 <- read.csv(url("http://bradthiessen.com/html5/stats/m300/smile1.csv"))
head(smile1)
##   smile leniency
## 1 smile      7.0
## 2 smile      3.0
## 3 smile      6.0
## 4 smile      4.5
## 5 smile      3.5
## 6 smile      4.0
favstats(leniency ~ smile, data=smile1)
##   .group min  Q1 median    Q3 max  mean    sd   n missing
## 1     no 2.0 3.0      4 4.875   8 4.118 1.523  34       0
## 2  smile 2.5 3.5      5 6.000   9 5.064 1.659 102       0
bwplot(leniency ~ smile, data = smile1)

[plot: side-by-side boxplots of leniency by smile group]

xyplot(jitter(leniency) ~ smile, data = smile1, alpha = 0.6, cex = 1.4)

[plot: jittered dotplot of leniency by smile group]

From the sample statistics, it looks like our equal variances assumption is reasonable. Let’s run a t-test with that assumption:

# Two-sample t-test with pooled variances; one-sided alternative (no-smile < smile)
t.test(leniency ~ smile, var.equal=TRUE, alternative="less", data=smile1)
## 
##  Two Sample t-test
## 
## data:  leniency by smile
## t = -2.938, df = 134, p-value = 0.001946
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##     -Inf -0.4127
## sample estimates:
##    mean in group no mean in group smile 
##               4.118               5.064

That small p-value tells us it was highly unlikely to observe this data if the null hypothesis were true. [… but, again, would any reasonable person think this null hypothesis has any chance of being true? Why do we test against null hypotheses when we know they cannot be true? For answers to this, and other interesting questions, take STAT 301 or 305]

Let’s get a 90% confidence interval for the difference in means:

t.test(leniency ~ smile, var.equal=TRUE, conf.level=0.90, data=smile1)$conf.int
## [1] -1.4795 -0.4127
## attr(,"conf.level")
## [1] 0.9

The CI is (-1.479, -0.413).


Randomization method:

# Find the mean of each group
mean(leniency ~ smile, data = smile1)
##    no smile 
## 4.118 5.064
# Create our test statistic (difference in means)
test.stat <- compareMean(leniency ~ smile, data = smile1)
# Randomly shuffle the "smile" groups 10,000 times and calculate our test statistic
smileTestStats = do(10000) * compareMean(leniency ~ shuffle(smile), data=smile1)

Now that we’ve run 10,000 randomizations, let’s take a look at the distribution of mean differences:

histogram(~ result, xlim=c(-1.5,1.5), groups=result >= test.stat, data=smileTestStats, width=.1)
ladd(panel.abline(v=test.stat))

[plot: histogram of the 10,000 randomized mean differences, with the observed test statistic marked]

prop(~ (result>= test.stat), data=smileTestStats)
##   TRUE 
## 0.0015

The p-value, 0.0015, is similar to the p-value we obtained from our t-test.


Bootstrap method:

We could also use a bootstrap method to estimate a confidence interval for the difference in means (between the smile and no-smile groups). To do this, we first have the computer resample our data (with replacement) 10,000 times and calculate our test statistic (the mean of the smile group minus the mean of the no-smile group). Then, we find the 2.5th and 97.5th percentiles of our test statistic to get a 95% interval.

trials = do(10000) * compareMean(leniency~smile, data = resample(smile1))
with(trials, quantile(result, c(0.025, 0.975)))
##   2.5%  97.5% 
## 0.3469 1.5420

Let’s compare that confidence interval to one we’d get from our parametric methods:

t.test(leniency ~ smile, var.equal=TRUE, conf.level=0.95, data=smile1)$conf.int
## [1] -1.5830 -0.3091
## attr(,"conf.level")
## [1] 0.95

Bootstrap Method: (0.347, 1.542)   Parametric Method: (0.309, 1.583). (The t.test output reports the no − smile difference, so the signs of its interval are flipped; the magnitudes agree.)
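The same percentile idea can be sketched in base R by resampling within each group (a slight variant of mosaic's resampling of whole rows). The data below are made up for illustration, not the smile1 data:

```r
# Base-R percentile bootstrap sketch on hypothetical two-group data
set.seed(42)
leniency <- c(7, 3, 6, 4.5, 3.5, 4,   # "smile" group (made up)
              2, 4, 4, 3, 6, 4.5)     # "no" group (made up)
smile_idx <- 1:6
no_idx    <- 7:12
boot_diffs <- replicate(10000, {
  s <- sample(leniency[smile_idx], replace = TRUE)  # resample within each group
  n <- sample(leniency[no_idx],    replace = TRUE)
  mean(s) - mean(n)
})
quantile(boot_diffs, c(0.025, 0.975))   # percentile 95% confidence interval
```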


Power analysis

Let’s do a quick power analysis for our two-sample t-test for this smile/leniency dataset:

power.t.test(n = 68, delta = 0.94608, sd = 1.6, sig.level = 0.05, alternative = "one.sided", type = "two.sample")$power
## [1] 0.9629
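power.t.test can also be run in reverse to solve for the sample size needed to reach a target power, keeping the same assumed effect size and standard deviation:

```r
# Per-group sample size needed for 90% power, same effect size as above
n_needed <- power.t.test(power = 0.90, delta = 0.94608, sd = 1.6,
                         sig.level = 0.05, alternative = "one.sided",
                         type = "two.sample")$n
n_needed   # roughly 50 subjects per group
```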


Bayesian Estimation

Again, I won’t explain this method here, but I will show some results:

# Load the package
require(BEST)
## Loading required package: BEST
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 3.3.0
## Loaded modules: basemod,bugs
## Comparison of two groups:
## =========================
smile <- c(7, 3, 6, 4.5, 3.5, 4, 3, 3, 3.5, 4.5, 7, 5, 5, 7.5, 2.5, 5, 5.5, 5.5, 5, 4, 5, 6.5, 6.5, 7, 3.5, 5, 3.5, 9, 2.5, 8.5, 3.5, 4.5, 3.5, 4.5, 2.5, 5.5, 6.5, 3.5, 3, 3.5, 6, 5, 4, 4.5, 5, 5.5, 3.5, 6, 6.5, 3, 8, 6.5, 8, 6, 6, 3, 7, 8, 4, 3, 2.5, 8, 4.5, 5.5, 7.5, 6, 9, 6.5, 5.5, 4, 4, 5, 6, 3.5, 3.5, 3.5, 4, 5.5, 5.5, 4.5, 2.5, 5.5, 4.5, 3, 3.5, 8, 5, 7.5, 8, 4, 5.5, 6.5, 5, 4, 3, 5, 4, 4, 6, 8, 4.5, 5.5)
nosmile <- c(2, 4, 4, 3, 6, 4.5, 2, 6, 3, 3, 4.5, 8, 4, 5, 3.5, 4.5, 6.5, 3.5, 4.5, 4.5, 2.5, 2.5, 4.5, 2.5, 6, 6, 2, 4, 5.5, 4, 2.5, 2.5, 3, 6.5)

# Run an analysis, takes up to 2 min.
BESTout <- BESTmcmc(smile, nosmile)
## Setting up the JAGS model...
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 291
## 
## Initializing model
## 
## Burning in the MCMC chain...
## Sampling final MCMC chain...
# Look at the result:
BESTout
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
##          mean      sd median HDIlo   HDIup  Rhat n.eff
## mu1     5.043  0.1685  5.043 4.708   5.371 1.000 57817
## mu2     4.095  0.2743  4.094 3.563   4.642 1.000 62423
## nu     48.014 32.4502 39.656 6.031 112.729 1.001 24656
## sigma1  1.644  0.1244  1.638 1.407   1.892 1.000 52073
## sigma2  1.546  0.2083  1.526 1.162   1.963 1.000 47170
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.
plot(BESTout)

[plot: posterior distribution of the difference in means]

plot(BESTout, "sd")

[plot: posterior distribution of the difference in standard deviations]

plotPostPred(BESTout)

[plot: posterior predictive check]

pairs(BESTout)

[plot: pairwise scatterplots of the posterior parameters]

The output and plots show:

* There’s a 95% chance the difference in means is between 0.31 and 1.58. That’s similar to what we got from our parametric and bootstrap confidence intervals.
* There’s a 95% chance the difference in standard deviations between the groups is between -0.388 and 0.553. Since that interval contains zero, it may be reasonable to assume equal variances.

ANOVA

As explained in the activity, this smiles/leniency dataset actually contains 4 different smile groups:

smiles2 <- read.csv(url("http://bradthiessen.com/html5/stats/m300/smiles2.csv"))
favstats(leniency ~ expression, data=smiles2)
##      .group min    Q1 median    Q3 max  mean    sd  n missing
## 1      felt 2.5 3.500   4.75 5.875   9 4.912 1.681 34       0
## 2 feltfalse 2.5 3.625   5.50 6.500   9 5.368 1.827 34       0
## 3 miserable 2.5 4.000   4.75 5.500   8 4.912 1.454 34       0
## 4   neutral 2.0 3.000   4.00 4.875   8 4.118 1.523 34       0
bwplot(leniency ~ expression, data = smiles2)

[plot: side-by-side boxplots of leniency by expression group]

The sample means differ (ranging from 4.1 to 5.4), but does that give us evidence that the population means differ?

As we discussed in class, running multiple t-tests would inflate the overall alpha error. Instead, we can conduct an ANOVA (ANalysis Of VAriance).
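To see why, here’s a sketch of how the family-wise error rate grows across the 6 pairwise comparisons among 4 groups (assuming, for simplicity, independent tests):

```r
# Family-wise error rate when each of 6 pairwise tests uses alpha = 0.05
alpha <- 0.05
n_comparisons <- choose(4, 2)           # 6 pairs among 4 groups
fwer <- 1 - (1 - alpha)^n_comparisons
round(fwer, 3)   # about 0.265, far above the nominal 0.05
```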

ANOVA is one of the big topics in STAT 301, so I won’t try to explain it in any detail here. As I explained in class, ANOVA “works” by comparing the variation between groups (distance among the group means) to the variation within each group (the average standard deviation of the groups).
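As a sketch of that between/within comparison, the F statistic can be reconstructed directly from the group means and standard deviations in the favstats output above:

```r
# Reconstruct the ANOVA F statistic from the group summaries
means <- c(felt = 4.912, feltfalse = 5.368, miserable = 4.912, neutral = 4.118)
sds   <- c(1.681, 1.827, 1.454, 1.523)
n     <- 34                       # observations per group (equal sizes)
k     <- length(means)
grand <- mean(means)              # grand mean (valid since groups are equal-sized)
ms_between <- n * sum((means - grand)^2) / (k - 1)   # variation between groups
ms_within  <- sum((n - 1) * sds^2) / (k * (n - 1))   # variation within groups
round(ms_between / ms_within, 2)   # matches the F value reported by aov()
```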

I’ll show you the output from an ANOVA here:

mod <- aov(leniency ~ expression, data = smiles2)
summary(mod)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## expression    3     28    9.18    3.46  0.018 *
## Residuals   132    350    2.65                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The value on the top-right (0.018) is our p-value. From this, we could conclude the group means differ. To find which group means differ from one another, we could run post-hoc tests. Here’s an example using Tukey’s Honestly Significant Differences:

smiles2 <- transform(smiles2, subgrp = factor(expression, levels=c("felt", "feltfalse", "miserable", "neutral"), labels=c("F", "FF", "M", "N")))
mod <- aov(leniency ~ subgrp, data=smiles2)
compare <- TukeyHSD(mod, "subgrp")
compare
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = x)
## 
## $subgrp
##            diff     lwr     upr  p adj
## FF-F  4.559e-01 -0.5712  1.4830 0.6562
## M-F  -8.882e-16 -1.0271  1.0271 1.0000
## N-F  -7.941e-01 -1.8212  0.2330 0.1889
## M-FF -4.559e-01 -1.4830  0.5712 0.6562
## N-FF -1.250e+00 -2.2771 -0.2229 0.0102
## N-M  -7.941e-01 -1.8212  0.2330 0.1889

This shows, among other things, that the only groups that differ are Neutral and FeltFalse. You can see how they differ in the boxplot we created above.