Scenario: Accounting Fraud?

Drawing


In 1993, Wayne James Nelson was accused of defrauding the state of Arizona out of $2 million by diverting funds to a bogus vendor. Nelson claimed he diverted random payments to the bogus vendor to demonstrate the absence of safeguards in a new computer system. The prosecutors argued that the diverted funds were not randomly selected payments and that, in fact, Nelson created the payments in an attempt to make them appear random.

Here are the amounts of the 23 checks Nelson diverted to the bogus vendor:

arizona
##             date   amount
## 1  Oct.  9, 1992  1927.46
## 2  Oct.  9, 1992 27902.31
## 3  Oct. 14, 1992 86241.90
## 4  Oct. 14, 1992 72117.46
## 5  Oct. 14, 1992 81321.75
## 6  Oct. 14, 1992 97473.96
## 7  Oct. 19, 1992 93249.11
## 8  Oct. 19, 1992 89658.17
## 9  Oct. 19, 1992 87776.89
## 10 Oct. 19, 1992 92105.83
## 11 Oct. 19, 1992 79949.16
## 12 Oct. 19, 1992 87602.93
## 13 Oct. 19, 1992 96879.27
## 14 Oct. 19, 1992 91806.47
## 15 Oct. 19, 1992 84991.67
## 16 Oct. 19, 1992 90831.83
## 17 Oct. 19, 1992 93776.67
## 18 Oct. 19, 1992 88336.72
## 19 Oct. 19, 1992 94639.69
## 20 Oct. 19, 1992 83709.28
## 21 Oct. 19, 1992 96412.21
## 22 Oct. 19, 1992 88432.86
## 23 Oct. 19, 1992 71552.15

If you look closely, you’ll notice:

   • The amounts start small and then increase. We might expect this in a case of fraud: start small to test the system, then grow bolder.

   • Many amounts are just below $100,000 (which may be the limit at which transactions receive greater scrutiny).


If we look at the distribution of the first digits of each of those amounts, we find:

# Extract first digit of each amount and store as "firstdigit"
arizona$firstdigit <- as.numeric(substr(arizona$amount, 1, 1))

# Get table of first digits
tally(~firstdigit, margins=TRUE, data=arizona)
## 
##     1     2     7     8     9 Total 
##     1     1     3     9     9    23
# Bar chart of first digits
histogram(~firstdigit, data=arizona, col="lightblue", border="white", xlab="First Digit", width=1)


Does the distribution of these first digits give us evidence that the amounts were not randomly selected? If so, it may support the accusation that Nelson created these amounts in an attempt to defraud the state.


What do you expect?

What should the distribution of leading digits be in a set of financial transactions? Without giving it much thought, we might expect a uniform distribution:


It turns out, however, that the distribution of leading digits of financial records looks like this:


Benford’s Law

This distribution, generated from Benford’s Law, applies to first digits of numeric data that have a wide distribution and are generated by a natural process (as opposed to being generated by a human). Some examples of data that follow Benford’s Law include: stock market volumes, populations of cities, distances to stars in the universe, house addresses, and numbers of followers on Twitter.

What is Benford’s Law? Its origins date back to the nineteenth century, when Simon Newcomb visited the library. Like most mathematicians of the time, Newcomb relied on reference books like Arithmetica Logarithmica to help with calculations. The book provided page after page of logarithms, calculated to 10 decimal places, for every number up to 100,000.


Drawing


As Newcomb used this book, he noticed the pages at the beginning of the book were dirtier than the pages towards the end. In fact, the pages with numbers beginning with 1 were the dirtiest, followed by pages with numbers beginning with 2, and so on. The cleanest pages were for numbers beginning with 9. Newcomb, realizing the dirt came from people using some pages more than others, argued:

“That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones. The first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9.”

Newcomb published his findings in an 1881 article, describing a law for the probability that any number begins with a particular digit:

Probability of a number beginning with digit d is:    \(\textrm{log}_{10}\left ( d+1 \right )-\textrm{log}_{10}\left ( d \right )=\textrm{log}_{10}\left ( 1+\frac{1}{d} \right )\)


Why does it work (and why isn’t it called Newcomb’s Law)?

Imagine you have a pencil that is one unit long. It doesn’t matter what one unit means to you – it could be centimeters, inches, cubits, or anything else.

Now imagine that pencil slowly and continuously growing in length. For a long time, the length will be 1.x units long. In fact, it will need to double in length (a 100% change) before the leading digit changes from 1 to 2.

Once the leading digit is 2, however, it only needs to change in length by 50% to get a leading digit of 3. Once at 3, a change of 33.3% will bring the leading digit to a 4.

The following figure shows that as we move along a logarithmic scale, there’s a shorter distance between each subsequent digit. In fact, once we get to a leading digit of 9, we only need an 11% increase to go back to a leading digit of 1.

Drawing


In the figure displayed below, the amount of “time” the leading digit is a 1 has been shaded red. You can see the shaded area represents roughly 30% of the entire figure:

Drawing


If each leading digit is given its own color, we can compare the relative widths of each shaded region:

Drawing


It turns out that the width of each colored segment is proportional to   \(\textrm{log}_{10}\left ( d+1 \right )-\textrm{log}_{10}\left ( d \right )=\textrm{log}_{10}\left ( 1+\frac{1}{d} \right )\)

Drawing


These probabilities tell us what to expect if we examine the first digits of a series of financial records.
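We can verify these widths numerically in R: the gaps between consecutive integers on a log10 scale are exactly the Benford probabilities, and together they span one full decade.

```r
# Widths of the leading-digit segments on a log10 scale
widths <- diff(log10(1:10))   # log10(d+1) - log10(d) for d = 1, ..., 9
round(widths, 3)

# The widths match Benford's probabilities and sum to 1 (one full decade)
all.equal(widths, log10(1 + 1/(1:9)))
sum(widths)
```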


Unfortunately for Newcomb, his article went largely unnoticed. Fifty-seven years later, a physicist named Frank Benford made the exact same discovery as Newcomb, through the exact same observation of dirtier pages at the beginning of logarithm books, and derived the exact same formula. Benford, though, had the resources to test his theory empirically.

Benford spent years gathering 20,229 observations from a diverse range of data sets: baseball statistics, areas of rivers, atomic weights of elements, and even arbitrary numbers plucked from Reader’s Digest articles. This mix of all kinds of different datasets turned out to be a near-perfect fit with his theory.

With these additional findings, Benford’s article quickly gained popularity and the distribution of first-digits became known as Benford’s Law.




Is he guilty?

Now that we know what we should expect for the distribution of leading digits, we simply need to compare it to what we observed.


  1. Explain why we can’t simply look at the previous plot and conclude that Nelson committed fraud. Why aren’t we certain that Nelson did not randomly choose these 23 transactions?









How can we calculate the likelihood of observing these first-digits if, in fact, Nelson had selected a random sample of 23 transactions (from a population that follows Benford’s Law)?

To derive some methods to estimate this likelihood, let’s turn to a simpler scenario.



Scenario: Rolling a die

Suppose we roll a single 6-sided die 120 times. We might get results like:


Drawing





  1. Assuming the die is fair, fill in the expected frequencies.



  1. We need to think of a method to determine how “far off” our observations are from our expectations. Ideally, it would be a single value we could calculate from the table displayed above. What ideas do you have?











  1. Use your method to calculate the “distance” of the observations to the expectations for each of the tables displayed below. Are the observed values in each table the same “distance” from their expectations? Should they be?


Drawing











Chi-squared Statistic (goodness-of-fit)

To measure the distance from an observed set of frequencies to a theoretical distribution of expectations, we can use a chi-squared statistic:

      \(\chi _{c-1}^{2}=\sum_{i=1}^{c}\frac{\left ( \textrm{observed}_{i} -\textrm{expected}_{i}\right )^{2}}{\textrm{expected}_{i}}=\sum \frac{\left ( \textrm{O}_{i}-\textrm{E}_{i} \right )^{2}}{\textrm{E}_{i}}, \quad \textrm{where } c \textrm{ is the number of categories}\)


To estimate a p-value, we compare this value to a chi-squared distribution.


  1. Calculate this chi-squared goodness-of-fit statistic for the following table of die rolls. Estimate the p-value using this applet.


Drawing


We can calculate this chi-square goodness-of-fit test in R:

# Input the observed frequencies of each digit
die <- c(23, 15, 19, 24, 21, 18)

# Input the expected probabilities of each digit
dieprob <- c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

# Run the chi-square test using chisq.test
# rescale.p = TRUE just makes sure the probabilities sum to 1
chisq.test(die, p = dieprob, rescale.p = TRUE)
## 
##  Chi-squared test for given probabilities
## 
## data:  die
## X-squared = 2.8, df = 5, p-value = 0.7308
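We can verify this output by computing the statistic directly; with 120 rolls of a fair die, the expected count for each face is 120/6 = 20.

```r
# Chi-squared statistic by hand for the die data
observed <- c(23, 15, 19, 24, 21, 18)
expected <- rep(20, 6)                     # 120 rolls / 6 faces
sum((observed - expected)^2 / expected)    # 2.8, matching chisq.test

# p-value from the chi-squared distribution with 6 - 1 = 5 df
pchisq(2.8, df = 5, lower.tail = FALSE)    # ~0.7308
```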


Randomization-based chi-squared goodness-of-fit test

We could also conduct this goodness-of-fit test using randomization-based methods. In this simple example with rolling a die 120 times, we expect to see an equal proportion of 1s, 2s, 3s, 4s, 5s, and 6s. In other words, we expect P(1) = P(2) = ... = P(6) = 1/6.

With this assumption, we can have the computer randomly sample 120 values from the discrete distribution we just identified, in which each number from 1 to 6 has an equal chance of being chosen:

# Run the chi-square test using chisq.test
# Set simulate.p.value = TRUE to run the randomization-based test
# B = the number of randomizations to run
chisq.test(die, p = dieprob, rescale.p = TRUE, simulate.p.value = TRUE, B=5000)
## 
##  Chi-squared test for given probabilities with simulated p-value
##  (based on 5000 replicates)
## 
## data:  die
## X-squared = 2.8, df = NA, p-value = 0.7421
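The simulated p-value can also be computed “by hand.” Here’s a minimal sketch of what the randomization procedure does: repeatedly simulate 120 fair die rolls, compute the chi-squared statistic for each simulated sample, and find the proportion of simulated statistics at least as large as the one we observed.

```r
# Randomization-based p-value, step by step
set.seed(42)                                  # for reproducibility
observed <- c(23, 15, 19, 24, 21, 18)
expected <- rep(120/6, 6)
obs_stat <- sum((observed - expected)^2 / expected)

# Simulate 5000 sets of 120 fair die rolls; compute each chi-squared statistic
sim_stats <- replicate(5000, {
  rolls  <- sample(1:6, size = 120, replace = TRUE)
  counts <- tabulate(rolls, nbins = 6)
  sum((counts - expected)^2 / expected)
})

# p-value: proportion of simulated statistics at least as large as observed
mean(sim_stats >= obs_stat)
```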


  1. The p-value from the randomization-based method differs slightly from the theory-based p-value. What conditions are necessary in order to use the theoretical chi-squared distribution? Are those conditions met in this scenario?













Back to the fraud case: Is he guilty?

Let’s conduct this goodness-of-fit test on our potential fraud case:

# First, let's input the frequency of each first digit that we observed
nelsondigits <- c(1, 1, 0, 0, 0, 0, 3, 9, 9)

# Now, let's calculate the expected probabilities of each digit
#  First, I'll input the digits 1-9
digits <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
#  Now, I'll calculate the probabilities using Benford's Law
benford <- log10(1 + 1/digits)
#  Let's take a look at those probabilities
benford
## [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679
## [7] 0.05799195 0.05115252 0.04575749
# Now we can run the chi-squared goodness-of-fit test
#  First via theory-based methods:
chisq.test(nelsondigits, p = benford)
## Warning in chisq.test(nelsondigits, p = benford): Chi-squared approximation
## may be incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  nelsondigits
## X-squared = 129.95, df = 8, p-value < 2.2e-16
#  Next with randomization-based methods
chisq.test(nelsondigits, p = benford, simulate.p.value = TRUE, B=5000)
## 
##  Chi-squared test for given probabilities with simulated p-value
##  (based on 5000 replicates)
## 
## data:  nelsondigits
## X-squared = 129.95, df = NA, p-value = 2e-04
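The warning from the theory-based test (“Chi-squared approximation may be incorrect”) comes from small expected counts. We can compute them directly, re-deriving the Benford probabilities:

```r
# Expected first-digit counts for n = 23 checks under Benford's Law
digits  <- 1:9
benford <- log10(1 + 1/digits)
round(23 * benford, 2)

# Count how many expected counts fall below 5; with so few, the
# theoretical chi-squared approximation is unreliable here
sum(23 * benford < 5)
```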
  1. What conclusions can we draw? Explain why the p-value from the theory-based method differs so much from the p-value from the randomization-based method.














Another Goodness-of-Fit Example: Exponential distribution

Testing for goodness-of-fit allows us to test whether a sample of data comes from a specified theoretical distribution. Suppose the following data represent times between breakdowns of a machine:

times <- c(0.1, 0.1, 0.6, 0.6, 1.3, 1.6, 2.4, 2.4, 3.7, 3.7, 4.9,
           5.0, 6.2, 6.3, 10.3, 10.6, 0.1, 0.2, 0.7, 0.7, 1.7, 1.8,
           2.4, 2.7, 4.1, 4.1, 5.2, 5.3, 7.4, 7.5, 10.9, 12.3, 0.2,
           0.3, 0.3, 0.4, 0.8, 0.9, 0.9, 1.0, 1.9, 1.9, 2.0, 2.0,
           2.8, 2.8, 2.8, 3.0, 4.2, 4.2, 4.3, 4.3, 5.3, 5.7, 5.7,
           5.9, 7.5, 7.7, 8.1, 8.6, 13.9, 13.7, 13.9, 14.8, 0.5, 0.6,
           0.6, 1.1, 1.2, 1.3, 2.1, 2.2, 2.2, 3.1, 3.5, 3.7, 4.5, 
           4.9, 4.9, 6.1, 6.2, 6.2, 9.2, 9.5, 10.0, 14.9, 17.3, 17.6)


We want to know if the exponential distribution is an appropriate model for the time we wait until the machine breaks down.

To test this, let’s place our sample data into bins. I’ll arbitrarily choose bins of width = 3 units:


Drawing


That last bin only has 2 observations, so I’ll merge it with the preceding bin:


Drawing


Now that we have our observed data in bins, we need to calculate our expected frequencies (or probabilities). We could do this by hand, using the exponential distribution:


Drawing


We could then calculate our chi-square goodness-of-fit statistic:


Drawing


We could also do this in R:

# The exponential distribution is defined by lambda = rate = 1 / E(x)
# Calculate probabilities for each bin
# This calculates probabilities under the exponential distribution
p1 <- pexp(3, rate = 1/mean(times), lower.tail = TRUE)  #P(0 < x < 3)
p2 <- pexp(6, rate = 1/mean(times)) - pexp(3, rate = 1/mean(times)) #P(3<x<6)
p3 <- pexp(9, rate = 1/mean(times)) - pexp(6, rate = 1/mean(times)) #P(6<x<9)
p4 <- pexp(12, rate = 1/mean(times)) - pexp(9, rate = 1/mean(times)) #P(9<x<12)
p5 <- pexp(12, rate = 1/mean(times), lower.tail = FALSE) #P(X > 12)

# Store all the probabilities as "probs"
probs <- c(p1, p2, p3, p4, p5)

# There's an easier way to do the following, but I think this makes sense
# Let's store our observed data as frequencies in each bin
# To do this, I'm going to "cut" my data into 5 bins
#  The bins are defined by the endpoints 0, 3, 6, 9, 12, and infinity
#  I name the bins "1", "2", "3", "4", and "5"
#  Note: cut() includes each bin's right endpoint by default, so I use a
#  break at 2.99 (rather than 3) to place the value 3.0 into the second bin
cuts <- cut(times, breaks = c(0, 2.99, 6, 9, 12, Inf), labels=1:5)

# Now I can get frequencies for each bin
tally(~cuts, margins=TRUE)
## 
##     1     2     3     4     5 Total 
##    40    23    11     6     8    88
# I'll store those frequencies in a vector called "timebins"
# This code does it automatically
timebins <- c(length(cuts[cuts==1]), length(cuts[cuts==2]), 
              length(cuts[cuts==3]), length(cuts[cuts==4]),
              length(cuts[cuts==5]))

# Now I run the chi-squared test
chisq.test(timebins, p = probs)
## 
##  Chi-squared test for given probabilities
## 
## data:  timebins
## X-squared = 0.36087, df = 4, p-value = 0.9856
#  Next with randomization-based methods
chisq.test(timebins, p = probs, simulate.p.value = TRUE, B=5000)
## 
##  Chi-squared test for given probabilities with simulated p-value
##  (based on 5000 replicates)
## 
## data:  timebins
## X-squared = 0.36087, df = NA, p-value = 0.9886


  1. What conclusions can we draw? Explain how this method could be used to assess the normality assumption (prior to conducting an ANOVA, perhaps).













Scenario: Male pattern heart disease?

Researchers selected a sample of 663 heart disease patients and a control group of 772 males not suffering from heart disease. Each subject was classified into one of 5 baldness categories:

Drawing


Drawing


# Mosaic plot
mosaicplot(table(baldheart))

  1. Do these results give us evidence that heart disease is associated with baldness? Calculate the relative risk of heart disease for each category of baldness. Then, calculate odds ratios.













Can we use a chi-square test to test for a relationship between heart disease and baldness? We can if we’re able to derive our expectations for each cell.


Expectation: Independence

  1. Write out a null hypothesis for the relationship between heart disease and baldness.









  1. What does it mean for two variables, A and B, to be independent?









  1. If A and B are independent, P(A | B) = ___________ and P(A and B) = ____________________?







  1. Suppose we’re interested in the relationship between gender and the result of a coin toss. We get 50 men and 50 women to each flip a coin and record the result. Assuming gender has no influence on the result of a coin toss, fill in the expectations for each cell:

Drawing







  1. Derive a general formula to calculate the expectations for any r x c contingency table.











Suppose we conduct this experiment and get the following results:


Drawing


Fisher’s Exact Test


  1. Look at the top-left cell. We expected 25 but observed 30. Using the hypergeometric distribution, calculate the probability of observing 30 or more males flipping heads if gender and coin toss results are independent.









We can conduct this test (Fisher’s Exact Test) in R:

# P(30 or more heads) = 1 - P(29 or fewer heads), where
#   m = 40 heads in the population
#   n = 60 tails in the population
#   k = 50 chosen for males
1 - phyper(29, 40, 60, 50, lower.tail=TRUE)
## [1] 4.154329e-05
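As a quick cross-check, the same tail probability can be computed in two equivalent ways with R’s hypergeometric functions:

```r
# P(X >= 30) via the upper tail of the hypergeometric distribution
phyper(29, 40, 60, 50, lower.tail = FALSE)

# The same probability by summing the individual hypergeometric probabilities
# (X can be at most min(40, 50) = 40)
sum(dhyper(30:40, 40, 60, 50))
```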
# Enter the data for Fisher's Test
coin <- c(rep("heads", 40), rep("tails", 60))
gender <- c(rep("male", 30), rep("female", 10), rep("male", 20), rep("female", 40))
coingender <- data.frame(coin, gender)

# Examine the data
tally(coin ~ gender, margins = TRUE, data=coingender)
##        gender
## coin    female male
##   heads     10   30
##   tails     40   20
##   Total     50   50
# Fisher's test
fisher.test(coingender$gender, coingender$coin, alternative="less")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  coingender$gender and coingender$coin
## p-value = 4.154e-05
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
##  0.0000000 0.3864017
## sample estimates:
## odds ratio 
##  0.1700317


You can see the results from Fisher’s Exact Test match the calculation from the hypergeometric distribution. This method becomes more difficult to use with larger sample sizes (or when our table is larger than 2x2).

Chi-square test for independence

Since we have observed and expected frequencies for categorical variables, we could also calculate a chi-square statistic.

  1. Compute a chi-squared statistic and estimate a p-value. How many degrees-of-freedom does your test statistic have?

Drawing













We can conduct this chi-square test for independence in R:

# Our data are still stored in the "coingender" data.frame
# We use chisq.test(x, y) to run the test
chisq.test(coingender$gender, coingender$coin, correct=FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  coingender$gender and coingender$coin
## X-squared = 16.667, df = 1, p-value = 4.456e-05
# We can estimate the p-value with randomization-based methods
chisq.test(coingender$gender, coingender$coin, simulate.p.value=TRUE, B=5000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 5000
##  replicates)
## 
## data:  coingender$gender and coingender$coin
## X-squared = 16.667, df = NA, p-value = 2e-04
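Where do the expected counts come from? Under independence, each cell’s expectation is (row total)(column total)/n. A quick check, re-entering the coin-toss table as a matrix and comparing against what chisq.test computes internally:

```r
# Re-enter the coin-toss table as a matrix of counts
tab <- matrix(c(10, 40, 30, 20), nrow = 2,
              dimnames = list(coin = c("heads", "tails"),
                              gender = c("female", "male")))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# chisq.test stores the same matrix in its "expected" component
all.equal(as.vector(expected), as.vector(chisq.test(tab, correct = FALSE)$expected))
```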


So… heart disease and baldness?

Let’s enter our heart disease / baldness data into R and run both Fisher’s Exact Test and a Chi-square Test for Independence:

# Enter data
baldness <- c(rep("none", 582), rep("little", 386), rep("some", 380), rep("much", 84), rep("extreme", 3))

disease <- c(rep("yes", 251), rep("no", 331), rep("yes", 165), rep("no", 221), rep("yes", 195), rep("no", 185), rep("yes", 50), rep("no", 34), rep("yes", 2), rep("no", 1))

baldheart <- data.frame(baldness, disease)

# Check for entry errors
tally(disease ~ baldness, margins = TRUE, data=baldheart)
##        baldness
## disease extreme little much none some
##   no          1    221   34  331  185
##   yes         2    165   50  251  195
##   Total       3    386   84  582  380
# Run Fisher's Exact Test
fisher.test(baldheart$baldness, baldheart$disease, alternative="less")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  baldheart$baldness and baldheart$disease
## p-value = 0.003818
## alternative hypothesis: less
# Chi-square test for independence
chisq.test(baldheart$baldness, baldheart$disease, correct=FALSE)
## Warning in chisq.test(baldheart$baldness, baldheart$disease, correct =
## FALSE): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  baldheart$baldness and baldheart$disease
## X-squared = 14.57, df = 4, p-value = 0.005682
# We can estimate the p-value with randomization-based methods
chisq.test(baldheart$baldness, baldheart$disease, simulate.p.value=TRUE, B=5000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 5000
##  replicates)
## 
## data:  baldheart$baldness and baldheart$disease
## X-squared = 14.57, df = NA, p-value = 0.005199


  1. Based on these results, what conclusions can you make?










Chi-squared tests - notes and warnings

Conditions of the chi-square test

One problem with the chi-square test is that the sampling distribution of the test statistic only approximately follows a chi-square distribution. With larger sample sizes, the approximation is good enough. With smaller sample sizes, however, p-values estimated from the chi-square distribution can be inaccurate.

This is why many textbooks recommend you only use chi-square tests when the expected frequencies in each cell are at least 5. Even with large overall sample sizes, expected cell frequencies below 5 result in a loss of statistical power.

When you have cells with expected frequencies less than 5, you can:

    • Use Fisher’s Exact Test

    • Use a likelihood ratio test (like the g-test we’ll see in a bit)

    • Merge cells into larger bins to ensure observed frequencies are at least 5

Another condition of the chi-square test is independence. All observations must contribute to only one cell of the contingency table. You cannot use chi-square tests on repeated-measures designs.


Effect sizes

If you have a large sample size, the chi-square test will produce a low p-value… even if the difference in proportions is small. It’s imperative that whenever you analyze contingency tables, you calculate some sort of effect size. You could calculate relative risk or odds ratios to give an estimate of the differences among categories.
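As a sketch with hypothetical counts (not from any dataset in this lesson), relative risk and odds ratios are simple to compute from a 2x2 table:

```r
# Hypothetical 2x2 table of counts (rows: exposure; columns: outcome)
#              disease   no disease
# exposed           30           70
# unexposed         15           85
risk_exposed   <- 30 / (30 + 70)      # 0.30
risk_unexposed <- 15 / (15 + 85)      # 0.15
risk_exposed / risk_unexposed         # relative risk = 2

odds_exposed   <- 30 / 70
odds_unexposed <- 15 / 85
odds_exposed / odds_unexposed         # odds ratio = (30*85)/(70*15) ~ 2.43
```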


Yates’s Correction

When you have a 2x2 contingency table, the chi-square test (with one degree of freedom) tends to produce p-values that are too low (i.e., it inflates the Type I error rate). One suggestion to address this issue is Yates’s continuity correction.

Yates’s correction simply has you subtract 0.5 from the absolute value of the deviation in the numerator of our chi-square test statistic:

      \(\chi _{\textrm{Yates}}^{2}=\sum \frac{\left ( \left | O_{i}-E_{i} \right |-0.5 \right )^{2}}{E_{i}}\)

This correction lowers our chi-square test statistic, thereby yielding a higher p-value. Although this may seem like a nice solution, there is evidence that this correction over-corrects and produces chi-square values that are too small.

If you want to use Yates’s correction in R, you simply tell R that correct=TRUE:

# Chi-square test for that baldness data
chisq.test(baldheart$baldness, baldheart$disease, correct=TRUE)
## Warning in chisq.test(baldheart$baldness, baldheart$disease, correct =
## TRUE): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  baldheart$baldness and baldheart$disease
## X-squared = 14.57, df = 4, p-value = 0.005682
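R applies the continuity correction only to 2x2 tables, which is why the baldness result above is unchanged. To see the correction in action, we can use the 2x2 coin-toss table from earlier (entered here as a matrix):

```r
# Corrected vs. uncorrected chi-squared on the 2x2 coin-toss table
coins <- matrix(c(30, 20, 10, 40), nrow = 2,
                dimnames = list(coin = c("heads", "tails"),
                                gender = c("male", "female")))

chisq.test(coins, correct = FALSE)$statistic   # 16.667 (uncorrected)
chisq.test(coins, correct = TRUE)$statistic    # 15.042 (lower statistic, higher p-value)
```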


Likelihood ratio tests (G-test)

An alternative to the chi-square test (that happens to produce results similar to the chi-square test) is the G-test.

The G-test is an example of a likelihood ratio test (LRT). We’ll see other likelihood ratio tests later in the course. For now, let’s conduct an LRT on a simple dataset.

Scenario: Graduating from college

Suppose you hypothesize 75% of St. Ambrose students graduate within 6 years. You then collect a sample of 1000 students and find 770 of them graduated within 6 years.

You can think of this data as frequencies across two cells: 770 (graduated) and 230 (did not).

With this, we could calculate a chi-square goodness-of-fit test:

# Input data
graduated <- c(770, 230)

# Input probabilities
probabilities <- c(.75, .25)

# Run the test
chisq.test(graduated, p = probabilities)
## 
##  Chi-squared test for given probabilities
## 
## data:  graduated
## X-squared = 2.1333, df = 1, p-value = 0.1441


We could also calculate the likelihood of observing this data under our hypothesis. If we think of each student as an independent trial with two possible outcomes (1 = graduated; 0 = did not), we can use the binomial distribution.

We want to calculate P(X = 770) under a binomial distribution with n = 1000 trials and p = 0.75 (our hypothesized value). Let’s store this likelihood as null:

# P(X = 770) in binomial distribution with n=1000, p=0.75
null <- dbinom(770, size=1000, prob=0.75)
null
## [1] 0.0101099


Now let’s consider an alternative hypothesis: the true proportion of students who graduate is 77% (the proportion we observed). We’ll store this result as alternative:

# P(X = 770) in binomial distribution with n=1000, p=0.77
alternative <- dbinom(770, size=1000, prob=0.77)
alternative
## [1] 0.02996627


From these results, we’d conclude the data we obtained would be more likely if the true proportion of St. Ambrose students who graduate within 6 years were 77% rather than 75%.

Let’s take the ratio of the null likelihood to the alternative likelihood:

# Likelihood ratio
LR <- null / alternative
LR
## [1] 0.337376


This ratio gets smaller as the observed results fall further from our null hypothesis. If we take the natural log of this ratio and multiply it by -2, we get the G-statistic (a log-likelihood-ratio statistic).

      \(G=-2\textrm{ln}\frac{L_{\textrm{null}}}{L_{\textrm{alternative}}}\)

We take the natural log and multiply by -2 because the resulting test statistic approximately follows a chi-square distribution.

Let’s calculate this statistic for our graduation example and determine its likelihood from a chi-square distribution:

# Calculate G-statistic
G <- -2 * log(LR)
G
## [1] 2.173115
# Compare it to chi-square distribution with df = 2-1
pchisq(G, df=1, lower.tail=FALSE)
## [1] 0.1404415


For this simple example, our p-value is 0.140. With a p-value this large, we have little evidence against the null hypothesis that 75% of St. Ambrose students graduate within 6 years.


Calculating the G-statistic can be tedious. Thankfully, there’s an easier formula to use (based on the fact that in the ratio of likelihoods, a lot of stuff cancels out):

      \(G=2\sum_{i=1}^{c} O_{i}\,\textrm{ln}\frac{O_{i}}{E_{i}}\)
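We can verify that the shortcut formula, G = 2 Σ O ln(O/E), reproduces the ratio-of-likelihoods calculation for the graduation data:

```r
# G via the shortcut formula
O <- c(770, 230)             # observed: graduated, did not
E <- 1000 * c(0.75, 0.25)    # expected under the null hypothesis
G <- 2 * sum(O * log(O / E))
G                            # ~2.173, matching -2*ln(LR) from above

# The same value from the ratio of binomial likelihoods
G2 <- -2 * log(dbinom(770, 1000, 0.75) / dbinom(770, 1000, 0.77))
all.equal(G, G2)
```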


The G-test can also be used to test for independence. You can see how to conduct this test in R here.


G-test vs. Chi-square test

Since both the G-test and the chi-squared test can be used for both goodness-of-fit and independence, which test should you use? It depends.

The G-test is more flexible and can be used for more elaborate (and additive) experimental designs. The chi-square test, on the other hand, is more easily understood and explained.

The G-test uses the multinomial distribution in its calculation, whereas the chi-square test provides only an approximation.




Your turn

  1. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. The article Is There a Season for Homicide? (Criminology, 1988: 287-296) classified 1361 homicides according to season, resulting in the table displayed below. Conduct a chi-square goodness-of-fit test (and the randomization-based chi-square test) to determine if seasons are associated with different numbers of violent crimes.


Drawing


  1. You may remember the dataset we analyzed in lesson #3 regarding the amount of weight lost by subjects on various diets (e.g., Atkins, WeightWatchers). Prior to analyzing that data, we tested the homogeneity of variance assumption with Levene’s test. Let’s test for normality using a chi-square goodness-of-fit test.



  1. Is acupuncture effective in reducing pain? A 2007 study looked at 1,162 patients with chronic lower back pain. The subjects were randomly assigned to one of 3 groups:
      • Verum acupuncture practiced according to traditional Chinese principles,
      • Sham acupuncture where needles were inserted into the skin but not at acupuncture points,
      • Traditional (non-acupuncture) therapy consisting of drugs, physical therapy, and exercise.

    Results are displayed in the table below.

    (a) Calculate the probability that an individual undergoing verum acupuncture experienced a substantial reduction in pain.

    (b) Calculate this same probability for the subjects in the traditional group.

    (c) Calculate and interpret the ratio of the two probabilities you just calculated.

    (d) Run a chi-square test for independence and interpret the result.


Drawing



  1. The table displayed below shows the number of black and white defendants sentenced to death in the 1960s-70s for murdering a white victim. As you can see, a higher proportion of black defendants were sentenced to death.

    (a) Run Fisher’s Exact Test to calculate the likelihood of observing results as or more extreme than these results if, in fact, race and the death penalty were independent.

    (b) Run a chi-square test for independence.

Drawing




Publishing your solutions

When you’re ready to try the Your Turn exercises: