8.5: Matched or Paired Samples



In most cases of economic or business data we have little or no control over how the data are gathered. In this sense the data are not the result of a planned, controlled experiment. In some cases, however, we can develop data that are part of a controlled experiment. This situation occurs frequently in quality control. Imagine that two machines built to the same design, but at different manufacturing plants, are being tested for differences in some production metric, such as speed of output, or in meeting some production specification, such as strength of the product. The test has the same format as the tests we have already seen, but here we have matched pairs for which we can test whether differences exist. Each observation has its matched pair against which a difference is calculated. First, the differences in the metric between the two lists of observations are calculated; each difference is typically labeled with the letter "d." Then the mean of these matched differences, \(\overline{x}_{d}\), is calculated, as is their standard deviation, \(s_d\). We expect the standard deviation of the differences of matched pairs to be smaller than that for unmatched samples, because the two measurements in each pair are presumably positively correlated, so much of the variation they share cancels when the difference is taken.
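A standard identity, not stated in this section, makes the effect of the correlation explicit. Suppose \(s_1\) and \(s_2\) are the sample standard deviations of the two lists of measurements and \(r\) is the sample correlation between them (these three symbols are introduced here only for illustration). Then the sample variance of the differences satisfies:

\[s_{d}^{2}=s_{1}^{2}+s_{2}^{2}-2 r s_{1} s_{2}\nonumber\]

When \(r\) is positive, as we would expect for well-matched pairs, the last term reduces \(s_d\). When the two lists are unrelated, \(r\) is near zero and no such reduction occurs.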

When using a hypothesis test for matched or paired samples, the following characteristics may be present:

Simple random sampling is used.
Sample sizes are often small.
Two measurements (samples) are drawn from the same pair of individuals or objects.
Differences are calculated from the matched or paired samples.
The differences form the sample that is used for the hypothesis test.
Either the matched pairs have differences that come from a population that is normal, or the number of differences is sufficiently large that the distribution of the sample mean of differences is approximately normal.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean for the differences, \(\mu_d\), is then tested using a Student's t-test for a single population mean with \(n - 1\) degrees of freedom, where \(n\) is the number of differences, that is, the number of pairs, not the number of observations.

\[\textbf{The null and alternative hypotheses for this test are:}\nonumber\]

\[H_{0} : \mu_{d}=0\nonumber\]

\[H_{a} : \mu_{d} \neq 0\nonumber\]

\[\textbf{The test statistic is:}\nonumber\]

\[t_{obs}=\frac{\overline{x}_{d}-\mu_{d}}{\left(\frac{s_{d}}{\sqrt{n}}\right)}\nonumber\]
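The formula above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function name `paired_t_statistic` and its parameter names are made up for this sketch and do not come from the text. It assumes the differences are taken as second measurement minus first measurement.

```python
import math
import statistics


def paired_t_statistic(first, second, mu_d0=0.0):
    """Return (x_bar_d, s_d, n, t_obs) for paired measurements.

    Differences are taken as second - first, pair by pair.
    mu_d0 is the value of mu_d stated in the null hypothesis.
    (Hypothetical helper written for illustration only.)
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)                     # number of pairs, not observations
    x_bar_d = statistics.mean(diffs)   # sample mean of the differences
    s_d = statistics.stdev(diffs)      # sample standard deviation (n - 1 divisor)
    t_obs = (x_bar_d - mu_d0) / (s_d / math.sqrt(n))
    return x_bar_d, s_d, n, t_obs
```

The returned statistic would then be compared with a Student's t critical value at \(n - 1\) degrees of freedom.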

Example \(\PageIndex{1}\)

A company has developed a training program for its entering employees because it has become concerned with the results of the six-month employee review. It hopes that the training program will result in better six-month reviews. Each trainee constitutes a “pair”: the score the employee received when first entering the firm and the score given at the six-month review. The difference between the two scores was calculated for each employee, and the means before and after the training program were calculated. The sample mean before the training program was 20.4 and the sample mean after the training program was 23.9. The standard deviation of the differences in the two scores across the 20 employees was 3.8 points. Test at the 5% significance level the null hypothesis that the training program makes no difference or worsens employees’ scores against the alternative that the training program improves the employees’ scores.

Answer

The first step is to identify this as a two sample case: before the training and after the training. This differentiates this problem from simple one sample issues. Second, we determine that the two samples are "paired." Each observation in the first sample has a paired observation in the second sample. This information tells us that the null and alternative hypotheses should be:

\[H_{0} : \mu_{d} \leq 0\nonumber\]

\[H_{a} : \mu_{d}>0\nonumber\]

This form reflects the implied claim that the training course improves scores; the test is one-tailed and the claim is in the alternative hypothesis. Because the experiment was conducted as a matched-pair sample rather than by simply comparing scores from people who took the training course and those who didn't, we use the matched-pair test statistic:

\[\text{Test Statistic: }t_{obs}=\frac{\overline{x}_{d}-\mu_{d}}{\frac{s_{d}}{\sqrt{n}}}=\frac{(23.9-20.4)-0}{\left(\frac{3.8}{\sqrt{20}}\right)}=4.12\nonumber\]

To evaluate this test statistic, the individual pre-training and post-training scores (not provided here) are needed in order to compute each employee's difference. From these differences we calculate the standard deviation across the individual differences as follows, where \(n_d\) is the number of differences (the \(n\) used above):

\[s_{d}=\sqrt{\frac{\Sigma\left(x_{d}-\bar{x}_{d}\right)^{2}}{n_{d}-1}}\nonumber\]

We can now compare the calculated value of the test statistic, 4.12, with the critical value. The critical value is a Student's t with degrees of freedom equal to the number of pairs, not observations, minus 1. In this case there are 20 pairs, so at the 5% significance level for a one-tailed test \(t_{\alpha} = 1.729\) at \(df = 20 - 1 = 19\). The calculated test statistic is well into the tail of the distribution, and thus we reject the null hypothesis that the training program makes no difference or worsens scores. The evidence seems to indicate that the training helps employees earn higher scores.
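The arithmetic in this example can be checked from the summary figures alone. The sketch below uses only the numbers quoted in the example; the variable names are illustrative. It relies on the fact that the mean of the paired differences equals the difference of the two sample means, which is why 23.9 - 20.4 can stand in for \(\overline{x}_{d}\). The critical value is copied from the text rather than computed.

```python
import math

mean_before = 20.4   # sample mean before training (from the example)
mean_after = 23.9    # sample mean after training (from the example)
s_d = 3.8            # standard deviation of the differences (from the example)
n = 20               # number of pairs

# Mean of the differences equals the difference of the means for paired data
x_bar_d = mean_after - mean_before

standard_error = s_d / math.sqrt(n)
t_obs = (x_bar_d - 0) / standard_error   # mu_d = 0 at the boundary of H0

t_crit = 1.729       # one-tailed, alpha = 0.05, df = 19 (value quoted in the text)
reject_h0 = t_obs > t_crit               # right-tailed test

print(f"t_obs = {t_obs:.2f}, reject H0: {reject_h0}")
```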

Example \(\PageIndex{2}\)

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in the table below. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.

Subject:	A	B	C	D	E	F	G	H
Before	6.6	6.5	9.0	10.3	11.3	8.1	6.3	11.6
After	6.8	2.4	7.4	8.5	8.1	6.1	3.4	2.0

Table \(\PageIndex{1}\)

Answer

Corresponding "before" and "after" values form matched pairs. (Calculate "after" – "before".)

After data	Before data	Difference
6.8	6.6	0.2
2.4	6.5	-4.1
7.4	9	-1.6
8.5	10.3	-1.8
8.1	11.3	-3.2
6.1	8.1	-2
3.4	6.3	-2.9
2	11.6	-9.6

Table \(\PageIndex{2}\)

The data for the test are the differences: \(\{0.2,-4.1,-1.6,-1.8,-3.2,-2,-2.9,-9.6\}\)

The sample mean and sample standard deviation of the differences are: \(\overline{x}_{d}=-3.13\) and \(s_{d}=2.91\). Verify these values.
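One way to carry out the verification is a short script. The sketch below uses Python's standard library and the data from the table in this example; the variable names are illustrative. Each difference is rounded to one decimal place only to remove floating-point noise from the subtraction.

```python
import statistics

before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]

# "after" minus "before" for each subject, rounded to one decimal place
diffs = [round(a - b, 1) for a, b in zip(after, before)]

x_bar_d = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)      # sample standard deviation (n - 1 divisor)

print(f"mean difference = {x_bar_d:.3f}")
print(f"s_d = {s_d:.3f}")
```

The printed values can be compared with the rounded figures reported above.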

Let \(\mu_d\) be the population mean for the differences. We use the subscript d to denote "differences."

Random variable: \(\overline{x}_{d}\) = the mean difference of the sensory measurements

\(H_{0} : \mu_{d} \geq 0\)

The null hypothesis is zero or positive, meaning that there is the same or more pain felt after hypnotism. That means the subject shows no improvement. (\(\mu_d\) is the population mean of the differences.)

\(H_{a} : \mu_{d}<0\)

The alternative hypothesis is negative, meaning there is less pain felt after hypnotism. That means the subject shows improvement. The score should be lower after hypnotism, so the difference ought to be negative to indicate improvement.

Distribution for the test: The distribution is a Student's \(t\) with \(d f=n-1=8-1=7\). Use \(t_7\). (Notice that the test is for a single population mean.)

Calculate the test statistic and look up the critical value using the Student's t-distribution: The calculated value of the test statistic is \(t_{obs}=\frac{-3.13-0}{2.91 / \sqrt{8}}=-3.04\), and the critical value of the \(t\) distribution with 7 degrees of freedom at the 5% level of significance is -1.895 for a one-tailed (in this case, left-tailed) test.

Normal distribution curve of the average difference of sensory measurements with values of -3.13 and 0. A vertical upward line extends from -3.13 to the curve, and the p-value is indicated in the area to the left of this value.

Compare the critical value for alpha against the calculated test statistic.

Comparing the calculated test statistic with the critical value gives us the result. In this question the calculated test statistic is -3.04 and the critical value is -1.895. The test statistic is clearly in the left tail, and thus we reject the null hypothesis that pain is the same or greater after hypnotism.

Make a decision: Reject the null hypothesis, \(H_0\). The data support \(\mu_{d}<0\), and the improvement is statistically significant.

Conclusion: At a 5% level of significance, from the sample data, there is sufficient evidence to conclude that the sensory measurements, on average, are lower after hypnotism. Hypnotism appears to be effective in reducing pain.
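The whole left-tailed test can be reproduced from the list of differences. The following is a sketch under the assumptions of this example: it recomputes the test statistic from the unrounded mean and standard deviation, and it takes the critical value -1.895 from the text rather than computing it. Variable names are illustrative.

```python
import math
import statistics

# Differences ("after" - "before") listed in the example
diffs = [0.2, -4.1, -1.6, -1.8, -3.2, -2.0, -2.9, -9.6]

n = len(diffs)   # number of pairs
df = n - 1       # degrees of freedom

x_bar_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_obs = (x_bar_d - 0) / (s_d / math.sqrt(n))   # mu_d = 0 at the boundary of H0

t_crit = -1.895                  # left-tailed, alpha = 0.05, df = 7 (from the text)
reject_h0 = t_obs < t_crit       # reject when the statistic falls in the left tail

print(f"df = {df}, t_obs = {t_obs:.2f}, reject H0: {reject_h0}")
```

Because the test is left-tailed, the rejection rule is `t_obs < t_crit` with a negative critical value; comparing absolute values would give the same decision here.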

Example \(\PageIndex{3}\)

A college football coach was interested in whether the college's strength development class increased his players' maximum lift (in pounds) on the bench press exercise. He asked four of his players to participate in a study. The amount of weight they could each lift was recorded before they took the strength development class. After completing the class, the amount of weight they could each lift was again measured. The data are as follows:

Weight (in pounds)	Player 1	Player 2	Player 3	Player 4
Amount of weight lifted prior to the class	205	241	338	368
Amount of weight lifted after the class	295	252	330	360

Table \(\PageIndex{3}\)

Answer

The coach wants to know if the strength development class makes his players stronger, on average.
Record the differences data. Calculate the differences by subtracting the amount of weight lifted prior to the class from the weight lifted after completing the class. The data for the differences are: \(\{90, 11, -8, -8\}\).

\(\overline{x}_{d}=21.3, s_{d}=46.7\)

Using the difference data, this becomes a test of a single mean.

Define the random variable: \(\overline{x}_{d}\) = the mean difference in the maximum lift per player.

The distribution for the hypothesis test is a Student's t with 3 degrees of freedom.

\(H_{0} : \mu_{d} \leq 0, H_{a} : \mu_{d}>0\)

Normal distribution curve with values of 0 and 21.3. A vertical upward line extends from 21.3 to the curve and the p-value is indicated in the area to the right of this value.

Calculate the test statistic and look up the critical value: The calculated value of the test statistic is 0.91. The critical value of the Student's t at the 5% level of significance, for a one-tailed test with 3 degrees of freedom, is 2.353.

Decision: If the level of significance is 5%, we cannot reject the null hypothesis, because the calculated value of the test statistic is not in the tail.

What is the conclusion?

At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the strength development class helped to make the players stronger, on average.
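The figures in this example can be checked the same way. This sketch uses the four players' data from the table and takes the critical value 2.353 from the text; the variable names are illustrative.

```python
import math
import statistics

prior = [205, 241, 338, 368]   # weight lifted before the class (pounds)
post = [295, 252, 330, 360]    # weight lifted after the class (pounds)

# Weight after the class minus weight before the class, per player
diffs = [a - p for p, a in zip(prior, post)]

n = len(diffs)
x_bar_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_obs = (x_bar_d - 0) / (s_d / math.sqrt(n))   # mu_d = 0 at the boundary of H0

t_crit = 2.353                 # right-tailed, alpha = 0.05, df = 3 (from the text)
reject_h0 = t_obs > t_crit

print(f"differences = {diffs}")
print(f"x_bar_d = {x_bar_d:.2f}, s_d = {s_d:.1f}, t_obs = {t_obs:.2f}")
print(f"reject H0: {reject_h0}")
```

With only four pairs, one large difference (Player 1) inflates \(s_d\) enough that the sample mean difference is not far from zero in standard-error terms.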