# 10.9: Chapter 13 Practice

## 13.1 The Correlation Coefficient r

**1**.

In order to have a correlation coefficient between traits \(A\) and \(B\), it is necessary to have:

- one group of subjects, some of whom possess characteristics of trait \(A\), the remainder possessing those of trait \(B\)
- measures of trait \(A\) on one group of subjects and of trait \(B\) on another group
- two groups of subjects, one which could be classified as \(A\) or not \(A\), the other as \(B\) or not \(B\)

**2**.

Define the Correlation Coefficient and give a unique example of its use.

**3**.

If the correlation between age of an auto and money spent for repairs is +.90

- 81% of the variation in the money spent for repairs is explained by the age of the auto
- 81% of money spent for repairs is unexplained by the age of the auto
- 90% of the money spent for repairs is explained by the age of the auto
- none of the above

**4**.

Suppose that college grade-point average and verbal portion of an IQ test had a correlation of .40. What percentage of the variance do these two have in common?
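Questions of this kind turn on the coefficient of determination, \(r^2\), which is the proportion of variance two variables share. The following is a minimal Python sketch for checking the arithmetic; the function name `shared_variance` is illustrative and not part of the text.

```python
def shared_variance(r: float) -> float:
    """Return r squared, the proportion of variance two variables share."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


# Correlations quoted in the questions above.
for r in (0.40, 0.90):
    print(f"r = {r:+.2f} -> r^2 = {shared_variance(r):.2f}")
```

Because \(r\) is squared, the result can never be negative, whatever the sign of the correlation.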

**5**.

True or false? If false, explain why: The coefficient of determination can have values between -1 and +1.

**6**.

True or False: Whenever \(r\) is calculated on the basis of a sample, the value which we obtain for \(r\) is only an estimate of the true correlation coefficient which we would obtain if we calculated it for the entire population.

**7**.

Under a "scatter diagram" there is a notation that the coefficient of correlation is .10. What does this mean?

- plus and minus 10% from the means includes about 68% of the cases
- one-tenth of the variance of one variable is shared with the other variable
- one-tenth of one variable is caused by the other variable
- on a scale from -1 to +1, the degree of linear relationship between the two variables is +.10

**8**.

The correlation coefficient for \(X\) and \(Y\) is known to be zero. We then can conclude that:

- \(X\) and \(Y\) have standard distributions
- the variances of \(X\) and \(Y\) are equal
- there exists no relationship between \(X\) and \(Y\)
- there exists no linear relationship between \(X\) and \(Y\)
- none of these
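To see what a zero correlation does and does not rule out, it can help to compute \(r\) for a small made-up data set. The sketch below uses hypothetical values (not from the text) in which \(Y\) is completely determined by \(X\) through \(Y = X^2\); the helper `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient from paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y depends on X exactly, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

Here \(r\) comes out as zero even though \(Y\) is an exact function of \(X\), because \(r\) measures only the linear part of a relationship.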

**9**.

What would you guess the value of the correlation coefficient to be for the pair of variables: "number of man-hours worked" and "number of units of work completed"?

**10**.

In a given group, the correlation between height measured in feet and weight measured in pounds is +.68. Which of the following would alter the value of \(r\)?

## 13.2 Testing the Significance of the Correlation Coefficient

**11**.

Define a \(t\) Test of a Regression Coefficient, and give a unique example of its use.

**12**.

The correlation between scores on a neuroticism test and scores on an anxiety test is high and positive; therefore

## 13.3 Linear Equations

**13**.

True or False? If False, correct it: Suppose a 95% confidence interval for the slope \(\beta\) of the straight line regression of \(Y\) on \(X\) is given by \(-3.5 < \beta < -0.5\). Then a two-sided test of the hypothesis \(H_{0} : \beta=-1\) would result in rejection of \(H_0\) at the 1% level of significance.

**14**.

True or False: It is safer to interpret correlation coefficients as measures of association rather than causation because of the possibility of spurious correlation.

**15**.

We are interested in finding the linear relation between the number of widgets purchased at one time and the cost per widget. The following data has been obtained:

\(X\): Number of widgets purchased – 1, 3, 6, 10, 15

\(Y\): Cost per widget (in dollars) – 55, 52, 46, 32, 25

Suppose the regression line is \(\hat{y}=-2.5 x+60\). We compute the average price per widget if 30 are purchased and observe which of the following?

- \(\hat{y}=15 \text { dollars }\); obviously, we are mistaken; the prediction \(\hat y\) is actually +15 dollars.
- \(\hat{y}=15 \text { dollars }\), which seems reasonable judging by the data.
- \(\hat{y}=-15 \text { dollars }\), which is obvious nonsense. The regression line must be incorrect.
- \(\hat{y}=-15 \text { dollars }\), which is obvious nonsense. This reminds us that predicting \(Y\) outside the range of \(X\) values in our data is a very poor practice.
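The arithmetic behind this question can be checked by evaluating the stated line at \(x = 30\) and comparing that value of \(x\) with the observed range. This is a small sketch under the question's own assumption that the line is \(\hat{y}=-2.5x+60\); the function names are illustrative.

```python
# Observed purchase sizes from the question.
observed_x = [1, 3, 6, 10, 15]


def predict_cost(x, slope=-2.5, intercept=60.0):
    """Evaluate the regression line stated in the question."""
    return slope * x + intercept


def is_extrapolation(x, xs):
    """True when x lies outside the range of the observed X values."""
    return x < min(xs) or x > max(xs)


x_new = 30
y_hat = predict_cost(x_new)
print(f"y-hat at x = {x_new}: {y_hat}")
print(f"outside observed range: {is_extrapolation(x_new, observed_x)}")
```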

**16**.

Discuss briefly the distinction between correlation and causality.

**17**.

True or False: If \(r\) is close to + or -1, we shall say there is a strong correlation, with the tacit understanding that we are referring to a linear relationship and nothing else.

## 13.4 The Regression Equation

**18**.

Suppose that you have at your disposal the information below for each of 30 drivers. Propose a model (including a very brief indication of symbols used to represent independent variables) to explain how miles per gallon vary from driver to driver on the basis of the factors measured.

Information:

- miles driven per day
- weight of car
- number of cylinders in car
- average speed
- miles per gallon
- number of passengers

**19**.

Consider a sample least squares regression analysis between a dependent variable (\(Y\)) and an independent variable (\(X\)). A sample correlation coefficient of −1 (minus one) tells us that

- there is no relationship between \(Y\) and \(X\) in the sample
- there is no relationship between \(Y\) and \(X\) in the population
- there is a perfect negative relationship between \(Y\) and \(X\) in the population
- there is a perfect negative relationship between \(Y\) and \(X\) in the sample.

**20**.

In correlational analysis, when the points scatter widely about the regression line, this means that the correlation is

## 13.5 Interpretation of Regression Coefficients: Elasticity and Logarithmic Transformation

**21**.

In a linear regression, why do we need to be concerned with the range of the independent (\(X\)) variable?

**22**.

Suppose one collected the following information where \(X\) is diameter of tree trunk and \(Y\) is tree height.

| \(X\) | \(Y\) |
|---|---|
| 4 | 8 |
| 2 | 4 |
| 8 | 18 |
| 6 | 22 |
| 10 | 30 |
| 6 | 8 |

Regression equation: \(\hat{y}_{i}=-3.6+3.1 \cdot X_{i}\)

What is your estimate of the average height of all trees having a trunk diameter of 7 inches?
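One way to check the stated regression equation is to refit it from the six data pairs with the usual least squares formulas, \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below does this in plain Python; the variable and function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the simple least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Trunk diameters (X) and tree heights (Y) from the question.
diameters = [4, 2, 8, 6, 10, 6]
heights = [8, 4, 18, 22, 30, 8]

b0, b1 = least_squares(diameters, heights)
estimate = b0 + b1 * 7  # predicted mean height at a diameter of 7
print(f"intercept = {b0:.1f}, slope = {b1:.1f}, estimate = {estimate:.1f}")
```

A diameter of 7 lies inside the observed range of 2 to 10, so this prediction is an interpolation rather than an extrapolation.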

**23**.

The manufacturers of a chemical used in flea collars claim that under standard test conditions each additional unit of the chemical will bring about a reduction of 5 fleas (i.e., where \(X_{j}=\text{amount of chemical}\) and \(Y_{j}=B_{0}+B_{1} \cdot X_{j}+E_{j}\), \(H_0: B_1=-5\)).

Suppose that a test has been conducted and results from a computer include:

Intercept = 60

Slope = −4

Standard error of the regression coefficient = 1.0

Degrees of Freedom for Error = 2000

95% Confidence Interval for the slope: (−5.96, −2.04)

Is this evidence consistent with the claim that the number of fleas is reduced at a rate of 5 fleas per unit chemical?
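The test statistic for a hypothesized slope is \(t = (b_1 - B_1)/S_{b_1}\). The sketch below computes it from the output above and rebuilds the 95% interval. It assumes that with 2000 degrees of freedom the \(t\) critical value is close enough to the standard normal value to use the latter; that approximation is a choice made for this sketch, not part of the question.

```python
from statistics import NormalDist

slope = -4.0          # estimated slope from the output
std_error = 1.0       # standard error of the regression coefficient
claimed_slope = -5.0  # value under H0

# t statistic for H0: B1 = -5
t_stat = (slope - claimed_slope) / std_error

# Assumption: with 2000 error df, the normal quantile stands in for t.
z_crit = NormalDist().inv_cdf(0.975)

ci = (slope - z_crit * std_error, slope + z_crit * std_error)
reject = abs(t_stat) > z_crit

print(f"t = {t_stat:.2f}, critical value = {z_crit:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```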

## 13.6 Predicting with a Regression Equation

**24**.

True or False? If False, correct it: Suppose you are performing a simple linear regression of \(Y\) on \(X\) and you test the hypothesis that the slope \(\beta\) is zero against a two-sided alternative. You have \(n=25\) observations and your computed test (\(t\)) statistic is 2.6. Then your P-value is given by \(.01 < P < .02\), which gives borderline significance (i.e. you would reject \(H_0\) at \(\alpha=.02\) but fail to reject \(H_0\) at \(\alpha=.01\)).
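A statement like this is normally checked against a \(t\) table with \(n-2\) degrees of freedom. As an alternative, the sketch below approximates the two-sided P-value by numerically integrating the Student's \(t\) density with Simpson's rule, using only the Python standard library. The function names and the number of integration steps are choices made for this sketch.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value; integrates the density from 0 to |t| (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # Simpson's rule estimate of P(0 < T < |t|)
    return 2 * (0.5 - area)       # both tails


n = 25
df = n - 2                        # simple linear regression: n - 2 error df
p_value = two_sided_p(2.6, df)
print(f"df = {df}, two-sided P-value = {p_value:.4f}")
```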

**25**.

An economist is interested in the possible influence of "Miracle Wheat" on the average yield of wheat in a district. To investigate, he fits a linear regression of average yield per year against year after introduction of "Miracle Wheat" for a ten-year period.

The fitted trend line is

\(\hat{y}_{j}=80+1.5 \cdot X_{j}\)

(\(Y_j\): Average yield in the \(j\)th year after introduction)

(\(X_j\): the \(j\)th year after introduction).

- What is the estimated average yield for the fourth year after introduction?
- Do you want to use this trend line to estimate yield for, say, 20 years after introduction? Why? What would your estimate be?

**26**.

An interpretation of \(r=0.5\) is that the following part of the \(Y\)-variation is associated with variation in \(X\):

**27**.

Which of the following values of \(r\) indicates the most accurate prediction of one variable from another?

## 13.7 How to Use Microsoft Excel® for Regression Analysis

**28**.

A computer program for multiple regression has been used to fit \(\hat{y}_{j}=b_{0}+b_{1} \cdot X_{1 j}+b_{2} \cdot X_{2 j}+b_{3} \cdot X_{3 j}\).

Part of the computer output includes:

| \(i\) | \(b_i\) | \(S_{b_i}\) |
|---|---|---|
| 0 | 8 | 1.6 |
| 1 | 2.2 | 0.24 |
| 2 | -0.72 | 0.32 |
| 3 | 0.005 | 0.002 |

- Calculation of confidence interval for \(b_2\) consists of _______ \(\pm\) (a Student's \(t\) value) (_______).
- The confidence level for this interval is reflected in the value used for _______.
- The degrees of freedom available for estimating the variance are directly concerned with the value used for _______.

**29**.

An investigator has used a multiple regression program on 20 data points to obtain a regression equation with 3 variables. Part of the computer output is:

| Variable | Coefficient | Standard Error of \(b_i\) |
|---|---|---|
| 1 | 0.45 | 0.21 |
| 2 | 0.80 | 0.10 |
| 3 | 3.10 | 0.86 |

- 0.80 is an estimate of ___________.
- 0.10 is an estimate of ___________.
- Assuming the responses satisfy the normality assumption, we can be 95% confident that the value of \(\beta_2\) is in the interval _______ \(\pm\) [\(t_{.025} \cdot\) _______], where \(t_{.025}\) is the critical value of the Student's \(t\) distribution with ____ degrees of freedom.