# 10.9: Chapter 10 Homework

## 10.2 The Correlation Coefficient *r*

**1**. In order to have a correlation coefficient between traits \(A\) and \(B\), it is necessary to have:

- one group of subjects, some of whom possess characteristics of trait \(A\), the remainder possessing those of trait \(B\)
- measures of trait \(A\) on one group of subjects and of trait \(B\) on another group
- two groups of subjects, one which could be classified as \(A\) or not \(A\), the other as \(B\) or not \(B\)

**2**. Define the correlation coefficient and give a unique example of its use.

**3**. If the correlation between the age of an auto and the money spent for repairs is +.90, then:

- 81% of the variation in the money spent for repairs is explained by the age of the auto
- 81% of money spent for repairs is unexplained by the age of the auto
- 90% of the money spent for repairs is explained by the age of the auto
- none of the above

**4**. Suppose that college grade-point average and verbal portion of an IQ test had a correlation of .40. What percentage of the variance do these two have in common?

- 20
- 16
- 40
- 80

**5**. True or false? If false, explain why: The coefficient of determination can have values between -1 and +1.

**6**. True or False: Whenever \(r\) is calculated on the basis of a sample, the value which we obtain for \(r\) is only an estimate of the true correlation coefficient which we would obtain if we calculated it for the entire population.

**7**. Under a "scatter diagram" there is a notation that the coefficient of correlation is .10. What does this mean?

- plus and minus 10% from the means includes about 68% of the cases
- one-tenth of the variance of one variable is shared with the other variable
- one-tenth of one variable is caused by the other variable
- on a scale from -1 to +1, the degree of linear relationship between the two variables is +.10

**8**. The correlation coefficient for \(X\) and \(Y\) is known to be zero. We then can conclude that:

- \(X\) and \(Y\) have standard distributions
- the variances of \(X\) and \(Y\) are equal
- there exists no relationship between \(X\) and \(Y\)
- there exists no linear relationship between \(X\) and \(Y\)
- none of these

**9**. What would you guess the value of the correlation coefficient to be for the pair of variables: "number of hours worked" and "number of units of work completed"?

- Approximately 0.9
- Approximately 0.4
- Approximately 0.0
- Approximately -0.4
- Approximately -0.9

**10**. In a given group, the correlation between height measured in feet and weight measured in pounds is +.68. Which of the following would alter the value of \(r\)?

- height is expressed in centimeters.
- weight is expressed in kilograms.
- both of the above will affect \(r\).
- neither of the above changes will affect \(r\).
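One way to explore question 10 empirically is to compute \(r\) before and after a change of units. The sketch below is only an illustration: the heights and weights are made up (they are not data from this chapter), and `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical measurements, invented for illustration only.
height_ft = [5.0, 5.4, 5.8, 6.0, 6.2]
weight_lb = [120, 150, 145, 180, 175]

# Convert units: 1 ft = 30.48 cm, 1 lb = 0.45359237 kg.
height_cm = [h * 30.48 for h in height_ft]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_original = pearson_r(height_ft, weight_lb)
r_converted = pearson_r(height_cm, weight_kg)
print(r_original, r_converted)
```

Comparing the two printed values shows how \(r\) responds when either variable is rescaled by a positive constant.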

## 10.3 Testing the Significance of the Correlation Coefficient

**11**. Use the dataset below to determine whether there is a significant correlation between the following monthly returns. Use the 95% confidence level.

| Month | Apple Inc. | S&P 500 ETF | Southern California Edison |
| --- | --- | --- | --- |
| Jan | 1 | 1 | 6 |
| Feb | 4 | 3 | 5 |
| Mar | 10 | 1 | 2 |
| Apr | 6 | 4 | 4 |
| May | -13 | -6 | -3 |
| Jun | 13 | 6 | 1 |
| Jul | 8 | 2 | 5 |
| Aug | -2 | -2 | 3 |
| Sep | 7 | 1 | 0 |
| Oct | 11 | 2 | -4 |
| Nov | 7 | 4 | 0 |
| Dec | 10 | 2 | 5 |

Table \(\PageIndex{1}\)

- a. Write the pair of hypotheses that would test whether there is a significant correlation between the monthly returns of Apple Inc. and S&P 500.
- b. Calculate the relevant correlation coefficient \(r\).
- c. Test the above hypotheses using test statistics. Interpret your results.
- d. Write the pair of hypotheses that would test whether there is a significant correlation between the monthly returns of Southern California Edison and S&P 500.
- e. Calculate the relevant correlation coefficient \(r\).
- f. Test the above hypotheses using test statistics. Interpret your results.
- g. Explain why there would be a difference between the results in part c. and part f.
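The calculations in question 11 can be organized as in the following minimal sketch. It assumes the usual test of \(H_0: \rho = 0\) with the statistic \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) on \(n-2\) degrees of freedom. The function names `pearson_r` and `t_statistic` and the small data set are hypothetical, chosen only for illustration; substitute the columns of the table to work the problem.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical data, not taken from the table above.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(x) - 2}")
```

The resulting \(t\) is then compared with the two-tailed critical value from a \(t\) table at the chosen significance level and \(n-2\) degrees of freedom.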

**12**. The correlation between scores on a neuroticism test and scores on an anxiety test is high and positive. Therefore, we can conclude that:

- anxiety causes neuroticism.
- those who score low on one test tend to score high on the other.
- those who score low on one test tend to score low on the other.
- no prediction from one test to the other can be meaningfully made.

## 10.4 Linear Equations

**13**. True or False? If False, correct it: Suppose a 95% confidence interval for the slope \(\beta\) of the straight line regression of \(Y\) on \(X\) is given by \(-3.5 < \beta < -0.5\). Then a two-tailed test of the hypothesis \(H_{0} : \beta=-1\) would result in rejection of \(H_0\) at the 1% level of significance.

**14**. True or False: It is safer to interpret correlation coefficients as measures of association rather than causation because of the possibility of spurious correlation.

**15**. We are interested in finding the linear relation between the number of widgets purchased at one time and the cost per widget. The following data has been obtained:

\(X\) = Number of widgets purchased: 1, 3, 6, 10, 15

\(Y\) = Cost per widget (in dollars): 55, 52, 46, 32, 25

Suppose the regression line is \(\hat{Y}=-2.5 X+60\). We compute the average price per widget if 30 are purchased and observe which of the following?

- \(\hat{Y}=15 \text { dollars }\); obviously, we are mistaken; the prediction \(\hat Y\) is actually +15 dollars.
- \(\hat{Y}=15 \text { dollars }\), which seems reasonable judging by the data.
- \(\hat{Y}=-15 \text { dollars }\), which is obvious nonsense. The regression line must be incorrect.
- \(\hat{Y}=-15 \text { dollars }\), which is obvious nonsense. This reminds us that predicting \(Y\) outside the range of \(X\) values in our data is a very poor practice.

**16**. Discuss briefly the distinction between correlation and causality.

**17**. True or False: If \(r\) is close to + or -1, we shall say there is a strong correlation, with the tacit understanding that we are referring to a linear relationship and nothing else.

## 10.5 The Regression Equation

**18**. Suppose that you have at your disposal the information below for each of 30 drivers. Propose a model (including a very brief indication of symbols used to represent independent variables) to explain how miles per gallon vary from driver to driver on the basis of the factors measured.

Information:

- miles driven per day
- weight of car
- number of cylinders in car
- average speed
- miles per gallon
- number of passengers

**19**. Consider a sample least squares regression analysis between a dependent variable (\(Y\)) and an independent variable (\(X\)). A sample correlation coefficient of −1 (minus one) tells us that

- there is no relationship between \(Y\) and \(X\) in the sample
- there is no relationship between \(Y\) and \(X\) in the population
- there is a perfect negative relationship between \(Y\) and \(X\) in the population
- there is a perfect negative relationship between \(Y\) and \(X\) in the sample.

**20**. In correlational analysis, when the points scatter widely about the regression line, this means that the correlation is

- negative.
- low.
- heterogeneous.
- between two measures that are unreliable.

**21**. In a linear regression, why do we need to be concerned with the range of the independent (\(X\)) variable?

**22**. ABC International wants to explore the relationship between the yearly marketing expenses and sales revenues (in millions USD) (see table below).

| Marketing Expenses (millions USD) | Sales Revenues (millions USD) |
| --- | --- |
| 4 | 8 |
| 2 | 4 |
| 8 | 18 |
| 6 | 22 |
| 10 | 30 |
| 6 | 8 |

Table \(\PageIndex{2}\)

- Determine the regression equation predicting yearly sales revenues from marketing expenses.
- Write the pair of hypotheses that would test whether marketing expenses are a significant predictor of sales revenues.
- Test the above hypotheses using test statistics at a 95% confidence level.
- What would be your estimate of the average sales revenues for a company that spends 15 million USD on marketing per year?
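The steps in question 22 can be sketched as follows, assuming simple least-squares regression with slope \(b = S_{xy}/S_{xx}\), intercept \(a = \bar{y} - b\bar{x}\), and a slope test statistic \(t = b / s_b\) on \(n-2\) degrees of freedom. The helper `fit_line` and the data below are hypothetical and serve only as an illustration; replace them with the table values to work the problem.

```python
from math import sqrt

def fit_line(x, y):
    """Return (intercept a, slope b, t statistic for H0: beta = 0)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = s_xy / s_xx                      # least-squares slope
    a = mean_y - b * mean_x              # least-squares intercept
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2)) / sqrt(s_xx)  # standard error of the slope
    return a, b, b / se_b

# Hypothetical data, not taken from the table above.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, t = fit_line(x, y)
print(f"y-hat = {a:.2f} + {b:.2f} x, t = {t:.3f}, df = {len(x) - 2}")
```

A prediction is obtained by substituting a value of \(X\) into the fitted line; whether that value lies inside the observed range of \(X\) matters for how much the prediction can be trusted.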

**23**. An economist is interested in the possible influence of "Miracle Wheat" on the average yield of wheat in a district. To investigate, he fits a linear regression of average yield per year against the year after introduction of "Miracle Wheat" for a ten-year period.

The fitted trend line is \(\hat{Y}_{j}=80+1.5 X_{j}\).

(\(Y_j\): average yield in the \(j\)th year after introduction)

(\(X_j\): the \(j\)th year after introduction).

- What is the estimated average yield for the fourth year after introduction?
- Do you want to use this trend line to estimate yield for, say, 20 years after introduction? Why? What would your estimate be?

**24**. An interpretation of \(r=0.5\) is that the following part of the \(Y\)-variation is associated with variation in \(X\):

- most
- half
- very little
- one quarter
- none of these

**25**. Which of the following values of \(r\) indicates the most accurate prediction of one variable from another?

- \(r=1.18\)
- \(r=-.77\)
- \(r=.68\)