# 10.9: Chapter 10 Homework

## 10.2 The Correlation Coefficient r

1. In order to have a correlation coefficient between traits $$A$$ and $$B$$, it is necessary to have:

1. one group of subjects, some of whom possess characteristics of trait $$A$$, the remainder possessing those of trait $$B$$
2. measures of trait $$A$$ on one group of subjects and of trait $$B$$ on another group
3. two groups of subjects, one which could be classified as $$A$$ or not $$A$$, the other as $$B$$ or not $$B$$

2. Define the correlation coefficient and give a unique example of its use.

3. If the correlation between the age of an auto and the money spent for repairs is +.90, then:

1. 81% of the variation in the money spent for repairs is explained by the age of the auto
2. 81% of money spent for repairs is unexplained by the age of the auto
3. 90% of the money spent for repairs is explained by the age of the auto
4. none of the above

4. Suppose that college grade-point average and verbal portion of an IQ test had a correlation of .40. What percentage of the variance do these two have in common?

1. 20
2. 16
3. 40
4. 80

5. True or false? If false, explain why: The coefficient of determination can have values between -1 and +1.

6. True or False: Whenever $$r$$ is calculated on the basis of a sample, the value which we obtain for $$r$$ is only an estimate of the true correlation coefficient which we would obtain if we calculated it for the entire population.

7. Under a "scatter diagram" there is a notation that the coefficient of correlation is .10. What does this mean?

1. plus and minus 10% from the means includes about 68% of the cases
2. one-tenth of the variance of one variable is shared with the other variable
3. one-tenth of one variable is caused by the other variable
4. on a scale from -1 to +1, the degree of linear relationship between the two variables is +.10

8. The correlation coefficient for $$X$$ and $$Y$$ is known to be zero. We then can conclude that:

1. $$X$$ and $$Y$$ have standard distributions
2. the variances of $$X$$ and $$Y$$ are equal
3. there exists no relationship between $$X$$ and $$Y$$
4. there exists no linear relationship between $$X$$ and $$Y$$
5. none of these

9. What would you guess the value of the correlation coefficient to be for the pair of variables: "number of hours worked" and "number of units of work completed"?

1. Approximately 0.9
2. Approximately 0.4
3. Approximately 0.0
4. Approximately -0.4
5. Approximately -0.9

10. In a given group, the correlation between height measured in feet and weight measured in pounds is +.68. Which of the following would alter the value of $$r$$?

1. height is expressed in centimeters.
2. weight is expressed in kilograms.
3. both of the above will affect $$r$$.
4. neither of the above changes will affect $$r$$.
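
One way to explore how $$r$$ responds to a change of units is to compute it before and after converting the measurements. The sketch below is only an illustration: the height and weight values are hypothetical (they are not data from this chapter), and `pearson_r` is a helper name introduced here.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical measurements, invented for illustration only.
heights_ft = [5.0, 5.4, 5.8, 6.0, 6.2]
weights_lb = [120, 150, 145, 180, 175]

# Convert both variables to metric units.
heights_cm = [h * 30.48 for h in heights_ft]       # 1 ft = 30.48 cm
weights_kg = [w * 0.45359237 for w in weights_lb]  # 1 lb = 0.45359237 kg

r_original = pearson_r(heights_ft, weights_lb)
r_converted = pearson_r(heights_cm, weights_kg)
print(r_original, r_converted)
```

Comparing the two printed values shows what a positive rescaling of either variable does to $$r$$.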

## 10.3 Testing the Significance of the Correlation Coefficient

11. Use the dataset below to determine whether there is a significant correlation between the following monthly returns. Use the 95% confidence level.

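| Month | Apple Inc. | S&P 500 ETF | Southern California Edison |
|-------|------------|-------------|----------------------------|
| Jan   | 1          | 1           | 6                          |
| Feb   | 4          | 3           | 5                          |
| Mar   | 10         | 1           | 2                          |
| Apr   | 6          | 4           | 4                          |
| May   | -13        | -6          | -3                         |
| Jun   | 13         | 6           | 1                          |
| Jul   | 8          | 2           | 5                          |
| Aug   | -2         | -2          | 3                          |
| Sep   | 7          | 1           | 0                          |
| Oct   | 11         | 2           | -4                         |
| Nov   | 7          | 4           | 0                          |
| Dec   | 10         | 2           | 5                          |

Table 1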

1. Write the pair of hypotheses that would test whether there is a significant correlation between the monthly returns of Apple Inc. and S&P 500.
2. Calculate the relevant correlation coefficient $$r$$.
3. Test the above hypotheses using test statistics. Interpret your results.
4. Write the pair of hypotheses that would test whether there is a significant correlation between the monthly returns of Southern California Edison and S&P 500.
5. Calculate the relevant correlation coefficient $$r$$.
6. Test the above hypotheses using test statistics. Interpret your results.
7. Explain why there would be a difference between the results in part 3 and part 6.
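
The calculations in this exercise can be sketched in a few lines of Python. This is a minimal sketch rather than a prescribed method: it assumes the usual test statistic $$t = r\sqrt{n-2}/\sqrt{1-r^2}$$ with $$n-2$$ degrees of freedom, and it takes the two-tailed critical value for $$df = 10$$ at the 5% level from a $$t$$-table. The names `pearson_r`, `t_statistic`, and `T_CRITICAL` are introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Monthly returns from the table above (Jan through Dec).
apple = [1, 4, 10, 6, -13, 13, 8, -2, 7, 11, 7, 10]
sp500 = [1, 3, 1, 4, -6, 6, 2, -2, 1, 2, 4, 2]
edison = [6, 5, 2, 4, -3, 1, 5, 3, 0, -4, 0, 5]

# Two-tailed critical value, alpha = 0.05, df = 12 - 2 = 10 (from a t-table).
T_CRITICAL = 2.228

for name, series in [("Apple Inc.", apple),
                     ("Southern California Edison", edison)]:
    r = pearson_r(series, sp500)
    t = t_statistic(r, len(series))
    reject = abs(t) > T_CRITICAL
    print(f"{name} vs S&P 500: r = {r:.3f}, t = {t:.3f}, reject H0: {reject}")
```

The interpretation of each result, and the explanation of any difference between the two stocks, is still left to the reader.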

12. The correlation between scores on a neuroticism test and scores on an anxiety test is high and positive. Therefore, we can conclude that:

1. anxiety causes neuroticism.
2. those who score low on one test tend to score high on the other.
3. those who score low on one test tend to score low on the other.
4. no prediction from one test to the other can be meaningfully made.

## 10.4 Linear Equations

13. True or False? If False, correct it: Suppose a 95% confidence interval for the slope $$\beta$$ of the straight line regression of $$Y$$ on $$X$$ is given by $$-3.5 < \beta < -0.5$$. Then a two-tailed test of the hypothesis $$H_{0} : \beta=-1$$ would result in rejection of $$H_0$$ at the 1% level of significance.

14. True or False: It is safer to interpret correlation coefficients as measures of association rather than causation because of the possibility of spurious correlation.

15. We are interested in finding the linear relation between the number of widgets purchased at one time and the cost per widget. The following data has been obtained:

$$X$$ = Number of widgets purchased: 1, 3, 6, 10, 15

$$Y$$ = Cost per widget (in dollars): 55, 52, 46, 32, 25

Suppose the regression line is $$\hat{Y}=-2.5 X+60$$. We compute the average price per widget if 30 are purchased and observe which of the following?

1. $$\hat{Y}=15 \text { dollars }$$; obviously, we are mistaken; the prediction $$\hat Y$$ is actually +15 dollars.
2. $$\hat{Y}=15 \text { dollars }$$, which seems reasonable judging by the data.
3. $$\hat{Y}=-15 \text { dollars }$$, which is obvious nonsense. The regression line must be incorrect.
4. $$\hat{Y}=-15 \text { dollars }$$, which is obvious nonsense. This reminds us that predicting $$Y$$ outside the range of $$X$$ values in our data is a very poor practice.
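
Before trusting a prediction, it can help to check whether the input lies inside the range of observed $$X$$ values. The sketch below evaluates the regression line supposed in the question and flags extrapolation; `predict` and `is_extrapolation` are hypothetical helper names introduced here.

```python
def predict(x, slope=-2.5, intercept=60.0):
    """Evaluate the supposed regression line Y-hat = -2.5 X + 60."""
    return slope * x + intercept

def is_extrapolation(x, observed_x):
    """True when x falls outside the range of the observed X values."""
    return not (min(observed_x) <= x <= max(observed_x))

# Observed numbers of widgets purchased, from the question.
widgets = [1, 3, 6, 10, 15]

x_new = 30
print("prediction:", predict(x_new))
print("outside observed range:", is_extrapolation(x_new, widgets))
```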

16. Discuss briefly the distinction between correlation and causality.

17. True or False: If $$r$$ is close to + or -1, we shall say there is a strong correlation, with the tacit understanding that we are referring to a linear relationship and nothing else.

## 10.5 The Regression Equation

18. Suppose that you have at your disposal the information below for each of 30 drivers. Propose a model (including a very brief indication of symbols used to represent independent variables) to explain how miles per gallon vary from driver to driver on the basis of the factors measured.

Information:

1. miles driven per day
2. weight of car
3. number of cylinders in car
4. average speed
5. miles per gallon
6. number of passengers

19. Consider a sample least squares regression analysis between a dependent variable ($$Y$$) and an independent variable ($$X$$). A sample correlation coefficient of −1 (minus one) tells us that

1. there is no relationship between $$Y$$ and $$X$$ in the sample
2. there is no relationship between $$Y$$ and $$X$$ in the population
3. there is a perfect negative relationship between $$Y$$ and $$X$$ in the population
4. there is a perfect negative relationship between $$Y$$ and $$X$$ in the sample.

20. In correlational analysis, when the points scatter widely about the regression line, this means that the correlation is

1. negative.
2. low.
3. heterogeneous.
4. between two measures that are unreliable.

21. In a linear regression, why do we need to be concerned with the range of the independent ($$X$$) variable?

22. ABC International wants to explore the relationship between the yearly marketing expenses and sales revenues (in millions USD) (see table below).

| Marketing Expenses | Sales Revenues |
|--------------------|----------------|
| 4                  | 8              |
| 2                  | 4              |
| 8                  | 18             |
| 6                  | 22             |
| 10                 | 30             |
| 6                  | 8              |

Table 2

1. Determine the regression equation predicting yearly sales revenues from marketing expenses.
2. Write the pair of hypotheses that would test whether marketing expenses are a significant predictor of sales revenues.
3. Test the above hypotheses using test statistics at a 95% confidence level.
4. What would be your estimate of the average sales revenues for a company that spends 15 million USD on marketing per year?
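
A least-squares fit for this exercise can be sketched as follows. This is one possible approach, not the only one: it assumes the standard formulas $$b = S_{xy}/S_{xx}$$ and $$a = \bar{y} - b\bar{x}$$, and tests $$H_0: \beta = 0$$ with $$t = b / s_b$$ on $$n-2$$ degrees of freedom. The function names are introduced here for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def slope_t_statistic(xs, ys, slope, intercept):
    """t statistic for H0: beta = 0, with n - 2 degrees of freedom."""
    n = len(xs)
    mean_x = sum(xs) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    # Sum of squared residuals about the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / s_xx)
    return slope / std_err

# Data from the table above (millions USD).
marketing = [4, 2, 8, 6, 10, 6]
sales = [8, 4, 18, 22, 30, 8]

slope, intercept = fit_line(marketing, sales)
t = slope_t_statistic(marketing, sales, slope, intercept)
prediction = intercept + slope * 15  # estimate for 15 million USD of marketing

print(f"Y-hat = {intercept:.2f} + {slope:.2f} X")
print(f"t = {t:.3f} (compare with the t-table value for df = 4)")
print(f"predicted sales at X = 15: {prediction:.2f}")
```

Note that 15 lies outside the observed marketing expenses, which range from 2 to 10, so the last figure is an extrapolation.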

23. An economist is interested in the possible influence of "Miracle Wheat" on the average yield of wheat in a district. To investigate, he fits a linear regression of average yield per year against the year after introduction of "Miracle Wheat" for a ten-year period.

The fitted trend line is $$\hat{Y}_{j}=80+1.5 X_{j}$$.

($$Y_j$$: average yield in the $$j$$th year after introduction)

($$X_j$$: the $$j$$th year after introduction)

1. What is the estimated average yield for the fourth year after introduction?
2. Do you want to use this trend line to estimate yield for, say, 20 years after introduction? Why? What would your estimate be?

24. An interpretation of $$r=0.5$$ is that the following proportion of the variation in $$Y$$ is associated with variation in $$X$$:

1. most
2. half
3. very little
4. one quarter
5. none of these

25. Which of the following values of $$r$$ indicates the most accurate prediction of one variable from another?

1. $$r=1.18$$
2. $$r=-.77$$
3. $$r=.68$$

10.9: Chapter 10 Homework is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.