7.5: Exercises
Part A: Conceptual Questions
Classification Trees
- Explain the concept of information gain and its role in building classification trees. (A computation sketch of information gain, entropy, and Gini impurity follows this list.)
- What is Gini impurity, and how is it used to determine splits in decision trees?
- Compare entropy and Gini impurity. When might one be preferred over the other?
- Define overfitting in the context of classification trees. How can it be avoided?
- What role does pruning play in decision tree models?
- Describe how missing values are handled in classification trees.
- How does a decision tree handle categorical vs. continuous predictors?
- What are the limitations of decision trees for classification tasks?
- Discuss the trade-off between tree depth and prediction accuracy.
- Why might a random forest perform better than a single classification tree?
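The first three questions above rest on the same arithmetic. As a concrete reference point, here is a minimal sketch, assuming Python with NumPy, that computes entropy, Gini impurity, and the information gain of one candidate split on a toy churn example; the labels and the 'Age > 35' rule are illustrative, not taken from a chapter dataset.

```python
# Minimal sketch of the impurity measures behind the first few questions:
# entropy, Gini impurity, and information gain for one candidate binary split.
import numpy as np

def entropy(y):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    """Gini impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(y, mask):
    """Entropy reduction from splitting y into y[mask] and y[~mask]."""
    n = len(y)
    left, right = y[mask], y[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(y) - weighted

# Toy example: churn labels (1 = churn) split on a hypothetical 'Age > 35' rule.
y = np.array([1, 1, 0, 0, 1, 0, 0, 0])
age = np.array([25, 30, 40, 45, 28, 50, 38, 60])
print(gini(y), entropy(y), information_gain(y, age > 35))
```

Running the sketch on the toy arrays shows how a split that separates the classes well drives the weighted child impurity down and the information gain up.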
Regression Trees
- What is the main splitting criterion used in regression trees?
- Explain how mean squared error (MSE) is used in regression tree building. (A short splitting sketch based on MSE follows this list.)
- Compare regression trees and linear regression. In what situations might a regression tree perform better?
- What are the risks of building very deep regression trees?
- What does pruning accomplish in a regression tree?
- How do regression trees handle outliers in the target variable?
- How are missing values treated in regression trees?
- Discuss the use of regression trees in time series forecasting.
- Can regression trees handle multicollinearity? Why or why not?
- How is a prediction made at a leaf node of a regression tree?
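For the MSE questions above, the sketch below, again assuming Python with NumPy, scans candidate thresholds on a single predictor and keeps the split with the lowest weighted child MSE; the leaf prediction is simply the mean of the training targets that land in the leaf. The delivery-time numbers are made up for illustration.

```python
# Sketch of the MSE splitting criterion: try each threshold on one predictor
# and pick the split that minimizes the weighted MSE of the two children.
import numpy as np

def weighted_mse(y_left, y_right):
    n = len(y_left) + len(y_right)
    sse = ((y_left - y_left.mean()) ** 2).sum() + ((y_right - y_right.mean()) ** 2).sum()
    return sse / n

def best_split(x, y):
    """Return (threshold, weighted MSE) of the best binary split on x."""
    best_t, best_err = None, np.inf
    for t in np.unique(x)[:-1]:          # candidate thresholds (exclude the max)
        left, right = y[x <= t], y[x > t]
        err = weighted_mse(left, right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy data: predict delivery time (hours) from distance (km).
distance = np.array([5, 10, 20, 40, 60, 80], dtype=float)
hours    = np.array([1.0, 1.2, 1.5, 3.0, 3.5, 4.0])
print(best_split(distance, hours))
```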
Part B: Interpretation Questions
Classification Trees
- Interpret the meaning of a leaf node that classifies most instances as 'No Churn'.
- Explain what a split on 'Age > 35' indicates in a decision tree.
- If a model has high training accuracy but low test accuracy, what does this suggest?
- How do you assess feature importance from a decision tree model? (A code sketch follows this list.)
- What does it mean if two branches have similar accuracy but very different depths?
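For the feature-importance question above, one common (though not the only) approach is to read scikit-learn's impurity-based importances from a fitted tree. The sketch below assumes scikit-learn is the modeling tool and uses a built-in stand-in dataset rather than any chapter file.

```python
# Read impurity-based feature importances from a fitted classification tree.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer(as_frame=True)        # stand-in dataset
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

importances = (
    pd.Series(tree.feature_importances_, index=X.columns)
      .sort_values(ascending=False)
)
print(importances.head(10))   # features contributing the most impurity reduction
```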
Regression Trees
- Explain how to interpret the predicted value at a leaf node in a regression tree.
- A regression tree has a split on 'Sales Volume > 1000'. What does this mean in business terms?
- If pruning reduces MSE on the test set but not on the training set, what does this imply? (A pruning sketch follows this list.)
- How do you interpret the importance of features in a regression tree?
- Why might a regression tree produce similar predicted values for observations that follow different paths through the tree?
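For the pruning question above, the described pattern (test MSE improves while training MSE does not) is easy to reproduce with cost-complexity pruning. The sketch below is an assumed setup using scikit-learn's `ccp_alpha` parameter on synthetic data; the alpha values are arbitrary.

```python
# Compare train/test MSE for an unpruned tree and two pruned trees.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=5, noise=25.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for alpha in [0.0, 5.0, 20.0]:            # 0.0 = unpruned tree
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:5.1f}  "
          f"train MSE={mean_squared_error(y_tr, tree.predict(X_tr)):8.1f}  "
          f"test MSE={mean_squared_error(y_te, tree.predict(X_te)):8.1f}")
```

Larger alpha values prune more aggressively, so training MSE drifts upward while test MSE often improves until the tree becomes too small.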
Part C: Hands-On Data Exercises
Classification Tree Applications
- Marketing: Predict whether a customer will respond to a promotion using age, income, prior purchases, website visits, and social media engagement. (Dataset: classification_tree_app_1.xlsx; a starter code sketch follows this list.)
- Finance: Classify loan applicants as 'Approved' or 'Denied' based on credit score, income, debt-to-income ratio, employment status, and past delinquencies. (Dataset: classification_tree_app_2.xlsx)
- Operations: Predict whether an order will be delivered late using shipping method, delivery distance, number of items, product weight, and time of order placement. (Dataset: classification_tree_app_3.xlsx)
- Sales: Predict whether a sales lead will convert using variables such as industry, lead source, sales rep experience, contact frequency, and region. (Dataset: classification_tree_app_4.xlsx)
- Customer Service: Classify complaint type using service ticket category, issue description, channel of contact, customer tenure, and service history. (Dataset: classification_tree_app_5.xlsx)
- Quality Control: Predict product defect classification using shift time, machine ID, operator experience, batch temperature, and pressure levels. (Dataset: classification_tree_app_6.xlsx)
- Credit Risk Management: Classify clients as low, medium, or high risk using financial ratios, payment history, account age, credit utilization, and number of recent inquiries. (Dataset: classification_tree_app_7.xlsx)
- Entertainment: Predict whether a user will skip a video ad using genre, video length, prior skips, time of day, and ad relevance rating. (Dataset: classification_tree_app_8.xlsx)
- Manufacturing: Predict machine failure category using vibration level, run time, humidity, load pressure, and last maintenance date. (Dataset: classification_tree_app_9.xlsx)
- Accounting: Classify transactions as legitimate or potentially fraudulent using transaction amount, frequency, category, location, and account history. (Dataset: classification_tree_app_10.xlsx)
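As a starting point for the first exercise in this list (and, with the obvious substitutions, the others), the sketch below assumes Python with pandas and scikit-learn. The column names, including the target "Responded", are placeholders: check the actual headers in classification_tree_app_1.xlsx and adjust before running.

```python
# Starter workflow for the marketing classification exercise.
# NOTE: column names below are assumed placeholders, not the file's real headers.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_excel("classification_tree_app_1.xlsx")

feature_cols = ["Age", "Income", "PriorPurchases",          # assumed names
                "WebsiteVisits", "SocialMediaEngagement"]
X = pd.get_dummies(df[feature_cols], drop_first=True)       # encode any categoricals
y = df["Responded"]                                          # assumed target column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                           random_state=1, stratify=y)
tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)

print("Test accuracy:", accuracy_score(y_te, tree.predict(X_te)))
print(confusion_matrix(y_te, tree.predict(X_te)))
```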
Regression Tree Applications
- Marketing: Predict customer lifetime value using acquisition channel, age, purchase frequency, recency, and total spend. (Dataset: regression_tree_app_1.xlsx; a starter code sketch follows this list.)
- Finance: Estimate stock price using daily return, volume, moving averages, volatility index, and macroeconomic indicators. (Dataset: regression_tree_app_2.xlsx)
- Operations: Predict delivery time using variables such as delivery distance, vehicle type, weather conditions, package size, and order time. (Dataset: regression_tree_app_3.xlsx)
- Sales: Predict monthly sales revenue from sales team size, ad spend, regional demand, competitor pricing, and seasonal factors. (Dataset: regression_tree_app_4.xlsx)
- Customer Service: Estimate average resolution time using issue type, channel, agent experience, priority level, and time to first response. (Dataset: regression_tree_app_5.xlsx)
- Quality Control: Predict number of defects using production speed, shift, material type, temperature, and humidity. (Dataset: regression_tree_app_6.xlsx)
- Credit Risk Management: Predict expected loss amount using borrower income, loan term, credit score, payment history, and current debt. (Dataset: regression_tree_app_7.xlsx)
- Entertainment: Predict daily streaming minutes per user using plan type, device type, preferred genres, historical usage, and subscription tenure. (Dataset: regression_tree_app_8.xlsx)
- Manufacturing: Estimate equipment maintenance cost using machine age, usage hours, energy consumption, failure history, and technician notes. (Dataset: regression_tree_app_9.xlsx)
- Accounting: Predict monthly expenses using number of transactions, business unit, prior period expenses, budget allocation, and seasonal adjustments. (Dataset: regression_tree_app_10.xlsx)
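A parallel starter for the first regression exercise is sketched below, again with placeholder column names standing in for whatever headers regression_tree_app_1.xlsx actually uses; the target "LifetimeValue" is an assumption.

```python
# Starter workflow for the customer-lifetime-value regression exercise.
# NOTE: column names below are assumed placeholders, not the file's real headers.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_excel("regression_tree_app_1.xlsx")

feature_cols = ["AcquisitionChannel", "Age", "PurchaseFrequency",  # assumed names
                "Recency", "TotalSpend"]
X = pd.get_dummies(df[feature_cols], drop_first=True)   # one-hot the channel
y = df["LifetimeValue"]                                  # assumed target column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
tree = DecisionTreeRegressor(max_depth=5, random_state=1).fit(X_tr, y_tr)

pred = tree.predict(X_te)
print("Test MSE:", mean_squared_error(y_te, pred))
print("Test R^2:", r2_score(y_te, pred))
```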


