
7.3: Regression Trees


    Introduction to Regression Trees

Regression trees are a type of decision tree algorithm used when the target (dependent) variable is continuous rather than categorical. Instead of classifying observations into discrete categories, regression trees predict a numerical outcome by learning decision rules from input features. At each internal node, the dataset is split on the condition that most reduces prediction error, typically measured as the sum of squared deviations from the node mean. Each terminal (leaf) node holds a predicted numeric value: the average of the target variable for the training observations that reach that node. Regression trees are intuitive, visual, and capable of capturing non-linear relationships in the data.
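To make this concrete, here is a minimal sketch of fitting and querying a regression tree, assuming Python with scikit-learn (the chapter does not prescribe a tool, and the data below are synthetic and purely illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: one numeric feature with a non-linear relationship
# to a continuous target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * np.sin(X.ravel()) + rng.normal(0, 0.3, size=200)

# Each split is chosen to minimize the sum of squared deviations;
# each leaf predicts the mean target of the training rows it contains.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.predict([[2.5], [7.0]]))  # numeric predictions, one per row

Limiting max_depth keeps the tree small enough to read, which is often the point of using a tree in the first place.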

Differences Between Regression and Classification Trees

Feature             | Classification Trees               | Regression Trees
--------------------|------------------------------------|-----------------------------------------------------
Target Variable     | Categorical                        | Continuous
Prediction Output   | Class label (e.g., Yes/No)         | Numeric value
Splitting Criteria  | Gini Index, Entropy                | Mean Squared Error (MSE), Mean Absolute Error (MAE)
Evaluation Metrics  | Accuracy, Sensitivity, Specificity | RMSE, MAE, R-squared
Example Use Case    | Churn prediction                   | Revenue prediction

    Applications of Regression Trees

    Regression trees are useful in a variety of business and analytical scenarios where the outcome variable is numeric. For instance, marketers might predict customer lifetime value, sales teams might estimate future sales based on historical data, real estate agents can assess property prices based on location and features, and logistics managers may forecast delivery times depending on route and traffic patterns. These models help organizations make informed, data-driven decisions with a visual and interpretable output.

    Image of a Regression Tree and Interpretation

    In a regression tree, each node represents a split that attempts to reduce variance in the dependent variable. The branches reflect the conditions for the splits, and the leaf nodes show the predicted numeric values.

    Consider the following hypothetical business scenario:

A subscription-based telecommunications company wants to better understand the factors influencing customer revenue so it can more accurately forecast income and tailor pricing strategies. Using historical customer data, the company built a regression tree to predict monthly revenue based on two key variables: customer tenure and monthly charges. The model first splits customers by tenure, recognizing that newer customers often have different spending patterns than long-term subscribers. Among customers with less than 12 months of tenure, monthly charges are the next most important factor: predicted average revenue is $55 for lower-charged customers and $72 for higher-charged customers. Longer-tenure customers, regardless of monthly charges, show higher predicted revenue at $95. This analysis helps the company identify high-value segments, adjust retention offers for newer customers, and fine-tune pricing models to maximize revenue potential.

The following figure illustrates the regression tree based on the example above.

[Figure: a regression tree that splits first on tenure (under vs. at least 12 months), then splits short-tenure customers on monthly charges; leaf predictions are $55, $72, and $95.]
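The same decision rules can also be written as a short Python function. The sketch below is illustrative: the 12-month tenure cutoff and the leaf values ($55, $72, $95) come from the scenario, while the $60 monthly-charge threshold is a hypothetical value added for illustration, since the text does not state one.

def predict_monthly_revenue(tenure_months, monthly_charges):
    # Root split: customer tenure (from the scenario).
    if tenure_months < 12:
        # Second split: monthly charges; the $60 cutoff is hypothetical.
        if monthly_charges < 60:
            return 55   # newer, lower-charged customers
        return 72       # newer, higher-charged customers
    return 95           # long-tenure customers, regardless of charges

print(predict_monthly_revenue(6, 45))   # -> 55
print(predict_monthly_revenue(24, 80))  # -> 95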

    Metrics to Evaluate Regression Trees

    To evaluate how well a regression tree performs, analysts commonly use the following metrics:

• RMSE (Root Mean Squared Error): The square root of the average squared difference between predicted and actual values; it penalizes large errors more heavily.
• MAE (Mean Absolute Error): The average of the absolute differences between predicted and actual values; it is less sensitive to outliers than RMSE.
• R-squared: The proportion of variance in the dependent variable explained by the model.

These metrics help determine how closely the model’s predictions match the actual values; their standard definitions are given below.
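Writing \(y_i\) for the actual value, \(\hat{y}_i\) for the prediction, \(\bar{y}\) for the mean of the actual values, and \(n\) for the number of observations:

\( \text{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \)

\( \text{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \)

\( R^2 = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \)

Lower RMSE and MAE indicate a closer fit; an \(R^2\) closer to 1 indicates that more of the variance is explained.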

    Splitting Criteria for Regression Trees

    Regression trees use variance reduction methods to determine the best split. At each node, the algorithm evaluates which split yields the greatest decrease in variability of the target variable. This is typically assessed using Mean Squared Error (MSE), which ensures that the selected split results in subsets of data with minimal variation around the mean.
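As a concrete illustration, here is a minimal pure-Python/NumPy sketch of the split search on a single numeric feature. It is an illustrative implementation, not any particular library's: it tries every midpoint between adjacent feature values and keeps the threshold with the greatest reduction in the sum of squared errors (equivalently, variance).

import numpy as np

def sse(y):
    # Sum of squared deviations from the node mean; 0 for an empty node.
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(x, y):
    # Candidate thresholds: midpoints between adjacent unique feature values.
    vals = np.sort(np.unique(x))
    best_t, best_gain, parent = None, 0.0, sse(y)
    for t in vals[:-1] + np.diff(vals) / 2:
        gain = parent - sse(y[x <= t]) - sse(y[x > t])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.0, 20.0, 21.0, 19.0])
print(best_split(x, y))  # splits at 6.5, separating the two clusters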

    Advantages and Disadvantages

    Advantages:

    • Easy to interpret and visualize
    • Can model non-linear relationships
    • No need for data normalization or scaling
    • Suitable for both small and large datasets

    Disadvantages:

• Prone to overfitting unless pruned or regularized (see the pruning sketch after this list)
    • Sensitive to minor data changes
    • May underperform compared to ensemble methods like Random Forests
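One common remedy for the first disadvantage is cost-complexity pruning. A brief sketch follows, assuming Python with scikit-learn and synthetic data; the ccp_alpha values are illustrative, not recommended settings.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = X.ravel() ** 1.5 + rng.normal(0, 1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger ccp_alpha prunes more aggressively: fewer leaves, and often
# better generalization (higher test R-squared) than an unpruned tree.
for alpha in (0.0, 0.01, 0.1):
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    print(alpha, tree.get_n_leaves(), round(tree.score(X_test, y_test), 3))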

    Summary

    Regression trees are powerful yet intuitive tools for predicting numeric outcomes using decision rule logic. They are particularly valuable when transparency and interpretability are required. However, they may suffer from instability and limited accuracy when used alone.

    To overcome these issues, ensemble methods like Random Forests build multiple trees and aggregate their predictions for more robust results. In Chapter 8, we will explore Random Forests in detail and see how they can improve performance across both regression and classification tasks.


This page titled 7.3: Regression Trees is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Elbert L. Hearon, M.B.A., M.S.
