12.10: Exercises

Part A: Conceptual Questions

K-Means Clustering

What is the objective function minimized by the K-Means algorithm?
How does the choice of K affect the clustering output?
What is the elbow method and how is it used to determine the optimal number of clusters?
What are the limitations of K-Means in terms of cluster shape and scale sensitivity?
How does the initial placement of centroids influence final clusters in K-Means?
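
As a starting point for the first two questions, the sketch below computes the within-cluster sum of squares (WCSS), the quantity K-Means tries to make small, for a fixed assignment of points to clusters. The four points and both labelings are made up for illustration, and the helper names `centroid` and `wcss` are hypothetical.

```python
from math import dist

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def wcss(points, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        total += sum(dist(p, c) ** 2 for p in members)
    return total

# Illustrative data: two tight pairs of points that sit far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss(points, [0, 0, 0, 0])   # about 104
two_clusters = wcss(points, [0, 0, 1, 1])  # about 4
print(one_cluster, two_clusters)
```

Splitting the data into its two natural groups cuts WCSS sharply. Adding more clusters can never raise the best achievable WCSS, which is why the elbow method looks for the value of K where the improvement levels off rather than for the smallest WCSS.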

Hierarchical Clustering

What is the difference between agglomerative and divisive hierarchical clustering?
What are linkage methods in hierarchical clustering, and how do they affect results?
How do dendrograms help in identifying natural clusters?
What are the advantages and disadvantages of hierarchical clustering compared to K-Means?
Can hierarchical clustering be used with categorical data? If so, how?
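
For the question on linkage methods, it may help to see that the same two clusters can be "near" or "far" depending on the linkage rule. The sketch below, using made-up points and hypothetical function names, computes single, complete, and average linkage distances between two small clusters.

```python
from itertools import product
from math import dist

def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

# Illustrative clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)      # 3.0
complete = complete_linkage(cluster_a, cluster_b)  # 6.0
average = average_linkage(cluster_a, cluster_b)    # 4.5
print(single, complete, average)
```

Because agglomerative clustering always merges the pair of clusters with the smallest linkage distance, the choice of rule changes the merge order and therefore the shape of the dendrogram.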

DBSCAN

What are the key parameters of DBSCAN, and how do they influence the result?
How does DBSCAN handle noise and outliers?
Compare DBSCAN to K-Means in terms of assumptions and scalability.
What is a core point versus a border point in DBSCAN?
In what situations is DBSCAN preferred over K-Means or hierarchical clustering?
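
The core, border, and noise definitions can be checked by hand on a small example. The sketch below is not a full DBSCAN implementation; it only labels each point's role for a given radius and minimum neighbor count. It assumes a point counts as a member of its own neighborhood (the convention scikit-learn follows), and the data, the function name `classify_points`, and the parameter values are illustrative.

```python
from math import dist

def classify_points(points, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' under DBSCAN's definitions."""
    # Neighborhood of each point: indices within eps, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_samples for nbrs in neighbors]
    roles = []
    for i, nbrs in enumerate(neighbors):
        if is_core[i]:
            roles.append("core")
        elif any(is_core[j] for j in nbrs):
            roles.append("border")  # not dense itself, but within eps of a core point
        else:
            roles.append("noise")
    return roles

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
roles = classify_points(points, eps=0.8, min_samples=3)
print(roles)  # ['core', 'core', 'core', 'border', 'noise']
```

Rerunning with a smaller `eps` or a larger `min_samples` turns more points into noise, which is one way to explore how the two parameters influence the result.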

Gaussian Mixture Models (GMM)

How does GMM differ from K-Means in terms of cluster assignment?
What is the role of the Expectation-Maximization (EM) algorithm in GMM?
How are soft assignments useful in GMM clustering?
What assumptions do GMMs make about data distribution?
How can model selection criteria like BIC or AIC be used in choosing the number of components in GMM?
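
To make the contrast between hard and soft assignment concrete, the sketch below computes the membership probabilities (often called responsibilities) that the E-step of EM assigns to one observation under a one-dimensional, two-component mixture. The mixture weights, means, and standard deviations are assumed values chosen for illustration, not fitted parameters.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return exp(-0.5 * z * z) / (std * sqrt(2 * pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability that x came from each mixture component."""
    weighted = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [v / total for v in weighted]

# Assumed mixture: two equally weighted components centered at 0 and 4.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

midpoint = responsibilities(2.0, weights, means, stds)    # halfway between the means
near_first = responsibilities(0.5, weights, means, stds)  # close to the first mean
print(midpoint, near_first)
```

K-Means would force the midpoint observation into exactly one cluster, while the mixture model reports that it is equally plausible under both components.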

Part B: Interpretation Questions

K-Means Clustering

How would you interpret the centroid of a cluster in a business context?
What does a low within-cluster sum of squares (WCSS) suggest about the clustering?
If two clusters have overlapping data points, what might that indicate about your choice of K?
How does standardizing the variables affect K-Means clustering outcomes?
What does the distance between centroids imply about cluster separation?
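
The question about standardized variables can be explored numerically. The sketch below uses three made-up customers measured on very different scales and compares Euclidean distances before and after z-scoring each column. The data and the helper name `standardize` are hypothetical.

```python
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]

# Hypothetical customers: (annual income in dollars, store visits per month).
customers = [(40_000.0, 2.0), (41_000.0, 20.0), (60_000.0, 2.0)]

# Raw distances: the income column dominates because its numbers are larger.
raw_gap_visits = dist(customers[0], customers[1])  # differ mainly in visits
raw_gap_income = dist(customers[0], customers[2])  # differ only in income

scaled = standardize(customers)
scaled_gap_visits = dist(scaled[0], scaled[1])
scaled_gap_income = dist(scaled[0], scaled[2])
print(raw_gap_visits, raw_gap_income, scaled_gap_visits, scaled_gap_income)
```

On the raw data the large difference in visits barely registers next to income. After standardizing, both differences carry similar weight, so a distance-based method such as K-Means no longer groups customers by income alone.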

Hierarchical Clustering

How do you decide the optimal number of clusters using a dendrogram?
What does it imply if clusters merge at very high linkage distances?
How would you explain the difference between single and complete linkage visually?
If hierarchical clusters don’t match known group labels, what could be the issue?
How do dendrogram height and shape reflect data structure?
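
One way to reason about cutting a dendrogram is to look at the merge heights directly. The sketch below relies on a simplification: for one-dimensional data under single linkage, the merge heights are exactly the gaps between neighboring sorted values. The spending figures and function names are made up for illustration.

```python
def merge_heights_1d(values):
    """Single-linkage merge heights for 1-D data, in the order merges occur."""
    ordered = sorted(values)
    return sorted(b - a for a, b in zip(ordered, ordered[1:]))

def clusters_at_cut(values, height):
    """Number of clusters left when the dendrogram is cut at the given height."""
    return 1 + sum(1 for h in merge_heights_1d(values) if h > height)

# Hypothetical monthly spend for seven customers.
spend = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 30.0]

heights = merge_heights_1d(spend)           # [0.5, 0.5, 0.5, 0.5, 8.0, 19.0]
three_groups = clusters_at_cut(spend, 4.0)  # 3
two_groups = clusters_at_cut(spend, 10.0)   # 2
print(heights, three_groups, two_groups)
```

The jump from 0.5 to 8.0 in the merge heights marks the point where clearly separate groups start being joined, so cutting anywhere inside that gap is a reasonable candidate for the number of clusters.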

DBSCAN

What does a high number of noise points suggest about your DBSCAN parameter settings?
How would you interpret the presence of small clusters in DBSCAN output?
If DBSCAN returns one large cluster and many outliers, what might that indicate?
How can you visualize the effectiveness of DBSCAN clustering?
What does it mean if DBSCAN identifies more clusters than expected?

Gaussian Mixture Models (GMM)

How do you interpret soft clustering assignments in a GMM context?
What does the shape of a GMM ellipse represent in a plot?
If two Gaussian components overlap, what business insights might that provide?
How does the log-likelihood of the GMM help evaluate clustering quality?
Why might BIC be preferred over AIC when evaluating GMMs in large datasets?
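
For the last question, it can help to compare the two criteria side by side. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of observations, and ln L the maximized log-likelihood. The log-likelihood is set to zero here purely to isolate the penalty terms, and the sample size is an illustrative choice.

```python
from math import log

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * log(n_obs) - 2 * log_likelihood

n_obs = 10_000  # illustrative dataset size

# With the log-likelihood fixed at 0, what remains is the penalty for one parameter.
aic_penalty = aic(0.0, n_params=1)                # 2
bic_penalty = bic(0.0, n_params=1, n_obs=n_obs)   # ln(10000), roughly 9.21
print(aic_penalty, bic_penalty)
```

The AIC penalty per parameter stays at 2 regardless of sample size, while the BIC penalty grows with ln n, so BIC leans toward fewer components as the dataset gets larger.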

Part C: Hands-On Data Exercises

K-Means Clustering Applications

Marketing: Use K-Means clustering to segment customers based on recency, frequency, monetary value (RFM), website behavior, and demographics. (Dataset: marketing_segments.xlsx)
Finance: Use K-Means clustering to group financial instruments based on volatility, liquidity, market cap, sector, and price momentum. (Dataset: financial_instruments.xlsx)
Operations: Use K-Means clustering to segment warehouse inventory items by turnover rate, unit cost, reorder frequency, lead time, and item category. (Dataset: inventory_clusters.xlsx)
Sales: Use K-Means clustering to group sales reps based on total sales, average deal size, sales cycle length, product mix, and close rate. (Dataset: sales_team_clusters.xlsx)
Customer Service: Use K-Means clustering to group support tickets by issue complexity, resolution time, escalation count, communication channel, and customer satisfaction score. (Dataset: service_ticket_clusters.xlsx)
Quality Control: Use K-Means clustering to segment manufactured parts based on dimension measurements, weight, temperature tolerance, surface finish, and inspection outcomes. (Dataset: quality_metrics_clusters.xlsx)
Credit Risk Management: Use K-Means clustering to group borrowers by income level, debt-to-income ratio, number of credit lines, credit history length, and payment behavior. (Dataset: borrower_profiles.xlsx)
Entertainment: Use K-Means clustering to group users based on streaming habits, such as average watch time, preferred genres, binge frequency, device type, and typical viewing time of day. (Dataset: viewer_segments.xlsx)
Manufacturing: Use K-Means clustering to segment machines by downtime frequency, maintenance hours, age, energy use, and failure types. (Dataset: equipment_profiles.xlsx)
Accounting: Use K-Means clustering to group expense categories or departments based on transaction volume, average transaction size, vendor diversity, and budget variance. (Dataset: expense_patterns.xlsx)
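
Before applying a library routine to any of the datasets above, it may be useful to trace the algorithm on a toy example. The sketch below is a minimal version of the standard K-Means iteration (Lloyd's algorithm) started from user-supplied centroids. It does not read any of the listed .xlsx files. The six points stand in for rows that have already been converted to numbers and scaled, and the function name `kmeans` is hypothetical.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Made-up, already-scaled rows with two numeric features each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

In the exercises, categorical fields such as sector, device type, or communication channel would need to be encoded numerically, and numeric fields would normally be standardized, before a distance-based method like this is applied.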
