2.3: Data Collection
Data collection is the cornerstone of the analytics process, serving as the foundation upon which all subsequent steps are built. The quality, accuracy, and relevance of collected data directly impact the insights derived from analysis. In today's digital landscape, data collection extends far beyond traditional methods, encompassing diverse techniques such as Application Programming Interface (API) integrations, Internet of Things (IoT) devices, and web scraping.
Effective data collection not only captures what is happening within an organization but also draws on external data to provide a broader context, enabling organizations to make informed, data-driven decisions. As businesses strive to remain competitive, mastering data collection techniques is essential for uncovering trends, predicting outcomes, and optimizing operations.
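Combining internal and external data often comes down to an enrichment step: joining the organization's own records with an outside dataset on a shared key. The minimal Python sketch below joins daily sales with daily temperatures by date. The records are made-up sample values and the field names are hypothetical; in practice the external side might come from a weather provider such as NOAA.

```python
# Hypothetical internal records: daily unit sales from the organization's own systems
internal_sales = [
    {"date": "2024-07-01", "units_sold": 120},
    {"date": "2024-07-02", "units_sold": 95},
    {"date": "2024-07-03", "units_sold": 143},
]

# Hypothetical external records: daily high temperature (°F) from an outside source
external_weather = [
    {"date": "2024-07-01", "high_temp_f": 88},
    {"date": "2024-07-03", "high_temp_f": 97},
]


def enrich_with_external(internal, external, key="date"):
    """Left-join external fields onto internal records by a shared key.

    Internal records with no external match are kept, with the external
    fields set to None, so no internal data is silently dropped.
    """
    lookup = {row[key]: row for row in external}
    external_fields = sorted({f for row in external for f in row if f != key})
    enriched = []
    for row in internal:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for field in external_fields:
            merged[field] = match.get(field)
        enriched.append(merged)
    return enriched


for row in enrich_with_external(internal_sales, external_weather):
    print(row)
```

Keeping unmatched rows (rather than dropping them) makes gaps in the external source visible, which matters once the data moves on to quality checks.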
Data analytics spans a wide range of industries, each utilizing distinct types of data to drive insights and decisions. The table below highlights key industries, the types of data commonly used within them, and leading sources of these data.
Key Industries and Data Types in Analytics
| Industry | Data Types/Applications | Leading Sources |
|---|---|---|
| Retail and E-commerce | Transaction data (sales, returns), customer segmentation, market trends, inventory management | Amazon, Shopify, Nielsen, Experian |
| Finance and Banking | Credit risk data, transactional logs, fraud detection, stock market trends | Bloomberg, FactSet, Equifax, TransUnion, S&P Global (Standard & Poor's Global) |
| Healthcare and Life Sciences | Patient records, genomic data, claims data, treatment outcomes, healthcare utilization | Centers for Disease Control and Prevention (CDC), World Health Organization (WHO), National Institutes of Health (NIH), Genomics England |
| Supply Chain and Logistics | Inventory data, transportation data, supplier performance metrics, route optimization | Freightos, SAP Supply Chain Solutions, Global Trade Atlas |
| Marketing and Advertising | Consumer behavior data, social media analytics, campaign performance, customer segmentation | Google Analytics, Nielsen, HubSpot, Claritas |
| Energy and Utilities | Energy consumption data, pricing trends, weather data for forecasting, renewable energy metrics | U.S. Energy Information Administration (EIA), National Oceanic and Atmospheric Administration (NOAA), S&P Platts |
| Insurance | Policyholder data, claims data, risk assessment metrics, fraud detection analytics | LexisNexis, Experian, Equifax, ISO (Insurance Services Office) |
| Automotive | Vehicle sales data, powertrain trends, consumer preferences, production metrics, maintenance records | JD Power, IHS Markit (Information Handling Services Markit), Kelley Blue Book, Automotive OEM Data |
| Investment and Asset Management | Portfolio performance data, financial market trends, economic indicators, risk modeling | Morningstar, Bloomberg, FactSet, S&P Capital IQ |
Methods of Data Collection
Data collection methods play a critical role in ensuring the quality, accuracy, and reliability of datasets. These methods vary depending on the type of data being collected, the industry, and the intended use. Below are some of the most common methods:
| Method | Description | Applications | Example |
|---|---|---|---|
| Surveys and Questionnaires | Collect data directly from individuals through structured or semi-structured forms | Understanding customer preferences, measuring employee satisfaction, gathering patient feedback | A retail company uses customer surveys to assess satisfaction with its online shopping experience. |
| Application Programming Interfaces (APIs) | Automate the collection of data from other software systems or platforms, internal or external | Retrieving website traffic data, accessing financial data for investment analysis | A manufacturing organization uses APIs to gather real-time inventory data from its warehouse management system. |
| Internet of Things (IoT) Devices and Sensors | Capture real-time data from connected devices and sensors | Monitoring warehouse inventory levels, tracking vehicle performance, collecting environmental data | A logistics company employs IoT sensors to monitor temperature and humidity in its storage facilities. |
| Web Scraping | Extract data from websites using automated tools | Monitoring competitor pricing, gathering reviews for sentiment analysis | A retail company scrapes competitor websites to analyze pricing trends during holiday seasons. |
| Observational Data | Collect data through direct observation of behaviors, processes, or environments | Studying in-store customer behavior, observing production line efficiency | A marketing company observes customer interactions with displays to optimize product placement. |
| Transaction Logs | Record data automatically as transactions occur | Tracking sales and returns, recording payments and claims | A retail company uses transaction logs to analyze peak sales periods. |
| Third-Party Data Providers | Purchase or license data from external providers | Accessing credit risk data, using market trend data in marketing analytics | A consumer goods company integrates Experian's market trend data to forecast seasonal demand. |
| Focus Groups and Interviews | Gather qualitative insights through small group discussions or one-on-one interviews | Developing new product ideas, evaluating employee feedback | A market research company conducts focus groups to test reactions to new store layouts. |
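For the survey example in the table above, a typical first step is summarizing rating-scale answers. The Python sketch below assumes hypothetical 1-to-5 satisfaction ratings (made-up values, with None for a skipped question) and reports the response count, mean, distribution, and "top-2-box" share, that is, the fraction of respondents who chose 4 or 5.

```python
from collections import Counter
from statistics import mean

# Hypothetical answers to "How satisfied are you with our online shopping experience?"
# 1 = very dissatisfied ... 5 = very satisfied; None marks a skipped question.
responses = [5, 4, 3, 5, None, 2, 4, 4, 5, 1]


def summarize_likert(answers, scale=range(1, 6)):
    """Summarize rating-scale answers: count, mean, distribution, top-2-box share."""
    valid = [a for a in answers if a in scale]  # drops skipped or out-of-scale answers
    if not valid:
        return {"n": 0}
    counts = Counter(valid)
    return {
        "n": len(valid),
        "mean": round(mean(valid), 2),
        "distribution": {point: counts.get(point, 0) for point in scale},
        # Share of respondents choosing one of the two highest scale points.
        "top2_share": round(sum(counts[p] for p in scale[-2:]) / len(valid), 2),
    }


print(summarize_likert(responses))
```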
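API-based collection, as described in the table above, often means following pagination links until the source reports no more data. The sketch below uses only Python's standard library. The endpoint URL and the response shape (`{"items": [...], "next": <url or null>}`) are hypothetical assumptions; real APIs differ in how they paginate and authenticate, so the provider's documentation is the authority. The fetch function is passed in as a parameter so the loop can be tried offline against canned pages.

```python
import json
import urllib.request

# Hypothetical endpoint; a real inventory API would also need authentication.
BASE_URL = "https://api.example.com/v1/inventory"


def fetch_json(url):
    """Fetch one page of JSON over HTTP (performs a real network call)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def collect_all_pages(base_url, fetch=fetch_json, max_pages=100):
    """Follow a hypothetical 'next' link, accumulating 'items' from each page.

    max_pages guards against an API that never stops returning a 'next' link.
    """
    records, url = [], base_url
    for _ in range(max_pages):
        if not url:
            break
        page = fetch(url)
        records.extend(page.get("items", []))
        url = page.get("next")
    return records


# Offline usage: canned pages stand in for the real API responses.
fake_pages = {
    BASE_URL: {"items": [{"sku": "A1", "qty": 40}], "next": BASE_URL + "?page=2"},
    BASE_URL + "?page=2": {"items": [{"sku": "B7", "qty": 12}], "next": None},
}
print(collect_all_pages(BASE_URL, fetch=lambda url: fake_pages[url]))
```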
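The IoT storage-facility example in the table above amounts to checking each incoming sensor reading against acceptable limits. A minimal Python sketch, assuming readings have already been decoded into records and using made-up temperature and humidity limits (real limits depend on what is being stored):

```python
from dataclasses import dataclass

# Hypothetical acceptable ranges for a refrigerated storage room.
TEMP_RANGE_C = (2.0, 8.0)
HUMIDITY_RANGE_PCT = (30.0, 60.0)


@dataclass
class Reading:
    sensor_id: str
    timestamp: str  # ISO 8601 string as reported by the device
    temperature_c: float
    humidity_pct: float


def out_of_range(reading):
    """Return the names of the metrics in this reading that fall outside their ranges."""
    problems = []
    low, high = TEMP_RANGE_C
    if not low <= reading.temperature_c <= high:
        problems.append("temperature")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= reading.humidity_pct <= high:
        problems.append("humidity")
    return problems


# Made-up sample readings from two sensors.
readings = [
    Reading("cold-room-1", "2024-07-01T08:00:00", 4.5, 45.0),
    Reading("cold-room-1", "2024-07-01T08:05:00", 9.2, 47.0),
    Reading("cold-room-2", "2024-07-01T08:00:00", 5.1, 72.5),
]

for r in readings:
    issues = out_of_range(r)
    if issues:
        print(f"ALERT {r.sensor_id} at {r.timestamp}: {', '.join(issues)} out of range")
```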
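Web scraping, listed in the table above, has two halves: downloading a page and extracting fields from its HTML. The sketch below covers only the extraction half, using Python's built-in `html.parser`, so it can be run on a saved snippet. The markup it expects (`<span class="name">` and `<span class="price">`) is a hypothetical assumption; real sites use their own structure and change it over time, which is why scrapers need ongoing maintenance.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (product name, price) pairs from a hypothetical listing page.

    Assumes each name sits in <span class="name"> and is followed by its
    price in <span class="price">, with a single class per span.
    """

    def __init__(self):
        super().__init__()
        self.current_field = None  # "name" or "price" while inside a matching span
        self.current_name = None
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            css_class = dict(attrs).get("class")
            if css_class in ("name", "price"):
                self.current_field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current_field is None:
            return
        if self.current_field == "name":
            self.current_name = text
        elif self.current_name is not None:
            # Remove the currency symbol and thousands separators before converting.
            price = float(text.replace("$", "").replace(",", ""))
            self.products.append((self.current_name, price))
            self.current_name = None


sample_html = """
<div class="product"><span class="name">Wireless Mouse</span><span class="price">$24.99</span></div>
<div class="product"><span class="name">Standing Desk</span><span class="price">$1,049.00</span></div>
"""
parser = PriceParser()
parser.feed(sample_html)
print(parser.products)
```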
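The transaction-log example in the table above (finding peak sales periods) can be sketched as grouping log entries by hour of day and summing the amounts. The comma-separated log format of timestamp, transaction ID, and amount is a hypothetical assumption, and the entries are made-up sample values.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transaction log lines: "timestamp,transaction_id,amount"
log_lines = [
    "2024-11-29T09:15:02,T1001,59.90",
    "2024-11-29T12:03:44,T1002,120.00",
    "2024-11-29T12:47:10,T1003,35.50",
    "2024-11-29T18:21:09,T1004,89.99",
    "2024-11-29T12:58:31,T1005,15.25",
]


def sales_by_hour(lines):
    """Total sales amount per hour of day, parsed from comma-separated log lines."""
    totals = Counter()
    for line in lines:
        timestamp, _txn_id, amount = line.split(",")
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += float(amount)
    return totals


totals = sales_by_hour(log_lines)
peak_hour, peak_total = max(totals.items(), key=lambda kv: kv[1])
print(f"Peak hour: {peak_hour}:00 with ${peak_total:.2f} in sales")
```

Grouping by hour is only one choice of period; the same pattern works for day of week or calendar date by changing the key extracted from the timestamp.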
Effective data collection underpins every successful analytics project by ensuring that the raw inputs are comprehensive, relevant, and timely. By combining methods such as surveys, APIs, and IoT devices, organizations can gather the data needed to address complex challenges across industries. However, the value of collected data depends not only on how it is gathered but also on its quality.
In the next section, we delve into the concept of data quality, exploring how accuracy, consistency, and reliability transform raw data into actionable insights.


