An important first step in data analytics is understanding the problem at hand and defining it carefully, so that a solution can be found using the data. In this note I will try to explain the importance of problem definition and the types of problems found in the data analytics space.
Generally, there are 3 types of business problems:
1. Deriving Insights from the underlying data
2. Predicting the future based upon the factors in the data
3. Optimising the outcome by learning over time
Deriving Insights from the underlying data
This type of problem is limited to understanding the data and identifying the patterns/insights that inherently exist in it. In the terms of the book titled “Competing on Analytics” by Thomas H Davenport, this type of problem is more like “What really happened” in the past. It is usually termed Business Intelligence because it involves only identifying the inherent nature of the data. The analyst's focus is on unearthing the patterns or trends that characterise the data over time. The outcome of this type of problem is to tell the business what happened and which trends have occurred frequently in the historical data.
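As a minimal sketch of this kind of "what happened" analysis, the snippet below totals usage per month and reports the month-over-month change. The records, the field layout and the function names are all hypothetical and made up for illustration; real data would come from the operator's systems.

```python
from collections import defaultdict

# Hypothetical usage records: (month, customer_id, minutes_used)
records = [
    ("2023-01", "A", 320), ("2023-01", "B", 150),
    ("2023-02", "A", 300), ("2023-02", "B", 120),
    ("2023-03", "A", 310), ("2023-03", "B", 60),
]

def monthly_totals(rows):
    """Sum the minutes used per month, returned in chronological order."""
    totals = defaultdict(int)
    for month, _customer, minutes in rows:
        totals[month] += minutes
    return dict(sorted(totals.items()))

def month_over_month_change(totals):
    """Difference between each month's total and the previous month's total."""
    months = list(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(months, months[1:])}

totals = monthly_totals(records)
changes = month_over_month_change(totals)
print(totals)
print(changes)
```

Nothing here predicts anything; it only describes a trend that already exists in the historical data, which is the whole scope of this first problem type.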
Predicting the future based upon the factors in the data
This type of problem applies statistical and data mining concepts to the data to define a function/logic for the potential future outcome. This is Data Science, and it tries to address the question “What will happen”. The skills required for solving this kind of problem are a combination of scientific knowledge and domain knowledge, along with logical and business acumen.
This is the next logical step after ascertaining the trends/factors/insights in the data under the first problem definition.
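To illustrate what "defining a function for the future outcome" can look like, here is a small sketch that fits a straight line to past values by ordinary least squares and extrapolates one step ahead. The usage figures and the names fit_line and predict are assumptions for illustration only; a real project would likely use many factors and a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical average monthly usage (minutes) for months 1 to 4
months = [1, 2, 3, 4]
usage = [400.0, 380.0, 360.0, 340.0]

model = fit_line(months, usage)
next_month = predict(model, 5)  # extrapolate to month 5
print(model, next_month)
```

The trend found in the first problem type becomes the input here: the fitted function is what turns "what happened" into "what will happen".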
For example, if the overall problem is to predict Churn for a Post-paid or a Pre-paid telecom customer, how can it be solved, and what assumptions should an analyst consider beforehand when starting to attack the problem? This particular problem statement raises 2 important questions, which will have an impact on the approach/methodology.
Question 1 – Is the problem at hand to identify the customer who will not recharge on the date of recharge (in the case of pre-paid) or who will not pay the current bill (in the case of post-paid)?
Question 2 – Is the problem at hand to identify the customer at risk, irrespective of his/her recharge date or bill due date?
We first need to understand the meaning of Churn. Obviously, a customer who stops using the service has reached the first check-point, and if he or she continues not to use it, then it is definitely a Churn case. It is this definition that raises the 2 questions outlined above.
Let me now explain in detail.
‘Question 1’ is basically the prediction of the customers who will not recharge or pay the bill on the respective date of recharge or bill due date. Here we try to identify those customers based upon various factors like usage, demographics, VAS, customer profile, etc.
‘Question 2’ is actually the first thing to identify, before Question 1. Here we are trying to identify customers who are going to end their usage or get out of the system in the next month/next quarter, irrespective of their recharge date or bill due date in the current month. The approach/methodology to be followed for this problem differs from that of the problem above, but the underlying data/factors remain the same.
Both of them are trying to solve the problem of Churn, but the way it is dealt with in the case of Question 1 differs from the way it is dealt with in the case of Question 2. Question 2 is more like trying to predict the customers who will stop using their mobile while still remaining customers of the service provider, whereas Question 1 is trying to predict who will not recharge or pay the bill on the due date.
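The difference between the two questions shows up most clearly in how the target label is built from the same data. The sketch below is only one possible reading: the Customer fields, the one-minute usage threshold and the function names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Customer:
    customer_id: str
    due_date: date                  # recharge date (pre-paid) or bill due date (post-paid)
    paid_on: Optional[date]         # None if no recharge/payment was made
    next_month_usage_minutes: int   # usage observed in the following month

def label_question_1(c: Customer) -> int:
    """1 if the customer did not recharge or pay by the due date."""
    return int(c.paid_on is None or c.paid_on > c.due_date)

def label_question_2(c: Customer, min_minutes: int = 1) -> int:
    """1 if the customer effectively stopped using the service,
    regardless of what happened on the due date (threshold is an assumption)."""
    return int(c.next_month_usage_minutes < min_minutes)

customers = [
    Customer("A", date(2023, 3, 10), date(2023, 3, 9), 280),   # pays on time, keeps using
    Customer("B", date(2023, 3, 10), date(2023, 3, 10), 0),    # pays on time, stops using
    Customer("C", date(2023, 3, 10), None, 120),               # misses payment, keeps using
]

q1_labels = [label_question_1(c) for c in customers]
q2_labels = [label_question_2(c) for c in customers]
print(q1_labels, q2_labels)
```

Customers B and C are each flagged by only one of the two definitions, so a model trained on one label would not answer the other question, even though the input factors are the same.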
It is a known fact in the telecom industry that a customer who recharges or pays the bills on the due date without using the services is not treated as a “good” customer. Revenue for a telecom service provider is a direct function of usage, and the company is really profitable only if the customer recharges continuously and uses the service as well.
With stiff competition in the market and Portability in place, service providers are looking to solve the above 2 problems using historical data combined with data mining techniques.
To reiterate, it is a thorough understanding of the business problem that leads the analyst to the right approach/methodology for solving it.
Optimising the outcome by learning over time
This type of problem tries to address “What best can happen” given the scenario and the data. This is also a type of data science, applying advanced concepts to optimise the prediction further. It can be an algorithm that learns continuously to optimise (maximise or minimise) the solution to the business problem using the data.
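One simple example of an algorithm that keeps learning to maximise an outcome is an epsilon-greedy bandit. The sketch below uses it to pick among retention offers; this is my choice of technique for illustration, not the only option, and the offer names, the simulated acceptance rates and the class name are all hypothetical.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit: mostly pick the best-known action, sometimes explore."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}    # times each action was tried
        self.values = {a: 0.0 for a in self.actions}  # running mean reward per action

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)      # explore
        return max(self.actions, key=lambda a: self.values[a])  # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.values[action] += (reward - self.values[action]) / n

# Simulated (made-up) probability that a customer stays after each offer
true_rates = {"discount": 0.6, "bonus_data": 0.3, "no_offer": 0.1}
env_rng = random.Random(1)

bandit = EpsilonGreedy(true_rates, epsilon=0.1, seed=0)
for _ in range(2000):
    offer = bandit.choose()
    stayed = 1.0 if env_rng.random() < true_rates[offer] else 0.0
    bandit.update(offer, stayed)

print(bandit.counts)
print(bandit.values)
```

Each round feeds the observed outcome back into the estimates, so the choice of offer improves as more data arrives. That feedback loop is what separates this problem type from a one-off prediction.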
The analyst's focus in finding a solution, and the approach/methodology chosen, will depend on the type of the problem.
My next note – Get the underlying data