Thursday, 16 January 2014

Importance of the Business Problem in Data Analytics

Analytics is currently a buzzword in the industry. In my experience, almost every company is talking about it and trying to understand what can be done with the data it has and how that data can be used to serve customers better. As pointed out in my last blog, there is a huge requirement for knowledge professionals who can extract insights from data. Industry experts, statisticians and data mining experts agree, and from my own experience I can say with confidence, that it all starts with a thorough understanding of the business problem at hand; only then can a possible data-driven solution be worked out.

An analyst should focus on thoroughly understanding the problem before thinking about a possible approach and the applicable technique or algorithm. Understanding the problem is, in fact, half the work of deciding on the methodology and technique to follow. In this blog, I want to share a couple of observations on why this matters and on how the steps change with even a small difference in the problem.

For example, suppose the overall problem is to predict churn for a post-paid or pre-paid telecom customer. How can it be solved, and what assumptions should the analyst settle beforehand? This problem statement raises 2 important questions, each of which affects the approach/methodology.

Question 1: Is the problem at hand to identify the customer who will not recharge on the recharge date (pre-paid) or who will not pay the current bill (post-paid)?

Question 2: Is the problem at hand to identify the customer at risk, irrespective of his/her recharge date or bill due date?

We first need to understand the meaning of churn. If the customer stops using the service, that is the first check-point; if he continues not to use it, then it is definitely a churn case. It is this definition that raises the 2 questions outlined above.

Let me now explain in detail.
‘Question 1’ is basically the prediction of customers who will not recharge or pay the bill on the respective recharge date or bill due date. Here we look at which customers would not recharge or pay the bill based on various factors such as usage, demographics, VAS, customer profile, etc.

‘Question 2’ should actually be addressed before Question 1. Here we are trying to identify customers who are going to stop using the service or leave the system in the next month or next quarter, irrespective of their recharge date or bill due date in the current month. The approach/methodology for this problem differs from the one for Question 1, but the underlying data and factors remain the same.

Both questions address the problem of churn, but the way it is handled differs between them. Question 2 tries to predict the customers who will stop using their mobile while still remaining customers of the service provider, whereas Question 1 tries to predict who will not recharge or pay the bill on the due date.
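
To make the difference concrete, here is a minimal sketch in R of how the target variable could be labelled under each question. The data frame, its column names and the 25% usage threshold are all made up for illustration; a real project would agree on these definitions with the business.

```r
# Hypothetical toy data: one row per pre-paid customer (all columns are made up).
customers <- data.frame(
  customer_id       = c("A", "B", "C", "D"),
  recharged_on_due  = c(TRUE, FALSE, TRUE, FALSE),  # recharged on the due date?
  usage_min_prev    = c(420, 310, 400, 50),         # minutes used last month
  usage_min_current = c(390, 280, 20, 0),           # minutes used this month
  stringsAsFactors  = FALSE
)

# Question 1 target: the customer misses the recharge (or bill payment)
# on the due date.
customers$churn_q1 <- !customers$recharged_on_due

# Question 2 target: usage collapses, regardless of the recharge or bill due date.
# The 25% threshold is an arbitrary assumption for illustration only.
usage_drop_threshold <- 0.25
customers$churn_q2 <- customers$usage_min_current <
  usage_drop_threshold * customers$usage_min_prev

print(customers[, c("customer_id", "churn_q1", "churn_q2")])
```

In this toy data, customer C recharges on time but has almost stopped using the service, so C is flagged only under Question 2, while customer B misses the recharge but still uses the service and is flagged only under Question 1. The same input data produces two different targets, which is why the modelling approach changes.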

It is well known in the telecom industry that a customer who recharges or pays the bill on the due date without actually using the services is not treated as a “good” customer. Revenue for a telecom service provider is a direct function of usage, and the company is really profitable only if the customer both recharges continuously and uses the service.

With stiff competition in the market and number portability in place, service providers are looking to solve the above 2 problems using historical data combined with data mining techniques.

To reiterate, it is a thorough understanding of the business problem that leads the analyst to the right approach/methodology for solving it.


The next question is: having identified the customers at risk, what package or offer should be given to retain them?

Friday, 3 January 2014

Installation of the RMySQL Package in R 3.0.2 - Windows 7

I think this is definitely worth posting on my blog because I spent almost half a day figuring out how to install the RMySQL package from the R console.

This package lets us connect to a schema/database from R, use its tables directly or create data frames from them, and then carry out further analysis. This saves a lot of time compared with first exporting the data from SQL to .csv and then importing the .csv into R for data analysis, modelling, etc.
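
As a rough illustration of that workflow, here is a small sketch that reads one MySQL table straight into a data frame once the package is installed. The helper name read_mysql_table and all connection details in the usage comment are hypothetical; only the DBI and RMySQL calls themselves come from the packages.

```r
# Hypothetical helper: pull a whole MySQL table into an R data frame.
# Calls are namespace-qualified, so defining the function does not
# require the packages to be attached.
read_mysql_table <- function(table, dbname, user, password, host = "localhost") {
  con <- DBI::dbConnect(RMySQL::MySQL(),
                        dbname = dbname, user = user,
                        password = password, host = host)
  # Always release the connection, even if the read fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbReadTable(con, table)
}

# Example usage (placeholder credentials and table name, not real ones):
# usage_df <- read_mysql_table("customer_usage", dbname = "telecom",
#                              user = "analyst", password = "secret")
# summary(usage_df)
```

This replaces the export-to-.csv and import-from-.csv round trip with a single call.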

Here is the step-by-step procedure to install the package:
1. First, a normal install.packages() call will not work for RMySQL. The package has to be compiled and installed from source.
2. Download and install the Rtools exe that matches your R version. Here is the link to choose from - http://cran.r-project.org/bin/windows/Rtools/
3. During the Rtools installation, it is mandatory to tick the check box that edits the system path.
a. First, the user will see this window prompt



b. After clicking Next, make sure both check boxes are ticked


c. Click Next and complete the installation

4. Create an environment variable on the local system.
a. Go to Control Panel -> User Accounts -> Change my environment variables
b. Click on New
c. Give the variable name as MYSQL_HOME
d. Give the variable value as C:/Program Files/MySQL/MySQL Server 5.6 (note: use forward slashes, not the backslashes you get when copying the path)
e. It should look something like the one below

5. Locate the "libmysql.dll" file in the MySQL installation. Generally, it is in the bin folder (or the lib folder, depending on the MySQL version) of the MySQL instance under Program Files
6. Copy that ".dll" file into the bin folder of the R installation. Generally, the path is something like Program Files -> R -> R-3.0.2 -> bin -> x64 (or i386 for 32-bit R)
7. Restart the R console
8. Run the install command from source, for example install.packages("RMySQL", type = "source"), and select the respective CRAN mirror

9. The installation should now work, and you can connect to the database and access the tables

10. The reference manual is available in the package help
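
Before running the install in step 8, the prerequisites from steps 4 to 6 can be checked from the R console. The following is only a sketch: the function name check_rmysql_prereqs is made up, and it assumes that R.home("bin") points at the same bin folder the DLL was copied into.

```r
# Hypothetical pre-flight check for steps 4-6 (not part of RMySQL itself).
check_rmysql_prereqs <- function(mysql_home = Sys.getenv("MYSQL_HOME"),
                                 r_bin_dir  = R.home("bin")) {
  c(
    # Step 4: the MYSQL_HOME environment variable is defined ...
    mysql_home_set   = nzchar(mysql_home),
    # ... is written with forward slashes only ...
    forward_slashes  = !grepl("\\", mysql_home, fixed = TRUE),
    # ... and points at an existing folder.
    mysql_home_found = nzchar(mysql_home) &&
                       isTRUE(file.info(mysql_home)$isdir),
    # Steps 5-6: libmysql.dll has been copied into the R bin folder.
    dll_in_r_bin     = file.exists(file.path(r_bin_dir, "libmysql.dll"))
  )
}

status <- check_rmysql_prereqs()
print(status)

# If every check is TRUE, build the package from source as in step 8:
# install.packages("RMySQL", type = "source")
```

Any FALSE entry points back to the step that still needs attention. Remember to restart the R console after changing the environment variable so that Sys.getenv() picks up the new value.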