Friday, 20 December 2013

Lifecycle of the Predictive Data Modelling Project/Task

Predictive data science is really both an art and a science of problem solving. We use historical data to uncover insights, trends, and patterns, and from them build a likelihood function using mathematical and statistical techniques. This function then combines new input values with its fitted coefficients to predict a likely value, or to classify a customer as likely good or bad.
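To make the idea of "new values combined with built-in coefficients" concrete, here is a minimal sketch that assumes the likelihood function is a logistic model. The logistic form, the feature names, and the coefficient values are all illustrative assumptions of mine; a real model would learn its coefficients from the historical data.

```python
import math

def predict_probability(features, coefficients, intercept):
    """Combine new input values with fitted coefficients (logistic form assumed)."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, standing in for values learnt from historical data.
coefficients = {"missed_payments": 0.8, "years_as_customer": -0.3}
intercept = -1.0

new_customer = {"missed_payments": 2, "years_as_customer": 5}
p_bad = predict_probability(new_customer, coefficients, intercept)
label = "likely bad" if p_bad >= 0.5 else "likely good"
print(f"P(bad) = {p_bad:.3f} -> {label}")
```

The 0.5 cut-off is also just an assumption; in practice the threshold would be agreed with the business.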

This is really a process, and I thought I would share what I have learnt about it over the years through my data modelling experience on various problems. The idea is to present the tasks involved in taking up a data problem and the likely number of days each one takes. The day counts below are measured from the project start date, and they may differ depending on the complexity of the problem and the data.



Data Deep Dive
Usually, it takes an analyst six days to complete this task. It is one of the first and most important steps in the workflow. It usually involves collating the data into the workspace, understanding the data dictionary, analysing every variable in the tables to ascertain data availability, the time period covered, and scaling, removing variables that have data quality issues, and retaining variables that are related and matter for the next task.
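As a rough illustration of the availability check described above, the sketch below computes the share of missing values per variable and keeps only the variables under a cut-off. The sample rows, the column names, and the 30% threshold are assumptions for the example, not a rule.

```python
def profile_columns(rows):
    """Return the fraction of missing (None) values for each column."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    return {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }

def usable_columns(rows, max_missing=0.3):
    """Keep columns whose missing fraction is at or below the threshold."""
    return [col for col, frac in profile_columns(rows).items() if frac <= max_missing]

# Hypothetical customer records; None marks an unavailable value.
rows = [
    {"age": 34, "income": 52000, "fax": None},
    {"age": 41, "income": None, "fax": None},
    {"age": 29, "income": 61000, "fax": None},
    {"age": 52, "income": 48000, "fax": "555-0101"},
]

print(profile_columns(rows))  # missing fraction per variable
print(usable_columns(rows))   # variables that survive the data quality check
```

Here "fax" is dropped because three of its four values are missing, while "income" survives with one gap that can be handled later.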

Problem Statement & Feature Engineering
This is the next task after the data deep dive, and it is usually complete 14 days from the project start date. Its output is the base dataset that becomes the input for any experimentation on the data. One of the analyst's major tasks is to understand the “business problem” inside out, because the whole process of feature engineering and technique selection depends on it. It usually involves understanding the distribution of the variables in the data, identifying outliers and missing values, estimating the correlations between the variables, treating the outliers and missing values in light of the correlation results, testing hypotheses about the variables, agreeing upon the variable scaling, normalizing the variables using the agreed scaling, and deriving more representative summary variables, all guided by the business problem.
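A few of these steps can be sketched with the Python standard library alone. The choices here are assumptions for illustration: missing values are filled with the median, outliers are flagged with the 1.5 x IQR rule, and the agreed scaling is taken to be min-max. Your project may settle on different treatments.

```python
import statistics

def impute_median(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) bounds; points outside are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical incomes in thousands, with one gap and one extreme value.
incomes = [48, 52, None, 50, 51, 49, 500]

filled = impute_median(incomes)
lower, upper = iqr_bounds(filled)
kept = [v for v in filled if lower <= v <= upper]
scaled = min_max_scale(kept)
print(scaled)
```

The order matters in this sketch: imputing first means the outlier bounds are computed on a complete column, and scaling last means the extreme value does not compress everything else towards zero.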

Data Modelling & Data Validation
Once the feature engineering process is complete, it is time to fit/train the chosen algorithm on the processed data. This task is usually complete 23 days from the project start date. It usually involves deciding which statistical/data mining technique to apply for the given data and business problem, dividing the data into training, test, and validation datasets, training the algorithm on the training data, and testing it to check the results. If the results are good, the validation set is used to validate the predictions. If they are not, it is time for re-modelling, combining techniques, or using ensemble techniques to improve the accuracy.
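The three-way division of the data can be sketched as below. The 60/20/20 proportions and the fixed seed are assumptions for the example; the right split depends on how much data you have.

```python
import random

def split_dataset(rows, train=0.6, test=0.2, seed=42):
    """Shuffle rows and split them into training, test, and validation sets.

    Whatever remains after the training and test shares goes to validation.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    validation_set = shuffled[n_train + n_test:]
    return train_set, test_set, validation_set

# Hypothetical dataset of 100 row identifiers.
train_set, test_set, validation_set = split_dataset(range(100))
print(len(train_set), len(test_set), len(validation_set))
```

Keeping the validation set untouched until the test results look good, as described above, is what makes it a fair final check on the predictions.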

Report/PPT
This is the last, and a very important, task in the whole workflow. As I mentioned at the start of this note, data science is really both an art and a science. The art comes into play here, in articulating the modelling results smartly to the business user. The impact of any model, good or bad, can be visualized in a way that educates the user easily and wins their buy-in. It usually takes 25 days from the project start date to complete the modelling and present the results to the user.
The whole objective of any data analytics project is to map it to the business requirement and to be able to compute the ROI from implementing the model developed. It is with this report that we can conceptualize the whole process and its return for the business.


Lastly, throughout the process it is essential to maintain a regular dialogue with the business and the team, to bring domain knowledge into the model, to link it to the end business output, and to get an outside perspective for validating the results.
