Wednesday, 23 July 2014

Get the Underlying Data (3/13)

Now that the problem definition/objective is clear, it is time to look for the underlying data that can lead to a possible solution for the problem statement.

Generally, there are two types of data in the data science community:

Structured Data

Structured data is stored in a fixed, pre-defined format, whether in a record, a file, or a pre-defined data store. Generally, it is a combination of rows and columns representing a specific table. Most business problems can be solved using the available structured data. The best part of structured data is how easy it is to store, query, and analyse. With advances in technology, it is also becoming cheaper for companies to maintain large data stores.
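
To make the rows-and-columns idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the sales figures are all made up for illustration; the point is only that a fixed schema makes the data easy to store and query.

```python
import sqlite3

def build_sales_table():
    """Create an in-memory table with a fixed schema and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    # The schema is defined up front: every row has the same three columns.
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    rows = [
        ("North", "widget", 10),
        ("South", "widget", 7),
        ("North", "gadget", 5),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def units_by_region(conn):
    """Total units sold per region, sorted by region name."""
    cursor = conn.execute(
        "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()

conn = build_sales_table()
print(units_by_region(conn))  # [('North', 15), ('South', 7)]
```

Because the format is known in advance, a one-line query is enough to answer a business question such as "how many units did each region sell?".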


Un-Structured Data

Un-structured data is information that doesn’t have a pre-defined format or isn’t organised in a pre-defined manner. It is typically text-heavy, but it may also contain numbers and facts, or take the form of speech notes or videos. The data science community is working on some cool problems involving this kind of data using data mining, text mining, etc., such as understanding the sentiment towards a product in the market or detecting spam among video uploads.
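
As a rough illustration of why free text needs extra work before it can be analysed, the sketch below turns a made-up product review into word counts. This is only a first step towards text mining, not a sentiment model; the function name and the very simple tokenisation rule are assumptions chosen for the example.

```python
import re
from collections import Counter

def word_counts(text):
    """Lower-case the text, split it into words, and count each word."""
    # Assumption: a "word" is any run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

review = "Great phone. Great battery, poor camera."
counts = word_counts(review)
print(counts["great"], counts["poor"])  # 2 1
```

The raw review has no rows or columns; the counts are a structure imposed on it after the fact, which is typical of work with un-structured data.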

Essentially, it is the problem that drives the data requirement, along with the availability of the data. Some problem statements don’t need un-structured data at all, while others need nothing else. The choice also depends on availability: the problem might call for a particular kind of data that simply isn’t there.

Availability of the data is key to finding a solution for the defined problem statement, and it is really important to consolidate the data from varied sources before analysing it.
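
Consolidating data from varied sources can be sketched as below, assuming one source is a CSV export and the other a JSON feed that share a key. The field names (customer_id, name, city) and the sample records are hypothetical, and a real project would more likely use a dedicated tool for this.

```python
import csv
import io
import json

def consolidate(csv_text, json_text, key="customer_id"):
    """Merge records from a CSV source and a JSON source on a shared key."""
    merged = {}
    # Source 1: CSV rows, read as dictionaries of strings.
    for row in csv.DictReader(io.StringIO(csv_text)):
        merged.setdefault(row[key], {}).update(row)
    # Source 2: a JSON list of objects; values are converted to strings
    # so that both sources end up with the same types.
    for record in json.loads(json_text):
        as_text = {k: str(v) for k, v in record.items()}
        merged.setdefault(as_text[key], {}).update(as_text)
    return merged

csv_source = "customer_id,name\n1,Asha\n2,Ben\n"
json_source = '[{"customer_id": 1, "city": "Pune"}]'
for customer_id, record in consolidate(csv_source, json_source).items():
    print(customer_id, record)
```

Customer 1 ends up with fields from both sources, while customer 2 has only what the CSV provided, which is a quick way to see where data is missing.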


The next logical step after defining the problem statement and getting access to the data is to define a process/methodology for processing the data and learning from it.
