Now that the problem definition/objective is clear, it is time to look at the
underlying data to find a possible solution for the problem statement.
Generally, there are two types of data that the data science community works
with:
Structured Data
Structured data is stored in a fixed, pre-defined format, whether in a record,
a file or a pre-defined data store. Generally, it is a combination of rows and
columns saved in a fixed format and represented as a table. Most business
problems can be solved using the available structured data. The best part of
structured data is how easy it is to store, query and analyse. With advances in
technology, it is also becoming cheaper for companies to maintain a large data
store within the organisation.
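To make the "easy to query" point concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The sales table, its columns and the figures
in it are all made up for illustration; they are not from any real data set.

```python
import sqlite3

# Hypothetical sales records: every row has the same pre-defined columns.
rows = [
    ("north", "widget", 120.0),
    ("south", "widget", 80.0),
    ("north", "gadget", 200.0),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Because the format is fixed, a business question becomes a short query.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

print(totals)
```

The same question asked of free text would first need the regions and amounts
to be extracted, which is the extra work un-structured data brings.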
Un-Structured Data
Un-structured data is information that doesn’t have a pre-defined format and
isn’t organised in a pre-defined manner. It is typically text-heavy, consisting
of numbers, facts or speech notes, and it can also include videos. The data
science community is working on some cool problems in this space using data
mining, text mining, etc., such as understanding the sentiment towards a
product in the market or detecting spam in video uploads.
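As a rough illustration of the sentiment example, the sketch below scores a
few free-text reviews by counting words from two small word lists. The reviews
and the word lists are invented for this example, and real sentiment analysis
uses far richer methods than simple word counting.

```python
import re
from collections import Counter

# Hypothetical free-text product reviews: no fixed columns, just text.
reviews = [
    "Great phone, love the camera!",
    "Battery is terrible and the screen is bad.",
    "Good value, great battery.",
]

# Toy word lists, invented for this illustration only.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"terrible", "bad"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """Return the count of positive words minus the count of negative words."""
    counts = Counter(tokenize(text))
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return positive - negative


scores = [sentiment_score(review) for review in reviews]
print(scores)
```

A score above zero suggests a positive review and a score below zero a
negative one. Note that the text had to be tokenised before anything could be
counted, a step that structured data does not need.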
Essentially, it is the problem, together with what data is available, that
drives the data requirement. Some problem statements don’t need un-structured
data at all, while others need nothing else. The choice also depends on
availability: the problem might demand a certain kind of data, but that data
may not be available.
Availability of data is key to finding a solution for the defined problem
statement, and it is really important to consolidate all the data from the
varied sources before analysing it.
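Consolidation can be sketched in a few lines of Python. Here two hypothetical
sources, a CSV export of customers and a JSON feed of orders, are combined on
a shared customer_id. The field names and values are assumptions made purely
for this example.

```python
import csv
import io
import json

# Two hypothetical sources describing the same customers.
crm_csv = "customer_id,name\n1,Asha\n2,Ben\n"
orders_json = (
    '[{"customer_id": 1, "total": 250.0},'
    ' {"customer_id": 2, "total": 90.5},'
    ' {"customer_id": 1, "total": 40.0}]'
)

# Source 1: load the customer list, keyed by customer_id.
customers = {
    int(row["customer_id"]): {"name": row["name"], "total_spent": 0.0}
    for row in csv.DictReader(io.StringIO(crm_csv))
}

# Source 2: add each order's total to the matching customer record.
for order in json.loads(orders_json):
    record = customers.get(order["customer_id"])
    if record is not None:  # skip orders with no known customer
        record["total_spent"] += order["total"]

print(customers)
```

Once everything sits in one place under one key, the consolidated records can
be analysed together instead of source by source.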
The next logical step after defining the problem statement and getting access
to the data is to define a process/methodology for processing the data and
learning from it.