Data science integrates various machine learning tools, algorithms, and principles to uncover hidden patterns in raw data. But how is this different from the work statisticians have done over the years?
The answer lies in the difference between interpretation and prediction.
As can be seen from the figure above, data analysts generally look at historical data to explain what is happening. Data scientists, on the other hand, not only perform exploratory analysis to discover insights but also use advanced machine learning algorithms to predict whether specific events will occur in the future. A data scientist looks at data from multiple angles, sometimes from angles not previously considered.
Therefore, data science is primarily used for making decisions and predictions using predictive causal analytics, prescriptive analytics (prediction plus decision science), and machine learning.
Predictive causal analytics: If you want a model that can predict the probability of a particular event occurring in the future, you need predictive causal analytics. For example, if you lend money on credit, your concern is how likely customers are to make their future payments on time. Here, you can build a model that analyzes a customer’s payment history to predict whether future payments will be made on time.
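As a rough illustration of the credit example, the sketch below fits a small logistic regression by gradient descent and returns the probability that a customer pays on time. The two features, the eight training rows, and the function names are all made up for illustration; a real model would be trained on actual payment records and validated properly.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the customer pays on time."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [fraction of past payments that were late, credit utilisation]
X = [[0.0, 0.2], [0.1, 0.3], [0.05, 0.1], [0.0, 0.5],
     [0.6, 0.9], [0.8, 0.7], [0.5, 0.8], [0.9, 0.95]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = paid on time, 0 = paid late (made-up labels)

w, b = train_logistic(X, y)
reliable = predict_proba(w, b, [0.05, 0.2])   # rarely late, low utilisation
risky = predict_proba(w, b, [0.85, 0.9])      # often late, high utilisation
print(f"reliable customer: {reliable:.2f}, risky customer: {risky:.2f}")
```

The output is a probability rather than a yes/no answer, which is what lets a lender set its own threshold for how much risk to accept.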
Prescriptive analytics: If you want a model that has the intelligence to make decisions on its own and can adjust them using dynamic parameters, you need prescriptive analytics. This relatively new field is about providing advice. In other words, it not only predicts but also suggests a range of prescribed actions and their associated outcomes.
The best example is Google’s self-driving car, which I have talked about before. The data collected by the vehicle can be used to train self-driving cars. You can run algorithms on this data to give it intelligence. This will enable your car to make decisions such as when to turn, which way to go, and when to slow down or accelerate.
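To make "prediction plus decision" concrete, here is a deliberately tiny sketch: it predicts the gap to the vehicle ahead for each candidate action and then recommends the fastest action that keeps the gap safe. The action set, the constant-speed prediction, and the 20 m safety margin are assumptions chosen for illustration; this is not how any real self-driving system works.

```python
# Hypothetical candidate actions and the speed change (m/s) each one causes.
ACTIONS = {"brake": -3.0, "hold": 0.0, "accelerate": 2.0}

def predicted_gap(gap_m, own_speed, lead_speed, delta_v, horizon_s=2.0):
    """Prediction step: gap to the vehicle ahead after horizon_s seconds,
    assuming both vehicles keep a constant speed over that horizon."""
    new_speed = max(0.0, own_speed + delta_v)
    return gap_m + (lead_speed - new_speed) * horizon_s

def choose_action(gap_m, own_speed, lead_speed, safe_gap_m=20.0):
    """Decision step: the fastest action whose predicted gap stays safe.
    Falls back to braking when no action is predicted to be safe."""
    safe = [(delta_v, name) for name, delta_v in ACTIONS.items()
            if predicted_gap(gap_m, own_speed, lead_speed, delta_v) >= safe_gap_m]
    return max(safe)[1] if safe else "brake"

print(choose_action(gap_m=100.0, own_speed=20.0, lead_speed=20.0))
print(choose_action(gap_m=25.0, own_speed=20.0, lead_speed=19.0))
print(choose_action(gap_m=22.0, own_speed=20.0, lead_speed=18.0))
```

The point of the split into two functions is that the prediction can be swapped for a learned model without touching the decision rule.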
Machine learning for making predictions: If you have transaction data from a finance company and need to build a model to determine future trends, then machine learning algorithms are the best choice. This falls under the paradigm of supervised learning. It is called supervised because you already have labeled data on which to train the machine. For example, a history of fraudulent purchases can be used to train a fraud detection model.
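The fraud example can be sketched with one of the simplest supervised methods, k-nearest neighbours: a new transaction gets the label that is most common among the labeled historical transactions closest to it. The features, values, and labels below are invented for illustration, and k-nearest neighbours is just one possible choice of algorithm.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest labeled examples
    (Euclidean distance)."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features, both scaled to 0..1:
# [amount relative to the customer's usual spend, distance from home]
train_X = [[0.10, 0.05], [0.20, 0.10], [0.15, 0.20], [0.25, 0.15],
           [0.90, 0.80], [0.85, 0.95], [0.95, 0.70]]
train_y = ["legit", "legit", "legit", "legit", "fraud", "fraud", "fraud"]

typical = knn_predict(train_X, train_y, [0.20, 0.12])
unusual = knn_predict(train_X, train_y, [0.88, 0.85])
print(typical, unusual)
```

The labels in `train_y` are what make this supervised: without them the model would have nothing to vote on.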
Machine learning for pattern discovery: If you have no parameters on which to base predictions, you need to discover hidden patterns in the data set to make meaningful predictions. This is the unsupervised model, so called because the data has no predefined labels for grouping. The most common algorithm used for pattern discovery is clustering.
Suppose you work in a telephone company and you need to build a network by placing signal towers in an area. You can then use clustering techniques to find the location of the signal tower to ensure that all users receive the best signal strength.
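A minimal sketch of the tower example using k-means, one common clustering algorithm: users are grouped by location and each cluster centre becomes a candidate tower site. The user coordinates and the choice of three towers are made up, and minimising distance to the nearest tower is only a stand-in for signal strength, which in practice also depends on terrain, obstacles, and capacity.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign each point to its nearest centre,
    then move each centre to the mean of its assigned points."""
    # Deterministic start: pick k points spread evenly through the list.
    centres = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centres

# Hypothetical user locations as (x, y) in kilometres.
users = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
         (8.0, 8.0), (8.5, 9.0), (9.0, 8.5),
         (1.0, 9.0), (1.5, 8.0), (2.0, 8.5)]

towers = kmeans(users, k=3)
for x, y in towers:
    print(f"candidate tower at ({x:.1f}, {y:.1f})")
```

No labels are supplied anywhere: the grouping comes entirely from the positions themselves, which is what makes this unsupervised.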