Stop! That Data Needs To Be Cleaned

Resources are usually scarce, and we as managers need to plan and budget continually.

We should be concerned about the future, and budgeting helps us both estimate it and take care of it.

To understand what the future is, you need to understand what has happened in the past.

There are 3 terms we consider:

Forecasting:

Forecasting is the process of estimating the future based on past and present data. It helps in decision making. A related term is backcasting, which involves estimating backwards. There is also nowcasting, which involves estimating what will happen between the present minute and the next 24 hours; examples include weather forecasts and stock prices.

Note, however, that the longer the period for which you are making a forecast, the less accurate that forecast will be.

Anything beyond 24 hours is forecasting. Whatever the period, ensure that all relevant data is available before you forecast.

Forecasting approaches

In forecasting, there are some approaches we need to consider:

Naïve approach
Moving average
Exponential smoothing
Trend projection
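
To make the four approaches concrete, here is a minimal sketch in plain Python. The function names, the default window of 3, the default smoothing factor of 0.5 and the sample `sales` figures are all assumptions made for illustration; they do not come from any particular library or data set. Each function returns a forecast for the next period only.

```python
def naive_forecast(series):
    # Naive approach: the next value is assumed to equal the last observed value.
    return series[-1]


def moving_average_forecast(series, window=3):
    # Moving average: the next value is the mean of the last `window` observations.
    recent = series[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(series, alpha=0.5):
    # Simple exponential smoothing: recent observations get more weight.
    # `alpha` (between 0 and 1) is an assumed smoothing factor.
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def trend_projection_forecast(series, steps_ahead=1):
    # Trend projection: fit a straight line by least squares and extend it.
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical monthly sales figures, used only to exercise the functions.
sales = [10, 12, 14, 16, 18]

print(naive_forecast(sales))
print(moving_average_forecast(sales))
print(exponential_smoothing_forecast(sales))
print(trend_projection_forecast(sales))
```

On data with a steady upward trend like this, the naive, moving average and smoothing forecasts all lag behind the trend, while the trend projection follows it. Which approach fits best depends on the data you actually have.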

Techniques in Forecasting

The choice of forecasting technique usually depends on:

Availability of accurate historic data
Simplicity of the model
Cost consideration

Prediction: A prediction is not 100 percent accurate. There are 2 types of data used: training data and testing data. Prediction is usually used when companies need to find patterns in data. It is usually best to predict a period for which we already have the actual data, and then compare the predicted values with the actual values to check how close they are.
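
One simple way to check how close predictions are to the actual data is the mean absolute error. The sketch below assumes two lists of equal length; the function name and the sample numbers are hypothetical and only serve to show the comparison.

```python
def mean_absolute_error(actual, predicted):
    # Average size of the gap between each actual value and its prediction.
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for a period where the actual data is already known.
actual_values = [100, 110, 120]
predicted_values = [98, 113, 119]

print(mean_absolute_error(actual_values, predicted_values))
```

A smaller error means the predictions sit closer to what actually happened. Other measures exist, and this is only one reasonable choice.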

Training data

Training data is the portion of the actual data set that is fed into the machine learning model so that it can discover and learn the patterns in the set. This sets a precedent for the model. It is larger than the testing data set, because the model should get as much data as possible so that a pattern can be discovered.

Testing Data

Once a model is established with the training data above, we require ‘never seen’ data to test it. This data is referred to as the testing data.

It is used to test and evaluate the performance and progress of the algorithm's training, and it also helps optimize the model for improved results.

2 criteria should be considered for testing data:

It should represent the actual data set.

It should be large enough to generate meaningful predictions.

Remember that the data set used here needs to be new, one that the model has not seen, because the model already knows the training data.

The performance of the model after the introduction of this new test data will let you know if it is working accurately or if more training data is needed in order to perform according to specifications.

Test data provides a final, realistic check on ‘unseen’ data to confirm that the machine learning algorithm was effectively trained.

In data science, data is typically split 80-20, where 80 percent is used for training and 20 percent is used for testing.
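
The 80-20 split can be sketched in a few lines of plain Python. The helper name `split_train_test`, the fixed seed and the ten-row sample are assumptions for illustration only. Many libraries offer their own splitting utilities, and for time-ordered data a chronological split (oldest 80 percent for training) may be more appropriate than the random shuffle shown here.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    # Shuffle a copy so the original order is untouched, then cut at 80 percent.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of ten rows.
data = list(range(10))
train, test = split_train_test(data)

print(len(train), len(test))
```

The test rows never appear in the training rows, which is what keeps them ‘never seen’ from the model's point of view.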

Projection: A projection is driven solely by assumptions.


Written by Axella Yusuf
