Statistics is the signature of reality.
Humanity hates reality because it is unbiased and unforgiving. We thrive on the idea that we are special. Reality offers us no special treatment, so we revolt.
Storytelling is the thread that holds humanity together. It is biased and forgiving of the storyteller. Humanity loves storytelling.
Good storytellers master the art of shaping statistics into a good story. The ability to shape statistics can be weaponized. Social media giants and marketing agencies have mastered this art.
This is perhaps why a remark popularly attributed to Benjamin Disraeli endures: “There are three kinds of lies: lies, damned lies, and statistics.”
The biggest advantage any person or group can have is the ability to detect lies. That ability is helped by a deep understanding of statistics.
For this article, we will explore the starting points of statistics – population and sampling.
The beginning of all statistics is the population. The population as defined here is not limited to people. Population is a pool of information or data sets that catches your fancy.
If you wake up happy on some days and sad on others, your night ritual before bed might catch your fancy and become the population you choose to explore.
The population you choose is a limitation, but you cannot look at all possible populations because resources are limited.
The population you choose is also indicative of your hypothesis (or in simpler terms, your gut feeling).
So, when someone reels off statistics on any subject, the first thing to clarify is why this particular population piqued their interest and whether it is relevant to the claim being made.
Many fallacies in statistics are formed in the choice of the population, chief of which is hasty generalization. All conclusions in statistics are specific to the population, not to the full cocktail of reality.
Once the population is defined, sampling becomes important.
Take the scenario of trying to understand why you wake up happy or sad: it would be impractical to assess every day you have lived to arrive at a conclusion. You would have to draw a representative sample from the population you chose.
For most populations, it is impossible to gather all the necessary data. Therefore, we settle for a representative sample. Representative is the key word.
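To make "representative" concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is a made-up list of daily mood scores that drift upward over time, and the names `population`, `random_sample`, and `recent_sample` are hypothetical. It compares a simple random sample with a convenience sample built only from the most recent days.

```python
import random
import statistics

# Hypothetical population: one mood score per day, oldest day first.
# The upward drift is an assumption, added only so that recent days
# differ from older ones. It is not taken from any real data.
rng = random.Random(0)
N_DAYS = 10_000
population = [4.0 + 4.0 * day / N_DAYS + rng.gauss(0.0, 1.0) for day in range(N_DAYS)]

SAMPLE_SIZE = 500

# Simple random sample: every day has the same chance of being chosen.
random_sample = rng.sample(population, SAMPLE_SIZE)

# Convenience sample: only the most recent days, because they are easy to recall.
recent_sample = population[-SAMPLE_SIZE:]

pop_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
recent_mean = statistics.mean(recent_sample)

print(f"population mean:         {pop_mean:.2f}")
print(f"random sample mean:      {random_mean:.2f}")
print(f"recent-days sample mean: {recent_mean:.2f}")
```

Under these assumptions the random sample lands close to the population mean, while the recent-days sample tells a noticeably rosier story. Both samples are the same size; only the way they were chosen differs.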
There are formal tools for checking how representative a sample is, but as a rough personal heuristic, I like to borrow the 80:20 rule, which states that roughly 80% of outcomes come from 20% of causes. Any sample smaller than 20% of the population deserves closer scrutiny.
Sampling is where bias usually creeps in, and it is why we are vulnerable to tail events. Tail events are low-probability events. The risk is that an event may look improbable only because of how we sampled, not because it is rare in reality.
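A small Python sketch can show how a sample hides a tail event. It assumes a hypothetical event that truly occurs on 1% of days and independent random draws; the function names `chance_of_missing` and `simulate_missing` are invented for this illustration. The chance that a sample of n days contains no occurrence at all is (1 - rate) raised to the power n.

```python
import random

def chance_of_missing(event_rate: float, sample_size: int) -> float:
    """Probability that an independent random sample contains zero occurrences."""
    return (1.0 - event_rate) ** sample_size

def simulate_missing(event_rate: float, sample_size: int,
                     trials: int = 2000, seed: int = 42) -> float:
    """Estimate the same probability by drawing many samples."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # A sample "sees" the event if at least one sampled day has it.
        saw_event = any(rng.random() < event_rate for _ in range(sample_size))
        if not saw_event:
            missed += 1
    return missed / trials

# Assumed true rate of the tail event: 1% of days.
small = chance_of_missing(0.01, 50)    # sample of 50 days
large = chance_of_missing(0.01, 500)   # sample of 500 days
simulated = simulate_missing(0.01, 50)

print(f"chance a 50-day sample never sees the event:  {small:.3f}")
print(f"chance a 500-day sample never sees the event: {large:.3f}")
print(f"simulated estimate for the 50-day sample:     {simulated:.3f}")
```

With these assumed numbers, about six in ten 50-day samples would never see the event at all, and anyone relying on such a sample could conclude it never happens. The 500-day sample almost always catches it.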
Statistics is often presented as a difficult undertaking, a signature best left to boring craftsmen. But it is too important to leave to anyone else, unless you enjoy being lied to or living a lie.
#MEMBA11 #DA #Zazparelli