Today’s post is on the basics of statistics. If you remember, in one of my posts, I made a point to start loving mathematics no matter what it took. Since I said that, I have begun looking for techniques to improve my mathematics skills. In an effort to do that, I decided to go back to the beginning. To aid my learning, I decided that I would regularly blog about what I learn in my data analytics class.
Data in business
In every organization, data is collected during the operations of the business. The essence of this is to measure, then use the measurements to establish baselines, set performance goals and create benchmarks. Data helps us as managers make informed decisions, but what really is data?
What is Data?
Data is information that has been translated into a form that is efficient for processing. It is the raw facts and statistics. Information, on the other hand, is data that is accurate, timely, specific and organized for a purpose.
Features of data
- Small data: small datasets that are still capable of impacting decisions. Any data that is currently being collected and that can be accumulated in an Excel sheet is small data.
- Big data: large volumes of structured and unstructured data.
Categories of Data
- Structured data: data that has been predefined and formatted to a set structure before being placed in data storage. An example is a table in a relational database.
- Unstructured data: data stored in its native format and not processed until it is used. Examples include emails, social media posts, presentations and chats.
Major differences between structured and unstructured data
- Structured data is usually quantitative, while unstructured data is usually qualitative.
- Structured data is often stored in data warehouses, while unstructured data is stored in data lakes.
- Structured data is easy to search and analyse, while unstructured data requires more work to process and understand.
- Structured data exists in predefined formats, while unstructured data is in a variety of formats.
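To make the "easy to search" point concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the order records, the customer names and the email text are hypothetical, and the regular expression is just one simple way to pull numbers out of free text, not a recommended method.

```python
import re

# Structured: every record follows the same predefined fields (hypothetical data).
structured_orders = [
    {"order_id": 1, "customer": "Ada", "amount": 250.0},
    {"order_id": 2, "customer": "Bayo", "amount": 100.0},
    {"order_id": 3, "customer": "Ada", "amount": 50.0},
]

# Unstructured: free text kept in its native form (hypothetical email).
unstructured_email = (
    "Hi, this is Ada. I paid 250 for my first order and 50 for the next one."
)

# Searching structured data is a direct lookup on a known field.
ada_total = sum(o["amount"] for o in structured_orders if o["customer"] == "Ada")

# Unstructured text has to be processed first; here a simple regex finds the numbers.
amounts_in_email = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", unstructured_email)]

print(ada_total)
print(amounts_in_email)
```

With the structured records, the question "how much did Ada spend?" is one filter on a known field. With the email, we first have to decide how to find the amounts at all, and the regex would break as soon as the text mentioned any other number.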
Sources of Data
- Primary data: first-hand data gathered by the researchers themselves. Primary data is often reliable, authentic and objective because it was collected for the purpose of addressing a particular research problem. It can be generated through surveys, experiments, interviews, observation and focus groups.
- Secondary data: data that has already been collected and made readily available for researchers to use. It is usually easily accessible to researchers and individuals because it is mostly shared publicly. It usually started out as primary data and became secondary when used by a third party. Examples: NBS, NPC, IOT, annual reports, ERT, web scraping (sentiment analysis) and government records.
Nature of the data
- Quantitative: numerical data that can be counted or measured. It is the kind of data that descriptive statistics, such as the mean, are computed from. It can be either discrete (counted in whole units) or continuous (measured on a scale).
- Qualitative: data that is not numerical and so cannot be measured on a numeric scale. It is also called categorical data.
To illustrate the difference between quantitative and qualitative data, I found a great example online. Imagine you want to describe your best friend. What kind of data might you gather or use to paint a vivid picture? First, you might describe their physical attributes, such as their height, their hair style and colour, what size feet they have, and how much they weigh. Then you might describe some of their most prominent personality traits. And so on. The attributes you can measure (height, foot size, weight) are quantitative, while hair style, hair colour and personality traits are qualitative.
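The best-friend example can be sketched in a few lines of Python. The values below are invented, the `siblings` field is my own addition to show a counted value, and the rule "whole numbers are discrete, decimals are continuous, text is qualitative" is a simplifying assumption for this sketch rather than a general test.

```python
# Hypothetical description of a best friend; every value is made up.
friend = {
    "height_cm": 172.5,      # measured
    "weight_kg": 68.2,       # measured
    "foot_length_cm": 26.0,  # measured
    "siblings": 2,           # counted (not in the original example)
    "hair_colour": "black",
    "hair_style": "braids",
    "personality": "generous",
}


def classify(value):
    """Label a value using the simplified rule described above."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so treat it as a category.
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    return "qualitative"


labels = {name: classify(value) for name, value in friend.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

In real datasets the type alone is not enough: a phone number or a postcode is stored as digits but is still qualitative, so the final call depends on what the value means.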
That’s it for me on statistics today. 😀