As a manager, the quality of decisions you make is determined by the quality of data available to you.
Data refers to raw, unprocessed facts.
Information refers to data that has been processed.
Statistics is the branch of mathematics that deals with collecting, organizing, presenting, analyzing and interpreting numerical facts (data) so as to make informed decisions.
Data Sources
A data source is where data is obtained from. Sources can be divided into:
- Primary data: data you generate yourself, e.g. through surveys, forms (such as Google Forms and SurveyMonkey), interviews, etc.
- Secondary data: data obtained from already existing information, e.g. national population census data, annual reports, social media, web scraping, etc.
In data collection, always remember to reference the source.
Nature of Data
- Numeric: this is also called quantitative data. It is divided into discrete and continuous data. Discrete data is counted and usually takes whole-number values, e.g. the number of students. Continuous (non-discrete) data is measured and can take decimal values; examples include length, temperature, and volume.
- Non-numeric: this is also called qualitative or categorical data.
Terms used in describing Data
- Mixed method of data collection: refers to a combination of quantitative and qualitative data.
- Dummy variable: a variable that takes the value 0 or 1 and is used to represent qualitative data numerically, e.g. 1 for "yes" and 0 for "no".
- Timing of data: refers to when data is collected. It is divided into cross-sectional data, which is collected at a single point in time, and time series data, which is collected over time. The frequency of time series data can be daily, quarterly, annually, etc.
- Pooled/panel data: a combination of cross-sectional and longitudinal/time series data.
- Variable: a characteristic that can take different values, i.e. anything that changes from one observation to another or over time.
- Population: the entire group of items or individuals of interest being considered at a time, e.g. all the students being referred to.
- Sample: a part of the population selected for study, since it is sometimes not possible to use the entire population. Your sample must be representative of the population.
- Outlier: the odd one out in a set of data, usually a value significantly higher or lower than the rest.
- Mean: the average of a set of values.
- Median: the middle value in a set of values arranged in order.
- Mode: the value that appears most frequently in a set of values.
- Normal distribution: refers to data that is distributed symmetrically around the mean in a bell shape, with no skew (bias) toward higher or lower values.
- Causality: when a change in one variable brings about a change in another.
- Parameter: a single value used to describe the population, e.g. the population mean, median, or mode.
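Several of the terms above (mean, median, mode, outlier, dummy variable) can be illustrated with a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the sales figures, the region labels, and all variable names are made-up assumptions for illustration and are not part of these notes.

```python
import statistics

# Hypothetical weekly sales figures; 90 is a deliberate outlier.
sales = [12, 15, 15, 14, 13, 15, 90]

mean_all = statistics.mean(sales)      # average of all values
median_all = statistics.median(sales)  # middle value once the data is sorted
mode_all = statistics.mode(sales)      # most frequently occurring value

# Removing the outlier shows how strongly it pulls the mean,
# while the median and mode are barely affected by it.
without_outlier = [x for x in sales if x != 90]
mean_trimmed = statistics.mean(without_outlier)

# Dummy variable: encode a qualitative label as 1 or 0.
regions = ["urban", "rural", "urban"]  # hypothetical categories
is_urban = [1 if r == "urban" else 0 for r in regions]

print("mean with outlier:", round(mean_all, 2))
print("median:", median_all)
print("mode:", mode_all)
print("mean without outlier:", mean_trimmed)
print("dummy variable:", is_urban)
```

In this made-up data the mean drops from about 24.86 to 14 once the outlier is removed, while the median and mode both stay at 15, which is why the median is often preferred when outliers are present.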
In Excel, there are 4 data types:
- Numbers: e.g. 1, 2, 3, 4, 5, etc.
- Text
- Boolean (TRUE or FALSE)
- Formula/function
In data collection, there is usually room for a little error, as you cannot be 100 percent correct. There is an acceptable chance of error, referred to as the level of significance, which is commonly set at about 1% (5% is also widely used). The remainder, 99% at a 1% level of significance, is called the confidence level.
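The link between the two figures is simple arithmetic: the confidence level is 100% minus the level of significance. A minimal Python sketch of that relationship follows; the function name `confidence_level` is a hypothetical helper, and the 5% value is included only as another commonly used level.

```python
def confidence_level(alpha):
    """Return the confidence level for a given level of significance (alpha)."""
    return 1 - alpha

# 0.01 is the 1% level discussed above; 0.05 is shown for comparison.
for alpha in [0.01, 0.05]:
    print(f"significance {alpha:.0%} -> confidence {confidence_level(alpha):.0%}")
```

This prints "significance 1% -> confidence 99%" and "significance 5% -> confidence 95%".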