Data Analytics Journey – A Good Summary of the Whole Gist

Written by Kelvin Omozokpia · 2 min read

This is part of a series on what I am learning in my data analytics class. This write-up explains some core concepts you need to become a full-fledged data analytics expert. We will investigate probability and how it helps us express mathematically, as a percentage, how likely an event is to occur or not occur.
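To make the idea of "to what percentage an event is likely to occur" concrete, here is a minimal sketch of classical probability. The `probability` helper and the fair-die scenario are illustrative assumptions, not something taken from the class material.

```python
from fractions import Fraction


def probability(favourable: int, total: int) -> Fraction:
    """Classical probability: favourable outcomes over equally likely total outcomes."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)


# Hypothetical example: rolling a fair six-sided die and getting an even number.
# Three of the six faces (2, 4, 6) are even.
p_even = probability(3, 6)

# The chance of the event NOT occurring is the complement.
p_not_even = 1 - p_even

print(f"P(even)     = {p_even} = {float(p_even):.0%}")
print(f"P(not even) = {p_not_even} = {float(p_not_even):.0%}")
```

The two values always add up to 1 (100%), which is a quick sanity check on any probability you compute by hand or in a spreadsheet.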

You Did Not Escape Mathematics and Basic Microsoft Excel

The journey to becoming a data analyst will take you through basic mathematics and probability. You will get your hands dirty using Microsoft Excel to wrangle quantitative data thoroughly. Microsoft Excel is an industry standard, and it looks like it is not going anywhere for a very long time.

We Choose Python because it’s Simpler

As a data analyst, you need some level of scripting knowledge under your belt. Python is usually recommended because it is easy to learn: its syntax is concise and not verbose, and it has a vast array of pre-built modules and external libraries (like Pandas, NumPy, etc.) that take the heavy lifting off you as the analyst. It also offers a very good developer experience (DX). Setting up your Python environment for the first time is usually a breeze: you simply download the installer from https://www.python.org/downloads/ and run through the installation, or use a more sophisticated tool like PyCharm from JetBrains, which helps you set up an interpreter and project environment so you can start writing Python code quickly. Python is also a dynamically typed language, which means you don’t need to declare the types of your variables as you code; they are determined at runtime. Though this makes the program's behaviour less predictable at runtime, it gives you the flexibility to let a variable change its data type when needed. Learning a scripting language like Python will help a lot with most of your automation tasks.
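The dynamic typing trade-off described above can be seen in a few lines. This is a small sketch; the `total_price` function and its values are made up for illustration.

```python
# A variable is just a name; the type belongs to the value it currently holds.
value = 42
first_type = type(value).__name__    # "int"

value = "forty-two"                  # same name, now bound to a string
second_type = type(value).__name__   # "str"

print(first_type, second_type)


def total_price(quantity, unit_price):
    """Hypothetical helper: no types are declared, so any operands are accepted."""
    return quantity * unit_price


# Works as intended with numbers.
print(total_price(3, 2.5))    # 7.5

# The "less predictable" side: a string quantity is silently repeated
# instead of multiplied, because str * int means repetition in Python.
print(total_price("3", 2))    # 33
```

Optional type hints (for example `quantity: int`) together with a type checker are one common way to get some of the predictability back without giving up the flexibility.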

SQL or NoSQL, Pick Your Poison

The data you will manipulate usually sits somewhere, waiting for you or some pre-written scripts to act on it for the purpose of sanitization, transformation, batch processing, etc. It is usually saved in the form of tables in a Structured Query Language (SQL) database or as objects in a "not only SQL" or non-SQL (NoSQL) database. SQL databases are usually the more popular option because they have been around for over 40 years and are well documented, widely used, safe, and versatile. SQL is also better suited to complex queries. One drawback of SQL databases, and the point where NoSQL databases take the opposite approach, is the predefined tabular schema. The dynamic schema of NoSQL databases allows us to represent data whose shape may change over time. This gives flexibility: new fields or attributes can be added to a record easily, without redefining the schema first.
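The schema difference can be sketched with Python's built-in `sqlite3` module on the SQL side. On the NoSQL side, a plain list of dictionaries stands in for a document collection; that stand-in is an assumption made to keep the example self-contained, not a real NoSQL database. The `customers` table and its fields are hypothetical.

```python
import sqlite3

# SQL side: the table's columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "Lagos"))

# Writing to a column the schema does not define is rejected.
schema_error = None
try:
    conn.execute(
        "INSERT INTO customers (name, city, phone) VALUES (?, ?, ?)",
        ("Bola", "Abuja", "0800-000"),
    )
except sqlite3.OperationalError as exc:
    schema_error = str(exc)
print("SQL insert rejected:", schema_error)

# The schema must be changed explicitly before the new field can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
conn.execute(
    "INSERT INTO customers (name, city, phone) VALUES (?, ?, ?)",
    ("Bola", "Abuja", "0800-000"),
)
rows = conn.execute("SELECT name, city, phone FROM customers ORDER BY id").fetchall()
print(rows)

# Document side (stand-in): each record carries its own set of fields,
# so a new attribute can appear on one record without touching the others.
documents = [{"name": "Ada", "city": "Lagos"}]
documents.append({"name": "Bola", "city": "Abuja", "phone": "0800-000", "tags": ["vip"]})
for doc in documents:
    print(sorted(doc))
```

The SQL side pays for its rigidity with an explicit `ALTER TABLE`, but in return every row is guaranteed to have the same columns, which is part of what makes complex queries dependable.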

More on SQL

Popular SQL databases include MySQL, PostgreSQL, Oracle Database, Azure SQL, etc. Each has its own dialect of SQL, but they share the same core language, so working with one is not far off from working with another. Often, as a developer, you will use Object-Relational Mapping (ORM) software suited to your framework or language, which provides a bridge between the underlying SQL database engine and the programming language you are writing in. These ORMs provide a common Application Programming Interface (API) that exposes the common methods you will need.
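To show the idea of that bridge, here is a deliberately tiny, hand-rolled mapper built on Python's standard library. `TinyMapper`, `Customer`, and the method names are hypothetical and exist only for this sketch; real ORMs (SQLAlchemy or the Django ORM in Python, for example) have their own, much richer APIs.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Customer:
    """A plain Python object that we want stored as a table row."""
    name: str
    city: str


class TinyMapper:
    """Toy object-to-table bridge (illustration only, not a real ORM)."""

    def __init__(self, conn, model):
        self.conn = conn
        self.model = model
        self.table = model.__name__.lower()
        self.columns = [f.name for f in fields(model)]
        column_defs = ", ".join(f"{col} TEXT" for col in self.columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {self.table} ({column_defs})")

    def add(self, obj):
        """Translate an object into an INSERT statement."""
        names = ", ".join(self.columns)
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({placeholders})",
            astuple(obj),
        )

    def filter_by(self, **conditions):
        """Translate keyword arguments into a SELECT and return objects."""
        for col in conditions:
            if col not in self.columns:  # only known column names reach the SQL text
                raise ValueError(f"unknown column: {col}")
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if conditions:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in conditions)
        rows = self.conn.execute(sql, tuple(conditions.values()))
        return [self.model(*row) for row in rows]


conn = sqlite3.connect(":memory:")
customers = TinyMapper(conn, Customer)
customers.add(Customer("Ada", "Lagos"))
customers.add(Customer("Bola", "Abuja"))

in_lagos = customers.filter_by(city="Lagos")
print(in_lagos)
```

The calling code never writes SQL itself: it works with `Customer` objects and method calls, and the mapper produces the statements. That is the common API the paragraph above refers to, in miniature.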

Up next: Complete more on SQL and More on SQL

 #MMBA4

Written by Kelvin Omozokpia
Kelvin is a forward-thinking software engineer fluent in the JavaScript, TypeScript, and PHP programming languages for orchestrating cloud-based applications.
