Take That For Data!
Data is everywhere, which is what makes it so powerful. Every day, the world deals with large amounts of data from many different fields and sectors. We can collect data from almost anything under the sun: prices, the weather, the internet, and even today's enemy, Covid-19 cases. So, being able to process, analyze, and present data efficiently with the tools and methods of Data Science can be very beneficial on a large scale.
Getting the basics right first.
Before we can build anything successfully, we always start with the basics. For data, this means preparing it so that the analysis is sound and the results are accurate. Data quality is, of course, a priority as well.
IMPORTANCE OF DATA PREPARATION
Why do we need to explore the data?
Before drawing conclusions from the data, we need to explore it first so that we understand what the data is about. From here, we can see the fundamental metrics of the dataset, such as the maximums, minimums, and ranges of the numerical data and the relationships among them, as well as the distinct values of our categorical data. With all this knowledge, we can clean our data efficiently.
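As a rough illustration of this kind of exploration, here is a minimal sketch using only the Python standard library. The records, the field names (`city`, `temp_c`), and the helper `explore` are all hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"city": "Manila", "temp_c": 31.0},
    {"city": "Cebu", "temp_c": 29.5},
    {"city": "manila", "temp_c": 33.0},
    {"city": "Davao", "temp_c": 30.0},
]


def explore(rows, numeric_field, category_field):
    """Return basic metrics for one numeric field and one categorical field."""
    values = [row[numeric_field] for row in rows]
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # Count how often each distinct category value appears.
        "categories": Counter(row[category_field] for row in rows),
    }


summary = explore(records, "temp_c", "city")
print(summary)
```

In this made-up sample, the category counts list "Manila" and "manila" as two separate values, which is exactly the kind of inconsistency that exploration is meant to surface before cleaning.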
Why do we need to clean the data?
We need to clean our data because it may contain typographical errors and other inaccuracies that affect the analysis. Detecting inaccurate data and correcting the errors is an essential step before applying advanced methods and statistical techniques in Data Science for better analytics.
Data analysis is the process of turning a mass of data into significant insights. There are many methods for analyzing different types of data, and statistical methods are among the most important techniques for analyzing both quantitative and qualitative data.
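One simple way to handle typographical errors in categorical values might look like the sketch below. It assumes the misspellings are already known from exploring the data; the `CORRECTIONS` mapping, the helper `clean_category`, and the sample values are hypothetical.

```python
# Hypothetical mapping from known misspellings to the intended value.
CORRECTIONS = {"manilla": "manila"}


def clean_category(value):
    """Normalize whitespace and case, then apply known spelling corrections."""
    normalized = " ".join(value.split()).lower()
    return CORRECTIONS.get(normalized, normalized)


# Illustrative raw values with inconsistent case, spacing, and spelling.
raw_cities = ["Manila", " manila ", "Manilla", "Cebu", "CEBU"]
cleaned = [clean_category(city) for city in raw_cities]
print(cleaned)
```

A lookup table like this only fixes errors we already know about, so in practice it would be built up from what the exploration step reveals.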
RELEVANCE OF STATISTICS IN DATA SCIENCE
Why do we need to have sufficient knowledge of Statistics?
As mentioned, “data is everywhere”. Today we live in a world where information can be derived from almost any situation, and much of it is determined mathematically with the help of Statistics. There are many instances in our lives when we try to determine how things are related based on their characteristics: we usually associate height and weight, budget and expenses, and other aspects of life that may be related to one another. Measuring the degree of these associations and making predictions at a higher level requires proper statistical methods to collect the data, apply the correct analyses, and present the results effectively.
What concepts in Statistics are useful in Data Science?
Data Scientists should know how to support their findings using both descriptive and inferential statistical analysis.
— Descriptive Analysis: this is the first level of analysis used to find patterns and to summarize individual variables
Mean: numerical average of a set of values.
Median: midpoint of a set of numerical values.
Mode: most common value among a set of values.
Percentage: used to express how a value or group of respondents within the data relates to a larger group of respondents.
Frequency: the number of times a value is found.
Range: the difference between the highest and lowest values in a set of values.
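The descriptive measures above can be computed in a few lines. This is a minimal sketch using Python's built-in `statistics` module and `collections.Counter`; the `scores` list is made-up data for illustration.

```python
import statistics
from collections import Counter

# Hypothetical survey scores, for illustration only.
scores = [3, 5, 4, 5, 2, 5, 4]

mean = statistics.mean(scores)        # numerical average
median = statistics.median(scores)    # midpoint of the sorted values
mode = statistics.mode(scores)        # most common value
frequency = Counter(scores)           # how many times each value occurs
# Share of responses equal to 5, expressed as a percentage of all responses.
percent_fives = 100 * frequency[5] / len(scores)
# Range taken here as the difference between the highest and lowest values.
value_range = max(scores) - min(scores)

print(mean, median, mode, frequency[5], round(percent_fives, 1), value_range)
```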
— Inferential Analysis: these are complex analyses used to show relationships between variables to generalize results, and make predictions
Correlation: measures the strength and direction of the relationship between variables.
Regression: models the relationship between variables so that one can be predicted from the others.
Analysis of Variance: tests whether the means of 2 or more groups differ significantly.
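To make these three ideas a little more concrete, here is a minimal sketch written from the textbook formulas for the Pearson correlation coefficient, a simple least-squares line, and the one-way ANOVA F statistic. The function names and all of the data are hypothetical. In practice a statistics library would normally be used, and the F statistic would still need to be compared against an F distribution to judge significance, which this sketch does not do.

```python
import math


def _deviation_sums(xs, ys):
    """Return mean of x, mean of y, and the sums of products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    _, _, sxy, sxx, syy = _deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y, sxy, sxx, _ = _deviation_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical height (cm) and weight (kg) pairs, for illustration only.
heights = [150, 160, 170, 180]
weights = [50, 59, 65, 74]

r = pearson_r(heights, weights)
slope, intercept = fit_line(heights, weights)
predicted_weight = slope * 175 + intercept  # prediction for a height of 175 cm

# Hypothetical measurements from three groups, for illustration only.
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

print(round(r, 3), round(slope, 2), round(intercept, 1), round(f_stat, 2))
```

A correlation close to 1 here suggests that height and weight rise together in the made-up sample, and the fitted line turns that association into a prediction.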