Data Summary
Before doing anything else, it is important to understand the structure of the data. This covers:
•missing data
•cleaning / tidying
•plotting
•correlations
•outliers
•summary stats
All the items listed below are R functions; some of them require additional packages.
General identification
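View(data): opens the data in a spreadsheet-style viewer
glimpse(data): compact, column-wise overview (dplyr package)
spec(data): column specification of a data frame read from a CSV file with readr
attributes(data): names, class, row names and any other attributes
class(data): the class of the object (e.g., "data.frame")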
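A minimal sketch of these first checks, assuming a small hypothetical data frame named `data`. It sticks to base R, so `str()` stands in for `glimpse()` (which needs dplyr), and `spec()` is left out because it only applies to data read with readr.

```r
# Hypothetical toy data frame, used only for illustration
data <- data.frame(
  id     = 1:4,
  weight = c(70.5, 82.1, NA, 65.0),
  group  = factor(c("a", "b", "a", "b"))
)

class(data)        # "data.frame"
attributes(data)   # names, class and row.names
str(data)          # base-R counterpart of dplyr::glimpse()
dim(data)          # rows and columns: 4 3
```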
In-depth summarizing
1-summary(), from base R
2-skim(), from the skimr package
3-describe(), from the Hmisc package
4-stat.desc(), from the pastecs package
5-describe() and describeBy(), from the psych package
6-descr() and dfSummary(), from the summarytools package
7-CreateTableOne(), from the tableone package
8-desctable(), from the desctable package
9-ggpairs(), from the GGally package
10-ds_summary_stats(), from the descriptr package
11-With the dlookr package: an automated report (as PDF or HTML)
12-With the DataExplorer package
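A minimal sketch of the first two options on a hypothetical data frame. Base `summary()` needs nothing extra; the skimr call is wrapped in `requireNamespace()` so the snippet still runs when that package is not installed.

```r
# Hypothetical example data
data <- data.frame(
  weight = c(70.5, 82.1, NA, 65.0, 90.3),
  height = c(175, 182, 168, 160, 190),
  group  = factor(c("a", "b", "a", "b", "a"))
)

# Base R: min, quartiles, median, mean, max (plus NA count) for numeric
# columns and level counts for factors
s <- summary(data)
print(s)

# Optional: richer per-column summary from skimr, only if it is installed
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(data))
}
```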
Specific identification
1-Identify duplicate values:
Find the duplicate values (only) in the primary key.
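One way to do this in base R, assuming the primary key is a column called `id` (a hypothetical name). `duplicated()` flags every repeat after the first occurrence, so `unique()` of the flagged values lists each duplicated key once, and `%in%` then pulls out all rows that share those keys.

```r
# Hypothetical table whose primary key is the column `id`
data <- data.frame(
  id    = c(101, 102, 102, 103, 104, 104, 104),
  value = c(5, 7, 8, 2, 9, 9, 1)
)

# Key values that occur more than once (each reported once)
dup_keys <- unique(data$id[duplicated(data$id)])
print(dup_keys)   # 102 104

# All rows involved in a duplicated key, for inspection
dup_rows <- data[data$id %in% dup_keys, ]
print(dup_rows)
```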
2-Identify NA values (Not Available):
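A base-R sketch on a hypothetical data frame: `is.na()` gives a logical matrix, `colSums()` counts the missing values per column, and `complete.cases()` marks the rows that have no missing value at all.

```r
# Hypothetical data with scattered missing values
data <- data.frame(
  weight = c(70.5, NA, 82.1, NA),
  height = c(175, 182, NA, 160),
  group  = c("a", "b", NA, "b")
)

# Number of NA values per column
na_per_col <- colSums(is.na(data))
print(na_per_col)   # weight 2, height 1, group 1

# Rows without any missing value, and their share of all rows
complete <- complete.cases(data)
mean(complete)      # 0.25
```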
3-Identify outliers:
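No particular method is prescribed here; as one common example, the sketch below applies Tukey's fences (values more than 1.5 times the IQR beyond the first or third quartile) to a hypothetical numeric vector. Other rules (z-scores, MAD-based scores, domain limits) may suit your data better.

```r
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 21, 95)  # hypothetical measurements

# Tukey's fences: flag points more than 1.5 * IQR beyond the quartiles
q     <- quantile(x, c(0.25, 0.75))
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]
outliers                # 95

# The same idea as used by boxplots (based on hinges, which can
# differ slightly from quantile() on some data)
boxplot.stats(x)$out
```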
4-Plausibility check: numeric and non-numeric variables
A plausibility check can include checking orders of magnitude and looking for implausible values (e.g., a negative body weight), among other things. A good starting point is to separate numeric from non-numeric variables.
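A sketch of this idea on a hypothetical data frame: split the columns by type, inspect the ranges of the numeric ones, and compare categorical values against the expected set. The limits and the allowed categories are illustrative assumptions, not standards.

```r
# Hypothetical data with some deliberately implausible entries
data <- data.frame(
  weight_kg = c(70.5, -3, 82.1, 650),
  height_cm = c(175, 182, 1.68, 160),
  group     = c("control", "treated", "tretaed", "control")
)

# 1. Separate numeric from non-numeric variables
is_num   <- vapply(data, is.numeric, logical(1))
num_vars <- names(data)[is_num]    # "weight_kg" "height_cm"
chr_vars <- names(data)[!is_num]   # "group"

# 2. Numeric: min/max per column reveal odd magnitudes
print(sapply(data[num_vars], range))

# Illustrative plausibility limits (assumed, adjust to your domain)
bad_weight <- which(data$weight_kg <= 0 | data$weight_kg > 400)
bad_height <- which(data$height_cm < 50 | data$height_cm > 250)

# 3. Non-numeric: values outside the expected set of categories
bad_group <- which(!data$group %in% c("control", "treated"))

bad_weight  # 2 4 (negative and 650 kg)
bad_height  # 3 (height apparently entered in metres)
bad_group   # 3 (typo "tretaed")
```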
5-Highly correlated variables and covariance:
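A base-R sketch using simulated data in which `x2` is deliberately built from `x1`. `cov()` and `cor()` return the full matrices; the `upper.tri()` mask keeps each pair once when listing pairs above a cut-off. The 0.9 threshold is an arbitrary assumption.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
data <- data.frame(
  x1 = x1,
  x2 = 2 * x1 + rnorm(n, sd = 0.1),  # built to be strongly correlated with x1
  x3 = rnorm(n)                      # unrelated variable
)

cov_mat <- cov(data)
cor_mat <- cor(data)

# List variable pairs whose absolute correlation exceeds a chosen cut-off
cutoff <- 0.9  # assumption: pick a threshold that suits your data
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```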
6-Mode: unimodal or bimodal distribution:
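Base R has no built-in function for the statistical mode (`mode()` returns the storage mode of an object), so the sketch below defines two hypothetical helpers: `stat_mode()` for discrete values and `count_density_peaks()`, which counts the local maxima of a kernel density estimate as a rough hint of unimodality versus bimodality. Deterministic normal quantiles stand in for real data, and the peak count depends on the bandwidth used by `density()`.

```r
# Statistical mode: most frequent value(s) of a discrete vector.
# (Base R's mode() returns the storage mode, not this.)
stat_mode <- function(x) {
  tab <- table(x)
  names(tab)[tab == max(tab)]
}

# Rough heuristic: number of local maxima of a kernel density estimate.
# Extra arguments (e.g. bw) are passed on to density().
count_density_peaks <- function(x, ...) {
  y <- density(x, ...)$y
  s <- sign(diff(y))
  s <- s[s != 0]                # drop flat steps so plateaus are not missed
  sum(diff(s) == -2)            # a rise followed by a fall marks a peak
}

# Deterministic example data: normal quantiles instead of random draws
unimodal <- qnorm(ppoints(500))
bimodal  <- c(qnorm(ppoints(250), mean = -3), qnorm(ppoints(250), mean = 3))

stat_mode(c(1, 2, 2, 3, 3, 3))  # "3"
count_density_peaks(unimodal)
count_density_peaks(bimodal)
```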
7-Principal Components Analysis:
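A minimal sketch with `prcomp()` from base R's stats package, using the built-in `iris` measurements purely as example data. Setting `scale. = TRUE` standardizes the variables first, which is usually advisable when they are measured on different scales.

```r
# Example data: the four numeric measurements of the built-in iris dataset
X <- iris[, 1:4]

# Standardize (scale. = TRUE) so that no variable dominates through its units
pca <- prcomp(X, center = TRUE, scale. = TRUE)

summary(pca)   # standard deviation and (cumulative) proportion of variance
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

head(pca$x[, 1:2])  # scores of the first rows on PC1 and PC2
pca$rotation        # loadings: contribution of each variable to each PC
```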
8-Factor Analysis:
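A sketch with `factanal()` (maximum-likelihood factor analysis from base R's stats package) on simulated data in which `a1` to `a3` share one latent factor and `b1` to `b3` another; the variable names and the two-factor structure are assumptions made only for this example.

```r
set.seed(123)
n  <- 500
f1 <- rnorm(n)   # hypothetical latent factor 1
f2 <- rnorm(n)   # hypothetical latent factor 2
noise <- function() rnorm(n, sd = 0.5)

# Six observed variables: a* driven by f1, b* driven by f2
data <- data.frame(
  a1 = f1 + noise(), a2 = f1 + noise(), a3 = f1 + noise(),
  b1 = f2 + noise(), b2 = f2 + noise(), b3 = f2 + noise()
)

fa <- factanal(data, factors = 2, rotation = "varimax")
print(fa$loadings, cutoff = 0.3)  # hide small loadings for readability
fa$uniquenesses                   # variance not explained by the common factors
fa$PVAL                           # p-value of the test that 2 factors suffice
```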
9-Bootstrap Resampling:
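A minimal nonparametric bootstrap in base R for the mean of a hypothetical sample: resample with replacement, recompute the statistic many times, then read off the standard error and a percentile 95% interval. The boot package offers a more complete implementation (for example, other interval types through `boot.ci()`).

```r
set.seed(2024)
x <- c(4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 3.2)  # hypothetical sample

B <- 2000  # number of bootstrap resamples
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

mean(x)                                          # point estimate: 4.7
se_boot <- sd(boot_means)                        # bootstrap standard error
ci_95   <- quantile(boot_means, c(0.025, 0.975)) # percentile interval
se_boot
ci_95
```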
FULL SUMMARY: