Checklist Solution
From my experience, these are some of the basic data-cleaning steps in R. Check them out!
1-Duplicates solution
Identifying the primary key is the first step in finding duplicate values.
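data <- dataset  # `dataset` is your data frame; `primarykey` is its key column
# 1. Duplicate values: check the primary key first (better),
#    or check the rows of the whole dataset
# Packages (optional, for quick summaries such as skimr::skim(data) or Hmisc::describe(data)):
library(skimr)
library(Hmisc)
# Duplicates in the primary key: repeats after each key's first occurrence
dup_keys <- sum(duplicated(data$primarykey))
dup_keys
# Duplicates in the whole dataset: unique() drops rows identical in every column
data_1 <- unique(data)
before <- nrow(data)
before
after <- nrow(data_1)
after
different <- before - after  # number of fully duplicated rows removed
different
before_after_matrix <- cbind(before, after)
before_after_matrix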
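The key check and the whole-row check can give different answers, because `unique()` only removes rows that match in every column. A minimal sketch on a made-up data frame shows the difference; `df`, its columns `id`, `name` and `score`, and the helper `is_primary_key()` are hypothetical and only here for illustration.

```r
# Made-up data: id 2 is repeated with identical values,
# id 3 is repeated with a different score
df <- data.frame(
  id    = c(1, 2, 2, 3, 3),
  name  = c("a", "b", "b", "c", "c"),
  score = c(10, 20, 20, 30, 35)
)

# Hypothetical helper: a column can serve as a primary key
# only if it has no missing values and no repeated values
is_primary_key <- function(df, col) {
  key <- df[[col]]
  !anyNA(key) && anyDuplicated(key) == 0
}

# Duplicates in the primary key: every repeat after a key's first occurrence
key_dups <- sum(duplicated(df$id))

# All rows whose key appears more than once (every copy), for inspection
key_dup_rows <- df[df$id %in% df$id[duplicated(df$id)], ]

# Duplicates across all columns: unique() keeps the first copy of each identical row
df_unique <- unique(df)
row_dups <- nrow(df) - nrow(df_unique)

print(is_primary_key(df, "id"))  # FALSE
print(key_dups)                  # 2: ids 2 and 3 each repeat once
print(row_dups)                  # 1: only the two id-2 rows are fully identical
print(key_dup_rows)
```

After `unique()`, id 3 still appears twice, so a column that should be a primary key needs a decision (keep one row, or resolve the conflicting `score`) rather than `unique()` alone.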
2-Missing values solution
Not every missing value is truly missing. Checking the logic of the data (for example, whether a field should be empty given the values in other fields) is the way to find the values that are genuinely missing. More about logics
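As an illustration of such a logic check, here is a minimal sketch on a hypothetical survey; the data frame `survey` and its columns `smoker` and `cigarettes_per_day` are made up for this example. An empty cigarette count is expected for non-smokers, is a real gap for smokers, and an empty `smoker` answer can sometimes be inferred from the other column.

```r
# Hypothetical survey: cigarettes_per_day only applies to smokers
survey <- data.frame(
  smoker             = c("yes", "no", "yes", "no", NA),
  cigarettes_per_day = c(10, NA, NA, NA, 5)
)

# Raw NA count per column, before any logic is applied
raw_na <- colSums(is.na(survey))

# %in% returns FALSE (not NA) for a missing smoker answer, so the masks stay clean
# NA is expected (not applicable) when the respondent does not smoke
not_applicable <- survey$smoker %in% "no" & is.na(survey$cigarettes_per_day)

# NA is a real gap when the respondent smokes but gave no count
really_missing <- survey$smoker %in% "yes" & is.na(survey$cigarettes_per_day)

# smoker is NA, but a positive cigarette count implies the answer
inferable <- is.na(survey$smoker) &
  !is.na(survey$cigarettes_per_day) & survey$cigarettes_per_day > 0

print(raw_na)               # smoker 1, cigarettes_per_day 3
print(sum(not_applicable))  # 2
print(sum(really_missing))  # 1
print(sum(inferable))       # 1
```

Of the three raw NAs in `cigarettes_per_day`, only one is really missing; the other two are the logical result of the `smoker` answer.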
3-Outliers solution