-Checklist Solution

From my experience, these are some of the basics of data cleaning in R. Check it out!

1-Duplicates solution

Identifying a primary key is the first step in finding duplicate values.

data<-dataset    #replace 'dataset' with your own data frame

#1. Duplicate values: look for duplicates
#in the primary key only (usually better)
#or across the whole dataset

#packages:
library(skimr)
library(Hmisc)

data_1<-data[!duplicated(data$primarykey), , drop=FALSE] #keep the first row for each primary key

before<-length(data$primarykey)
before

after<-length(data_1$primarykey)
after

difference<-before-after #number of duplicate rows removed
difference

before_after_matrix<-cbind(before,after)
before_after_matrix
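
Before dropping anything, it can help to look at which rows are actually duplicated. The sketch below is only an illustration on a toy data frame: the column names `primarykey` and `value` and the data itself are made up, so swap in your own. It uses base R's `duplicated()` to compare duplicates in the primary key with duplicates across every column.

```r
# Toy data frame; 'primarykey' and 'value' are hypothetical column names
data <- data.frame(
  primarykey = c(1, 2, 2, 3, 3, 3),
  value      = c("a", "b", "b", "c", "d", "c")
)

# Keys that occur more than once
dup_keys <- unique(data$primarykey[duplicated(data$primarykey)])

# Every row that shares one of those keys (all copies, not only the repeats)
dup_rows <- data[data$primarykey %in% dup_keys, ]
dup_rows

# Rows that are exact copies of an earlier row across all columns
full_dups <- data[duplicated(data), ]
full_dups

# Repeated keys versus repeated full rows
key_dup_count  <- sum(duplicated(data$primarykey))
full_dup_count <- sum(duplicated(data))
key_dup_count
full_dup_count
```

If the two counts differ, some rows share a primary key but carry different values, and those probably deserve a manual look before they are removed.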

2-Missing Value solution

Not every missing value is truly missing. Checking the logic of the data is the way to find the values that are truly missing.
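
One possible way to run such a logic check is sketched below. It assumes a hypothetical survey in which `n_children` is only asked when `has_children` is "yes". The data frame and its column names are invented for illustration. An NA is then expected for "no" respondents and is a real gap only for "yes" respondents.

```r
# Hypothetical survey data; 'has_children' and 'n_children' are made-up columns
survey <- data.frame(
  id           = 1:5,
  has_children = c("yes", "no", "yes", "no", "yes"),
  n_children   = c(2, NA, NA, NA, 1)
)

# Every NA in the column, whatever the reason
all_na <- is.na(survey$n_children)

# NA is expected when the question does not apply
not_applicable <- all_na & survey$has_children == "no"

# NA is a real gap when the question applies but no answer was recorded
truly_missing <- all_na & survey$has_children == "yes"

sum(all_na)              # all NAs in the column
sum(not_applicable)      # NAs explained by the skip logic
survey[truly_missing, ]  # rows that still need attention
```

Counting all NAs alone would overstate the problem here. Only the rows flagged by `truly_missing` need imputation or follow-up.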

3-Outliers solution

