-Dealing with Factors

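Factors represent categorical variables: each value belongs to one of a fixed set of levels. They are important in statistical modelling because functions like lm() and glm() treat them specially, expanding a factor into indicator (dummy) variables instead of treating it as a number.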
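As a rough illustration of that special treatment, the sketch below uses model.matrix(), the base R function that builds the design matrix for a formula, to show how a factor turns into a 0/1 indicator column. The gender vector is made-up example data, and the default treatment contrasts are assumed.

```r
# illustrative data only
gender <- factor(c("male", "female", "female", "male", "female"))

# build the design matrix that a formula such as y ~ gender would use
mm <- model.matrix(~ gender)

# the first level ("female") is the baseline; "male" gets an indicator column
colnames(mm)
## [1] "(Intercept)" "gendermale"

# 1 where the observation is "male", 0 otherwise
unname(mm[, "gendermale"])
## [1] 1 0 0 1 0
```

Which level ends up as the baseline depends on the order of the levels, which is why the ordering tools further down matter for modelling.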

#Creating, Converting & Inspecting Factors

# create a factor from a character vector
gender <- factor(c("male", "female", "female", "male", "female"))
gender
## [1] male   female female male   female
## Levels: female male

# inspect to see if it is a factor class
class(gender)
## [1] "factor"

# show that factors are just built on top of integers
typeof(gender)
## [1] "integer"

# See the underlying representation of factor
unclass(gender)
## [1] 2 1 1 2 1
## attr(,"levels")
## [1] "female" "male"

# what are the factor levels?
levels(gender)
## [1] "female" "male"

# show summary of counts
summary(gender)
## female   male 
##      3      2
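A few other base R helpers can be handy when inspecting a factor. The following is a small sketch that reuses the same gender vector; it is not an exhaustive list.

```r
gender <- factor(c("male", "female", "female", "male", "female"))

# is this object a factor?
is.factor(gender)
## [1] TRUE

# how many levels does it have?
nlevels(gender)
## [1] 2

# the integer codes that index into levels(gender)
as.integer(gender)
## [1] 2 1 1 2 1

# counts per level, returned as a table object
counts <- table(gender)
counts[["female"]]
## [1] 3
```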

If we have a vector of character strings or integers, we can easily convert it to a factor:

group <- c("Group1", "Group2", "Group2", "Group1", "Group1")
str(group)
##  chr [1:5] "Group1" "Group2" "Group2" "Group1" "Group1"

# convert from characters to factors
as.factor(group)
## [1] Group1 Group2 Group2 Group1 Group1
## Levels: Group1 Group2
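The example above covers character strings. For numeric input, a sketch along these lines may help; the codes vector is made up for illustration. It also shows a common pitfall: converting a factor straight back to a number returns the level positions, not the original values.

```r
# hypothetical numeric codes
codes <- c(10, 20, 20, 10, 30)
codes_f <- factor(codes)

# the levels are stored as character strings
levels(codes_f)
## [1] "10" "20" "30"

# pitfall: this gives the positions of the levels, not the original values
as.integer(codes_f)
## [1] 1 2 2 1 3

# go through character to recover the original numbers
as.numeric(as.character(codes_f))
## [1] 10 20 20 10 30

# attach readable labels to numeric codes while converting
labelled <- factor(c(1, 2, 2, 1, 1), levels = c(1, 2),
                   labels = c("Group1", "Group2"))
levels(labelled)
## [1] "Group1" "Group2"
```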

#Ordering, Revaluing, & Dropping Factor Levels

Ordering Levels:

# when levels are not specified, they default to sorted (alphabetical) order
gender <- factor(c("male", "female", "female", "male", "female"))
gender
## [1] male   female female male   female
## Levels: female male

# specifying order
gender <- factor(c("male", "female", "female", "male", "female"), 
                 levels = c("male", "female"))
gender
## [1] male   female female male   female
## Levels: male female
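If the goal is only to move one level to the front (for example, to change the baseline level used in a model), base R's relevel() is one alternative to retyping every level. A minimal sketch, assuming an unordered factor:

```r
gender <- factor(c("male", "female", "female", "male", "female"))

# move "male" to the first position; the other levels keep their order
gender_ref <- relevel(gender, ref = "male")
levels(gender_ref)
## [1] "male"   "female"

# the data values themselves are unchanged
as.character(gender_ref)
## [1] "male"   "female" "female" "male"   "female"
```

Note that relevel() is meant for unordered factors; for ordered factors, specify the full levels argument as shown above.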

Create ordinal factors with the ordered = TRUE argument:

ses <- c("low", "middle", "low", "low", "low", "low", "middle", "low",
         "middle", "middle", "middle", "middle", "middle", "high", "high",
         "low", "middle", "middle", "low", "high")

# create ordinal levels
ses <- factor(ses, levels = c("low", "middle", "high"), ordered = TRUE)
ses
##  [1] low    middle low    low    low    low    middle low    middle middle
## [11] middle middle middle high   high   low    middle middle low    high
## Levels: low < middle < high
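What an ordered factor buys you is that comparisons and order-based summaries become meaningful. The sketch below uses a shortened, made-up ses vector to keep the output small.

```r
# shortened illustrative vector
ses <- factor(c("low", "middle", "high", "low", "middle"),
              levels = c("low", "middle", "high"), ordered = TRUE)

# confirm the factor is ordered
is.ordered(ses)
## [1] TRUE

# comparisons follow the level order, not the alphabet
ses >= "middle"
## [1] FALSE  TRUE  TRUE FALSE  TRUE

# the largest value according to the level order
as.character(max(ses))
## [1] "high"

# sorting also follows the level order
as.character(sort(ses))
## [1] "low"    "low"    "middle" "middle" "high"
```

On an unordered factor the same comparison is not meaningful: R returns NA with a warning.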

# you can also reverse the order of levels if desired
factor(ses, levels = rev(levels(ses)))
##  [1] low    middle low    low    low    low    middle low    middle middle
## [11] middle middle middle high   high   low    middle middle low    high
## Levels: high < middle < low

Revalue Levels:

To recode factor levels, I usually use the revalue() function from the plyr package.

plyr::revalue(ses, c("low" = "small", "middle" = "medium", "high" = "large"))
##  [1] small  medium small  small  small  small  medium small  medium medium
## [11] medium medium medium large  large  small  medium medium small  large
## Levels: small < medium < large

☛ Using the :: notation lets you call the revalue() function without attaching the plyr package via library(); the package still needs to be installed.
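If plyr is not available, the same recoding can be sketched in base R by assigning to levels(). This is an alternative to revalue(), not what it does internally, and the shortened ses vector is made up for illustration.

```r
# shortened illustrative vector
ses <- factor(c("low", "middle", "high", "low"),
              levels = c("low", "middle", "high"), ordered = TRUE)

# rename all levels at once; new names are matched by position
ses_recoded <- ses
levels(ses_recoded) <- c("small", "medium", "large")
as.character(ses_recoded)
## [1] "small"  "medium" "large"  "small"

# rename a single level by looking it up by name
ses_one <- ses
levels(ses_one)[levels(ses_one) == "low"] <- "small"
levels(ses_one)
## [1] "small"  "middle" "high"
```

Positional assignment is compact but easy to get wrong if the level order changes, which is one reason a name-based mapping like revalue() can be safer.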

Dropping Levels:

When you want to drop unused factor levels, use droplevels():

ses2 <- ses[ses != "middle"]

# let's say you have no observations in one level
summary(ses2)
##    low middle   high 
##      8      0      3

# you can drop that level if desired
droplevels(ses2)
##  [1] low  low  low  low  low  low  high high low  low  high
## Levels: low < high
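There are a few other ways to get the same effect in base R. The sketch below uses a shortened, made-up ses vector and a hypothetical data frame df with a score column to show them.

```r
# shortened illustrative vector
ses <- factor(c("low", "middle", "high", "low"),
              levels = c("low", "middle", "high"), ordered = TRUE)

# subsetting alone keeps the unused level
ses2 <- ses[ses != "middle"]
levels(ses2)
## [1] "low"    "middle" "high"

# re-creating the factor drops unused levels
levels(factor(ses2))
## [1] "low"  "high"

# drop = TRUE drops unused levels while subsetting
ses3 <- ses[ses != "middle", drop = TRUE]
levels(ses3)
## [1] "low"  "high"

# droplevels() also works on a data frame, cleaning every factor column
df <- data.frame(ses = ses2, score = c(1, 2, 3))
levels(droplevels(df)$ses)
## [1] "low"  "high"
```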

Last updated 6 years ago