# Dealing with Factors

Factors are important in statistical modeling and are treated specially by modeling functions such as `lm()` and `glm()`.
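
As a rough illustration of that special treatment, modeling functions expand a factor into indicator (dummy) columns rather than treating it as a number. The sketch below uses `model.matrix()` on made-up data; the variable name `treatment` and its values are hypothetical and not part of the examples that follow.

```r
# hypothetical data, for illustration only
treatment <- factor(c("a", "b", "c", "a", "b", "c"))

# model.matrix() builds the kind of design matrix lm() works with:
# an intercept plus one indicator column per non-reference level
mm <- model.matrix(~ treatment)
colnames(mm)
## [1] "(Intercept)" "treatmentb"  "treatmentc"
```

The first level (`"a"` here) acts as the reference level under R's default treatment contrasts, which is why it gets no column of its own.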

## Creating, Converting & Inspecting Factors

```r
# create a factor from a character vector
gender <- factor(c("male", "female", "female", "male", "female"))
gender
## [1] male   female female male   female
## Levels: female male

# check that the object has class "factor"
class(gender)
## [1] "factor"

# factors are stored as integers under the hood
typeof(gender)
## [1] "integer"

# see the underlying integer codes and the levels attribute
unclass(gender)
## [1] 2 1 1 2 1
## attr(,"levels")
## [1] "female" "male"

# what are the factor levels?
levels(gender)
## [1] "female" "male"

# show summary of counts
summary(gender)
## female   male 
##      3      2
```

A vector of character strings or integers can be converted to a factor with `as.factor()`:

```r
group <- c("Group1", "Group2", "Group2", "Group1", "Group1")
str(group)
##  chr [1:5] "Group1" "Group2" "Group2" "Group1" "Group1"

# convert a character vector to a factor
as.factor(group)
## [1] Group1 Group2 Group2 Group1 Group1
## Levels: Group1 Group2
```
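
The same idea applies to integers. The sketch below is a minimal example on hypothetical data (`codes`, `size`, and `year` are made-up names): it attaches labels while converting integer codes, and shows a common pitfall when converting a factor with numeric-looking levels back to numbers.

```r
# hypothetical integer codes, e.g. from a survey export
codes <- c(1L, 2L, 2L, 1L, 3L)

# attach human-readable labels while converting
size <- factor(codes, levels = 1:3, labels = c("small", "medium", "large"))
size
## [1] small  medium medium small  large
## Levels: small medium large

# as.integer() returns the internal codes, not the original values
year <- factor(c(2020, 2022, 2020))
as.integer(year)
## [1] 1 2 1

# go through as.character() to recover the values themselves
as.integer(as.character(year))
## [1] 2020 2022 2020
```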

## Ordering, Revaluing, & Dropping Factor Levels

#### Ordering Levels: <a href="#ordering-levels" id="ordering-levels"></a>

```r
# when levels are not specified, they are sorted alphabetically
gender <- factor(c("male", "female", "female", "male", "female"))
gender
## [1] male   female female male   female
## Levels: female male

# specifying order
gender <- factor(c("male", "female", "female", "male", "female"), 
                 levels = c("male", "female"))
gender
## [1] male   female female male   female
## Levels: male female
```
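
If only the first level needs to change, for example to pick a different reference level, base R's `relevel()` is a possible shortcut for unordered factors. This is a minimal sketch on hypothetical data; the `dose` variable is made up for illustration.

```r
# hypothetical data, for illustration only
dose <- factor(c("low", "high", "placebo", "low", "placebo"))
levels(dose)
## [1] "high"    "low"     "placebo"

# move "placebo" to the front; the remaining levels keep their order
dose <- relevel(dose, ref = "placebo")
levels(dose)
## [1] "placebo" "high"    "low"
```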

Create ordinal factors with the `ordered = TRUE` argument:

```r
ses <- c("low", "middle", "low", "low", "low", "low", "middle", "low",
         "middle", "middle", "middle", "middle", "middle", "high", "high",
         "low", "middle", "middle", "low", "high")

# create ordinal levels
ses <- factor(ses, levels = c("low", "middle", "high"), ordered = TRUE)
ses
##  [1] low    middle low    low    low    low    middle low    middle middle
## [11] middle middle middle high   high   low    middle middle low    high  
## Levels: low < middle < high

# you can also reverse the order of levels if desired
factor(ses, levels = rev(levels(ses)))
##  [1] low    middle low    low    low    low    middle low    middle middle
## [11] middle middle middle high   high   low    middle middle low    high  
## Levels: high < middle < low
```
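
One practical consequence of an ordered factor is that comparison operators respect the level order. The following is a small sketch on hypothetical data (the `rating` variable is made up here so the example runs on its own):

```r
# hypothetical ordered factor, for illustration only
rating <- factor(c("low", "high", "middle", "low"),
                 levels = c("low", "middle", "high"), ordered = TRUE)

# comparisons use the level order, not alphabetical order
rating < "high"
## [1]  TRUE FALSE  TRUE  TRUE

# subsetting by rank
as.character(rating[rating >= "middle"])
## [1] "high"   "middle"

# min() and max() are also defined for ordered factors
max(rating)
## [1] high
## Levels: low < middle < high
```

These operations are not meaningful for unordered factors, where R returns `NA` with a warning for `<` and `>`.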

#### Revalue Levels: <a href="#revalue-levels" id="revalue-levels"></a>

To recode factor levels I usually use the `revalue()` function from the `plyr` package.

```r
plyr::revalue(ses, c("low" = "small", "middle" = "medium", "high" = "large"))
##  [1] small  medium small  small  small  small  medium small  medium medium
## [11] medium medium medium large  large  small  medium medium small  large 
## Levels: small < medium < large
```

{% hint style="info" %}
☛ *Using the `::` notation lets you call the `revalue()` function without attaching the `plyr` package via `library()`.*
{% endhint %}
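
If `plyr` is not available, recoding can also be sketched in base R by assigning to `levels()`. This is an alternative to `revalue()`, not what it does internally, and the `sizes` variable below is hypothetical.

```r
# hypothetical ordered factor, for illustration only
sizes <- factor(c("low", "middle", "high", "low"),
                levels = c("low", "middle", "high"), ordered = TRUE)

# levels are replaced by position, so the new names must follow
# the existing level order (low, middle, high)
levels(sizes) <- c("small", "medium", "large")
sizes
## [1] small  medium large  small
## Levels: small < medium < large
```

Because the replacement is positional, it is worth checking `levels()` first; `revalue()` matches by name and so avoids that risk.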

#### Dropping Levels: <a href="#dropping-levels" id="dropping-levels"></a>

When you want to drop unused factor levels, use `droplevels()`:

```r
ses2 <- ses[ses != "middle"]

# suppose one level has no observations
summary(ses2)
##    low middle   high 
##      8      0      3

# you can drop that level if desired
droplevels(ses2)
##  [1] low  low  low  low  low  low  high high low  low  high
## Levels: low < high
```
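
`droplevels()` also accepts a data frame, in which case it drops unused levels from every factor column. A minimal sketch on hypothetical data (the `df` data frame and its columns are made up for illustration):

```r
# hypothetical data frame, for illustration only
df <- data.frame(id = 1:4, grp = factor(c("a", "b", "c", "a")))

# subsetting keeps the unused level "b"
sub <- df[df$grp != "b", ]
levels(sub$grp)
## [1] "a" "b" "c"

# droplevels() on the data frame cleans up all factor columns
sub <- droplevels(sub)
levels(sub$grp)
## [1] "a" "c"
```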

