Dealing with Strings

Handling, cleaning, and processing character strings is becoming a prerequisite for everyday data analysis.

#Character string basics

Creating Strings:

x <- "Berlin"    
y <- "Paris"     
# The paste() function is well suited to creating text

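# paste together strings a & b
# (a and b are not defined above; the values here are assumed from the output below)
a <- "learning to create"
b <- "character strings"
paste(a, b)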
## [1] "learning to create character strings"

# paste character and number strings 
# (converts numbers to character class)
paste("The life of", pi)           
## [1] "The life of 3.14159265358979"

# paste multiple strings
paste("I", "love", "R")            
## [1] "I love R"

# paste multiple strings with a separating character
paste("I", "love", "R", sep = "-")  
## [1] "I-love-R"

# use paste0() to paste without a separator between strings
paste0("I", "love", "R")            
## [1] "IloveR"

# paste objects with different lengths
paste("R", 1:5, sep = " v1.")       
## [1] "R v1.1" "R v1.2" "R v1.3" "R v1.4" "R v1.5"

Converting to Strings:
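
As a minimal sketch, base R's as.character() and toString() are two common ways to turn other values into strings. The variable names below are illustrative only.

```r
# as.character() converts each element of a vector to a string
num_strings <- as.character(c(1, 2, 3))
num_strings
## [1] "1" "2" "3"

# toString() collapses a vector into one comma-separated string
one_string <- toString(c("Aug", 24, 1980))
one_string
## [1] "Aug, 24, 1980"
```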

Printing Strings:

  • print(): generic printing

  • noquote(): print with no quotes

  • cat(): concatenate and print with no quotes

  • sprintf(): a wrapper for the C function sprintf, that returns a character vector containing a formatted combination of text and variable values
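
The first three functions in the list can be compared on the same string. This is a small sketch with an illustrative variable name, not output taken from the original page.

```r
x_print <- "learning to print strings"

# generic printing keeps the quotes
print(x_print)
## [1] "learning to print strings"

# noquote() drops the quotes but keeps the index prefix
print(noquote(x_print))
## [1] learning to print strings

# cat() writes the raw text; a newline must be added explicitly
cat(x_print, "\n")
## learning to print strings
```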

To substitute in a string or string variable, use %s:
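
A short sketch of %s, using illustrative city names:

```r
city <- "Berlin"
msg_one <- sprintf("Greetings from %s", city)
msg_one
## [1] "Greetings from Berlin"

# sprintf() is vectorized, so a vector argument yields one string per element
msg_many <- sprintf("Greetings from %s", c("Berlin", "Paris"))
msg_many
## [1] "Greetings from Berlin" "Greetings from Paris"
```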

For integers, use %d or a variant:
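
For instance, assuming an integer count n_items (a name chosen only for this sketch), the width and zero-padding variants look like this:

```r
n_items <- 42L

plain_d  <- sprintf("%d items", n_items)
plain_d
## [1] "42 items"

# %5d right-aligns the number in a field of width 5
padded_d <- sprintf("%5d", n_items)
padded_d
## [1] "   42"

# %05d pads with leading zeros instead of spaces
zero_d   <- sprintf("%05d", n_items)
zero_d
## [1] "00042"
```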

For floating-point numbers, use %f for standard notation, and %e or %E for exponential notation:
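
The following sketch applies these formats to the built-in constant pi:

```r
# %f defaults to six digits after the decimal point
f_default <- sprintf("%f", pi)
f_default
## [1] "3.141593"

# a precision such as .2 limits the number of decimals
f_short <- sprintf("%.2f", pi)
f_short
## [1] "3.14"

# %e and %E differ only in the case of the exponent marker
e_lower <- sprintf("%e", pi)
e_lower
## [1] "3.141593e+00"
e_upper <- sprintf("%E", pi)
e_upper
## [1] "3.141593E+00"
```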

Counting string elements and characters:
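
A minimal sketch of the distinction, assuming base R's length() for elements and nchar() for characters:

```r
cities <- c("Berlin", "Paris")

# length() counts the elements of the vector
n_elements <- length(cities)
n_elements
## [1] 2

# nchar() counts the characters in each element
n_chars <- nchar(cities)
n_chars
## [1] 6 5
```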

#String manipulation with base R

Case conversion:
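
One way to do this in base R is with toupper() and tolower(), sketched here on an illustrative string:

```r
city_case <- "Berlin"

upper_city <- toupper(city_case)
upper_city
## [1] "BERLIN"

lower_city <- tolower(city_case)
lower_city
## [1] "berlin"
```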

Simple Character Replacement:
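
Assuming the base R function chartr() is what is meant here, a small sketch of character-for-character replacement:

```r
# each character in `old` is replaced by the character
# at the same position in `new`
swapped <- chartr(old = "ln", new = "LN", x = "Berlin")
swapped
## [1] "BerLiN"
```

Because chartr() maps characters one-to-one, old and new are normally the same length.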

String Abbreviations:
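
Base R provides abbreviate() for this. The sketch below uses illustrative city names and omits the printed output, since the exact abbreviations depend on the function's shortening rules.

```r
cities_long <- c("Berlin", "Paris")

# minlength is the target length of each abbreviation
abbr <- abbreviate(cities_long, minlength = 4)

# the result is a character vector named by the original strings
abbr
```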

Extract/Replace Substrings:
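
A minimal sketch using base R's substr(), which both extracts and (in its replacement form) overwrites a range of characters. The string is illustrative.

```r
alphabet_part <- "abcdefghij"

# extract characters 2 through 4
piece <- substr(alphabet_part, start = 2, stop = 4)
piece
## [1] "bcd"

# replace characters 2 through 4 in place
substr(alphabet_part, start = 2, stop = 4) <- "XYZ"
alphabet_part
## [1] "aXYZefghij"
```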

#String manipulation with stringr

The stringr package, developed by Hadley Wickham, offers a consistent set of functions for solving common string manipulation problems.

Basic Operations:

Three string functions that are closely related to their base R equivalents:

  • Concatenate with str_c()

  • Number of characters with str_length()

  • Substring with str_sub()
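
The three functions above can be sketched as follows, assuming the stringr package is installed. The input strings are illustrative.

```r
library(stringr)

# str_c() is similar to paste(); its default separator is ""
joined <- str_c("Learning", "to", "use", "stringr", sep = " ")
joined
## [1] "Learning to use stringr"

# str_length() is similar to nchar()
lens <- str_length(c("Berlin", "Paris"))
lens
## [1] 6 5

# str_sub() is similar to substr(); negative positions count from the end
first3 <- str_sub("Berlin", start = 1, end = 3)
first3
## [1] "Ber"
last3 <- str_sub("Berlin", start = -3, end = -1)
last3
## [1] "lin"
```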

Duplicate Characters within a String:
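
In stringr this is typically done with str_dup(); a short sketch, assuming the package is installed:

```r
library(stringr)

# repeat a single string three times
tripled <- str_dup("beer", times = 3)
tripled
## [1] "beerbeerbeer"

# times is vectorized, giving one result per value
stepped <- str_dup("ab", times = 1:3)
stepped
## [1] "ab"     "abab"   "ababab"
```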

Remove Leading and Trailing Whitespace:
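
A sketch with stringr's str_trim(), assuming the package is installed; the padded input is made up for illustration.

```r
library(stringr)

padded_city <- "  Berlin  "

# by default, whitespace is removed from both sides
trim_both <- str_trim(padded_city)
trim_both
## [1] "Berlin"

# side = "left" keeps the trailing whitespace
trim_left <- str_trim(padded_city, side = "left")
trim_left
## [1] "Berlin  "
```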

Pad a String with Whitespace:
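
The reverse operation can be sketched with stringr's str_pad(), again assuming the package is installed:

```r
library(stringr)

# pad on the left until the string is 10 characters wide
pad_left <- str_pad("Berlin", width = 10, side = "left")
pad_left
## [1] "    Berlin"

# pad evenly on both sides
pad_both <- str_pad("Berlin", width = 10, side = "both")
pad_both
## [1] "  Berlin  "

# the pad argument swaps whitespace for another character
pad_star <- str_pad("Berlin", width = 10, side = "both", pad = "*")
pad_star
## [1] "**Berlin**"
```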

#Set operations for character strings

Set Union:
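
A minimal sketch with base R's union(), using two illustrative character vectors:

```r
set_a <- c("Berlin", "Paris", "Rome")
set_b <- c("Paris", "Rome", "Madrid")

# all distinct elements found in either vector
all_cities <- union(set_a, set_b)
all_cities
## [1] "Berlin" "Paris"  "Rome"   "Madrid"
```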

Set Intersection:
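
Base R's intersect() returns the shared elements; the vectors here are illustrative.

```r
set_a <- c("Berlin", "Paris", "Rome")
set_b <- c("Paris", "Rome", "Madrid")

# elements present in both vectors
shared_cities <- intersect(set_a, set_b)
shared_cities
## [1] "Paris" "Rome"
```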

Identifying Different Elements:
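
Assuming setdiff() is the intended function, note that the argument order matters:

```r
set_a <- c("Berlin", "Paris", "Rome")
set_b <- c("Paris", "Rome", "Madrid")

# elements of set_a that are not in set_b
only_a <- setdiff(set_a, set_b)
only_a
## [1] "Berlin"

# elements of set_b that are not in set_a
only_b <- setdiff(set_b, set_a)
only_b
## [1] "Madrid"
```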

Testing for Element Equality:
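
A sketch with setequal(), which compares contents and ignores order:

```r
set_c <- c("Berlin", "Paris")
set_d <- c("Paris", "Berlin")
set_e <- c("Paris", "Madrid")

# same elements in a different order
same_elements <- setequal(set_c, set_d)
same_elements
## [1] TRUE

# different elements
diff_elements <- setequal(set_c, set_e)
diff_elements
## [1] FALSE
```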

Testing for Exact Equality:
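
By contrast, identical() also requires the same order. The vectors below are illustrative.

```r
vec_1 <- c("Berlin", "Paris")
vec_2 <- c("Paris", "Berlin")
vec_3 <- c("Berlin", "Paris")

# same elements, different order
exact_12 <- identical(vec_1, vec_2)
exact_12
## [1] FALSE

# same elements, same order
exact_13 <- identical(vec_1, vec_3)
exact_13
## [1] TRUE
```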

Identifying if Elements are Contained in a String:
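
Membership can be checked with %in% or its functional form is.element(). This sketch assumes those are the functions meant here.

```r
known_cities <- c("Berlin", "Paris", "Rome")

# one logical value per element on the left-hand side
found <- c("Paris", "Madrid") %in% known_cities
found
## [1]  TRUE FALSE

# is.element(x, y) is equivalent to x %in% y
has_madrid <- is.element("Madrid", known_cities)
has_madrid
## [1] FALSE
```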

Sorting a String:
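
A minimal sketch with base R's sort() on an illustrative character vector:

```r
unsorted <- c("Rome", "Berlin", "Paris")

# alphabetical order
asc <- sort(unsorted)
asc
## [1] "Berlin" "Paris"  "Rome"

# reverse alphabetical order
desc <- sort(unsorted, decreasing = TRUE)
desc
## [1] "Rome"   "Paris"  "Berlin"
```

With mixed case or accented characters, the order depends on the session's locale.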
