--4-The apply family

Last updated 6 years ago

#1-lapply and the apply family: Apply family video

#2-Use lapply with a built-in R function

#Have a look at the strsplit() call,
#which splits the strings in pioneers on the : sign.
#The result, split_math, is a list of 4 character vectors:
#in each vector, the first element is the name and the second element is the birth year.
#Use lapply() to convert the character vectors in split_math to lowercase:
#apply tolower() to each element of split_math and assign the result,
#which is a list, to a new variable split_low.
#Finally, inspect the contents of split_low with str().

# The vector pioneers has already been created for you
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# Split names from birth year
split_math <- strsplit(pioneers, split = ":")

# Convert to lowercase strings: split_low
split_low <- lapply(split_math, FUN = tolower)

# Take a look at the structure of split_low
str(split_low)
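
#A minimal sketch (not part of the original exercise) to check what lapply() returns here:
#a list with one element per input element, each element being the lowercased vector.
#The variable names are the ones used above.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_math <- strsplit(pioneers, split = ":")
split_low <- lapply(split_math, tolower)

# lapply() returns a list of the same length as its input
class(split_low)
length(split_low)

# Each element is still a character vector: name first, birth year second
split_low[[1]]
```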

#3-Use lapply with your own function

#Apply select_first() over the elements of split_low with lapply() 
#and assign the result to a new variable names.
#Next, write a function select_second() that does the same thing for
#the second element of an input vector.
#Finally, apply the select_second() function over split_low and 
#assign the output to the variable years.

# Code from previous exercise:
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, FUN=tolower)

# Write function select_first()
select_first <- function(x) {
  x[1]
}

# Apply select_first() over split_low: names
names <- lapply(split_low, FUN = select_first)
names

# Write function select_second()
select_second <- function(x) {
  x[2]
}

# Apply select_second() over split_low: years
years <- lapply(split_low, FUN = select_second)
years

#4-lapply and anonymous functions

#Transform the first call of lapply() such that it uses an anonymous function 
#that does the same thing.
#In a similar fashion, convert the second call of lapply to use 
#an anonymous version of the select_second() function.
#Remove both the definitions of select_first() and select_second(), 
#as they are no longer useful.

# Definition of split_low
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)

# Transform: use anonymous function inside lapply
names <- lapply(split_low, function(x) { x[1] })
names

# Transform: use anonymous function inside lapply
years <- lapply(split_low, function(x) { x[2] })
years

#5-Use lapply with additional arguments

#Use lapply() twice to call select_el() over all elements in split_low: 
#once with the index equal to 1 and a second time with the index equal to 2. 
#Assign the result to names and years, respectively.

# Definition of split_low
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)

# Generic select function
select_el <- function(x, index) {
  x[index]
}

# Use lapply() twice on split_low: names and years
names <- lapply(split_low, select_el, index = 1)
years <- lapply(split_low, select_el, index = 2)
names
years
unlist(names)
unlist(years)
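
#A short sketch of two follow-ups, assuming a two-element stand-in for split_low
#(the data below is illustrative): extra arguments given to lapply() are passed on to the function,
#so the built-in subsetting operator `[` can replace select_el(), and the unlisted
#years can be converted from character to numeric.

```r
# Illustrative stand-in for split_low
split_low <- list(c("gauss", "1777"), c("bayes", "1702"))

select_el <- function(x, index) {
  x[index]
}

# index = 2 is forwarded to select_el() for every element
years <- unlist(lapply(split_low, select_el, index = 2))
years_num <- as.numeric(years)
years_num

# `[` is itself a function, so it can be used directly with an extra argument
pioneer_names <- unlist(lapply(split_low, `[`, 1))
pioneer_names
```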

#6-Apply functions that return NULL

#What will the following code chunk return 
#(split_low is already available in the workspace)? 
#Try to reason about the result before simply executing it in the console!

pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)
split_low

lapply(split_low, function(x) {
  if (nchar(x[1]) > 5) {
      return(NULL)
  } else {
      return(x[2])
  }
})

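#Possible answers:
#list(NULL, NULL, "1623", "1857")
#list("gauss", "bayes", NULL, NULL)
#list("1777", "1702", NULL, NULL) -> The right answer: "gauss" and "bayes" have
#exactly 5 characters, so nchar(x[1]) > 5 is FALSE and the year is returned;
#"pascal" (6) and "pearson" (7) have more than 5 characters and yield NULL.
#list("1777", "1702")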
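
#A minimal sketch of what happens to the NULL results, rebuilding split_low as above.
#lapply() keeps the NULL entries as list elements; dropping them with Filter() or unlist()
#is one possible follow-up, not something the exercise asks for.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

res <- lapply(split_low, function(x) {
  if (nchar(x[1]) > 5) NULL else x[2]
})

# The result still has one element per input, including the NULLs
length(res)

# Keep only the non-NULL elements
kept <- Filter(Negate(is.null), res)
length(kept)

# unlist() silently drops NULL elements as well
unlist(res)
```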

#8-How to use sapply

#Use lapply() to calculate the minimum (built-in function min()) 
#of the temperature measurements for every day.
#Do the same thing but this time with sapply(). 
#See how the output differs.
#Use lapply() to compute the maximum (max()) temperature for each day.
#Again, use sapply() to solve the same question and see how lapply() and 
#sapply() differ.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)

# temp has already been defined in the workspace
temp
# Use lapply() to find each day's minimum temperature
lapply(temp,FUN=min)

# Use sapply() to find each day's minimum temperature
sapply(temp,FUN=min)

# Use lapply() to find each day's maximum temperature
lapply(temp,FUN=max)

# Use sapply() to find each day's maximum temperature
sapply(temp,FUN=max)
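
#A small sketch of the difference, assuming a shortened three-day version of temp
#(a subset of the values above): lapply() returns a list, while sapply() simplifies
#the same results to a plain numeric vector.

```r
# Shortened, illustrative version of temp (three days)
temp <- list(c(3, 7, 9, 6, -1), c(5, 7, 9, 4, 2), c(3, 6, 9, 4, 1))

mins_l <- lapply(temp, min)
mins_s <- sapply(temp, min)

is.list(mins_l)    # list output
is.numeric(mins_s) # simplified to a numeric vector

# Unlisting the lapply() result gives the same vector as sapply()
identical(unlist(mins_l), mins_s)
```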

#9-sapply with your own function

#Finish the definition of extremes_avg(): 
#it takes a vector of temperatures and calculates the average of the minimum 
#and maximum temperatures of the vector.
#Next, use this function inside sapply() 
#to apply it over the vectors inside temp.
#Use the same function over temp with lapply() and 
#see how the outputs differ.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)

# temp is already defined in the workspace
temp
# Finish function definition of extremes_avg
extremes_avg <- function(temp) {
  ( min(temp) + max(temp) ) / 2
}

# Apply extremes_avg() over temp using sapply()
sapply(temp,FUN=extremes_avg)

# Apply extremes_avg() over temp using lapply()
lapply(temp,FUN=extremes_avg)

#10-sapply with function returning vector

#Finish the definition of the extremes() function. 
#It takes a vector of numerical values and returns a vector 
#containing the minimum and maximum values of that vector, with the names "min"
#and "max", respectively.
#Apply this function over the list temp using sapply().
#Finally, apply this function over the list temp using lapply() as well.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)
# temp is already available in the workspace
temp
# Create a function that returns min and max of a vector: extremes
extremes <- function(x) {
  c(min = min(x), max = max(x))
}

# Apply extremes() over temp with sapply()
sapply(temp,FUN=extremes)

# Apply extremes() over temp with lapply()
lapply(temp,FUN=extremes)
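
#A sketch of how to inspect the sapply() result, assuming a shortened three-day version of temp
#(a subset of the values above). When every call returns a named vector of the same length,
#sapply() builds a matrix with one column per list element and the names as row names.

```r
# Shortened, illustrative version of temp (three days)
temp <- list(c(3, 7, 9, 6, -1), c(5, 7, 9, 4, 2), c(3, 6, 9, 4, 1))

extremes <- function(x) {
  c(min = min(x), max = max(x))
}

m <- sapply(temp, extremes)

dim(m)       # rows = values returned per call, columns = days
rownames(m)  # taken from the names of the returned vector
m["max", ]   # one maximum per day

# Transpose if one row per day is preferred
t(m)
```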

#11-sapply can't simplify, now what?

#Apply below_zero() over temp using sapply() and store the result in freezing_s.
#Apply below_zero() over temp using lapply(). 
#Save the resulting list in a variable freezing_l.
#Compare freezing_s to freezing_l using the identical() function.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)
# temp is already prepared for you in the workspace
temp
# Definition of below_zero()
below_zero <- function(x) {
  return(x[x < 0])
}

# Apply below_zero over temp using sapply(): freezing_s
freezing_s<-sapply(temp,FUN=below_zero)

# Apply below_zero over temp using lapply(): freezing_l
freezing_l<-lapply(temp,FUN=below_zero)

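# Are freezing_s and freezing_l identical?
# TRUE: the results have different lengths, so sapply() cannot simplify
# and returns the same list as lapply()
identical(freezing_s, freezing_l)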
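
#A minimal sketch of why simplification fails, assuming a shortened three-day version of temp
#(a subset of the values above): below_zero() returns vectors of different lengths,
#so sapply() falls back to returning a list.

```r
# Shortened, illustrative version of temp (three days)
temp <- list(c(3, 7, 9, 6, -1), c(5, 7, 9, 4, 2), c(3, 6, 9, 4, 1))

below_zero <- function(x) {
  x[x < 0]
}

freezing_s <- sapply(temp, below_zero)

# Not a vector or matrix: sapply() kept the list
is.list(freezing_s)

# The element lengths differ, which is what blocks simplification
lengths(freezing_s)
```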

#12-sapply with functions that return NULL

#Apply print_info() over the contents of temp with sapply().
#Repeat this process with lapply(). Do you notice the difference?

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)
# temp is already available in the workspace
temp
# Definition of print_info()
print_info <- function(x) {
  cat("The average temperature is", mean(x), "\n")
}

# Apply print_info() over temp using sapply()
sapply(temp,FUN=print_info)

# Apply print_info() over temp using lapply()
lapply(temp,FUN=print_info)
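
#A sketch of what print_info() actually returns, assuming a shortened three-day version of temp
#(a subset of the values above). cat() prints as a side effect and returns NULL, so both sapply()
#and lapply() give a list of NULLs; wrapping the call in invisible() is one way to hide that list.

```r
# Shortened, illustrative version of temp (three days)
temp <- list(c(3, 7, 9, 6, -1), c(5, 7, 9, 4, 2), c(3, 6, 9, 4, 1))

print_info <- function(x) {
  cat("The average temperature is", mean(x), "\n")
}

res <- sapply(temp, print_info)

# sapply() cannot simplify NULL results, so it returns a list
is.list(res)
all(sapply(res, is.null))

# Only the printed lines are shown; the list of NULLs is not auto-printed
invisible(lapply(temp, print_info))
```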

#13-Reverse engineering sapply

sapply(list(runif(10), runif(10)),
       function(x) c(min = min(x), mean = mean(x), max = max(x)))


#Without going straight to the console to run the code, 
#try to reason through which of the following statements are correct and why.

#(1) sapply() can't simplify the result that lapply() would return,
#and thus returns a list of vectors. --> Wrong: every result has length 3, so sapply() simplifies.
#(2) This code generates a matrix with 3 rows and 2 columns. --> The right answer
#(3) The function that is used inside sapply() is anonymous. --> The right answer
#(4) The resulting data structure does not contain any names. --> Wrong: the rows are named min, mean and max.

#Select the option that lists all correct statements.

#Summary of the apply family:
#lapply(): apply a function over a list or vector; the output is always a list.
#sapply(): apply a function over a list or vector; tries to simplify the list to an array.
#vapply(): apply a function over a list or vector; the output format is specified explicitly.
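
#A sketch for checking the reverse-engineering question after reasoning about it.
#set.seed(1) is an addition here, only to make the random numbers reproducible;
#the shape and names of the result do not depend on the seed.

```r
set.seed(1)  # assumption: any seed works, it only fixes the random draws

res <- sapply(list(runif(10), runif(10)),
              function(x) c(min = min(x), mean = mean(x), max = max(x)))

is.matrix(res)  # simplified to a matrix, not a list
dim(res)        # 3 rows (min, mean, max) and 2 columns (one per input vector)
rownames(res)   # names come from the vector returned by the anonymous function
```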

#15-Use vapply

#Apply the function basics() over the list of temperatures, temp, using vapply(). 
#This time, you can use numeric(3) to specify the FUN.VALUE argument.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)

# temp is already available in the workspace
temp
# Definition of basics()
basics <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}

# Apply basics() over temp using vapply()
vapply(temp, FUN = basics, FUN.VALUE = numeric(3))

#16-Use vapply (2)

#Inspect the code on the right and try to run it. 
#If you haven't changed anything, an error should pop up. 
#That's because vapply() still expects basics() to return a vector of length 3. 
#The error message gives you an indication of what's wrong.
#Try to fix the error by editing the vapply() command.

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)

# temp is already available in the workspace
temp
# Definition of the basics() function
basics <- function(x) {
  c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}

# Fix the error:
vapply(temp, FUN = basics, FUN.VALUE = numeric(4))
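
#A sketch that reproduces the error safely, assuming a shortened three-day version of temp
#(a subset of the values above). tryCatch() captures the message vapply() raises when
#FUN.VALUE says length 3 but basics() returns length 4; the exact wording of the message
#may differ between R versions.

```r
# Shortened, illustrative version of temp (three days)
temp <- list(c(3, 7, 9, 6, -1), c(5, 7, 9, 4, 2), c(3, 6, 9, 4, 1))

basics <- function(x) {
  c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}

# Wrong template: the error is caught and its message is returned as a string
msg <- tryCatch(
  vapply(temp, basics, FUN.VALUE = numeric(3)),
  error = function(e) conditionMessage(e)
)
msg

# Correct template: one column per day, one row per statistic
ok <- vapply(temp, basics, FUN.VALUE = numeric(4))
dim(ok)
```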

#17-From sapply to vapply

#Convert all the sapply() expressions on the right to their vapply() counterparts. 
#Their results should be exactly the same; you're only adding robustness. 
#You'll need the templates numeric(1) and logical(1).

x<-c(3,  7,  9,  6, -1)
xx<-c(3 , 7 , 9 , 6, -1)
xxx<-c(3,  7,  9,  6, -1)
xxxx<-c(3,  7,  9,  6, -1)
xxxxx<-c( 5, 7, 9, 4, 2)
xxxxxx<-c( 5, 7, 9, 4, 2)
xxxxxxx<-c( 3 ,6, 9, 4, 1)

temp<-list(x,xx,xxx,xxxx,xxxxx,xxxxxx,xxxxxxx)

# temp is already defined in the workspace
temp
# Convert to vapply() expression
vapply(temp, FUN = max, FUN.VALUE = numeric(1))

# Convert to vapply() expression
vapply(temp, FUN = function(x, y) { mean(x) > y }, y = 5, FUN.VALUE = logical(1))
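
#A sketch of one case where the added robustness shows, using an empty list as a hypothetical input
#(not part of the exercise): sapply() has nothing to simplify and returns an empty list,
#while vapply() still returns the type promised by FUN.VALUE.

```r
empty <- list()  # hypothetical edge case: no temperature vectors at all

s <- sapply(empty, max)
v <- vapply(empty, max, FUN.VALUE = numeric(1))

is.list(s)     # sapply() falls back to an empty list
is.numeric(v)  # vapply() keeps the declared numeric type
length(v)
```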

#7-sapply: Apply family video

#14-vapply: Apply family video
