R Programming
  • The wikipedia of R by me
  • Hello R
    • -What is R & RStudio
    • -Learning sources
    • -R online editor
    • -R environment
  • Data types
    • -Dealing with Number
    • -Dealing with String
    • -Dealing with Dates
    • -Dealing with NA's
    • -Dealing with Logicals
    • -Dealing with Factors
  • R data
    • -R object
    • -Data structures
      • --Basics
      • --Managing Vectors
      • --Managing Matrices
      • --Managing Data Frames
    • -Functions
    • -Importing/exporting data
    • -Shape&Transform data
    • -R management
  • Visualizations
  • Intro to R Bootcamp
    • -01-introduction
    • -02-data preparation
    • -03-data transformation
    • -04-visualization
  • R programming track
    • -a-Introduction to R
      • --1-Intro to basics
      • --2-Vectors
      • --3-Matrices
      • --4-Factors
      • --5-Data frames
      • --6-Lists
    • -b-Intermediate R
      • --1-Conditionals and Control Flow
      • --2-Loops
      • --3-Functions
      • --4-The apply family
      • --5-Utilities
    • -d-Writing Functions in R
      • --1-A quick refresher
      • --2-When and how you should write a function
      • --3-Functional programming
      • --4-Advanced inputs and outputs
      • --5-Robust functions
  • Data Wrangling with R
  • R-tutor
    • #R introduction
    • #Elementary Statistics with R
  • Hands-On Programming with R
  • R for Data Science
  • Advanced R
  • ggplot2
  • R packages
  • Statistik-1
  • Statistik-2
  • Statistik-3
  • Zeitreihen & Prognosen
  • Descriptive Analytics
  • Predictive Analytics
  • Prescriptive Analytics
  • R Graphics Cookbook
    • ggplot2 intro
    • ggplot2 custome
    • ggplot top-50
  • #Exploratory Data Analysis
    • -Data Summary
    • -Checklist Solution
  • #Data Mining
    • Untitled
    • Untitled
  • #Machine Learning
    • Intro to ML
    • Intro alghorithms
    • 1. Supervised Learning
  • Master R for Data Science
    • Learning R
    • Untitled
    • Untitled
  • Data Science Projects
    • Simple linear regression:
Powered by GitBook
On this page
  • 1. Understanding the Ggplot Syntax:
  • 2. How to Make a Simple Scatterplot:
  • 3. Adjusting the X and Y axis limits:
  • 4. How to Change the Title and Axis Labels:
  • 5. How to Change the Color and Size of Points:
  • #How to Change the Color and Size To Static?
  • #How to Change the Color To Reflect Categories in Another Column?
  • 6. How to Change the X Axis Texts and Ticks Location:
  • #How to Change the X and Y Axis Text and its Location?
  • #How to Write Customized Texts for Axis Labels, by Formatting the Original Values?
  • #How to Customize the Entire Theme in One Shot using Pre-Built Themes?
  1. R Graphics Cookbook

ggplot2 intro

Covers the basic knowledge about

PreviousPrescriptive AnalyticsNextggplot2 custome

Last updated 6 years ago

1. Understanding the Ggplot Syntax:

The main difference is that, unlike base graphics, ggplot works with dataframes and not individual vectors.

# Setup
options(scipen=999)  # turn off scientific notation like 1e+06
library(ggplot2)
data("midwest", package = "ggplot2")  # load the data

# Init Ggplot
ggplot(midwest, aes(x=area, y=poptotal))  # area and poptotal are columns in 'midwest'

aes() function is used to specify the X and Y axes. Even though the x and y are specified, there are no points or lines in it. This is because, ggplot doesn’t assume that you meant a scatterplot or a line chart to be drawn. .

2. How to Make a Simple Scatterplot:

Adding points using a geom layer called geom_point. More about geom_xxx :

ggplot(midwest, aes(x=area, y=poptotal)) + 
    geom_point() 

Like geom_point(), there are many such geom layers which we will see in list.

Example: a smoothing layer using geom_smooth(method='lm').

g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands
plot(g)

The majority of points lie in the bottom of the chart which doesn’t really look nice. So, let’s change the Y-axis limits to focus on the lower half.

3. Adjusting the X and Y axis limits:

Method 1: By deleting the points outside the range

g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands
# Delete the points outside the limits
g + xlim(c(0, 0.1)) + ylim(c(0, 1000000))   # deletes points

Method 2: Zooming In

g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands

# Zoom in without deleting the points outside the limits. 
# As a result, the line of best fit is the same as the original plot.
g1 <- g + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  # zooms in
plot(g1)

4. How to Change the Title and Axis Labels:

# set se=FALSE to turnoff confidence bands
g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  

# zooms in
g1 <- g + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  

# Add Title and Labels
g1 + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
# or
g1 + 
  ggtitle("Area Vs Population", subtitle="From midwest dataset") + 
  xlab("Area") + 
  ylab("Population")

Here is the full function:

ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm") + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

5. How to Change the Color and Size of Points:

#How to Change the Color and Size To Static?

ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(col="steelblue", size=3) +   # Set static color and size for points
  geom_smooth(method="lm", col="green") +  # change the color of line
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

#How to Change the Color To Reflect Categories in Another Column?

gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
plot(gg)

As an added benefit, the legend is added automatically. If needed, it can be removed by setting the legend.position to None from within a theme() function.

gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
plot(gg)
gg + theme(legend.position="None")  # remove legend

Also, You can change the color palette entirely.

gg + scale_colour_brewer(palette = "Set1")  # change color palette
More of such palettes can be found in the RColorBrewer package
library(RColorBrewer)
head(brewer.pal.info, 10)  # show 10 palettes
#>          maxcolors category colorblind
#> BrBG            11      div       TRUE
#> PiYG            11      div       TRUE
#> PRGn            11      div       TRUE
#> PuOr            11      div       TRUE
#> RdBu            11      div       TRUE
#> RdGy            11      div      FALSE
#> RdYlBu          11      div       TRUE
#> RdYlGn          11      div      FALSE
#> Spectral        11      div      FALSE
#> Accent           8     qual      FALSE

6. How to Change the X Axis Texts and Ticks Location:

#How to Change the X and Y Axis Text and its Location?

This involves two aspects: breaks and labels.

Step 1: Set the breaks

scale_x_continuous because, the X axis variable is a continuous variable. Had it been a date variable, scale_x_date could be used. Like scale_x_continuous() an equivalent scale_y_continuous() is available for Y axis.

# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

# Change breaks
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))

Step 2: Change the labels

#Let me demonstrate by setting the labels to alphabets 
#from a to k (though there is no meaning to it in this context).

# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

# Change breaks + label
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = letters[1:11])

If you need to reverse the scale, use scale_x_reverse().

gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

# Reverse X Axis Scale
gg + scale_x_reverse()

#How to Write Customized Texts for Axis Labels, by Formatting the Original Values?

* Method 1: Using sprintf()

* Method 2: Using a custom user defined function. (Formatted 1000’s to 1K scale)

# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

# Change Axis Texts
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = sprintf("%1.2f%%", seq(0, 0.1, 0.01))) + 
  scale_y_continuous(breaks=seq(0, 1000000, 200000), labels = function(x){paste0(x/1000, 'K')})

#How to Customize the Entire Theme in One Shot using Pre-Built Themes?

Finally, instead of changing the theme components individually. We can change the entire theme itself using pre-built themes. The help page ?theme_bw shows all the available built-in themes. We have 2 methode: * Use the theme_set() and * Draw the ggplot and then add the overall theme setting (eg. theme_bw())

# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")

gg <- gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))

# method 1: Using theme_set()
theme_set(theme_classic())  # not run
gg

# method 2: Adding theme Layer itself.
gg + theme_bw() + labs(subtitle="BW Theme")
gg + theme_classic() + labs(subtitle="Classic Theme")
More about aes
List of geom_xxx