Introduction

This tutorial shows how to generate interactive data visualizations in R.

This tutorial is aimed at beginners and intermediate users of R with the aim of showcasing how to generate interactive visualizations and how to process the resulting concordances using R. The aim is not to provide a fully-fledged analysis but rather to show and exemplify selected useful methods associated with interactive graphs

The entire R Notebook for the tutorial can be downloaded here. If you want to render the R Notebook on your machine, i.e. knitting the document to html or a pdf, you need to make sure that you have R and RStudio installed and you also need to download the bibliography file and store it in the same folder where you store the Rmd file.


Interactive visualization refers to a type of graphic visualization that allows the viewer to interact with the data or the information that is visualized. As such, interactive visualizations are more engaging or appealing compared with non-interactive visualization. However, interactive visualizations cannot be implemented in reports that are printed on paper but are restricted to digital formats (e.g. websites, presentations, etc.).

There are various options to generate interactive data visualizations in R. The most popular option is to create a shiny app (Beeley 2013). This tutorial will not use shiny because shiny requires that the computation on which the computation that underlies the visualization is performed on a server. Rather, we will use GooleViz (Gesmann and Castillo 2011) for generating interactive visualizations that use the computer (or the browser) of the viewer to perform the computation. Thus, the interactive visualizations shown here do not require an external server.

Preparation and session set up

This tutorial is based on R. If you have not installed R or are new to it, you will find an introduction to and more information how to use R here. For this tutorials, we need to install certain packages from an R library so that the scripts shown below are executed without errors. Before turning to the code below, please install the packages by running the code below this paragraph. If you have already installed the packages mentioned below, then you can skip ahead and ignore this section. To install the necessary packages, simply run the following code - it may take some time (between 1 and 5 minutes to install all of the libraries so you do not need to worry if it takes some time).

# install packages
install.packages("googleVis")
install.packages("tidyverse")
install.packages("DT")
install.packages("flextable")
install.packages("ggplot2")
install.packages("gganimate")
install.packages("gapminder")
install.packages("maptools")
install.packages("plotly")
install.packages("leaflet")
# install klippy for copy-to-clipboard button in code chunks
install.packages("remotes")
remotes::install_github("rlesur/klippy")

Now that we have installed the packages, we activate them as shown below.

# set options
options(stringsAsFactors = F)          # no automatic data transformation
options("scipen" = 100, "digits" = 12) # suppress math annotation
# Warning: the following option adaptation requires re-setting during session outro!
op <- options(gvis.plot.tag='chart')  # set gViz options
# activate packages
library(googleVis)
library(tidyverse)
library(DT)
library(flextable)
library(ggplot2)
library(gganimate)
library(gapminder)
library(maptools)
library(plotly)
library(leaflet)
# activate klippy for copy-to-clipboard button
klippy::klippy()

Once you have installed R and RStudio and also initiated the session by executing the code shown above, you are good to go.

Getting Started

To get started with motion charts, we load the googleVis package for the visualizations, the tidyverse package for data processing, and we load a data set called coocdata. The coocdata contains information about how often adjectives were amplified by a degree adverb across time (see below).

# load data
coocdata  <- base::readRDS(url("https://slcladal.github.io/data/coo.rda", "rb"))

The coocdata is rather complex and requires some processing. First, we rename the columns to render their naming more meaningful. In this context we rename the OBS column Frequency and the Amp column Amplifier. As we are only interested if an adjective was amplified by very, we collapse all amplifiers that are not very in a bin category called other. We then calculate the frequency of the adjective within each time period and also the frequency with which each adjective is amplified by either very or other amplifiers. Then, we calculate the percentage with which each adjective is amplified by very.

# process data
coocs <- coocdata %>%
  dplyr::select(Decade, Amp, Adjective, OBS) %>%
  dplyr::rename(Frequency = OBS,
         Amplifier = Amp) %>%
  dplyr::mutate(Amplifier = ifelse(Amplifier == "very", "very", "other")) %>%
  dplyr::group_by(Decade, Adjective, Amplifier) %>%
  dplyr::summarise(Frequency = sum(Frequency)) %>%
  dplyr::ungroup() %>%
  tidyr::spread(Amplifier, Frequency) %>%
  dplyr::group_by(Decade, Adjective) %>%
  dplyr::mutate(Frequency_Adjective = sum(other + very),
         Percent_very = round(very/(other+very)*100, 2)) %>%
  dplyr::mutate(Percent_very = ifelse(is.na(Percent_very), 0, Percent_very),
         Adjective = factor(Adjective))

We now have a data set that we can use to generate interactive visualization.

1 Basic Interactive Graphs

This section shows some very basic interactive graphs including scatter plots, line graphs, and bar plots.

Scatter Plots

Scatter plots show the relationship between two numeric variables if you have more than one observation per variable level (if the data is not grouped by another variable). This means that you can use scatter plots to display data when you have, e.g. more than one observation for each data in your data set. If you only have a single observation, you could also use a line graph (which we will turn to below).

scdat <- coocs %>%
  dplyr::group_by(Decade) %>%
  dplyr::summarise(Precent_very = mean(Percent_very))
# create scatter plot
SC <- gvisScatterChart(scdat, 
                       options=list(
                         title="Interactive Scatter Plot",
                         legend="none",
                         pointSize=5))

If you want to display the visualization in a Notebook environment, you can use the plot function as shown below.

plot(SC)

However, if you want to display the visualization on a website, you must use the print function rather than the plot function and specify that you want to print a chart.

print(SC, 'chart')

Line Graphs

To create an interactive line chart, we use the gvisLineChart function as shown below.

# create scatter plot
SC <- gvisLineChart(scdat, 
                    options=list(
                      title="Interactive Scatter Plot",
                      legend="none"))

If you want to display the visualization in a Notebook environment, you can use the plot function. For website, you must use the print function and specify that you want to print a chart.

print(SC, 'chart')

Bar Plots

To create an interactive bar chart, we use the gvisBarChart function as shown below.

# create scatter plot
SC <- gvisBarChart(scdat, 
                       options=list(
                         title="Interactive Bar chart",
                         legend="right",
                         pointSize=10))

Normally, you would use the plot function to display the interactive chart but you must use the print function with the chart argument if you want to display the result on a website.

print(SC, 'chart')

2 Animations

Animations or GIFs (Graphics Interchange Format) can be generated using the gganimate package written by Thomas Lin Pedersen and David Robinson. The gganimate package allows to track changes over time while simultaneously displaying several variables in one visualization. As we will create animations using the ggplot2 package, we also load that package from the library. In this case, we use the gapminder data set which comes with the gapminder package and which contains information about different countries, such as the average life expectancy, the population, or the gross domestic product (GDP), across time.

# set options
theme_set(theme_bw())

After loading the data, we create static plot so that we can check what the data looks like at one point in time.

p <- ggplot(
  gapminder, 
  aes(x = gdpPercap, y=lifeExp, size = pop, colour = country)
  ) +
  geom_point(show.legend = FALSE, alpha = 0.7) +
  scale_color_viridis_d() +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(x = "GDP per capita", y = "Life expectancy")
p

We can then turn static plot into animation by defining the content of the transition_time object.

gif <- p + transition_time(year) +
  labs(title = "Year: {frame_time}")
# show gif
gif

Another way to generate animations is to use the plotly package as shown below. While I personally do not find the visualizations created by the plot_ly function as visually appealing, it has the advantage that it allows mouse-over effects.

fig <- gapminder %>%
  plot_ly(
    x = ~gdpPercap, 
    y = ~lifeExp, 
    size = ~pop, 
    color = ~continent, 
    frame = ~year, 
    text = ~country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig <- fig %>% layout(
    xaxis = list(
      type = "log"
    )
  )

fig

3 Interactive Maps

You can also use the leaflet package to create interactive maps. In this example, we display the beautiful city of Brisbane and the visualization allows you to zoom in and out.

# generate visualization
m <- leaflet() %>% 
  setView(lng = 153.05, lat = -27.45, zoom = 12)
# display map
m %>% addTiles()

Another option is to display information about different countries. In this case, we can use the information provided in the maptools package which comes with a SpatialPolygonsDataFrame of the world and the population by country (in 2005). To make the visualization a bit more appealing, we will calculate the population density, add this variable to the data which underlies the visualization, and then display the information interactively. In this case, this means that you can use mouse-over or hoover effects so that you see the population density in each country if you put the cursor on that country (given the information is available for that country).

We start by loading the required package from the library, adding population density to the data, and removing data points without meaningful information (e.g. we set values like Inf to NA).

# load data
data(wrld_simpl)
# calculate population density and add it to the data 
wrld_simpl@data$PopulationDensity <- round(wrld_simpl@data$POP2005/wrld_simpl@data$AREA,2)
wrld_simpl@data$PopulationDensity <- ifelse(wrld_simpl@data$PopulationDensity == "Inf", NA, wrld_simpl@data$PopulationDensity)
wrld_simpl@data$PopulationDensity <- ifelse(wrld_simpl@data$PopulationDensity == "NaN", NA, wrld_simpl@data$PopulationDensity)
# inspect data
head(wrld_simpl@data)
##     FIPS ISO2 ISO3 UN                NAME   AREA  POP2005 REGION SUBREGION
## ATG   AC   AG  ATG 28 Antigua and Barbuda     44    83039     19        29
## DZA   AG   DZ  DZA 12             Algeria 238174 32854159      2        15
## AZE   AJ   AZ  AZE 31          Azerbaijan   8260  8352021    142       145
## ALB   AL   AL  ALB  8             Albania   2740  3153731    150        39
## ARM   AM   AM  ARM 51             Armenia   2820  3017661    142       145
## AGO   AO   AO  AGO 24              Angola 124670 16095214      2        17
##         LON     LAT PopulationDensity
## ATG -61.783  17.078           1887.25
## DZA   2.632  28.163            137.94
## AZE  47.395  40.430           1011.14
## ALB  20.068  41.143           1151.00
## ARM  44.563  40.534           1070.09
## AGO  17.544 -12.296            129.10

We can now display the data.

# define colors
qpal <- colorQuantile(rev(viridis::viridis(10)),
                      wrld_simpl$PopulationDensity, n=10)
# generate visualization
l <- leaflet(wrld_simpl, options =
               leafletOptions(attributionControl = FALSE, minzoom=1.5)) %>%
  addPolygons(
    label=~stringr::str_c(
      NAME, ' ',
      formatC(PopulationDensity, big.mark = ',', format='d')),
    labelOptions= labelOptions(direction = 'auto'),
    weight=1, color='#333333', opacity=1,
    fillColor = ~qpal(PopulationDensity), fillOpacity = 1,
    highlightOptions = highlightOptions(
      color='#000000', weight = 2,
      bringToFront = TRUE, sendToBack = TRUE)
    ) %>%
  addLegend(
    "topright", pal = qpal, values = ~PopulationDensity,
    title = htmltools::HTML("Population density <br> (2005)"),
    opacity = 1 )
# display visualization
l