Data Visualization and Maps I

HES 505 Fall 2024: Session 25

Carolyn Koehn

Objectives

By the end of today you should be able to:

  • Describe some basic principles of data visualization

  • Extend principles of data visualization to the development of maps

  • Distinguish between several common types of spatial data visualization

  • Understand the relationship between the Grammar of Graphics and ggplot syntax

  • Describe the various options for customizing ggplots and their syntactic conventions

But first… Scaling

Assignment 9: Scaling the hazard data

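# Use the mean and SD of the hazard values the model was fit on
hazard.smooth.scl <- (hazard.smooth - mean(incident.cejst.prep$hazard))/sd(incident.cejst.prep$hazard)
#versus
# scale() uses the mean and SD of hazard.smooth itself, not those of the model's data
hazard.smooth.scl.nogood <- scale(hazard.smooth)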

Assignment 9: Scaling the hazard data

Assignment 9: Different predictions for different scaling
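Why do the two versions differ? A minimal base-R sketch with made-up numbers: `train_hazard` is a hypothetical stand-in for `incident.cejst.prep$hazard` and `new_hazard` for the values in `hazard.smooth`. The same raw value ends up at a different place on the scaled axis depending on whose mean and SD are used.

```r
# Hypothetical stand-in for the hazard values used to fit the model
train_hazard <- c(2, 4, 6, 8, 10)
# Hypothetical stand-in for the smoothed hazard values we predict on
new_hazard <- c(6, 8, 10)

# Scaled with the training mean (6) and SD (sqrt(10)):
# a raw value of 6 maps to 0, exactly as it did when the model was fit
scl_train <- (new_hazard - mean(train_hazard)) / sd(train_hazard)

# scale() re-centres on the new values' own mean (8) and SD (2),
# so the same raw value of 6 maps to -1 instead
scl_self <- as.vector(scale(new_hazard))

print(round(scl_train, 3))
print(scl_self)
```

Any coefficient estimated on the training scale is then multiplied by different numbers for the same underlying hazard, which is why the two prediction maps disagree.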

Introduction to Data Visualization

Principles vs. Rules

  • Lots of examples of good and bad data visualization

  • What makes a graphic good (or bad)?

  • Who decides?

  • Rule: externally compels you, through force, threat or punishment, to do the things someone else has deemed good or right.

  • Principle: internally motivating because it is a good practice; a general statement describing a philosophy that good rules should satisfy

  • Rules contribute to the design process, but do not guarantee a satisfactory outcome

“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency. … [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space … [It] is nearly always multivariate … And graphical excellence requires telling the truth about the data.”
— Edward Tufte

Ugly, Wrong, and Bad

  • Ugly: graphic is clear and informative, but has aesthetic issues

  • Bad: graphic is unclear, confusing, or deceiving

  • Wrong: the figure is objectively incorrect

‘Monstrous Costs’ by Nigel Holmes (from Healy 2018)

Bad and Wrong

  • Presentation of the data is (intentionally?) deceiving

  • Presentation is just incorrect

Tricky (from Healy 2018)

Wrong

Grammar of Graphics (Wilkinson 2005)

  • Grammar: A set of structural rules that help establish the components of a language

  • System and structure of language consist of syntax and semantics

  • Grammar of Graphics: a framework that allows us to concisely describe the components of any graphic

  • Follows a layered approach by using defined components to build a visualization

  • ggplot2 is a formal implementation in R

Aesthetics: Mapping Data to Visual Elements

  • Define the systematic conversion of data into elements of the visualization

  • Are either categorical or continuous (exclusively)

  • Examples include x, y, fill, color, and alpha

From Wilke 2019

Scales

  • Scales map data values to aesthetic values

  • Must be one-to-one: each specific data value maps to exactly one aesthetic value, and vice versa
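A small illustration of a one-to-one scale in {ggplot2}, assuming the package is installed and using a hypothetical data frame: a manual color scale written as a named vector gives each category exactly one color, and no color is reused for another category.

```r
library(ggplot2)

# Hypothetical data with three study sites
df <- data.frame(
  x = 1:6,
  y = c(2, 3, 5, 4, 6, 7),
  site = c("A", "B", "C", "A", "B", "C")
)

# One distinct color per category: a one-to-one mapping
site_colors <- c(A = "#1b9e77", B = "#d95f02", C = "#7570b3")

p <- ggplot(df, aes(x = x, y = y, color = site)) +
  geom_point() +
  scale_color_manual(values = site_colors)

# Colors ggplot2 actually assigned, one row per data point
built <- layer_data(p)
table(df$site, built$colour)
```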

Principles of Data Visualization

  • Be Honest

  • Principle of proportional ink

  • Avoid unnecessary ‘chart junk’

  • Use color judiciously

  • Balance data and context

Extending Data Viz to Maps

Telling stories with maps

  • Maps organize a lot of information in a coherent way

  • They invite critique and inspection

  • They are also aesthetic objects that can engage broader audiences

Key Issues

  • Thinking about projections

  • Scale of the map

  • Errors of Omission

Cartographic Principles

  1. Concept before compilation

  2. Hierarchy with harmony (Important things should look important)

  3. Simplicity from sacrifice

  4. Maximum information at minimum cost

  5. Engage emotion to enhance understanding

Map Elements

Scale

  • Relates map distance to distance on the ground

  • Ratio scales (1:24,000 or 1/24,000)

  • Graphic scales

  • Large vs. small-scale?
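The arithmetic behind a ratio scale is a single multiplication: at 1:24,000, one unit on the map stands for 24,000 of the same units on the ground. A minimal sketch in R; the helper `map_to_ground_m()` is hypothetical, written only for this example.

```r
# Convert a distance measured on the map (cm) to a ground distance (m),
# given the denominator of a ratio scale (24000 for 1:24,000)
map_to_ground_m <- function(map_cm, scale_denom) {
  ground_cm <- map_cm * scale_denom
  ground_cm / 100  # centimetres to metres
}

map_to_ground_m(1, 24000)    # 240 m per map centimetre
map_to_ground_m(1, 1000000)  # 10,000 m (10 km) per map centimetre

# "Large" and "small" refer to the size of the fraction itself:
# 1/24,000 is larger than 1/1,000,000, so 1:24,000 is the larger-scale map
1 / 24000 > 1 / 1000000
```

A larger-scale map therefore covers a smaller area in more detail.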

Projection

Developable Surfaces
  • Distortion means a single scale is not valid across large areas

  • Distortion increases with distance from the standard line

  • Five distortions: area, angle, shape, distance, and direction

Map Symbols

  • Graphic code for retrieving information

  • (De-)emphasize (un)important information

  • Contrast and the role of colors

Generalization

A good map tells a multitude of little white lies: it suppresses truth to help the user see what needs to be seen…
— Mark Monmonier

Geometry

Zhilin et al. 2008

Context

  • Filter out irrelevant details

  • Two elements: selection and classification

  • Reflect interpretations of the relative importance of different features

Mackaness and Chaudhry

Data Maps

Point Maps

  • Dot Maps: quantity represented by amount and concentration of dots

  • Proportional Symbol Map: Geometric symbols scaled in proportion to a quantity
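In {ggplot2}, a proportional symbol map is typically built by mapping a quantity to size. Because readers judge symbols by area (and, as the Ebbinghaus illusion shows, not very reliably), scale_size_area() makes symbol area, not radius, proportional to the value and maps zero to size zero. A minimal sketch with hypothetical point locations standing in for map coordinates:

```r
library(ggplot2)

# Hypothetical locations (e.g., towns) with a quantity to symbolise
towns <- data.frame(
  lon = c(-116.2, -116.6, -114.5),
  lat = c(43.6, 43.5, 42.6),
  value = c(1, 4, 16)
)

p <- ggplot(towns, aes(x = lon, y = lat, size = value)) +
  geom_point(shape = 21, fill = "grey70") +
  scale_size_area(max_size = 12)

# Sizes assigned by the scale: size^2 (area) divided by value is constant
sz <- layer_data(p)$size
sz^2 / towns$value
```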

Ebbinghaus’ illusion

Line Maps

From High Country News

Choropleth

  • Mapping color to geographies

  • Common problems

From Healy 2019

Cartogram

  • Resizes geographic units in proportion to a variable such as population, rather than to land area

  • Common problems

From Healy 2019



{ggplot2} is a system for declaratively creating graphics, based on “The Grammar of Graphics” (Wilkinson, 2005).

You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Advantages of {ggplot2}

  • consistent underlying “grammar of graphics” (Wilkinson 2005)
  • very flexible, layered plot specification
  • theme system for polishing plot appearance
  • lots of additional functionality thanks to extensions
  • active and helpful community


The Grammar of {ggplot2}


Component           Function               Explanation
Data                ggplot(data)           The raw data that you want to visualise.
Aesthetics          aes()                  Aesthetic mappings between variables and visual properties.
Geometries          geom_*()               The geometric shapes representing the data.
Statistics          stat_*()               The statistical transformations applied to the data.
Scales              scale_*()              Maps between the data and the aesthetic dimensions.
Coordinate System   coord_*()              Maps data into the plane of the data rectangle.
Facets              facet_*()              The arrangement of the data into a grid of plots.
Visual Themes       theme() and theme_*()  The overall visual defaults of a plot.
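The sketch below uses one function from each row of the table on a small hypothetical data frame (`df` and its columns are made up for illustration; assumes {ggplot2} is installed). Only data, aesthetics, and at least one geometry are needed to draw something; the other components fall back to defaults when omitted.

```r
library(ggplot2)

# Hypothetical data: two sites measured at the same ten x values
df <- data.frame(
  x = rep(1:10, times = 2),
  y = c((1:10) * 1.5, 20 - (1:10)),
  site = rep(c("a", "b"), each = 10)
)

p <- ggplot(df) +                                                      # Data
  aes(x = x, y = y, color = site) +                                    # Aesthetics
  geom_point() +                                                       # Geometries
  stat_smooth(method = "lm", formula = y ~ x) +                        # Statistics
  scale_color_manual(values = c(a = "darkorange", b = "steelblue")) +  # Scales
  coord_cartesian(ylim = c(0, 20)) +                                   # Coordinate system
  facet_wrap(vars(site)) +                                             # Facets
  theme_minimal()                                                      # Visual theme

p
```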

A Basic ggplot Example

The Data

Bike sharing counts in London, UK, powered by TfL Open Data

  • covers the years 2015 and 2016
  • incl. weather data acquired from freemeteo.com
  • prepared by Hristo Mavrodiev for Kaggle
  • further modification by myself

Variable      Description                                           Class
date          Date encoded as `YYYY-MM-DD`                          date
day_night     `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am)      character
year          `2015` or `2016`                                      factor
month         `1` (January) to `12` (December)                      factor
season        `winter`, `spring`, `summer`, or `autumn`             factor
count         Sum of reported bikes rented                          integer
is_workday    `TRUE` if Monday to Friday and not a bank holiday     logical
is_weekend    `TRUE` if Saturday or Sunday                          logical
is_holiday    `TRUE` if a bank holiday in the UK                    logical
temp          Average air temperature (°C)                          double
temp_feel     Average feels-like temperature (°C)                   double
humidity      Average air humidity (%)                              double
wind_speed    Average wind speed (km/h)                             double
weather_type  Most common weather type                              character

ggplot2::ggplot()

The help page of the ggplot() function.

Data

ggplot(data = bikes)

Aesthetic Mapping

= link variables to graphical properties

  • positions (x, y)
  • colors (color, fill)
  • shapes (shape, linetype)
  • size (size)
  • transparency (alpha)
  • groupings (group)

Aesthetic Mapping

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)

aesthetics

aes() outside as component

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)

aes() inside, explicit matching

ggplot(data = bikes, mapping = aes(x = temp_feel, y = count))

aes() inside, implicit matching

ggplot(bikes, aes(temp_feel, count))

aes() inside, mixed matching

ggplot(bikes, aes(x = temp_feel, y = count))
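All four spellings produce the same mapping. One way to check is to add the same layer to each and compare the x and y values ggplot2 computes. The sketch assumes {ggplot2} is installed and uses a hypothetical `bikes_demo` data frame in place of the real bikes data. Implicit matching works because x and y are the first two arguments of aes().

```r
library(ggplot2)

# Hypothetical stand-in for two columns of the bikes data
bikes_demo <- data.frame(
  temp_feel = c(2, 8, 15, 21, 27),
  count = c(9000, 14000, 20000, 26000, 30000)
)

p1 <- ggplot(data = bikes_demo) + aes(x = temp_feel, y = count)             # outside
p2 <- ggplot(data = bikes_demo, mapping = aes(x = temp_feel, y = count))  # explicit
p3 <- ggplot(bikes_demo, aes(temp_feel, count))                          # implicit
p4 <- ggplot(bikes_demo, aes(x = temp_feel, y = count))                  # mixed

# Build each plot with the same point layer and keep the positions
xy <- lapply(list(p1, p2, p3, p4), function(p) {
  layer_data(p + geom_point())[, c("x", "y")]
})

# TRUE if every variant matches the first one
all(vapply(xy[-1], identical, logical(1), xy[[1]]))
```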

Geometries


= interpret aesthetics as graphical representations

  • points
  • lines
  • polygons
  • text labels

Geometries

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point()

Visual Properties of Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5,
    shape = "X",
    stroke = 1,
    size = 4
  )

Setting vs Mapping of Visual Properties

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )
ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )
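The difference shows up in the built data. In this sketch (hypothetical `bikes_demo` data; assumes {ggplot2} is installed), setting a color outside aes() gives every point the same value, while mapping `season` inside aes() gives one color per season. A related pitfall: writing `aes(color = "#28a87d")` maps a constant string, which produces a legend and a default palette color rather than the intended green.

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data
bikes_demo <- data.frame(
  temp_feel = c(1, 9, 18, 24, 12, 4, 16, 20),
  count = c(8000, 15000, 24000, 29000, 17000, 10000, 22000, 26000),
  season = c("winter", "spring", "summer", "summer",
             "autumn", "winter", "spring", "autumn")
)

# Setting: a constant outside aes()
p_set <- ggplot(bikes_demo, aes(x = temp_feel, y = count)) +
  geom_point(color = "#28a87d", alpha = .5)

# Mapping: a variable inside aes()
p_map <- ggplot(bikes_demo, aes(x = temp_feel, y = count)) +
  geom_point(aes(color = season), alpha = .5)

unique(layer_data(p_set)$colour)          # a single color
length(unique(layer_data(p_map)$colour))  # one color per season
```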

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = temp_feel > 20),
    alpha = .5
  )

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear"),
    alpha = .5,
    size = 2
  )
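An expression inside aes() is evaluated row by row, so `temp_feel > 20` becomes a logical (TRUE/FALSE) variable that ggplot2 treats as discrete, and the legend title shows the expression itself. A minimal check on hypothetical data, assuming {ggplot2} is installed:

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data
bikes_demo <- data.frame(
  temp_feel = c(5, 12, 19, 21, 26, 30),
  count = c(11000, 16000, 22000, 25000, 28000, 31000)
)

p <- ggplot(bikes_demo, aes(x = temp_feel, y = count)) +
  geom_point(aes(color = temp_feel > 20), alpha = .5)

# Three rows are above 20, three are not
table(bikes_demo$temp_feel > 20)
# Two colors are used, one for FALSE and one for TRUE
length(unique(layer_data(p)$colour))
```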

Mapping to Size

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )
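Mapped and set properties can be combined in one layer: `size = count` inside aes() varies with the data, while `shape = 18` outside aes() is the same for every point. A sketch on hypothetical data (assumes {ggplot2} is installed):

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data, sorted by count
bikes_demo <- data.frame(
  temp = c(3, 10, 17, 24),
  temp_feel = c(1, 9, 17, 25),
  count = c(9000, 15000, 23000, 30000),
  weather_type = c("rain", "clear", "cloudy", "clear")
)

p <- ggplot(bikes_demo, aes(x = temp, y = temp_feel)) +
  geom_point(aes(color = weather_type == "clear", size = count),
             shape = 18, alpha = .5)

d <- layer_data(p)
d$size   # mapped: increases with count
d$shape  # set: 18 for every point
```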

Adding More Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )
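Because `color = season` is set in ggplot(), both layers inherit it, so geom_smooth() fits one line per season. The sketch below (hypothetical data; assumes {ggplot2} is installed) counts the groups in the smoothing layer; `formula = y ~ x` is given only to silence the message geom_smooth() prints otherwise. To fit a single line across all seasons, the color mapping could instead go into geom_point(aes(color = season)).

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data with two seasons
bikes_demo <- data.frame(
  temp_feel = c(2, 6, 10, 14, 18, 14, 18, 22, 26, 30),
  count = c(8000, 9500, 11000, 12500, 14000,
            15000, 18000, 21000, 24000, 27000),
  season = rep(c("winter", "summer"), each = 5)
)

p <- ggplot(bikes_demo, aes(x = temp_feel, y = count, color = season)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Number of separate fits in the second (smooth) layer
length(unique(layer_data(p, 2)$group))
```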

Statistical Layers

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")
ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")
ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
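Each stat_*() has a default geom and each geom_*() has a default stat, so the paired calls above describe the same layer. A quick check on hypothetical data (assumes {ggplot2} is installed) compares the counts computed by both spellings of the bar chart:

```r
library(ggplot2)

# Hypothetical stand-in for the season column of the bikes data
bikes_demo <- data.frame(
  season = c("winter", "spring", "spring", "summer",
             "summer", "summer", "autumn", "autumn")
)

p_stat <- ggplot(bikes_demo, aes(x = season)) + stat_count(geom = "bar")
p_geom <- ggplot(bikes_demo, aes(x = season)) + geom_bar(stat = "count")

# Counts per season (levels in alphabetical order) from both versions
layer_data(p_stat)$count
identical(layer_data(p_stat)$count, layer_data(p_geom)$count)
```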

Facets

Facets


= split the data into multiple panels by one or more variables

Facets are also known as:

  • small multiples
  • trellis graphs
  • lattice plots
  • conditioning

Wrapped Facets

g <-
  ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .3,
    show.legend = FALSE  # hide the color legend for this layer
  )
g +
  facet_wrap(
    vars(day_night)
  )

Wrapped Facets

g +
  facet_wrap(
    ~ day_night
  )
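facet_wrap() accepts variables quoted with vars() and, for compatibility with the older interface, a one-sided formula. Both give the same panels, as this sketch on hypothetical data shows (assumes {ggplot2} is installed):

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data
bikes_demo <- data.frame(
  temp_feel = c(3, 10, 18, 25, 6, 14, 20, 27),
  count = c(7000, 12000, 18000, 22000, 3000, 5000, 7000, 9000),
  season = rep(c("winter", "spring", "summer", "autumn"), times = 2),
  day_night = rep(c("day", "night"), each = 4)
)

g <- ggplot(bikes_demo, aes(x = temp_feel, y = count, color = season)) +
  geom_point(alpha = .3, show.legend = FALSE)

g_vars <- g + facet_wrap(vars(day_night))
g_form <- g + facet_wrap(~ day_night)

# Panel assignment of each point under both specifications
panels_vars <- layer_data(g_vars)$PANEL
panels_form <- layer_data(g_form)$PANEL
nlevels(panels_vars)
identical(panels_vars, panels_form)
```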

Scales

Scales


= translate between variable ranges and property ranges

  • feels-like temperature  ⇄  x
  • reported bike shares  ⇄  y
  • season  ⇄  color
  • year  ⇄  shape

Scales

The scale_*() components control the properties of all the aesthetic dimensions mapped to the data.

Consequently, there are scale_*() functions for all aesthetics such as:

  • positions via scale_x_*() and scale_y_*()

  • colors via scale_color_*() and scale_fill_*()

  • sizes via scale_size_*() and scale_radius_*()

  • shapes via scale_shape_*() and scale_linetype_*()

  • transparency via scale_alpha_*()


The suffix (*) can be filled in with, e.g.:

  • continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions

  • continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors

  • continuous(), discrete(), manual(), ordinal(), area(), date() for sizes

  • continuous(), discrete(), manual(), ordinal() for shapes

  • continuous(), discrete(), manual(), ordinal(), date() for transparency
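Two of these suffixes in action, on hypothetical data (assumes {ggplot2} is installed): scale_x_log10() transforms the x position, and scale_color_manual() assigns colors by hand. Position transformations are applied while the plot is built, so the built x values are already on the log10 scale.

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude
df <- data.frame(
  x = c(1, 10, 100, 1000),
  y = c(2, 4, 6, 8),
  grp = c("a", "b", "a", "b")
)

p <- ggplot(df, aes(x = x, y = y, color = grp)) +
  geom_point() +
  scale_x_log10() +                                               # position, log10 suffix
  scale_color_manual(values = c(a = "grey40", b = "firebrick"))  # color, manual suffix

d <- layer_data(p)
d$x       # log10 of the original x values
d$colour  # colors taken from the manual scale
```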

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height (continuous)
  • weight (continuous)
  • age (continuous or discrete)
  • counts (discrete)

Discrete:
qualitative or categorical data

  • species (nominal)
  • sex (nominal)
  • study site (nominal or ordinal)
  • age group (ordinal)

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point()

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date() +
  scale_y_continuous() +
  scale_color_discrete()
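When no scale is given, ggplot2 picks a default scale for each aesthetic from the data type (a Date gives a date scale, a number a continuous scale, a character vector a discrete scale), so the explicit version above should draw the same plot as the one without scales. A minimal check on hypothetical data, assuming {ggplot2} is installed:

```r
library(ggplot2)

# Hypothetical stand-in for the bikes data
bikes_demo <- data.frame(
  date = as.Date(c("2015-01-15", "2015-04-15", "2015-07-15", "2015-10-15")),
  count = c(15000, 22000, 30000, 20000),
  season = c("winter", "spring", "summer", "autumn")
)

p_default <- ggplot(bikes_demo, aes(x = date, y = count, color = season)) +
  geom_point()

p_explicit <- p_default +
  scale_x_date() +
  scale_y_continuous() +
  scale_color_discrete()

# Compare positions and colors computed with implicit vs explicit scales
cols <- c("x", "y", "colour")
identical(layer_data(p_default)[cols], layer_data(p_explicit)[cols])
```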

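Scales

Here `date` is a Date variable, so the matching position scale is scale_x_date(), as in the previous example; scale_x_continuous() is the wrong scale type for it.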

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete()

Coordinate Systems

= interpret the position aesthetics

  • linear coordinate systems: preserve the geometrical shapes
    • coord_cartesian()
    • coord_fixed()
    • coord_flip()
  • non-linear coordinate systems: likely change the geometrical shapes
    • coord_polar()
    • coord_map() and coord_sf()
    • coord_trans()
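A sketch of a few coordinate systems applied to the same kind of bar chart (hypothetical data; assumes {ggplot2} is installed). coord_flip() and coord_fixed() keep bars rectangular, while coord_polar(theta = "y") bends a single stacked bar into a pie chart. Each coord_*() call returns a Coord object whose class names the system.

```r
library(ggplot2)

# Hypothetical counts per category
df <- data.frame(category = c("a", "b", "c"), n = c(5, 3, 2))

bars <- ggplot(df, aes(x = category, y = n)) + geom_col()

p_flip  <- bars + coord_flip()            # linear: swaps the x and y axes
p_fixed <- bars + coord_fixed(ratio = 1)  # linear: 1 unit of x = 1 unit of y

# Non-linear: one stacked bar wrapped around a circle
p_polar <- ggplot(df, aes(x = "", y = n, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

class(coord_flip())[1]
class(coord_polar(theta = "y"))[1]
```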