HES 505 Fall 2024: Session 25
By the end of today you should be able to:
Describe some basic principles of data visualization
Extend principles of data visualization to the development of maps
Distinguish between several common types of spatial data visualization
Understand the relationship between the Grammar of Graphics and `ggplot` syntax
Describe the various options for customizing ggplots and their syntactic conventions
Lots of examples of good and bad data visualization
What makes a graphic good (or bad)?
Who decides?
Rule: externally compels you, through force, threat or punishment, to do the things someone else has deemed good or right.
Principle: internally motivating because it is a good practice; a general statement describing a philosophy that good rules should satisfy
Rules contribute to the design process, but do not guarantee a satisfactory outcome
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency. … [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space … [It] is nearly always multivariate … And graphical excellence requires telling the truth about the data.”
Ugly: graphic is clear and informative, but has aesthetic issues
Bad: graphic is unclear, confusing, or deceiving
Wrong: the figure is objectively incorrect
Presentation of the data is (intentionally?) deceiving
Presentation is just incorrect
Grammar: A set of structural rules that help establish the components of a language
System and structure of language consist of syntax and semantics
Grammar of Graphics: a framework that allows us to concisely describe the components of any graphic
Follows a layered approach by using defined components to build a visualization
`ggplot2` is a formal implementation in R
Aesthetics define the systematic conversion of data into elements of the visualization
They are either categorical or continuous (exclusively)
Examples include `x`, `y`, `fill`, `color`, and `alpha`
Scales map data values to their aesthetics
Must be a one-to-one relationship; each specific data value should map to only one aesthetic
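As a minimal sketch of the one-to-one requirement, the example below assigns exactly one color to each category with `scale_color_manual()`. The data frame `demo`, its values, and the hex colors are made up for illustration and are not part of the course data.

```r
library(ggplot2)

# Hypothetical toy data: three seasons, two observations each
demo <- data.frame(
  season = factor(c("winter", "winter", "spring", "spring", "summer", "summer")),
  temp   = c(3, 5, 12, 14, 22, 25),
  count  = c(110, 140, 300, 320, 520, 560)
)

# One color per data value: the named vector makes the mapping explicit
season_colors <- c(winter = "#3c89d9", spring = "#28a87d", summer = "#e6a01e")

p <- ggplot(demo, aes(x = temp, y = count, color = season)) +
  geom_point(size = 3) +
  scale_color_manual(values = season_colors)

# Inspect the colors ggplot2 assigned to the plotted points
built <- layer_data(p)
table(built$colour)
```

Each color should appear once per observation of its season, so no data value is shared between two colors.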
Be Honest
Principle of proportional ink
Avoid unnecessary ‘chart junk’
Use color judiciously
Balance data and context
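The principle of proportional ink can be illustrated with a bar chart, where the bar length carries the value. This is only a sketch with two invented values (`demo` is hypothetical): the first plot anchors the bars at zero, the second zooms the axis so that a small difference looks much larger than it is.

```r
library(ggplot2)

# Hypothetical values chosen so that the groups differ by only 5 percent
demo <- data.frame(group = c("A", "B"), value = c(100, 105))

# Bars anchored at zero: the amount of ink is proportional to the values
honest <- ggplot(demo, aes(x = group, y = value)) +
  geom_col()

# Zoomed axis: the same bars now appear to differ several-fold
truncated <- honest +
  coord_cartesian(ylim = c(98, 106))

# The bars themselves still start at zero in both versions
layer_data(honest)[, c("x", "ymin", "ymax")]
```

Printing `honest` and `truncated` side by side shows how the zoomed version breaks the link between ink and data.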
Maps organize a lot of information in a coherent way
They invite critique and inspection
They are also aesthetic objects that can engage broader audiences
Thinking about projections
Scale of the map
Errors of Omission
Concept before compilation
Hierarchy with harmony (Important things should look important)
Simplicity from sacrifice
Maximum information at minimum cost
Engage emotion to enhance understanding
Relates map distance to distance on the ground
Ratio scales (1:24,000 or 1/24,000)
Graphic scales
Large vs. small-scale?
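A short worked example of the ratio scale, assuming a hypothetical measurement of 5 cm on a 1:24,000 map. It also compares two representative fractions: the larger fraction is the larger scale, which shows a smaller area in more detail.

```r
# Ratio scale from the slide: 1 unit on the map equals 24,000 units on the ground
scale_denominator <- 24000

# Hypothetical distance measured on the map, in centimeters
map_cm <- 5

# Ground distance: multiply by the denominator, then convert cm to m
ground_m <- map_cm * scale_denominator / 100
ground_m
# 1200

# 1:24,000 is a larger scale than 1:250,000 because the fraction is larger
is_larger_scale <- (1 / 24000) > (1 / 250000)
is_larger_scale
# TRUE
```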
Distortion makes scale invalid across large areas
Distortion increases with distance from standard line
Five distortions: areas, angles, shapes, distances, and direction
Graphic code for retrieving information
(De-)emphasize (un)important information
Contrast and the role of colors
A good map tells a multitude of little white lies: it suppresses truth to help the user see what needs to be seen…
Zhilin et al. 2008
Filter out irrelevant details
Two elements: selection and classification
Reflect interpretations of the relative importance of different features
Dot Maps: quantity represented by amount and concentration of dots
Proportional Symbol Map: Geometric symbols scaled in proportion to a quantity
From High Country News
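A proportional symbol map can be sketched in `ggplot2` with `scale_size_area()`, which scales the area of each point (rather than its radius) with the value and maps zero to a size of zero. The locations and values in `places` below are invented for illustration.

```r
library(ggplot2)

# Hypothetical point locations with a quantity attached
places <- data.frame(
  lon   = c(-116.2, -112.0, -111.9, -104.9),
  lat   = c(43.6, 46.6, 40.8, 39.7),
  value = c(100, 400, 900, 1600)
)

p <- ggplot(places, aes(x = lon, y = lat, size = value)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 12) +  # symbol area proportional to value
  coord_quickmap()                  # quick approximation of a map projection

# Symbol sizes computed by the scale
sizes <- layer_data(p)$size
sizes
```

Because `size` behaves like a diameter, the squared sizes should be proportional to `value`.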
Mapping color to geographies
Common problems
From Healy 2019
Adjusts for differences in area, population, etc.
Common Problems
From Healy 2019
`{ggplot2}` is a system for declaratively creating graphics, based on “The Grammar of Graphics” (Wilkinson, 2005).
You provide the data, tell `ggplot2` how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
Component | Function | Explanation |
---|---|---|
Data | `ggplot(data)` | The raw data that you want to visualise. |
Aesthetics | `aes()` | Aesthetic mappings between variables and visual properties. |
Geometries | `geom_*()` | The geometric shapes representing the data. |
Statistics | `stat_*()` | The statistical transformations applied to the data. |
Scales | `scale_*()` | Maps between the data and the aesthetic dimensions. |
Coordinate System | `coord_*()` | Maps data into the plane of the data rectangle. |
Facets | `facet_*()` | The arrangement of the data into a grid of plots. |
Visual Themes | `theme()` and `theme_*()` | The overall visual defaults of a plot. |
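The components in the table can be combined in a single plot specification. The sketch below uses the `mpg` data set that ships with `ggplot2` as a stand-in (it is not the data used in this session), with one line per component.

```r
library(ggplot2)

p <- ggplot(mpg) +                                          # data
  aes(x = displ, y = hwy, color = drv) +                    # aesthetics
  geom_point(alpha = 0.5) +                                 # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # statistic
  scale_color_brewer(palette = "Dark2") +                   # scale
  coord_cartesian(ylim = c(10, 45)) +                       # coordinate system
  facet_wrap(vars(year)) +                                  # facets
  theme_minimal()                                           # visual theme

# Printing the object draws the plot
p
```

Each `+` adds one layer or setting, so components can be swapped out independently of one another.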
Bike sharing counts in London, UK, powered by TfL Open Data
Variable | Description | Class |
---|---|---|
date | Date encoded as `YYYY-MM-DD` | date |
day_night | `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am) | character |
year | `2015` or `2016` | factor |
month | `1` (January) to `12` (December) | factor |
season | `winter`, `spring`, `summer`, or `autumn` | factor |
count | Sum of reported bikes rented | integer |
is_workday | `TRUE` being Monday to Friday and no bank holiday | logical |
is_weekend | `TRUE` being Saturday or Sunday | logical |
is_holiday | `TRUE` being a bank holiday in the UK | logical |
temp | Average air temperature (°C) | double |
temp_feel | Average feels like temperature (°C) | double |
humidity | Average air humidity (%) | double |
wind_speed | Average wind speed (km/h) | double |
weather_type | Most common weather type | character |
`ggplot2::ggplot()`
Aesthetics = link variables to graphical properties
positions (`x`, `y`)
colors (`color`, `fill`)
shapes (`shape`, `linetype`)
size (`size`)
transparency (`alpha`)
groupings (`group`)
`aes()` outside as component
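A minimal sketch of what it means to write `aes()` as its own component: the mapping can go inside `ggplot()` or be added with `+`. The data frame `bikes_demo` is a hypothetical stand-in for `bikes` with invented values.

```r
library(ggplot2)

# Hypothetical stand-in for the `bikes` data (values are made up)
bikes_demo <- data.frame(
  temp_feel = c(2, 6, 11, 15, 19, 24),
  count     = c(4000, 6500, 12000, 18000, 25000, 31000)
)

# aes() supplied inside ggplot()
p_inside <- ggplot(bikes_demo, aes(x = temp_feel, y = count)) +
  geom_point()

# aes() added as a separate component
p_outside <- ggplot(bikes_demo) +
  aes(x = temp_feel, y = count) +
  geom_point()

# Compare the data that each specification hands to the point layer
all.equal(layer_data(p_inside), layer_data(p_outside))
```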
Geometries = interpret aesthetics as graphical representations
ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )

ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    aes(color = season),
    alpha = .5
  )

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")

ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
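The paired calls above suggest that a `stat_*()` with a `geom` argument and a `geom_*()` with a `stat` argument describe the same layer. A small sketch to check this for the bar chart, using a hypothetical `bikes_demo` data frame in place of `bikes`:

```r
library(ggplot2)

# Hypothetical stand-in for `bikes`: only the season column is needed here
bikes_demo <- data.frame(
  season = factor(
    c("winter", "winter", "spring", "summer", "summer", "summer"),
    levels = c("winter", "spring", "summer")
  )
)

via_stat <- ggplot(bikes_demo, aes(x = season)) +
  stat_count(geom = "bar")

via_geom <- ggplot(bikes_demo, aes(x = season)) +
  geom_bar(stat = "count")

# Counts computed by the statistic, one row per bar
counts_stat <- layer_data(via_stat)$count
counts_geom <- layer_data(via_geom)$count
counts_stat
counts_geom
```

Both routes run the same counting statistic and draw the result with the same bar geometry.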
Facets = split variables to multiple panels
Facets are also known as small multiples, trellis graphs, or lattice plots.
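A short sketch of the two faceting functions, `facet_wrap()` and `facet_grid()`. The data frame `bikes_demo` is hypothetical: it borrows column names from `bikes`, but the values are invented.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes`: one row per season and time of day
bikes_demo <- data.frame(
  temp_feel = c(2, 1, 11, 8, 22, 17, 12, 9),
  count     = c(9000, 3000, 18000, 7000, 30000, 12000, 17000, 6000),
  season    = rep(c("winter", "spring", "summer", "autumn"), each = 2),
  day_night = rep(c("day", "night"), times = 4)
)

base <- ggplot(bikes_demo, aes(x = temp_feel, y = count)) +
  geom_point()

# One panel per level of a single variable, wrapped into rows
p_wrap <- base + facet_wrap(vars(season))

# A grid of panels spanned by two variables
p_grid <- base + facet_grid(rows = vars(day_night), cols = vars(season))

# Number of panels that receive data
n_wrap <- length(unique(layer_data(p_wrap)$PANEL))
n_grid <- length(unique(layer_data(p_grid)$PANEL))
```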
Scales = translate between variable ranges and property ranges
The `scale_*()` components control the properties of all the aesthetic dimensions mapped to the data.
Consequently, there are `scale_*()` functions for all aesthetics such as:
positions via `scale_x_*()` and `scale_y_*()`
colors via `scale_color_*()` and `scale_fill_*()`
sizes via `scale_size_*()` and `scale_radius_*()`
shapes via `scale_shape_*()` and `scale_linetype_*()`
transparency via `scale_alpha_*()`
The extensions (`*`) can be filled by e.g.:
`continuous()`, `discrete()`, `reverse()`, `log10()`, `sqrt()`, `date()` for positions
`continuous()`, `discrete()`, `manual()`, `gradient()`, `gradient2()`, `brewer()` for colors
`continuous()`, `discrete()`, `manual()`, `ordinal()`, `area()`, `date()` for sizes
`continuous()`, `discrete()`, `manual()`, `ordinal()` for shapes
`continuous()`, `discrete()`, `manual()`, `ordinal()`, `date()` for transparency
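To show how the aesthetic prefix and the extension combine into a function name, the sketch below adds one scale each for `x`, `y`, and `color`. The data frame `bikes_demo` and the axis titles are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for `bikes` (values are made up)
bikes_demo <- data.frame(
  temp_feel = c(2, 6, 11, 15, 19, 24),
  count     = c(400, 900, 2500, 6000, 15000, 31000),
  season    = c("winter", "winter", "spring", "spring", "summer", "summer")
)

p <- ggplot(bikes_demo, aes(x = temp_feel, y = count, color = season)) +
  geom_point(size = 3) +
  # position scale for x: continuous, with custom breaks and title
  scale_x_continuous(name = "Feels-like temperature", breaks = seq(0, 25, by = 5)) +
  # position scale for y: log10-transformed
  scale_y_log10(name = "Reported bike shares") +
  # color scale: ColorBrewer palette for the discrete variable
  scale_color_brewer(palette = "Dark2", name = NULL)

# The y values handed to the layer are on the log10 scale
y_scaled <- layer_data(p)$y
```

Swapping `scale_y_log10()` for `scale_y_sqrt()` or `scale_y_continuous()` changes only how the same values are translated to positions.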
Continuous: quantitative or numerical data
Discrete: qualitative or categorical data
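Whether `ggplot2` picks a continuous or a discrete scale depends on the type of the mapped variable. As a rough sketch with invented data (`demo` is hypothetical), the same `month` column is mapped once as a number and once as a factor, and the resulting x scale is inspected with `layer_scales()`.

```r
library(ggplot2)

# Hypothetical data: month stored as a number
demo <- data.frame(
  month = c(1, 2, 3, 1, 2, 3),
  count = c(10, 20, 30, 15, 25, 35)
)

# Numeric variable: treated as continuous
p_cont <- ggplot(demo, aes(x = month, y = count)) +
  geom_point()

# Same variable converted to a factor: treated as discrete
p_disc <- ggplot(demo, aes(x = factor(month), y = count)) +
  geom_point()

# Inspect the class of the x scale chosen in each case
class(layer_scales(p_cont)$x)[1]
class(layer_scales(p_disc)$x)[1]
```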
Coordinate systems = interpret the position aesthetics
`coord_cartesian()`
`coord_fixed()`
`coord_flip()`
`coord_polar()`
`coord_map()` and `coord_sf()`
`coord_trans()`
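One practical consequence of the coordinate system interpreting positions, rather than changing the data, is the difference between zooming with `coord_cartesian()` and setting limits on a scale. The sketch below uses a small invented data frame (`demo`) to compare the two.

```r
library(ggplot2)

# Hypothetical data: ten points on a curve
demo <- data.frame(x = 1:10, y = (1:10)^2)

base <- ggplot(demo, aes(x = x, y = y)) +
  geom_point()

# Zoom via the coordinate system: all data are kept, only the view changes
zoomed <- base + coord_cartesian(xlim = c(1, 5))

# Limits via the scale: values outside the limits are replaced by NA
clipped <- base + scale_x_continuous(limits = c(1, 5))

# Count the x values that survive in each version
kept_zoomed  <- sum(!is.na(layer_data(zoomed)$x))
kept_clipped <- sum(!is.na(layer_data(clipped)$x))
```

The distinction matters most when a statistic such as a smoother is fitted, because scale limits remove data before the statistic is computed while `coord_cartesian()` does not.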