Data Structures

Published

September 1, 2023

Data Types and Structures

Data Types

Okay, now that we have all of those details out of the way, let’s take a look at data structures in R. As we discussed, R has six basic types of data: numeric, integer, logical, complex, character, and raw. For this class, we won’t bother with complex or raw, as you are unlikely to encounter them in your introductory spatial explorations.

Numeric data are numbers that can contain a decimal point. They can also be whole numbers; R stores them as doubles either way.
Integers are whole numbers (those numbers without a decimal point).
Logical data take on the value of either TRUE or FALSE. There’s also a special logical value, NA, that represents missing values.
Character data represent string values. You can think of character strings as something like a word (or multiple words). A related type is the factor: it looks like a character string when printed, but R stores it as integer codes with additional attributes (like levels or an order). Factors become important in the analyses and visualizations we’ll attempt later in the course.

There are a variety of ways to learn more about the structure of different data types:

class() - returns the type of object (high level)
typeof() - returns the type of object (low level)
length() - returns the number of elements in an object
attributes() - returns any metadata attached to the object

Code

num <- 2.2
class(num)

[1] "numeric"

Code

typeof(num)

[1] "double"

Code

y <- 1:10 
y

 [1]  1  2  3  4  5  6  7  8  9 10

Code

class(y)

[1] "integer"

Code

typeof(y)

[1] "integer"

Code

length(y)

[1] 10

Code

b <- "3"
class(b)

[1] "character"

Code

is.numeric(b)

[1] FALSE

Code

c <- as.numeric(b)
class(c)

[1] "numeric"
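The same inspection functions also show how a factor is stored, which the examples above do not cover. The sketch below uses a made-up vector (`sizes` is a hypothetical name, not part of the course data) to show that a factor is an integer vector with `levels` and `class` attributes attached. It also shows the difference between a whole number stored as a double and a true integer.

```r
# Hypothetical example values, chosen only for illustration
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))

class(sizes)       # high-level type: "factor"
typeof(sizes)      # low-level storage: "integer"
levels(sizes)      # the allowed categories, in the order we set
as.integer(sizes)  # the integer codes that point into levels()
attributes(sizes)  # the metadata (levels and class) that make it a factor

# Whole numbers are stored as doubles unless you add the L suffix
typeof(2)
typeof(2L)
```

Setting `levels` explicitly controls the order of the categories; without it, R sorts the levels alphabetically.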

Data Structures

You can store information in a variety of ways in R. The types we are most likely to encounter this semester are:

Vectors: a collection of elements that are typically character, logical, integer, or numeric.

Code

#sometimes we'll need to make sequences of numbers to facilitate joins
series <- 1:10
series.2 <- seq(10)
series.3 <- seq(from = 1, to = 10, by = 0.1)
series

 [1]  1  2  3  4  5  6  7  8  9 10

Code

series.2

 [1]  1  2  3  4  5  6  7  8  9 10

Code

series.3

 [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
[16]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
[31]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4
[46]  5.5  5.6  5.7  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
[61]  7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4
[76]  8.5  8.6  8.7  8.8  8.9  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9
[91] 10.0

Code

c(series.2, series.3)

  [1]  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0  1.0  1.1  1.2  1.3  1.4
 [16]  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9
 [31]  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.0  4.1  4.2  4.3  4.4
 [46]  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9
 [61]  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9  7.0  7.1  7.2  7.3  7.4
 [76]  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9
 [91]  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9 10.0

Code

class(series.3)

[1] "numeric"

Code

typeof(series.3)

[1] "double"

Code

length(series.3)

[1] 91
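Because a vector can only hold one type of data, R silently converts mixed inputs to the most flexible type present. That is why combining the integer series.2 with the decimal series.3 printed every value with a decimal. A minimal sketch with made-up values (the object names here are hypothetical) that also shows how to pull elements out of a vector:

```r
# Combining types coerces everything to the most flexible type present:
# logical -> integer -> double -> character
mixed_num <- c(1L, 2.5, TRUE)
typeof(mixed_num)   # TRUE is converted to 1

mixed_chr <- c(1, "a", TRUE)
typeof(mixed_chr)   # everything becomes text
mixed_chr

# Elements are extracted by position (counting from 1) or by a logical test
v <- seq(from = 2, to = 20, by = 2)
v[3]         # the third element
v[c(1, 5)]   # the first and fifth elements
v[v > 15]    # every element greater than 15
```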

Missing Data: R supports missing data in most of the data structures we use, but missing values can lead to some surprising behavior. Here are a few ways to find missing data:

Code

x <- c("a", NA, "c", "d", NA)
is.na(x)

[1] FALSE  TRUE FALSE FALSE  TRUE

Code

anyNA(x)

[1] TRUE
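The surprising behavior mostly comes from NA propagating through calculations: arithmetic or comparisons that involve a missing value generally return NA unless you explicitly ask R to drop it. A short sketch with made-up numbers (`vals` is a hypothetical name):

```r
vals <- c(4, NA, 10, 6)   # hypothetical measurements with one missing value

mean(vals)                # NA, because one input is unknown
mean(vals, na.rm = TRUE)  # drops the NA first, then averages 4, 10, and 6
NA == NA                  # NA, not TRUE; use is.na() to test for missingness

sum(is.na(vals))          # how many values are missing
which(is.na(vals))        # the positions of the missing values
vals[!is.na(vals)]        # keep only the observed values

# Converting text that is not a number also produces NA (normally with a warning)
suppressWarnings(as.numeric(c("3", "three")))
```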

Matrices: matrices are an extension of numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions: the number of rows and columns. As with atomic vectors, the elements of a matrix must all be of the same data type. Matrices are the foundation of rasters, which we’ll be discussing frequently throughout the course.

Code

#matrices are filled columnwise in R
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

[1] 2 3

Code

x <- 1:3
y <- 10:12

a <- cbind(x, y)
dim(a)

[1] 3 2

Code

a[3,1]

x 
3

Code

b <- rbind(x, y)
dim(b)

[1] 2 3

Code

b[1,3]

x 
3
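Two points from the description of matrices are easy to check directly: the fill order, and the idea that a matrix is only a vector with a dimension attribute. The sketch below uses hypothetical object names and is meant as an illustration rather than a recipe.

```r
# Default: values fill down each column first
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_col

# byrow = TRUE fills across each row instead
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row

# A matrix is a vector with a dim attribute: adding dim to a plain
# vector gives the same values in the same layout as m_col
v <- 1:6
dim(v) <- c(2, 3)
all(v == m_col)
attributes(m_col)

# Mixing types coerces the whole matrix, here to character
cbind(1:2, c("a", "b"))
```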

Lists: lists essentially act like containers in R. They can hold a variety of different data types and structures, including more lists. We use lists a lot for functional programming in R, where we apply a function to each element in a list. We’ll see this when extracting values from multiple rasters. We can extract elements of lists using [] (which returns a smaller list) and [[]] (which returns the element itself).

Code

x <- list(1, "a", TRUE, 1+4i)
x

[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

Code

#adding names
xlist <- list(a = "Waldo", b = 1:10, data = head(mtcars))
xlist

$a
[1] "Waldo"

$b
 [1]  1  2  3  4  5  6  7  8  9 10

$data
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Code

xlist[[1]]

[1] "Waldo"

Code

xlist[[3]]

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Code

xlist[[3]][1]

                   mpg
Mazda RX4         21.0
Mazda RX4 Wag     21.0
Datsun 710        22.8
Hornet 4 Drive    21.4
Hornet Sportabout 18.7
Valiant           18.1

Code

xlist[[3]][1,2]

[1] 6

Code

xlist[3][1]

$data
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
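The difference between [] and [[]] is easiest to see by checking the class of what each one returns. The sketch below uses a small made-up list (`site` and its contents are hypothetical) and also shows the "apply a function to each element" pattern with lapply() and sapply().

```r
# Hypothetical list, used only for illustration
site <- list(name = "Plot A", elev = c(1200, 1350, 1425), surveyed = TRUE)

class(site["elev"])     # "list": single brackets return a smaller list
class(site[["elev"]])   # "numeric": double brackets return the element itself
site$elev               # $ is shorthand for [[ ]] with a name
site[["elev"]][2]       # index into the extracted vector

# Apply a function to every element of a list
lapply(site, class)     # always returns a list
sapply(site, length)    # simplifies the result to a named vector when it can
```

This is why xlist[3][1] in the example above still prints the $data label: single brackets keep returning a list, no matter how many times they are chained.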

Data Frames: data frames resemble the tabular datasets you might be used to in spreadsheet programs and are probably one of the most common types of data in R. A data frame is a special type of list where every element has the same length (but elements can hold different types of data). We’ll be reading in a number of data frames for this first assignment.

Code

dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat

   id  x  y
1   a  1 11
2   b  2 12
3   c  3 13
4   d  4 14
5   e  5 15
6   f  6 16
7   g  7 17
8   h  8 18
9   i  9 19
10  j 10 20

Code

is.list(dat)

[1] TRUE

Code

class(dat)

[1] "data.frame"

Code

#lots of ways to look at data in data frames
str(dat) #compact summary of the structure of a dataframe

'data.frame':   10 obs. of  3 variables:
 $ id: chr  "a" "b" "c" "d" ...
 $ x : int  1 2 3 4 5 6 7 8 9 10
 $ y : int  11 12 13 14 15 16 17 18 19 20

Code

head(dat) #gives the first 6 rows; tail() gives the last 6

  id x  y
1  a 1 11
2  b 2 12
3  c 3 13
4  d 4 14
5  e 5 15
6  f 6 16

Code

dim(dat)

[1] 10  3

Code

colnames(dat)

[1] "id" "x"  "y"

Code

## accessing elements of a dataframe
dat[1,3]

[1] 11

Code

dat[["y"]]

 [1] 11 12 13 14 15 16 17 18 19 20

Code

dat$y

 [1] 11 12 13 14 15 16 17 18 19 20
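Since a data frame is a list of equal-length columns, list tools and row/column indexing both work on it. The sketch below uses a small made-up table (`plots` and its columns are hypothetical) to show column-wise inspection, subsetting rows with a logical test, and adding a column.

```r
# Hypothetical data, used only for illustration
plots <- data.frame(id = c("a", "b", "c", "d"),
                    area = c(2.5, 4.0, 1.2, 3.3),
                    burned = c(TRUE, FALSE, FALSE, TRUE))

length(plots)          # a data frame's length is its number of columns
nrow(plots)            # number of rows
sapply(plots, class)   # each column keeps its own type

# Subset rows with a logical test and columns by name
plots[plots$area > 3, ]
plots[plots$burned, c("id", "area")]

# Add a column; it needs one value per row
plots$area_doubled <- plots$area * 2
str(plots)
```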

Tibbles: tibbles are similar to data frames, but make it easier to store lists within columns. They are designed for use with the tidyverse (which we’ll explore more in future classes), but the primary reason for introducing them here is that they are the foundation of sf objects, which we’ll use frequently in the weeks to come.

Code

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.4.0
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

dat.tib <- as_tibble(dat) #as_tibble() converts an existing data frame
is.list(dat.tib)

[1] TRUE

Code

class(dat.tib)

[1] "tbl_df"     "tbl"        "data.frame"

Code

#the same functions work for looking at data in tibbles
str(dat.tib) #compact summary of the structure of a tibble

tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
 $ id: chr [1:10] "a" "b" "c" "d" ...
 $ x : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ y : int [1:10] 11 12 13 14 15 16 17 18 19 20

Code

head(dat.tib) #gives the first 6 rows; tail() gives the last 6

# A tibble: 6 × 3
  id        x     y
  <chr> <int> <int>
1 a         1    11
2 b         2    12
3 c         3    13
4 d         4    14
5 e         5    15
6 f         6    16

Code

dim(dat.tib)

[1] 10  3

Code

colnames(dat.tib)

[1] "id" "x"  "y"

Code

## accessing elements of a tibble ([ , ] returns a tibble rather than a bare value)
dat.tib[1,3]

# A tibble: 1 × 1
      y
  <int>
1    11

Code

dat.tib[["y"]]

 [1] 11 12 13 14 15 16 17 18 19 20

Code

dat.tib$y

 [1] 11 12 13 14 15 16 17 18 19 20
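Two practical differences between tibbles and data frames are worth seeing side by side: what [ , ] returns when you select a single column, and how a list-column is built. The sketch below assumes the tibble package is installed; the object names (`df`, `tb`, `nested`) and values are made up for illustration.

```r
library(tibble)

# Hypothetical data, used only for illustration
df <- data.frame(id = c("a", "b"), x = 1:2)
tb <- as_tibble(df)

# Selecting one column with [ , ]: a data frame drops to a vector,
# while a tibble stays a tibble
class(df[, "x"])
class(tb[, "x"])

# A list-column can hold an object of a different length (or type) in each row
nested <- tibble(id = c("a", "b"),
                 readings = list(1:3, c(2.5, 7.1)))
nested
nested$readings[[1]]   # use [[ ]] to pull out one row's contents
```

List-columns matter for spatial work because sf objects store each feature's geometry in a column of this kind.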

Many of the packages used for spatial operations in R rely on special objects (e.g., sf, SpatRasters) that are combinations of these various elemental data types. That is why we are taking a little time to understand them before jumping into spatial data.