Data Types
Okay, now that we have all of those details out of the way, let’s take a look at data types and structures in R. As we discussed, R has six basic types of data: numeric, integer, logical, complex, character, and raw. For this class, we won’t bother with complex or raw, as you are unlikely to encounter them in your introductory spatial explorations.
Numeric data are numbers that can contain a decimal (e.g., 2.2). Whole numbers such as 2 are also stored as numeric (type "double") by default.
Integers are whole numbers (numbers without a decimal point). In R you get an integer when you add an L suffix (e.g., 5L).
Logical data take on the value of either TRUE or FALSE. There is also a special logical value, NA, used to represent missing values.
Character data represent string values. You can think of a character string as a word (or multiple words). A closely related type is the factor, which looks like character data but is stored as integer codes with additional attributes (its levels and, optionally, an order). Factors become important in the analyses and visualizations we’ll attempt later in the course.
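A quick way to see these types in practice is to ask R directly with typeof() and class(). The sketch below uses only base R; the object name sizes and its values are illustrative, not from the course data.

```r
# typeof() reports how R stores a value internally
typeof(2.2)   # "double": numeric with a decimal part
typeof(2)     # "double": whole numbers are numeric by default
typeof(2L)    # "integer": the L suffix requests an integer
typeof(TRUE)  # "logical"
typeof(NA)    # "logical": a bare NA is a logical missing value
typeof("a")   # "character"

# A factor looks like text but is stored as integer codes plus a levels attribute
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
class(sizes)       # "factor"
typeof(sizes)      # "integer"
levels(sizes)      # "small" "large"
as.integer(sizes)  # 1 2 1
```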
There are a variety of ways to learn more about the structure of different data types:
class() - returns the type of object (high level)
typeof() - returns the type of object (low level, i.e., how R stores it)
length() - returns the number of elements in an object
attributes() - returns any metadata attached to the object (e.g., names or dimensions), or NULL if there is none
Code
num <- 2.2
class(num)
[1] "numeric"
Code
typeof(num)
[1] "double"
Code
y <- 1:10
y
[1] 1 2 3 4 5 6 7 8 9 10
Code
class(y)
[1] "integer"
Code
typeof(y)
[1] "integer"
Code
length(y)
[1] 10
Code
b <- "3"
class(b)
[1] "character"
Code
is.numeric(b)
[1] FALSE
Code
c <- as.numeric(b)
class(c)
[1] "numeric"
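attributes() appears in the list above but is not demonstrated. A minimal base-R sketch: plain vectors carry no attributes, while names and factor levels are stored as attributes. It also shows what as.numeric() does when a string cannot be read as a number. The object name ages is hypothetical.

```r
# A plain vector carries no metadata, so attributes() returns NULL
attributes(1:3)            # NULL

# Names are stored as an attribute
ages <- c(ann = 31, bo = 27)
attributes(ages)           # $names: "ann" "bo"

# A factor stores its levels and class as attributes
attributes(factor(c("a", "b", "a")))  # $levels "a" "b"; $class "factor"

# as.numeric() converts text that looks like a number...
as.numeric("3")            # 3
# ...but returns NA (with a warning) when it cannot
suppressWarnings(as.numeric("three"))  # NA
```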
Data Structures
You can store information in a variety of ways in R. The types we are most likely to encounter this semester are:
Vectors: an ordered collection of elements that must all be of the same type, typically character, logical, integer, or numeric.
Code
# sometimes we'll need to make sequences of numbers to facilitate joins
series <- 1:10
series.2 <- seq(10)
series.3 <- seq(from = 1, to = 10, by = 0.1)
series
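Because every element of a vector must share one type, combining different types with c() silently coerces them to the most flexible type (logical, then integer, then double, then character). A small base-R sketch; the variable names are illustrative.

```r
# Mixing integer and double gives double
mixed_num <- c(1L, 2.5)
typeof(mixed_num)   # "double"

# Logical values become 1/0 when combined with numbers
mixed_lgl <- c(TRUE, FALSE, 3L)
mixed_lgl           # 1 0 3
typeof(mixed_lgl)   # "integer"

# Anything combined with a string becomes character
mixed_chr <- c(1, TRUE, "a")
mixed_chr           # "1" "TRUE" "a"
typeof(mixed_chr)   # "character"
```

This coercion is silent, so checking class() or typeof() after reading in data is a cheap way to catch a numeric column that was accidentally read as text.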
Missing Data: R supports missing data (NA) in most of the data structures we use, but missing values can lead to surprising behavior; for example, most arithmetic and summary functions return NA if any input is missing. Here are a few ways to find missing data:
Code
x <- c("a", NA, "c", "d", NA)
is.na(x)
[1] FALSE TRUE FALSE FALSE TRUE
Code
anyNA(x)
[1] TRUE
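The "surprising behavior" usually shows up in calculations. A minimal base-R sketch (the vector vals is made up for illustration) of how NA propagates, how na.rm = TRUE avoids it, and why is.na() is needed instead of == NA:

```r
vals <- c(4, NA, 10)

# Arithmetic and summaries propagate NA by default
sum(vals)                 # NA
mean(vals)                # NA

# na.rm = TRUE drops missing values before summarizing
sum(vals, na.rm = TRUE)   # 14
mean(vals, na.rm = TRUE)  # 7

# Comparing with == NA does not work; it just returns NA
vals == NA                # NA NA NA

# Use is.na() instead, e.g. to locate or drop missing values
which(is.na(vals))        # 2
vals[!is.na(vals)]        # 4 10
```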
Matrices: an extension of numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions (the number of rows and columns). As with atomic vectors, the elements of a matrix must all be of the same data type. Matrices are the foundation of rasters, which we’ll be discussing frequently throughout the course.
Code
# matrices are filled column-wise in R
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)
[1] 2 3
Code
x <- 1:3
y <- 10:12
a <- cbind(x, y)
dim(a)
[1] 3 2
Code
a[3,1]
x
3
Code
b <- rbind(x, y)
dim(b)
[1] 2 3
Code
b[1,3]
x
3
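To make the "vector with dimensions" idea concrete, here is a small base-R sketch (object names are illustrative) showing column-wise versus row-wise filling, that adding a dim attribute to a plain vector produces the same matrix, and that mixing types coerces the whole matrix:

```r
# Filled column by column (the default)...
m_col <- matrix(1:6, nrow = 2)
m_col[1, ]   # 1 3 5

# ...or row by row with byrow = TRUE
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row[1, ]   # 1 2 3

# A matrix is just a vector with a dim attribute
v <- 1:6
dim(v) <- c(2, 3)
identical(v, m_col)  # TRUE

# All elements share one type: a character column turns everything into text
mixed <- cbind(1:2, c("a", "b"))
typeof(mixed)        # "character"
```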
Lists: lists act like containers in R; they can hold a variety of different data types and structures, including other lists. We use lists a lot for functional programming in R, where we apply a function to each element in a list; we’ll see this when extracting values from multiple rasters. We can extract elements of lists using [] and []]: single brackets return a smaller list, while double brackets return the element itself.
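A short base-R sketch of the difference between [] and [[]], plus lapply() as one example of applying a function to every element. The list site and its contents are hypothetical, chosen only to mix types.

```r
# A list can mix types and even hold other lists
site <- list(name = "Plot A", elev = 1200, counts = c(3L, 5L, 2L),
             meta = list(observer = "JS", year = 2023))

# Single brackets return a (smaller) list
class(site["counts"])     # "list"

# Double brackets (or $) return the element itself
class(site[["counts"]])   # "integer"
site$counts[2]            # 5

# Nested lists are indexed step by step
site[["meta"]][["year"]]  # 2023

# lapply() applies a function to every element and returns a list
lapply(list(a = 1:3, b = 4:6), sum)  # $a 6, $b 15
```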
Data Frames: data frames resemble the tabular datasets you might be used to in spreadsheet programs and are probably one of the most common types of data in R. A data frame is a special type of list where every element (column) has the same length, although different columns can hold different types of data. We’ll be reading in a number of data frames for this first assignment.
Code
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
id x y
1 a 1 11
2 b 2 12
3 c 3 13
4 d 4 14
5 e 5 15
6 f 6 16
7 g 7 17
8 h 8 18
9 i 9 19
10 j 10 20
Code
is.list(dat)
[1] TRUE
Code
class(dat)
[1] "data.frame"
Code
# lots of ways to look at data in data frames
str(dat) # compact summary of the structure of a data frame
'data.frame': 10 obs. of 3 variables:
$ id: chr "a" "b" "c" "d" ...
$ x : int 1 2 3 4 5 6 7 8 9 10
$ y : int 11 12 13 14 15 16 17 18 19 20
Code
head(dat) # gives the first 6 rows; tail() gives the last 6
id x y
1 a 1 11
2 b 2 12
3 c 3 13
4 d 4 14
5 e 5 15
6 f 6 16
Code
dim(dat)
[1] 10 3
Code
colnames(dat)
[1] "id" "x" "y"
Code
## accessing elements of a data frame
dat[1, 3] # row 1, column 3
[1] 11
Code
dat[["y"]]
[1] 11 12 13 14 15 16 17 18 19 20
Code
dat$y
[1] 11 12 13 14 15 16 17 18 19 20
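A few more everyday data frame operations in base R, sketched on a fresh copy named dat2 (a hypothetical name) so the dat used in the tibble example below is left unchanged:

```r
# A fresh copy so the original `dat` above is left unchanged
dat2 <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)

# Add a new column computed from existing ones
dat2$total <- dat2$x + dat2$y

# Keep rows that meet a condition (the empty spot after the comma means "all columns")
big <- dat2[dat2$x > 7, ]
nrow(big)            # 3

# Select several columns by name
dat2[, c("id", "total")]

# Selecting a single column with [ , ] drops to a plain vector by default
class(dat2[, "y"])   # "integer"

# Every column must have the same number of rows, so this fails
try(data.frame(a = 1:3, b = 1:2))
```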
Tibbles: tibbles are similar to data frames, but make it easy to store lists within columns (list-columns). They are designed for use with the tidyverse (which we’ll explore more in future classes), but the primary reason for introducing them here is that the sf objects we’ll use frequently in the weeks to come build on the same idea: an sf object is a data frame (often a tibble) with a list-column that stores the geometries.
Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.4.0
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
dat.tib <- tibble(dat)
is.list(dat.tib)
[1] TRUE
Code
class(dat.tib)
[1] "tbl_df" "tbl" "data.frame"
Code
# lots of ways to look at data in data frames
str(dat.tib) # compact summary of the structure of a data frame
tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
$ id: chr [1:10] "a" "b" "c" "d" ...
$ x : int [1:10] 1 2 3 4 5 6 7 8 9 10
$ y : int [1:10] 11 12 13 14 15 16 17 18 19 20
Code
head(dat.tib) # gives the first 6 rows; tail() gives the last 6
# A tibble: 6 × 3
id x y
<chr> <int> <int>
1 a 1 11
2 b 2 12
3 c 3 13
4 d 4 14
5 e 5 15
6 f 6 16
Code
dim(dat.tib)
[1] 10 3
Code
colnames(dat.tib)
[1] "id" "x" "y"
Code
## accessing elements of a tibble
dat.tib[1, 3] # unlike a data frame, a tibble returns a 1 x 1 tibble here, not a single value
# A tibble: 1 × 1
y
<int>
1 11
Code
dat.tib[["y"]]
[1] 11 12 13 14 15 16 17 18 19 20
Code
dat.tib$y
[1] 11 12 13 14 15 16 17 18 19 20
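To see what "lists within columns" means, here is a minimal sketch using the tibble package (loaded here directly so the block stands alone; the tidyverse loads it too). The site and readings values are made up. sf stores geometries in a list-column in much the same way, though that is not shown here.

```r
library(tibble)

# A list-column: each cell holds a whole vector
tib <- tibble(
  site = c("A", "B"),
  readings = list(c(1.2, 3.4, 2.2), c(0.5, 0.9))
)
tib$readings[[1]]            # 1.2 3.4 2.2
lengths(tib$readings)        # 3 2

# Unlike a data.frame, a tibble does not drop to a vector with [ , ]
df <- data.frame(x = 1:3)
class(df[, "x"])             # "integer"
class(as_tibble(df)[, "x"])  # "tbl_df" "tbl" "data.frame"
```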
Many of the packages used for spatial operations in R rely on special objects (e.g., sf, SpatRasters) that are combinations of these various elemental data types. That is why we are taking a little time to understand them before jumping into spatial data.
Source Code
---
title: "Data Structures"
date: "2023-9-1"
---

## Data Types and Structures

### Data Types

Okay, now that we have all of those details out of the way, let's take a look at data types and structures in `R`. As we discussed, `R` has six basic types of data: numeric, integer, logical, complex, character, and raw. For this class, we won't bother with complex or raw, as you are unlikely to encounter them in your introductory spatial explorations.

* __Numeric__ data are numbers that can contain a decimal (e.g., `2.2`). Whole numbers such as `2` are also stored as numeric (type `"double"`) by default.
* __Integers__ are whole numbers (numbers without a decimal point). In `R` you get an integer when you add an `L` suffix (e.g., `5L`).
* __Logical__ data take on the value of either `TRUE` or `FALSE`. There is also a special logical value, `NA`, used to represent missing values.
* __Character data__ represent string values. You can think of a character string as a word (or multiple words). A closely related type is the factor, which looks like character data but is stored as integer codes with additional attributes (its levels and, optionally, an order). Factors become important in the analyses and visualizations we'll attempt later in the course.

There are a variety of ways to learn more about the structure of different data types:

* `class()` - returns the type of object (high level)
* `typeof()` - returns the type of object (low level, i.e., how `R` stores it)
* `length()` - returns the number of elements in an object
* `attributes()` - returns any metadata attached to the object (e.g., names or dimensions), or `NULL` if there is none

```{r datastructure}
num <- 2.2
class(num)
typeof(num)
y <- 1:10
y
class(y)
typeof(y)
length(y)
b <- "3"
class(b)
is.numeric(b)
c <- as.numeric(b)
class(c)
```

### Data Structures

You can store information in a variety of ways in `R`. The types we are most likely to encounter this semester are:

* __Vectors__: an ordered collection of elements that must all be of the same type, typically `character`, `logical`, `integer`, or `numeric`.

```{r makevects}
# sometimes we'll need to make sequences of numbers to facilitate joins
series <- 1:10
series.2 <- seq(10)
series.3 <- seq(from = 1, to = 10, by = 0.1)
series
series.2
series.3
c(series.2, series.3)
class(series.3)
typeof(series.3)
length(series.3)
```

* Missing Data: R supports missing data (`NA`) in most of the data structures we use, but missing values can lead to surprising behavior; for example, most arithmetic and summary functions return `NA` if any input is missing. Here are a few ways to find missing data:

```{r missingdata}
x <- c("a", NA, "c", "d", NA)
is.na(x)
anyNA(x)
```

* __Matrices__: an extension of numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions (the number of rows and columns). As with atomic vectors, the _elements of a matrix must all be of the same data type_. Matrices are the foundation of rasters, which we'll be discussing frequently throughout the course.

```{r matrices}
# matrices are filled column-wise in R
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)
x <- 1:3
y <- 10:12
a <- cbind(x, y)
dim(a)
a[3,1]
b <- rbind(x, y)
dim(b)
b[1,3]
```

* __Lists__: lists act like containers in `R`; they can hold a variety of different data types and structures, including other lists. We use lists a lot for functional programming in `R`, where we apply a function to each element in a list; we'll see this when extracting values from multiple rasters. We can extract elements of lists using `[]` and `[[]]`: single brackets return a smaller list, while double brackets return the element itself.

```{r listex}
x <- list(1, "a", TRUE, 1+4i)
x
# adding names
xlist <- list(a = "Waldo", b = 1:10, data = head(mtcars))
xlist
xlist[[1]]
xlist[[3]]
xlist[[3]][1]
xlist[[3]][1,2]
xlist[3][1]
```

* __Data Frames__: data frames resemble the tabular datasets you might be used to in spreadsheet programs and are probably one of the most common types of data in `R`. A data frame is a special type of list where every element (column) has the same length, although different columns can hold different types of data. We'll be reading in a number of data frames for this first assignment.

```{r datframeintro}
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
is.list(dat)
class(dat)
# lots of ways to look at data in data frames
str(dat) # compact summary of the structure of a data frame
head(dat) # gives the first 6 rows; tail() gives the last 6
dim(dat)
colnames(dat)
## accessing elements of a data frame
dat[1, 3] # row 1, column 3
dat[["y"]]
dat$y
```

* __Tibbles__: tibbles are similar to data frames, but make it easy to store lists _within_ columns (list-columns). They are designed for use with the `tidyverse` (which we'll explore more in future classes), but the primary reason for introducing them here is that the `sf` objects we'll use frequently in the weeks to come build on the same idea: an `sf` object is a data frame (often a tibble) with a list-column that stores the geometries.

```{r tibble}
library(tidyverse)
dat.tib <- tibble(dat)
is.list(dat.tib)
class(dat.tib)
# lots of ways to look at data in data frames
str(dat.tib) # compact summary of the structure of a data frame
head(dat.tib) # gives the first 6 rows; tail() gives the last 6
dim(dat.tib)
colnames(dat.tib)
## accessing elements of a tibble
dat.tib[1, 3] # unlike a data frame, a tibble returns a 1 x 1 tibble here, not a single value
dat.tib[["y"]]
dat.tib$y
```

Many of the packages used for spatial operations in `R` rely on special objects (e.g., `sf`, `SpatRasters`) that are combinations of these various elemental data types. That is why we are taking a little time to understand them before jumping into spatial data.