Statistical Modelling I

HES 505 Fall 2024: Session 21

Carolyn Koehn

Objectives

By the end of today you should be able to:

  • Describe and implement overlay analyses

  • Extend overlay analysis to statistical modeling

  • Generate spatial predictions from statistical models

Overlay Analyses

Overlays

  • Methods for selecting optimal sites or mapping suitability

  • Apply a common scale to diverse or dissimilar inputs

Getting Started

  1. Define the problem.

  2. Break the problem into submodels.

  3. Determine significant layers.

  4. Reclassify or transform the data within a layer.

  5. Add or combine the layers.

  6. Verify the result.

Boolean Overlays

  • Successive disqualification of areas

  • Series of “yes/no” questions

  • “Sieve” mapping

Boolean Overlays

  • Reclassifying

  • Which types of land are appropriate?

library(terra)
# Land cover raster shipped with the spDataLarge package
nlcd <- rast(system.file("raster/nlcd.tif", package = "spDataLarge"))
plot(nlcd)

Boolean Overlays

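  • Which types of land are appropriate?

# Split the categorical raster into one binary (0/1) layer per class
nlcd.segments <- segregate(nlcd)
# Label the layers: drop the first row of the category table, keep the label column
names(nlcd.segments) <- levels(nlcd)[[1]][-1,2]
plot(nlcd.segments)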

Boolean Overlays

  • Which types of land are appropriate?

# Elevation raster; terrain() returns slope in degrees by default
srtm <- rast(system.file("raster/srtm.tif", package = "spDataLarge"))
slope <- terrain(srtm, v = "slope")

Boolean Overlays

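  • Make sure the data are aligned!

suit.slope <- slope < 10
suit.landcov <- nlcd.segments["Shrubland"]
# Match the CRS and grid of the land cover layer; nearest neighbour keeps values at 0/1
suit.slope.match <- project(suit.slope, suit.landcov, method = "near")
# Sum the layers: 2 = both criteria met, 1 = one met, 0 = neither
suit <- suit.slope.match + suit.landcov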
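
One way to check alignment before combining layers is sketched below. The rasters `a` and `b` are small made-up grids, not the course data, and because they already share a coordinate reference system the sketch uses `resample()` rather than `project()`.

```r
library(terra)

# Two hypothetical rasters covering the same extent at different resolutions
a <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          vals = 1)
b <- rast(nrows = 5, ncols = 5, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          vals = rep(c(0, 1), length.out = 25))

# compareGeom() returns FALSE here because the row and column counts differ
aligned.before <- compareGeom(a, b, stopOnError = FALSE)

# Put b on the grid of a; nearest neighbour keeps the values binary
b.match <- resample(b, a, method = "near")
aligned.after <- compareGeom(a, b.match, stopOnError = FALSE)

# Cell-wise arithmetic is only meaningful once the grids match
overlay <- a * b.match
```

If the layers had different coordinate reference systems, `project()` with a template raster, as in the slide above, would handle both the reprojection and the grid matching.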

Challenges with Boolean Overlays

  1. Assume relationships are truly Boolean

  2. Assume no measurement error

  3. Assume categorical measurements are known exactly

  4. Assume boundaries are well represented

A more general approach

  • Define a favorability metric

\[ \begin{equation} F(\mathbf{s}) = \prod_{i=1}^{m}X_i(\mathbf{s}) \end{equation} \]

  • Treat \(F(\mathbf{s})\) as binary
  • Then \(F(\mathbf{s}) = 1\) if all inputs (\(X_i(\mathbf{s})\)) are suitable
  • Then \(F(\mathbf{s}) = 0\) if any input is not
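
A minimal sketch of the product form, assuming two made-up 3 x 3 binary layers (`x1` and `x2` are illustrative, not course data):

```r
library(terra)

# Two hypothetical binary suitability layers (1 = suitable, 0 = not)
x1 <- rast(nrows = 3, ncols = 3, vals = c(1, 1, 0, 1, 0, 0, 1, 1, 1))
x2 <- rast(nrows = 3, ncols = 3, vals = c(1, 0, 0, 1, 1, 0, 0, 1, 1))

# Favorability is the cell-wise product, so a single 0 disqualifies a cell
fav <- x1 * x2
fav.values <- as.vector(values(fav))
```

Only the cells where both layers equal 1 keep a favorability of 1. Multiplying in further binary layers extends the same idea to any number of inputs.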

Estimating favorability

\[ \begin{equation} F(\mathbf{s}) = f(w_1X_1(\mathbf{s}), w_2X_2(\mathbf{s}), w_3X_3(\mathbf{s}), ..., w_mX_m(\mathbf{s})) \end{equation} \]

  • \(F(\mathbf{s})\) does not have to be binary (could be ordinal or continuous)

  • \(X_i(\mathbf{s})\) could also be extended beyond simply ‘suitable/not suitable’

  • Adding weights allows incorporation of relative importance

  • Functions other than the product can be used to combine the inputs (\(X_i(\mathbf{s})\))

Weighted Linear Combinations

\[ \begin{equation} F(\mathbf{s}) = \frac{\sum_{i=1}^{m}w_iX_i(\mathbf{s})}{\sum_{i=1}^{m}w_i} \end{equation} \]

  • \(F(\mathbf{s})\) is now an index based on the values of \(X_i(\mathbf{s})\)

  • \(w_i\) can incorporate weights of evidence, uncertainty, or different participant preferences

  • Dividing by \(\sum_{i=1}^{m}w_i\) normalizes by the sum of weights
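
The weighted linear combination can be sketched as follows. The three input layers and the weights `w` are assumptions chosen for illustration, and the inputs are taken to be already rescaled to a common 0 to 1 suitability scale.

```r
library(terra)

# Hypothetical inputs on a common 0-1 suitability scale
x1 <- rast(nrows = 2, ncols = 2, vals = c(0.2, 0.8, 1.0, 0.0))
x2 <- rast(nrows = 2, ncols = 2, vals = c(0.6, 0.4, 1.0, 0.5))
x3 <- rast(nrows = 2, ncols = 2, vals = c(1.0, 0.0, 1.0, 0.5))

# Assumed relative-importance weights for x1, x2 and x3
w <- c(3, 2, 1)

# Weighted sum of the inputs divided by the sum of the weights
wlc <- (w[1] * x1 + w[2] * x2 + w[3] * x3) / sum(w)
wlc.values <- as.vector(values(wlc))
```

Because the weights are normalized, the index stays on the same 0 to 1 scale as the inputs: a cell scoring 1 on every layer scores 1 overall.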

Model-driven overlay

\[ \begin{equation} F(\mathbf{s}) = w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s}) + \epsilon \end{equation} \]

  • If we estimate \(w_i\) using data, we specify \(F(\mathbf{s})\) as the outcome of regression

  • When \(F(\mathbf{s})\) is binary → logistic regression

  • When \(F(\mathbf{s})\) is continuous → linear (or gamma) regression

  • When \(F(\mathbf{s})\) is a count → Poisson regression

  • Assumptions about \(\epsilon\) matter!
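
A minimal sketch of a model-driven overlay for a binary outcome is given below. The predictor layers `elev` and `cover`, the simulated presence/absence data, and the coefficients used to simulate them are all assumptions made for illustration.

```r
library(terra)

set.seed(505)

# Two hypothetical predictor layers on a 20 x 20 grid
elev  <- rast(nrows = 20, ncols = 20, vals = runif(400, 0, 100))
cover <- rast(nrows = 20, ncols = 20, vals = runif(400, 0, 1))
preds <- c(elev, cover)
names(preds) <- c("elev", "cover")

# Simulate a binary outcome at every cell from assumed "true" weights
dat <- as.data.frame(preds)
p.true <- plogis(-2 + 0.03 * dat$elev + 1.5 * dat$cover)
dat$present <- rbinom(nrow(dat), size = 1, prob = p.true)

# Estimate the weights from the data with logistic regression;
# the binomial family is the assumption made about the error term
fit <- glm(present ~ elev + cover, family = binomial, data = dat)

# Apply the fitted model to every cell to map predicted favorability
fav <- predict(preds, fit, type = "response")
```

The layer names in `preds` must match the variable names in the model formula so that `predict()` can pair each raster layer with its coefficient. Swapping `family = binomial` for `poisson` or `Gamma` covers the count and continuous cases listed above.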

Logistic Regression and Distribution Models

Why do we create distribution models?

  • To identify important correlations between predictors and the occurrence of an event

  • Generate maps of the ‘range’ or ‘niche’ of events

  • Understand spatial patterns of event co-occurrence

  • Forecast changes in event distributions