Semi-parametric spatial filtering

The function selects eigenvectors in a semi-parametric spatial filtering approach to removing spatial dependence from linear models. Selection is by brute force by finding the single eigenvector reducing the standard variate of Moran's I for regression residuals most, and continuing until no candidate eigenvector reduces the value by more than tol. It returns a summary table from the selection process and a matrix of selected eigenvectors for the specified model.

Usage

SpatialFiltering(formula, lagformula=NULL, data=list(), na.action=na.fail,
 nb=NULL, glist = NULL,
 style = "C", zero.policy = NULL, tol = 0.1, zerovalue = 1e-04,
 ExactEV = FALSE, symmetric = TRUE, alpha=NULL, alternative="two.sided",
 verbose=NULL)

Arguments

formula: a symbolic description of the model to be fit, assuming a spatial error representation; when lagformula is given, it should include only the response and the intercept term
lagformula: An extra one-sided formula to be used when a spatial lag representation is desired; the intercept is excluded within the function if present because it is part of the formula argument, but excluding it explicitly in the lagformula argument in the presence of factors generates a collinear model matrix
data: an optional data frame containing the variables in the model
nb: an object of class nb
glist: list of general weights corresponding to neighbours
style: style can take values W, B, C, U, and S
na.action: a function (default options("na.action")), can also be na.omit or na.exclude with consequences for residuals and fitted values - in these cases the spatial weights list will be subsetted to remove NAs in the data. It may be necessary to set zero.policy to TRUE because this subsetting may create no-neighbour observations. Note that only weights lists created without using the glist argument to nb2listw may be subsetted.
zero.policy: default NULL, use global option value; if FALSE stop with error for any empty neighbour sets, if TRUE permit the weights list to be formed with zero-length weights vectors
tol: tolerance value for convergence of spatial filtering
zerovalue: eigenvectors with eigenvalues of an absolute value smaller than zerovalue will be excluded in eigenvector search
ExactEV: Set ExactEV=TRUE to use exact expectations and variances rather than the expectation and variance of Moran's I from the previous iteration, default FALSE
symmetric: Should the spatial weights matrix be forced to symmetry, default TRUE
alpha: if not NULL, used instead of the tol= argument as a stopping rule to choose all eigenvectors up to and including the one with a probability value exceeding alpha.
alternative: a character string specifying the alternative hypothesis, must be one of greater, less or two.sided (default).
verbose: default NULL, use global option value; if TRUE report eigenvectors selected

Value

An SfResult object, with:

selection

a matrix summarising the selection of eigenvectors for inclusion, with columns:

Step: Step counter of the selection procedure
SelEvec: number of selected eigenvector (sorted descending)
Eval: its associated eigenvalue
MinMi: value Moran's I for residual autocorrelation
ZMinMi: standardized value of Moran's I assuming a normal approximation
pr(ZI): probability value of the permutation-based standardized deviate for the given value of the alternative argument
R2: R^2 of the model including exogenous variables and eigenvectors
gamma: regression coefficient of selected eigenvector in fit

The first row is the value at the start of the search

dataset

a matrix of the selected eigenvectors in order of selection

References

Tiefelsdorf M, Griffith DA. (2007) Semiparametric Filtering of Spatial Autocorrelation: The Eigenvector Approach. Environment and Planning A, 39 (5) 1193 - 1221.

Author

Yongwan Chun, Michael Tiefelsdorf, Roger Bivand

Examples

require("sf", quietly=TRUE)
columbus <- st_read(system.file("shapes/columbus.gpkg", package="spData")[1], quiet=TRUE)
#require("spdep", quietly=TRUE)
col.gal.nb <- spdep::read.gal(system.file("weights/columbus.gal", package="spData")[1])
lmbase <- lm(CRIME ~ INC + HOVAL, data=columbus)
sarcol <- SpatialFiltering(CRIME ~ INC + HOVAL, data=columbus,
 nb=col.gal.nb, style="W", ExactEV=TRUE)
sarcol
#>   Step SelEvec      Eval        MinMi      ZMinMi      Pr(ZI)        R2
#> 0    0       0 0.0000000  0.212374153  2.68100025 0.007340246 0.5524040
#> 1    1       5 0.7148326  0.121528166  1.89037770 0.058707464 0.6209393
#> 2    2       3 0.8408661  0.065848648  1.54064108 0.123404165 0.6481722
#> 3    3       1 1.0206316 -0.005424824  1.08514557 0.277857187 0.6726114
#> 4    4      10 0.3658588 -0.039356232  0.80357070 0.421644951 0.7000258
#> 5    5      14 0.1831325 -0.072949543  0.47790213 0.632719864 0.7393770
#> 6    6      11 0.3144120 -0.108332631  0.18566599 0.852706701 0.7611907
#> 7    7       2 0.9157325 -0.153675621 -0.03464097 0.972366030 0.7713163
#>       gamma
#> 0   0.00000
#> 1  30.34786
#> 2  19.13010
#> 3 -18.12234
#> 4 -19.19379
#> 5 -22.99586
#> 6  17.12127
#> 7  11.66487
lmsar <- lm(CRIME ~ INC + HOVAL + fitted(sarcol), data=columbus)
(x <- summary(lmsar))
#> 
#> Call:
#> lm(formula = CRIME ~ INC + HOVAL + fitted(sarcol), data = columbus)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -27.6527  -5.3084   0.0804   5.6844  15.6912 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)          68.61896    3.67609  18.666  < 2e-16 ***
#> INC                  -1.59731    0.25938  -6.158 3.12e-07 ***
#> HOVAL                -0.27393    0.08011  -3.419  0.00148 ** 
#> fitted(sarcol)vec5   30.34786    8.87679   3.419  0.00149 ** 
#> fitted(sarcol)vec3   19.13010    8.87679   2.155  0.03739 *  
#> fitted(sarcol)vec1  -18.12234    8.87679  -2.042  0.04800 *  
#> fitted(sarcol)vec10 -19.19379    8.87679  -2.162  0.03679 *  
#> fitted(sarcol)vec14 -22.99586    8.87679  -2.591  0.01341 *  
#> fitted(sarcol)vec11  17.12127    8.87679   1.929  0.06106 .  
#> fitted(sarcol)vec2   11.66487    8.87679   1.314  0.19649    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 8.877 on 39 degrees of freedom
#> Multiple R-squared:  0.7713,	Adjusted R-squared:  0.7185 
#> F-statistic: 14.62 on 9 and 39 DF,  p-value: 5.579e-10
#> 
coef(x)
#>                        Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept)          68.6189611 3.67608656 18.666307 4.802352e-21
#> INC                  -1.5973108 0.25938068 -6.158172 3.122801e-07
#> HOVAL                -0.2739315 0.08011158 -3.419374 1.483698e-03
#> fitted(sarcol)vec5   30.3478552 8.87679493  3.418785 1.486164e-03
#> fitted(sarcol)vec3   19.1300996 8.87679493  2.155068 3.738943e-02
#> fitted(sarcol)vec1  -18.1223409 8.87679493 -2.041541 4.800339e-02
#> fitted(sarcol)vec10 -19.1937947 8.87679493 -2.162244 3.679422e-02
#> fitted(sarcol)vec14 -22.9958588 8.87679493 -2.590559 1.340783e-02
#> fitted(sarcol)vec11  17.1212741 8.87679493  1.928768 6.106079e-02
#> fitted(sarcol)vec2   11.6648669 8.87679493  1.314085 1.964945e-01
anova(lmbase, lmsar)
#> Analysis of Variance Table
#> 
#> Model 1: CRIME ~ INC + HOVAL
#> Model 2: CRIME ~ INC + HOVAL + fitted(sarcol)
#>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
#> 1     46 6014.9                                  
#> 2     39 3073.1  7    2941.8 5.3334 0.0002445 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
spdep::lm.morantest(lmsar, spdep::nb2listw(col.gal.nb))
#> 
#> 	Global Moran I for regression residuals
#> 
#> data:  
#> model: lm(formula = CRIME ~ INC + HOVAL + fitted(sarcol), data =
#> columbus)
#> weights: spdep::nb2listw(col.gal.nb)
#> 
#> Moran I statistic standard deviate = -0.034641, p-value = 0.5138
#> alternative hypothesis: greater
#> sample estimates:
#> Observed Moran I      Expectation         Variance 
#>     -0.153675621     -0.150918131      0.006336477 
#> 
lagcol <- SpatialFiltering(CRIME ~ 1, ~ INC + HOVAL - 1, data=columbus,
 nb=col.gal.nb, style="W")
lagcol
#>    Step SelEvec      Eval       MinMi      ZMinMi      Pr(ZI)        R2
#> 0     0       0 0.0000000  0.21237415  2.68100025 0.007340246 0.5524040
#> 1     1       6 0.7161123  0.11782248  1.84511963 0.065020139 0.6038801
#> 2     2       4 0.8682938  0.06242664  1.49482111 0.134961136 0.6531288
#> 3     3       1 1.0310063 -0.02066604  0.88134183 0.378132834 0.6924845
#> 4     4       5 0.7905397 -0.04619973  0.84746904 0.396733736 0.7136578
#> 5     5      15 0.1753342 -0.07609524  0.55233191 0.580720971 0.7558543
#> 6     6       9 0.5501433 -0.10190889  0.43919419 0.660520837 0.7626784
#> 7     7       8 0.5721041 -0.12232942  0.41846803 0.675604953 0.7757314
#> 8     8       3 0.9026222 -0.14991822  0.38315383 0.701605709 0.7908693
#> 9     9       2 0.9649166 -0.21756342 -0.28556733 0.775209527 0.8078727
#> 10   10       7 0.6219404 -0.22017920 -0.04856547 0.961265592 0.8082842
#>         gamma
#> 0    0.000000
#> 1   19.848854
#> 2   35.542595
#> 3  -30.697851
#> 4  -24.540372
#> 5   25.227798
#> 6    7.590082
#> 7  -16.933168
#> 8  -20.556931
#> 9  -18.434534
#> 10  -2.597572
lmlag <- lm(CRIME ~ INC + HOVAL + fitted(lagcol), data=columbus)
lmlag
#> 
#> Call:
#> lm(formula = CRIME ~ INC + HOVAL + fitted(lagcol), data = columbus)
#> 
#> Coefficients:
#>         (Intercept)                  INC                HOVAL  
#>             56.7977              -0.4857              -0.3821  
#>  fitted(lagcol)vec6   fitted(lagcol)vec4   fitted(lagcol)vec1  
#>             19.8489              35.5426             -30.6979  
#>  fitted(lagcol)vec5  fitted(lagcol)vec15   fitted(lagcol)vec9  
#>            -24.5404              25.2278               7.5901  
#>  fitted(lagcol)vec8   fitted(lagcol)vec3   fitted(lagcol)vec2  
#>            -16.9332             -20.5569             -18.4345  
#>  fitted(lagcol)vec7  
#>             -2.5976  
#> 
anova(lmbase, lmlag)
#> Analysis of Variance Table
#> 
#> Model 1: CRIME ~ INC + HOVAL
#> Model 2: CRIME ~ INC + HOVAL + fitted(lagcol)
#>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
#> 1     46 6014.9                                  
#> 2     36 2576.3 10    3438.6 4.8049 0.0002165 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
spdep::lm.morantest(lmlag, spdep::nb2listw(col.gal.nb))
#> 
#> 	Global Moran I for regression residuals
#> 
#> data:  
#> model: lm(formula = CRIME ~ INC + HOVAL + fitted(lagcol), data =
#> columbus)
#> weights: spdep::nb2listw(col.gal.nb)
#> 
#> Moran I statistic standard deviate = -0.048565, p-value = 0.5194
#> alternative hypothesis: greater
#> sample estimates:
#> Observed Moran I      Expectation         Variance 
#>     -0.220179195     -0.217083975      0.004061888 
#> 
NA.columbus <- columbus
NA.columbus$CRIME[20:25] <- NA
COL.SF.NA <- SpatialFiltering(CRIME ~ INC + HOVAL, data=NA.columbus,
 nb=col.gal.nb, style="W", na.action=na.exclude)
#> Warning: subsetting caused increase in subgraph count
COL.SF.NA$na.action
#> 20 21 22 23 24 25 
#> 20 21 22 23 24 25 
#> attr(,"class")
#> [1] "exclude"
summary(lm(CRIME ~ INC + HOVAL + fitted(COL.SF.NA), data=NA.columbus,
 na.action=na.exclude))
#> 
#> Call:
#> lm(formula = CRIME ~ INC + HOVAL + fitted(COL.SF.NA), data = NA.columbus, 
#>     na.action = na.exclude)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -23.6712  -4.7984   0.1761   6.7460  11.3353 
#> 
#> Coefficients:
#>                     Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)         69.04674    3.91643  17.630  < 2e-16 ***
#> INC                 -1.60115    0.26017  -6.154 3.88e-07 ***
#> HOVAL               -0.28742    0.07716  -3.725 0.000649 ***
#> fitted(COL.SF.NA)1 -39.91305    8.23597  -4.846 2.27e-05 ***
#> fitted(COL.SF.NA)2  19.81805    8.23597   2.406 0.021226 *  
#> fitted(COL.SF.NA)3 -35.07336    8.23597  -4.259 0.000135 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 8.236 on 37 degrees of freedom
#>   (6 observations deleted due to missingness)
#> Multiple R-squared:  0.7772,	Adjusted R-squared:  0.7471 
#> F-statistic: 25.81 on 5 and 37 DF,  p-value: 3.996e-11
#>