Spatial conditional and simultaneous autoregression model estimation
spautolm.Rd
Function taking family and weights arguments for spatial autoregression model estimation by Maximum Likelihood, using dense matrix methods, not suited to large data sets with thousands of observations. With one of the sparse matrix methods, larger numbers of observations can be handled, but the interval=
argument should be set. The implementation is GLS using the single spatial coefficient value, here termed lambda, found by line search using optimize
to maximise the log likelihood.
Usage
spautolm(formula, data = list(), listw, weights,
na.action, family = "SAR", method="eigen", verbose = NULL, trs=NULL,
interval=NULL, zero.policy = NULL, tol.solve=.Machine$double.eps,
llprof=NULL, control=list())
# S3 method for class 'Spautolm'
summary(object, correlation = FALSE, adj.se=FALSE,
Nagelkerke=FALSE, ...)
Arguments
- formula
a symbolic description of the model to be fit. The details of model specification are given for
lm()
- data
an optional data frame containing the variables in the model. By default the variables are taken from the environment which the function is called.
- listw
a
listw
object created for example bynb2listw
- weights
an optional vector of weights to be used in the fitting process
- na.action
a function (default
options("na.action")
), can also bena.omit
orna.exclude
with consequences for residuals and fitted values - in these cases the weights list will be subsetted to remove NAs in the data. Note that only weights lists created without using the glist argument tonb2listw
may be subsetted.- family
character string: either
"SAR"
or"CAR"
for simultaneous or conditional autoregressions;"SMA"
for spatial moving average added thanks to Jielai Ma -"SMA"
is only implemented for method="eigen"
because it necessarily involves dense matrices- method
character string: default
"eigen"
for use of dense matrices,"Matrix_J"
for sparse matrices (restricted to spatial weights symmetric or similar to symmetric) using methods in the Matrix package; “Matrix” provides updating Cholesky decomposition methods. Values of method may also include "LU", which provides an alternative sparse matrix decomposition approach, and the "Chebyshev" and Monte Carlo "MC" approximate log-determinant methods.- verbose
default NULL, use global option value; if TRUE, reports function values during optimization.
- trs
default NULL, if given, a vector of powered spatial weights matrix traces output by
trW
; when given, used in some Jacobian methods- interval
search interval for autoregressive parameter when not using method="eigen"; default is c(-1,0.999),
optimize
will reset NA/NaN to a bound and gives a warning when the interval is poorly set; method="Matrix" will attempt to search for an appropriate interval, if find_interval=TRUE (fails on some platforms)- zero.policy
default NULL, use global option value; Include list of no-neighbour observations in output if TRUE — otherwise zero.policy is handled within the listw argument
- tol.solve
the tolerance for detecting linear dependencies in the columns of matrices to be inverted - passed to
solve()
(default=double precision machine tolerance). Errors insolve()
may constitute indications of poorly scaled variables: if the variables have scales differing much from the autoregressive coefficient, the values in this matrix may be very different in scale, and inverting such a matrix is analytically possible by definition, but numerically unstable; rescaling the RHS variables alleviates this better than setting tol.solve to a very small value- llprof
default NULL, can either be an integer, to divide the feasible range into llprof points, or a sequence of spatial coefficient values, at which to evaluate the likelihood function
- control
list of extra control arguments - see section below
- object
Spautolm
object fromspautolm
- correlation
logical; if 'TRUE', the correlation matrix of the estimated parameters is returned and printed (default=FALSE)
- adj.se
if TRUE, adjust the coefficient standard errors for the number of fitted coefficients
- Nagelkerke
if TRUE, the Nagelkerke pseudo R-squared is reported
- ...
further arguments passed to or from other methods
Details
This implementation is based on lm.gls
and errorsarlm
. In particular, the function does not (yet) prevent asymmetric spatial weights being used with "CAR" family models. It appears that both numerical issues (convergence in particular) and uncertainties about the exact spatial weights matrix used make it difficult to reproduce Cressie and Chan's 1989 results, also given in Cressie 1993.
Note that the fitted() function for the output object assumes that the response variable may be reconstructed as the sum of the trend, the signal, and the noise (residuals). Since the values of the response variable are known, their spatial lags are used to calculate signal components (Cressie 1993, p. 564). This differs from other software, including GeoDa, which does not use knowledge of the response variable in making predictions for the fitting data.
Control arguments
- tol.opt:
the desired accuracy of the optimization - passed to
optimize()
(default=.Machine$double.eps^(2/3)
)- fdHess:
default NULL, then set to (method != "eigen") internally; use
fdHess
to compute an approximate Hessian using finite differences when using sparse matrix methods; used to make a coefficient covariance matrix when the number of observations is large; may be turned off to save resources if need be- optimHess:
default FALSE, use
fdHess
from nlme, if TRUE, useoptim
to calculate Hessian at optimum- optimHessMethod:
default “optimHess”, may be “nlm” or one of the
optim
methods- Imult:
default 2; used for preparing the Cholesky decompositions for updating in the Jacobian function
- super:
if NULL (default), set to FALSE to use a simplicial decomposition for the sparse Cholesky decomposition and method “Matrix_J”, set to
as.logical(NA)
for method “Matrix”, if TRUE, use a supernodal decomposition- cheb_q:
default 5; highest power of the approximating polynomial for the Chebyshev approximation
- MC_p:
default 16; number of random variates
- MC_m:
default 30; number of products of random variates matrix and spatial weights matrix
- type
default “MC”, used with method “moments”; alternatives “mult” and “moments”, for use if
trs
is missing,trW
- correct
default TRUE, used with method “moments” to compute the Smirnov/Anselin correction term
- trunc
default TRUE, used with method “moments” to truncate the Smirnov/Anselin correction term
- SE_method
default “LU”, may be “MC”
- nrho
default 200, as in SE toolbox; the size of the first stage lndet grid; it may be reduced to for example 40
- interpn
default 2000, as in SE toolbox; the size of the second stage lndet grid
- small_asy
default TRUE; if the method is not “eigen”, use asymmetric covariances rather than numerical Hessian ones if n <= small
- small
default 1500; threshold number of observations for asymmetric covariances when the method is not “eigen”
- SElndet
default NULL, may be used to pass a pre-computed SE toolbox style matrix of coefficients and their lndet values to the "SE_classic" and "SE_whichMin" methods
- LU_order
default FALSE; used in “LU_prepermutate”, note warnings given for
lu
method- pre_eig
default NULL; may be used to pass a pre-computed vector of eigenvalues
Value
A list object of class Spautolm
:
- fit
a list, with items:
- coefficients
ML coefficient estimates
- SSE
ML sum of squared errors
- s2
ML residual variance
- imat
ML coefficient covariance matrix (before multiplying by s2)
- signal_trend
non-spatial component of fitted.values
- signal_stochastic
spatial component of fitted.values
- fitted.values
sum of non-spatial and spatial components of fitted.values
- residuals
difference between observed and fitted values
- lambda
ML autoregressive coefficient
- LL
log likelihood for fitted model
- LL0
log likelihood for model with lambda=0
- call
the call used to create this object
- parameters
number of parameters estimated
- aliased
if not NULL, details of aliased variables
- method
Jacobian method chosen
- family
family chosen
- zero.policy
zero.policy used
- weights
case weights used
- interval
the line search interval used
- timings
processing timings
- na.action
(possibly) named vector of excluded or omitted observations if non-default na.action argument used
- llprof
if not NULL, a list with components lambda and ll of equal length
- lambda.se
Numerical Hessian-based standard error of lambda
- fdHess
Numerical Hessian-based variance-covariance matrix
- X
covariates used in model fitting
- Y
response used in model fitting
- weights
weights used in model fitting
References
Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion; Ord, J. K. 1975 Estimation methods for models of spatial interaction, Journal of the American Statistical Association, 70, 120-126; Waller, L. A., Gotway, C. A. 2004 Applied spatial statistics for public health, Wiley, Hoboken, NJ, 325-380; Cressie, N. A. C. 1993 Statistics for spatial data, Wiley, New York, 548-568; Ripley, B. D. 1981 Spatial statistics, Wiley, New York, 88-95; LeSage J and RK Pace (2009) Introduction to Spatial Econometrics. CRC Press, Boca Raton.
Author
Roger Bivand Roger.Bivand@nhh.no
Note
The standard errors given in Waller and Gotway (2004) are adjusted for the numbers of parameters estimated, and may be reproduced by using the additional argument adj.se=TRUE
in the summary
method. In addition, the function returns fitted values and residuals as given by Cressie (1993) p. 564.
Examples
require("sf", quietly=TRUE)
nydata <- st_read(system.file("shapes/NY8_bna_utm18.gpkg", package="spData")[1], quiet=TRUE)
if (FALSE) { # \dontrun{
lm0 <- lm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata)
summary(lm0)
lm0w <- lm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata, weights=POP8)
summary(lm0w)
} # }
suppressMessages(nyadjmat <- as.matrix(foreign::read.dbf(system.file(
"misc/nyadjwts.dbf", package="spData")[1])[-1]))
suppressMessages(ID <- as.character(names(foreign::read.dbf(system.file(
"misc/nyadjwts.dbf", package="spData")[1]))[-1]))
identical(substring(ID, 2, 10), substring(as.character(nydata$AREAKEY), 2, 10))
#> [1] TRUE
#require("spdep", quietly=TRUE)
listw_NY <- spdep::mat2listw(nyadjmat, as.character(nydata$AREAKEY), style="B")
eigs <- eigenw(listw_NY)
if (FALSE) { # \dontrun{
esar0 <- errorsarlm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY)
summary(esar0)
system.time(esar1f <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, family="SAR", method="eigen",
control=list(pre_eig=eigs)))
res <- summary(esar1f)
print(res)
coef(res)
sqrt(diag(res$resvar))
sqrt(diag(esar1f$fit$imat)*esar1f$fit$s2)
sqrt(diag(esar1f$fdHess))
system.time(esar1M <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, family="SAR", method="Matrix"))
summary(esar1M)
system.time(esar1M <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, family="SAR", method="Matrix",
control=list(super=TRUE)))
summary(esar1M)
esar1wf <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY, weights=POP8, family="SAR", method="eigen",
control=list(pre_eig=eigs))
summary(esar1wf)
system.time(esar1wM <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, weights=POP8, family="SAR", method="Matrix"))
summary(esar1wM)
esar1wlu <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY, weights=POP8, family="SAR", method="LU")
summary(esar1wlu)
esar1wch <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY, weights=POP8, family="SAR", method="Chebyshev")
summary(esar1wch)
} # }
ecar1f <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY, family="CAR", method="eigen",
control=list(pre_eig=eigs))
summary(ecar1f)
#>
#> Call:
#> spautolm(formula = Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = nydata,
#> listw = listw_NY, family = "CAR", method = "eigen", control = list(pre_eig = eigs))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.539732 -0.384311 -0.030646 0.335126 3.808848
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.648362 0.181129 -3.5796 0.0003442
#> PEXPOSURE 0.077899 0.043692 1.7829 0.0745986
#> PCTAGE65P 3.703830 0.627185 5.9055 3.516e-09
#> PCTOWNHOME -0.382789 0.195564 -1.9574 0.0503053
#>
#> Lambda: 0.084123 LR test value: 5.8009 p-value: 0.016018
#> Numerical Hessian standard error of lambda: 0.030872
#>
#> Log likelihood: -275.8283
#> ML residual variance (sigma squared): 0.40758, (sigma: 0.63842)
#> Number of observations: 281
#> Number of parameters estimated: 6
#> AIC: 563.66
#>
if (FALSE) { # \dontrun{
system.time(ecar1M <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, family="CAR", method="Matrix"))
summary(ecar1M)
} # }
ecar1wf <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=nydata,
listw=listw_NY, weights=POP8, family="CAR", method="eigen",
control=list(pre_eig=eigs))
summary(ecar1wf)
#>
#> Call:
#> spautolm(formula = Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = nydata,
#> listw = listw_NY, weights = POP8, family = "CAR", method = "eigen",
#> control = list(pre_eig = eigs))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.491042 -0.270906 0.081435 0.451556 4.198134
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.790154 0.144862 -5.4545 4.910e-08
#> PEXPOSURE 0.081922 0.028593 2.8651 0.004169
#> PCTAGE65P 3.825858 0.577720 6.6223 3.536e-11
#> PCTOWNHOME -0.386820 0.157436 -2.4570 0.014010
#>
#> Lambda: 0.022419 LR test value: 0.38785 p-value: 0.53343
#> Numerical Hessian standard error of lambda: 0.038977
#>
#> Log likelihood: -251.5711
#> ML residual variance (sigma squared): 1102.9, (sigma: 33.21)
#> Number of observations: 281
#> Number of parameters estimated: 6
#> AIC: NA (not available for weighted model)
#>
if (FALSE) { # \dontrun{
system.time(ecar1wM <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
data=nydata, listw=listw_NY, weights=POP8, family="CAR", method="Matrix"))
summary(ecar1wM)
} # }
if (FALSE) { # \dontrun{
require("sf", quietly=TRUE)
nc.sids <- st_read(system.file("shapes/sids.gpkg", package="spData")[1], quiet=TRUE)
ft.SID74 <- sqrt(1000)*(sqrt(nc.sids$SID74/nc.sids$BIR74) +
sqrt((nc.sids$SID74+1)/nc.sids$BIR74))
lm_nc <- lm(ft.SID74 ~ 1)
sids.nhbr30 <- spdep::dnearneigh(cbind(nc.sids$east, nc.sids$north), 0, 30,
row.names=row.names(nc.sids))
sids.nhbr30.dist <- spdep::nbdists(sids.nhbr30, cbind(nc.sids$east, nc.sids$north))
sids.nhbr <- spdep::listw2sn(spdep::nb2listw(sids.nhbr30,
glist=sids.nhbr30.dist, style="B", zero.policy=TRUE))
dij <- sids.nhbr[,3]
n <- nc.sids$BIR74
el1 <- min(dij)/dij
el2 <- sqrt(n[sids.nhbr$to]/n[sids.nhbr$from])
sids.nhbr$weights <- el1*el2
sids.nhbr.listw <- spdep::sn2listw(sids.nhbr)
both <- factor(paste(nc.sids$L_id, nc.sids$M_id, sep=":"))
ft.NWBIR74 <- sqrt(1000)*(sqrt(nc.sids$NWBIR74/nc.sids$BIR74) +
sqrt((nc.sids$NWBIR74+1)/nc.sids$BIR74))
mdata <- data.frame(both, ft.NWBIR74, ft.SID74, BIR74=nc.sids$BIR74)
outl <- which.max(rstandard(lm_nc))
as.character(nc.sids$NAME[outl])
mdata.4 <- mdata[-outl,]
W <- spdep::listw2mat(sids.nhbr.listw)
W.4 <- W[-outl, -outl]
sids.nhbr.listw.4 <- spdep::mat2listw(W.4)
esarI <- errorsarlm(ft.SID74 ~ 1, data=mdata, listw=sids.nhbr.listw,
zero.policy=TRUE)
summary(esarI)
esarIa <- spautolm(ft.SID74 ~ 1, data=mdata, listw=sids.nhbr.listw,
family="SAR")
summary(esarIa)
esarIV <- errorsarlm(ft.SID74 ~ ft.NWBIR74, data=mdata, listw=sids.nhbr.listw,
zero.policy=TRUE)
summary(esarIV)
esarIVa <- spautolm(ft.SID74 ~ ft.NWBIR74, data=mdata, listw=sids.nhbr.listw,
family="SAR")
summary(esarIVa)
esarIaw <- spautolm(ft.SID74 ~ 1, data=mdata, listw=sids.nhbr.listw,
weights=BIR74, family="SAR")
summary(esarIaw)
esarIIaw <- spautolm(ft.SID74 ~ both - 1, data=mdata, listw=sids.nhbr.listw,
weights=BIR74, family="SAR")
summary(esarIIaw)
esarIVaw <- spautolm(ft.SID74 ~ ft.NWBIR74, data=mdata,
listw=sids.nhbr.listw, weights=BIR74, family="SAR")
summary(esarIVaw)
ecarIaw <- spautolm(ft.SID74 ~ 1, data=mdata.4, listw=sids.nhbr.listw.4,
weights=BIR74, family="CAR")
summary(ecarIaw)
ecarIIaw <- spautolm(ft.SID74 ~ both - 1, data=mdata.4,
listw=sids.nhbr.listw.4, weights=BIR74, family="CAR")
summary(ecarIIaw)
ecarIVaw <- spautolm(ft.SID74 ~ ft.NWBIR74, data=mdata.4,
listw=sids.nhbr.listw.4, weights=BIR74, family="CAR")
summary(ecarIVaw)
nc.sids$fitIV <- append(fitted.values(ecarIVaw), NA, outl-1)
plot(nc.sids[,"fitIV"], nbreaks=12) # Cressie 1993, p. 565
} # }
if (FALSE) { # \dontrun{
data(oldcol, package="spdep")
COL.errW.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
spdep::nb2listw(COL.nb, style="W"))
summary(COL.errW.eig)
COL.errW.sar <- spautolm(CRIME ~ INC + HOVAL, data=COL.OLD,
spdep::nb2listw(COL.nb, style="W"))
summary(COL.errW.sar)
data(boston, package="spData")
gp1 <- spautolm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2)
+ I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT),
data=boston.c, spdep::nb2listw(boston.soi), family="SMA")
summary(gp1)
} # }