This vignetted describes how simple features can be read in R from files or databases, and how they can be converted to other formats (text, sp)

Reading and writing through GDAL

The Geospatial Data Abstraction Library (GDAL) is the swiss army knife for spatial data: it reads and writes vector and raster data from and to practically every file format, or database, of significance. Package sf reads and writes using GDAL by the functions st_read and st_write.

The data model GDAL uses needs

  • a data source, which may be a file, directory, or database
  • a layer, which is a single geospatial dataset inside a file or directory or e.g. a table in a database.
  • the specification of a driver (i.e., which format)
  • driver-specific reading or writing data sources, or layers

This may sound complex, but it is needed to map to over 200 data formats! Package sf tries hard to simplify this where possible (e.g. a file contains a single layer), but this vignette will try to point you to the options.

Using st_read

As an example, we read the North Carolina counties SIDS dataset, which comes shipped with the sf package by:

Typical users will use a file name with path for fname, or first set R’s working directory with setwd() and use file name without path.

We see here that a single argument is used to find both the datasource and the layer. This works when the datasource contains a single layer. In case the number of layers is zero (e.g. a database with no tables), an error message is given. In case there are more layers than one, the first layer is returned, but a message and a warning are given:

The message points to the st_layers command, which lists the driver and layers in a datasource, e.g.

A particular layer can now be read by e.g.

st_layers has the option to count the number of features in case these are missing: some datasources (e.g. OSM xml files) do not report the number of features, but need to be completely read for this. GDAL allows for more than one geometry column for a feature layer; these are reported by st_layers.

In case a layer contains only geometries but no attributes (fields), st_read still returns an sf object, with a geometry column only.

We see that GDAL automatically detects the driver (file format) of the datasource, by trying them all in turn.

st_read follows the conventions of base R, similar to how it reads tabular data into data.frames. This means that character data are read, by default as factors. For those who insist on retrieving character data as character vectors, the argument stringsAsFactors can be set to FALSE:

Alternatively, a user can set the global option stringsAsFactors, and this will have the same effect:

Using st_write

To write a simple features object to a file, we need at least two arguments, the object and a filename:

The file name is taken as the data source name. The default for the layer name is the basename (filename without path) of the the data source name. For this, st_write needs to guess the driver. The above command is, for instance, equivalent to:

How the guessing of drivers works is explained in the next section.

Guessing a driver for output

The output driver is guessed from the datasource name, either from its extension (.shp: ESRI Shapefile), or its prefix (PG:: PostgreSQL). The list of extensions with corresponding driver (short driver name) is:

extension driver short name
bna BNA
csv CSV
e00 AVCE00
gdb FileGDB
geojson GeoJSON
gml GML
gmt GMT
gpkg GPKG
gps GPSBabel
gtm GPSTrackMaker
gxt Geoconcept
jml JML
map WAsP
mdb Geomedia
nc netCDF
ods ODS
osm OSM
pbf OSM
shp ESRI Shapefile
sqlite SQLite
vdv VDV
xls xls
xlsx XLSX

The list with prefixes is:

prefix driver short name
couchdb: CouchDB
DB2ODBC: DB2ODBC
DODS: DODS
GFT: GFT
MSSQL: MSSQLSpatial
MySQL: MySQL
OCI: OCI
ODBC: ODBC
PG: PostgreSQL
SDE: SDE

Dataset and layer reading or creation options

Various GDAL drivers have options that influences the reading or writing process, for example what the driver should do when a table already exists in a database: append records to the table or overwrite it:

In case the table exists and the option is not specified, the driver will give an error. Driver-specific options are documented in the driver manual of gdal. Multiple options can be given by multiple strings in options.

For st_read, there is only options; for st_write, one needs to distinguish between dataset_options and layer_options, the first related to opening a dataset, the second to creating layers in the dataset.

Reading and writing directly to and from spatial databases

Package sf supports reading and writing from and to spatial databases using the DBI interface. So far, testing has mainly be done with PostGIS, other databases might work but may also need more work. An example of reading is:

We see here that in the second example a query is given. This query may contain spatial predicates, which could be a way to work through massive spatial datasets in R without having to read them completely in memory.

Similarly, tables can be written:

Here, the default table (layer) name is taken from the object name (meuse). Argument drop informs to drop (remove) the table before writing; logical argument binary determines whether to use well-known binary or well-known text when writing the geometry (where well-known binary is faster and lossless).

Conversion to other formats: WKT, WKB, sp

Conversion to and from well-known text

The usual form in which we see simple features printed is well-known text:

We can create these well-known text strings explicitly using st_as_text:

We can convert back from WKT by using st_as_sfc: