vignettes/tabularaster-usage.Rmd
tabularaster-usage.Rmd
The raster
package is extremely powerful in the R
ecosystem for spatial data. It can be used very efficiently to drive
data extraction and summary tools using its consistent cell-index and
comprehensive helper functions for converting between cell values and
less abstract raster grid properties.
Tabularaster provides some more helpers for working with cells and tries to fill some of the (very few!) gaps in raster functionality. When raster returns cell values of hierarchical objects it returns a hierarchical (list) of cells to match the input query.
Tabularaster provides:
cellnumbers()
: extraction of cells as a simple data
frame with “object ID” and “cell index”as_tibble()
: for raster data, with options for value
column and cell, dimension and date indexingdecimate()
: fast resolution-reduction without careindex_extent()
: build an index extent, basically a
raster Extent in row/column formThe raster function extentFromCells()
started life in
tabularaster.
There were some sf-features in early versions of tabularaster, but the raster package now supports that format (despite there being absolutely zero community development between the two worlds).
Extract the cell numbers of raster r
that are co-located
with object q
. (The argument names are x
and
query
).
cellnumbers(r, q)
In the above example, r
is any raster object
and q
is another spatial object, used as a query. Cell
numbers can be extracted from any raster object, any of a
raster::raster
, raster::stack
or
raster::brick
. It’s not really relevant what that object
contains, as only the dimensions (number of cells in x and y)
and the extent (geographic range in x and y) determine the
result. The r
object can actually not contain any data -
this is a very powerful but seemingly under-used feature of the
raster
package.
The object q
may be any of sf
,
sp
layer types or a matrix of raw coordinates (x-y).
Tabularaster follows the basic principles of tidy data and hypertidy data and aspires to keep the software design out of your way and simply help to get the task done.
In straightforward usage, cellnumbers
returns a tibble
with object_
to identify the spatial object by number, and
cell_
which is specific to the raster object, a function of
its extent
, dim
ensions and
projection
(crs - coordinate reference system).
## Loading required package: sp
library(tabularaster)
(r <- raster(volcano))
## class : RasterLayer
## dimensions : 87, 61, 5307 (nrow, ncol, ncell)
## resolution : 0.01639344, 0.01149425 (x, y)
## extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)
## crs : NA
## source : memory
## names : layer
## values : 94, 195 (min, max)
(cell <- cellnumbers(r, cbind(0.5, 0.5)))
## projections not the same
## x: NA
## query: NAFALSE
## # A tibble: 1 × 2
## object_ cell_
## <int> <dbl>
## 1 1 2654
This cell number query can be then be used to drive other raster
functions, like extract
and xyFromCell
and
many others.
xyFromCell(r, cell$cell_)
## x y
## [1,] 0.5 0.5
raster::extract(r, cell$cell_)
## [1] 161
This is an extremely efficient way to drive extractions from raster
objects, for performing the same query from multiple layers at different
times. It’s also very useful for using dplyr
to derive
summaries, rather than juggling lists of extracted values, or different
parts of raster objects.
There is an as_tibble
method with options for cell,
dimension, and date.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
##
## intersect, select, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
as_tibble(r)
## # A tibble: 5,307 × 2
## cellvalue cellindex
## <dbl> <int>
## 1 100 1
## 2 100 2
## 3 101 3
## 4 101 4
## 5 101 5
## 6 101 6
## 7 101 7
## 8 100 8
## 9 100 9
## 10 100 10
## # ℹ 5,297 more rows
## # A tibble: 10,614 × 3
## cellvalue cellindex dimindex
## <dbl> <int> <int>
## 1 100 1 1
## 2 100 2 1
## 3 101 3 1
## 4 101 4 1
## 5 101 5 1
## 6 101 6 1
## 7 101 7 1
## 8 100 8 1
## 9 100 9 1
## 10 100 10 1
## # ℹ 10,604 more rows
## # A tibble: 10,614 × 2
## cellvalue dimindex
## <dbl> <int>
## 1 200 2
## 2 200 2
## 3 202 2
## 4 202 2
## 5 202 2
## 6 202 2
## 7 202 2
## 8 200 2
## 9 200 2
## 10 200 2
## # ℹ 10,604 more rows
The date or date-time is used as the dimension index if present.
btime <- raster::setZ(b, Sys.time() + c(1, 10))
as_tibble(btime) %>% group_by(dimindex) %>% summarize(n = n())
## # A tibble: 2 × 2
## dimindex n
## <dttm> <int>
## 1 2024-01-10 21:15:54 5307
## 2 2024-01-10 21:16:03 5307
as_tibble(btime, split_date = TRUE)
## # A tibble: 10,614 × 3
## cellvalue cellindex dimindex
## <dbl> <int> <dttm>
## 1 100 1 2024-01-10 21:15:54
## 2 100 2 2024-01-10 21:15:54
## 3 101 3 2024-01-10 21:15:54
## 4 101 4 2024-01-10 21:15:54
## 5 101 5 2024-01-10 21:15:54
## 6 101 6 2024-01-10 21:15:54
## 7 101 7 2024-01-10 21:15:54
## 8 100 8 2024-01-10 21:15:54
## 9 100 9 2024-01-10 21:15:54
## 10 100 10 2024-01-10 21:15:54
## # ℹ 10,604 more rows
tidyr::extract
and
raster::extract
, dplyr::select
and
raster::select
as I always use these packages
together.cellnumbers
doesn’t currently reproject the second
argument query
, even when would make sense to do so like
extract
does. This is purely to reduce the required
dependencies.If you find that things don’t work, first check if it’s a namespace
problem, there are a few function name overlaps in the
tidyverse
and raster
, and in R generally.
There is no way to fix this properly atm.
Tabularaster doesn’t reproject on the fly, but might use the reproj package in future.
Ultimately the cell index vector should probably be a formal class,
with knowledge of its extent and grain. I’d love this to be formalized,
but I seem to not have the design expertise required to get the system
right. It’s something that ggplot2
needs, but there aren’t
any existing examples in R anywhere as far as I can tell. The stars project is a good
place to see what else is happening in this space in R. Other examples
are the unfinshed tbl_cube
in dplyr
, the R6
objects in velox
, and the mesh indexing used by packages
rgl
, Vcg
, icosa
,
dggridR
, deldir
, geometry
,
RTriangle
, TBA
, (and there are many
others).
If you are interested in these issues please get in touch, use the Issues tab or discuss at r-spatial, get on twitter #rstats or contact me directly.
This example uses extracted data per polygon and uses base R to lapply across the list of values extracted per polygon. Here we show a more dplyr-ish version after extracting the cell numbers with tabularaster.
library(tabularaster)
## https://gis.stackexchange.com/questions/102870/step-by-step-how-do-i-extract-raster-values-from-polygon-overlay-with-q-gis-or
library(raster)
# Create integer class raster
r <- raster(ncol=36, nrow=18)
r[] <- round(runif(ncell(r),1,10),digits=0)
# Create two polygons
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1),
Polygons(list(Polygon(cds2)), 2))),data.frame(ID=c(1,2)))
## do extraction in abstract terms
(cn <- cellnumbers(r, polys))
## projections not the same
## x: +proj=longlat +datum=WGS84
## query: NAFALSE
## cellnumbers is very slow for SpatialPolygons, consider conversion with 'sf::st_as_sf'
## # A tibble: 63 × 2
## object_ cell_
## <int> <dbl>
## 1 1 326
## 2 1 327
## 3 1 328
## 4 1 329
## 5 1 330
## 6 1 331
## 7 1 332
## 8 1 333
## 9 1 334
## 10 1 335
## # ℹ 53 more rows
library(dplyr)
## now perform extraction for real
## and pipe into grouping by polygon (object_) and value, and
## calculate class percentage from class counts per polygon
cn %>%
mutate(v = raster::extract(r, cell_)) %>%
group_by(object_, v) %>%
summarize(count = n()) %>%
mutate(v.pct = count / sum(count))
## `summarise()` has grouped output by 'object_'. You can override using the
## `.groups` argument.
## # A tibble: 17 × 4
## # Groups: object_ [2]
## object_ v count v.pct
## <int> <dbl> <int> <dbl>
## 1 1 1 1 0.0263
## 2 1 2 6 0.158
## 3 1 3 3 0.0789
## 4 1 4 2 0.0526
## 5 1 5 5 0.132
## 6 1 6 3 0.0789
## 7 1 7 7 0.184
## 8 1 8 4 0.105
## 9 1 9 6 0.158
## 10 1 10 1 0.0263
## 11 2 1 5 0.2
## 12 2 2 4 0.16
## 13 2 3 4 0.16
## 14 2 5 3 0.12
## 15 2 7 4 0.16
## 16 2 8 2 0.08
## 17 2 10 3 0.12
## here is the traditional code used in the stackoverflow example
# Extract raster values to polygons
#( v <- extract(r, polys) )
# Get class counts for each polygon
#v.counts <- lapply(v,table)
# Calculate class percentages for each polygon
#( v.pct <- lapply(v.counts, FUN=function(x){ x / sum(x) } ) )
library(tabularaster)
data("ghrsst") ## a RasterLayer
data("sst_regions") ## a polygon layer, contiguous with ghrsst
gcells <- cellnumbers(ghrsst, sst_regions) %>% mutate(object_ = as.integer(object_))
## cellnumbers is very slow for SpatialPolygons, consider conversion with 'sf::st_as_sf'
result <- gcells %>% mutate(sst = raster::extract(ghrsst, cell_)) %>%
group_by(object_) %>%
summarize_at(vars(sst), funs(mean(., na.rm = TRUE), sd(., na.rm = TRUE), length))
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
library(tabularaster)
library(raster)
library(dplyr)
data("rastercano")
data("polycano")
cells <- cellnumbers(rastercano, polycano[4:5, ])
## projections not the same
## x: NA
## query: NAFALSE
## cellnumbers is very slow for SpatialPolygons, consider conversion with 'sf::st_as_sf'
cellnumbers(rastercano, as(polycano[4:5, ], "SpatialLinesDataFrame"))
## # A tibble: 305 × 2
## object_ cell_
## <int> <int>
## 1 1 1129
## 2 1 1190
## 3 1 1251
## 4 2 1
## 5 2 2
## 6 2 3
## 7 2 4
## 8 2 5
## 9 2 6
## 10 2 7
## # ℹ 295 more rows
cellnumbers(rastercano, as(as(polycano[4:5, ], "SpatialLinesDataFrame"), "SpatialPointsDataFrame"))
## projections not the same
## x: NA
## query: NAFALSE
## # A tibble: 331 × 2
## object_ cell_
## <int> <dbl>
## 1 1 1129
## 2 2 1129
## 3 3 1251
## 4 4 1251
## 5 5 1129
## 6 6 1098
## 7 7 1098
## 8 8 1098
## 9 9 1098
## 10 10 1037
## # ℹ 321 more rows