The goal of controlledburn is to rasterize without materializing any pixel values.
A very fast rasterization algorithm for polygons in fasterize does the following:
controlledburn avoids that last step. With these rasterization functions the return value is not a set of pixels or a side-effect of a materialized raster file but simply those row and column start and end indexes.
This is an expression of the cell abstraction wish item fasterize/issues/11.
Lines is working but still has some problems. For polygons it’s pretty good, see fasterize #6 for a remaining issue.
Internal function burn_polygon()
has arguments sf
, extent
, dimension
. The first is a sf polygons data frame, extent is c(xmin, xmax, ymin, ymax)
and dimension is c(ncol, nrow)
. In fasterize you need an actual non-materialized RasterLayer object, but all that was really used for was the six numbers extent, dimension and as the shell for the in-memory output.
The output of burn_polygon()
is a list of four-element indexes start,end,row,poly_id
- these are zero-based atm because they reflect the underlying C++ code. Examples shown here flatten this to a 4-column matrix (and add 1).
These options are still in play for what the interface/s could do:
I also found this real world example for which this is relevant, discussed in PROJ for very fast lookup for large non-materialized (highly compressed) grids by Thomas Knudsen:
https://github.com/OSGeo/PROJ/issues/1461#issuecomment-491501992
You can install the development version of controlledburn like so:
remotes::install_github("hypertidy/controlledburn")
This is a basic example, this is fast, and shows that it works.
pols <- silicate::inlandwaters
library(vaster)
#>
#> Attaching package: 'vaster'
#> The following object is masked from 'package:stats':
#>
#> ts
## define a raster (xmin, xmax, ymin, ymax), (ncol, nrow)
ext <- unlist(lapply(silicate::sc_vertex(pols), range))
dm <- c(50, 40)
r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm)
## our index is triplets of start,end,line where the polygon edge was detected -
## this essentially an rle by scanline of start,end polygon coverage
index <- matrix(unlist(r, use.names = F), ncol = 4L, byrow = TRUE) + 1 ## plus one because 0-index internally
## plot just the start and ends of each scanline detected
xy0 <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, index[,c(3, 3)], index[,1:2]))
plot(xy0)
plot(silicate::PATH0(pols), add = TRUE)
## expand out to every cell
cr <- do.call(rbind, apply(index, 1,\(.x) cbind(seq(.x[1], .x[2]), .x[3], .x[4])))
xy <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, cr[,2], cr[,1]))
plot(xy, pch = 19, cex = .3, col = palr::d_pal(cr[,3]))
plot(silicate::PATH0(pols), add = TRUE)
It scales to very large tasks, with small output.
dm <- c(500000, 400000)
system.time(r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm))
#> user system elapsed
#> 0.913 0.068 0.982
length(r)
#> [1] 989153
## consider a prod(dm) raster of type double (or even bool) obviously would compress again but why churn
pryr::object_size(r)
#> 71.22 MB
The following is inefficient, but shows that we get the right result.
dm <- c(500, 400)
system.time(r <- controlledburn:::burn_polygon(pols, extent = ext,
dimension = dm))
#> user system elapsed
#> 0.002 0.001 0.003
index <- matrix(unlist(r, use.names = F), ncol = 4L, byrow = TRUE) + 1 ## plus one because 0-index internally
## now go inefficient, this is every column,row index, then converted to cell, converted to xy
cr <- do.call(rbind, apply(index, 1, \(.x) cbind(seq(.x[1], .x[2]), .x[3], .x[4])))
xy <- vaster::xy_from_cell(dm, ext, vaster::cell_from_row_col(dm, cr[,2], cr[,1]))
plot(xy, pch = ".", col = palr::d_pal(cr[,3]))
Please note that the controlledburn project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.