A lazy data frame for GDAL drawings ('vector data sources'). lazysf is DBI compatible and designed to work with dplyr. It should work with any data source (file, URL, connection string) readable by the sf package function st_read.

lazysf(x, layer, ...)

# S3 method for character
lazysf(x, layer, ..., query = NA)

# S3 method for SFSQLConnection
lazysf(x, layer, ..., query = NA)

Arguments

x

the data source name (file path, URL, or database connection string)

layer

layer name (varies by driver, may be a file name without extension). If layer is missing, st_read will read the first layer of dsn, give a warning and (unless quiet = TRUE) print a message when there are multiple layers, or give an error if there are no layers in dsn. If dsn is a database connection, then layer can be a table name or a database identifier (see Id). It is also possible to omit layer and use the query argument instead.

...

ignored

query

SQL query to pass in directly

Value

a 'tbl_SFSQLConnection', extending 'tbl_lazy' (something that works with dplyr verbs, and only shows a preview until you commit the result via collect()); see Details.

Details

Lazy means that the usual behaviour of reading the entirety of a data source into memory is avoided. Printing the output results in a preview query being run and displayed (the top few rows of data).

The output of lazysf() is a 'tbl_SFSQLConnection' that extends 'tbl_dbi' and may be used with functions and workflows in the normal DBI way; see SFSQL() for the lazysf DBI support.
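
A minimal sketch of the DBI route, not a definitive recipe. It assumes the DBI, sf and lazysf packages are installed, that SFSQL() can be handed to DBI::dbConnect() with the data source as its argument, and that the layer in the nc.gpkg file shipped with sf is named "nc.gpkg" (as in the Examples). The names con and tab are illustrative only.

```r
library(DBI)
library(lazysf)

# GeoPackage shipped with the sf package (also used in the Examples)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# Assumption: SFSQL() is the lazysf DBI driver and accepts the data source here
con <- dbConnect(SFSQL(), f)

# Pass the connection to the SFSQLConnection method of lazysf()
tab <- lazysf(con, layer = "nc.gpkg")
class(tab)
```

Passing an existing connection is an alternative to passing the data source name directly; both routes return the same kind of lazy table.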

The kind of query that may be run depends on the format (driver) of the data source; see the list on the GDAL vector drivers page. For some details see the GDALSQL vignette.

When dplyr is attached, the lazy data frame can be used with the usual verbs (filter, select, distinct, mutate, transmute, arrange, left_join, pull, collect etc.). To see the result as a SQL query rather than a data frame preview, use dplyr::show_query().
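
As a short sketch of inspecting the generated SQL, assuming the dplyr, sf and lazysf packages are installed: the pipeline below uses the nc.gpkg file shipped with sf (the same file as in the Examples) and prints the query instead of a preview. The exact SQL text depends on the installed dbplyr version, so no output is shown. The name small is illustrative only.

```r
library(dplyr)
library(lazysf)

# GeoPackage shipped with the sf package (also used in the Examples)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# Build a lazy query; the full data source is not read into memory here
small <- lazysf(f, layer = "nc.gpkg") %>%
  select(AREA, FIPS, geom) %>%
  filter(AREA < 0.1)

# Print the SQL for the pipeline rather than a preview of the rows
show_query(small)
```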

To obtain an in-memory data frame use an explicit collect() or st_as_sf(). A call to collect() is triggered by st_as_sf() and will add the sf class to the output. A result may not contain a geometry column, in which case it cannot be converted to an sf data frame. Using collect() on its own returns an unclassed data.frame and may include a classed sfc geometry column.

As well as collect(), it is also possible to use tibble::as_tibble(), as.data.frame() or pull(), all of which force computation and retrieve the result.
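
The different ways of forcing computation can be sketched as follows, assuming the dplyr, sf and lazysf packages are installed and again using the nc.gpkg file shipped with sf. The object names (lazy, df, sfdf, areas) are illustrative only.

```r
library(dplyr)
library(lazysf)

# GeoPackage shipped with the sf package (also used in the Examples)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# A lazy query that keeps the geometry column
lazy <- lazysf(f, layer = "nc.gpkg") %>%
  select(AREA, FIPS, geom) %>%
  filter(AREA < 0.1)

# collect() runs the query and returns the rows in memory (no sf class added)
df <- collect(lazy)

# st_as_sf() also runs the query and adds the sf class;
# this needs a geometry column in the result
sfdf <- sf::st_as_sf(lazy)

# pull() runs the query and returns a single column as a vector
areas <- pull(lazy, AREA)
```

Dropping geom from the select() call would still work with collect() and pull(), but the result could then not be converted with st_as_sf().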

Examples

# online sources can work
geojson <- file.path("https://raw.githubusercontent.com/SymbolixAU",
                     "geojsonsf/master/inst/examples/geo_melbourne.geojson")
# \donttest{
lazysf(geojson)
#> # Source: table<geo_melbourne> [?? x 8]
#> # Database: SFSQLConnection
#> SA2_NAME polygonId SA3_NAME AREASQKM fillColor strokeColor strokeWeight
#> <chr> <int> <chr> <dbl> <chr> <chr> <int>
#> 1 Abbotsf… 70 Yarra 1.74 #440154 #440154 1
#> 2 Albert … 59 Port Ph… 4.67 #450457 #450457 1
#> 3 Alphing… 41 Darebin… 2.89 #46075A #46075A 1
#> 4 Armadale 66 Stonnin… 2.18 #460A5D #460A5D 1
#> 5 Ascot V… 44 Essendon 3.84 #460C5F #460C5F 1
#> 6 Brunswi… 36 Brunswi… 5.14 #472D7B #472D7B 1
#> 7 Brunswi… 37 Brunswi… 2.17 #472D7B #472D7B 1
#> 8 Brunswi… 38 Brunswi… 3.18 #472E7C #472E7C 1
#> 9 Carlton 48 Melbour… 1.82 #443A83 #443A83 1
#> 10 Carlton… 71 Yarra 2.30 #443A83 #443A83 1
#> # … with more rows, and 1 more variable: `_ogr_geometry_` <POLYGON [°]>
# }
## normal file stuff
## (Geopackage is an actual database so with SELECT we must be explicit re geom-column)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
lazysf(f)
#> # Source: table<nc.gpkg> [?? x 16]
#> # Database: SFSQLConnection
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # … with more rows, and 4 more variables: BIR79 <dbl>, SID79 <dbl>,
#> # NWBIR79 <dbl>, geom <MULTIPOLYGON [°]>
lazysf(f, query = "SELECT AREA, FIPS, geom FROM \"nc.gpkg\" WHERE AREA < 0.1")
#> # Source: SQL [?? x 3]
#> # Database: SFSQLConnection
#> AREA FIPS geom
#> <dbl> <chr> <MULTIPOLYGON [°]>
#> 1 0.061 37005 (((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504, -…
#> 2 0.07 37053 (((-76.00897 36.3196, -76.01735 36.33773, -76.03288 36.33598, -7…
#> 3 0.097 37091 (((-76.74506 36.23392, -76.98069 36.23024, -76.99475 36.23558, -…
#> 4 0.062 37029 (((-76.00897 36.3196, -75.95718 36.19377, -75.98134 36.16973, -7…
#> 5 0.091 37073 (((-76.56251 36.34057, -76.60424 36.31498, -76.64822 36.31532, -…
#> 6 0.072 37181 (((-78.49252 36.17359, -78.51472 36.17522, -78.51709 36.46148, -…
#> 7 0.053 37139 (((-76.29893 36.21423, -76.32423 36.23362, -76.37242 36.25235, -…
#> 8 0.081 37189 (((-81.80622 36.10456, -81.81715 36.10939, -81.82231 36.15786, -…
#> 9 0.063 37143 (((-76.48053 36.07979, -76.53696 36.08792, -76.5756 36.10266, -7…
#> 10 0.044 37041 (((-76.68874 36.29452, -76.64822 36.31532, -76.60424 36.31498, -…
#> # … with more rows
lazysf(f, layer = "nc.gpkg") %>%
  dplyr::select(AREA, FIPS, geom) %>%
  dplyr::filter(AREA < 0.1)
#> # Source: lazy query [?? x 3]
#> # Database: SFSQLConnection
#> AREA FIPS geom
#> <dbl> <chr> <MULTIPOLYGON [°]>
#> 1 0.061 37005 (((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504, -…
#> 2 0.07 37053 (((-76.00897 36.3196, -76.01735 36.33773, -76.03288 36.33598, -7…
#> 3 0.097 37091 (((-76.74506 36.23392, -76.98069 36.23024, -76.99475 36.23558, -…
#> 4 0.062 37029 (((-76.00897 36.3196, -75.95718 36.19377, -75.98134 36.16973, -7…
#> 5 0.091 37073 (((-76.56251 36.34057, -76.60424 36.31498, -76.64822 36.31532, -…
#> 6 0.072 37181 (((-78.49252 36.17359, -78.51472 36.17522, -78.51709 36.46148, -…
#> 7 0.053 37139 (((-76.29893 36.21423, -76.32423 36.23362, -76.37242 36.25235, -…
#> 8 0.081 37189 (((-81.80622 36.10456, -81.81715 36.10939, -81.82231 36.15786, -…
#> 9 0.063 37143 (((-76.48053 36.07979, -76.53696 36.08792, -76.5756 36.10266, -7…
#> 10 0.044 37041 (((-76.68874 36.29452, -76.64822 36.31532, -76.60424 36.31498, -…
#> # … with more rows
## the famous ESRI Shapefile (not an actual database)
## so if we SELECT we must be ex
shp <- lazysf(system.file("shape/nc.shp", package = "sf", mustWork = TRUE))
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
shp %>%
  filter(NAME %LIKE% 'A%') %>%
  mutate(abc = 1.3) %>%
  select(abc, NAME, `_ogr_geometry_`) %>%
  arrange(desc(NAME)) #%>% show_query()
#> # Source: lazy query [?? x 3]
#> # Database: SFSQLConnection
#> # Ordered by: desc(NAME)
#> abc NAME `_ogr_geometry_`
#> <dbl> <chr> <POLYGON [°]>
#> 1 1.3 Avery ((-81.94135 35.95498, -81.9614 35.93922, -81.94495 35.91861, -…
#> 2 1.3 Ashe ((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359, …
#> 3 1.3 Anson ((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311, …
#> 4 1.3 Allegha… ((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504, …
#> 5 1.3 Alexand… ((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -8…
#> 6 1.3 Alamance ((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699, …
## a multi-layer file
system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
#> [1] "/Users/runner/work/_temp/Library/lazysf/extdata/multi.gpkg"