The rancid package aims to simplify and systematize read access to NetCDF in R.
The basic workflow hides the underlying details of calls to the NetCDF API.
Create an object that holds a complete description of the file, so that we can easily see the available variables (vars) and dimensions (dims) and run queries that return the details we need in the form we want, rather than just printed to the screen.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rancid)
#>
#> Attaching package: 'rancid'
#> The following object is masked from 'package:dplyr':
#>
#> vars
ifile <- system.file("extdata", "S2008001.L3m_DAY_CHL_chlor_a_9km.nc", package = "rancid")
nc <- NetCDF(ifile)
## tidyverse steals a name again
rancid::vars(nc)
#> # A tibble: 2 x 18
#> name ndims natts prec units
#> <chr> <int> <int> <chr> <chr>
#> 1 chlor_a 2 12 float mg m^-3
#> 2 palette 2 0 unsigned byte
#> # ... with 13 more variables: longname <chr>, group_index <int>,
#> # storage <int>, shuffle <int>, compression <int>, unlim <lgl>,
#> # make_missing_value <lgl>, missval <dbl>, hasAddOffset <lgl>,
#> # addOffset <dbl>, hasScaleFact <lgl>, scaleFact <dbl>, id <dbl>
dims(nc)
#> # A tibble: 4 x 7
#> name len unlim group_index group_id id create_dimvar
#> <chr> <int> <lgl> <int> <int> <int> <lgl>
#> 1 lat 2160 FALSE 1 65536 0 TRUE
#> 2 lon 4320 FALSE 1 65536 1 TRUE
#> 3 rgb 3 FALSE 1 65536 2 FALSE
#> 4 eightbitcolor 256 FALSE 1 65536 3 FALSE
## perform a join of variable to dimension, keeping only the varname and id
rancid::vars(nc) %>%
  dplyr::filter(name == "chlor_a") %>%
  transmute(varname = name, id) %>%
  inner_join(nc$vardim, "id") %>%
  inner_join(dims(nc), c("dimids" = "id"))
#> # A tibble: 2 x 9
#> varname id dimids name len unlim group_index group_id
#> <chr> <dbl> <int> <chr> <int> <lgl> <int> <int>
#> 1 chlor_a 0 1 lon 4320 FALSE 1 65536
#> 2 chlor_a 0 0 lat 2160 FALSE 1 65536
#> # ... with 1 more variables: create_dimvar <lgl>
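The join above relies on a link table (nc$vardim) with one row per variable/dimension pair. As a rough illustration of that idea only, the sketch below rebuilds the same two-step join in base R on hand-built data frames. The names vars_tbl, dims_tbl and vardim_tbl are hypothetical stand-ins, not rancid objects; their values are copied from the chlor_a rows of the output shown above, and the palette variable is left out because its id and dimension ids are not shown.

```r
# Hypothetical stand-ins for the tables returned by vars(nc), dims(nc) and
# nc$vardim; values are copied from the printed output for chlor_a only.
vars_tbl <- data.frame(name = "chlor_a", id = 0, stringsAsFactors = FALSE)
dims_tbl <- data.frame(
  name = c("lat", "lon", "rgb", "eightbitcolor"),
  len  = c(2160L, 4320L, 3L, 256L),
  id   = 0:3,
  stringsAsFactors = FALSE
)
# Link table: one row per (variable id, dimension id) pair.
vardim_tbl <- data.frame(id = c(0, 0), dimids = c(1L, 0L))

# Step 1: keep only the variable name and id, then attach its dimension ids.
v <- data.frame(varname = vars_tbl$name, id = vars_tbl$id,
                stringsAsFactors = FALSE)
v <- merge(v, vardim_tbl, by = "id")

# Step 2: match each dimension id to its row in the dimension table.
chlor_dims <- merge(v, dims_tbl, by.x = "dimids", by.y = "id")
chlor_dims[, c("varname", "dimids", "name", "len")]
```

Keeping the variable-to-dimension relation in its own table means the same pattern works for any variable, whatever its number of dimensions, without looping over nested lists.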
NetCDF support in R is complicated and incomplete, with some clearly missing functionality. Here we document the available support and outline some directions for improvement.
NOTE: All content here needs review in light of changes on CRAN since December 2015.
More soon
There is a complex set of overlapping support: HDF5 and NetCDF-4 can in some ways read each other's data sources, but neither can read HDF4, and the use of groups and compound types is generally low, at least in the R community.
RNetCDF and ncdf apparently lack some features for NetCDF version 4 (though they do build against it, including features for HDF4, HDF5, Thredds/OpenDAP). There is currently (2014-12-08) no CRAN package for HDF. There have been h5r and rhdf5, some packages use HDF internally (RcppArmadillo, others?), and there has been Windows binary support for some of these. rhdf5 is available on Bioconductor (and maybe others?).
rgdal2 is another complication: it is not yet on CRAN and has no support for building on Windows. It requires gdal-config to be installed, so it might work if GDAL was built from source with all utilities using MinGW. (?)
NetCDF on CRAN is stuck at version 3. Only ncdf and RNetCDF are integrated with the “win-builder” on CRAN, so they get identical binary versions of the library. GDAL on CRAN does not include NetCDF (either version 3 or 4), HDF4, or HDF5.
OSGeo4W provides binaries for NetCDF4, HDF4, HDF5, and OpenDAP/Thredds, but rgdal cannot easily be built with these (*need details about the compiler/s used for OSGeo4W). It can all be done with MinGW, but the final packaging of R packages on Windows is done via cross-compilation for CRAN.
The utilities ncdump and vdp allow “dumping” of files to either text or binary format, which provides a workaround, as do tools like the GDAL utilities. The aim here, though, is tight coupling, to make things simpler and more flexible directly in R.
ncdf4 has author-hosted Windows binaries, but these do not currently support compound types.
RNetCDF has been forked for compound types by Bertran Brelier, but the fork provides no documentation and has not been synchronized with newer releases of RNetCDF.
These are features and tasks that I want done.