Overview

cmemsarco provides cloud-native access to Copernicus Marine Service (CMEMS) Analysis-Ready Cloud-Optimized (ARCO) Zarr datasets. The package builds a catalog of GDAL-ready data source names (DSNs), letting you go straight from URL to pixels without file downloads, directory listings, or wrangling formats and tools.

library(cmemsarco)

# The bundled catalog
cmems_catalog_data
#> # A tibble: 1,731 × 13
#>    product_id       dataset_version_id timeChunked_url geoChunked_url native_url
#>    <chr>            <chr>              <chr>           <chr>          <chr>     
#>  1 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  2 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  3 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  4 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  5 NWSHELF_ANALYSI… cmems_mod_nws_bgc… NA              NA             https://s…
#>  6 NWSHELF_ANALYSI… cmems_mod_nws_bgc… NA              NA             https://s…
#>  7 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  8 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  9 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 10 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> # ℹ 1,721 more rows
#> # ℹ 8 more variables: dataset_id <chr>, version <chr>, timeChunked_gdal <chr>,
#> #   geoChunked_gdal <chr>, timeChunked_gdals3 <chr>, geoChunked_gdals3 <chr>,
#> #   timeChunked_s3 <chr>, geoChunked_s3 <chr>

The catalog

The catalog is built by walking the CMEMS STAC API. Each row represents a versioned dataset with URLs to Zarr stores in different formats:

Column            Description
product_id        CMEMS product identifier
dataset_id        Dataset identifier (without version)
version           6-digit version (YYYYMM)
timeChunked_url   HTTPS URL to timeChunked.zarr
geoChunked_url    HTTPS URL to geoChunked.zarr
*_gdal            GDAL DSN using /vsicurl/
*_gdals3          GDAL DSN using /vsis3/
*_s3              S3 URI (s3://bucket/path)

Use cmems_latest() to keep only the most recent version of each dataset, and cmems_arco_only() to drop datasets without Zarr URLs (static/native-only).

cmems_catalog_data |>
  cmems_arco_only() |>
  cmems_latest()
#> # A tibble: 1,056 × 13
#>    product_id       dataset_version_id timeChunked_url geoChunked_url native_url
#>    <chr>            <chr>              <chr>           <chr>          <chr>     
#>  1 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  2 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  3 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  4 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  5 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  6 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  7 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  8 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#>  9 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 10 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> # ℹ 1,046 more rows
#> # ℹ 8 more variables: dataset_id <chr>, version <chr>, timeChunked_gdal <chr>,
#> #   geoChunked_gdal <chr>, timeChunked_gdals3 <chr>, geoChunked_gdals3 <chr>,
#> #   timeChunked_s3 <chr>, geoChunked_s3 <chr>
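To illustrate what these two filters do conceptually, here is a minimal base R sketch on a toy table. It is not the package's implementation: keep_arco() and keep_latest() are hypothetical stand-ins, the toy rows and example.org URLs are invented, and the sketch assumes that "has Zarr URLs" can be judged from timeChunked_url alone.

```r
# Toy stand-in for the catalog: three datasets, one with two versions and
# one (ds_b) without a Zarr URL. All values are invented for illustration.
toy <- data.frame(
  dataset_id = c("ds_a", "ds_a", "ds_b", "ds_c"),
  version = c("202311", "202411", "202211", "202406"),
  timeChunked_url = c("https://example.org/a1", "https://example.org/a2",
                      NA, "https://example.org/c1"),
  stringsAsFactors = FALSE
)

# Hypothetical equivalent of cmems_arco_only(): drop rows with no Zarr URL.
keep_arco <- function(x) {
  x[!is.na(x$timeChunked_url), , drop = FALSE]
}

# Hypothetical equivalent of cmems_latest(): keep the highest YYYYMM
# version within each dataset_id.
keep_latest <- function(x) {
  ver <- as.integer(x$version)
  newest <- ave(ver, x$dataset_id, FUN = max)
  x[ver == newest, , drop = FALSE]
}

result <- keep_latest(keep_arco(toy))
print(result)
```

Because the version is a fixed-width YYYYMM string, comparing it as an integer gives the same ordering as comparing it by date.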

Chunking strategies

CMEMS provides two Zarr stores for each ARCO dataset, optimised for different access patterns:

timeChunked (chunks: 1 × 720 × 512 in time × lat × lon)

  • One time step per chunk in the time dimension
  • Use for spatial queries: maps, regional extracts, spatial analysis
  • Efficient when you need a large area at one or few time steps

geoChunked (chunks: 138 × 32 × 64 in time × lat × lon)

  • Many time steps per chunk, small spatial footprint
  • Use for time series: point extraction, temporal analysis
  • Efficient when you need many time steps at one or few locations

Choosing the wrong chunking strategy means many more HTTP requests and slower performance.
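To make the cost concrete, here is a rough count of how many chunks a read touches under each layout. This is a sketch, not part of cmemsarco: chunks_touched() is a hypothetical helper, the chunk shapes are the ones quoted above, the read shapes are invented, and reads are assumed to start on a chunk boundary (an unaligned read can touch one extra chunk per dimension).

```r
# Number of chunks a read touches, assuming the read starts on a chunk
# boundary. Both arguments are c(time, lat, lon) lengths.
chunks_touched <- function(read_shape, chunk_shape) {
  prod(ceiling(read_shape / chunk_shape))
}

# Chunk shapes quoted in the text (time x lat x lon).
time_chunked <- c(time = 1, lat = 720, lon = 512)
geo_chunked <- c(time = 138, lat = 32, lon = 64)

# Two illustrative reads: a year of daily values at one grid cell,
# and a 720 x 1024 regional map at a single time step.
point_series <- c(time = 365, lat = 1, lon = 1)
regional_map <- c(time = 1, lat = 720, lon = 1024)

counts <- rbind(
  point_series = c(
    timeChunked = chunks_touched(point_series, time_chunked),
    geoChunked = chunks_touched(point_series, geo_chunked)
  ),
  regional_map = c(
    timeChunked = chunks_touched(regional_map, time_chunked),
    geoChunked = chunks_touched(regional_map, geo_chunked)
  )
)
print(counts)
```

Under these assumptions the point time series touches 365 chunks in timeChunked but only 3 in geoChunked, while the regional map touches 2 chunks in timeChunked and 368 in geoChunked. Each chunk is a separate object in the store, so the chunk count is a reasonable proxy for the number of HTTP requests.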

URL formats

Each Zarr store is available in four formats. Use whichever suits your tooling:

*_gdal — HTTPS via /vsicurl/

Uses GDAL’s /vsicurl/ handler, which works without any environment setup:

dsn <- cmems_catalog_data$timeChunked_gdal[1]
#> 'ZARR:"/vsicurl/https://s3.waw3-1.cloudferro.com/mdl-arco-time-045/..."'

# Works immediately with any GDAL-based tool
#vapour::vapour_raster_info(dsn)
#terra::rast(dsn)

*_gdals3 — S3 protocol

Uses GDAL’s /vsis3/ handler, which requires calling cmems_setup() first to enable unsigned requests and point GDAL at the CMEMS S3 endpoint:

cmems_setup()  # Sets AWS_NO_SIGN_REQUEST=YES, AWS_S3_ENDPOINT=...

dsn <- cmems_catalog_data$timeChunked_gdals3[1L]
dsn
#> [1] "ZARR:\"/vsis3/mdl-arco-time-041/arco/NWSHELF_ANALYSISFORECAST_BGC_004_002/cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m_202411/timeChunked.zarr\""

This may offer better performance in some cases due to S3-specific optimisations in GDAL.

*_s3 — S3 URI

Standard s3:// URIs for use with S3-aware tools:

uri <- cmems_catalog_data$timeChunked_s3[1]
uri
#> [1] "s3://mdl-arco-time-041/arco/NWSHELF_ANALYSISFORECAST_BGC_004_002/cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m_202411/timeChunked.zarr"

*_url — raw HTTPS

The underlying HTTPS URLs, useful if you need to construct your own access pattern:

url <- cmems_catalog_data$timeChunked_url[1]
url
#> [1] "https://s3.waw3-1.cloudferro.com/mdl-arco-time-041/arco/NWSHELF_ANALYSISFORECAST_BGC_004_002/cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m_202411/timeChunked.zarr"
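The four formats are different spellings of the same location. The sketch below derives the other three from the HTTPS URL with base R string operations, following the patterns visible in the outputs above. derive_formats() is a hypothetical helper for illustration, not a cmemsarco function, and it assumes the URL has the form https://host/bucket/path.

```r
# Derive the S3 URI and both GDAL DSNs from a raw HTTPS URL.
# Assumes the URL looks like https://<host>/<bucket>/<path>.
derive_formats <- function(url) {
  # Drop the scheme and host, leaving "bucket/path"
  bucket_path <- sub("^https://[^/]+/", "", url)
  list(
    url = url,
    s3 = paste0("s3://", bucket_path),
    gdal = sprintf('ZARR:"/vsicurl/%s"', url),
    gdals3 = sprintf('ZARR:"/vsis3/%s"', bucket_path)
  )
}

# The HTTPS URL printed above
url <- paste0(
  "https://s3.waw3-1.cloudferro.com/mdl-arco-time-041/arco/",
  "NWSHELF_ANALYSISFORECAST_BGC_004_002/",
  "cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m_202411/timeChunked.zarr"
)

formats <- derive_formats(url)
str(formats)
```

In practice the catalog columns already hold these strings, so a helper like this is only useful for a URL that is not in the catalog.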

Typical workflow

library(cmemsarco)

# Find your dataset
sla <- cmems_catalog_data |>
  dplyr::filter(grepl("SEALEVEL.*NRT", product_id)) |>
  cmems_latest()

# Grab the DSN (no setup needed)
dsn <- sla$timeChunked_gdal[1]
dsn
#> [1] "ZARR:\"/vsicurl/https://s3.waw3-1.cloudferro.com/mdl-arco-time-053/arco/SEALEVEL_EUR_PHY_L3_NRT_008_059/cmems_obs-sl_eur_phy-ssh_nrt_al-l3-duacs_PT1S_202311/timeChunked\""

Refreshing the catalog

The bundled catalog is a snapshot. To get the latest datasets:

#fresh <- cmems_catalog()

This walks the STAC API and takes a few minutes for all ~330 products.

Why this works

The CMEMS S3 buckets don’t allow LIST operations, but GDAL’s Zarr driver doesn’t need them. It reads the consolidated .zmetadata file at the root of the store to learn the array structure, then fetches only the chunks required for your read operation. No directory listings, no full downloads: just the bytes you need.
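The reason no listing is needed is that chunk object names are predictable from the chunk shape alone. The sketch below computes which chunk keys a read window would need. It is an illustration of the idea rather than GDAL's actual code path: chunk_keys() is a hypothetical helper, and it assumes Zarr v2 naming with "." as the dimension separator (some stores use "/" instead).

```r
# Chunk keys needed for a read window over a (time, lat, lon) array.
# `start` is the 1-based index of the first cell in each dimension,
# `count` the number of cells to read, `chunk_shape` the chunk size.
chunk_keys <- function(start, count, chunk_shape, sep = ".") {
  # Zero-based index of the first and last chunk along each dimension
  first <- (start - 1) %/% chunk_shape
  last <- (start + count - 2) %/% chunk_shape
  # Every combination of chunk indices across the dimensions
  grid <- expand.grid(Map(seq, first, last))
  unname(apply(grid, 1, function(idx) paste(idx, collapse = sep)))
}

# One time step (the 10th), a 40 x 30 cell window that straddles a chunk
# boundary in both lat and lon, with timeChunked-style chunks.
keys <- chunk_keys(
  start = c(10, 700, 500),
  count = c(1, 40, 30),
  chunk_shape = c(1, 720, 512)
)
print(keys)
```

For this window the keys are 9.0.0, 9.1.0, 9.0.1 and 9.1.1, so four objects cover the read. A client appends each key to the array's path inside the store and requests it directly, which is why no LIST operation is involved.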