Find nearest neighbors on the WGS84 ellipsoid using geodesic distance. These functions build an efficient vantage-point tree index for fast repeated queries.

geodesic_nn(dataset, query, k = 1L)

geodesic_nn_radius(dataset, query, radius)

Arguments

dataset

A matrix or vector of coordinates (lon, lat) for the dataset points. For a matrix, each row is a point. For a vector, it should be c(lon, lat) for a single point.

query

A matrix or vector of coordinates (lon, lat) for the query points. Same format as dataset.

k

Integer. The number of nearest neighbors to find.

radius

Numeric. The search radius in meters.

Value

For geodesic_nn(): A list with two matrices:

  • index: Integer matrix (k x n_queries) of 1-based indices into dataset

  • distance: Numeric matrix (k x n_queries) of geodesic distances in meters

For geodesic_nn_radius(): A list of data frames, one per query point, each containing:

  • index: Integer vector of 1-based indices into dataset

  • distance: Numeric vector of geodesic distances in meters

Details

These functions use the GeographicLib NearestNeighbor class, which implements a vantage-point tree optimized for geodesic distance calculations on the WGS84 ellipsoid.

The vantage-point tree provides O(log n) search complexity after O(n log n) construction time. For repeated queries against the same dataset, this is much more efficient than computing all pairwise distances.

Distances are computed using the exact geodesic inverse formula, not approximations like Haversine or Vincenty.

Examples

# Create a dataset of cities
cities <- cbind(
  lon = c(151.21, 144.96, 153.03, 115.86, 138.60),
  lat = c(-33.87, -37.81, -27.47, -31.95, -34.93)
)
rownames(cities) <- c("Sydney", "Melbourne", "Brisbane", "Perth", "Adelaide")

# Find 2 nearest neighbors for each city (including itself)
result <- geodesic_nn(cities, cities, k = 2)
result$index
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    2    5    1    5    2
result$distance
#>          [,1]     [,2]     [,3]    [,4]     [,5]
#> [1,]      0.0      0.0      0.0       0      0.0
#> [2,] 713809.5 653904.1 730609.3 2135437 653904.1

# Query points not in the dataset
queries <- cbind(
  lon = c(149.13, 147.32),
  lat = c(-35.28, -42.88)
)
rownames(queries) <- c("Canberra", "Hobart")

geodesic_nn(cities, queries, k = 3)
#> $index
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    2    1
#> [3,]    3    5
#> 
#> $distance
#>          [,1]      [,2]
#> [1,] 246750.8  597549.7
#> [2,] 467066.8 1055999.9
#> [3,] 941828.4 1160986.1
#> 

# Find all cities within 1000 km
geodesic_nn_radius(cities, queries, radius = 1e6)
#> [[1]]
#>   index distance
#> 1     1 246750.8
#> 2     2 467066.8
#> 3     3 941828.4
#> 4     5 960368.7
#> 
#> [[2]]
#>   index distance
#> 1     2 597549.7
#>