Handling spatial objects with GeoParquet

Recently, I found out about GeoParquet, a specification for storing geospatial vector data in the Apache Parquet format, which has implementations in both R and Python. This means that you can now write and read spatial objects without giving up the advantages of Parquet.

In R, you can run:

# library() attaches one package per call
library(geoarrow)
library(arrow)
library(sf)

sf_object |> arrow::write_parquet(path_to_file)

sf_file <- arrow::read_parquet(path_to_file, as_data_frame = FALSE) |> sf::st_as_sf()

In both cases (writing and reading), the geoarrow package handles the geometry column contained in the sf object. Leaving out as_data_frame = FALSE and sf::st_as_sf() when reading will most likely fail; apparently, this has to do with the way the sf package processes the input.

I was then able to import that file in Python by running:

import geoarrow.pyarrow as ga
import geoarrow.pyarrow.io

pa_table = ga.io.read_parquet(path_to_file)
gdf_object = ga.to_geopandas(pa_table)

which created a geopandas.GeoDataFrame object.