Handling spatial objects with GeoParquet
Recently, I found out about GeoParquet, a specification for storing spatial data in the Apache Parquet format, which has implementations in both R and Python. This means that you can now write and read spatial objects without giving up the advantages of the Parquet format.
In R, you can run:
# library() attaches one package per call
library(geoarrow)
library(arrow)
library(sf)
sf_object |> arrow::write_parquet(path_to_file)
sf_file <- arrow::read_parquet(path_to_file, as_data_frame = FALSE) |> sf::st_as_sf()
In both cases, the geoarrow package handles the geometry column contained in the sf object. Reading the file back without as_data_frame = FALSE and sf::st_as_sf() will most likely fail; apparently, this has to do with the way the sf package processes the input file.
I was then able to import that file in Python by running:
import geoarrow.pyarrow as ga
import geoarrow.pyarrow.io
pa_table = ga.io.read_parquet(path_to_file)
gdf_object = ga.to_geopandas(pa_table)
which created a geopandas.GeoDataFrame object.
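For context on what marks a Parquet file as spatial: the GeoParquet specification describes a JSON document stored under the geo key of the file's key-value metadata, naming the geometry columns and how they are encoded. The sketch below uses only the Python standard library and a made-up metadata string (the column name and geometry type are hypothetical) to show the required fields as I understand them from version 1.0.0 of the specification. Whether a file written by the R snippet above actually carries this key depends on the writer, so treat this as an illustration of the specification rather than of that exact file.

```python
import json

# Hypothetical "geo" metadata value. Field names follow GeoParquet 1.0.0;
# the column name and geometry type are made up for illustration.
GEO_METADATA_JSON = """
{
  "version": "1.0.0",
  "primary_column": "geometry",
  "columns": {
    "geometry": {"encoding": "WKB", "geometry_types": ["Point"]}
  }
}
"""


def summarize_geo_metadata(raw_json):
    """Check the required fields and map each geometry column to its encoding."""
    meta = json.loads(raw_json)
    # Top-level keys that the specification marks as required.
    for key in ("version", "primary_column", "columns"):
        if key not in meta:
            raise ValueError(f"missing required key: {key}")
    if meta["primary_column"] not in meta["columns"]:
        raise ValueError("primary_column is not listed under columns")
    encodings = {}
    for name, column in meta["columns"].items():
        # Per-column keys that the specification marks as required.
        for key in ("encoding", "geometry_types"):
            if key not in column:
                raise ValueError(f"column {name!r} is missing {key}")
        encodings[name] = column["encoding"]
    return encodings


summary = summarize_geo_metadata(GEO_METADATA_JSON)
print(summary)
```

With a real file, the raw JSON would come from the Parquet file's key-value metadata instead of a string literal.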