Expressions involving two geometry columns
Spatial polars has many expressions that can take two geometry columns from the same frame and compute a result row-wise, pairing the geometry in one column with the geometry in the other column of the same row. Because a spatial expression operates on a single input column, both geometry columns must first be combined into a struct before calling the spatial expression to perform the computation.
Note
This example makes use of the geodatasets Python package to access some spatial data easily.
Calling geodatasets.get_path() will download the specified data to the machine and return the path to the downloaded file. If the file has already been downloaded, it simply returns the path to the file.
See downloading and caching for further details.
To demonstrate, we'll start with the lake_boundary_df dataframe from the spatial join example:
import polars as pl
from spatial_polars import scan_spatial

lake_df = (
    scan_spatial("https://naciscdn.org/naturalearth/110m/physical/ne_110m_lakes.zip")
    .select("name", "geometry")
    .collect(engine="streaming")
)
print(f"There are {len(lake_df)} rows in lake_df")

boundary_df = (
    scan_spatial(
        "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
    )
    .select("SOVEREIGNT", "geometry")
    .collect(engine="streaming")
)

lake_boundary_df = (
    lake_df.spatial.join(
        other=boundary_df,
        how="inner",
        predicate="intersects",
        on="geometry",
        suffix="_boundary",
    )
    .select(
        pl.col("name"),
        pl.col("SOVEREIGNT"),
        pl.col("geometry"),
        pl.col("geometry_boundary"),
    )
    .sort("name")
)
print(lake_boundary_df)
Note
- For details about what's happening here, see the spatial join example.
At this point in lake_boundary_df, a lake that crosses a boundary is represented by more than one row, one for each boundary it intersects. In every such row, the "geometry" column holds the geometry of the entire lake and the "geometry_boundary" column holds the geometry of one of the boundaries, so nothing yet tells us which portion of the lake lies within which boundary.
To determine which part of each lake falls within which boundary, we can use the .intersection() expression to compute the portion of the lake that intersects the boundary. If we wanted to intersect ALL the lakes with a single polygon, we could use the other parameter of the .intersection() method, similar to how we used .distance() in the geometry column and scalar geometry input expression example. Here, though, we want to intersect the lake in the geometry column with the boundary in the geometry_boundary column row by row, so we add both the geometry and geometry_boundary columns to a struct and do not pass the other parameter. Spatial polars then computes, for each row, the part of the lake's geometry that intersects the boundary's geometry, essentially cutting the lakes where they cross the boundary.
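The difference between row-wise pairing and a single other geometry can be seen without downloading anything. The sketch below uses shapely 2.x directly; shapely is an assumption here (it does not appear in this example) and this is not how spatial polars is invoked. `shapely.intersection` works element-wise on arrays: given two equal-length arrays it pairs element i with element i, and given one geometry it intersects every element with that same geometry. The boxes are made-up stand-ins for one lake and two neighboring boundaries.

```python
import numpy as np
import shapely

# Made-up stand-ins: the same 4x2 "lake" appears in two rows, as after the join,
# and each row is paired with a different "boundary" covering one half of it.
lakes = np.array([shapely.box(0, 0, 4, 2), shapely.box(0, 0, 4, 2)], dtype=object)
boundaries = np.array([shapely.box(0, 0, 2, 2), shapely.box(2, 0, 4, 2)], dtype=object)

# Row-wise: lake i is intersected with boundary i, which cuts the lake
# where the two boundaries meet (x = 2).
rowwise_pieces = shapely.intersection(lakes, boundaries)

# Single "other" geometry: every lake is intersected with the same polygon.
other = shapely.box(0, 0, 1, 2)
scalar_pieces = shapely.intersection(lakes, other)

print(shapely.area(rowwise_pieces))
print(shapely.area(scalar_pieces))
```

In the row-wise case the two pieces together cover the whole lake, one piece per boundary, which is the result the struct-based expression is after.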
lake_boundary_map = (
    lake_boundary_df.filter(
        pl.col("SOVEREIGNT").is_in(["United States of America", "Canada"])  # (1)!
    )
    .with_columns(
        pl.struct(
            pl.col("geometry"),
            pl.col("geometry_boundary"),  # (2)!
        ).spatial.intersection()  # (3)!
    )
    .drop("geometry_boundary")  # (4)!
    .spatial.viz("geometry", polygon_kwargs={"auto_highlight": True})  # (5)!
)
lake_boundary_map
- Filter the dataframe to the rows whose sovereign is the USA or Canada (only so that the map we produce later covers an area with a lot of lakes crossing a boundary)
- Add the 'geometry' and 'geometry_boundary' columns to a struct
- Use the .spatial.intersection expression on the struct to compute, row-wise, the polygon that is the area common to the lake's geometry and the boundary's geometry for each lake/boundary pair
Note
Because we added the 'geometry' column to the struct first and did not alias the result of the expression, the result will overwrite the data in the geometry column. In this case that's what we want, but if you need to preserve the original geometry column, you will need to alias the result of the expression.
- Drop the geometry_boundary column (we don't need it anymore)
- Pass the dataframe to the .spatial.viz() method with auto_highlight=True in polygon_kwargs, so that a polygon on the map changes color when we move the mouse over it