Group by Expressions

Spatial Polars provides two expressions, .intersection_all and .union_all, that are designed to be used in a group_by context. Each one reduces all the geometries in a group to a single geometry.

Note

This example makes use of the geodatasets Python package to access some spatial data easily.

Calling geodatasets.get_path() will download the specified data to the machine and return the path to the downloaded file. If the file has already been downloaded, it will simply return the path to the file. See downloading and caching for further details.

To demonstrate the usage of these expressions, in the cell below we'll use polars to group some polygon data by the COUNTYFP10 column (an int column which holds a different code for each of the 9 counties in the data). For each group we'll compute the sum of one column (CE03_14), count the number of rows, and compute the union of all the polygon geometries. Finally, we'll show the result on a map.

Computing the union of all geometries in a group_by context
import geodatasets
from lonboard import Map
from palettable.colorbrewer.diverging import RdYlGn_11
import polars as pl

from spatial_polars import scan_spatial

nyc_earnings_grouped_layer = (
    scan_spatial(geodatasets.get_path("geoda.nyc_earnings"))  # (1)!
    .group_by(pl.col("COUNTYFP10"))  # (2)!
    .agg(
        pl.col("CE03_14").sum(),  # (3)!
        pl.len().alias("original_row_cnt"),  # (4)!
        pl.col("geometry").spatial.union_all(),  # (5)!
    )
    .collect(engine="streaming")  # (6)!
    .spatial.to_polygonlayer(  # (7)!
        auto_highlight=True,
        fill_cmap_col="CE03_14",
        fill_cmap_type="continuous",
        fill_cmap=RdYlGn_11,
    )
)

nyc_earnings_grouped_map = Map(layers=[nyc_earnings_grouped_layer])  # (8)!
nyc_earnings_grouped_map
  1. Scan the nyc_earnings dataset into a LazyFrame.
  2. Group the data by the COUNTYFP10 column.
  3. Sum the values in the CE03_14 column for each group.
  4. Count the number of rows of the input data that are in each group.
  5. Use the union_all expression to union all the geometries of the polygons that belong to each county.
  6. Collect the query with the streaming engine to create a DataFrame.
  7. Make a polygon layer symbolized by the summed CE03_14 values
  8. Display the layer on a lonboard map.
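
The difference between the two group-by expressions can be illustrated with a small toy model that needs no spatial libraries. This is only a conceptual sketch, not how Spatial Polars is implemented: each "polygon" is approximated as a set of grid cells, so a union of geometries becomes a set union and an intersection becomes a set intersection. The rows, county codes, and the name `summary` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows: (county code, value to sum, "geometry" as a set of grid cells).
rows = [
    (5, 10, {(0, 0), (0, 1)}),
    (5, 20, {(0, 1), (1, 1)}),
    (47, 7, {(5, 5)}),
    (47, 3, {(5, 5), (5, 6)}),
    (47, 1, {(5, 5), (6, 5)}),
]

# Group the rows by county code, mirroring group_by("COUNTYFP10").
groups = defaultdict(list)
for county, value, cells in rows:
    groups[county].append((value, cells))

# Aggregate each group, mirroring the sum / len / union_all calls in agg().
summary = {}
for county, members in groups.items():
    cell_sets = [cells for _, cells in members]
    summary[county] = {
        "value_sum": sum(value for value, _ in members),
        "original_row_cnt": len(members),
        # union_all analogue: every cell covered by at least one geometry
        "union_all": set().union(*cell_sets),
        # intersection_all analogue: only cells covered by every geometry
        "intersection_all": set.intersection(*cell_sets),
    }

for county, result in sorted(summary.items()):
    print(county, result["value_sum"], result["original_row_cnt"],
          sorted(result["union_all"]), sorted(result["intersection_all"]))
```

In this model the union grows as more geometries join a group, while the intersection can only shrink. With real, non-overlapping polygons such as census blocks, an intersection over a whole group would typically be empty, which is why a union is the natural choice for dissolving them into county outlines.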