ps.with_columns is a drop-in replacement for Polars' with_columns that also handles additional use cases, such as functions that need to peek at the full data before they can be evaluated. It works efficiently on both DataFrame and LazyFrame.
Details
The key idea is FrameExpr: an expression that needs a peek at the data (the schema or a small aggregation) before it resolves into a regular Polars expression. This unlocks operations such as deriving Enum categories from the data, lumping rare levels, or reordering factor levels by a summary statistic, while the rest of your pipeline stays lazy.
How FrameExpr stays efficient
ps.with_columns resolves each FrameExpr in two phases. First, it runs a small aggregation (e.g. unique().sort() to discover categories) against the current lazy plan. Any preceding .filter() or .select() is already part of that plan, so Polars' predicate and projection pushdown keep the peek cheap. Second, it uses the result to build a concrete pl.Expr (e.g. .cast(pl.Enum(["a", "b", "c"]))), which goes back into the lazy plan and executes normally.
# Only the filtered rows are scanned for category discovery;
# the cast itself remains lazy.
lf = pl.scan_parquet("events.parquet")
result = (
    lf.filter(pl.col("country") == "DE")
    .ps.with_columns(pl.col("status").ps_enum.make())
    .filter(pl.col("status") == "active")
    .collect()
)
See the FrameExpr docstring for the full explanation, including when the peek is larger and notes on parallel evaluation.
Dev Notes
To build the documentation, run:
uv run quarto render
and then, in a separate terminal:
uv run quarto preview
To re-render README.md, run:
quarto render README.qmd --to gfm
To upload to PyPI, run:
uv build
uv publish
Acknowledgements
This package stands on the shoulders of several excellent projects:
The tidyverse team for establishing the tidy data philosophy and the vocabulary that shapes this package’s design.
Hadley Wickham and the forcats authors for the factor-manipulation functions that directly inspired the ps_enum namespace.
David Hugh-Jones for santoku, which inspired the ps_chop functions.
Allison Horst, Alison Hill, and Kristen Gorman for the palmerpenguins dataset used in the examples and walkthrough.