Overview
Frame helpers
Methods on the .ps namespace, available on both DataFrame and LazyFrame.
df.ps.with_columns
Like df.with_columns, but also accepts FrameExpr and multi-column selectors.
df.ps.select
Like df.select, but also accepts FrameExpr.
Expr.ps.apply
Apply a custom function with full LazyFrame context.
ps_enum — Enum column helpers
Methods on the .ps_enum expression namespace for working with categorical / Enum columns.
ps_chop — Binning helpers
Methods on the .ps_chop expression namespace for cutting a column into labelled intervals.
ps_str — String column helpers
Methods on the .ps_str expression namespace.
Internals
Building blocks for writing custom FrameExpr.
FrameExpr
An expression that requires a LazyFrame context to resolve into a list of pl.Expr.
df.ps.with_columns
df.ps.with_columns(
    *exprs,
    **named_exprs,
)
Like df.with_columns, but also accepts FrameExpr and multi-column selectors.
*exprs = ()
**named_exprs = {}
Examples:
import polars as pl
import polarstation

animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.reorder(by="weight")
)
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ dog ┆ 12.2 │
│ null ┆ 7.5 │
│ bird ┆ 0.5 │
│ cow ┆ 460.0 │
│ bird ┆ null │
└────────┴────────┘
df.ps.select
df.ps.select(
    *exprs,
    **named_exprs,
)
Like df.select, but also accepts FrameExpr.
*exprs = ()
**named_exprs = {}
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.select(pl.col("animal").ps_enum.make(), "weight")
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ dog ┆ 12.2 │
│ null ┆ 7.5 │
│ bird ┆ 0.5 │
│ cow ┆ 460.0 │
│ bird ┆ null │
└────────┴────────┘
Expr.ps.apply
Apply a custom function with full LazyFrame context.
fn Callable[[pl.LazyFrame, str], pl.Expr]
Called as fn(lf, col_name) → pl.Expr for each matched column.
Examples:
def center_scale(lf: pl.LazyFrame, col: str) -> pl.Expr:
    # Peek at the data: collect the column's mean and standard deviation.
    stats = lf.select(
        pl.col(col).mean().alias("m"), pl.col(col).std().alias("s")
    ).collect()
    m, s = stats["m"][0], stats["s"][0]
    # Bake the collected constants into an ordinary expression.
    return ((pl.col(col) - m) / s).alias(col)

pl.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]}).ps.with_columns(
    pl.col("x").ps.apply(center_scale)
)
shape: (5, 1)
┌───────────┐
│ x │
│ --- │
│ f64 │
╞═══════════╡
│ -1.264911 │
│ -0.632456 │
│ 0.0 │
│ 0.632456 │
│ 1.264911 │
└───────────┘
import math

df = pl.DataFrame({
    "doc_id": [1, 1, 2, 2, 2],
    "term": ["cat", "dog", "cat", "cat", "bird"],
})

def idf(lf: pl.LazyFrame, col: str) -> pl.Expr:
    # Total number of distinct documents.
    n = lf.select(pl.col("doc_id").n_unique()).collect().item()
    # Number of distinct documents that contain each term.
    freq = lf.group_by(col).agg(
        pl.col("doc_id").n_unique().alias("n")
    ).collect()
    scores = {r[col]: math.log(n / r["n"]) for r in freq.iter_rows(named=True)}
    return pl.col(col).replace_strict(
        list(scores), list(scores.values()), return_dtype=pl.Float64
    )

df.ps.with_columns(pl.col("term").ps.apply(idf).alias("idf"))
shape: (5, 3)
┌────────┬──────┬──────────┐
│ doc_id ┆ term ┆ idf │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 │
╞════════╪══════╪══════════╡
│ 1 ┆ cat ┆ 0.0 │
│ 1 ┆ dog ┆ 0.693147 │
│ 2 ┆ cat ┆ 0.0 │
│ 2 ┆ cat ┆ 0.0 │
│ 2 ┆ bird ┆ 0.693147 │
└────────┴──────┴──────────┘
ps_enum — Enum column helpers
Expr.ps_enum.make
Expr.ps_enum.make(
    categories=None,
    make_null=(),
)
Cast a string column to Enum, optionally deriving categories from the data.
categories Sequence[str] | None
Fixed set of allowed values. If omitted, derived from the data as the unique values in alphabetical order.
make_null Sequence[str] | str = ()
Values to replace with null before casting.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(pl.col("animal").ps_enum.make())
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ dog ┆ 12.2 │
│ null ┆ 7.5 │
│ bird ┆ 0.5 │
│ cow ┆ 460.0 │
│ bird ┆ null │
└────────┴────────┘
pl.DataFrame({"x": ["a", "b", "?"]}).ps.with_columns(
    pl.col("x").ps_enum.make(categories=["a", "b", "z"], make_null="?")
)['x'].dtype
Enum(categories=['a', 'b', 'z'])
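The category-derivation rule described above can be modelled in plain Python. This is an illustrative sketch only, not the library's implementation: derive_enum is a hypothetical helper, lists stand in for columns, and values that fall outside an explicit category list are not handled because the section does not say what happens to them.

```python
from typing import Optional, Sequence, Union


def derive_enum(
    values: Sequence[Optional[str]],
    categories: Optional[Sequence[str]] = None,
    make_null: Union[Sequence[str], str] = (),
) -> tuple[list[str], list[Optional[str]]]:
    """Plain-Python model of the documented rule (hypothetical helper)."""
    # A single string is treated as one value to nullify.
    null_set = {make_null} if isinstance(make_null, str) else set(make_null)
    cleaned = [None if v in null_set else v for v in values]
    if categories is None:
        # Derive categories: unique non-null values in alphabetical order.
        categories = sorted({v for v in cleaned if v is not None})
    return list(categories), cleaned


cats, vals = derive_enum(["dog", None, "bird", "cow", "bird"])
print(cats)  # ['bird', 'cow', 'dog']
```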
Expr.ps_enum.lump
Expr.ps_enum.lump(
    n=5,
    other_label='Other',
    lump_fn=None,
)
Collapse infrequent categories into other_label.
By default keeps the top-n most frequent categories and collapses the rest. Pass lump_fn to use a custom rule instead (in which case n is ignored).
The order of the categories remains unchanged with other_label appended at the end.
The function also accepts a String or Categorical column, in addition to Enum, in which case ps_enum.make() is called first.
n int = 5
Number of categories to keep (ignored when lump_fn is provided).
other_label str = 'Other'
Label for the collapsed category.
lump_fn Callable[[pl.DataFrame], Iterable[bool]] | None
Optional callable that receives the non-null counts DataFrame (columns: category column + "n", sorted by frequency descending) and returns a boolean sequence where True marks categories to collapse.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.lump(n=1)
)
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ Other ┆ 12.2 │
│ null ┆ 7.5 │
│ bird ┆ 0.5 │
│ Other ┆ 460.0 │
│ bird ┆ null │
└────────┴────────┘
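The default top-n rule can be sketched without Polars to make the bookkeeping explicit. The helper name lump_top_n is hypothetical, and two details are assumptions the section does not settle: how ties in frequency are broken, and that other_label is only added when at least one category is actually collapsed.

```python
from collections import Counter
from typing import Optional, Sequence


def lump_top_n(
    values: Sequence[Optional[str]],
    categories: Sequence[str],
    n: int = 5,
    other_label: str = "Other",
) -> tuple[list[str], list[Optional[str]]]:
    """Keep the n most frequent categories and collapse the rest."""
    counts = Counter(v for v in values if v is not None)
    # most_common sorts by count descending; ties keep first-seen order (assumption).
    keep = {cat for cat, _ in counts.most_common(n)}
    # Kept categories stay in their original order; other_label goes last.
    new_cats = [c for c in categories if c in keep]
    if len(new_cats) < len(categories):
        new_cats.append(other_label)
    # Nulls pass through untouched.
    new_vals = [v if v is None or v in keep else other_label for v in values]
    return new_cats, new_vals


cats, vals = lump_top_n(
    ["dog", None, "bird", "cow", "bird"], ["bird", "cow", "dog"], n=1
)
print(cats)  # ['bird', 'Other']
print(vals)  # ['Other', None, 'bird', 'Other', 'bird']
```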
Expr.ps_enum.relabel
Expr.ps_enum.relabel(
    mapping,
    strict=True,
)
Rename categories, leaving any not present in the mapping unchanged.
The function also accepts a String or Categorical column, in addition to Enum, in which case ps_enum.make() is called first.
mapping Mapping[str, str] | Callable[[str], str]
A dict of old → new names, or a callable applied to each category name.
strict bool = True
If True (default), raise if any dict key is not an existing category.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.relabel({"bird": "Bird", "cow": "Cow"})
)
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ dog ┆ 12.2 │
│ null ┆ 7.5 │
│ Bird ┆ 0.5 │
│ Cow ┆ 460.0 │
│ Bird ┆ null │
└────────┴────────┘
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.relabel(str.upper)
)
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ DOG ┆ 12.2 │
│ null ┆ 7.5 │
│ BIRD ┆ 0.5 │
│ COW ┆ 460.0 │
│ BIRD ┆ null │
└────────┴────────┘
Expr.ps_enum.rev
Reverse the order of categories.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.rev()
)["animal"].dtype
Enum(categories=['dog', 'cow', 'bird'])
Expr.ps_enum.infreq
Expr.ps_enum.infreq(
    descending=False,
)
Reorder categories by frequency, most frequent first.
The function also accepts a String or Categorical column, in addition to Enum, in which case ps_enum.make() is called first.
descending bool = False
If True, least frequent first instead.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.infreq()
)["animal"].dtype
Enum(categories=['bird', 'cow', 'dog'])
Expr.ps_enum.reorder
Expr.ps_enum.reorder(
    by,
    agg=pl.Expr.median,
    descending=False,
    nulls_last=False,
    missing='drop',
)
Reorder categories by an aggregation of one or more columns within each group.
The function also accepts a String or Categorical column, in addition to Enum, in which case ps_enum.make() is called first.
by IntoExpr | Iterable[IntoExpr]
Column(s) to aggregate per category for ordering. Strings are treated as column names.
agg Callable[[pl.Expr], pl.Expr] = pl.Expr.median
Aggregation applied to each by column (default: median).
descending bool | Sequence[bool] = False
Sort descending. A single bool applies to all columns.
nulls_last bool | Sequence[bool] = False
Place null aggregates last. A single bool applies to all columns.
missing Literal['drop', 'last', 'first'] = 'drop'
How to handle categories whose aggregate is null — ‘drop’ excludes them, ‘last’ appends them, ‘first’ prepends them.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.reorder("weight", agg=pl.Expr.mean)
)["animal"].dtype
Enum(categories=['bird', 'dog', 'cow'])
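To see why the result above is bird, dog, cow, the ordering rule can be traced in plain Python. This is a sketch under assumptions: reorder_categories is a hypothetical helper, a single by column is modelled, nulls are skipped when aggregating (as Polars aggregations do), and only missing='drop' is covered.

```python
from statistics import mean, median
from typing import Callable, Optional, Sequence


def reorder_categories(
    cats: Sequence[str],
    keys: Sequence[Optional[str]],
    by: Sequence[Optional[float]],
    agg: Callable[[list[float]], float] = median,
    descending: bool = False,
) -> list[str]:
    """Order categories by an aggregate of `by` within each category."""
    groups: dict[str, list[float]] = {c: [] for c in cats}
    for k, x in zip(keys, by):
        if k is not None and x is not None:
            groups[k].append(x)
    # A category with no non-null values has a null aggregate;
    # missing='drop' excludes it from the result.
    scored = [(agg(xs), c) for c, xs in groups.items() if xs]
    scored.sort(key=lambda t: t[0], reverse=descending)
    return [c for _, c in scored]


order = reorder_categories(
    ["bird", "cow", "dog"],
    ["dog", None, "bird", "cow", "bird"],
    [12.2, 7.5, 0.5, 460.0, None],
    agg=mean,
)
print(order)  # ['bird', 'dog', 'cow']
```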
Expr.ps_enum.set_categories
Expr.ps_enum.set_categories(
    categories,
)
Set the exact category list. Values not in categories become null.
categories Sequence[str]
The new ordered category list.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.set_categories(["cow", "dog"])
)
shape: (5, 2)
┌────────┬────────┐
│ animal ┆ weight │
│ --- ┆ --- │
│ enum ┆ f64 │
╞════════╪════════╡
│ dog ┆ 12.2 │
│ null ┆ 7.5 │
│ null ┆ 0.5 │
│ cow ┆ 460.0 │
│ null ┆ null │
└────────┴────────┘
Expr.ps_enum.add_categories
Expr.ps_enum.add_categories(
    categories,
    after=float('inf'),
)
Insert new categories without changing any values.
categories Sequence[str]
New category labels to add.
after int | float = float('inf')
Insert after this 0-based index. Defaults to appending at the end.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    pl.col("animal").ps_enum.make().ps_enum.add_categories(["rabbit"], after=1)
)["animal"].dtype
Enum(categories=['bird', 'cow', 'rabbit', 'dog'])
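The 0-based after index is easy to get off by one, so here is a small list-based model of the insertion. insert_categories is a hypothetical helper and only mirrors the behaviour described above; how the library treats negative indices is not covered.

```python
import math
from typing import Sequence, Union


def insert_categories(
    cats: Sequence[str],
    new: Sequence[str],
    after: Union[int, float] = math.inf,
) -> list[str]:
    """Insert `new` after the 0-based position `after`."""
    # Infinity (or any index at or past the last position) appends at the end.
    pos = len(cats) if after >= len(cats) - 1 else int(after) + 1
    return list(cats[:pos]) + list(new) + list(cats[pos:])


print(insert_categories(["bird", "cow", "dog"], ["rabbit"], after=1))
# ['bird', 'cow', 'rabbit', 'dog']
```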
Expr.ps_enum.drop_unused
Expr.ps_enum.drop_unused()
Remove categories that don’t appear in the data, preserving order.
Examples:
df = pl.DataFrame(
    {'x': pl.Series('x', ['bird', 'bird'], dtype=pl.Enum(['fish', 'bird', 'cat']))}
)
df.ps.with_columns(pl.col("x").ps_enum.drop_unused())["x"].dtype
Enum(categories=['bird'])
Expr.ps_enum.missing_to_category
Expr.ps_enum.missing_to_category(
    name,
)
Convert null values into a new category name, appended at the end.
If name is already a category, null values are mapped to the existing category without modifying the category list.
name str
Label for the category to assign to null values.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    new_animals=pl.col("animal").ps_enum.make().ps_enum.missing_to_category("unknown")
)
shape: (5, 3)
┌────────┬────────┬─────────────┐
│ animal ┆ weight ┆ new_animals │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ enum │
╞════════╪════════╪═════════════╡
│ dog ┆ 12.2 ┆ dog │
│ null ┆ 7.5 ┆ unknown │
│ bird ┆ 0.5 ┆ bird │
│ cow ┆ 460.0 ┆ cow │
│ bird ┆ null ┆ bird │
└────────┴────────┴─────────────┘
Expr.ps_enum.category_to_missing
Expr.ps_enum.category_to_missing(
    name,
)
Convert all occurrences of one or more categories to null and remove them from the Enum.
name str | Sequence[str]
Category name(s) to nullify. Raises if any are not current categories.
Examples:
animals = polarstation.make_example_data("animals")
animals.ps.with_columns(
    new_animals=pl.col("animal").ps_enum.make().ps_enum.category_to_missing("bird")
)
shape: (5, 3)
┌────────┬────────┬─────────────┐
│ animal ┆ weight ┆ new_animals │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ enum │
╞════════╪════════╪═════════════╡
│ dog ┆ 12.2 ┆ dog │
│ null ┆ 7.5 ┆ null │
│ bird ┆ 0.5 ┆ null │
│ cow ┆ 460.0 ┆ cow │
│ bird ┆ null ┆ null │
└────────┴────────┴─────────────┘
ps_chop — Binning helpers
Expr.ps_chop.chop
Expr.ps_chop.chop(
    breaks,
    labels=None,
    left_closed=True,
    fmt=None,
    extend=True,
    return_struct=False,
)
Cut into intervals at explicit breakpoints.
Returns an Enum-typed column whose category names are the bin labels. Integer columns use fully-closed [a, b] notation; single-element bins are written as {x}.
breaks Sequence[Any]
Interior breakpoints; sorted automatically. Accepts numeric, string, or temporal Python values (datetime, date, timedelta, time).
labels Sequence[str] | None
Category labels (must contain exactly len(breaks) + 1 entries). Auto-generated if omitted.
left_closed bool = True
If True (default), intervals are [lo, hi); otherwise (lo, hi].
fmt str | Callable | None
Formatter for auto-generated labels. For numeric, a format-spec string (e.g. “.2f”) or callable. For temporal, a callable or None (uses str()).
extend bool = True
Numeric columns only. If True (default), the outermost labels extend to -∞ / +∞ (0 / +∞ for unsigned integers). If False, they use the data minimum and maximum. Temporal breaks always use the data bounds, regardless of this setting.
return_struct bool = False
If True, return a struct {lo, hi} instead of just the label.
Examples:
scores = polarstation.make_example_data("scores")
scores.ps.with_columns(
    pl.col("score").ps_chop.chop([40, 70], fmt=".0f").alias("grade")
)
shape: (7, 2)
┌───────┬──────────┐
│ score ┆ grade │
│ --- ┆ --- │
│ i64 ┆ enum │
╞═══════╪══════════╡
│ 12 ┆ (-∞, 39] │
│ 45 ┆ [40, 69] │
│ 67 ┆ [40, 69] │
│ 89 ┆ [70, +∞) │
│ 95 ┆ [70, +∞) │
│ 23 ┆ (-∞, 39] │
│ 78 ┆ [70, +∞) │
└───────┴──────────┘
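The integer label notation in the output above follows from treating [lo, hi) on integers as the closed range [lo, hi - 1]. The sketch below reproduces the labels and the bin assignment for the default left_closed=True, extend=True case. Both helper names are hypothetical, and the code is a model of the documented output, not the library's own logic.

```python
from bisect import bisect_right
from typing import Sequence


def int_chop_labels(breaks: Sequence[int]) -> list[str]:
    """Labels for an integer column with left_closed=True and extend=True."""
    b = sorted(breaks)
    labels = [f"(-∞, {b[0] - 1}]"]
    for lo, hi in zip(b, b[1:]):
        # [lo, hi) on integers is the closed range [lo, hi - 1];
        # a bin holding a single value is written {x}.
        labels.append(f"{{{lo}}}" if hi - lo == 1 else f"[{lo}, {hi - 1}]")
    labels.append(f"[{b[-1]}, +∞)")
    return labels


def int_chop(values: Sequence[int], breaks: Sequence[int]) -> list[str]:
    b = sorted(breaks)
    labels = int_chop_labels(b)
    # bisect_right counts the breaks <= v, which is the index of v's [lo, hi) bin.
    return [labels[bisect_right(b, v)] for v in values]


print(int_chop([12, 45, 67, 89, 95, 23, 78], [40, 70]))
```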
Expr.ps_chop.width
Expr.ps_chop.width(
    size,
    start=None,
    labels=None,
    left_closed=True,
    fmt=None,
    extend=False,
    return_struct=False,
)
Chop into equal-width bins of given size.
Returns an Enum-typed column whose category names are the bin labels.
size float | _dt.timedelta
Width of each bin. For numeric columns, a number. For temporal columns, a datetime.timedelta.
start Any | None
Left edge of the first bin. Defaults to the column minimum (or 0 for unsigned integer columns).
labels Sequence[str] | None
Category labels. Auto-generated if omitted.
left_closed bool = True
If True (default), intervals are [lo, hi); otherwise (lo, hi].
fmt str | Callable | None
Formatter for auto-generated labels. For numeric, a format-spec string or callable; defaults to “g”. For temporal, a callable or None (uses str()).
extend bool = False
If True, extend outermost labels to -∞ / +∞. If False (default), the first label opens at the anchor and the last closes at anchor + n_bins * size.
return_struct bool = False
If True, return a struct instead of just the label.
Examples:
scores = polarstation.make_example_data("scores")
scores.ps.with_columns(pl.col("score").ps_chop.width(25).alias("band"))
shape: (7, 2)
┌───────┬──────────┐
│ score ┆ band │
│ --- ┆ --- │
│ i64 ┆ enum │
╞═══════╪══════════╡
│ 12 ┆ [12, 36] │
│ 45 ┆ [37, 61] │
│ 67 ┆ [62, 86] │
│ 89 ┆ [87, 95] │
│ 95 ┆ [87, 95] │
│ 23 ┆ [12, 36] │
│ 78 ┆ [62, 86] │
└───────┴──────────┘
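Which band a value lands in comes down to integer division from the anchor. The following sketch computes only the bin index for the left_closed=True case and does not attempt to reproduce the label text. width_bin_index is a hypothetical helper.

```python
from typing import Optional, Sequence


def width_bin_index(
    values: Sequence[int], size: int, start: Optional[int] = None
) -> list[int]:
    """0-based index of the equal-width bin each value falls into."""
    # start defaults to the column minimum, as documented for signed columns.
    anchor = min(values) if start is None else start
    # With left_closed=True, bin k covers [anchor + k*size, anchor + (k+1)*size).
    return [(v - anchor) // size for v in values]


print(width_bin_index([12, 45, 67, 89, 95, 23, 78], size=25))
# [0, 1, 2, 3, 3, 0, 2]
```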
Expr.ps_chop.n_elements
Expr.ps_chop.n_elements(
    n,
    tail='split',
    labels=None,
    left_closed=True,
    fmt='g',
    extend=False,
    return_struct=False,
)
Chop into groups of n observations each.
Returns an Enum-typed column whose category names are the bin labels. Boundaries are drawn after every nth element (sorted order). Ties are never split — the boundary advances to the next distinct value if needed.
n int
Number of observations per group.
tail Literal['split', 'merge'] = 'split'
What to do when the total doesn’t divide evenly. “split” (default) keeps the smaller final group; “merge” absorbs it into the preceding group.
labels Sequence[str] | None
Category labels. Auto-generated if omitted.
left_closed bool = True
If True (default), intervals are [lo, hi); otherwise (lo, hi].
fmt str | Callable[[float], str] = 'g'
Number formatter for auto-generated labels (numeric columns only).
extend bool = False
If True, extend outermost labels to -∞ / +∞ (or 0 / +∞ for unsigned integers). If False (default), the first label opens at the data minimum and the last closes at the data maximum.
return_struct bool = False
If True, return a struct instead of just the label.
Examples:
scores = polarstation.make_example_data("scores")
scores.ps.with_columns(pl.col("score").ps_chop.n_elements(3).alias("tercile"))
shape: (7, 2)
┌───────┬──────────┐
│ score ┆ tercile │
│ --- ┆ --- │
│ i64 ┆ enum │
╞═══════╪══════════╡
│ 12 ┆ [12, 66] │
│ 45 ┆ [12, 66] │
│ 67 ┆ [67, 94] │
│ 89 ┆ [67, 94] │
│ 95 ┆ {95} │
│ 23 ┆ [12, 66] │
│ 78 ┆ [67, 94] │
└───────┴──────────┘
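The boundary rule (a break after every nth sorted element, moved forward so that ties are never split) can be sketched as follows. n_elements_breaks is a hypothetical helper that models tail='split' only, and counting the next group from the adjusted boundary is an assumption the section does not confirm.

```python
from typing import Sequence


def n_elements_breaks(values: Sequence[int], n: int) -> list[int]:
    """Lower edge of every group after the first."""
    xs = sorted(values)
    breaks: list[int] = []
    i = n
    while i < len(xs):
        # Never split ties: move the boundary forward to the next distinct value.
        while i < len(xs) and xs[i] == xs[i - 1]:
            i += 1
        if i < len(xs):
            breaks.append(xs[i])
        # Assumption: the next group is counted from the adjusted boundary.
        i += n
    return breaks


print(n_elements_breaks([12, 45, 67, 89, 95, 23, 78], 3))  # [67, 95]
print(n_elements_breaks([1, 1, 1, 2], 2))  # [2]
```

For the scores column the breaks 67 and 95 give the groups labelled [12, 66], [67, 94] and {95} in the output above.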
Expr.ps_chop.n_groups
Expr.ps_chop.n_groups(
    k,
    labels=None,
    left_closed=True,
    fmt=None,
    raw=True,
    extend=False,
    return_struct=False,
)
Chop into k equal-count groups (by quantile boundaries).
Returns an Enum-typed column whose category names are the bin labels.
k int
Number of groups.
labels Sequence[str] | None
Category labels (must contain exactly k entries). Auto-generated if omitted.
left_closed bool = True
If True (default), intervals are [lo, hi); otherwise (lo, hi].
fmt str | Callable | None
Formatter for auto-generated labels. For numeric, defaults to “g” when raw=True and “.0%” when raw=False. For temporal, a callable or None.
raw bool = True
If True (default), label with the actual break values. If False, use percentage labels (e.g. [0%, 25%)). Ignored for temporal columns.
extend bool = False
If True, extend the outermost labels to -∞ / +∞ (0 / +∞ for unsigned columns). Only affects numeric columns with raw=True. Defaults to False.
return_struct bool = False
If True, return a struct instead of just the label.
Examples:
scores = polarstation.make_example_data("scores")
scores.ps.with_columns(pl.col("score").ps_chop.n_groups(3).alias("tertile"))
shape: (7, 2)
┌───────┬──────────┐
│ score ┆ tertile │
│ --- ┆ --- │
│ i64 ┆ enum │
╞═══════╪══════════╡
│ 12 ┆ [12, 44] │
│ 45 ┆ [45, 77] │
│ 67 ┆ [45, 77] │
│ 89 ┆ [78, 95] │
│ 95 ┆ [78, 95] │
│ 23 ┆ [12, 44] │
│ 78 ┆ [78, 95] │
└───────┴──────────┘
Expr.ps_chop.quantiles
Expr.ps_chop.quantiles(
    probs,
    labels=None,
    left_closed=True,
    fmt=None,
    raw=False,
    extend=False,
    return_struct=False,
)
Chop at quantile boundaries.
Returns an Enum-typed column whose category names are the bin labels.
probs Sequence[float]
Quantile probabilities in (0, 1), e.g. [0.25, 0.5, 0.75] for quartiles.
labels Sequence[str] | None
Category labels (must contain exactly len(probs) + 1 entries). Auto-generated if omitted.
left_closed bool = True
If True (default), intervals are [lo, hi); otherwise (lo, hi].
fmt str | Callable | None
Formatter for auto-generated labels. For numeric, defaults to “.0%” (percentages) when raw=False and “g” when raw=True. For temporal, a callable or None (uses str()).
raw bool = False
If True, label with the actual break values instead of percentages. Ignored for temporal columns (always uses actual values).
extend bool = False
If True, extend the outermost labels to -∞ / +∞ (0 / +∞ for unsigned columns). Only affects numeric columns with raw=True. Defaults to False.
return_struct bool = False
If True, return a struct instead of just the label.
Examples:
scores = polarstation.make_example_data("scores")
scores.ps.with_columns(
    pl.col("score").ps_chop.quantiles([0.25, 0.75]).alias("iqr_group")
)
shape: (7, 2)
┌───────┬─────────────┐
│ score ┆ iqr_group │
│ --- ┆ --- │
│ i64 ┆ enum │
╞═══════╪═════════════╡
│ 12 ┆ [0%, 25%) │
│ 45 ┆ [25%, 75%) │
│ 67 ┆ [25%, 75%) │
│ 89 ┆ [75%, 100%] │
│ 95 ┆ [75%, 100%] │
│ 23 ┆ [0%, 25%) │
│ 78 ┆ [25%, 75%) │
└───────┴─────────────┘
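The percentage labels and the assignment in the output above can be reproduced with the standard library. This is a sketch under assumptions: the three helper names are hypothetical, and the quantiles are computed with linear interpolation, which matches the example but may not be the method the library uses.

```python
from bisect import bisect_right
from typing import Sequence


def quantile_linear(sorted_xs: Sequence[float], p: float) -> float:
    # Linear interpolation between neighbouring order statistics (assumption).
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def percent_labels(probs: Sequence[float], fmt: str = ".0%") -> list[str]:
    edges = [0.0, *sorted(probs), 1.0]
    labels = [
        f"[{format(a, fmt)}, {format(b, fmt)})" for a, b in zip(edges, edges[1:])
    ]
    # The last interval is closed on the right so the maximum is included.
    labels[-1] = labels[-1][:-1] + "]"
    return labels


def chop_quantiles(values: Sequence[float], probs: Sequence[float]) -> list[str]:
    xs = sorted(values)
    cuts = [quantile_linear(xs, p) for p in sorted(probs)]
    labels = percent_labels(probs)
    # With left_closed=True a value equal to a cut belongs to the upper bin.
    return [labels[bisect_right(cuts, v)] for v in values]


print(chop_quantiles([12, 45, 67, 89, 95, 23, 78], [0.25, 0.75]))
```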
ps_str — String column helpers
Expr.ps_str.count
Expr.ps_str.count(
    pattern='',
)
Count non-overlapping regex matches in each string.
Deprecated: thin wrapper around pl.Expr.str.count_matches; likely to be removed.
pattern = ''
Examples:
pl.DataFrame({"x": ["hello world", "foo bar baz", ""]}).select(
    pl.col("x").ps_str.count(r"\b\w+\b").alias("word_count")
)
shape: (3, 1)
┌────────────┐
│ word_count │
│ --- │
│ u32 │
╞════════════╡
│ 2 │
│ 3 │
│ 0 │
└────────────┘
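As a quick cross-check of the counts above, the same pattern can be run through Python's re module. Polars uses a different regex engine (the Rust regex crate), so the two can disagree on some syntax; for this simple word pattern they agree. count_matches here is a local helper, not the Polars method.

```python
import re


def count_matches(text: str, pattern: str) -> int:
    # re.finditer yields non-overlapping matches, scanning left to right.
    return sum(1 for _ in re.finditer(pattern, text))


counts = [count_matches(s, r"\b\w+\b") for s in ["hello world", "foo bar baz", ""]]
print(counts)  # [2, 3, 0]
```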
Expr.ps_str.wrap
Expr.ps_str.wrap(
    width=80,
    initial_indent=0,
    subsequent_indent=0,
    break_on_hyphens=True,
    **kwargs,
)
Wrap each string to at most width characters per line.
width int = 80
Maximum line length.
initial_indent int = 0
Number of spaces prepended to the first line.
subsequent_indent int = 0
Number of spaces prepended to every subsequent line.
break_on_hyphens bool = True
Allow breaks at hyphens in compound words.
**kwargs = {}
Examples:
text = pl.DataFrame({"x": ["A long sentence that exceeds the column width."]}).select(
    pl.col("x").ps_str.wrap(width=25)
)['x'].to_list()
text
['A long sentence that\nexceeds the column width.']
Expr.ps_str.trunc
Expr.ps_str.trunc(
    width=5,
    side='right',
    placeholder='…',
)
Truncate each string to fit within width characters.
Collapses whitespace and appends placeholder when the text is cut.
width int = 5
Maximum length of the result, including the placeholder.
side Literal['right', 'left', 'center'] = 'right'
Which side to truncate — ‘right’ (default), ‘left’, or ‘center’.
placeholder str = '…'
String inserted where the text is cut.
Examples:
pl.DataFrame({"x": ["short", "a much longer string"]}).select(
    pl.col("x").ps_str.trunc(width=10)
)
shape: (2, 1)
┌────────────┐
│ x │
│ --- │
│ str │
╞════════════╡
│ short │
│ a much lo… │
└────────────┘
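A plain-Python model of the truncation rule shows how width accounts for the placeholder. The 'right' branch reproduces the output above. The 'left' and 'center' branches are assumptions about where the kept characters come from, and stripping leading and trailing whitespace while collapsing is also assumed. This local trunc function is a sketch, not the library's code.

```python
def trunc(text: str, width: int = 5, side: str = "right", placeholder: str = "…") -> str:
    """Cut `text` so the result, placeholder included, is at most `width` long."""
    # Collapse runs of whitespace to single spaces first.
    text = " ".join(text.split())
    if len(text) <= width:
        return text
    keep = max(width - len(placeholder), 0)  # original characters that survive
    if side == "right":
        return text[:keep] + placeholder
    if side == "left":
        return placeholder + text[len(text) - keep:]
    if side == "center":
        # Assumption: the extra character, if any, goes to the head.
        head = (keep + 1) // 2
        tail = keep - head
        return text[:head] + placeholder + text[len(text) - tail:]
    raise ValueError(f"unknown side: {side!r}")


print(trunc("a much longer string", width=10))  # a much lo…
```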
Internals
FrameExpr
FrameExpr(
    col_expr,
    resolver,
)
An expression that requires a LazyFrame context to resolve into a list of pl.Expr.
A plain pl.Expr is insufficient for operations like ps_enum.make() or ps_chop.chop() because Polars needs to know the output dtype (e.g. the exact pl.Enum([...]) category list) at plan-construction time, before any data is seen. FrameExpr defers that resolution to a two-phase execution model:
Phase 1 (peek): ps.with_columns calls resolve(lf) with the current LazyFrame. The resolver runs a small aggregation (e.g. unique().sort() for category discovery, a handful of quantiles for binning) and collects it. Because the resolver receives the full lazy plan up to that point, any preceding .filter() or .select() calls are already embedded and Polars' predicate/projection pushdown applies, so only the relevant rows and columns are scanned.
Phase 2 (expression): The resolver uses the aggregation result to construct a concrete pl.Expr with all dtype information baked in (e.g. pl.col("x").cast(pl.Enum(["a", "b", "c"]))). This expression is inserted back into the lazy plan and executed lazily together with all subsequent operations.
col_expr pl.Expr
resolver Callable[[pl.LazyFrame], list[pl.Expr]]
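The two-phase model described above can be illustrated with a toy in plain Python, where a list stands in for the LazyFrame and a closure stands in for the concrete pl.Expr. Every name here (Column, Expr, make_enum_resolver) is hypothetical. The sketch shows only the shape of the idea: peek at the data once, then return an expression with the discovered categories baked in.

```python
from typing import Callable, Optional, Sequence

Column = Sequence[Optional[str]]
# Stand-in for a concrete pl.Expr: a function from a column to its encoded values.
Expr = Callable[[Column], list[Optional[int]]]


def make_enum_resolver(data: Column) -> Expr:
    # Phase 1 (peek): run a small aggregation to discover the categories.
    categories = sorted({v for v in data if v is not None})
    index = {c: i for i, c in enumerate(categories)}

    # Phase 2 (expression): build a concrete expression with the categories baked in.
    def expr(column: Column) -> list[Optional[int]]:
        return [None if v is None else index[v] for v in column]

    return expr


animals = ["dog", None, "bird", "cow", "bird"]
encode = make_enum_resolver(animals)
print(encode(animals))  # [2, None, 0, 1, 0]
```

The returned closure no longer needs to look at the data to know its categories, which is the property the real resolver provides to Polars at plan-construction time.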