Published on: 5th September 2022
Don’t loop over columns in Polars
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
If you’re writing Polars code like this
for col in df.columns:
    ...  # do stuff
then STOP!!!!
Instead, use expressions and Polars will parallelise the work across the columns for you. By looping explicitly in Python you process the columns one at a time and lose that parallelisation.
For example, if we want to count the number of unique values in every column, we do
df.select(pl.all().n_unique())
or, if we want to count the number of unique values only in string (Utf8) columns, we do
df.select(pl.col(pl.Utf8)).select(pl.all().n_unique())
Doing it this way with expressions will give you the 🚀 performance you expect!
Learn more
Want to know more about Polars for high-performance data science? Let me know if you would like a Polars workshop for your organisation.