Published on: 4th October 2022
Breaking your bad habits with Polars
This post was created while writing my Up & Running with Polars course. Check it out here; there is a free preview of the first chapters.
One comment we get on the Polars discourse is that the Polars syntax encourages people to break bad habits they developed in Pandas.
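Take the .apply (or .applymap) method, for example. I see lots of people using it in Kaggle competitions, even though it is slow: it runs a Python function over the data piece by piece (for .applymap, once per element) instead of using vectorized code.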
In this example we want to map positive values to 1 and all other values to 0, for every column.
Using the standard pl.when expression in Polars is roughly 80x faster than .applymap in Pandas in the timings below (30 milliseconds vs 2.5 seconds)*
*Further optimizations are available on this toy problem in both libraries!
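```python
import numpy as np
import polars as pl

# Create a random DataFrame with 100 columns of standard normal values
N = 100_000
dfNumeric = pl.DataFrame(np.random.standard_normal((N, 100)))
# Convert to Pandas for the comparison (needs pandas installed)
dfp = dfNumeric.to_pandas()

# Pandas: set values to 1 when they are positive and 0 otherwise
# (applymap is called DataFrame.map in pandas 2.1+)
(
    dfp
    .applymap(lambda x: 1 if x > 0 else 0)
)
# Time: 2.5 seconds

# Polars: the same mapping as one when/then/otherwise expression per column
(
    dfNumeric
    .with_columns(
        [
            pl.when(pl.col(col) > 0).then(1).otherwise(0).alias(col)
            for col in dfNumeric.columns
        ]
    )
)
# Time: 30 milliseconds
```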
The shift away from .apply functions happened for me as well.
In Pandas I used to call .apply fairly often, but the only time I’ve used .apply in Polars was…when writing the docs to tell people not to use .apply!
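One way to see why element-wise .apply-style code is slow is to count how often Pandas calls back into Python. The sketch below assumes a small 1,000 × 5 random frame, and the helper positive_to_one and its call counter are hypothetical, written only for illustration. Each of the 5,000 cells triggers at least one call to the Python function, and every call pays interpreter overhead that a vectorized expression avoids.

```python
import numpy as np
import pandas as pd

call_count = 0


def positive_to_one(x):
    # Hypothetical helper: maps positive values to 1, others to 0,
    # and counts how many times Pandas calls it
    global call_count
    call_count += 1
    return 1 if x > 0 else 0


dfp = pd.DataFrame(np.random.default_rng(0).standard_normal((1_000, 5)))

# applymap was renamed DataFrame.map in pandas 2.1; use whichever exists
elementwise = dfp.map if hasattr(pd.DataFrame, "map") else dfp.applymap
result = elementwise(positive_to_one)

# The Python function ran at least once for every cell in the frame
print(dfp.size, call_count)
```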
Learn more
Want to know more about Polars for high performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on Bluesky
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.