Published on: 13th September 2022
Polars can help if your data is sorted
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
Check out a video version of this post here!
Polars has optimizations for when you’re working with sorted data.
To access them you tell Polars the data is sorted with the set_sorted method, which sets a sorted flag on the data.
In this simple example we find the median 1500x faster when we tell Polars the series is sorted.
import polars as pl

# Create a series with 10 million entries
s = pl.Series("a", range(0, int(1e7)))
# Call .median without set_sorted
s.median()
# Time: 0.3 s
# Call .median with set_sorted
s.set_sorted().median()
# Time: 0.0002 s
You may already be taking advantage of set_sorted without realising it. Polars sets the sorted flag automatically when you run an operation with an implicit or explicit sort.
set_sorted also works with other operations: in some of my workflows a groupby on a large dataset is 40% faster on a column that Polars knows is sorted.
Learn more
Want to know more about Polars for high performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on Bluesky
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.