Published on: 12th September 2022
Time series aggregations with groupby_dynamic
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
Time series aggregations in Polars are fast and flexible.
In a recent project I had 10 years of two-minute telemetry data (approximately 2.5 million rows by 100 columns) and needed hourly averages.
In Polars I used groupby_dynamic, which turned out to be 10x faster than Pandas and led to happy clients!
from datetime import datetime

import numpy as np
import pandas as pd
import polars as pl

# Create a Polars DataFrame with a two-minute datetime column
df = pl.DataFrame({
    'date': pl.date_range(datetime(2010, 1, 1), datetime(2021, 1, 2), interval='2m'),
})
# Add 100 columns of random data
df = pl.concat([df, pl.DataFrame(np.random.standard_normal((len(df), 100)))], how='horizontal')

# Hourly groupby with Polars
df.groupby_dynamic("date", every='1h').agg(pl.all().exclude('date').mean())
# Time: 0.25 seconds

# Hourly groupby with Pandas on the same data
dfPandas = df.to_pandas()
dfPandas.groupby(pd.Grouper(key='date', freq='1h')).mean()
# Time: 2.5 seconds
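For reference, here is a minimal sketch of how the two timings above could be measured. The harness below uses time.perf_counter and is my assumption rather than the exact setup from the project; the 0.25 and 2.5 second figures came from my data and hardware, so your numbers will vary.

import time

# Hypothetical timing harness, not the one used for the figures above
start = time.perf_counter()
df.groupby_dynamic("date", every='1h').agg(pl.all().exclude('date').mean())
print(f"Polars: {time.perf_counter() - start:.2f} seconds")

start = time.perf_counter()
dfPandas.groupby(pd.Grouper(key='date', freq='1h')).mean()
print(f"Pandas: {time.perf_counter() - start:.2f} seconds")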
Hourly groupby over 3-hour windows with Polars
To handle high-frequency variability I also needed these hourly statistics computed over 3-hour rolling windows.
Not a problem: in groupby_dynamic the every argument sets how often a new window starts and the period argument sets how long each window lasts, so you can get the overlapping windows required with no performance cost.
# Windows start every hour and each window spans 3 hours
df.groupby_dynamic("date", every='1h', period='3h').agg(pl.all().exclude('date').mean())
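To make the window semantics concrete, here is a small sketch on made-up hourly data (the small DataFrame and its value column are illustrative, not from the project). With every='1h' and period='3h', consecutive windows overlap, so each row can be counted in up to three windows.

# Tiny illustrative dataset: six hourly rows
small = pl.DataFrame({
    'date': pl.date_range(datetime(2010, 1, 1, 0), datetime(2010, 1, 1, 5), interval='1h'),
    'value': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})
# Each window starts on the hour and extends 3 hours forward,
# so the per-window row counts show the overlap
small.groupby_dynamic("date", every='1h', period='3h').agg([
    pl.col('value').count().alias('n_rows'),
    pl.col('value').mean().alias('mean'),
])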
Learn more
Want to know more about Polars for high-performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on bluesky
- follow me on twitter
- connect with me on linkedin
- check out my youtube videos
or let me know if you would like a Polars workshop for your organisation.