Published on: 12th September 2022
Time series aggregations with groupby_dynamic
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
Time series aggregations in Polars are fast and flexible.
In a recent project I had 10 years of two-minute telemetry data (approximately 2.5 million rows by 100 columns) and needed hourly averages.
In Polars I used groupby_dynamic, which turned out to be 10x faster than Pandas and led to happy clients!
from datetime import datetime

import numpy as np
import pandas as pd
import polars as pl

# Create a Polars DataFrame with a two-minute datetime column
df = pl.DataFrame({
    'date': pl.date_range(datetime(2010, 1, 1), datetime(2021, 1, 2), interval='2m'),
})
# Add 100 columns of random data
df = pl.concat([df, pl.DataFrame(np.random.standard_normal((len(df), 100)))], how='horizontal')

# Hourly groupby with Polars
df.groupby_dynamic("date", every='1h').agg(pl.all().exclude('date').mean())
# Time: 0.25 seconds

# Hourly groupby with Pandas on the same data
dfPandas = df.to_pandas()
dfPandas.groupby(pd.Grouper(key='date', freq='1h')).mean()
# Time: 2.5 seconds
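For reference, here is a minimal sketch of how the two timings above could be measured. The harness below uses time.perf_counter and is my assumption rather than the exact setup from the project; the 0.25 and 2.5 second figures came from my data and hardware, so your numbers will vary.

import time

# Hypothetical timing harness, not the one used for the figures above
start = time.perf_counter()
df.groupby_dynamic("date", every='1h').agg(pl.all().exclude('date').mean())
print(f"Polars: {time.perf_counter() - start:.2f} seconds")

start = time.perf_counter()
dfPandas.groupby(pd.Grouper(key='date', freq='1h')).mean()
print(f"Pandas: {time.perf_counter() - start:.2f} seconds")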
Hourly groupby over 3-hour windows with Polars
To handle high-frequency variability I also needed these hourly statistics computed over 3-hour rolling windows.
Not a problem: in groupby_dynamic the every argument sets how often a new window starts and the period argument sets how long each window lasts, so you can get the overlapping windows required with no performance cost.
# Windows start every hour and each window spans 3 hours
df.groupby_dynamic("date", every='1h', period='3h').agg(pl.all().exclude('date').mean())
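To make the window semantics concrete, here is a small sketch on made-up hourly data (the small DataFrame and its value column are illustrative, not from the project). With every='1h' and period='3h', consecutive windows overlap, so each row can be counted in up to three windows.

# Tiny illustrative dataset: six hourly rows
small = pl.DataFrame({
    'date': pl.date_range(datetime(2010, 1, 1, 0), datetime(2010, 1, 1, 5), interval='1h'),
    'value': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})
# Each window starts on the hour and extends 3 hours forward,
# so the per-window row counts show the overlap
small.groupby_dynamic("date", every='1h', period='3h').agg([
    pl.col('value').count().alias('n_rows'),
    pl.col('value').mean().alias('mean'),
])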
Learn more
Want to know more about Polars for high-performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on bluesky
- follow me on twitter
- connect with me on linkedin
- check out my youtube videos
or let me know if you would like a Polars workshop for your organisation.