Profiling a Polars query

By Liam Brannigan

Posted Nov 10, 2022

This post was created while writing my Up & Running with Polars course, which has a free preview of the first chapters.

You can’t optimise your code if you don’t know where the bottleneck is.

Polars now has a profiling tool to show you what it’s getting up to.

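You can get this data by calling .profile() on any lazy query. It runs the query and returns both the result and a table of timings for each step. Even better, passing show_plot=True gives a plot visualising the time spent on each step.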

In this example we read from a CSV file, do a groupby and then a sort.

In the chart we see that reading the CSV file is the bottleneck. So we should focus our efforts on that step with strategies like specifying dtypes or - even better - converting to Parquet or Arrow.

Learn more

Want to know more about Polars for high performance data science and ML? Let me know if you would like a Polars workshop for your organisation.


This post is licensed under CC BY 4.0 by the author.

