In my ML pipelines these days I find myself replacing some of the simpler scikit-learn metrics, such as root-mean-squared-error, with my own hand-rolled Polars expressions. This approach saves me from copyi...
Ordering of groupby and unique in Polars
Polars (and Apache Arrow) has been designed to be careful with your data so you don’t get surprises like the following Pandas code where the ints column has been cast to float because of the missin...
Concat, extend or vstack?
On the face of it, the concat, extend and vstack functions in Polars can do the same job: each can take two DataFrames and turn them into a single DataFrame. In this post I show that they do ...
Filtering one df by another
One of the most common questions we get on the Polars Discord is how to filter rows in one DataFrame by values in another. I think people don’t realise this is basically a join because they don’...
Embrace streaming mode in Polars
Polars can handle larger-than-memory datasets with its streaming mode. In this mode Polars processes your data in batches rather than all at once. However, the streaming mode is not some emergency ...
Lazy mode's hidden timesaver in Polars
Lazy mode in Polars does more than provide query optimisation and let you work with larger-than-memory datasets. It also provides some type security that can find errors in your pipeline before...
Polars 🤝 Seaborn
Update October 2023: As of Seaborn v0.13.0, Seaborn accepts Polars DataFrames natively 🎆. Note that this is not full native support though. Polars copies the data internally to a Pandas Data...
Nested dtypes in Polars 1: the `pl.List` dtype
Polars uses Apache Arrow to store its data in-memory. One of the big advantages of Arrow is that it supports a variety of nested data types (or “dtypes”). In this post we look at the pl.List dtype ...
Talking Polars on the Real Python podcast
I appeared on the Real Python podcast to talk Polars! We chatted about:
- why lazy mode in Polars is so important
- working with larger-than-memory datasets
- transitioning from Pandas to Polars
...
Sinking larger-than-memory Parquet files
Polars now allows you to write Parquet files even when the file is too large to fit in memory. It does this by using streaming to process data in batches and then writing these batches to a Parquet...