On the face of it, the `concat`, `extend` and `vstack` functions in Polars can do the same job: they can take two DataFrames and turn them into a single DataFrame. In this post I show that they do ...
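A minimal sketch of the three approaches (the column names and values below are made up for illustration):

```python
import polars as pl

df1 = pl.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df2 = pl.DataFrame({"id": [3, 4], "value": ["c", "d"]})

# concat builds a new DataFrame from a list of inputs
combined = pl.concat([df1, df2])

# vstack returns a new DataFrame that appends df2's chunks to df1's
# without copying the underlying data
stacked = df1.vstack(df2)

# extend copies df2's rows into df1's existing buffers, modifying df1 in place
df1.extend(df2)
```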
Filtering one df by another
One of the most common questions we get on the Polars discord is how to filter rows in one DataFrame by values in another. I think people don’t realise this is basically a join because they don’...
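A minimal sketch of the join-based pattern, using toy frames and a made-up `id` column:

```python
import polars as pl

df = pl.DataFrame({"id": [1, 2, 3, 4], "value": [10, 20, 30, 40]})
other = pl.DataFrame({"id": [2, 4]})

# Keep only the rows of df whose id also appears in other: a semi join
filtered = df.join(other, on="id", how="semi")

# The complement (rows whose id does not appear in other): an anti join
excluded = df.join(other, on="id", how="anti")
```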
Embrace streaming mode in Polars
Polars can handle larger-than-memory datasets with its streaming mode. In this mode Polars processes your data in batches rather than all at once. However, the streaming mode is not some emergency ...
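A sketch of what a streaming query might look like, with hypothetical file and column names:

```python
import polars as pl

# Build a lazy query against a (possibly huge) CSV file
lf = (
    pl.scan_csv("measurements.csv")
    .filter(pl.col("value") > 0)
    .group_by("station")
    .agg(pl.col("value").mean())
)

# Run the query with the streaming engine so the data is processed in
# batches rather than loaded into memory in one go; newer Polars
# versions spell this collect(engine="streaming")
df = lf.collect(streaming=True)
```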
Lazy mode's hidden timesaver in Polars
Lazy mode in Polars doesn’t just provide query optimisation and let you work with larger-than-memory datasets. It also provides some type security that can find errors in your pipeline before...
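A minimal sketch of the idea: because lazy mode resolves the schema of the whole query up front, a dtype mistake in the pipeline surfaces before any expensive data processing runs (the frame and columns here are made up):

```python
import polars as pl

lf = pl.LazyFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Applying a string method to an integer column is a pipeline bug
query = lf.with_columns(pl.col("id").str.to_uppercase())

try:
    # On recent versions query.collect_schema() raises the same error
    # without running the query at all
    query.collect()
except Exception as exc:
    print(f"Caught before processing any data: {exc}")
```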
Polars 🤝 Seaborn
Update October 2023: As of Seaborn v0.13.0, Seaborn accepts Polars DataFrames natively 🎆. Note that this is not full native support though: the data is still copied internally to a Pandas Data...
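A small sketch of the pattern with made-up data, assuming Seaborn 0.13.0 or later is installed:

```python
import polars as pl
import seaborn as sns

df = pl.DataFrame(
    {
        "day": ["Mon", "Tue", "Wed", "Mon", "Tue", "Wed"],
        "sales": [3, 5, 4, 6, 2, 7],
    }
)

# A Polars DataFrame can be passed straight to a plotting function;
# behind the scenes the data is converted to Pandas
ax = sns.barplot(data=df, x="day", y="sales")
```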
Nested dtypes in Polars 1: the `pl.List` dtype
Polars uses Apache Arrow to store its data in-memory. One of the big advantages of Arrow is that it supports a variety of nested data types (or “dtypes”). In this post we look at the `pl.List` dtype ...
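A minimal sketch of a list column on recent Polars versions (older releases used the `.arr` namespace instead of `.list`); the column names here are invented:

```python
import polars as pl

# A column of Python lists becomes a pl.List(pl.Int64) column
df = pl.DataFrame({"id": [1, 2], "values": [[1, 2, 3], [4, 5]]})
print(df.schema)

# The .list namespace operates on each row's list
out = df.with_columns(
    pl.col("values").list.len().alias("n_values"),
    pl.col("values").list.sum().alias("total"),
)
```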
Talking Polars on the Real Python podcast
I appeared on the Real Python podcast to talk Polars! We chatted about: why lazy mode in Polars is so important, working with larger-than-memory datasets, transitioning from Pandas to Polars ...
Sinking larger-than-memory Parquet files
Polars now allows you to write Parquet files even when the file is too large to fit in memory. It does this by using streaming to process data in batches and then writing these batches to a Parquet...
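A sketch of the pattern, with hypothetical file paths and a made-up filter column:

```python
import polars as pl

# Scan a CSV that is too big for memory and write it out as Parquet;
# sink_parquet streams the data through in batches instead of
# materialising the whole result
(
    pl.scan_csv("very_large_file.csv")
    .filter(pl.col("value") > 0)
    .sink_parquet("very_large_file.parquet")
)
```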
Polars ❤️ sorted data 2: groupby
In a previous post we saw that Polars has fast-track algorithms for calculating some statistics on sorted data. In this post we see that Polars also has a fast-track algorithm for getting groupby k...
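A minimal sketch of how you might opt into the fast track on recent Polars versions (older releases spell the method `groupby`); the data is invented:

```python
import polars as pl

df = pl.DataFrame({"id": [1, 1, 2, 2, 3], "value": [5.0, 6.0, 7.0, 8.0, 9.0]})

# Tell Polars the id column is already sorted (it genuinely must be!)
# so the fast-track grouping algorithm can be used
df = df.with_columns(pl.col("id").set_sorted())

out = df.group_by("id").agg(pl.col("value").mean())
```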
To go big you must be lazy
I was recently consulting for a client who needs to process hundreds of GB of CSV files. On their first pass with Polars they had read their CSVs with a pattern like this (simplified) version....
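For contrast, a sketch of the lazy pattern the title hints at, using a hypothetical glob and column names:

```python
import polars as pl

# Scan the CSVs lazily instead of reading each file eagerly into memory
lf = (
    pl.scan_csv("data/*.csv")
    .filter(pl.col("amount") > 0)
    .group_by("customer_id")
    .agg(pl.col("amount").sum())
)

# Only the columns the query needs are read, the filter is pushed down
# to the scan, and the work can run in batches with the streaming engine
df = lf.collect(streaming=True)
```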