Updated December 2023. This post was created while writing my Up & Running with Polars course. Check it out on Udemy with a 50% discount. These days working in serverless environments with ...
What does ChatGPT's Advanced Data Analysis have installed?
How we do it. So what does ChatGPT’s Advanced Data Analysis have installed? On Twitter last night I showed how you can get ChatGPT to install a Python package like Polars from a wheel file. We do t...
Maybe they should just call it Regular Data Analysis
OpenAI used to have a product called Code Interpreter, a name that didn’t make much sense because it doesn’t interpret code. Instead, it’s a language model that can ingest CSVs and genera...
Reading and writing files on S3 with Polars
Updated June 2024 for Polars version 1.0. In this post we see how to read from and write to a CSV or Parquet file in S3 with Polars. We also see how to filter the file on S3 before downloading it to r...
Understanding the Polars nested column types
Polars has 4 native nested column types. These can be very helpful for solving problems such as: working with ML embeddings, splitting strings, working with nested JSON data, working with aggr...
Comparison of Matplotlib and Plotly in Polars
Updated July 2023. From Plotly v5.15.0 onwards, Plotly has native support for Polars 😊, so you can pass the DataFrame as the first argument and the column names as strings to the x and y encoding argu...
Filling time series gaps in lazy mode
Two major advantages of Polars over Pandas are that Polars has a lazy mode with query optimization and that Polars can scale to larger-than-memory datasets with its streaming mode. Taking advantage ...
Crucial parameters for streaming in Polars
In this post we see how Polars sets some crucial parameters that affect streaming mode. Understanding these concepts is important if you want to optimize the performance of a large streaming query ...
Exploding a Polars pivot for feature engineering
In my ML pipelines these days I find I replace some of the simpler scikit-learn metrics such as root-mean-squared-error with my own hand-rolled Polars expressions. This approach saves me from copyi...
Ordering of groupby and unique in Polars
Polars (and Apache Arrow) has been designed to be careful with your data so you don’t get surprises like the following Pandas code where the ints column has been cast to float because of the missin...