Published on: 29th September 2022
Combining data with different schemas
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
You’ve got a bunch of data files in your project and they all follow a consistent data schema 😊
You get a new file and see that from now on there will be some useful extra columns. How are you going to combine this file with the old stuff?? 😣
A vertical concatenation won’t work: it expects every DataFrame to have the same columns, so it raises an error when the schema changes.
This is where diagonal concatenation in Polars comes in.
import polars as pl

# Old schema: year, exporter, importer
dfTrades2020 = pl.DataFrame(
    [
        {"year": 2020, "exporter": "China", "importer": "USA"},
        {"year": 2020, "exporter": "China", "importer": "USA"},
    ]
)
# New schema adds a value column
dfTrades2021 = pl.DataFrame(
    [
        {"year": 2021, "exporter": "China", "importer": "USA", "value": 10},
        {"year": 2021, "exporter": "China", "importer": "USA", "value": 100},
    ]
)
# Diagonal concatenation: takes the union of the columns
pl.concat([dfTrades2020, dfTrades2021], how="diagonal")
Diagonal concatenation appends your new records with their new columns, and adds nulls in those new columns for the old records to show that the data is missing. Sorted.
Learn more
Want to know more about Polars for high performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on Bluesky
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.