As LLMs grow, will Python devs kick Pandas to the curb? Probably so.
Ah, Pandas: cute fuzzy animals we all love…but also a Python library that has been the go-to for dealing with tabular data, until now. The problem with Pandas is well known. Toss it a normal(ish)-sized spreadsheet and it handles it like a champ, but pass it a massive amount of tabular data and it’s slow as molasses, if it works at all.
Pandas popularized the DataFrame in Python, and it is essentially the standard for working with tabular data in the language. I love DataFrames, you love DataFrames, but in a world of LLMs and massive datasets, Pandas looks like it’s going the way of the dodo, and that’s probably more than okay because the alternatives are pretty darn awesome.
Devs bidding farewell to Pandas isn’t an entirely new phenomenon. For years, people have been complaining that Pandas is too slow for dealing with large datasets. Hop onto Reddit and you’ll find a zillion posts like this one ⬇️
As you can see, this was written two years ago, and over the last couple of years…well, things have changed. LLMs are all the rage and they need data, lots of it. Try passing a 10GB file to Pandas on a typical laptop and it’s probably not going to work, because by default Pandas loads the whole dataset into memory.