This talk looks at Spark + Koalas, Dask, and Modin + Ray, all of which attempt to provide the "holy grail" of big data: distributed pandas. We'll kick it old-school, starting with "Sparkling Pandas", one of the OG distributed pandas (with the terrible performance to show for it).
No talk like this would be complete without a conflict of interest disclosure. Mine includes being one of the two original co-authors of Sparkling Pandas (with some funny stories) and being a Spark committer, though this is balanced out by my current work co-writing books on Dask and Ray.
By the end of this talk you will be questioning whether you really want to scale pandas, given all of the duct tape involved, and you will have a good idea of how to choose which particular duct-taped-together solution involves the fewest rusty spoons in your eyeballs.
Distributed pandas – long promised, finally sort of
Holden Karau
Open Source Engineer
Netflix