Notes on (Baby)Pandas

In the zoo of tools used by data scientists, Python and Pandas are two of the most popular animals. Python is a general-purpose programming language known for its simple syntax and its vibrant ecosystem of third-party packages. Pandas is a Python package for exploring and analyzing large data sets. As a working data scientist, you will very likely use both on a daily basis.

While Pandas is a very powerful tool, it isn’t simple. There are often several ways of doing the same thing. This can make it difficult to get started with Pandas, especially for those new to programming.

Instead of tackling the full complexity of Pandas immediately, in these notes we will learn to use a simplified version of Pandas which we’ve called BabyPandas. As the name suggests, BabyPandas is Pandas, but smaller. To be precise, BabyPandas is a subset of Pandas: we have carefully kept some of the pieces of Pandas and left much of the rest out. As a result, whatever code you write in BabyPandas will also work with Pandas, so you can be confident that you’re learning one of the most powerful tools in modern data science.
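To make the "several ways of doing the same thing" point concrete, here is a minimal sketch written in plain Pandas. The table and its column names are made up for illustration. The assumption that a BabyPandas-style subset keeps only the `.get` way of selecting a column is ours, as is the `bpd` import alias mentioned in the comment.

```python
import pandas as pd  # with BabyPandas, this would be e.g. `import babypandas as bpd` (assumed alias)

# A small, made-up table used only for illustration.
animals = pd.DataFrame({
    "name": ["panda", "python", "zebra"],
    "weight_kg": [100.0, 5.0, 300.0],
})

# Full Pandas offers several equivalent ways to select one column...
by_brackets = animals["weight_kg"]
by_attribute = animals.weight_kg
by_loc = animals.loc[:, "weight_kg"]

# ...while a smaller subset can settle on a single one (assumed here: .get).
by_get = animals.get("weight_kg")

# Chaining a few methods: name of the heaviest animal.
heaviest = (
    animals.sort_values(by="weight_kg", ascending=False)
    .get("name")
    .iloc[0]
)
print(heaviest)
```

All four selections return the same column. Because every call above is ordinary Pandas, code restricted to the one chosen style still runs unchanged under the full library.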

Authors and Contributors

These notes are authored by Parker Addison and Justin Eldridge. Thank you to the following people who made various contributions:

  • Janine Tiefenbruck

  • Devanshu Desai