8. (Baby)Pandas DataFrames

Most of the data sets that we are interested in are more complex than simple lists of numbers. For instance, consider a data set containing information about California wildfires. It might contain multiple pieces of data about each fire, including its name, size, location, and cause.

How would we store and analyze such a data set? While we could use NumPy arrays – one holding the fire’s name, another holding the size, etc. – there is a much better way: We will use a table to contain the data, like this:

	name	year	cause	acres	county
0	CAMP	2018	11 - Powerline	153335.562500	Butte
1	BUTTE	2015	14 - Unknown	70846.531250	Calaveras
2	KING	2014	7 - Arson	97684.546875	El Dorado
3	ROUGH	2015	1 - Lightning	151546.812500	Fresno
4	MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...	...
45	DAY	2006	5 - Debris	161815.656250	Ventura
46	MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
47	THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
48	SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
49	COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 5 columns

8.1. Pandas DataFrames

Like NumPy arrays, tables are provided by a third-party extension. The Python package which provides tables is called pandas. Pandas is the tool for doing data science in Python, and it is immensely popular – as of Summer 2020, it was downloaded nearly 1 million times per day. It is without a doubt a powerful tool, and you’ll need to know how to use it if you want to do serious data science. But there’s a problem: pandas is complicated. There are numerous ways to do even the simplest tasks. This makes it hard to learn, especially if you’re new to programming.

This leaves us in an interesting situation. On one hand, we want to learn pandas, because it is the tool used by actual data scientists. On the other hand, we don’t want to be thrown into the deep end. The solution? We’ll take pandas and remove everything that isn’t absolutely necessary, resulting in something simpler and easier to learn. What’s left is still pandas – just not all of it. Because this new package is a smaller (and cuter) version of pandas, we’re calling it babypandas.

To get access to the functionality that babypandas provides, we’ll need to import it:

import babypandas as bpd

Note

We’re going to be using babypandas in the rest of this book, but it should be stressed that babypandas is pandas, just a smaller version of it. So if someone asks if you have experience working with pandas (during a job interview, for instance), you’ll be able to say “yes!”.

In babypandas (and pandas), a table is called a DataFrame (though we’ll use the two terms interchangeably). Since DataFrames are often used to store very large data sets, they are not typically created by typing their entries one by one – instead, they are usually read from a file. We’ll see how to do that in a moment, but for now we assume that we have already loaded a DataFrame into a variable called fires. If we type fires in our Jupyter notebook cell and execute it, it will display the table with nice formatting:

fires

	name	year	cause	acres	county
0	CAMP	2018	11 - Powerline	153335.562500	Butte
1	BUTTE	2015	14 - Unknown	70846.531250	Calaveras
2	KING	2014	7 - Arson	97684.546875	El Dorado
3	ROUGH	2015	1 - Lightning	151546.812500	Fresno
4	MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...	...
45	DAY	2006	5 - Debris	161815.656250	Ventura
46	MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
47	THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
48	SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
49	COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 5 columns

If we ask for the type of fires, Python will tell us that it is a DataFrame:

type(fires)

babypandas.bpd.DataFrame

A DataFrame consists of columns and rows. Almost always, a row represents a single thing – in this case, a fire – and the columns provide different pieces of information about that thing. In this case, we have a column describing the name of the fire, another describing the cause, and so on.

We can get the number of rows and columns in a DataFrame by asking for its shape:

fires.shape

(50, 5)

This tells us that there are 50 rows and 5 columns. If for whatever reason we just wanted the number of rows, we could ask for the first element of this pair:

fires.shape[0]

Every row and column in a DataFrame has a label. We will use the row and column labels to refer to particular parts of the table and retrieve information from within it. The columns of the above DataFrame are labeled “name”, “year”, “cause”, and so on. The rows of the above table are simply labeled “0”, “1”, “2”, and so forth.
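To experiment with shape and labels without the full data set, a tiny table can be typed in by hand. The following is only a sketch: the variable name small_fires is made up for illustration, the three rows are copied from the table above, and the import falls back to pandas (an assumption, for readers who do not have babypandas installed), using only methods that both packages share.

```python
# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

# Build a three-row table by hand from the first rows of the wildfire table.
small_fires = bpd.DataFrame().assign(
    name=['CAMP', 'BUTTE', 'KING'],
    year=[2018, 2015, 2014],
    acres=[153335.5625, 70846.53125, 97684.546875],
)

n_rows = small_fires.shape[0]     # first element of the pair: number of rows
n_columns = small_fires.shape[1]  # second element of the pair: number of columns
print(n_rows, n_columns)
```

Here each keyword (name, year, acres) becomes a column label, and the rows receive the default labels 0, 1, and 2.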

8.2. The Index

Together, the row labels are called the table index. By default, a table’s rows are labeled by numbering them. However, in many cases it makes more sense to label the rows in some other way. For example, each row in our current data set is a single fire. Perhaps it makes more sense to use the fire’s name as its row label. We can ask babypandas to use a particular column as the row labels with the .set_index method:

fires.set_index('name')

	year	cause	acres	county
name
CAMP	2018	11 - Powerline	153335.562500	Butte
BUTTE	2015	14 - Unknown	70846.531250	Calaveras
KING	2014	7 - Arson	97684.546875	El Dorado
ROUGH	2015	1 - Lightning	151546.812500	Fresno
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 4 columns

The .set_index method accepts one argument: the label of the column that should be used as the index. It then creates a new DataFrame in which the index has been replaced with the information from this column; the old DataFrame is not changed. In order to save the results, we’ll need to assign the new table to a variable, like so:

fires_by_name = fires.set_index('name')
fires_by_name

	year	cause	acres	county
name
CAMP	2018	11 - Powerline	153335.562500	Butte
BUTTE	2015	14 - Unknown	70846.531250	Calaveras
KING	2014	7 - Arson	97684.546875	El Dorado
ROUGH	2015	1 - Lightning	151546.812500	Fresno
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 4 columns

Notice that the fire names have been moved all the way to the left, and have been made bold – this is babypandas’ way of showing that these names are now the index.

Warning

The index is not a column – it is its own separate thing. When we use .set_index, the old index is thrown out and the number of columns decreases by one.

Since we will later use row labels to refer to specific rows by name, the labels should be unique. Here, that means that every fire should have a unique name. In this data set, every fire is uniquely named, so it is fine to use the fire name as the index. Later, we’ll see a larger version of this data set in which there are multiple fires with the same name. In that case, the name should probably not be used as the index.
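The points above (a new table is returned, the column count drops by one, and the labels should be unique) can be checked on a small hand-typed table. This is a sketch under assumptions: small_fires and by_name are illustrative names, the pandas fallback import is a convenience, and the uniqueness check uses plain Python rather than any particular babypandas feature.

```python
# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

small_fires = bpd.DataFrame().assign(
    name=['CAMP', 'BUTTE', 'KING'],
    year=[2018, 2015, 2014],
    county=['Butte', 'Calaveras', 'El Dorado'],
)

# .set_index returns a new table; small_fires itself is left unchanged.
by_name = small_fires.set_index('name')

print(small_fires.shape)  # 'name' is still an ordinary column here
print(by_name.shape)      # one fewer column: 'name' is now the index

# Plain-Python check that no row label appears twice.
labels = list(by_name.index)
labels_are_unique = len(set(labels)) == len(labels)
print(labels_are_unique)
```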

A table’s index is essentially an array. We can get the index by writing:

fires_by_name.index

Index(['CAMP', 'BUTTE', 'KING', 'ROUGH', 'MEGRAM', 'RANCH', 'VALLEY ',
       'ROCKY  ', 'FORK', 'RUSH', 'RAVENNA', 'WOOLSEY', 'STATION', 'FERGUSON',
       'DETWILER', 'SCARFACE', 'SOBERANES', 'INDIANS', 'KIRK', 'BASIN COMPLEX',
       'MARBLE-CONE', 'CHIPS', 'OLD', 'HARRIS 2', 'CEDAR', 'WITCH', 'LAGUNA',
       'LAS PILITAS', 'HIGHWAY 58', 'ZACA', 'REFUGIO', 'LA BREA', 'CARR ',
       'OAK', 'FRYING PAN', 'GLASS MOUNTAIN', 'KINCADE', 'CAMPBELL',
       'SKINNER MILL', 'HAPPY', 'MANTER', 'MCNALLY', 'RIM', 'CLAMPITT FIRE',
       'WHEELER #2', 'DAY', 'MATILIJA', 'THOMAS', 'SIMI FIRE', 'COUNTY'],
      dtype='object', name='name')

We can then access individual elements of the index using the same notation as used with arrays, remembering that Python starts counting from zero:

# the first element
fires_by_name.index[0]

'CAMP'

# the second element
fires_by_name.index[1]

'BUTTE'

# the last element
fires_by_name.index[-1]

'COUNTY'

8.3. Series

8.3.1. Getting a column with `.get`

We can retrieve a particular column from the table with the .get method. For instance, to get the column labeled “acres”, we would write:

fires_by_name.get('acres')

name
CAMP         153335.562500
BUTTE         70846.531250
KING          97684.546875
ROUGH        151546.812500
MEGRAM       125072.531250
                 ...      
DAY          161815.656250
MATILIJA     219999.281250
THOMAS       281790.875000
SIMI FIRE    107570.398438
COUNTY        89831.148438
Name: acres, Length: 50, dtype: float64

The result might look like a DataFrame with one column, but it’s actually a new type of object called a Series:

type(fires_by_name.get('acres'))

babypandas.bpd.Series

A Series is basically an array, but with an index. A Series represents a column in a DataFrame. This means that we can think of the columns of a DataFrame as being arrays (more or less).

8.3.2. Arithmetic

Because a Series is like an array, we can do similar things with it. For instance, we can perform elementwise arithmetic with a Series. Let’s try it out by converting the fire sizes from acres to square miles. Each acre is 0.0015625 square miles, so we can do the conversion with a simple multiplication:

fires_by_name.get('acres') * 0.0015625

name
CAMP         239.586816
BUTTE        110.697705
KING         152.632104
ROUGH        236.791895
MEGRAM       195.425830
                ...    
DAY          252.836963
MATILIJA     343.748877
THOMAS       440.298242
SIMI FIRE    168.078748
COUNTY       140.361169
Name: acres, Length: 50, dtype: float64

We can also perform arithmetic with two series, assuming that they are the same size.
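As a small illustration of arithmetic between two Series, the sketch below divides the sizes in acres by the same sizes in square miles. Since both Series describe the same fires, every entry of the result should be roughly 640, the number of acres in a square mile. The table and variable names are made up for this example, and the pandas fallback import is an assumption for readers without babypandas.

```python
# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

small_fires = bpd.DataFrame().assign(
    name=['CAMP', 'BUTTE', 'KING'],
    acres=[153335.5625, 70846.53125, 97684.546875],
).set_index('name')

acres = small_fires.get('acres')
sqmiles = acres * 0.0015625  # Series times a number: elementwise

# Series divided by Series: matching entries are paired up.
acres_per_sqmile = acres / sqmiles
print(acres_per_sqmile)
```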

8.3.3. Series methods: `.mean`, `.max`, `.describe`, etc.

Series objects also come with a bunch of useful methods attached, like .mean and .max. For example, the average size of a fire in this data set is:

fires_by_name.get('acres').mean()

135919.636875

The largest fire burned this many acres:

fires_by_name.get('acres').max()

410202.46875

And the earliest fire in the data set was in the year:

fires_by_name.get('year').min()

A very useful Series method is .describe. It gives us a quick look at the basic statistics of the data in a particular column:

fires_by_name.get('year').describe()

count      50.000000
mean     1998.840000
std        25.220141
min      1910.000000
25%      1996.000000
50%      2007.500000
75%      2015.000000
max      2019.000000
Name: year, dtype: float64

From this, we can see that there are 50 fires in the data set, the earliest from 1910 and the latest from 2019. The 25%, 50%, and 75% refer to percentiles. That is, 25% of the fires occurred during or before 1996, and half occurred during or before 2007. This also means that half occurred between 2007 and 2019!
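To see how such summary numbers come about, it can help to try the methods on a table small enough to check by hand. The sketch below uses nine rows from the wildfire data; the name first_nine is illustrative, and the pandas fallback import is an assumption. The .median method returns the same value that .describe reports as 50%.

```python
# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

first_nine = bpd.DataFrame().assign(
    name=['CAMP', 'BUTTE', 'KING', 'ROUGH', 'MEGRAM',
          'RANCH', 'VALLEY ', 'ROCKY  ', 'FORK'],
    year=[2018, 2015, 2014, 2015, 1999, 2018, 2015, 2015, 1996],
    acres=[153335.5625, 70846.53125, 97684.546875, 151546.8125, 125072.53125,
           410202.46875, 76084.8359375, 69438.1640625, 83056.9453125],
).set_index('name')

years = first_nine.get('year')
earliest = years.min()          # the smallest year
middle = years.median()         # the 50th percentile
largest = first_nine.get('acres').max()

print(earliest, middle, largest)
print(years.describe())
```

Sorting the nine years by hand gives 1996, 1999, 2014, 2015, 2015, 2015, 2015, 2018, 2018, so the middle (fifth) value is 2015.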

We will see more Series methods throughout these notes, but only when we need to use them.

Jupyter Tip

You can ask Jupyter for some information on all of the Series methods available by writing help(bpd.Series). The methods starting with two underscores (__) are called “dunder” methods, and implement special behavior. You’re not meant to call them directly, so you can pretty much ignore them.

8.4. Adding and removing columns

Above, we saw that we could convert the 'acres' column to square miles using a little bit of array math. But doing so doesn’t change the table. What if we want to add this column to our table?

8.4.1. Adding a column with `.assign`

Adding a column can be done with the .assign method, like this:

fires_by_name.assign(sqmiles=fires_by_name.get('acres') * 0.0015625)

	year	cause	acres	county	sqmiles
name
CAMP	2018	11 - Powerline	153335.562500	Butte	239.586816
BUTTE	2015	14 - Unknown	70846.531250	Calaveras	110.697705
KING	2014	7 - Arson	97684.546875	El Dorado	152.632104
ROUGH	2015	1 - Lightning	151546.812500	Fresno	236.791895
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt	195.425830
...	...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura	252.836963
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura	343.748877
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura	440.298242
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura	168.078748
COUNTY	2018	14 - Unknown	89831.148438	Yolo	140.361169

50 rows × 5 columns

There’s a lot going on here, so let’s break it down. First, the assign method takes a single argument: a series that will become the new column. But the way that we pass this argument is new. Instead of simply passing the argument itself, we also give the argument a name by writing sqmiles=. This will be the column’s label. Arguments written in the form argument_name=argument_value are called keyword arguments.
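Keyword arguments are a feature of Python itself, not of babypandas. The following sketch shows how a function can receive whatever names the caller chooses; collect_columns is a hypothetical stand-in written for this example, not part of babypandas.

```python
def collect_columns(**new_columns):
    """Hypothetical stand-in for a method like .assign.

    Python gathers every name=value pair into the dictionary
    new_columns, so the function can see the names the caller picked.
    """
    return list(new_columns.keys())


# The caller chooses the name 'sqmiles'; the function receives it as a label.
labels = collect_columns(sqmiles=[239.586816, 110.697705])
print(labels)  # ['sqmiles']
```

This is why the column label in .assign is written without quotes: it is an argument name, not a string.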

We can call the column anything we like, as long as it is a valid Python variable name. This means that the name cannot contain spaces or start with a number. If we try to use a name containing a space, we’ll get a SyntaxError:

fires_by_name.assign(square miles=fires_by_name.get('acres') * 0.0015625)

  Cell In [21], line 1
    fires_by_name.assign(square miles=fires_by_name.get('acres') * 0.0015625)
                         ^
SyntaxError: invalid syntax. Perhaps you forgot a comma?

Instead of spaces, we can use underscores:

fires_by_name.assign(square_miles=fires_by_name.get('acres') * 0.0015625)

	year	cause	acres	county	square_miles
name
CAMP	2018	11 - Powerline	153335.562500	Butte	239.586816
BUTTE	2015	14 - Unknown	70846.531250	Calaveras	110.697705
KING	2014	7 - Arson	97684.546875	El Dorado	152.632104
ROUGH	2015	1 - Lightning	151546.812500	Fresno	236.791895
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt	195.425830
...	...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura	252.836963
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura	343.748877
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura	440.298242
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura	168.078748
COUNTY	2018	14 - Unknown	89831.148438	Yolo	140.361169

50 rows × 5 columns

The second thing to note is that .assign creates an entirely new table containing the new column. It does not change the old table, as we can verify by recalling the value of fires_by_name:

fires_by_name

	year	cause	acres	county
name
CAMP	2018	11 - Powerline	153335.562500	Butte
BUTTE	2015	14 - Unknown	70846.531250	Calaveras
KING	2014	7 - Arson	97684.546875	El Dorado
ROUGH	2015	1 - Lightning	151546.812500	Fresno
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 4 columns

Note

Wherever possible, DataFrame and Series methods return new objects instead of modifying existing ones. Creating copies like this results in code that is easier to reason about and helps prevent strange bugs.

In order to permanently add the column to the table, we need to save the result of .assign to a variable.

fires_with_sqmiles = fires_by_name.assign(
    sqmiles=fires_by_name.get('acres') * 0.0015625
)
fires_with_sqmiles

	year	cause	acres	county	sqmiles
name
CAMP	2018	11 - Powerline	153335.562500	Butte	239.586816
BUTTE	2015	14 - Unknown	70846.531250	Calaveras	110.697705
KING	2014	7 - Arson	97684.546875	El Dorado	152.632104
ROUGH	2015	1 - Lightning	151546.812500	Fresno	236.791895
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt	195.425830
...	...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura	252.836963
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura	343.748877
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura	440.298242
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura	168.078748
COUNTY	2018	14 - Unknown	89831.148438	Yolo	140.361169

50 rows × 5 columns

8.4.2. Removing a column with `.drop`

Columns can be removed using the .drop method. It accepts one keyword argument: columns. The argument can either be the label of a single column as a string, or a list of column labels. As with .assign, the result is a new DataFrame (a copy).

For example, to get rid of the 'sqmiles' column:

fires_with_sqmiles.drop(columns='sqmiles')

	year	cause	acres	county
name
CAMP	2018	11 - Powerline	153335.562500	Butte
BUTTE	2015	14 - Unknown	70846.531250	Calaveras
KING	2014	7 - Arson	97684.546875	El Dorado
ROUGH	2015	1 - Lightning	151546.812500	Fresno
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 4 columns

If we didn’t want the cause or the county:

fires_with_sqmiles.drop(columns=['cause', 'county'])

	year	acres	sqmiles
name
CAMP	2018	153335.562500	239.586816
BUTTE	2015	70846.531250	110.697705
KING	2014	97684.546875	152.632104
ROUGH	2015	151546.812500	236.791895
MEGRAM	1999	125072.531250	195.425830
...	...	...	...
DAY	2006	161815.656250	252.836963
MATILIJA	1932	219999.281250	343.748877
THOMAS	2017	281790.875000	440.298242
SIMI FIRE	2003	107570.398438	168.078748
COUNTY	2018	89831.148438	140.361169

50 rows × 3 columns

Note that the argument name (columns) is not something we can change, unlike the keyword argument name used in .assign. We must use columns=..., or else Python will not understand us. And if you don’t use the keyword name, Python will be upset:

fires_with_sqmiles.drop('county')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [27], line 1
----> 1 fires_with_sqmiles.drop('county')

TypeError: DataFrame.drop() takes 1 positional argument but 2 were given
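An error of this kind is what Python produces when a function marks an argument as keyword-only. The sketch below imitates that behavior with a hypothetical helper, drop_columns, that works on a plain dictionary; it is not how babypandas is implemented, only an illustration of the mechanism.

```python
def drop_columns(table, *, columns):
    """Hypothetical helper: copy a dict-based 'table' without some columns.

    The bare * means that everything after it (here, columns)
    can only be passed by keyword.
    """
    if isinstance(columns, str):
        columns = [columns]  # allow a single label as well as a list
    return {label: values for label, values in table.items()
            if label not in columns}


table = {'year': [2018, 2015], 'county': ['Butte', 'Calaveras']}

# Passing the argument by keyword works.
print(drop_columns(table, columns='county'))

# Passing it by position raises a TypeError.
try:
    drop_columns(table, 'county')
except TypeError as error:
    print('TypeError:', error)
```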

8.4.3. Renaming columns

How do we rename a column? Suppose we want to rename sqmiles to square_miles. To do so, we:

Add a new column with the desired name by copying the old column.
Drop the old column.

For instance:

fires_with_new_name = fires_with_sqmiles.assign(
    square_miles=fires_with_sqmiles.get('sqmiles')
)
fires_with_new_name.drop(columns='sqmiles')

	year	cause	acres	county	square_miles
name
CAMP	2018	11 - Powerline	153335.562500	Butte	239.586816
BUTTE	2015	14 - Unknown	70846.531250	Calaveras	110.697705
KING	2014	7 - Arson	97684.546875	El Dorado	152.632104
ROUGH	2015	1 - Lightning	151546.812500	Fresno	236.791895
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt	195.425830
...	...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura	252.836963
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura	343.748877
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura	440.298242
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura	168.078748
COUNTY	2018	14 - Unknown	89831.148438	Yolo	140.361169

50 rows × 5 columns

We can also do this in a single piece of code, without intermediate variables:

(
    fires_with_sqmiles
    .assign(square_miles=fires_with_sqmiles.get('sqmiles'))
    .drop(columns='sqmiles')
)

	year	cause	acres	county	square_miles
name
CAMP	2018	11 - Powerline	153335.562500	Butte	239.586816
BUTTE	2015	14 - Unknown	70846.531250	Calaveras	110.697705
KING	2014	7 - Arson	97684.546875	El Dorado	152.632104
ROUGH	2015	1 - Lightning	151546.812500	Fresno	236.791895
MEGRAM	1999	1 - Lightning	125072.531250	Humboldt	195.425830
...	...	...	...	...	...
DAY	2006	5 - Debris	161815.656250	Ventura	252.836963
MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura	343.748877
THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura	440.298242
SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura	168.078748
COUNTY	2018	14 - Unknown	89831.148438	Yolo	140.361169

50 rows × 5 columns

Tip

You can break up long expressions by surrounding the whole expression with parentheses and inserting line breaks wherever makes sense. We’ll often break right at a method call.

This trick of applying two methods, one after the other, in a single expression is called method chaining. It works because the result of .assign is itself a table. When Python evaluates the expression, it first evaluates the .assign, then calls .drop on the resulting table.

Method chaining is useful and can save us some typing, but it can be overused. It is sometimes better to save intermediate results.

Tip

If your method-chaining code isn’t working as you’d expect, break apart the code and save intermediate variables. Print out the values of these variables to do some debugging.
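As a sketch of this debugging approach, the example below writes the renaming steps twice: once as a chain, and once with intermediate variables whose shapes can be printed along the way. The small table and the names step_1 and step_2 are made up for illustration, and the pandas fallback import is an assumption for readers without babypandas.

```python
# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

small_fires = bpd.DataFrame().assign(
    name=['CAMP', 'BUTTE', 'KING'],
    acres=[153335.5625, 70846.53125, 97684.546875],
).set_index('name')
with_sqmiles = small_fires.assign(
    sqmiles=small_fires.get('acres') * 0.0015625
)

# Chained version: both steps in one expression.
chained = (
    with_sqmiles
    .assign(square_miles=with_sqmiles.get('sqmiles'))
    .drop(columns='sqmiles')
)

# The same steps, broken apart so each result can be inspected.
step_1 = with_sqmiles.assign(square_miles=with_sqmiles.get('sqmiles'))
print(step_1.shape)  # acres, sqmiles, and square_miles
step_2 = step_1.drop(columns='sqmiles')
print(step_2.shape)  # acres and square_miles
```

Both versions produce tables of the same shape with the same values; the second simply gives names to the in-between results.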

8.5. Reading CSV files

As mentioned above, DataFrames are not typically created by typing their entries by hand, one-by-one. Instead, we usually download a data set in a standard format and read it from disk. One such standard format is CSV, or comma-separated values.

A CSV file is simply a text file in a certain format. Here are the first few lines of the CSV file containing our wildfire data:

name,year,cause,acres,county
CAMP,2018,11 - Powerline,153335.5625,Butte
BUTTE,2015,14 - Unknown,70846.53125,Calaveras
KING,2014,7 - Arson,97684.546875,El Dorado
ROUGH,2015,1 - Lightning,151546.8125,Fresno
MEGRAM,1999,1 - Lightning,125072.53125,Humboldt
RANCH,2018,14 - Unknown,410202.46875,Lake
VALLEY ,2015,14 - Unknown,76084.8359375,Lake
ROCKY  ,2015,9 - Miscellaneous,69438.1640625,Lake
FORK,1996,7 - Arson,83056.9453125,Lake

As its name suggests, a CSV file consists of values, separated by commas. The first line of the file usually contains the column labels. CSV is a widely used format, and can be read by many pieces of software, including Excel and Google Sheets.
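Because a CSV file is just text, its structure can be inspected with nothing more than Python’s standard library. The sketch below parses the first lines of the wildfire file from a string using the built-in csv module; it only illustrates the format and is not how we will load data in practice.

```python
import csv
import io

# The first three lines of the wildfire CSV file, as plain text.
csv_text = """name,year,cause,acres,county
CAMP,2018,11 - Powerline,153335.5625,Butte
BUTTE,2015,14 - Unknown,70846.53125,Calaveras
"""

# csv.reader splits each line at the commas.
rows = list(csv.reader(io.StringIO(csv_text)))
header = rows[0]    # the first line holds the column labels
records = rows[1:]  # every later line is one fire

print(header)
print(records[0])
```

Note that every value comes back as a string, including the year and the number of acres; converting them to numbers is extra work that a dedicated CSV-reading function does for us.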

We can read a CSV file into a babypandas DataFrame using the bpd.read_csv function. We give this function a string containing the filepath to the CSV file we want to read. For example, our wildfire data exists in a file called calfire.csv contained in the data/ directory. We can read it into a DataFrame as follows:

calfire = bpd.read_csv('data/calfire.csv')
calfire

	name	year	cause	acres	county
0	CAMP	2018	11 - Powerline	153335.562500	Butte
1	BUTTE	2015	14 - Unknown	70846.531250	Calaveras
2	KING	2014	7 - Arson	97684.546875	El Dorado
3	ROUGH	2015	1 - Lightning	151546.812500	Fresno
4	MEGRAM	1999	1 - Lightning	125072.531250	Humboldt
...	...	...	...	...	...
45	DAY	2006	5 - Debris	161815.656250	Ventura
46	MATILIJA	1932	9 - Miscellaneous	219999.281250	Ventura
47	THOMAS	2017	9 - Miscellaneous	281790.875000	Ventura
48	SIMI FIRE	2003	14 - Unknown	107570.398438	Ventura
49	COUNTY	2018	14 - Unknown	89831.148438	Yolo

50 rows × 5 columns

Modifying the DataFrame will not affect the data on disk in any way.
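This can be checked with a small experiment. The sketch below writes three rows of the wildfire data to a temporary file, reads it with read_csv, drops a column from the resulting table, and then confirms that the text on disk is unchanged. The file name calfire_sample.csv is made up for this example, and the pandas fallback import is an assumption for readers without babypandas.

```python
import os
import tempfile

# Assumption: use pandas as a stand-in if babypandas is not installed.
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd

csv_text = (
    "name,year,cause,acres,county\n"
    "CAMP,2018,11 - Powerline,153335.5625,Butte\n"
    "BUTTE,2015,14 - Unknown,70846.53125,Calaveras\n"
    "KING,2014,7 - Arson,97684.546875,El Dorado\n"
)

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name, used only for this example.
    path = os.path.join(folder, 'calfire_sample.csv')
    with open(path, 'w') as f:
        f.write(csv_text)

    sample = bpd.read_csv(path)
    changed = sample.drop(columns='county')  # a new table in memory

    # Read the file again as plain text to compare with what was written.
    with open(path) as f:
        text_on_disk = f.read()

print(sample.shape)
print(changed.shape)
print(text_on_disk == csv_text)
```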
