6. Defining Functions

We have seen that Python comes with a bunch of useful functions for performing common tasks. For instance, the built-in round function rounds a number to a specified number of decimal places.

round(3.1415, 2)
3.14

We have also seen that we can access even more functions by installing and importing a library.

In some cases, however, there might not be a library providing the function that you need. Luckily, Python allows us to define our own functions. In this section, we’ll see how to create functions and apply them to tables.

6.1. Example

Suppose you are working with a dataset containing a bunch of street addresses, such as the following address of the University of California, San Diego:

ucsd = '9500 Gilman Dr, La Jolla, CA 92093'

Suppose we only care about the city and state. That is, we’d like to extract the string 'La Jolla, CA' from the full address. Python doesn’t come with a function that does exactly this, but we can write our own without too much work.

6.1.1. Splitting Strings

A typical address has several parts: the street address, the city name, the state, and the zip code. The parts are separated by commas (with the exception of the state and zip code). Python strings have a helpful .split method which will split a string into parts according to whatever delimiter we provide. To split by a comma, we write:

ucsd.split(',')
['9500 Gilman Dr', ' La Jolla', ' CA 92093']

The result is a list of strings, each of them a part of the original string.

If we do not provide a delimiter, the default behavior of .split is to split based on whitespace (such as spaces):

ucsd.split()
['9500', 'Gilman', 'Dr,', 'La', 'Jolla,', 'CA', '92093']
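As a small illustration of the two splitting modes, the sketch below splits the same address both ways and compares the results. The names parts_by_comma and parts_by_space are made up for this example.

```python
ucsd = '9500 Gilman Dr, La Jolla, CA 92093'

parts_by_comma = ucsd.split(',')   # split at every comma
parts_by_space = ucsd.split()      # split at every run of whitespace

print(len(parts_by_comma))   # 3
print(len(parts_by_space))   # 7
print(parts_by_comma[0])     # 9500 Gilman Dr
print(parts_by_space[0])     # 9500
```

Splitting by commas keeps multi-word pieces such as the street address together, while splitting by whitespace breaks the address into individual words.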

We can use .split to retrieve the city and state name. Notice that when we split by commas, the city name will always be the second-to-last entry of the resulting list. This is because the last comma separates the city from the state and zip code. Remember that we can retrieve the second-to-last element of a list using square bracket notation, combined with using -2 as the index:

city = ucsd.split(',')[-2]
city
' La Jolla'

The result has a leading space that we might want to get rid of – we’ll deal with that in a moment. For now, let’s retrieve the state name. To do this, it might be easiest to split based on whitespace – then the state abbreviation will again be the second-to-last element of the list:

state = ucsd.split()[-2]

We’d like to put the city and state together into a single string, like 'La Jolla, CA'. To do so, remember that the + operator concatenates strings:

city_and_state = city + ', ' + state
city_and_state
' La Jolla, CA'

This is almost perfect, but let’s get rid of the leading space. We can do this with the .strip() string method, which removes leading and trailing whitespace.

city_and_state.strip()
'La Jolla, CA'

Great! Putting it all together, here’s the code we used to retrieve the city and state:

city = ucsd.split(',')[-2]
state = ucsd.split()[-2]
city_and_state = city + ', ' + state
city_and_state.strip()
'La Jolla, CA'

This code might seem simple enough, but suppose we have another address that we’d like to process:

lego = 'LEGOLAND California Resort 1 Legoland Dr, Carlsbad, CA 92008'

We could copy and paste the code above, but there is a better way: let’s define a function.

6.1.2. The def statement

In Python, new functions are created using the def statement. Here is an example of a function which retrieves the city and state name from an address:

def city_comma_state(address):
    """Return CITY, ST from an address string."""
    city = address.split(',')[-2]
    state = address.split()[-2]
    city_and_state = city + ', ' + state
    return city_and_state.strip()

There is a lot to say about this, but first let’s test the function to see if it works. We call user-defined functions just like any other function:

city_comma_state('9500 Gilman Dr, La Jolla, CA 92093')
'La Jolla, CA'
city_comma_state(ucsd)
'La Jolla, CA'
city_comma_state(lego)
'Carlsbad, CA'
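Once the logic lives in a function, applying it to many addresses takes very little code. The sketch below loops over a list holding the two addresses from this section. The list name addresses is an assumption made for this example, and the function relies on every address following the "street, city, ST zip" pattern.

```python
def city_comma_state(address):
    """Return CITY, ST from an address string."""
    city = address.split(',')[-2]
    state = address.split()[-2]
    city_and_state = city + ', ' + state
    return city_and_state.strip()

# Hypothetical list of addresses to process.
addresses = [
    '9500 Gilman Dr, La Jolla, CA 92093',
    'LEGOLAND California Resort 1 Legoland Dr, Carlsbad, CA 92008',
]

# Call the same function once per address.
for address in addresses:
    print(city_comma_state(address))
```

This prints La Jolla, CA followed by Carlsbad, CA, with no copying and pasting of the splitting logic.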

Let’s take a closer look at the anatomy of a function definition. Fig. 6.1 below shows all of the different parts.


Fig. 6.1 The anatomy of a function.

Name

A function definition starts with a name. Above, we’ve named our function city_comma_state, but any valid variable name would do. A function’s name should be short but descriptive.

Arguments

Next come the function’s arguments. These are the “inputs” to the function. In this case, there is only one argument: the address that will be processed. We’ll see how to define functions with more than one argument in a moment. A function can also have zero arguments, in which case we would write def function_with_no_args():. The arguments can be named anything, as long as they are valid variable names. The arguments are surrounded by parentheses, and separated by commas.

Body

The body of the function contains the code that will be executed when the function is called. The arguments can be used within the body of the function. The body of the function must be indented – we usually do this with the tab key.

Docstring

The docstring is a piece of documentation that tells the reader what the function does. Including it is optional but recommended. If you ask Python for information on your function using help, the docstring will be displayed!

help(city_comma_state)
Help on function city_comma_state in module __main__:

city_comma_state(address)
    Return CITY, ST from an address string.

Return

A function should usually return some value – this is done using the return statement, followed by an expression whose value will be returned.
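To see why the return statement matters, the sketch below compares two hypothetical functions, add_two and add_two_no_return, which are not part of the example above. A function that finishes without reaching a return statement hands back the special value None.

```python
def add_two(number):
    """Return the number plus two."""
    return number + 2

def add_two_no_return(number):
    """Compute the number plus two, but forget to return it."""
    result = number + 2

print(add_two(5))             # 7
print(add_two_no_return(5))   # None
```

The second function does the same arithmetic, but the result never leaves the function.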

6.1.3. Function Behavior

The code we include in a function behaves differently than the code we are used to writing in a couple of key ways.

Functions are “recipes”

The code inside of a function is not executed until we call the function. For instance, suppose we try to do something impossible inside of a function – like dividing by zero:

def foo():
    x = 1/0
    return x

If you run the cell defining this function, everything will be fine: you won’t see an error. But when you call the function, Python lets you know that you’re doing something that is mathematically impossible:

ZeroDivisionError                         Traceback (most recent call last)
Cell In [17], line 1
----> 1 foo()

Cell In [16], line 2, in foo()
      1 def foo():
----> 2     x = 1/0
      3     return x

ZeroDivisionError: division by zero

This is because function definitions are like recipes in the sense that handing someone a recipe is not the same as following the recipe and preparing the meal.

Scope

Variables defined within a function are available only inside of the function. We can define variables inside a function just as we normally would:

def foo():
    x = 42
    y = 5
    return x + y

If we run the function, we’ll see the number 47 displayed:

foo()
47

However, if we try to use the variable x, Python will yell at us:

NameError                                 Traceback (most recent call last)
Cell In [20], line 1
----> 1 x

NameError: name 'x' is not defined

This is because variables defined within a function are accessible only within the function. If we want to use that variable outside of the function, we need to pass it back to the caller using a return statement.
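As a rough sketch of passing a value back to the caller, the hypothetical function compute_total below builds a value from local variables and returns it. The caller then saves the returned value under a name of its own.

```python
def compute_total():
    x = 42
    y = 5
    total = x + y   # total exists only while the function runs
    return total    # hand the value back to the caller

result = compute_total()   # save the returned value under a new name
print(result)              # 47
```

The names x, y, and total are still unavailable outside the function, but the value they produced lives on in result.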

Note that arguments count as “variables defined within a function”. For instance:

def foo(my_argument):
    return my_argument + 2

If we call the function, everything will act as expected. For example, passing in 3 gives 5:

foo(3)
5

But if we try to access my_argument outside of the function, Python tells us that we can’t:

NameError                                 Traceback (most recent call last)
Cell In [23], line 1
----> 1 my_argument

NameError: name 'my_argument' is not defined

On the other hand, variables defined outside of a function are available inside the function. Consider for instance:

x = 42
def foo():
    return x + 10
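The sketch below shows one consequence of reading an outside variable. The function names add_ten_global and add_ten are made up for this example. The first reads x from outside the function, while the second receives its input as an argument.

```python
x = 42

def add_ten_global():
    # Reads x from outside the function.
    return x + 10

def add_ten(value):
    # Uses only the value passed in as an argument.
    return value + 10

print(add_ten_global())   # 52
x = 0
print(add_ten_global())   # 10, because x changed outside the function
print(add_ten(42))        # 52, regardless of what x is
```

The result of add_ten_global depends on whatever x happens to be at the moment of the call, which can make code harder to follow.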

Use this behavior sparingly – it is usually better to “isolate” a function from the outside world by passing in all of the variables that it needs.

return exits the function

As soon as Python encounters a return statement, it stops executing the function and returns the corresponding value. As an example, consider the code below which has three returns. Only the first return statement will ever run:

def foo():
    print('Starting execution.')
    return 1
    print('Hey, I made it!')
    return 2
    print('On to number three...')
    return 3

foo()
Starting execution.
1

printing versus returning

As we saw above, functions are somewhat isolated from the rest of the world in the sense that variables defined within them cannot be used outside of the function. The “correct” way of transmitting values back to the world is to use a return statement. However, a common mistake is to think that print does the same thing. This is understandable, since printing and returning look similar in a Jupyter notebook. For example, let’s define a function that both prints and returns:

def foo():
    x = 42
    y = 52
    print(y)
    return x

When we run this function, we’ll see both values:

foo()
52
42

Only 42 is the output of the cell and can be “saved” to a variable. 52, on the other hand, is simply displayed to the screen and is afterwards lost forever. This can be checked by saving the result to a variable and displaying its value:

z = foo()
52
z
42

Nevertheless, using print inside of a function can be helpful in “debugging” – more on that in a moment. Lastly, if you truly want to return two values from a function, the right way to do so is by separating them with a comma, as follows:

def foo():
    x = 42
    y = 52
    return x, y

When the function is run, it will return a tuple of two things:

foo()
(42, 52)

A tuple is like a list, so we can use square bracket notation to retrieve each element:

foo()[0]
42
foo()[1]
52


We won’t usually need to return more than one thing from a function, though.
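When a function does return two values, an alternative to square brackets is to unpack the tuple into two names in a single assignment. The sketch below uses a hypothetical function, min_and_max, that is not part of this chapter.

```python
def min_and_max(numbers):
    """Return the smallest and largest values in a list."""
    return min(numbers), max(numbers)

# The first returned value goes to smallest, the second to largest.
smallest, largest = min_and_max([3, 1, 4, 1, 5])
print(smallest)   # 1
print(largest)    # 5
```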

6.2. Examples

Given a year, produce the decade

Given a year, such as 1994, we’d like to retrieve the decade; in this case, 1990. At first we might think that round is useful:

round(1994, -1)
1990

But it won’t work for years like 1997, since it will round up:

round(1997, -1)
2000

There are a few approaches that do work. One way is to use the % operator. Remember that x % y returns the remainder upon dividing x by y. For example:

1992 % 10
2

To find the decade, we can simply subtract the remainder obtained by dividing by ten:

1992 - (1992 % 10)
1990
1997 - (1997 % 10)
1990
2000 - (2000 % 10)
2000

Placing this code in a function makes it so we don’t have to remember this trick, and makes our code more readable:

def decade_from_year(year):
    return year - year % 10
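A quick check of decade_from_year on the years used above is sketched below, next to the round approach that fails for 1997. The last line uses floor division, which is another approach that works, though it is not the one used in this section.

```python
def decade_from_year(year):
    return year - year % 10

print(decade_from_year(1994))   # 1990
print(decade_from_year(1997))   # 1990
print(decade_from_year(2000))   # 2000

print(round(1997, -1))          # 2000, since round goes to the nearest ten
print(1997 // 10 * 10)          # 1990, floor division as an alternative
```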

Given base and height, compute the area of a triangle

We need to define a function with two arguments. We do so by separating the argument names with a comma, like so:

def area_of_triangle(base, height):
    return 1/2 * base * height

area_of_triangle(10, 5)
25.0

Note that the order of the arguments matters. When area_of_triangle(10, 5) is executed, Python assigns the value of 10 to base and assigns the value of 5 to height. If you wish, you can use the keyword argument form to call the function, in which case arguments can be provided in any order. This is slightly more readable, too:

area_of_triangle(height=4, base=10)
20.0
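Because the area formula gives the same answer when base and height are swapped, the effect of argument order is easier to see with a function that is not symmetric. The sketch below adds a hypothetical helper, ratio, purely for illustration.

```python
def area_of_triangle(base, height):
    return 1/2 * base * height

def ratio(numerator, denominator):
    # Hypothetical helper: swapping the arguments changes the answer.
    return numerator / denominator

print(area_of_triangle(10, 4))               # 20.0
print(area_of_triangle(height=4, base=10))   # 20.0

print(ratio(10, 4))                          # 2.5
print(ratio(4, 10))                          # 0.4
print(ratio(denominator=4, numerator=10))    # 2.5
```

With keyword arguments, each value is matched to its argument by name, so the last call gives the same result as ratio(10, 4) even though the values appear in the opposite order.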