6. Defining Functions
We have seen that Python comes with a bunch of useful functions for performing common tasks. For instance, the built-in round function rounds a number to a specified number of decimal places.
We have also seen that we can access even more functions by installing and importing a library.
In some cases, however, there might not be a library providing the function that you need. Luckily, Python allows us to define our own functions. In this section, we’ll see how to create functions and apply them to tables.
Suppose you are working with a dataset containing a bunch of street addresses, such as the following address of the University of California, San Diego:
ucsd = '9500 Gilman Dr, La Jolla, CA 92093'
Suppose we only care about the city and state. That is, we’d like to extract the string 'La Jolla, CA' from the full address. Python doesn’t come with a function that does exactly this, but we can write our own without too much work.
6.1.1. Splitting Strings
A typical address has several parts: the street address, the city name, the state, and the zip code. The parts are separated by commas (with the exception of the state and zip code). Python strings have a helpful .split method which will split a string into parts according to whatever delimiter we provide. To split by a comma, we write:
ucsd.split(',')
['9500 Gilman Dr', ' La Jolla', ' CA 92093']
The result is a list of strings, each of them a part of the original string.
If we do not provide a delimiter, the default behavior of .split is to split based on whitespace (such as spaces):
ucsd.split()
['9500', 'Gilman', 'Dr,', 'La', 'Jolla,', 'CA', '92093']
We can use .split to retrieve the city and state name. Notice that when we split by commas, the city name will always be the second-to-last entry of the resulting list. This is because the last comma separates the city from the state and zip code. Remember that we can retrieve the second-to-last element of a list using square bracket notation, combined with using -2 as the index:
city = ucsd.split(',')[-2]
city
' La Jolla'
The result has a leading space that we might want to get rid of – we’ll deal with that in a moment. For now, let’s retrieve the state name. To do this, it might be easiest to split based on whitespace – then the state abbreviation will again be the second-to-last element of the list:
state = ucsd.split()[-2]
state
'CA'
We’d like to put the city and state together into a single string, like 'La Jolla, CA'. To do so, remember that the + operator concatenates strings:
city_and_state = city + ', ' + state
city_and_state
' La Jolla, CA'
This is almost perfect, but let’s get rid of the leading space. We can do this with the .strip() string method, which removes leading and trailing whitespace.
city_and_state.strip()
'La Jolla, CA'
Great! Putting it all together, here’s the code we used to retrieve the city and state:
city = ucsd.split(',')[-2]
state = ucsd.split()[-2]
city_and_state = city + ', ' + state
city_and_state.strip()
'La Jolla, CA'
This code might seem simple enough, but suppose we have another address that we’d like to process:
lego = 'LEGOLAND California Resort 1 Legoland Dr, Carlsbad, CA 92008'
We could copy and paste the code above, but there is a better way: let’s define a function.
In Python, new functions are created using the def statement. Here is an example of a function which retrieves the city and state name from an address:
def city_comma_state(address):
    """Return CITY, ST from an address string."""
    city = address.split(',')[-2]
    state = address.split()[-2]
    city_and_state = city + ', ' + state
    return city_and_state.strip()
There is a lot to say about this, but first let’s test the function to see if it works. We call user-defined functions just like any other function:
city_comma_state('9500 Gilman Dr, La Jolla, CA 92093')
'La Jolla, CA'
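The same function can now be applied to the LEGOLAND address from earlier without copying any code. The sketch below repeats the definition so that it runs on its own; the variable names ucsd_result and lego_result are illustrative and do not appear in the text above.

```python
def city_comma_state(address):
    """Return CITY, ST from an address string."""
    city = address.split(',')[-2]      # second-to-last comma-separated part
    state = address.split()[-2]        # second-to-last whitespace-separated part
    city_and_state = city + ', ' + state
    return city_and_state.strip()      # drop the leading space left by the split


ucsd = '9500 Gilman Dr, La Jolla, CA 92093'
lego = 'LEGOLAND California Resort 1 Legoland Dr, Carlsbad, CA 92008'

ucsd_result = city_comma_state(ucsd)
lego_result = city_comma_state(lego)

print(ucsd_result)  # La Jolla, CA
print(lego_result)  # Carlsbad, CA
```

Note that the function relies on the address ending in "city, state zip"; an address written in a different layout would likely need different logic.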
Let’s take a closer look at the anatomy of a function definition. Fig. 6.1 below shows all of the different parts.
A function definition starts with the def keyword, followed by the function’s name. Above, we’ve named our function city_comma_state, but any valid variable name would do. A function’s name should be short but descriptive.
Next come the function’s arguments. These are the “inputs” to the function. In this case, there is only one argument: the address that will be processed. We’ll see how to define functions with more than one argument in a moment. A function can also have zero arguments, in which case we would write def function_with_no_args():. The arguments can be named anything, as long as they are valid variable names. The arguments are surrounded by parentheses, and separated by commas.
The body of the function contains the code that will be executed when the function is called. The arguments can be used within the body of the function. The body of the function must be indented – we usually do this with the tab key.
The docstring is a piece of documentation that tells the reader what the function does. Including it is optional but recommended. If you ask Python for information on your function using help, the docstring will be displayed!
help(city_comma_state)
Help on function city_comma_state in module __main__:

city_comma_state(address)
    Return CITY, ST from an address string.
A function should usually return some value – this is done using the return statement, followed by an expression whose value will be returned.
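To tie these parts together, here is a minimal sketch of a second, deliberately tiny function. The name greeting is hypothetical and chosen only for illustration. It takes zero arguments, has a docstring, and returns a value. The docstring is also stored in the function's __doc__ attribute, which is the text that help displays.

```python
def greeting():
    """Return a fixed greeting."""
    return 'Hello!'


message = greeting()          # call the function and keep the returned value
docstring = greeting.__doc__  # the documentation string attached to the function

print(message)    # Hello!
print(docstring)  # Return a fixed greeting.
```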
6.1.3. Function Behavior
The code we include in a function behaves differently than the code we are used to writing in a couple of key ways.
6.1.3.1. Functions are “recipes”
The code inside of a function is not executed until we call the function. For instance, suppose we try to do something impossible inside of a function – like dividing by zero:
def foo():
    x = 1/0
    return x
If you run the cell defining this function, everything will be fine: you won’t see an error. But when you call the function, Python lets you know that you’re doing something that is mathematically impossible:
foo()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In , line 1
----> 1 foo()

Cell In , line 2, in foo()
      1 def foo():
----> 2     x = 1/0
      3     return x

ZeroDivisionError: division by zero
This is because function definitions are like recipes, in the sense that handing someone a recipe is not the same as following the recipe and preparing the meal.
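The difference between defining and calling can be sketched in a single runnable snippet. The try/except wrapper and the error_raised flag are additions made here so that the example finishes without stopping on the error; they are not part of the original example.

```python
def foo():
    x = 1 / 0   # this line only runs when foo is called
    return x


print('foo is defined; no error so far.')

try:
    foo()                 # now the body runs and the division fails
    error_raised = False
except ZeroDivisionError as err:
    error_raised = True
    print('Calling foo raised:', err)
```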
Variables defined within a function are available only inside of the function. We can define variables inside a function just as we normally would:
def foo():
    x = 42
    y = 5
    return x + y
If we run the function, we’ll see the number 47. However, if we try to use the variable x outside of the function, Python will yell at us:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In , line 1
----> 1 x

NameError: name 'x' is not defined
This is because variables defined within a function are accessible only within the function. If we want to use that variable outside of the function, we need to pass it back to the caller using a return statement.
Note that arguments count as “variables defined within a function”. For instance:
def foo(my_argument):
    return my_argument + 2
If we call the function, everything will act as expected: foo(40), for instance, returns 42. But if we try to access my_argument outside of the function, Python tells us that we can’t:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In , line 1
----> 1 my_argument

NameError: name 'my_argument' is not defined
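Both cases can be checked in one place. The sketch below combines a local variable and an argument in a single function, and it assumes that no variables named x or my_argument have been defined outside the function. The try/except blocks and the x_visible and arg_visible flags are illustrative additions that let the snippet run to completion.

```python
def foo(my_argument):
    x = 42                    # local variable: exists only while foo runs
    return my_argument + x


result = foo(2)               # 44

try:
    x                         # look up x outside the function
    x_visible = True
except NameError:
    x_visible = False

try:
    my_argument               # look up the argument name outside the function
    arg_visible = True
except NameError:
    arg_visible = False

print(result, x_visible, arg_visible)  # 44 False False
```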
On the other hand, variables defined outside of a function are available inside the function. Consider for instance:
x = 42

def foo():
    return x + 10
Use this behavior sparingly – it is usually better to “isolate” a function from the outside world by passing in all of the variables that it needs.
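One way to see why isolation helps is to compare a function that reads an outside variable with one that receives the value as an argument. This is only a sketch; the function add_ten and the variable names first, second, and isolated are hypothetical.

```python
x = 42

def foo():
    return x + 10          # reads whatever x is outside the function right now

def add_ten(value):
    return value + 10      # depends only on the argument it is given


first = foo()              # 52
x = 0                      # changing the outside variable...
second = foo()             # ...changes what foo returns: 10
isolated = add_ten(42)     # 52, no matter what x is

print(first, second, isolated)  # 52 10 52
```

The result of foo silently depends on code elsewhere, while the result of add_ten can be understood from the call alone.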
return exits the function
As soon as Python encounters a return statement, it stops executing the function and returns the corresponding value. As an example, consider the code below which has three returns. Only the first return statement will ever run:
def foo():
    print('Starting execution.')
    return 1
    print('Hey, I made it!')
    return 2
    print('On to number three...')
    return 3
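Calling the function confirms this. In the sketch below, the variable name value is an illustrative addition.

```python
def foo():
    print('Starting execution.')
    return 1                         # the function stops here
    print('Hey, I made it!')         # never reached
    return 2                         # never reached
    print('On to number three...')   # never reached
    return 3                         # never reached


value = foo()   # prints 'Starting execution.' and nothing else
print(value)    # 1
```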
As we saw above, functions are somewhat isolated from the rest of the world in the sense that variables defined within them cannot be used outside of the function. The “correct” way of transmitting values back to the world is to use a return statement. However, a common mistake is to use print instead, because printing and returning look similar in a Jupyter notebook. For example, let’s define a function that both prints and returns:
def foo():
    x = 42
    y = 52
    print(y)
    return x
When we run this function, we’ll see both values:
z = foo()
z
52
42
42 is the output of the cell and can be “saved” to a variable. 52, on the other hand, is simply displayed to the screen and is afterwards lost forever. This can be checked by displaying the value of z, which is 42.
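The difference becomes clearer when the two behaviors are separated. The following sketch uses two hypothetical functions, print_only and return_only, that are not part of the original example. A function that prints but has no return statement hands back the special value None.

```python
def print_only():
    x = 42
    print(x)        # displays 42 on the screen, but returns nothing


def return_only():
    x = 42
    return x        # hands 42 back to the caller without displaying it


a = print_only()    # 42 appears on the screen; a is None
b = return_only()   # nothing appears on the screen; b is 42

print(a)  # None
print(b)  # 42
```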
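A function can also return more than one value. To do so, we separate the values with commas in the return statement:
def foo():
    x = 42
    y = 52
    return x, y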
When the function is run, it will return a tuple of two things, here (42, 52). A tuple is like a list, so we can use square bracket notation to retrieve each element; for example, foo()[0] is 42.
We won’t usually need to return more than one thing from a function, though.
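For reference, here is a short sketch of returning a tuple and indexing into it. The variable names result, first, and second are illustrative.

```python
def foo():
    x = 42
    y = 52
    return x, y          # the comma packs both values into a tuple


result = foo()           # (42, 52)
first = result[0]        # 42
second = result[1]       # 52

print(result, first, second)
```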
Given a year, produce the decade
Given a year, such as 1994, we’d like to retrieve the decade; in this case, 1990. At first we might think that round is useful, since rounding 1994 to the nearest ten gives 1990. But it won’t work for years like 1997, since it will round up to 2000.
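As a sketch of the problem, assume the attempt is made with round and a second argument of -1, which rounds to the nearest ten. That choice of call is an assumption made here, since the code for this step is not shown above.

```python
rounded_1994 = round(1994, -1)   # rounds down to the nearest ten
rounded_1997 = round(1997, -1)   # rounds up, which is not the decade we want

print(rounded_1994)  # 1990
print(rounded_1997)  # 2000
```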
There are a few approaches that do work. One way is to use the % operator. Remember that x % y returns the remainder upon dividing x by y. For example:
1992 % 10
2
To find the decade, we can simply subtract the remainder obtained by dividing by ten:
1992 - (1992 % 10)
1997 - (1997 % 10)
2000 - (2000 % 10)
Placing this code in a function makes it so we don’t have to remember this trick, and makes our code more readable:
def decade_from_year(year):
    return year - year % 10
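A quick check of the function on the three years used above might look like this. The variable names are illustrative.

```python
def decade_from_year(year):
    return year - year % 10   # subtract the remainder left after dividing by ten


decade_1992 = decade_from_year(1992)
decade_1997 = decade_from_year(1997)
decade_2000 = decade_from_year(2000)

print(decade_1992, decade_1997, decade_2000)  # 1990 1990 2000
```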
Given base and height, compute the area of a triangle
We need to define a function with two arguments. We do so by separating the argument names with a comma, like so:
def area_of_triangle(base, height):
    return 1/2 * base * height
Note that the order of the arguments matters. When area_of_triangle(10, 5) is executed, Python assigns the value of 10 to base and assigns the value of 5 to height. If you wish, you can use the keyword argument form to call the function, in which case arguments can be provided in any order. This is slightly more readable, too:
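A minimal sketch of both calling styles follows. The variable names positional and keyword are illustrative.

```python
def area_of_triangle(base, height):
    return 1/2 * base * height


# Positional form: the order decides which value goes to which argument.
positional = area_of_triangle(10, 5)

# Keyword form: each value is labeled, so the order no longer matters.
keyword = area_of_triangle(height=5, base=10)

print(positional)  # 25.0
print(keyword)     # 25.0
```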