3. Expressions

Now that we know the basics of Jupyter notebooks, we can start exploring Python itself. Before we do, it might interest you to know that most of the pages in these notes were actually written as Jupyter notebooks – including this page! In fact, you can see the notebook version of this page by hovering over the rocket icon above and selecting “JupyterHub”. This will open the current page as a notebook on JupyterHub.

Fig. 3.1 Open this page as a notebook.

Feel free to experiment by changing the notebook’s code cells and running them. Don’t worry: the changes will only appear in your notebook, and won’t affect the course notes.

Note

Keep in mind the philosophy of these notes: we’ll learn only as much Python as necessary to do data science. This chapter will discuss the absolute basics, and we’ll introduce other Python features later when they become necessary.

3.1. Python as a Calculator

In Tools of the Trade, we mentioned that data scientists use computers as if they are advanced calculators. As such, we’ll start our exploration of Python by doing some basic mathematical calculations. At first it might seem like Python isn’t any more useful than the calculator app on your phone, but by the end of this chapter you’ll start to see why it is such a powerful tool.

In the last section, we saw how to add two numbers with Python:

3 + 8
11

The first line shows the Python code we write in our notebook’s code cell, and the line below shows the output we would see if we were to run that code cell.

If this isn’t amazing enough, Python can subtract, multiply, and divide, too:

3 - 8
-5
3 * 8
24
3 / 8
0.375

Notice that multiplication is performed by writing an asterisk, *, in between two numbers. This is opposed to writing something like 3 x 8. In fact, if we were to write 3 x 8, Python would complain:

3 x 8
  Cell In [5], line 1
    3 x 8
      ^
SyntaxError: invalid syntax

This angry message is called an exception. In this case, the exception is a SyntaxError. This is Python’s way of telling us that we’ve written some code that it doesn’t understand. Remember, a human might understand what is meant by 3 x 8, but Python doesn’t. We have to be careful to use the precise syntax that Python expects when writing code – in this case, it’s as simple as writing 3 * 8 instead of 3 x 8. We’ll see more about exceptions later.

Two other arithmetic operators might be useful to you. To exponentiate a number (raise it to a power), use **:

5**2
25
4**.5
2.0

To find the remainder when dividing two numbers, use %:

14 % 5
4
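
As a small illustration of where a remainder is handy, the sketch below (our own example, not from the notes) finds how many days are left over once the full weeks in a 365-day year are counted, and shows that an even number leaves no remainder when divided by 2.

```python
# Days left over after the 52 full weeks in a 365-day year (52 * 7 = 364)
365 % 7   # 1

# An even number divided by 2 leaves no remainder
10 % 2    # 0
```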

The above lines of Python are examples of expressions. An expression is a piece of code that can be evaluated to produce a value. For instance, 3 - 8 is an expression which evaluates to -5. While these examples are of arithmetic expressions, general Python expressions do not need to evaluate to numbers – they could produce text, matrices (from math), images, etc.
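
To make the last point concrete, here is a minimal sketch of an expression whose value is text rather than a number. It assumes only that text in Python is written inside quotation marks and that + joins two pieces of text; the particular words are made up for illustration.

```python
# An expression that evaluates to text: + joins the two pieces together
"data" + " science"   # 'data science'
```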

More complex expressions can be built by combining simpler expressions, as we’ll see throughout this course. In particular, arithmetic expressions can be combined just as you might imagine. For instance:

(12 + 3)*5 + 40
115

Notice the use of parentheses to group the expression 12 + 3. Python follows the same order of operations you learned in kindergarten: expressions within parentheses are evaluated first, then exponentiation, then multiplication and division (from left to right), and finally addition and subtraction (from left to right).

Suppose the parentheses are removed from (12 + 3)*5 + 40. What is the result?
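
One way to check your answer is to evaluate both versions side by side. The sketch below does so, with comments tracing the order in which Python works through each expression; the last line is an extra example of our own showing that exponentiation comes before multiplication.

```python
# Parentheses first: (12 + 3) is 15, then 15 * 5 is 75, then 75 + 40
(12 + 3)*5 + 40   # 115

# Without parentheses, multiplication comes before addition: 12 + 15 + 40
12 + 3*5 + 40     # 67

# Exponentiation comes before multiplication: 2 * 9
2 * 3**2          # 18
```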

3.1.1. Example

How many seconds are in one year? There are 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 365 days in a year, for a total of

60 * 60 * 24 * 365
31536000

seconds in a normal year.

Hint

It is sometimes useful to include a comment alongside your code, explaining what it does. In Python, a line starting with # is treated as a comment, and is effectively ignored. For instance:

# (seconds in a minute) * (minutes in an hour) * (hours in a day) * (days in a year)
60 * 60 * 24 * 365

It is good to comment your code, but don’t go overboard – it is possible to have too many comments! For example, the comment below probably doesn’t help:

# multiply three and five
3 * 5

3.2. Saving the results

When performing a long computation, it is often useful to store the result of a calculation so that we can use it again later. In most programming languages, we can do this by giving a name to the result of an expression. In Python, the result of an expression can be given a name using =. For example:

the_answer_to_everything = 21 * 2

This line of code stores the number 42 in a variable named the_answer_to_everything. To recall the value of this variable, we can write its name in a cell:

the_answer_to_everything
42

We can also use this variable in expressions. For example, to add 12 to the_answer_to_everything:

the_answer_to_everything + 12
54

Multiple variables can be defined in the same cell:

a = 1
b = 2
c = 3

And multiple variables can be used in the same expression:

a + b + c
6

Names can be reassigned. Up above, we assigned the value 1 to the variable a. If we change our mind and want to assign, say, 99 instead, we simply write:

a = 99

Now the value of a is 99, and its previous value has been “forgotten”:

a
99

Notice that a cell containing only an assignment has no output:

seconds_in_a_year = 60 * 60 * 24 * 365

This might seem strange: in the previous section, the result of our expressions was displayed as the cell’s output. What gives?

We see no output because the code seconds_in_a_year = 60 * 60 * 24 * 365 is not an expression; it is a statement. While expressions have values, statements do not. When a notebook cell is run, the value of the last line in the cell is displayed as the output. But since an assignment statement has no value, there is nothing to display.

However, the line of code consisting only of a variable’s name is an expression – its value is the value of the variable. For instance:

seconds_in_a_year
31536000

This suggests the following workaround to the fact that assignments result in no output:

Jupyter Tip

To have Jupyter display the value of a variable that has just been assigned, write the variable’s name as the last line of the cell. For example:

number_of_seconds_in_a_year = 60 * 60 * 24 * 365
number_of_seconds_in_a_year

Testing it out:

seconds_in_a_year = 60 * 60 * 24 * 365
seconds_in_a_year
31536000

Valid names may include letters, underscores, and numbers – but they must start with a letter or underscore. Names are case-sensitive, meaning that My_variable and my_variable are two distinct names.

You can experiment with variable assignments in order to get a better feeling for how Python works. For instance, suppose we define two variables, a and b, and a third variable c to be their sum:

a = 5
b = 3
c = a + b
c
8

We see that, as expected, the value of c is 8. Now suppose we create and run a new code cell containing:

a = 42

If we were to print the value of c, what would we see? Would it still be 8? Or would it “update” to become 45 now that a has changed? Try it by creating a new cell and writing the necessary code. The answer is below:

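The value of c is still 8. When c = a + b was run, Python evaluated a + b and stored the resulting number in c; it did not store the formula. Changing a afterwards therefore has no effect on c unless the assignment to c is run again.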
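
This behavior can be checked with a short experiment. The sketch below repeats the cells from above in order, with comments describing what happens at each step.

```python
a = 5
b = 3
c = a + b   # a + b is evaluated right now; the number 8 is what gets stored in c

a = 42      # reassigning a changes a only
c           # still 8
a + b       # 45: evaluating the sum again uses the new value of a
```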

3.2.1. Example

A lightyear is a unit of measurement equal to the distance that light travels in one Earth year. Because light is very fast, a lightyear is quite a large distance.

Suppose we want to calculate the number of lightyears between the Earth and the Sun. Let’s start by assuming that we know two things:

  1. The speed of light is 186,000 miles per second

  2. The Earth is 93 million miles away from the Sun

Here’s our strategy: we’ll first calculate how long a lightyear is in miles, then we’ll divide 93 million miles by this number to find how many lightyears are between the Sun and the Earth.

So how long is a lightyear, in miles? That is, how far does light travel in one year? Well, it travels 186,000 miles per second. We have already calculated the number of seconds in a year above, and stored the result in a variable called seconds_in_a_year. Therefore, light travels

186_000 * seconds_in_a_year
5865696000000

miles in one year. Now we can get what we came for. Dividing 93 million by this number will give us the distance to the Sun in lightyears:

93_000_000 / 5865696000000
1.5854895991882292e-05

The result is a relatively small number expressed in scientific notation. Remember that 1.585e-5 is shorthand for \(1.585 \times 10^{-5}\).

Tip

Giving names to variables makes your code easier to understand. It’s often a good idea to break a long calculation up into intermediate steps and give names to the result of each part.
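
As an illustration of this tip, the lightyear calculation from the example could be rewritten with a name for every intermediate quantity. This is only a sketch: apart from seconds_in_a_year, the variable names are our own choices, and the figures are the same approximate ones used above.

```python
seconds_in_a_year = 60 * 60 * 24 * 365

speed_of_light = 186_000       # miles per second (approximate)
miles_in_a_lightyear = speed_of_light * seconds_in_a_year

earth_to_sun = 93_000_000      # miles (approximate)
earth_to_sun_in_lightyears = earth_to_sun / miles_in_a_lightyear
earth_to_sun_in_lightyears     # about 1.585e-05
```

Each line now says what its number means, so a mistake in any one step is easier to spot.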

3.3. The Kernel: The “Brains” of a Notebook

When we define a new variable, we expect the notebook to remember its value. But where is this value kept, precisely?

It might help to understand a little more about how notebooks work. When you launch a Jupyter notebook, an instance of Python called a kernel is started on a remote server. When you run a cell, the code is sent over the internet to the kernel, which then evaluates the code and sends back the result for display in your browser. In this way, the kernel is the “brain” of the notebook, since it not only does the calculations, but also remembers the values of the variables we have defined.

Just knowing that the kernel exists can help us better understand how notebooks work, and therefore avoid some common mistakes. Consider the following problematic situation. You need to calculate the area of a triangle, so you first define variables base and height as such:

base = 3
height = 4

You then use the formula for the area of a triangle, but make a mistake: you forget to multiply by 1/2:

area = base * height

At this point, the kernel has the following values for each variable:

  • base: 3

  • height: 4

  • area: 12

Unfortunately, you don’t recognize the mistake in the area formula immediately. You continue on, creating new code cells and executing them.

Later, you are working with a different shape and overwrite the height variable, giving it a new value of 10:

height = 10

Now the kernel has the following values:

  • base: 3

  • height: 10

  • area: 12

Note that the value of area did not change since we did not re-run the cell containing its definition.

Now suppose that you realize that you got the formula for the area of a triangle wrong. You go back and fix the cell and re-run it. The question is: which value of height is used? The value 4, from above the cell? Or the value 10 from below? Try it and see! You’ll find that the new value of height is what is used when re-computing the area. This is obviously not what we wanted.

It turns out that it isn’t the order of the cells within the notebook that matters; it is the order in which they are executed. Typically, cells are executed in the order in which they appear in the notebook, but there are many instances where the cells are executed out-of-order. In such instances, weird “bugs” can occur, such as the one described above.
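
To see the effect in one place, the sketch below lists the triangle cells in the order the kernel actually ran them, rather than the order in which they appear in the notebook. The comments marking each cell are our own labels.

```python
# First cell in the notebook, run first
base = 3
height = 4

# A cell further down the notebook, run second
height = 10

# The corrected area cell, which sits above the previous one but is re-run last
area = 1/2 * base * height
area   # 15.0, not the intended 6.0, because height is now 10
```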

Luckily there is a simple fix to this problem. Select “Kernel -> Restart and Run All” from the menu. This will restart the kernel, causing it to forget the value of all variables it currently knows. All of the cells will also be re-run, from top to bottom. Restarting a kernel like this is like pushing “reset”.

Jupyter Tip

If you notice strange behavior while working with a Jupyter notebook, remember the number one rule of debugging: try turning it off and then back on. The equivalent of this with a Jupyter notebook is selecting “Kernel -> Restart and Run All” from the top menu.