Basic Python types and containers

Like most computer languages, Python allows one to work with numbers and text. Let's start with numbers. The following is not intended to be comprehensive; it is intended to give you enough information and examples to get started. Then it is up to you to learn by experimenting, by looking in the documentation, by looking at more code examples, and by asking questions.

Note: in this notebook we are working with functionality that comes with Python itself; we will introduce the numpy package and other key components for scientific computing in the subsequent notebooks.

Numbers and Booleans

Python's system of numbers is quite simple: there are integers, floating point numbers, complex numbers, and Boolean (True or False) types, together with their associated operations. Types are determined dynamically, at run time. Mixed arithmetic promotes to the more general type, so operations involving both integers and floating point numbers yield floating point numbers.

In [1]:
print("1 + 2 = ", 1 + 2)
print("1.0 + 2 =", 1.0 + 2)
1 + 2 =  3
1.0 + 2 = 3.0

Division can be tricky because many traditional computer languages (C and Fortran, for example) use integer division when both operands are integers. This was also the default in Python 2, but there it can be changed via the first line in the following block, which you will often see in code written to run under both Python versions. Python 3 uses "true division" for '/', and integer division only with the explicit floor division operator, '//'. Notice that even with '//', the result is a float if either of the arguments is a float.

In [2]:
from __future__ import division # get Py3 behavior if on Py2
print("4 / 3 = ", 4 / 3) # would return 1 on Py2 without the import
print("4.0 / 3 =", 4.0 / 3)
print("4 // 3 = ", 4 // 3)
print("4.0 // 3 = ", 4.0 // 3)
4 / 3 =  1.3333333333333333
4.0 / 3 = 1.3333333333333333
4 // 3 =  1
4.0 // 3 =  1.0
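The '//' operator is, more precisely, floor division: it rounds toward negative infinity, which matters when an operand is negative. The companion '%' operator returns the matching remainder, and the built-in divmod() returns both at once. A minimal sketch using only built-ins:

```python
# Floor division rounds toward negative infinity, not toward zero.
print("7 // 3 =", 7 // 3)                # 2
print("-7 // 3 =", -7 // 3)              # -3, not -2
# The remainder takes the sign of the divisor, so that
# (a // b) * b + (a % b) == a always holds.
print("-7 % 3 =", -7 % 3)                # 2
print("divmod(-7, 3) =", divmod(-7, 3))  # (-3, 2)
```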

There are some built-in functions that operate on numbers, e.g.:

In [3]:
print(int(4/3))
print(round(4/3, 1))
print(abs(4.2 - 5)) # Not exactly 0.8: binary floating point
                    # cannot represent most decimal fractions exactly.
print(pow(2, 3))
1
1.3
0.7999999999999998
8
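Two details of these functions can surprise newcomers. int() truncates toward zero, so for negative values it differs from '//', which floors; and round() sends exact halves to the nearest even integer ("banker's rounding"). A short illustration:

```python
# int() truncates toward zero; // floors toward negative infinity.
print(int(-4 / 3))   # -1
print(-4 // 3)       # -2
# round() rounds exact halves to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
```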

Exponentiation can also be written with the ** operator, as an alternative to the pow() function:

In [4]:
print(2**3)
8

For more math functions, one can import the math module from the standard library:

In [5]:
import math
print(math.sin(1))
print(math.sqrt(2))
0.8414709848078965
1.4142135623730951
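Because of the binary floating point representation issue seen earlier, floating point results should usually be compared with a tolerance rather than with '=='. The math module provides isclose() for this (Python 3.5 and later). A minimal sketch:

```python
import math

x = abs(4.2 - 5)
print(x == 0.8)              # False: x is 0.7999999999999998
print(math.isclose(x, 0.8))  # True, within the default relative tolerance of 1e-09
print(math.pi, math.floor(2.7), math.ceil(2.2))  # 3.141592653589793 2 3
```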

We will rarely use the math module, however, because numpy provides the same functionality and much more.

Complex numbers use this notation:

In [6]:
print("complex number:", 1.0 + 2.3j)
complex number: (1+2.3j)
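A complex number exposes its parts as attributes, and abs() returns its magnitude. Note that math.sqrt(-1) raises a ValueError; the standard-library cmath module provides functions that return complex results. A short sketch:

```python
import cmath

z = 3 + 4j
print(z.real, z.imag)   # 3.0 4.0 (the components are floats)
print(abs(z))           # 5.0, the magnitude
print(z.conjugate())    # (3-4j)
print(cmath.sqrt(-1))   # 1j
```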

Boolean values are either True or False, and result from conditional expressions like this:

In [7]:
print("1 > 2 is", 1 > 2, "but 1 < 2 is", 1 < 2)
1 > 2 is False but 1 < 2 is True

Here is a more complex conditional expression:

In [8]:
print(1 > 2 or 3 < 4)
True
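Conditions can be combined with the keywords and, or, and not, and comparisons can be chained. Also worth knowing: bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic. For example:

```python
print(1 < 2 and 3 < 4)        # True
print(not 1 > 2)              # True; "not" binds more loosely than ">"
# Chained comparison, equivalent to (0 < 5) and (5 < 10):
print(0 < 5 < 10)             # True
# Booleans are integers: True == 1 and False == 0.
print(True + True)            # 2
print(isinstance(True, int))  # True
```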

Strings

Much computer programming requires working with text. A piece of text is held in an object called a string. There is an additional layer of complexity, and of difference between Python 2 and 3, associated with how text is represented internally (in memory) versus externally (on the page or screen). This is the dreaded "unicode". We will blissfully ignore it for now.

Python strings can be created with the str() function, and as literals using single quotes, double quotes, triple single quotes, or triple double quotes. The triples can enclose blocks of text spanning multiple lines. The following are all valid strings, each assigned to a variable name:

In [9]:
a = 'Single quotes'
b = "Double quotes"
c = "a string that has 'single quotes' inside it"
d = """This is a multiline
sentence string, ending with a linefeed.
"""
e = '''
This is also valid. It starts and ends with a linefeed.
'''
for example in (a, b, c, d, e):
    print(example)
Single quotes
Double quotes
a string that has 'single quotes' inside it
This is a multiline
sentence string, ending with a linefeed.


This is also valid. It starts and ends with a linefeed.

Strings can be added (concatenated):

In [10]:
print(d + e)
This is a multiline
sentence string, ending with a linefeed.

This is also valid. It starts and ends with a linefeed.
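Besides '+', strings support repetition with '*'. The join() method of a separator string glues a sequence of strings together, which is usually preferable to adding many strings in a loop, and len() gives the number of characters. The list `words` below is purely illustrative:

```python
words = ["Single", "Double", "Triple"]
print("ab" * 3)              # ababab
print(", ".join(words))      # Single, Double, Triple
print(len("Single quotes"))  # 13
```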

Jargon alert: Any particular string is an object that is an instance of a class (or type), and therefore operations on it can be performed by invoking methods of that class (str). Methods are invoked using "dot" syntax, e.g.:

In [11]:
print("The variable '%s' is an instance of the %s class" % ('a', type(a)))
print("Converted to upper case:", a.upper())
print("Does it end with 'quotes'?", a.endswith("quotes"))
print("Split into a list of words:", a.split())
The variable 'a' is an instance of the <class 'str'> class
Converted to upper case: SINGLE QUOTES
Does it end with 'quotes'? True
Split into a list of words: ['Single', 'quotes']
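The '%' operator used in the first line above is the oldest of Python's string-formatting mechanisms. The str.format() method and, in Python 3.6 and later, f-strings produce the same text; a minimal comparison:

```python
a = 'Single quotes'
name = 'a'
old = "The variable '%s' is an instance of the %s class" % (name, type(a))
new = "The variable '{}' is an instance of the {} class".format(name, type(a))
fstr = f"The variable '{name}' is an instance of the {type(a)} class"
print(old)
print(old == new == fstr)   # True
# A format specification controls numeric output, e.g. 3 decimal places:
print(f"{4/3:.3f}")         # 1.333
```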

Reminder: to see a list of methods for "a", you can enter "a." and then hit the Tab key. Try it in the cell above, or make a new cell for testing.

A string is a sequence of characters; as with any Python sequence, we can use indexing to access parts of the sequence:

In [12]:
print(a)
print(a[:2])
print(a[-2:])
print(a[0])
print(a[-1])
print(a[::2])
Single quotes
Si
es
S
s
Snl uts

More jargon: Strings are immutable--that is, unchangeable--so you cannot alter an existing string object, but you can make a new string object out of parts of an old one. That is what indexing is doing in the example above.
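To see immutability in action: assigning to a character of a string raises a TypeError, while slicing, concatenation, and methods such as replace() build new strings and leave the original untouched. A short sketch:

```python
a = 'Single quotes'
try:
    a[0] = 'J'           # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)
# Build a new string from pieces of the old one instead:
b = 'J' + a[1:]
print(b)                             # Jingle quotes
print(a.replace('quotes', 'bells'))  # Single bells
print(a)                             # still: Single quotes
```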

Based on these examples, see if you can figure out how Python's indexing syntax works.

Now let's consider two other sequence objects.

Sequences: tuples and lists

Tuples and lists are very general sequences--that is, they are containers that preserve order, and they can contain any kind of object at all. There is one big practical difference: tuples are immutable, lists are mutable. One might wonder why we need tuples at all. In most cases, if you need a general sequence type in your program, you can use a list, regardless of whether you plan to change its contents. In more advanced work, however, there are situations where only a tuple will do. Dictionary keys, for example, must be immutable (more precisely, hashable), so a tuple of immutable objects can be used as a dictionary key, but a list cannot. We will introduce dictionaries shortly, but first we need to become comfortable with tuples and lists.
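The practical difference shows up as soon as you try to change an element: the list accepts the assignment, while the tuple raises a TypeError. A minimal sketch:

```python
t = (1, 2, 3)
lst = [1, 2, 3]
lst[0] = 10            # a list element can be replaced in place
print(lst)             # [10, 2, 3]
try:
    t[0] = 10          # the same operation on a tuple fails
except TypeError as err:
    print("TypeError:", err)
print(t)               # (1, 2, 3), unchanged
```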

To create a tuple from scratch, separate the elements with commas, usually enclosed in round parentheses; for lists, use square brackets:

In [13]:
print("empty tuple:", tuple())
print("tuple with one element (the comma is required):", (1,))
print("tuple:", (1, 2))
print("tuple:", (3, (4, 5), 7, 8, "some string"))
print("list with one element (trailing comma is optional):", [1])
print("another list:", ["first element", 2, 3, "last element"])
empty tuple: ()
tuple with one element (the comma is required): (1,)
tuple: (1, 2)
tuple: (3, (4, 5), 7, 8, 'some string')
list with one element (trailing comma is optional): [1]
another list: ['first element', 2, 3, 'last element']
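It is the comma, not the parentheses, that makes a tuple, which is why the one-element tuple needs its trailing comma; parentheses are required only for the empty tuple or to avoid ambiguity. Tuples can also be "unpacked" into several names at once, which gives a compact way to swap two values. A short sketch:

```python
t = 1, 2, 3                    # parentheses are optional here
print(type(t), t)              # <class 'tuple'> (1, 2, 3)
print(type((1)))               # <class 'int'>: no comma, so not a tuple
first, second, third = t       # unpacking
print(first, third)            # 1 3
first, second = second, first  # swap by packing and unpacking
print(first, second)           # 2 1
```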

Lists have many methods, and support addition and multiplication:

In [14]:
a = ["list1", 1, 2]
b = ["list2", 3, 4]
print("adding is concatenation:", a + b)
print("multiplication is repetition:", a * 2)
a.extend(b)  # modifies a in place; extend() itself returns None
print("We extended a by tacking list b on the end:", a)
print("  (This is identical to the sum, a + b, above.)")
a.append("added to end")  # append() also works in place
print("Then we added a string:", a)
adding is concatenation: ['list1', 1, 2, 'list2', 3, 4]
multiplication is repetition: ['list1', 1, 2, 'list1', 1, 2]
We extended a by tacking list b on the end: ['list1', 1, 2, 'list2', 3, 4]
  (This is identical to the sum, a + b, above.)
Then we added a string: ['list1', 1, 2, 'list2', 3, 4, 'added to end']
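A common confusion is append() versus extend(): append() adds its argument as a single element, while extend() adds each element of its argument. Methods that modify a list in place, such as sort(), return None, so writing `b = b.sort()` would discard the list. A few more methods are shown below (the names `nested` and `flat` are illustrative):

```python
nested = [1, 2]
nested.append([3, 4])   # adds ONE element, which is itself a list
print(nested)           # [1, 2, [3, 4]]
flat = [1, 2]
flat.extend([3, 4])     # adds each element of the argument
print(flat)             # [1, 2, 3, 4]
print(flat.sort())      # None: sort() works in place
flat.insert(0, 99)      # insert 99 before index 0
print(flat)             # [99, 1, 2, 3, 4]
print(flat.pop())       # removes and returns the last element: 4
print(flat)             # [99, 1, 2, 3]
```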

Sequences: indexing

Lists, tuples, and strings all support the same indexing syntax.

  • Python indexing starts from zero.
  • A sequence of N elements therefore has indices ranging from 0 through N-1.
  • Negative indices count backwards from the end; they are handled by adding N to the negative index, so -1 is the last element, -2 the one before that, etc.
  • Basic indexing accesses a single element or a range (slice).
  • A slice includes the start of a range but excludes the end.
  • A slice has an optional step, for subsampling a range.

The zero-based indexing and the exclusion of the end of a range may seem odd if you are coming from Matlab, but they work very well in practice, often leading to simpler code.

In [15]:
# Using the built-in "range" function,
# make a list on which we can practice indexing:
x = list(range(0, 100, 10))
print(x)
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Take the first 5 values, then the last 5 values:

In [16]:
print(x[:5])
print(x[-5:])
[0, 10, 20, 30, 40]
[50, 60, 70, 80, 90]

See how easy that was? Now take every second value starting from the second (index 1):

In [17]:
print(x[1::2])
[10, 30, 50, 70, 90]

So the order is start, stop, step, separated by colons. If a value is the default, it can be omitted. The default step is 1. When the step is positive, the default start is 0, and the default stop is one past the last index (that is, N, the length of the sequence). Remember, the slice includes the start, but not the stop.

When the step is negative, the elements are selected in descending order, so the default start is the last index, N-1. The default stop must then be "one before index 0", which cannot be written as an integer index: one might guess 0 minus 1, that is, -1, but that would not work, because the index -1 refers to the last element of the sequence. Instead, omit the stop, or give it explicitly as None.
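These defaults can be inspected directly. A slice such as `x[1::2]` is shorthand for indexing with a built-in slice object, and its indices() method reports the concrete start, stop, and step it resolves to for a sequence of a given length. A minimal sketch:

```python
x = list(range(0, 100, 10))
s = slice(1, None, 2)       # equivalent to 1::2
print(x[s] == x[1::2])      # True
print(s.indices(len(x)))    # (1, 10, 2)
# With a negative step the resolved stop is -1, which here means
# "stop before index 0" in range() terms, not "the last element":
r = slice(None, None, -1).indices(4)
print(r)                    # (3, -1, -1)
print(list(range(*r)))      # [3, 2, 1, 0]
```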

Let's illustrate the negative step first with a string:

In [18]:
print('abcd'[::-1])
print('abcd'[:None:-1])
print('abcd'[-1:None:-1])
print('abcd'[-2:0:-1])
dcba
dcba
dcba
cb

Back to the list, showing indexing backwards by two, then a case with only one element, and then with none:

In [19]:
print(x[::-2])
print(x[1:2])
print(x[1:1])
[90, 70, 50, 30, 10]
[10]
[]

In the above examples we are indexing a list with a slice, so we are getting a list back, even if it has only one element in it, or if it is empty. If we want to get an element from the list, then we index with a single integer:

In [20]:
print(x[0])
print(x[-1])
0
90
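Integer indexing and slicing also differ in how they treat out-of-range values: an integer index past the end raises an IndexError, whereas a slice is silently clipped to the available elements. For example:

```python
x = list(range(0, 100, 10))
print(x[3:100])   # clipped: [30, 40, 50, 60, 70, 80, 90]
print(x[20:])     # []
try:
    x[10]         # valid indices are 0 through 9
except IndexError as err:
    print("IndexError:", err)
```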

There is one thing that you might like to do, but cannot do directly with native Python types (though you can with numpy arrays, as we will see later): index with an arbitrary set of indices, as in x[[3, 7, 2, 1]].

You can do it indirectly, however, using a powerful construct called a list comprehension, which is easier to understand and use than its name might suggest. Here are some examples:

In [21]:
xlist = list(range(0, 100, 10))
print(xlist)
# arbitrary set of indices we want to select:
ilist = [3, 7, 2, 1] 

xi = [xlist[i] for i in ilist]
print(xi)

# More list comprehension examples:
print([x for x in xlist if x < 30 or x > 70])
print([x ** 2 for x in xlist if x % 20 == 0])
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
[30, 70, 20, 10]
[0, 10, 20, 80, 90]
[0, 400, 1600, 3600, 6400]
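To demystify the comprehension, here is the equivalent explicit loop. The standard library's operator.itemgetter offers another way to pick an arbitrary set of indices; note that it returns a tuple when given several indices, but a bare element when given just one. A short sketch:

```python
from operator import itemgetter

xlist = list(range(0, 100, 10))
ilist = [3, 7, 2, 1]
# The comprehension [xlist[i] for i in ilist] is equivalent to:
xi = []
for i in ilist:
    xi.append(xlist[i])
print(xi)                          # [30, 70, 20, 10]
# itemgetter performs the same selection, returning a tuple:
print(itemgetter(*ilist)(xlist))   # (30, 70, 20, 10)
```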

Because lists are mutable, you can alter an element in a list:

In [22]:
xlist = list(range(20, 50, 10))
print(xlist)
xlist[-1] = - xlist[-1]  # flipped the sign
print(xlist)
[20, 30, 40]
[20, 30, -40]
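Mutability goes beyond single elements: a slice of a list can be replaced by a sequence of a different length, and the del statement removes elements or whole slices. For example:

```python
xlist = [20, 30, -40]
xlist[0:2] = [1, 2, 3, 4]   # replace two elements with four
print(xlist)                # [1, 2, 3, 4, -40]
del xlist[-1]               # remove the last element
print(xlist)                # [1, 2, 3, 4]
del xlist[::2]              # remove every second element
print(xlist)                # [2, 4]
```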

Dictionaries

Items in tuples and lists can be found easily only based on their locations, that is, their indices. In a dictionary, items can be located efficiently based on their keys--and this is as true of a Python dict as it is of the Oxford English Dictionary. Dictionaries are central to the internal workings of Python, and to much code written in Python. They can be very small, very large, or anywhere in between. Keys are usually strings (text), but can be any immutable (more precisely, hashable) Python object, such as a number or a tuple. Here are some examples of dictionary creation and indexing:

In [23]:
yesno = dict(yes=True, no=False)
print(yesno)
print(yesno['yes'])
print(yesno['no'])
{'yes': True, 'no': False}
True
False
In [24]:
# Use an alternative syntax for initializing a dictionary:
responses = {1: True, 0: False, 'yes': True, 'no': False}
print(responses)
print(responses[1], responses['yes'])
{1: True, 0: False, 'yes': True, 'no': False}
True True

The two examples above illustrate the two basic ways of creating a dictionary. The first calls the dict() function with keyword arguments; it works only when the keys are strings that are valid Python identifiers. The second uses curly braces, and works for any valid keys (immutable objects).

In [25]:
month_by_abbrev = dict(Jan="January", Feb="February", Mar="March",
                       Apr="April", May="May", Jun="June") # etc...
print(month_by_abbrev)
{'Jan': 'January', 'Feb': 'February', 'Mar': 'March', 'Apr': 'April', 'May': 'May', 'Jun': 'June'}

Entries can be retrieved, and new entries added, by indexing:

In [26]:
print(month_by_abbrev["Apr"])
month_by_abbrev["Jul"] = "July"
print(month_by_abbrev)
April
{'Jan': 'January', 'Feb': 'February', 'Mar': 'March', 'Apr': 'April', 'May': 'May', 'Jun': 'June', 'Jul': 'July'}

Dictionaries have their own methods, of course; refer to the Python docs for a full list. Here is just one example. Note that keys() returns a "view" of the keys rather than a list; it can be converted to a list with list():

In [27]:
print("keys:", month_by_abbrev.keys())
print("as a list:", list(month_by_abbrev.keys()))
keys: dict_keys(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul'])
as a list: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']

Note: since Python 3.7, dictionaries are guaranteed to preserve the order in which key-value pairs were added. In older versions the order was arbitrary; code that must run on those versions and depends on the order should use collections.OrderedDict.
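Two other dictionary methods come up constantly: get() returns a default value instead of raising a KeyError when a key is missing, and items() yields (key, value) pairs for looping. The sketch below also shows that a tuple works as a key while a list does not; the dictionary `grid` is purely illustrative:

```python
month_by_abbrev = dict(Jan="January", Feb="February", Mar="March")
# get() with a default avoids a KeyError for a missing key:
print(month_by_abbrev.get("Dec", "unknown"))   # unknown
# items() yields (key, value) pairs, handy in a for loop:
for abbrev, name in month_by_abbrev.items():
    print(abbrev, "->", name)
# A tuple can serve as a key; a list cannot, because it is mutable:
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])                            # east
try:
    grid[[0, 0]] = "nope"
except TypeError as err:
    print("TypeError:", err)
```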

Membership testing

Strings, tuples, lists, and dictionaries all support a simple syntax for testing whether a given item is present. In the case of dictionaries, it is sought in the keys, not the values:

In [28]:
# membership in strings: looks for substrings
print("any" in "many")
print("few" in "many")
True
False
In [29]:
# membership in a dictionary tests the presence of a key:
print("Apr" in month_by_abbrev)
True
In [30]:
# To check for a value in a dictionary, use the values() method:
print("April" in month_by_abbrev.values())
print("Mars" in month_by_abbrev.values())
True
False
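For lists and tuples, `in` tests whether any element is equal to the item; unlike the substring search on strings, it does not look for a sub-sequence. The `not in` operator is the negation. A short sketch:

```python
numbers = [1, 2, 3]
print(2 in numbers)           # True: compares elements with ==
print([1, 2] in numbers)      # False: no sub-sequence search, unlike strings
print([1, 2] in [[1, 2], 3])  # True: the list [1, 2] is an element here
print(4 not in (1, 2, 3))     # True
```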