Getting Started

How do I learn a new computer language?

Different people have different styles of learning, different aptitudes, and different backgrounds, so there is no single simple answer to this question. In most cases, however, it involves a gradual process with several simultaneous subprocesses. Think of it as being incremental.

  • Tutorials and books provide organized introductions, and sometimes comprehensive treatments.
  • Modifying working code is a way to plunge in and see how a language works without having to understand all aspects.
  • After some introduction, one starts writing code from scratch.
  • As early as possible, learn how to find the information you need. Browse the official documentation to get an idea of what sorts of information are available, and how to answer specific questions.
  • Learn how to use the “help” facilities from your interactive environment.
  • Learn debugging techniques, from the humble “print” function to use of an actual debugger.
  • Learn how to understand the error messages provided by the language’s compiler or interpreter.
  • Ask questions and get assistance as needed. Don’t waste time stewing in frustration if you don’t know how to solve a problem.
  • Try to understand what is really going on, from the underlying structure of the language to the specific algorithms you are using.
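
Several of the habits listed above can be tried in any Python session. The sketch below is only an illustration: the function mean is a made-up example, not part of the course material. It shows the built-in help facility, a debugging print, and how to pick out the most informative line of an error message.

```python
import contextlib
import io
import traceback


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)
    # The humble print: show intermediate values while hunting a bug.
    print("debug: total =", total, "count =", len(values))
    return total / len(values)


# The built-in help facility prints the docstring; here the text is
# captured in a string so that it can be inspected afterwards.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    help(mean)
help_text = buffer.getvalue()

# Reading an error message: the last line of a traceback names the
# exception type and gives a short explanation.
try:
    mean([])
except ZeroDivisionError:
    last_line = traceback.format_exc().strip().splitlines()[-1]

print(last_line)
```

In an interactive session you would normally just type help(mean) and read the traceback as it appears; the capturing steps are there only to keep the example self-contained.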

The more thoroughly you know one computer language, the easier it is to learn another. At some time in your career–maybe soon–you will need to be able to at least read, and perhaps modify, code in some other language.

Preliminaries: the command line

For effective scientific computing you will need to become comfortable working on the command line in a terminal. Here are some tutorials:

The first two of these are directed towards Linux and the last two are OSX-specific, but you will see that most of the content is the same. Linux and OSX are patterned after two different dialects of UNIX, and both provide the shell called “bash”, the program that interprets your command line inputs (recent versions of OSX start a closely related shell, zsh, by default).

If you find better tutorials, or if these are not helpful, let me know.

Getting started with Python

There are many online tutorials, as you will see if you google “scientific python tutorials”. Try it. If you find one that seems particularly useful, let me know, and perhaps I will add a link. Based on a very quick perusal, I think the scipy lecture notes might be helpful, as they are much more comprehensive than what I have assembled for this course web page. For the Python language itself, take a look at the original Python tutorial. Parts might be hard to understand without a computer science background, but nevertheless it provides a nice overview. You can execute its examples by typing them in a plain ipython session, or in a jupyter notebook.

In addition to online materials, a recent book appears to be a good supplementary resource for this course: Effective Computation in Physics. Don’t be put off by the “Physics” in the title; the book’s contents are suitable for anyone taking this course.

We will use a combination of text on this web site with links to Jupyter notebooks. You should download each notebook and run it. Use it interactively to experiment and expand upon the examples.

The Jupyter Notebook

We will assume that you can run the Jupyter Notebook. Invoke it from the command line in a terminal window using:

jupyter notebook

Most of the notebook functionality is easily discoverable via the menus and toolbar; I will give only a quick and partial introduction here.

  • The notebook presents a series of cells, each of which can contain a heading, or markup text, or plain text, or python code.
  • Each type of cell can be edited and executed. You can edit a code cell by selecting it with a single click. A double click is needed to make any other type of cell editable when you go back to it.
  • Each type of cell can be executed by typing shift-Return in the cell. This is particularly useful for editing and re-running code.
  • Code output appears below the code cell when the cell is executed.
  • Code is executed in a single Python process, so once you have executed a cell, variables and functions created there are available for use in other cells. Changing a variable in a given cell and re-executing it does not trigger execution of other cells using that variable, however; the notebook is not a spreadsheet.
  • The Cell menu has additional commands for working with cells, including executing all of them, and clearing all output.
  • Use the Help menu! It provides direct access to the standard documentation for Python itself, for IPython, and for the main code libraries we use (Numpy, Matplotlib, and Scipy).
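
The point about the notebook not being a spreadsheet can be imitated in plain Python. In this minimal sketch the comments stand in for notebook cells, and the variable names are arbitrary.

```python
# "Cell 1": define a variable.
a = 2

# "Cell 2": compute something from it.
b = a * 3

# Edit and re-run "Cell 1" with a new value. Nothing re-runs "Cell 2".
a = 10

stale_b = b   # b still holds the value computed from the old a
b = a * 3     # re-running "Cell 2" by hand brings b up to date

print(stale_b, b)
```

When results look inconsistent in a notebook, re-executing all cells from the Cell menu is a simple way to get back to a known state.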

The Jupyter Notebook is particularly useful for learning Python, for exploratory calculations, and for keeping a record of what you have done and the conclusions you have reached; it is a tool for reproducible science. IPython without the notebook can and should be used in other ways; I often have one or more IPython sessions running while I am working in a Notebook. For large-scale code development one would normally use an editor, not a Notebook; here too, an IPython session in a terminal is used for testing the code as it is developed. Integrated development environments (IDEs) such as Spyder are also useful.

Finding information

The Python scientific code ecosystem can be overwhelming, with documentation split among the various libraries. All of the main libraries we use, and Python itself, are documented using software called Sphinx. The resulting web pages can differ in style, but typically they include a sidebar with a search box. For example, try using the Jupyter Notebook help menu to bring up the Numpy documentation, and then use the search bar to look for ‘eigenvalue’. You will get many hits, but the first one will probably be what you need.

Note

Spend some time every now and then browsing the online Python and library documentation, to become familiar with how it is organized, and with where you can find various types of functionality. A little curiosity goes a long way!

Most Sphinx documentation includes a tutorial section and an API section, with the latter usually auto-generated from the docstrings in the code. The API sections usually have links to the actual code of each class, method, and function–take a look. The tutorial sections of documentation are never perfect, but they can be very helpful in providing an overview as well as examples of how to do things.

For plotting, look at the Matplotlib gallery to get an idea of the different styles and effects that are available, each with example code a click away.

Within IPython (jupyter notebook or ipython console), take advantage of tab completion. For example, if you type np.linalg. followed by the tab key, you will see a list of functions etc. that are part of the numpy linalg sub-package. If you type np.linalg.eig? you will see the internal documentation–the “docstring”–for the eig function. If you type np.linalg.eig?? you will see the source code for that function, including the docstring.
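
The ? and ?? forms work only inside IPython, but plain Python offers rough counterparts. The sketch below uses the standard-library json module purely so that it runs without NumPy installed; that choice is mine, not part of the course material. The same calls should work on np.linalg.eig in an environment where NumPy is available.

```python
import inspect
import json

# Rough counterpart of typing "json." followed by Tab:
# list the public names defined in the module.
public_names = [name for name in dir(json) if not name.startswith("_")]

# Rough counterpart of "json.dumps?": fetch the docstring.
docstring = inspect.getdoc(json.dumps)

# Rough counterpart of "json.dumps??": fetch the source code.
# This works only for functions written in Python, not compiled ones.
source = inspect.getsource(json.dumps)

print(public_names)
print(docstring.splitlines()[0])
```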

Some initial concepts and terminology

Unlike in most other languages, indentation in Python is part of the syntax. We will illustrate this later when we introduce control structures and functions.

Python bodies of code are organized and categorized as follows:

Script
a file of code designed to be executed directly from the command line. Sometimes a module within a package is a script; other times scripts are stand-alone files.
Module
a file of code, usually with definitions of functions and/or classes. A module is typically part of a package.
Package or sub-package
a directory containing one or more modules and zero or more sub-packages. Each ordinary package or sub-package has a file with the special name __init__.py, which marks the directory as a package so that the Python interpreter can recognize it.

Python searches for packages based on a list of standard locations that is built in when Python is compiled, augmented by any additional locations found in the PYTHONPATH environment variable. The list of search locations can also be augmented within a Python module or IPython session (or jupyter notebook) by a method that we will illustrate later.
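
To make the package layout and the search path concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The names demo_pkg, shapes, and square_area are hypothetical, and extending sys.path is only one common way to augment the search locations; it may or may not be the method illustrated later in the course.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names are hypothetical):
#   <tmp>/demo_pkg/__init__.py
#   <tmp>/demo_pkg/shapes.py
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # marks the directory as a package
(package_dir / "shapes.py").write_text(
    "def square_area(side):\n"
    "    return side * side\n"
)

# sys.path is the list of locations Python searches for imports.
# Adding a directory to it affects the current session only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Import the module "shapes" from the package "demo_pkg".
shapes = importlib.import_module("demo_pkg.shapes")

area = shapes.square_area(3)
print(area)
```

Printing sys.path in your own session is a quick way to see the standard locations together with anything contributed by PYTHONPATH.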

To use a package within a module or script, or in IPython, it must first be imported. Names defined within a package, and within modules in the package, may also be imported. When you invoke IPython with the --pylab option (or, better, just use the %pylab magic), it automatically imports the numpy package under the abbreviation np and the matplotlib.pyplot sub-package under the abbreviation plt, along with all of the functions, subpackages, etc. contained within each of those packages. A minimal matplotlib example script looks like this:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 100)
y = np.sin(t * np.pi / 5)
plt.plot(t, y)
plt.savefig("sine_wave.png")

Now we need another key concept with associated jargon:

Namespace
the set of names of objects, functions, and variables, that are recognized (or “visible”) at a given place in the program code.

When the Python interpreter starts, only Python’s builtin names are recognized; see sections 2, 4, 5, and 6 in the Python Library Documentation. In the example script above, we have added np and plt to the namespace, so they are recognized within the script. To access the linspace function in the numpy namespace, we therefore use np.linspace. Equivalently, we could have written the script as:

from numpy import linspace, sin, pi
from matplotlib.pyplot import plot, savefig

t = linspace(0, 10, 100)
y = sin(t * pi / 5)
plot(t, y)
savefig("sine_wave.png")

In this case we imported linspace directly from the numpy namespace into the script’s namespace, etc.
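
You can watch names enter a namespace using only the standard library. This is a small sketch; the colorsys module is chosen simply because it is unlikely to be imported already, and the alias HSV is an arbitrary name.

```python
import builtins

# Built-in names such as len are visible from the start.
has_len = "len" in dir(builtins)

# dir() with no argument lists the names in the current namespace.
before = "colorsys" in dir()

import colorsys  # adds the name "colorsys" to this namespace

after = "colorsys" in dir()

# A name inside the module can also be imported directly,
# optionally under an alias of your choosing.
from colorsys import rgb_to_hsv as HSV

same_object = HSV is colorsys.rgb_to_hsv

print(has_len, before, after, same_object)
```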

Why all the fuss about namespaces? Why doesn’t Python simply provide instant access to everything accessible via its search path, the way Matlab does? The problem is that the Matlab approach does not scale well; the more packages or toolboxes there are, the greater the likelihood of a name collision. Once a name is used anywhere, it cannot be used somewhere else without clobbering the earlier use. With the Python approach, you have much greater scope in naming your variables and functions without worrying about duplication, because names get into a given namespace only when they are imported explicitly; they can be imported as abbreviations or aliases; and the dot syntax can be used to identify exactly where a given name originates.
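
A name collision is easy to demonstrate with two standard-library modules that both define a function called sqrt. This is only a sketch of the general point, not an example taken from the course.

```python
import cmath
import math

# Dot syntax keeps the two functions apart and shows where each comes from.
real_root = math.sqrt(4.0)
complex_root = cmath.sqrt(-4.0)

# Importing both bare names into one namespace lets the later import win.
from math import sqrt
from cmath import sqrt  # rebinds sqrt; the math version is clobbered

clobbered = sqrt is cmath.sqrt

print(real_root, complex_root, clobbered)
```

Keeping the module prefix, or importing under distinct aliases, avoids the problem entirely.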

Time for a little more jargon before we plunge in:

Object
Everything in Python is an object; an object is an organized, well-defined collection of information together with operations that can work with that information.
Class or type
A class is the definition used to make particular objects, and an
Instance
is one such particular object of a given class or type.
Method
A function defined within a class, designed to work with the information (including other methods) in an instance of the class.

Don’t worry if these (approximate) definitions don’t make much sense yet; they will become clear soon enough as you see examples.
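
As a first taste, here is a minimal sketch of a class, an instance, and a method. The Thermometer class is invented for illustration and does not come from any library.

```python
class Thermometer:
    """A class: the definition from which particular objects are made."""

    def __init__(self, celsius):
        # Information stored in each instance.
        self.celsius = celsius

    def fahrenheit(self):
        """A method: a function that works with the instance's information."""
        return self.celsius * 9 / 5 + 32


probe = Thermometer(100)      # an instance of the class Thermometer
reading = probe.fahrenheit()  # call a method on that instance

print(type(probe).__name__, reading)

# Everything is an object, including plain numbers and functions.
print(isinstance(3, object), isinstance(len, object))
```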