Introduction

During the last couple of decades, Matlab has been the most commonly used scripting language in physical oceanography, and it has a large user base in many other fields. Recently, however, Python has been gaining ground, often being adopted by former Matlab users as well as by newcomers. Here is a little background to help you understand this shift and why we advocate using Python from the start.

Matlab and Python

Matlab evolved from a simple scripting language designed to provide interactive access to a public-domain linear algebra library. It has since grown into a more sophisticated but still specialized language with an extensive set of routines written, organized, and documented by one of the great successes in the software world, MathWorks.

Python, in contrast, was designed by a computer scientist as a general-purpose scripting language for easy adoption and widespread use. People tried it and liked it, and as a result it is used throughout the software world for tasks large and small. A vast array of freely available Python packages covers almost every need, including the kinds of work that oceanographers and other scientists do; but these packages are not neatly bundled in a single product, and the documentation for the language itself and for the packages is similarly scattered and of varying quality.

Why use Python instead of Matlab?

  1. Python is fundamentally a better computer language in many ways.
    • It is suitable for a wider variety of tasks.
    • It scales better from the shortest of scripts to large software projects.
    • It facilitates writing clearer and more concise code.
    • With associated tools, it makes for easier access to existing high-performance codes in compiled languages, and for using smaller pieces of compiled code to speed up critical sections.
  2. Because Python is Free and Open Source Software (FOSS), you can install it on any machine without having to deal with a license manager.
  3. For the same reason, Python code that is part of a research project can be run by anyone, anywhere, to verify or extend the results.
  4. Most Python packages you are likely to want to use are developed in an open environment. The scientific Python ecosystem is dynamic and friendly.

What are the potential disadvantages?

  1. Installation of all the packages one needs can take time and expertise; but distributions like Anaconda, combined with other improvements in Python packaging software and repositories, are rapidly solving this problem.
  2. Although progress is being made, the scientific Python stack is not as uniformly well-documented as Matlab; it might take longer to figure out how to do something in Python. You might also find that a routine available in Matlab is not yet available in a Python package.
  3. Matlab is still mainstream in oceanography, at least among many of the old guard; with Python, you are an early adopter. (If you have a spirit of adventure, this might be considered an advantage.)

What about R?

The R language (an open-source implementation of the S language, whose best-known commercial implementation is S-PLUS) is specialized for statistical calculations and plots. It is widely used in the social sciences, but it is less suitable than Python for general computing in oceanography. If you need some of its statistical capabilities, you can access them from Python using the interface module rpy2.

Python version 2 versus version 3?

We are nearing the end of a slow and awkward transition from the well-established Python 2.x line of development to version 3.x, on which all new language development is taking place. Although the differences are few and often seem arcane, Python 3.x includes some changes that are not backwards compatible with 2.x code. It is possible to write code that runs without modification in either version, but it takes some care. There are still a few third-party libraries and applications that are not yet ready for Python 3, but the number is dwindling. I have a very small number of programs that depend on one such library, but apart from those, all of my public software has been modified to run with either Python 2.7 or Python 3.4+.
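One common way to take that care is to put `__future__` imports at the very top of each module, so that Python 2.7 adopts the Python 3 behavior of `print` and `/`. Here is a minimal sketch; in Python 3 these imports are accepted and simply have no effect.

```python
from __future__ import division, print_function

# With these imports, "/" is true division and print is a function
# under Python 2.7 as well as Python 3.
ratio = 7 / 2         # 3.5 in both versions (plain Python 2 would give 3)
floor_ratio = 7 // 2  # explicit floor division: 3 in both versions

print("ratio:", ratio, "floor:", floor_ratio)
```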

I recommend that new users start with Python 3.5 or later, and that is what we will use for the course. I am gradually phasing out the version 2.7 information and compatibility in the tutorial material.

The Python scientific stack for oceanography

Essentials

In addition to the Python interpreter and standard library, we need the following:

numpy
N-dimensional arrays and functions for working with them. This is the heart of the scientific stack for Python.
matplotlib
A two-dimensional plotting library with a Matlab-like interface.
IPython
An interactive shell, and much more.
jupyter notebook
A web interface to IPython, with integrated plots and rich text, including equations. (It can also be used with languages like R and Julia.)
scipy
A large number of functions for statistics, linear algebra, integration, etc. It also includes functions for reading and writing Matlab’s binary matfiles.
Mercurial
A distributed version control system (DVCS), discussed below.
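As a small taste of how numpy and scipy fit together, the sketch below builds a made-up temperature profile as numpy arrays, works on it with whole-array operations instead of loops, and round-trips the arrays through a Matlab .mat file with `scipy.io.savemat` and `scipy.io.loadmat`. The data and the file name are purely illustrative.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# A made-up temperature profile: 10 depths from 0 to 90 m,
# with temperature falling linearly from 20 to 11 degC.
depth = np.arange(0.0, 100.0, 10.0)
temp = 20.0 - depth / 10.0

# Whole-array (vectorized) operations replace explicit loops.
mean_temp = temp.mean()        # 15.5
warm = depth[temp > 15.0]      # depths where temp exceeds 15 degC

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "profile.mat")  # illustrative file name
    savemat(path, {"depth": depth, "temp": temp})
    contents = loadmat(path)

# loadmat returns Matlab-style 2-D arrays: a 1-D vector saved with the
# default oned_as="row" comes back with shape (1, 10).
temp_back = contents["temp"].ravel()
print(mean_temp, warm, contents["temp"].shape)
```

Because Matlab has no true 1-D arrays, vectors read from a matfile usually need a `ravel` (or `loadmat(..., squeeze_me=True)`) to get back the shapes you started with.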

Other packages of interest

pandas
Data structures inspired by R (Series, DataFrame, Panel), with associated functionality for fast and sophisticated indexing, subsetting, and calculations.
SymPy
Symbolic math.
statsmodels
A package for statistical analysis, going beyond what is available in scipy.
scikit-learn
Machine learning, data mining, “Data Science”.
Sphinx
A tool for documenting software. It is also useful for more general web page generation, such as the one you are reading now.
Cython
A language with augmented Python syntax for writing compiled Python extensions. I use it for writing interfaces to code written in C, and for writing extensions to speed up critical operations.
basemap
Matplotlib toolkit for making maps with any of a wide variety of projections. Matplotlib’s full plotting capabilities can then be used on the map.
netCDF4
Python interface to the NetCDF libraries, supporting the new version 4 in addition to the widespread version 3, and also supporting access to datasets via OPeNDAP.
pytide
A fast interface to the OSU tide models, for predicting or hindcasting the barotropic tides anywhere in the ocean.
UTide
A reimplementation of most of Dan Codiga’s Matlab tidal analysis package.
GSW-Python
The TEOS-10 equation of state algorithms.

That’s just a sampling, not an attempt at a comprehensive list.
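To give a flavor of the time-series indexing that pandas provides, here is a minimal sketch using a hypothetical hourly temperature record. The values are just 0 through 47, so the results are easy to check by hand; the sketch selects one day by a date string, applies a boolean mask, and computes daily means with `resample`.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly temperature record: 48 values, 0.0 through 47.0,
# starting at midnight on 2016-01-01.
times = pd.date_range("2016-01-01", periods=48, freq=pd.Timedelta(hours=1))
temp = pd.Series(np.arange(48.0), index=times, name="temp")

# Partial-string indexing: a date string selects every sample on that day.
first_day = temp.loc["2016-01-01"]

# Boolean masks work as in numpy, and the time labels come along.
hot = temp[temp > 45.0]

# Resample to daily means; each output value is labeled by its day.
daily = temp.resample("D").mean()
print(daily)
```

The same operations on a bare numpy array would require keeping track of the times separately; in pandas the index carries them through every step.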

Operating systems

A computer’s operating system (OS) is the software that stands between the application and the hardware. Because we interact with the application via the hardware (keyboard, screen, etc.), the OS also stands between us and the application. It is important to have an OS that facilitates our work rather than getting in the way.

The three operating systems you are likely to encounter are Windows, OS X (Apple), and Linux. All three can be, and are, used for scientific computing with Python; but in my experience, Linux makes things easy, OS X is a bit trickier, and Windows makes it difficult to work effectively with anything but pre-packaged software. If you must use Windows, consider doing your real work on a Linux virtual machine under either VirtualBox or VMware. This can also be helpful if you are working on a Mac, at least until you have gained experience. VirtualBox is free; VMware is a bit easier to install and use. (I sometimes work in a Linux virtual machine running under VMware on a Mac.)

At present, I use and recommend the Xubuntu Linux distribution, although there are many reasonable alternatives.

Installation of software

Most of the Python software we need can be installed using the Anaconda distribution. Non-Python programs may come from a variety of sources; on Linux, they are normally available via the distribution’s package manager, and on OS X via Homebrew. Python software that is not available from Anaconda can be installed with the pip command, either from the Python Package Index (PyPI) or from a local directory. More information and specific instructions are given in the next section, Installing a Python working environment with UH software.
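After installing, a quick way to see which packages your Python can find is to ask the import system without actually importing anything. The sketch below uses `importlib.util.find_spec` from the standard library; the list of names is an arbitrary selection from the packages above, and keep in mind that the import name is not always the install name (scikit-learn, for example, is imported as `sklearn`).

```python
import importlib.util

# Import names of a few packages mentioned above (an illustrative selection).
PACKAGES = ["numpy", "scipy", "matplotlib", "IPython", "pandas",
            "sklearn", "netCDF4", "gsw"]


def check_packages(names):
    """Map each top-level import name to True if Python can find it."""
    # find_spec locates a module without importing it; for a top-level
    # name it returns None when the module cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages(PACKAGES).items():
    print("{:12s} {}".format(name, "ok" if found else "MISSING"))
```

Run it with the same Python interpreter you will use for your work; if several Pythons coexist on the system, each has its own set of installed packages.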

Repositories

One of the greatest recent advances in application software for programmers and scientists is the development of Distributed Version Control Systems (DVCS). As the name indicates, the purpose of the DVCS is to make it easy to track successive or alternative versions of a project, so that one can see what was changed when, and so that one can jump from one version to another. In practice, a good DVCS does more than that: it facilitates making remote backups, exchanging code, reviewing proposed changes, propagating updates, and distributing packages.
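The "see what was changed" part can be illustrated without any version control software at all: `hg diff` and `git diff` display changes in the unified diff format, which Python's standard `difflib` module can also produce. This sketch compares two hypothetical revisions of a tiny script; it only illustrates the output format, not how Mercurial or git store history.

```python
import difflib

# Two hypothetical revisions of a three-line script.
rev1 = ["import numpy as np\n", "x = np.arange(10)\n", "print(x.mean())\n"]
rev2 = ["import numpy as np\n", "x = np.arange(20)\n", "print(x.mean())\n"]

# Lines starting with "-" were removed, "+" were added, " " are context.
diff = list(difflib.unified_diff(rev1, rev2,
                                 fromfile="rev1/script.py",
                                 tofile="rev2/script.py"))
print("".join(diff), end="")
```

A DVCS records each such change along with who made it, when, and why, which is what lets you later jump between versions or review a proposed change.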

Several DVCSs were developed during a burst of activity in the mid-2000s, but two have emerged as the clear winners: git and Mercurial.

Git was originally written by Linus Torvalds to manage development of the Linux kernel. It has since grown to be the dominant system for open source projects, including much of the scientific stack. Git is implemented via a combination of compiled C code and scripts.

Mercurial is a Python package with C extension code. It seems to be losing the competition with git on most fronts, but it has been adopted by Facebook for managing their enormous and rapidly changing internal codebase, and it is still under active development. Unfortunately, and ironically, it does not yet run on Python 3. It can be installed and used even if your primary Python version is 3, however; different Python versions can coexist on a single system.

I use both git and Mercurial, but I like starting with Mercurial; it is easier to learn, understand, and use for individual or small-group projects. Longer term, for working on collaborations hosted on GitHub, you will need to use git as well.

Editors

You will need a text editor, and maybe more than one. I would like to be able to tell you that there is a great programmer’s editor, free, available for all operating systems, easy to learn and use, but powerful, etc. Unfortunately, I can’t. I believe that the selection of free editors is remarkably poor; there is no commonly-available editor that behaves the way I think an editor should. Personally, I still often use an obscure one written decades ago by an Italian who then dropped out of sight. I’m probably the only remaining user in the world, and I won’t be able to maintain it indefinitely. Therefore I will not offer it to you, but instead I will comment on some options.

Vim or vi
This is the classic Unix text editor, and it is always available on Linux and other Unix-like systems such as OS X. If you are going to do system administration, you need to know how to use it at a minimal level. It’s mind-bending, but many programmers master it and love it. Not me.
Emacs
This is the elephant of Unix-based text editors: big, slow to start, but powerful. It’s not quite as bizarre as vi, but it’s more complicated and it still has quite a learning curve. It’s widely used and loved, but again, not by me.
Nano
The mouse. Small, quick, but weak. It is now widely available on Linux and OS X, so it can save you from having to figure out how to do something simple with vi. That’s what I use it for.
Atom
A new entry, it uses web technologies and is available for Windows as well as Linux and OS X. It’s huge, and works only in a graphical environment, but subject to those limitations it looks promising. I use it on occasion.
BBEdit
OS X only, and only for a graphical environment. It works; I sometimes use it.

There are other OS-specific options, such as gedit on Linux, and there are for-pay options such as Sublime Text, which many people like.

In addition to standalone editors such as the above, there are editors built into environments such as the Jupyter (formerly “IPython”) notebook and the Spyder IDE. You will be using at least the first of these, and possibly the second, in this course.