Introduction

During the last couple of decades, Matlab has been the most commonly used scripting language in physical oceanography, and it has a large user base in many other fields. Recently, however, Python has been gaining ground, often being adopted by former Matlab users as well as by newcomers. Here is a little background to help you understand this shift, and why we advocate using Python from the start.

Matlab and Python

Matlab evolved from a simple scripting language designed to provide interactive access to a public-domain linear algebra library. It has since grown into a more sophisticated but still specialized language with an extensive set of routines written, organized, and documented by one of the great successes in the software world, MathWorks.

Python, in contrast, was designed by a computer scientist as a general-purpose scripting language for easy adoption and widespread use. People tried it and liked it, and the result is that it is widely used throughout the software world, for all sorts of tasks, large and small. There is a vast array of Python packages that are freely available to do all sorts of things—including the sorts of things that oceanographers and other scientists do; but these packages are not neatly bound up in a single product, and the documentation for the language itself and for the packages is similarly scattered and of varying quality.

Why use Python instead of Matlab?

  1. Python is fundamentally a better computer language in many ways.

    • It is suitable for a wider variety of tasks.

    • It scales better from the shortest of scripts to large software projects.

    • It facilitates writing clearer and more concise code.

    • With associated tools, it makes for easier access to existing high-performance codes in compiled languages, and for using smaller pieces of compiled code to speed up critical sections.

  2. Because Python is Free and Open Source Software (FOSS), you can install it on any machine without having to deal with a license manager.

  3. For the same reason, Python code that is part of a research project can be run by anyone, anywhere, to verify or extend the results.

  4. Most Python packages you are likely to want to use are developed in an open environment. The scientific Python ecosystem is dynamic and friendly.

What are the potential disadvantages?

  1. Installation of all the packages one needs can take time and expertise; but distributions like Anaconda, combined with other improvements in Python packaging software and repositories (especially Miniconda and conda-forge), are rapidly solving this problem.

  2. Although progress is being made, the scientific Python stack is not as uniformly well-documented as Matlab; it might take longer to figure out how to do something in Python. You might also find that a routine available in Matlab is not yet available in a Python package.

  3. Matlab is still common in oceanography, at least among many of the old guard; with Python, you are an early adopter. (If you have a spirit of adventure, this might be considered an advantage.)

What about R?

The R language (and its commercial counterpart, S) is specialized for statistical calculations and plots. It is widely used in the social sciences, but it is less suitable than Python for general computing in oceanography. If you need some of its statistical capabilities, they can be accessed from Python using the interface module rpy2.

Python versions

We have reached the end of a slow and awkward transition from the Python 2 line of development, which ended with 2.7, to version 3. Almost all Python packages of potential interest to us have made the switch, and most current releases no longer support Python 2 at all. The main reason you should be aware of this history is that the web is still full of old Python 2 code examples, some of which will not run, or will run erroneously, on Python 3.
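
As a small illustration of both failure modes, here is a sketch using two well-known differences between the versions: the print statement, which no longer compiles, and division of integers, which runs but gives a different answer. The helper name is_valid_python3 is made up for this example.

```python
def is_valid_python3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# The Python 2 print statement will not run at all on Python 3.
print(is_valid_python3("print 'hello'"))   # False
print(is_valid_python3("print('hello')"))  # True

# Division of two integers runs on both versions but gives different results:
# Python 2 returned 3 for 7 / 2, while Python 3 returns 3.5.
print(7 / 2)   # 3.5
print(7 // 2)  # 3 (floor division, which matches the old integer result)
```

When adapting an old example, these are two of the first things worth checking.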

The Python scientific stack for oceanography

Essentials

In addition to the Python interpreter and standard library, we need the following:

numpy

N-dimensional arrays and functions for working with them. This is the heart of the scientific stack for Python.

matplotlib

A two-dimensional plotting library with a Matlab-like interface.

IPython

An interactive shell, and much more.

jupyter notebook

A web interface to IPython, with integrated plots and rich text, including equations. (It can also be used with languages like R and Julia.)

scipy

A large number of functions for statistics, linear algebra, integration, etc. It also includes functions for reading and writing Matlab’s binary matfiles.

Mercurial

A Distributed Revision Control System, discussed below.

Other packages of interest

In no particular order:

pandas

Data structures inspired by R (Series, DataFrame), with associated functionality for fast and sophisticated indexing, subsetting, and calculations.

Sympy

Symbolic math.

statsmodels

A package for statistical analysis, going beyond what is available in scipy.

scikit-learn

Machine learning, data mining, “Data Science”.

Sphinx

A tool for documenting software. It is also useful for more general web page generation, such as the one you are reading now.

Cython

A language with augmented Python syntax for writing compiled Python extensions. I use it for writing interfaces to code written in C, and for writing extensions to speed up critical operations.

basemap

Matplotlib toolkit for making maps with any of a wide variety of projections. Matplotlib’s full plotting capabilities can then be used on the map.

cartopy

A newer and more elegant package for mapping with Matplotlib.

netCDF4

Python interface to the NetCDF libraries, supporting the new version 4 in addition to the widespread version 3, and also supporting access to datasets via OPeNDAP.

xarray

A higher-level interface to NetCDF files and to data structures based on the NetCDF data model. It is especially useful for working with ocean and atmosphere numerical model output, and gridded climatologies and reanalysis products.

pytide

A fast interface to the OSU tide models, for predicting or hindcasting the barotropic tides anywhere in the ocean.

UTide

A reimplementation of most of Dan Codiga’s Matlab tidal analysis package.

GSW-Python

The TEOS-10 equation of state algorithms.

That’s just a sampling, not an attempt at a comprehensive list.

Operating systems

A computer’s operating system (OS) is the software that stands between the application and the hardware. Because we interact with the application via the hardware (keyboard, screen, etc.), the OS also stands between us and the application. It is important to have an OS that facilitates our work rather than getting in the way.

The three operating systems you are likely to encounter are Windows, OS X (Apple), and Linux. All three can be, and are, used for scientific computing with Python; but in my experience, Linux makes things easy, OS X is a bit trickier, and Windows makes it difficult to work effectively with anything but pre-packaged software. If you must use Windows, consider doing your real work on a Linux virtual machine under either VirtualBox or VMware. This can also be helpful if you are working on a Mac, at least until you have gained experience. VirtualBox is free; VMware is perhaps a bit easier to install and use. (I sometimes work in a Linux virtual machine running under VirtualBox on a Mac. I actually prefer VirtualBox over VMware, which I don’t use anymore.)

I use and recommend the Xubuntu Linux distribution, although there are many reasonable alternatives. The XFCE desktop system used by Xubuntu is simple and stable, and runs well on low-power hardware and in virtual machines.

The shell and the command line

Most communication between you and the operating system goes through a command interpreter called a “shell”, often via commands typed into a program called a “terminal”. A terminal is just a window that can display text as you type it in for processing by whatever program is serving as the command interpreter, and can also display the text generated by that program. There are various programs (applications) that can provide a terminal in a typical graphical user environment, but for our purposes their differences are minor; you can choose any of those that are built in or readily available on your particular system. The terminal programs you will use on OS X or Linux have configuration options (accessible via menus) for initial size, color scheme, font size and appearance, etc. It can be worth spending a little time to investigate these options and configure them to your tastes.

There are two shells you should be aware of: the venerable “bash”, which has long been the standard on Linux systems, and which until recently was the default on OS X; and “zsh” or “z shell”, which is the new default on OS X, starting with Catalina. Zsh is also available on modern Linux distributions. With either operating system you can choose to override the default for your user account.
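
If you are not sure which shell your account uses, one way to find out is to look at the SHELL environment variable, which Unix-like systems normally set to your login shell. The following is a minimal sketch; the function name startup_file is made up for this example, and it reports only the main per-shell file discussed below.

```python
import os
from pathlib import Path


def startup_file(shell_path):
    """Map a login shell path to the startup file discussed in this section."""
    name = Path(shell_path).name
    if name == "bash":
        return Path.home() / ".bashrc"
    if name == "zsh":
        return Path.home() / ".zshrc"
    return None  # some other shell, or SHELL was not set


shell = os.environ.get("SHELL", "")
print("login shell:", shell or "(SHELL is not set)")
print("startup file to edit:", startup_file(shell))
```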

Like the terminal programs, the shells are configurable; but configuring them requires using a text editor to modify specific text files. I suggest using a few configuration options as a starting point. If you are coming back to this after having run conda init bash and/or conda init zsh, as described later, then be sure to put the suggested chunks of text before any text that is marked as managed by conda. If the file mentioned below does not yet exist, you can create it. If you have installed vscode as suggested below, you can uncomment the EDITOR line. (A hash mark turns everything to the right into a comment, so to uncomment the line, delete the hash mark.)

Configuration for bash

At the top of .bash_profile in your home directory, insert:

# include .bashrc if it exists
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

Then at the top of .bashrc in your home directory, insert:

# If not running interactively, don't do anything
[ -z "$PS1" ] && return

# don't put duplicate lines in the history. See bash(1) for more options
export HISTCONTROL=ignoredups

export PAGER=/usr/bin/less
# export EDITOR='code -nw'
export LESS="-R"

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# make an informative prompt, set off by its color
export PS1='\[\e[35m\]\w \\$\[\e[m\] '

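# Alias definitions.
# enable color support of ls and also add handy aliases
# (-G turns on color with the BSD ls on OS X; with GNU ls on Linux,
# -G instead omits group names, and color is enabled by --color=auto)
if [ "$TERM" != "dumb" ]; then
    alias ls='ls -FG'
fi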

# some more ls aliases
alias ll='ls -l'

Configuration for zsh

At the top of .zshrc in your home directory, insert:

export PROMPT='%F{magenta}%0~ $%f '
export PAGER=/usr/bin/less
# export EDITOR='code -nw'
export LESS="-R"
if [ "$TERM" != "dumb" ]; then
    alias ls='ls -FG'
fi
alias ll='ls -l'

Installation of software

Most of the Python software we need, and quite a bit of non-Python software, can be installed using the conda installation system, starting with either Anaconda or Miniconda (my preference), and taking advantage of the enormous conda-forge package system. Non-Python programs may come from a variety of sources; on Linux, they are normally available via the distribution’s package manager, and on OS X via Homebrew. Python packages that are not available from conda-forge usually can be installed from the net or from a local directory via the pip command. More information and specific instructions are given in the next section, Installing a Python working environment with UH software.
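
Once an environment is set up, a quick way to see which packages are still missing is to ask Python whether each one can be found, without actually importing it. This is only a sketch: the function name missing_packages and the list of names are choices made for this example, so adjust the list to your own needs.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Import names of some of the packages listed earlier.
wanted = ["numpy", "matplotlib", "scipy", "pandas", "xarray", "netCDF4"]
missing = missing_packages(wanted)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All requested packages are importable.")
```

Note that the import name is not always the same as the name used to install a package, so a name reported here may need to be looked up before installing it.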

Repositories

One of the greatest recent advances in application software for programmers and scientists is the development of Distributed Version Control Systems (DVCS). As the name indicates, the purpose of the DVCS is to make it easy to track successive or alternative versions of a project, so that one can see what was changed when, and so that one can jump from one version to another. In practice, a good DVCS does more than that: it facilitates making remote backups, exchanging code, reviewing proposed changes, propagating updates, and distributing packages.

Several DVCSs were developed during a burst of activity a few years ago, but two have emerged as the clear winners: git and Mercurial.

Git was originally written by Linus Torvalds to serve the Linux project. It has since grown to be the dominant system for open source projects, including much of the scientific stack. Git is implemented via a combination of compiled C code and scripts.

Mercurial is a Python package with C extension code. It seems to be losing the competition with git on most fronts, but it has been adopted by Facebook for managing their enormous and rapidly changing internal codebase, and it is still under active development. Unfortunately, and ironically, it was slow to develop Python 3 compatibility, but starting with version 5.2 it runs fine on Python 3.

I use both git and Mercurial, but I like starting with Mercurial; it is easier to learn, understand, and use for individual or small-group projects. Longer term, for working on collaborations hosted on GitHub, you will need to use git as well.

Editors

You will need a text editor, and probably more than one. I’ve listed some options below. Summary: I suggest using nano for quick work on small files, when you don’t have a graphical interface available, and vscode for everything else.

Vim or vi

This is the original Unix-based text editor, and it is always available on Linux and other Unix-like systems such as OS X. If you are going to do system administration, you need to know how to use it at a minimal level. It’s mind-bending, but many programmers master it and love it. Not me.

Emacs

This is the elephant of Unix-based text editors: big, slow to start, but powerful. It’s not quite as bizarre as vi, but it’s more complicated and it still has quite a learning curve. It’s widely used and loved, but again, not by me.

Nano

The mouse. Small, quick, but weak. It is now widely available on Linux and OS X, so it can save you from having to figure out how to do something simple with vi. That’s what I use it for.

Atom

A new entry, it uses web technologies and is available for Windows as well as Linux and OS X. It’s huge, and works only in a graphical environment. Open source, but developed by GitHub. I tried it, but switched to vscode.

Vscode

Recommended. Similar to Atom, but developed as an open source project by Microsoft. I’ve found it to be reliable and to have a wealth of capabilities, both built in and available as add-ons. I configure mine to turn off some features I find annoying.

If you install vscode on a Mac, see https://code.visualstudio.com/docs/setup/mac and follow the instructions to make it possible to run the editor from the command line.

There are other OS-specific options, such as gedit on Linux, and there are proprietary options such as BBEdit and Sublime, which many people like.

In addition to standalone editors such as the above, there are editors built into environments such as the Jupyter (formerly “IPython”) notebook and the Spyder IDE. You will be using at least the first of these, and possibly the second, in this course.

Note

Regardless of what editor you end up using for your Python coding, it should be configured to do the following:

  • The tab key should generate 4 spaces, not a tab character.

  • Trailing whitespace should be deleted whenever a file is saved.
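
To check whether a file already follows these two rules, something like the following sketch can be used. The function name whitespace_problems is made up for this example, and it flags a tab anywhere in a line, not only in the indentation.

```python
def whitespace_problems(text):
    """Return (line_number, problem) pairs for tabs and trailing whitespace."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


# A small sample with a tab-indented line that also ends in two spaces.
sample = "def f(x):\n\treturn x  \nprint(f(1))\n"
for number, problem in whitespace_problems(sample):
    print(f"line {number}: {problem}")
```

To check a real file, pass in its contents, for example the result of pathlib.Path("myscript.py").read_text(), where myscript.py stands in for your own file.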