Handling dates and times

Dates and times are messy; a given point in time can be represented in many different ways. Calculating the interval between two dates and times requires taking leap years into account, and if one of the dates is far in the past, it requires knowing which calendar was used to specify the date. On shorter time scales we can be tripped up by time zones and daylight saving time. And on very short time scales, if we need high accuracy, we need to take leap seconds into account. If you need to be very careful about time intervals (for example, if you can't afford to ignore leap seconds), then you should use something like Astropy, which supports International Atomic Time (TAI).

In our scientific work we usually use UTC, so in the following we will mostly ignore time zones.

That still leaves us needing to deal with different data formats, string representations, and library routines. We will lightly review some of these here.

ISO 8601 is a standard for writing dates and times.
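
Both numpy and the standard library parse ISO 8601 strings directly. This is a small illustration: datetime.datetime.fromisoformat needs Python 3.7 or later, and for a datetime64 the unit is inferred from the precision written in the string.

```python
import datetime
import numpy as np

iso = "2020-01-01T06:30:00"
t64 = np.datetime64(iso)                     # unit inferred from the string: seconds
tdt = datetime.datetime.fromisoformat(iso)   # naive datetime.datetime
print(repr(t64))
print(repr(tdt))

# The datetime64 unit follows the precision of the string.
print(np.datetime64("2020-01-01").dtype)     # datetime64[D]
print(np.datetime64("2020-01-01T06").dtype)  # datetime64[h]

# A datetime.datetime converts to datetime64 and compares equal.
print(np.datetime64(tdt) == t64)
```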

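NetCDF files often follow the CF (Climate and Forecast) metadata conventions to describe how to interpret a time variable. Typically the variable's units attribute reads something like "days since 2020-01-01 00:00:00".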
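
For general use, the cftime library handles CF time units, including non-standard calendars. The sketch below is a minimal numpy-only version of the idea. It assumes a standard (Gregorian) calendar, a plural unit name (days, hours, minutes, or seconds), and a reference date that numpy can parse with no time zone suffix. The name cf_time_to_dt64 is hypothetical and is not taken from any library.

```python
import numpy as np

# Map CF unit names (plural, as usually written) to numpy unit codes.
_CF_UNITS = {"days": "D", "hours": "h", "minutes": "m", "seconds": "s"}

def cf_time_to_dt64(values, units, resolution="ms"):
    """Hypothetical helper: convert CF-style times to datetime64.

    Assumes `units` looks like "days since 2020-01-01 00:00:00", with a
    standard calendar and no time zone suffix.
    """
    unit_name, _, ref = units.partition(" since ")
    step = np.timedelta64(1, _CF_UNITS[unit_name.strip().lower()])
    # Put a "T" between date and time to make a strict ISO 8601 string.
    epoch = np.datetime64(ref.strip().replace(" ", "T"), resolution)
    # Number of `resolution` ticks in one CF unit, as a float.
    ticks_per_step = step / np.timedelta64(1, resolution)
    ticks = np.round(np.asarray(values, dtype=float) * ticks_per_step)
    return epoch + ticks.astype(np.int64).astype(f"timedelta64[{resolution}]")

print(cf_time_to_dt64([0, 0.25, 1.5], "days since 2020-01-01 00:00:00"))
```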

The Python standard library includes the time module, with functions from the C standard library, and the datetime module, with date, time, datetime, and timedelta classes.
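
The two modules represent time differently. The time module mostly works with POSIX timestamps (float seconds since 1970-01-01 UTC) and struct_time tuples, whereas datetime works with objects that support arithmetic. This sketch converts one timestamp both ways. It uses UTC throughout, so the result doesn't depend on the local time zone.

```python
import datetime
import time

ts = 86400.0 * 365  # seconds since 1970-01-01T00:00:00 UTC (a POSIX timestamp)

# time module: timestamp -> struct_time, interpreted as UTC
st = time.gmtime(ts)
print(st.tm_year, st.tm_mon, st.tm_mday)

# datetime module: the same timestamp as a timezone-aware datetime in UTC
dt = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(dt)

# timedelta objects support arithmetic with datetime objects
print(dt + datetime.timedelta(hours=6))
```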

Since version 1.7, numpy has provided two dtypes for working with dates and times: datetime64 and timedelta64. These are used internally by pandas and xarray, which have traditionally fixed the unit at nanoseconds. At that precision a 64-bit integer spans only from late 1677 to early 2262, so years before 1678 cannot be represented.
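
The limits follow directly from the int64 range, as this short sketch shows. numpy reserves the most negative int64 value for NaT ("not a time"), so the earliest valid value is one above it. With coarser units the range grows accordingly; per the numpy documentation, seconds cover roughly ±2.9e11 years.

```python
import numpy as np

info = np.iinfo(np.int64)
# The most negative int64 is NaT, so the earliest valid count is min + 1.
earliest = np.datetime64(int(info.min) + 1, "ns")
latest = np.datetime64(int(info.max), "ns")
print(earliest)
print(latest)

# Second resolution easily handles dates far outside that window.
print(np.datetime64("1500-06-01", "s"))
```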

For plotting with dates and times on one or both axes, matplotlib internally uses floating point days (sometimes called a "datenum", following Matlab) since an origin (known as an "epoch"), but recognizes both standard library datetime objects (which may be aggregated in arrays with the "object" dtype) and datetime64 arrays. Related functions and classes are in matplotlib's dates module.
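
The conversion to matplotlib's float days can be done explicitly with date2num and num2date from matplotlib.dates. This sketch assumes matplotlib 3.3 or later, where get_epoch exists and the default epoch is 1970-01-01. Because the epoch is configurable (rcParams["date.epoch"]), the checks below compare differences rather than absolute values.

```python
import datetime
import numpy as np
import matplotlib.dates as mdates

# The epoch matplotlib currently uses for its float days.
print(mdates.get_epoch())

# date2num accepts datetime.datetime objects and datetime64 values alike.
dt_obj = datetime.datetime(2020, 1, 1, 6)
dt64 = np.datetime64("2020-01-01T06")
num_obj = mdates.date2num(dt_obj)
num_64 = mdates.date2num(dt64)
print(num_obj, num_64)

# num2date converts back, returning a timezone-aware datetime in UTC.
back = mdates.num2date(num_obj)
print(back)
```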

We will also look at some relevant utility functions from pycurrents.

Our emphasis will be on time as floating point days (or other units) since an epoch, and on datetime64; they are more efficient and mostly easier to use than the Python standard library datetime classes.

In [1]:
%matplotlib inline

import datetime
import time

import numpy as np

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

plt.rcParams['figure.dpi'] = 80

Basics, and conversions

To start, let's use a short sequence of times at 6-hour intervals beginning at the start of 2020. Suppose they are times in days read from the 'time' variable in a NetCDF file. To simulate that:

In [2]:
t_days_2020 = np.arange(0, 2.01, 0.25)
print(t_days_2020)
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

Float to datetime.datetime

To convert this to a sequence of datetime.datetime objects we need to convert to timedelta objects and add the epoch. Notice that because the datetime classes are not vectorized, we need to use a loop (here encoded in the list comprehension) and then convert the list to an object array.

In [3]:
epoch_dt_2020 = datetime.datetime(2020, 1, 1)
dt_2020 = np.array([epoch_dt_2020 + datetime.timedelta(days=d) 
                    for d in t_days_2020], dtype=object)
print(dt_2020)
[datetime.datetime(2020, 1, 1, 0, 0) datetime.datetime(2020, 1, 1, 6, 0)
 datetime.datetime(2020, 1, 1, 12, 0) datetime.datetime(2020, 1, 1, 18, 0)
 datetime.datetime(2020, 1, 2, 0, 0) datetime.datetime(2020, 1, 2, 6, 0)
 datetime.datetime(2020, 1, 2, 12, 0) datetime.datetime(2020, 1, 2, 18, 0)
 datetime.datetime(2020, 1, 3, 0, 0)]
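
Going the other way, from datetime.datetime back to floating point days, also needs a loop. Subtracting the epoch gives timedelta objects, and dividing a timedelta by a one-day timedelta gives a float. This is a minimal sketch that rebuilds the arrays from the cells above so it runs on its own.

```python
import datetime
import numpy as np

epoch_dt_2020 = datetime.datetime(2020, 1, 1)
t_days_2020 = np.arange(0, 2.01, 0.25)
dt_2020 = np.array([epoch_dt_2020 + datetime.timedelta(days=d)
                    for d in t_days_2020], dtype=object)

# timedelta / timedelta -> float, so dividing by one day gives decimal days.
t_days_back = np.array([(d - epoch_dt_2020) / datetime.timedelta(days=1)
                        for d in dt_2020])
print(t_days_back)
```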

Float to datetime64

We do something similar for datetime64, but it's all vectorized. Notice the syntax for making the scalar epoch, including using an ISO-8601 string representation of the date as the argument.

In [4]:
epoch_dt64_2020 = np.datetime64("2020-01-01")
dt64_2020 = epoch_dt64_2020 + np.round(24 * t_days_2020).astype('timedelta64[h]')
print(dt64_2020)
print(dt64_2020.dtype)
['2020-01-01T00' '2020-01-01T06' '2020-01-01T12' '2020-01-01T18'
 '2020-01-02T00' '2020-01-02T06' '2020-01-02T12' '2020-01-02T18'
 '2020-01-03T00']
datetime64[h]

The second line above needs some explanation. The critical points are

  • datetime64 and timedelta64 are 64-bit integers, and
  • they have units attached to them.

We had to multiply the time in days by 24 to get time in hours. Since we happen to know that the times fall on the hour, we chose hours as the largest unit giving us enough precision. We could have used any smaller unit, with the corresponding multiplier (for example, 1440 for minutes).
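
For example, here is the same conversion with minutes as the unit. The scale factor becomes 24 * 60 = 1440, and the result has minute units. As a small precaution, this sketch casts the rounded floats to int64 before converting to timedelta64.

```python
import numpy as np

t_days_2020 = np.arange(0, 2.01, 0.25)
epoch_dt64_2020 = np.datetime64("2020-01-01")

# Days -> whole minutes, then to timedelta64[m].
minutes = np.round(1440 * t_days_2020).astype(np.int64)
dt64_2020_m = epoch_dt64_2020 + minutes.astype("timedelta64[m]")
print(dt64_2020_m.dtype)   # datetime64[m]
print(dt64_2020_m[:2])
```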

It is safest to do arithmetic (addition and subtraction) exclusively with datetime64 and timedelta64. They don't have to have the same units; if they differ, the result will have the smallest of the units. Numpy will take care of doing the necessary conversions to that smallest common unit before adding or subtracting.
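
Here is a small illustration of mixed units. The sum of a day-resolution datetime64 and a minute-resolution timedelta64 has minute resolution. Months and years are the exception, because their length varies. numpy raises a TypeError rather than guess how many days a month is, as the second part shows.

```python
import numpy as np

day = np.datetime64("2020-01-31")          # datetime64[D]
later = day + np.timedelta64(90, "m")      # timedelta64[m]
print(later, later.dtype)                  # the finer unit, minutes, wins

# Months have variable length, so combining them with days is refused.
try:
    day + np.timedelta64(1, "M")
except TypeError as err:
    print("TypeError:", err)
```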

It is also possible to add plain integers to a datetime64, in which case they are interpreted as a timedelta64 with the same units as the datetime64 to which they are being added. In the example below we make this work by giving the epoch explicit units (hours) with a second argument.

In [5]:
epoch_dt64_2020_h = np.datetime64("2020-01-01", 'h')
# The following works, but is not recommended.
dt64_2020_h = epoch_dt64_2020_h + np.round(24 * t_days_2020).astype(int)
print(dt64_2020_h)
print(dt64_2020_h.dtype)
['2020-01-01T00' '2020-01-01T06' '2020-01-01T12' '2020-01-01T18'
 '2020-01-02T00' '2020-01-02T06' '2020-01-02T12' '2020-01-02T18'
 '2020-01-03T00']
datetime64[h]

I prefer the first of these methods, in which the use of datetime64 and timedelta64 is consistent and explicit.

More datetime64

Note that datetime64 arrays display as string representations, and can be created from strings, but they are actually stored and manipulated internally as 64-bit integers. For the epoch we made a scalar datetime64; we can also make an array from a list of string representations. Example:

In [6]:
times = np.array(["2019-01-01", "2019-07-01", "2020-01-01"], dtype='datetime64[D]')
print(times)
['2019-01-01' '2019-07-01' '2020-01-01']

We can use numpy functions such as np.diff:

In [7]:
dt = np.diff(times)
print(dt)
print(dt.dtype)
[181 184]
timedelta64[D]

And we can use np.arange to make a uniformly-spaced array:

In [8]:
t0 = np.datetime64("2020-03-01")
t1 = np.datetime64("2020-03-03")
dt = np.timedelta64(6, 'h')
six_hourly = np.arange(t0, t1, dt)
print(six_hourly)
print(six_hourly.dtype)
['2020-03-01T00' '2020-03-01T06' '2020-03-01T12' '2020-03-01T18'
 '2020-03-02T00' '2020-03-02T06' '2020-03-02T12' '2020-03-02T18']
datetime64[h]
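
Comparisons also work elementwise, with numpy converting units as needed, so datetime64 arrays can be selected with boolean masks. As with numeric np.arange, the stop value is excluded. This sketch rebuilds six_hourly and shows one way to include the endpoint: extend stop by one step.

```python
import numpy as np

dt = np.timedelta64(6, "h")
six_hourly = np.arange(np.datetime64("2020-03-01"), np.datetime64("2020-03-03"), dt)

# A datetime64[D] scalar compares directly against a datetime64[h] array.
on_march_2 = six_hourly >= np.datetime64("2020-03-02")
print(six_hourly[on_march_2])

# Extend stop by one step so that 2020-03-03T00 is included.
inclusive = np.arange(np.datetime64("2020-03-01"),
                      np.datetime64("2020-03-03") + dt, dt)
print(inclusive[-1])
```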

Datetime64 to float

Although calculations can be done with datetime64, integer arithmetic has limitations (division truncates, and most numerical routines expect floating point input). So usually one needs time as a floating point array in some units since an epoch. Here is an example, going from the dt64_2020 array we calculated earlier to an array matching our original t_days_2020:

In [9]:
t_days_2020_r = (dt64_2020 - epoch_dt64_2020).astype(np.float64) / 24
print(t_days_2020_r)
print(t_days_2020_r.dtype)
print((t_days_2020_r == t_days_2020).all())
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
float64
True

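The subtraction in the line above gave us a timedelta64[h] array. It was critical to convert that to floating point before dividing by 24 to get days. Dividing the timedelta64[h] array directly by 24 would have given another timedelta64[h] array, truncated to whole hours.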
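
Another way, which avoids the explicit factor of 24, is to divide by a timedelta64 scalar. A timedelta64 divided by a timedelta64 is a plain float64, with numpy reconciling the units. This sketch rebuilds dt64_2020 so it runs on its own.

```python
import numpy as np

epoch_dt64_2020 = np.datetime64("2020-01-01")
dt64_2020 = epoch_dt64_2020 + np.arange(0, 49, 6).astype("timedelta64[h]")

# timedelta64 / timedelta64 -> float64; here the answer is in days.
t_days = (dt64_2020 - epoch_dt64_2020) / np.timedelta64(1, "D")
print(t_days, t_days.dtype)
```

The same pattern gives any other unit. For example, dividing by np.timedelta64(1, "s") gives float seconds.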

Plotting with a date-time axis

Matplotlib has tick locating and labeling code that is customized for dates and times. It is invoked automatically if a coordinate array is supplied as either an object array of datetime.datetime instances or an ndarray with a datetime64 dtype. Let's start with a simple example. We will make a floating point time array in days since the beginning of the year, with arbitrary but close spacing, and then generate a sine wave with one cycle per day so we have something to plot. For plotting, we will make a datetime64 array. Precision at the 1-s level will be adequate.

In [10]:
t = np.linspace(20, 23, 100)
y = np.sin(2 * np.pi * t)
t_dt64 = np.datetime64("2020-01-01") + np.round(t * 86400).astype("timedelta64[s]")
print(t_dt64[:3])
['2020-01-21T00:00:00' '2020-01-21T00:43:38' '2020-01-21T01:27:16']

Date-time tick labels tend to be long and potentially overlapping, so we use fig.autofmt_xdate() to add extra space to the bottom and then tilt the labels.

In [11]:
fig, ax = plt.subplots()
ax.plot(t_dt64, y)
fig.autofmt_xdate()

At the time of writing, autofmt_xdate doesn't work with the "constrained layout" option, but we can write a small function to get the same effect.

In [12]:
def tilt_xlabels(ax, rotation=30):
    for label in ax.get_xticklabels(which='major'):
        label.set_ha('right')
        label.set_rotation(rotation)
        
fig, ax = plt.subplots(constrained_layout=True)
ax.plot(t_dt64, y)
tilt_xlabels(ax)
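
Another option, assuming matplotlib 3.1 or later, is ConciseDateFormatter from matplotlib.dates. It writes short tick labels and moves the shared part (such as the year and month) into an offset label, which often makes tilting unnecessary. This sketch rebuilds the example data from above.

```python
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

t = np.linspace(20, 23, 100)
y = np.sin(2 * np.pi * t)
t_dt64 = np.datetime64("2020-01-01") + np.round(t * 86400).astype(np.int64).astype("timedelta64[s]")

fig, ax = plt.subplots(constrained_layout=True)
ax.plot(t_dt64, y)
# Pair an automatic locator with the concise formatter.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```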

String representations

For labeling plots or generating other text output one needs a string representation of a date and time. When working with datetime64, an ISO standard representation can be made very easily:

In [13]:
print(str(six_hourly[0]))
2020-03-01T00

The format depends on the units, so one has a little bit of control:

In [14]:
print(str(six_hourly[0].astype("datetime64[s]")))
2020-03-01T00:00:00

There is a numpy function for this (https://numpy.org/doc/stable/reference/generated/numpy.datetime_as_string.html). Note that it can operate on a datetime64 array, not just a scalar.

In [15]:
print(np.datetime_as_string(six_hourly, unit="s"))
['2020-03-01T00:00:00' '2020-03-01T06:00:00' '2020-03-01T12:00:00'
 '2020-03-01T18:00:00' '2020-03-02T00:00:00' '2020-03-02T06:00:00'
 '2020-03-02T12:00:00' '2020-03-02T18:00:00']

That "T" in the middle hurts readability for things like plot labels. You can remove it with ordinary Python string operations:

In [16]:
print([s.replace("T", " ") for s in np.datetime_as_string(six_hourly, unit="s")])
['2020-03-01 00:00:00', '2020-03-01 06:00:00', '2020-03-01 12:00:00', '2020-03-01 18:00:00', '2020-03-02 00:00:00', '2020-03-02 06:00:00', '2020-03-02 12:00:00', '2020-03-02 18:00:00']

For more control of the format when parsing strings or when converting date-time values into strings, you can drop back to the Python standard library time or datetime modules. They use different representations of time, but both provide strftime for formatting a string and strptime for parsing one.
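
The following is a brief sketch of strptime and strftime in both modules. The input string and format are made up for illustration. The %b directive assumes English month abbreviations, as in the default "C" locale.

```python
import datetime
import time
import numpy as np

s = "12 Feb 2022 06:30"
fmt = "%d %b %Y %H:%M"

# datetime: strptime is a classmethod returning a datetime.datetime.
dt = datetime.datetime.strptime(s, fmt)
print(dt.strftime("%Y-%m-%d %H:%M"))

# time: strptime returns a struct_time, and strftime formats one.
st = time.strptime(s, fmt)
print(time.strftime("%j", st))   # day of year, zero-padded

# A parsed datetime.datetime converts directly to datetime64.
print(np.datetime64(dt, "m"))
```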

Pycurrents utility functions

The pycurrents library includes a few functions that long pre-date numpy's datetime64; they began as part of our C library for working with shipboard ADCP data. Despite their age, they can still be helpful if you have pycurrents installed.

The original functions work with two primary ways of specifying date and time:

  • as time in days ("decimal days") since an epoch that is the start of a specified year (the "yearbase").
  • as a sequence of six 16-bit integers for year, month, day, hour, minute, and second ("YMDHMS").
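
If pycurrents isn't available, the decimal-day convention is easy to reproduce with datetime64. This is a minimal sketch assuming that decimal day 0.0 is January 1, 00:00 of the yearbase, which matches the pycurrents output shown below. The function names dday_from_dt64 and dt64_from_dday are hypothetical and are not pycurrents API.

```python
import numpy as np

def dday_from_dt64(yearbase, t):
    """Hypothetical helper: datetime64 -> decimal days since Jan 1 of yearbase."""
    epoch = np.datetime64(f"{yearbase:04d}-01-01")
    return (np.asarray(t) - epoch) / np.timedelta64(1, "D")

def dt64_from_dday(yearbase, dday, resolution="ms"):
    """Hypothetical inverse, rounded to the given datetime64 resolution."""
    epoch = np.datetime64(f"{yearbase:04d}-01-01", resolution)
    ticks_per_day = np.timedelta64(1, "D") / np.timedelta64(1, resolution)
    ticks = np.round(np.asarray(dday, dtype=float) * ticks_per_day).astype(np.int64)
    return epoch + ticks.astype(f"timedelta64[{resolution}]")

t = np.array(["2022-02-12T00", "2022-02-12T06", "2022-02-12T12"],
             dtype="datetime64[h]")
dd = dday_from_dt64(2022, t)
print(dd)
print(dt64_from_dday(2022, dd))
```
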
In [17]:
from pycurrents.num import rangeslice 
from pycurrents.data.timetools import (to_day, 
                                       to_date, 
                                       day_to_dt64, 
                                       dt64_to_day,
                                       dt64_to_ymdhms)

to_day and to_date provide vectorized conversion between these two specifications:

In [18]:
# Make an example array of 3 YMDHMS times 6 hours apart.
# Start with February 12, 2022.
yearbase = 2022
ymdhms = np.zeros((3, 6), np.int16)
ymdhms[:, :3] = [2022, 2, 12]
ymdhms[:, 3] = [0, 6, 12]
print(ymdhms)
ddays = to_day(yearbase, ymdhms)
print(f"Decimal days with yearbase={yearbase} are {ddays}")
ymdhms_back = to_date(yearbase, ddays)
print(ymdhms_back)
[[2022    2   12    0    0    0]
 [2022    2   12    6    0    0]
 [2022    2   12   12    0    0]]
Decimal days with yearbase=2022 are [42.   42.25 42.5 ]
[[2022    2   12    0    0    0]
 [2022    2   12    6    0    0]
 [2022    2   12   12    0    0]]

There is a function for converting decimal day to datetime64:

In [19]:
dt64_triplet = day_to_dt64(yearbase, ddays)
print(dt64_triplet)
['2022-02-12T00:00:00.000' '2022-02-12T06:00:00.000'
 '2022-02-12T12:00:00.000']

...and a function for converting datetime64 to decimal days:

In [20]:
# Note: the following is new as of 2022-04-17, so be sure your pycurrents
# is up to date.
print(dt64_to_day(yearbase, dt64_triplet))
[42.   42.25 42.5 ]

...and a function for converting datetime64 to YMDHMS:

In [21]:
print(dt64_to_ymdhms(dt64_triplet))
[[2022    2   12    0    0    0]
 [2022    2   12    6    0    0]
 [2022    2   12   12    0    0]]
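
For comparison, YMDHMS fields can also be extracted with numpy alone. Casting a datetime64 to a coarser unit truncates it to the start of that year, month, or day, and each field follows from integer offsets. This is a sketch with a hypothetical name, ymdhms_from_dt64, not the pycurrents implementation. It returns int64 rather than int16 and drops fractional seconds.

```python
import numpy as np

def ymdhms_from_dt64(t):
    """Hypothetical helper: datetime64 -> array of shape (..., 6) holding
    year, month, day, hour, minute, second (fractional seconds dropped)."""
    t = np.asarray(t, dtype="datetime64[s]")
    years = t.astype("datetime64[Y]")
    months = t.astype("datetime64[M]")
    days = t.astype("datetime64[D]")
    out = np.empty(t.shape + (6,), dtype=np.int64)
    out[..., 0] = years.astype(np.int64) + 1970      # years count from 1970
    out[..., 1] = months.astype(np.int64) % 12 + 1   # months since 1970-01
    # Day of month: whole days since the first of the month, plus one.
    out[..., 2] = (days - months.astype("datetime64[D]")).astype(np.int64) + 1
    seconds = (t - days).astype(np.int64)            # seconds since midnight
    out[..., 3] = seconds // 3600
    out[..., 4] = seconds % 3600 // 60
    out[..., 5] = seconds % 60
    return out

triplet = np.array(["2022-02-12T00", "2022-02-12T06", "2022-02-12T12"],
                   dtype="datetime64[ms]")
print(ymdhms_from_dt64(triplet))
```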

In addition, the pycurrents rangeslice function has been updated to work with datetime64, generating a slice object from a datetime64 array and datetime64 "start" and "stop" arguments:

In [22]:
sl = rangeslice(six_hourly, np.datetime64("2020-03-01T17"), 
                np.datetime64("2020-03-02T15"))
print(sl)
print(six_hourly[sl])
slice(3, 7, None)
['2020-03-01T18' '2020-03-02T00' '2020-03-02T06' '2020-03-02T12']
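
Without pycurrents, np.searchsorted gives a similar slice for a sorted datetime64 array. This is a sketch under stated assumptions. The helper name dt64_rangeslice is hypothetical, and it treats both start and stop as inclusive. That choice may not match the pycurrents rangeslice when a boundary falls exactly on a sample.

```python
import numpy as np

def dt64_rangeslice(t, start, stop):
    """Hypothetical helper: slice of sorted datetime64 array `t` covering
    start <= t <= stop."""
    i0 = np.searchsorted(t, start, side="left")
    i1 = np.searchsorted(t, stop, side="right")
    return slice(int(i0), int(i1))

six_hourly = np.arange(np.datetime64("2020-03-01"), np.datetime64("2020-03-03"),
                       np.timedelta64(6, "h"))
sl = dt64_rangeslice(six_hourly, np.datetime64("2020-03-01T17"),
                     np.datetime64("2020-03-02T15"))
print(sl)
print(six_hourly[sl])
```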

If you need to work with files written by Matlab ("matfiles"), then you probably need to know how to work with Matlab datenums. A datenum is floating point days counted from the fictitious date January 0, 0000 of the proleptic calendar, so datenum(1970, 1, 1) is 719529. A function for handling these is available in pycurrents, but it is so simple that I suggest you copy or modify the version below instead of importing it.
Alternatively, you can use the second function below, which goes straight from datenum to datetime64 without requiring the datetime module.

In [23]:
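def datenum_to_day(dnum, yearbase):
    "Convert MATLAB datenum(s) to decimal day relative to a yearbase."
    # A datenum equals the Python date ordinal plus 366 (Matlab counts year 0).
    return np.asarray(dnum) - (366 + datetime.date(yearbase, 1, 1).toordinal())

def datenum_to_datetime64(dnum):
    "Convert MATLAB datenum(s) to datetime64 with millisecond resolution."
    days = np.asarray(dnum) - 719529  # shift to unix epoch; datenum(1970, 1, 1) == 719529
    return np.round(days * 86400000).astype(np.int64).astype("datetime64[ms]")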

# Test with octave, a Matlab clone.
# octave:5> datenum(2022, 1, 2, 3, 4, 5)
# ans = 738523.1278356481
# octave:6> datenum(2022, 5, 4, 3, 2, 1)
# ans = 738645.1264004629

dnums = [738523.1278356481, 738645.1264004629]

print(datenum_to_day(dnums, 2022))
print(datenum_to_datetime64(dnums))
[  1.12783565 123.12640046]
['2022-01-02T03:04:05.000' '2022-05-04T03:02:01.000']
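
When writing data back for Matlab users, the reverse conversion is just as short. This is a sketch with a hypothetical name, datetime64_to_datenum. It relies on the same constant, datenum(1970, 1, 1) == 719529, and on dividing a timedelta64 by one day to get float days.

```python
import numpy as np

def datetime64_to_datenum(t):
    """Hypothetical helper: datetime64 -> Matlab datenum (float days)."""
    days = (np.asarray(t) - np.datetime64("1970-01-01")) / np.timedelta64(1, "D")
    return days + 719529  # datenum(1970, 1, 1) == 719529

t = np.array(["2022-01-02T03:04:05", "2022-05-04T03:02:01"], dtype="datetime64[s]")
print(datetime64_to_datenum(t))
```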