Coordinate subscripting in Python

One of the things that I love about NCL is the ability to subscript array variables by either index or (nearest) coordinate:

; Subset by index
subset = data(min_lat_ix:max_lat_ix, min_lon_ix:max_lon_ix)

; Subset by coordinate
subset = data({min_lat:max_lat}, {min_lon:max_lon})

Sadly, this is not so straightforward in Python. In this post, I'll cover two ways in which you can coordinate subscript your data. Both approaches assume that you are working with NetCDF files that have some basic metadata described.

1. Needle/Haystack Method

The first approach, which almost always works for me is the "Needle/Haystack" method. What this method does is look for a value (needle), in a list of values (haystack) and returns the index that minimises the difference between the needle and the values in the haystack. What this means is that if the needle exists in the haystack, you simply get the index of its position, when it doesn't, you get the index of the value that is nearest to it.

This is all achieved with the numpy.argmin function, but I usually like to write a wrapper that explains what I'm actually doing called get_nearest_index or similar.

To see this approach in action, see the following code:

# Load the required modules
from netCDF4 import Dataset
import numpy as np

# Define a function that gets the index for us
def get_nearest_index(needle, haystack):
    """Find the index of the nearest value to needle in haystack.

    Args:
        needle (float) : The coordinate to find
        haystack (iterable) : The values in which to search

    Returns:
        int : The index of the value in haystack closest to needle
    """
    # This gets the index of the minimum absolute difference
    return np.argmin(abs(haystack - needle))

# Open the file and extract the latitudes
nc = Dataset('myfile.nc', 'r')
lats = nc.variables['latitudes'][:]

# Let's subset the Southern Ocean
min_lat, max_lat = -90.0, -50.0

# Get the indices of the nearest values to our latitudes
min_lat_ix = get_nearest_index(min_lat, lats)
max_lat_ix = get_nearest_index(max_lat, lats)

# Subset the data, note we add 1 to the max lat to make it inclusive
southern_ocean = data.variables['mydata'][min_lat_ix:max_lat + 1, :]

This approach is reasonably versatile and allows us to effectively subscript on coordinates in the data set. It can be used for almost any coordinate (time, lev, lat, lon, etc.) provided numpy can perform an argmin on the coordinate data.

There are more sophisticated methods, but this approach is well worth knowing when the "smarter" approaches fail.

2. Iris Constraints method

A more sophisticated way of doing coordinate subscripting is using the Iris Python Package. This package does a whole range of amazing things with Earth System data, but my favourite feature is the iris.Constraint.

An Iris Constraint is a construct which describes how to subset data based on coordinate information. The benefit of this is that Iris will lazily load the data (i.e. not bring everything into memory), so you can get some performance benefits in using it. It makes coordinate subscripting much simpler, however, the caveat is that the data must be well-described with appropriate coordinate metadata or it will simply fail.

# Import the iris module
import iris

# Define out boundaries
min_lat, max_lat = -90.0, -50.0

# Create a constraint based on our boundaries
# This tells Iris to "get me cells that fall between min_lat and max_lat inclusively"
latitude_constraint = iris.Constraint(
    latitude=lambda cell: min_lat <= cell <= max_lat
)

# Load just the southern ocean in a single line
southern_ocean = iris.load_cube('myfile.nc', 'mydata' & latitude_constraint)

The Constraint approach is elegant but very strict; if your metadata is ill-defined or your coordinate data has other issues (i.e. it is non-monotonic) then you can quickly waste a lot of time trying to figure out why. Hence, why I've also covered the Needle/Haystack method to save you some time.

So there you have it, two ways of coordinate subscripting in Python. I haven't applied any strict error handling here but hopefully you've gained something from this short post.