Creating and saving objects with units

Attaching units

Usually, when loading data from disk we get a Dataset or DataArray with units in attributes:

In [1]: ds = xr.Dataset(
   ...:     {
   ...:         "a": (("lon", "lat"), [[11.84, 3.12, 9.7], [7.8, 9.3, 14.72]]),
   ...:         "b": (("lon", "lat"), [[13, 2, 7], [5, 4, 9]], {"units": "m"}),
   ...:     },
   ...:     coords={"lat": [10, 20, 30], "lon": [74, 76]},
   ...: )
   ...: ds
   ...: 
Out[1]: 
<xarray.Dataset>
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76
Data variables:
    a        (lon, lat) float64 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) int64 13 2 7 5 4 9

In [2]: da = ds.b
   ...: da
   ...: 
Out[2]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)>
array([[13,  2,  7],
       [ 5,  4,  9]])
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76
Attributes:
    units:    m

In order to get pint.Quantity instances, we can use the Dataset.pint.quantify() or DataArray.pint.quantify() methods:

In [3]: ds.pint.quantify()
Out[3]: 
<xarray.Dataset>
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76
Data variables:
    a        (lon, lat) float64 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) int64 [m] 13 2 7 5 4 9

We can also override the units of a variable:

In [4]: ds.pint.quantify(b="km")
Out[4]: 
<xarray.Dataset>
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76
Data variables:
    a        (lon, lat) float64 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) int64 [km] 13 2 7 5 4 9

In [5]: da.pint.quantify("degree")
Out[5]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)>
<Quantity([[13  2  7]
 [ 5  4  9]], 'degree')>
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76

Overriding works even if there is no units attribute, so we could use this to attach units to a normal Dataset:

In [6]: temporary_ds = xr.Dataset({"a": ("x", [0, 5, 10])}, coords={"x": [1, 2, 3]})
   ...: temporary_ds.pint.quantify({"a": "m"})
   ...: 
Out[6]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    a        (x) int64 [m] 0 5 10

Of course, we could use pint.Unit instances instead of strings to specify units, too.

Note

Unit objects tied to different registries cannot interact with each other. In order to avoid this, DataArray.pint.quantify() and Dataset.pint.quantify() will make sure only a single registry is used per xarray object.

If we wanted to change the units of the data of a DataArray, we could do so using the DataArray.name attribute:

In [7]: da.pint.quantify({da.name: "J", "lat": "degree", "lon": "degree"})
Out[7]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)>
<Quantity([[13  2  7]
 [ 5  4  9]], 'joule')>
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76

However, xarray currently doesn’t support units in indexes, so the new units were set as attributes. To really observe the changes the quantify methods make, we have to first swap the dimensions:

In [8]: ds_with_units = ds.swap_dims({"lon": "x", "lat": "y"}).pint.quantify(
   ...:     {"lat": "degree", "lon": "degree"}
   ...: )
   ...: ds_with_units
   ...: 
Out[8]: 
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
    lat      (y) int64 [deg] 10 20 30
    lon      (x) int64 [deg] 74 76
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 11.84 3.12 9.7 7.8 9.3 14.72
    b        (x, y) int64 [m] 13 2 7 5 4 9

In [9]: da_with_units = da.swap_dims({"lon": "x", "lat": "y"}).pint.quantify(
   ...:     {"lat": "degree", "lon": "degree"}
   ...: )
   ...: da_with_units
   ...: 
Out[9]: 
<xarray.DataArray 'b' (x: 2, y: 3)>
<Quantity([[13  2  7]
 [ 5  4  9]], 'meter')>
Coordinates:
    lat      (y) int64 [deg] 10 20 30
    lon      (x) int64 [deg] 74 76
Dimensions without coordinates: x, y

By default, Dataset.pint.quantify() and DataArray.pint.quantify() will use the unit registry at pint_xarray.unit_registry (the application registry). If we want a different registry, we can either pass it as the unit_registry parameter:

In [10]: ureg = pint.UnitRegistry(force_ndarray_like=True)

In [11]: da.pint.quantify("degree", unit_registry=ureg)
Out[11]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)>
<Quantity([[13  2  7]
 [ 5  4  9]], 'degree')>
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76

or overwrite the default registry:

In [12]: pint_xarray.unit_registry = ureg

In [13]: da.pint.quantify("degree")
Out[13]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)>
<Quantity([[13  2  7]
 [ 5  4  9]], 'degree')>
Coordinates:
  * lat      (lat) int64 10 20 30
  * lon      (lon) int64 74 76

Note

To properly work with xarray, the force_ndarray_like or force_ndarray options have to be enabled on the custom registry.

Without it, python scalars wrapped by pint.Quantity may raise errors or have their units stripped.

Saving with units

In order to not lose the units when saving to disk, we first have to call the Dataset.pint.dequantify() and DataArray.pint.dequantify() methods:

In [14]: ds_with_units.pint.dequantify()
Out[14]: 
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
    lat      (y) int64 10 20 30
    lon      (x) int64 74 76
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 11.84 3.12 9.7 7.8 9.3 14.72
    b        (x, y) int64 13 2 7 5 4 9

In [15]: da_with_units.pint.dequantify()
Out[15]: 
<xarray.DataArray 'b' (x: 2, y: 3)>
array([[13,  2,  7],
       [ 5,  4,  9]])
Coordinates:
    lat      (y) int64 10 20 30
    lon      (x) int64 74 76
Dimensions without coordinates: x, y
Attributes:
    units:    meter

This will get the string representation of a pint.Unit instance and attach it as a units attribute. The data of the variable will now be whatever pint wrapped.