Automatically Derived Arrays#

Pynbody includes a system which automatically derives one array from others. There are two goals for this system:

Help make user code more independent of the underlying file format, by providing missing arrays. For example, some simulation formats store temperature while others store internal energy. By always presenting a temp array to you, and handling the conversion automatically, the need to explicitly distinguish between different scenarios is removed.
Keep arrays up-to-date when they depend on other arrays. For example, if you are interested in the radius of particles from the origin, a derived r array is offered which is automatically recalculated if the position of particles changes.

Example: the radius array#

The quantities listed under derived are calculated for all simulation types. If you want, for example, to calculate the radius of particles away from the origin, you can just access the r array and pynbody will calculate it:

In [1]: import pynbody

In [2]: s = pynbody.load('testdata/gasoline_ahf/g15784.lr.01024.gz')

In [3]: s['r']
Out[3]: 
SimArray([0.17058903, 0.16980713, 0.17045854, ..., 0.12941349, 0.12942831,
          0.12946079], '6.85e+04 kpc a')

Here, r has been calculated in the same units as the position array, which in turn are in the units of the simulation. You can call physical_units() as normal to convert to more recognizable units. Note that you cannot directly alter a derived array:

In [4]: s['r'][0] = 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 s['r'][0] = 0

File ~/checkouts/readthedocs.org/user_builds/pynbody/envs/latest/lib/python3.11/site-packages/pynbody/array/__init__.py:809, in _dirty_fn.<locals>.q(a, *y, **kw)
    807     return w(a, *y, **kw)
    808 else:
--> 809     return w(a, *y)

File ~/checkouts/readthedocs.org/user_builds/pynbody/envs/latest/lib/python3.11/site-packages/pynbody/array/__init__.py:566, in SimArray.__setitem__(self, item, to)
    564     np.ndarray.__setitem__(self, item, to.in_units(self.units))
    565 else:
--> 566     np.ndarray.__setitem__(self, item, to)

ValueError: assignment destination is read-only

But the array updates itself when one of its dependencies changes:

In [5]: s['x']-=1

In [6]: s['r']
Out[6]: 
SimArray([1.00383365, 1.00300699, 1.00292416, ..., 0.98382042, 0.98378768,
          0.98368316], '6.85e+04 kpc a')

The recalculation only takes place when required – in the example above the r array is stored (just like a normal array) until the pos array is updated, at which point it is deleted. When you ask for the r array again, it is recalculated.

This is why you’re not allowed to change values in a derived array – your changes could be overwritten, leading to confusing bugs. However, if you want to make a derived array back into a normal array you can do so.

In [7]: s['r'].derived = False

In [8]: s['r'][0] = 0

In [9]: s['r']
Out[9]: 
SimArray([0.        , 1.00300699, 1.00292416, ..., 0.98382042, 0.98378768,
          0.98368316], '6.85e+04 kpc a')

At this point you’ve taken full responsibility for the array, so if you update its dependencies the framework won’t help you out any longer:

In [10]: s['x']+=2

In [11]: s['r'] # NOT updated because we took control by setting derived = False
Out[11]: 
SimArray([0.        , 1.00300699, 1.00292416, ..., 0.98382042, 0.98378768,
          0.98368316], '6.85e+04 kpc a')

To get the framework back on your side, you can delete the modified array:

In [12]: del s['r']

In [13]: s['r']
Out[13]: 
SimArray([1.02494841, 1.02549787, 1.02579498, ..., 1.03227568, 1.0323106 ,
          1.03241834], '6.85e+04 kpc a')

Here we’ve deleted then re-derived the r array, so it’s now accurate (and will start auto-updating again).

Derived functions for specific formats#

Some derived arrays are specific to certain simulation formats. For example, ramses simulations need to derive masses for their gas cells and as such mass() is registered as a derived array specifically for the RamsesSnap class.

Defining your own deriving functions#

You can easily define your own derived arrays. The easiest way to do this is using the decorator pynbody.snapshot.simsnap.SimSnap.derived_array(). This is handily aliased to pynbody.derived_array.

Here’s an example of a derived array that calculates the specific kinetic energy of a particle:

In [14]: @pynbody.derived_array
   ....: def specific_ke(sim):
   ....:     return 0.5 * (sim['vel']**2).sum(axis=1)
   ....: 

In [15]: s['specific_ke']
Out[15]: 
SimArray([0.01443704, 0.01435218, 0.01545113, ..., 0.02690037, 0.02084463,
          0.02266956], '2.98e+06 km**2 a**2 s**-2')

When your function is called, the framework monitors any arrays it retrieves from the simulation. It automatically marks the accessed arrays as dependencies for your function. So, if the velocities now get changed, your derived array will be recalculated:

In [16]: s['vel']*=2

In [17]: s['specific_ke']
Out[17]: 
SimArray([0.05774817, 0.05740871, 0.0618045 , ..., 0.10760149, 0.08337851,
          0.09067823], '2.98e+06 km**2 a**2 s**-2')

To create a derived array associated with a specific subclass, use the derived_array() method of that subclass, e.g.

In [18]: @pynbody.snapshot.tipsy.TipsySnap.derived_array
   ....: def half_mass(sim):
   ....:     return 0.5 * sim['mass']
   ....: 

In [19]: s['half_mass'] # this is a TipsySnap, so this will work
Out[19]: 
SimArray([1.59655292e-11, 1.59655292e-11, 1.59655292e-11, ...,
          6.20879997e-12, 6.20879997e-12, 6.20879997e-12], '4.75e+16 Msol')

In [20]: another_snap = pynbody.new(dm=100) # NOT a TipsySnap!

In [21]: another_snap['half_mass']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[21], line 1
----> 1 another_snap['half_mass']

File ~/checkouts/readthedocs.org/user_builds/pynbody/envs/latest/lib/python3.11/site-packages/pynbody/snapshot/simsnap.py:233, in SimSnap.__getitem__(self, i)
    230 from . import subsnap
    232 if isinstance(i, str):
--> 233     return self._get_array_with_lazy_actions(i)
    234 elif isinstance(i, slice):
    235     return subsnap.SubSnap(self, i)

File ~/checkouts/readthedocs.org/user_builds/pynbody/envs/latest/lib/python3.11/site-packages/pynbody/snapshot/simsnap.py:351, in SimSnap._get_array_with_lazy_actions(self, name)
    347                 self.__derive_if_required(name)
    349 # At this point we've done everything we can to get the array into memory. If it's still not there, we'll
    350 # get a KeyError from the below.
--> 351 return self._get_array(name)

File ~/checkouts/readthedocs.org/user_builds/pynbody/envs/latest/lib/python3.11/site-packages/pynbody/snapshot/simsnap.py:1226, in SimSnap._get_array(self, name, index, always_writable)
   1216 def _get_array(self, name, index=None, always_writable=False):
   1217     """Get the array of the specified *name*, optionally for only the particles specified by *index*.
   1218 
   1219     If *always_writable* is True, the returned array is writable. Otherwise, it is still normally writable, but
   (...)
   1223     _get_array_with_lazy_actions.
   1224     """
-> 1226     x = self._arrays[name]
   1227     if x.derived and not always_writable:
   1228         x = x.view()

KeyError: 'half_mass'

The derived array will only be available for the class it was defined for, so the final command raised an error.

Changed in version 2.0: The method name has been changed from derived_quantity to derived_array. The old name is still available but will be removed in a future version.

Stable derived arrays#

Occasionally, you may want to define a derived array that is not automatically recalulated when its underlying dependencies change. An example from within the framework is the smoothing length (smooth). This is expensive to calculate, and changes to the underlying position coordinates – while in theory capable of changing the smoothing length – are far more commonly associated with translations or rotations in the course of a normal analysis. In this case, it would be wasteful to recalculate the smoothing length every time the position coordinates change.

To define a stable derived array, use stable_derived_array() in place of derived_array(). The array will be derived for the first time, but will not automatically update:

In [22]: @pynbody.snapshot.simsnap.SimSnap.stable_derived_array
   ....: def stable_x_copy(sim):
   ....:     return sim['x']
   ....: 

In [23]: s['stable_x_copy']
Out[23]: 
SimArray([1.01070931, 1.01140571, 1.01159962, ..., 1.02442261, 1.02445675,
          1.02456377], '6.85e+04 kpc a')

In [24]: s['x']+=1

In [25]: s['x']
Out[25]: 
SimArray([2.01070931, 2.01140571, 2.01159962, ..., 2.02442261, 2.02445675,
          2.02456377], '6.85e+04 kpc a')

In [26]: s['stable_x_copy']
Out[26]: 
SimArray([1.01070931, 1.01140571, 1.01159962, ..., 1.02442261, 1.02445675,
          1.02456377], '6.85e+04 kpc a')