Performance optimisation in pynbody#
Pynbody is built on top of numpy, which means that learning how to optimize numpy
array manipulations is the most important route to writing efficient code; see, for example,
the Scientific Python lectures
for an introduction.
However there are a couple of issues which are specific to pynbody.
Many of
pynbody’s most common operations are parallelized: make sure you have set up these routines to behave in a way that matches your needs by reading the page on threads.Sometimes, manipulating arrays from a
SubSnapcan be slower than manipulating the equivalent arrays from the parent snapshot. This is because the arrays returned from aSubSnapare not truenumpyarrays but are insteadIndexedSimArrayobjects. This is explained in more detail below.
See also
A separate document covers parallelism in pynbody, which can also be
important for performance-critical code.
Overheads of SubSnaps#
A template for performance-critical code#
To cut a long story short, if your routine does a lot of array access on an object which might
be a SubSnap of a certain flavour (explained further below),
you will find that wrapping your code as follows speeds it up.
def my_expensive_operation(sim_or_subsim) :
with sim_or_subsim.immediate_mode :
mass_array = sim_or_subsim['mass']
other_array = sim_or_subsim['other']
# ... get other arrays required...
#
# perform multiple operations on arrays
#
# At end, copy back results if the arrays have
# changed
sim_or_subsim['other'] = other_array
The remainder of this document unpacks what this does and why it should be necessary.
What is a SubSnap, really?#
When you construct a sub-view of a simulation, the framework records which
particles of the underlying SimSnap are included and which are not.
Thereafter, if you access an array from the sub-view, it is
constructed in one of two ways.
If the set of particles in the sub-view is expressible as a slice, the type of the sub-view is
SubSnap, and any arrays accessed are still returned asnumpyarrays, albeit with non-contiguous strides. This will be the case if you explicitly slice the simulation (e.g.f[2:100:3]), or if you ask for a specific particle family (e.g.f.dm).If the set of particles is not expressible in this way, instead of a
SubSnapyou will get aIndexedSubSnap. In this case, arrays returned from the new view are emulatingnumpyarrays and this can become expensive (see below). This will be the case if you ask for a list of particles (e.g.f[[2,10,15,22]]) use anumpy-like indexing trick (e.g.f[f['x']>10]orf[numpy.where(f['x']>10)]) or use afilter.
In the case of IndexedSubSnap, performance
can be rather different from that of the parent snapshot.
To understand how and why, we
need to look at the difference between an indexed and a sliced numpy array.
The need for array emulation#
When you get an array from a IndexedSubSnap, it is of
type IndexedSimArray.
This section explains why the reason and implications.
The pynbody framework is designed to allow users to interact with data without worrying
too much about whether it is an entire simulation or a small portion of a simulation.
Consistency then requires all sub-arrays to continue pointing to the original data.
But a simple experiment with numpy shows that it does not enable this behaviour in all
cases that we want to cover.
Here’s what happens when you use a slice of an existing numpy array.
In [1]: import numpy as np
In [2]: a = np.zeros(10)
In [3]: b = a[3:5]
In [4]: b[1] = 100
In [5]: a
Out[5]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
The a array has been updated as required, because the b and a objects
actually point back to the same part of the computer memory.
On the other hand, when you index a numpy array, the behaviour is different.
In [6]: c = a[[4,5,6]]
In [7]: c[1] = 200
In [8]: a
Out[8]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
Here changing c has not updated a. That’s because the construction of c actually
copied the relevant data instead of just pointing back at it. This is necessitated by
the underlying design of numpy arrays requiring the data to lie in a predictable
pattern in the memory.
The IndexedSimArray class fixes this problem:
In [9]: import pynbody
In [10]: d = pynbody.array.IndexedSimArray(a, [4,5,6])
In [11]: d[1] = 200
In [12]: a
Out[12]: array([ 0., 0., 0., 0., 100., 200., 0., 0., 0., 0.])
Note that a has been updated correctly. This is achieved by the IndexedSimArray
emulating, rather than wrapping, a numpy array; internally
the syntax d[1]=200 is then translated into a[[4,5,6][1]]=200.
The cost of this is that each time you call a function that requires a numpy array
as an input, the IndexedSimArray has to generate a proxy for this purpose. This can become slow.
Have a look at the following timings:
In [13]: %time for i in range(10000) : d+=1
CPU times: user 135 ms, sys: 0 ns, total: 135 ms
Wall time: 135 ms
In [14]: %time for i in range(10000) : a+=1
CPU times: user 8.8 ms, sys: 0 ns, total: 8.8 ms
Wall time: 8.8 ms
Adding to the subarray is slower than adding to the entire array!
This is because of the overheads of continually constructing proxy
numpy arrays to pass to the __add__ method.
How to remove this bottleneck#
We should emphasize that the example above is quite contrived, since it forces
re-construction of the numpy proxy 10000 times. In user code,
the number of numpy proxies that have to be constructed will be vastly smaller,
so the fractional overheads are normally quite small.
Nonetheless, construction of these proxy arrays does sometimes become a problem for
performance-critical code. For that reason, it’s possible to avoid constructing
IndexedSimArray s altogether
and force only SimArray to be returned. This is a thin wrapper
around a numpy array (see Overheads of SimArrays below) and, as such, is enormously more efficient.
But it can be less convenient since you have to keep track of when to copy data back.
Pynbody refers to this approach as immediate mode; it can be activated using a context manager
(i.e. python’s with keyword).
Let’s create a test snapshot and a subview into that snapshot to try it out.
In [15]: f = pynbody.new(dm=100)
In [16]: sub_f = f[[20,21,22]]
Under normal conditions, the type of arrays returned from sub_f is
IndexedSimArray.
Updating one of these arrays will transparently update the main snapshot.
In [17]: sub_mass = sub_f['mass']
In [18]: type(sub_mass)
Out[18]: pynbody.array.IndexedSimArray
In [19]: sub_mass[:]=3
In [20]: f['mass'][[20,21,22]]
Out[20]: SimArray([3., 3., 3.])
Conversely, in immediate mode, the type of arrays returned from sub_f is
SimArray.
This is faster, but updating the returned numpy array has no effect on the
parent snapshot.
In [21]: with f.immediate_mode :
....: sub_mass = sub_f['mass']
....:
In [22]: type(sub_mass)
Out[22]: pynbody.array.SimArray
In [23]: sub_mass
Out[23]: SimArray([3., 3., 3.])
In [24]: sub_mass[:]=5
In [25]: sub_mass # updated as expected
Out[25]: SimArray([5., 5., 5.])
In [26]: f['mass'][[20,21,22]] # NOT updated - should still be 3,3,3!
Out[26]: SimArray([3., 3., 3.])
So it becomes your responsibility to copy the results back in this case, if required. A template for performance
critical code which might be operating on a SubSnap was given above, in
A template for performance-critical code.
In summary, the template code:
stores a copy of the data for the subset of particles
works on the copy
(if necessary) updates the main snapshot data explicitly before returning
Note
with f_sub.immediate_mode
is equivalent to with f.immediate_mode where f_sub is any
sub-view of f.
Overheads of SimArrays#
Note
This information is provided for interest. We have never come across a realistic use case where the following is necessary.
In pynbody, arrays are implemented by the class SimArray. This is a thin wrapper
around a numpy array. There is a small extra cost associated with every operation to allow
units to be matched and updated. For long arrays such as those found in typical simulations, this is usually a tiny fraction of the
actual computation time. We have never found it to be a problem, but if you want to disable the
unit tracking you can always do so using numpy’s view mechanism to get a raw numpy array.
Suppose you have a SimSnap f; then pos = f['pos'].view(numpy.ndarray) (for example) will return the position
array without any of the SimArray wrapping. The new pos variable can be manipulated without
any unit handling code being called.