pynbody.chunk.LoadControl
- class pynbody.chunk.LoadControl(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)
Bases: object
LoadControl provides the logic required for partial loading. See the documentation for pynbody.chunk for more information.
Methods
iterate(families_on_disk, families_in_memory): Yields step-by-step instructions for partial-loading an array with the specified families.
iterate_with_interrupts(families_on_disk, ...): Yields instructions for loading an array with the specified families, breaking at specified file offsets.
generate_family_id_lists
- __init__(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)
Initialize a LoadControl object.
Inputs:
- family_slice: a dictionary mapping each family to the slice that describes its contiguous layout on disk
- max_chunk: the guaranteed maximum number of entries requested in a single read operation. Larger values are likely to be more efficient, but require bigger temporary buffers in your reader code.
- clauses: a description of the type of partial loading to implement. If None, all data is loaded. Otherwise, the only currently supported option is a numpy array of particle IDs to load.
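To make the family_slice and max_chunk inputs concrete, here is a minimal sketch that splits a contiguous on-disk layout into reads of at most max_chunk entries. It is a simplified illustration, not pynbody's implementation: plain strings stand in for family.Family objects, clauses is ignored (as if clauses=None), and split_into_reads is a hypothetical helper.

```python
def split_into_reads(family_slice, max_chunk):
    """Yield (family, start, stop) disk ranges, each at most max_chunk entries long.

    family_slice maps a family (a plain string here, standing in for
    pynbody's family.Family) to a contiguous slice of the on-disk array.
    """
    # Walk the families in the order in which they appear on disk
    for fam, sl in sorted(family_slice.items(), key=lambda item: item[1].start):
        start = sl.start
        while start < sl.stop:
            stop = min(start + max_chunk, sl.stop)
            yield fam, start, stop
            start = stop


layout = {"dm": slice(0, 10), "gas": slice(10, 14), "star": slice(14, 20)}
reads = list(split_into_reads(layout, max_chunk=4))
```

Because no read exceeds max_chunk entries, a reader following this pattern never needs a temporary buffer larger than that, which is the trade-off described for max_chunk.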
- iterate(families_on_disk: list[family.Family], families_in_memory: list[family.Family], multiskip: bool = False) → Iterator[tuple[int, slice | None, slice | None]]
Yields step-by-step instructions for partial-loading an array with the specified families.
A typical read loop looks like this:

```python
for readlen, buffer_index, memory_index in ctl.iterate(fams_on_disk, fams_in_mem):
    data = read_entries(count=readlen)  # reader-specific: read the next readlen entries from disk
    if memory_index is not None:
        target_array[memory_index] = data[buffer_index]
```
This loop can be optimized, for instance by seeking past the file data when memory_index is None instead of reading and discarding it.
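The read loop and the seek-instead-of-read optimization can be sketched against an in-memory binary file. This is an illustration under stated assumptions: the instruction list is written by hand rather than produced by iterate(), the file holds little-endian float64 values, and read_entries, skip_entries and run_load are hypothetical reader helpers, not part of pynbody.

```python
import io
import struct

ITEM_SIZE = 8  # bytes per float64 entry in this hypothetical file format


def read_entries(f, count):
    """Read count float64 values from the current file position."""
    return list(struct.unpack(f"<{count}d", f.read(count * ITEM_SIZE)))


def skip_entries(f, count):
    """Seek past count entries instead of reading and discarding them."""
    f.seek(count * ITEM_SIZE, io.SEEK_CUR)


def run_load(f, instructions, target):
    for readlen, buffer_index, memory_index in instructions:
        if memory_index is None:
            skip_entries(f, readlen)  # nothing to keep, so avoid the read entirely
            continue
        data = read_entries(f, readlen)
        target[memory_index] = data[buffer_index]


# Six entries on disk: skip the first two, then load the remaining four
disk = io.BytesIO(struct.pack("<6d", *range(6)))
instructions = [(2, None, None), (2, slice(0, 2), slice(0, 2)), (2, slice(0, 2), slice(2, 4))]
target = [0.0] * 4
run_load(disk, instructions, target)
```

After the call, target holds the values 2.0 to 5.0 and the file position sits at the end of the data, even though the first two entries were never read.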
- Parameters:
families_on_disk (list) – List of families for which the array exists on disk
families_in_memory (list) – List of families for which we want to read the array into memory
multiskip (bool) – If True, skip commands (i.e. entries with buffer_index=None) can have readlen greater than the block length
- Yields:
readlen (int) – Number of entries to read from disk
buffer_index (slice | None) – Slice to read from the resulting buffer, or None if this particular read is to be ignored (skipped)
memory_index (slice | None) – Slice to write into memory, or None if buffer_index is None
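For the case clauses=None, triples of this shape can be generated by a simplified sketch like the one below. It is not pynbody's code: the hypothetical iterate_sketch takes an ordered list of (family, count) pairs for the on-disk layout instead of LoadControl's internal state, uses strings for families, and assumes that multiskip merges a skipped family into a single command whose readlen may exceed max_chunk.

```python
def iterate_sketch(disk_layout, families_in_memory, max_chunk, multiskip=False):
    """Yield (readlen, buffer_index, memory_index) triples, assuming no ID clauses.

    disk_layout is an ordered list of (family, count) pairs describing the
    array on disk; families_in_memory is the subset to load.
    """
    mem_pos = 0  # next free position in the target array
    for fam, count in disk_layout:
        if count == 0:
            continue
        if fam not in families_in_memory:
            if multiskip:
                yield count, None, None  # one skip command for the whole family
            else:
                for start in range(0, count, max_chunk):
                    yield min(max_chunk, count - start), None, None
            continue
        for start in range(0, count, max_chunk):
            readlen = min(max_chunk, count - start)
            # Keep the whole buffer and place it after what is already loaded
            yield readlen, slice(0, readlen), slice(mem_pos, mem_pos + readlen)
            mem_pos += readlen


layout = [("dm", 5), ("gas", 3)]
plain = list(iterate_sketch(layout, {"gas"}, max_chunk=2))
merged = list(iterate_sketch(layout, {"gas"}, max_chunk=2, multiskip=True))
```

Without multiskip, the five skipped dm entries arrive as three skip commands of at most two entries each; with multiskip=True they collapse into one (5, None, None) command, while the gas reads are unchanged.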
- iterate_with_interrupts(families_on_disk: list[family.Family], families_in_memory: list[family.Family], disk_interrupt_points: Iterable[int], disk_interrupt_fn: Callable, multiskip: bool = False)
Yields instructions for loading an array with the specified families, breaking at specified file offsets.
Performs the same function as iterate(), but additionally takes a list of exact file offsets, disk_interrupt_points, at which to interrupt the loading process and call a user-specified function, disk_interrupt_fn.
- Parameters:
disk_interrupt_points (Iterable) – List (or other iterable) of disk offsets at which to call the interrupt function, in ascending order
disk_interrupt_fn (Callable) – Function which takes the file offset as an argument, and is called precisely at the point that the disk interrupt point is reached
See iterate() for other parameters.
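The interrupt behaviour can be illustrated by splitting reads wherever they cross a requested offset. The hypothetical split_at_interrupts below is a minimal sketch, not pynbody's implementation; it assumes offsets are counted in entries from the start of the on-disk array, and that every non-skip instruction reads its whole buffer (buffer_index == slice(0, readlen)), as in the clauses=None case.

```python
def split_at_interrupts(instructions, interrupt_points, interrupt_fn):
    """Re-yield instructions, splitting any read that spans an interrupt offset.

    Offsets are counted in entries and must be in ascending order.
    interrupt_fn(offset) is called once exactly that many entries have been
    consumed. Assumes buffer_index is either None or slice(0, readlen).
    """
    points = iter(interrupt_points)
    next_point = next(points, None)
    pos = 0  # entries consumed so far
    for readlen, buffer_index, memory_index in instructions:
        while readlen > 0:
            # Fire every interrupt reached at the current position
            while next_point is not None and next_point <= pos:
                interrupt_fn(next_point)
                next_point = next(points, None)
            step = readlen
            if next_point is not None and next_point < pos + readlen:
                step = next_point - pos  # stop this read at the interrupt
            if memory_index is None:
                yield step, None, None
            else:
                yield step, slice(0, step), slice(memory_index.start, memory_index.start + step)
                memory_index = slice(memory_index.start + step, memory_index.stop)
            pos += step
            readlen -= step
    # Interrupts that coincide with the end of the data fire last
    while next_point is not None and next_point <= pos:
        interrupt_fn(next_point)
        next_point = next(points, None)


log = []
instructions = [(4, None, None), (4, slice(0, 4), slice(0, 4))]
for instr in split_at_interrupts(instructions, [2, 4, 6, 8], lambda off: log.append(("interrupt", off))):
    log.append(instr)
```

Because the sketch is a generator, each interrupt fires when the caller asks for the next instruction, which is exactly when the reader has consumed that many entries; the log therefore alternates between two-entry reads and interrupts at 2, 4, 6 and 8.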