pynbody.chunk.LoadControl

class pynbody.chunk.LoadControl(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)

Bases: object

LoadControl provides the logic required for partial loading.

See the documentation for pynbody.chunk for more information.

Methods

iterate(families_on_disk, families_in_memory) – Yields step-by-step instructions for partial-loading an array with the specified families.
iterate_with_interrupts(families_on_disk, ...) – Yields instructions for loading an array with the specified families, breaking at specified file offsets.
generate_family_id_lists

__init__(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)

Initialize a LoadControl object.

Inputs:

family_slice: a dictionary of family slices describing the contiguous layout of families on disk

max_chunk: the guaranteed maximum size of chunk to load in a single read operation. Larger values are likely to be more efficient, but also require larger temporary buffers in your reader code.

clauses: a description of the type of partial loading to perform. If None, all data is loaded. Otherwise, the only currently supported option is a numpy array of the particle ids to load.
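To make the inputs more concrete, the following is a minimal sketch of what family_slice and an id-based clauses argument describe together. It does not use pynbody and is not its implementation: plain strings stand in for family objects, a Python list stands in for the numpy id array, and memory_layout is a hypothetical helper that works out where each family would sit in memory once only the requested ids are kept.

```python
def memory_layout(family_slice, ids=None):
    """Return {family: slice} for an in-memory array holding only `ids`.

    family_slice maps each family to its slice in the on-disk layout;
    ids is a collection of particle ids to keep, or None to keep everything.
    """
    layout = {}
    offset = 0
    # Walk the families in the order in which they are stored on disk.
    for fam, sl in sorted(family_slice.items(), key=lambda kv: kv[1].start):
        if ids is None:
            n = sl.stop - sl.start
        else:
            # Count the requested ids that fall inside this family's range.
            n = sum(1 for i in ids if sl.start <= i < sl.stop)
        layout[fam] = slice(offset, offset + n)
        offset += n
    return layout


# Strings are stand-ins for pynbody family objects (an assumption).
family_slice = {"gas": slice(0, 100), "dm": slice(100, 300)}

full = memory_layout(family_slice)                        # clauses=None
partial = memory_layout(family_slice, ids=[5, 50, 150])   # id-based clause
print(full)
print(partial)
```

With no ids the in-memory layout matches the on-disk one. With three ids, two of which fall in the first family, the in-memory array shrinks to three entries.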

iterate(families_on_disk: list[family.Family], families_in_memory: list[family.Family], multiskip: bool = False) → Iterator[tuple[int, slice | None, slice | None]]

Yields step-by-step instructions for partial-loading an array with the specified families.

A typical read loop should be as follows:

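for readlen, buffer_index, memory_index in ctl.iterate(fams_on_disk, fams_in_mem):
    # read_entries stands for the reader's own routine that returns `count` entries
    data = read_entries(count=readlen)
    if memory_index is not None:
        # copy the selected part of the buffer into its place in memory
        target_array[memory_index] = data[buffer_index]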

This loop can be optimized, for instance by seeking past file data when memory_index is None rather than reading and discarding it.
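One way to skip rather than read is sketched below. It is only an illustration under stated assumptions: the file is taken to be a flat array of little-endian 32-bit floats, the instruction list is written by hand in the shape that iterate() yields instead of coming from a real LoadControl, and load_with_skips is a hypothetical helper name.

```python
import io
import struct

ITEM_SIZE = 4  # bytes per entry; assumes 32-bit floats on disk


def load_with_skips(f, instructions, target):
    """Apply (readlen, buffer_index, memory_index) instructions to file f."""
    for readlen, buffer_index, memory_index in instructions:
        if memory_index is None:
            # Nothing in this block is wanted: move the file pointer
            # forward instead of reading and discarding the data.
            f.seek(readlen * ITEM_SIZE, io.SEEK_CUR)
            continue
        raw = f.read(readlen * ITEM_SIZE)
        data = struct.unpack(f"<{readlen}f", raw)
        target[memory_index] = data[buffer_index]
    return target


# Hand-written stand-in for the instruction stream (not produced by pynbody).
instructions = [
    (4, None, None),                 # skip entries 0..3
    (4, slice(1, 3), slice(0, 2)),   # read entries 4..7, keep 5 and 6
    (2, slice(0, 2), slice(2, 4)),   # read entries 8..9, keep both
]

f = io.BytesIO(struct.pack("<10f", *range(10)))
target = load_with_skips(f, instructions, [0.0] * 4)
print(target)
```

The reader still consumes the instructions in order, so the file position stays consistent with what later instructions expect.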

Parameters:

families_on_disk (list) – List of families for which the array exists on disk
families_in_memory (list) – List of families for which we want to read the array into memory
multiskip (bool) – If True, skip commands (i.e. entries with buffer_index=None) can have readlen greater than the block length

Yields:

readlen (int) – Number of entries to read from disk
buffer_index (slice | None) – Slice to read from the resulting buffer, or None if this particular read is to be ignored (skipped)
memory_index (slice | None) – Slice to write into memory, or None if buffer_index is None
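As an illustration of why a skip whose readlen exceeds the block length can be useful, the sketch below merges runs of consecutive skip commands into a single larger skip on the reader side. It is not how pynbody implements multiskip; coalesce_skips is a hypothetical helper and the instruction tuples are hand-written stand-ins for what iterate() yields.

```python
def coalesce_skips(instructions):
    """Merge consecutive skip commands (buffer_index is None) into one."""
    pending = 0  # number of entries skipped so far in the current run
    for readlen, buffer_index, memory_index in instructions:
        if buffer_index is None:
            pending += readlen
            continue
        if pending:
            # Emit the accumulated skip before the next real read.
            yield pending, None, None
            pending = 0
        yield readlen, buffer_index, memory_index
    if pending:
        yield pending, None, None


# Hand-written stand-in for an instruction stream with a block length of 4.
stream = [
    (4, None, None),
    (4, None, None),
    (4, slice(0, 2), slice(0, 2)),
    (3, None, None),
]
merged = list(coalesce_skips(stream))
print(merged)
```

A reader that can seek by an arbitrary amount then issues one seek per run of skipped blocks rather than one per block.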

iterate_with_interrupts(families_on_disk: list[family.Family], families_in_memory: list[family.Family], disk_interrupt_points: Iterable[int], disk_interrupt_fn: Callable, multiskip: bool = False)

Yields instructions for loading an array with the specified families, breaking at specified file offsets.

Performs the same function as iterate() but additionally takes a list of exact file offsets disk_interrupt_points at which to interrupt the loading process and call a user-specified function disk_interrupt_fn.

Parameters:

disk_interrupt_points (Iterable) – List (or other iterable) of disk offsets, in ascending order, at which to call the interrupt function
disk_interrupt_fn (Callable) – Function that takes the file offset as its argument and is called exactly when loading reaches that disk interrupt point

See iterate() for other parameters.
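The interrupt mechanism might be used, for example, when one logical array is split across several files and the reader has to switch file handles at a known offset. The sketch below models that pattern without pynbody. sequential_instructions is a simplified, hypothetical stand-in that loads every entry in order and treats interrupt offsets as entry counts; both are assumptions and not a description of the library's behaviour.

```python
import io
import struct


def sequential_instructions(n_entries, max_chunk, interrupt_points, interrupt_fn):
    """Simplified stand-in: read all entries in order, in reads of at most
    max_chunk entries, never letting a read cross an interrupt point."""
    pending = sorted(interrupt_points)
    pos = 0
    while pos < n_entries:
        # Fire any interrupt that falls exactly at the current offset.
        while pending and pending[0] <= pos:
            interrupt_fn(pending.pop(0))
        limit = pending[0] if pending else n_entries
        readlen = min(max_chunk, limit - pos, n_entries - pos)
        yield readlen, slice(0, readlen), slice(pos, pos + readlen)
        pos += readlen


# Ten integers stored as 4 entries in one file and 6 in the next.
files = [
    io.BytesIO(struct.pack("<4i", 0, 1, 2, 3)),
    io.BytesIO(struct.pack("<6i", 4, 5, 6, 7, 8, 9)),
]
state = {"file": files[0]}
calls = []


def switch_file(offset):
    """Interrupt function: record the offset and move to the second file."""
    calls.append(offset)
    state["file"] = files[1]


target = [None] * 10
for readlen, buffer_index, memory_index in sequential_instructions(
    10, 3, [4], switch_file
):
    raw = state["file"].read(4 * readlen)
    data = struct.unpack(f"<{readlen}i", raw)
    target[memory_index] = data[buffer_index]

print(target, calls)
```

Because no read spans the interrupt offset, the reader never has to stitch one buffer together from two files.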
