pynbody.chunk.LoadControl

class pynbody.chunk.LoadControl(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)

Bases: object

LoadControl provides the logic required for partial loading.

See the documentation for pynbody.chunk for more information.

Methods

iterate(families_on_disk, families_in_memory)

Yields step-by-step instructions for partial-loading an array with the specified families.

iterate_with_interrupts(families_on_disk, ...)

Yields instructions for loading an array with the specified families, breaking at specified file offsets.

generate_family_id_lists

__init__(family_slice: dict[family.Family, slice], max_chunk: int, clauses: np.ndarray | None)

Initialize a LoadControl object.

Inputs:
  • family_slice (dict) – a dictionary of family slices describing the contiguous layout of families on disk

  • max_chunk (int) – the guaranteed maximum chunk of data to load in a single read operation. Larger values are likely more efficient, but also require bigger temporary buffers in your reader code.

  • clauses (np.ndarray | None) – a description of the type of partial loading to implement. If None, all data is loaded. Otherwise, currently the only supported option is a numpy array of particle ids to load.
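As a rough illustration of how family_slice and an id-based clauses argument relate, the sketch below groups a list of particle ids by the family whose on-disk slice contains them. It is not pynbody code: plain strings stand in for family.Family objects, a Python list stands in for the numpy id array, and the helper split_ids_by_family is hypothetical.

```python
# Hypothetical stand-ins: strings instead of family.Family objects,
# a plain list instead of a numpy array of particle ids.
family_slice = {"gas": slice(0, 4), "dm": slice(4, 10), "star": slice(10, 12)}
particle_ids = [1, 3, 4, 9, 11]


def split_ids_by_family(family_slice, ids):
    """Group ids by the family whose on-disk slice contains them.

    The returned ids are offsets relative to the start of each family.
    This is an illustrative helper, not part of pynbody.
    """
    result = {}
    for fam, sl in family_slice.items():
        result[fam] = [i - sl.start for i in ids if sl.start <= i < sl.stop]
    return result


ids_by_family = split_ids_by_family(family_slice, particle_ids)
print(ids_by_family)
```

Because the families are laid out contiguously, each requested id belongs to exactly one family, which is what makes a per-family view of the id list possible.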

iterate(families_on_disk: list[family.Family], families_in_memory: list[family.Family], multiskip: bool = False) → Iterator[tuple[int, slice | None, slice | None]]

Yields step-by-step instructions for partial-loading an array with the specified families.

A typical read loop should be as follows:

for readlen, buffer_index, memory_index in ctl.iterate(fams_on_disk, fams_in_mem):
    # read_entries stands for the reader's own routine that returns `count` entries
    data = read_entries(count=readlen)
    if memory_index is not None:
        target_array[memory_index] = data[buffer_index]

This loop can be optimized, for instance by seeking past file data when memory_index is None rather than reading and discarding it.

Parameters:
  • families_on_disk (list) – List of families for which the array exists on disk

  • families_in_memory (list) – List of families for which we want to read the array into memory

  • multiskip (bool) – If True, skip commands (i.e. entries with buffer_index=None) can have readlen greater than the block length

Yields:
  • readlen (int) – Number of entries to read from disk

  • buffer_index (slice | None) – Slice to read from the resulting buffer, or None if this particular read is to be ignored (skipped)

  • memory_index (slice | None) – Slice to write into memory, or None if buffer_index is None
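The skip optimization for the read loop can be sketched as follows. This is a minimal, self-contained illustration rather than pynbody code: the instructions list is hand-written to stand in for the output of ctl.iterate(...), an in-memory io.BytesIO object stands in for the file, and the standard-library array type stands in for a numpy array.

```python
import io
from array import array

ITEMSIZE = array("d").itemsize  # bytes per entry (C double)

# Fake "file" holding ten doubles: 0.0, 1.0, ..., 9.0
disk = io.BytesIO(array("d", [float(i) for i in range(10)]).tobytes())

# Hand-written stand-in for ctl.iterate(...):
# each item is (readlen, buffer_index, memory_index)
instructions = [
    (3, None, None),                # skip entries 0-2
    (4, slice(1, 3), slice(0, 2)),  # read entries 3-6, keep entries 4 and 5
    (3, slice(0, 3), slice(2, 5)),  # read entries 7-9, keep all of them
]

target_array = array("d", [0.0] * 5)

for readlen, buffer_index, memory_index in instructions:
    if memory_index is None:
        # Nothing to keep: seek past the entries instead of reading them
        disk.seek(readlen * ITEMSIZE, io.SEEK_CUR)
        continue
    data = array("d")
    data.frombytes(disk.read(readlen * ITEMSIZE))
    target_array[memory_index] = data[buffer_index]

print(list(target_array))
```

Seeking keeps the file position in step with the instruction stream without allocating a temporary buffer for data that would be discarded.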

iterate_with_interrupts(families_on_disk: list[family.Family], families_in_memory: list[family.Family], disk_interrupt_points: Iterable[int], disk_interrupt_fn: Callable, multiskip: bool = False)

Yields instructions for loading an array with the specified families, breaking at specified file offsets.

Performs the same function as iterate(), but additionally takes a list of exact file offsets, disk_interrupt_points, at which to interrupt the loading process and call a user-specified function, disk_interrupt_fn.

Parameters:
  • disk_interrupt_points (Iterable) – List (or other iterable) of disk offsets at which to call the interrupt function, in ascending order

  • disk_interrupt_fn (Callable) – Function which takes the file offset as an argument, and is called precisely at the point that the disk interrupt point is reached

See iterate() for other parameters.
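To make the interrupt behaviour concrete, here is a simplified stand-in written from the description above. It is not pynbody's implementation. The functions split_step and with_interrupts are hypothetical, offsets are assumed to be counted in entries rather than bytes, and slices are assumed to be contiguous (step 1) with explicit start and stop. Any read that straddles an interrupt point is split in two so that the callback runs exactly when the disk position equals that point.

```python
def split_step(readlen, buf, mem, len_pre):
    """Split one (readlen, buffer_index, memory_index) step after len_pre entries."""
    if buf is None:
        return (len_pre, None, None), (readlen - len_pre, None, None)
    kept_total = buf.stop - buf.start
    # Number of kept entries that fall before the split point
    kept_pre = max(0, min(buf.stop, len_pre) - buf.start)
    if kept_pre > 0:
        pre = (len_pre,
               slice(buf.start, buf.start + kept_pre),
               slice(mem.start, mem.start + kept_pre))
    else:
        pre = (len_pre, None, None)
    if kept_pre < kept_total:
        # Buffer indices restart at zero for the second read
        post_start = max(buf.start, len_pre) - len_pre
        post = (readlen - len_pre,
                slice(post_start, buf.stop - len_pre),
                slice(mem.start + kept_pre, mem.stop))
    else:
        post = (readlen - len_pre, None, None)
    return pre, post


def with_interrupts(instructions, interrupt_points, interrupt_fn):
    """Yield steps, calling interrupt_fn(offset) at each ascending interrupt point."""
    points = iter(interrupt_points)
    next_point = next(points, None)
    pos = 0  # current disk offset, in entries (an assumption of this sketch)
    for readlen, buf, mem in instructions:
        while next_point is not None and next_point < pos + readlen:
            len_pre = next_point - pos
            if len_pre > 0:
                pre, post = split_step(readlen, buf, mem, len_pre)
                yield pre
                readlen, buf, mem = post
                pos += len_pre
            interrupt_fn(next_point)
            next_point = next(points, None)
        yield readlen, buf, mem
        pos += readlen


# Hand-written stand-in for the steps that iterate() would yield
instructions = [
    (3, None, None),
    (4, slice(1, 3), slice(0, 2)),
    (3, slice(0, 3), slice(2, 5)),
]
calls = []
steps = list(with_interrupts(instructions, [5, 7], calls.append))
print(steps)
print(calls)
```

In this sketch the second step is split at offset 5, while the point at offset 7 already coincides with a step boundary, so the callback runs there without any further splitting. A point lying exactly at the end of the final step is not reported; a fuller version would handle that case.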