IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
Functions | |
| ExternalFunction | _reduce_kernel (str op, int tile_size, dtype, bool vectorized) |
| ExternalFunction | reduce_add (int tile_size=1024, dtype=np.int32, bool vectorized=True) |
| ExternalFunction | reduce_min (int tile_size=1024, dtype=np.int32, bool vectorized=True) |
| ExternalFunction | reduce_max (int tile_size=1024, dtype=np.int32, bool vectorized=True) |
| ExternalFunction | compute_max (dtype=np.int32) |
Variables | |
| str | _REDUCE_MAX_OBJ = "reduce_max.cc.o" |
Reduction kernel factories: reduce_add, reduce_min, reduce_max, compute_max.
ExternalFunction iron.kernels.reduce._reduce_kernel(str op, int tile_size, dtype, bool vectorized) [protected]
Shared implementation for :func:`reduce_add` and :func:`reduce_min`.
ExternalFunction iron.kernels.reduce.compute_max(dtype = np.int32)
Pairwise scalar max: a companion to :func:`reduce_max` for multi-core
reductions in which each core produces a partial max and a final tree
reduces the partial results pairwise.
Lives in the same ``reduce_max.cc`` as :func:`reduce_max`. Because the two
factories share the output ``.o`` (via ``shared_object_file_name``), a
design that uses both compiles the source exactly once.
Args:
dtype: Element data type (``np.int32`` or ``bfloat16``).
Returns:
ExternalFunction configured for the ``compute_max`` kernel; signature
is ``(out_ty, out_ty, out_ty)`` where ``out_ty`` is a one-element
(DMA-aligned) tile of ``dtype``.
Raises:
ValueError: When ``dtype`` is not ``np.int32`` or ``bfloat16``.
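The multi-core pattern described for ``compute_max`` can be illustrated with a minimal host-side sketch in plain Python. It does not use IRON; ``pairwise_max`` and ``tree_max`` are hypothetical names, one-element lists stand in for the one-element tiles, and the four-core layout is an assumption made only for the example.

```python
def pairwise_max(a, b):
    """Hypothetical stand-in for compute_max: max of two one-element tiles."""
    return [max(a[0], b[0])]


def tree_max(partials):
    """Combine per-core partial maxima pairwise until one tile remains."""
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    while len(level) > 1:
        # Pair up neighbours at this level of the tree.
        nxt = [pairwise_max(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired partial is carried to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Assumed layout: four cores, each reducing its own 1024-element tile.
tiles = [[(7 * i + 13 * c) % 101 - 50 for i in range(1024)] for c in range(4)]
partials = [[max(tile)] for tile in tiles]  # what each core would emit
result = tree_max(partials)
assert result == [max(max(t) for t in tiles)]
```

With ``n`` partial results the tree needs ``n - 1`` pairwise steps, arranged in roughly ``log2(n)`` levels.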
ExternalFunction iron.kernels.reduce.reduce_add(int tile_size = 1024, dtype = np.int32, bool vectorized = True)
Reduction kernel: sums all elements of a tile to a scalar.
Args:
tile_size: Number of elements in the input tile.
dtype: Element data type (only ``np.int32`` supported).
vectorized: If ``True``, use the vectorized path; if ``False``, use the scalar path.
Returns:
ExternalFunction configured for the reduce_add kernel.
Raises:
ValueError: When ``dtype`` is not ``np.int32``.
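As a rough illustration of what the ``vectorized`` flag chooses between, the sketch below models a scalar accumulation loop and a lane-wise accumulation in plain Python and checks that they agree. This is a generic technique, not the kernel's actual implementation: the function names and the lane count of 16 are assumptions, and Python integers do not model ``int32`` overflow.

```python
LANES = 16  # assumed vector width, chosen only for illustration


def reduce_add_scalar(tile):
    """Scalar path: one running total, one element at a time."""
    total = 0
    for x in tile:
        total += x
    return total


def reduce_add_lanes(tile, lanes=LANES):
    """Vector-style path: per-lane partial sums, then a horizontal sum."""
    if len(tile) % lanes:
        raise ValueError("tile size must be a multiple of the lane count")
    acc = [0] * lanes
    for start in range(0, len(tile), lanes):
        for i in range(lanes):
            acc[i] += tile[start + i]
    return sum(acc)  # final horizontal reduction across lanes


tile = [(37 * i) % 201 - 100 for i in range(1024)]
assert reduce_add_scalar(tile) == reduce_add_lanes(tile) == sum(tile)
```

Both paths return the same scalar because integer addition is associative; only the order of accumulation differs.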
ExternalFunction iron.kernels.reduce.reduce_max(int tile_size = 1024, dtype = np.int32, bool vectorized = True)
Reduction kernel: finds the maximum element of a tile (int32 or bfloat16).
Args:
tile_size: Number of elements in the input tile.
dtype: Element data type (``np.int32`` or ``bfloat16``).
vectorized: If ``True``, use the vectorized path; if ``False``, use the scalar path.
Returns:
ExternalFunction configured for the reduce_max kernel.
Raises:
ValueError: When ``dtype`` is not ``np.int32`` or ``bfloat16``.
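A host-side reference model can be handy when checking a max reduction. The plain-Python sketch below shows one generic lane-wise formulation; it is not taken from ``reduce_max.cc``. The function name and the lane count of 16 are assumptions. The detail worth noting is that the accumulators are seeded from the first vector of the tile rather than from zero, so an all-negative tile still yields the correct maximum.

```python
def reduce_max_lanes(tile, lanes=16):
    """Lane-wise max over a tile; `lanes` is an assumed vector width."""
    if not tile or len(tile) % lanes:
        raise ValueError("tile must be non-empty and a multiple of the lane count")
    acc = list(tile[:lanes])  # seed from the data, not from 0
    for start in range(lanes, len(tile), lanes):
        for i in range(lanes):
            if tile[start + i] > acc[i]:
                acc[i] = tile[start + i]
    return max(acc)  # horizontal reduction across lanes


# Every element is negative, so a zero-seeded accumulator would be wrong.
tile = [-(i % 97) - 1 for i in range(1024)]
assert reduce_max_lanes(tile) == max(tile) == -1
```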
ExternalFunction iron.kernels.reduce.reduce_min(int tile_size = 1024, dtype = np.int32, bool vectorized = True)
Reduction kernel: finds the minimum element of a tile.
Args:
tile_size: Number of elements in the input tile.
dtype: Element data type (only ``np.int32`` supported).
vectorized: If ``True``, use the vectorized path; if ``False``, use the scalar path.
Returns:
ExternalFunction configured for the reduce_min kernel.
Raises:
ValueError: When ``dtype`` is not ``np.int32``.