IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
iron.algorithms.reduce Namespace Reference

Functions

 _reduce_gen (func, input_desc, output_desc, *, trace_size=0)
 
 reduce (func, input_ty, output_ty, *, trace_size=0)
 

Detailed Description

Reduction algorithms built on IRON.

Reductions differ from :mod:`~aie.iron.algorithms.transform` in two ways:

* Output shape is *smaller* than input shape (often ``(1,)`` for a scalar
  reduction), so the same-shape invariant ``_transform_gen`` enforces does
  not apply.
* The whole input is handed to the kernel in **one** call rather than tiled.
  A reduction carries accumulator state across all elements, which a
  per-tile lambda cannot model, so these helpers accept only an
  :class:`ExternalFunction` with signature
  ``(input_tile, output_tile, input_size: np.int32)``; there is no lambda path.
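
The kernel contract described above can be illustrated with a host-side reference model. This is only a NumPy sketch of the calling convention, assuming a sum reduction over ``int32``; the name ``reference_sum_kernel`` is hypothetical, and a real :class:`ExternalFunction` is a compiled kernel rather than a Python function.

```python
import numpy as np


def reference_sum_kernel(input_tile, output_tile, input_size):
    """Hypothetical stand-in with the (input_tile, output_tile, input_size) shape."""
    acc = np.int32(0)  # accumulator state lives across the whole input
    for i in range(int(input_size)):
        acc = np.int32(acc + input_tile[i])
    output_tile[0] = acc  # the result is written into the (1,)-shaped output


inp = np.arange(8, dtype=np.int32)    # whole input, handed over in one call
out = np.zeros((1,), dtype=np.int32)  # smaller output buffer
reference_sum_kernel(inp, out, np.int32(inp.size))
```

Because the accumulator is local to a single call, the kernel has to see every element in that call, which is why the input is not tiled.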

Function Documentation

_reduce_gen()

iron.algorithms.reduce._reduce_gen (func, input_desc, output_desc, *, trace_size=0)
protected
Generate a reduction design: whole input -> smaller output via one kernel call.

Args:
    func: :class:`~aie.iron.kernel.ExternalFunction`. The kernel is
        invoked once per design execution with arguments
        ``(input_tile, output_tile, input_num_elements)`` -- the third
        arg is passed as a literal ``np.int32`` so the kernel can size
        its accumulator loop.
    input_desc: A fake-tensor descriptor (``.shape``, ``.size``,
        ``.dtype``) for the input.  Build via
        :func:`make_param_descriptor`.
    output_desc: Same, for the output (typically ``(1,)``-shaped).
    trace_size: When > 0, enable Worker core trace and allocate a
        ``trace_size``-byte runtime trace buffer.  Defaults to 0 (off).
        The kernel is expected to emit ``event0()``/``event1()`` markers.
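
To make the descriptor fields concrete, here is a minimal sketch of an object exposing ``.shape``, ``.size`` and ``.dtype``. ``SketchDescriptor`` is a hypothetical stand-in written for illustration; it is not the object that :func:`make_param_descriptor` returns, whose construction is not shown here.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SketchDescriptor:
    """Hypothetical fake-tensor descriptor: metadata only, no data buffer."""

    shape: tuple
    dtype: np.dtype

    @property
    def size(self) -> int:
        # Total element count, i.e. the product of the dimensions.
        return math.prod(self.shape)


input_desc = SketchDescriptor(shape=(1024,), dtype=np.dtype(np.int32))
output_desc = SketchDescriptor(shape=(1,), dtype=np.dtype(np.int32))

# The third kernel argument is the input element count as an np.int32 literal.
input_num_elements = np.int32(input_desc.size)
```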

reduce()

iron.algorithms.reduce.reduce (func, input_ty, output_ty, *, trace_size=0)
Apply the reduction ``func`` over an entire input tensor, producing an output of type ``output_ty``.

Like :func:`~aie.iron.algorithms.transform.transform` but for
reductions: hands the whole input to ``func`` in a single kernel call
rather than iterating per-tile.  Intended for use inside ``@iron.jit``
generator bodies where input/output shapes are expressed as
``CompileTime[T]`` parameters::

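    @iron.jit
    def my_design(inp: In, out: Out, *, N: CompileTime[int]):
        # N is a compile-time parameter, so the input shape is known statically.
        in_ty = np.ndarray[(N,), np.dtype[np.int32]]
        out_ty = np.ndarray[(1,), np.dtype[np.int32]]  # scalar result
        # my_reduce_kernel is an ExternalFunction defined elsewhere.
        return reduce(my_reduce_kernel, in_ty, out_ty)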

Args:
    func: :class:`~aie.iron.kernel.ExternalFunction` with signature
        ``(input_array, output_array, input_size: np.int32)``.
    input_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``) for the input tensor.
    output_ty: Same, for the output tensor (typically ``(1,)``-shaped).
    trace_size: When > 0, enable Worker core trace and a runtime trace
        buffer of this size in bytes. Defaults to 0 (off).

Returns:
    mlir.ir.Module: The compiled MLIR module.
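
The ``input_ty`` and ``output_ty`` arguments are ordinary parameterized NumPy types, so the shape and element type they carry can be inspected with the standard library. The sketch below assumes NumPy 1.22 or later on Python 3.9 or later; ``unpack_ndarray_type`` is a hypothetical helper for illustration and is not claimed to be how ``reduce`` reads these types internally.

```python
import math
from typing import get_args

import numpy as np


def unpack_ndarray_type(ty):
    """Hypothetical helper: split np.ndarray[shape, np.dtype[T]] into parts."""
    shape, dtype_alias = get_args(ty)       # ((1024,), np.dtype[np.int32])
    (scalar_type,) = get_args(dtype_alias)  # (np.int32,)
    return tuple(shape), np.dtype(scalar_type)


in_ty = np.ndarray[(1024,), np.dtype[np.int32]]
out_ty = np.ndarray[(1,), np.dtype[np.int32]]

in_shape, in_dtype = unpack_ndarray_type(in_ty)
out_shape, out_dtype = unpack_ndarray_type(out_ty)

# Element count of the input, in the form the kernel's third argument takes.
input_size = np.int32(math.prod(in_shape))
```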