Functions
None	_check_num_channels (int num_channels)

	_transform_gen (func, list inputs, output, *params, tile_size=16, trace_size=0)

	_transform_parallel_gen (func, list inputs, output, *params, tile_size=16, trace_size=0, num_channels=1, pass_size_to_kernel=True)

	make_param_descriptor (tensor_ty)

	_expand_param (param)

	_make_fake_tensor (tensor_ty, tile_size, fn_name)

	transform (func, tensor_ty, *params, tile_size=16, trace_size=0)

	transform_binary (func, tensor_ty, tile_size=16, trace_size=0)

	transform_parallel (func, tensor_ty, *params, tile_size=16, trace_size=0, num_channels=1, pass_size_to_kernel=True)

	transform_parallel_binary (func, tensor_ty, tile_size=16, trace_size=0, num_channels=1, pass_size_to_kernel=True)

Detailed Description

Tiled transform algorithms (unary/binary, single-core/parallel) built on IRON.

Function Documentation

◆ _check_num_channels()

None iron.algorithms.transform._check_num_channels ( int num_channels )

protected

◆ _expand_param()

iron.algorithms.transform._expand_param ( param )

protected

Allow callers to pass either a real tensor or a numpy ndarray type.

◆ _make_fake_tensor()

iron.algorithms.transform._make_fake_tensor ( tensor_ty, tile_size, fn_name )

protected

Parse a numpy ndarray type descriptor and return a fake tensor object.

Extracts ``num_elements`` and ``dtype`` from *tensor_ty*, validates that
*tile_size* divides evenly into *num_elements*, and returns a lightweight
object exposing ``.shape``, ``.size``, and ``.dtype`` attributes — enough
for :func:`_transform_gen` and :func:`_transform_parallel_gen` to operate
without real NPU memory.

Args:
    tensor_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``).
    tile_size (int): Number of elements per tile.
    fn_name (str): Caller name used in error messages.

Returns:
    An object with ``.shape``, ``.size``, and ``.dtype``.
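To illustrate the parsing described above, the sketch below recovers the shape and dtype from an ``np.ndarray[...]`` type alias with ``typing.get_args`` and applies the same divisibility rule. The helper name ``make_fake_tensor_sketch`` is hypothetical, and using ``types.SimpleNamespace`` as the fake tensor is an assumption, not necessarily what IRON does. It assumes NumPy 1.22 or later, where ``np.ndarray`` and ``np.dtype`` are subscriptable.

```python
import math
import typing
from types import SimpleNamespace

import numpy as np


def make_fake_tensor_sketch(tensor_ty, tile_size, fn_name):
    """Hypothetical stand-in for _make_fake_tensor; not the library's code."""
    # np.ndarray[(1024,), np.dtype[np.int32]] is a GenericAlias whose
    # arguments are the shape tuple and the np.dtype[...] alias.
    shape, dtype_alias = typing.get_args(tensor_ty)
    (scalar_ty,) = typing.get_args(dtype_alias)
    num_elements = math.prod(shape)
    # Every tile must be full, so tile_size has to divide the element count.
    if tile_size <= 0 or num_elements % tile_size != 0:
        raise ValueError(
            f"{fn_name}: tile_size={tile_size} must evenly divide "
            f"num_elements={num_elements}"
        )
    # Only the attributes the transform generators read are provided.
    return SimpleNamespace(
        shape=tuple(shape), size=num_elements, dtype=np.dtype(scalar_ty)
    )


fake = make_fake_tensor_sketch(np.ndarray[(1024,), np.dtype[np.int32]], 16, "transform")
print(fake.shape, fake.size, fake.dtype)  # (1024,) 1024 int32
```

Passing ``fn_name`` through to the error message lets each public entry point report its own name when the tile size does not fit.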

◆ _transform_gen()

iron.algorithms.transform._transform_gen ( func, list inputs, output, *params, tile_size = 16, trace_size = 0 )

protected

General tiled transform that applies a function to one or more inputs and produces a single output.
Assumes all input and output tensors have the same shape.

Args:
    func: Function to apply, either a lambda/callable or ExternalFunction.
          For ExternalFunction, arg_types should be [*input_tiles, output_tile, *params]
    inputs: List of input tensors (will be tiled automatically)
    output: Output tensor (will be tiled automatically)
    *params: Additional parameters for ExternalFunction only.
             Scalar dtypes (np.int32, etc.) are passed as MLIR constants;
             array types are transferred via ObjectFifos.
    tile_size: Number of elements in each tile processed by a worker (default: 16)
    trace_size: When > 0, enable per-Worker core trace and a
        ``trace_size``-byte runtime trace buffer (default: 0).  The kernel
        (or lambda) is expected to emit event0()/event1() markers; the
        trace shim records cycles between them.
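The tiling semantics can be modelled on the host in plain NumPy. The sketch below is a CPU reference only: the helper ``tiled_transform_reference`` is hypothetical and generates no MLIR. It walks the tensors in ``tile_size`` chunks and applies ``func`` to the matching tile of every input, which is assumed to correspond roughly to one Worker acquire/compute/release cycle. Such a model is handy as a golden reference when checking NPU output.

```python
import numpy as np


def tiled_transform_reference(func, inputs, output, tile_size=16):
    """Hypothetical CPU model of the tiled-transform semantics."""
    n = output.size
    if any(x.shape != output.shape for x in inputs):
        raise ValueError("all inputs must have the output's shape")
    if n % tile_size != 0:
        raise ValueError(f"tile_size={tile_size} must divide {n}")
    # Assumes a contiguous output so reshape(-1) returns a writable view.
    flat_out = output.reshape(-1)
    flat_in = [x.reshape(-1) for x in inputs]
    # One loop iteration per tile: take one tile from each input,
    # apply func, and write one output tile.
    for start in range(0, n, tile_size):
        tile = slice(start, start + tile_size)
        flat_out[tile] = func(*(x[tile] for x in flat_in))
    return output


a = np.arange(64, dtype=np.int32)
b = np.full(64, 10, dtype=np.int32)
out = np.empty_like(a)
tiled_transform_reference(lambda x, y: x + y, [a, b], out, tile_size=16)
print(out[:4])  # [10 11 12 13]
```

Because ``inputs`` is a list, the same model covers the unary and binary cases.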

◆ _transform_parallel_gen()

iron.algorithms.transform._transform_parallel_gen ( func, list inputs, output, *params, tile_size = 16, trace_size = 0, num_channels = 1, pass_size_to_kernel = True )

protected

General parallel transform that applies a function to one or more inputs and produces a single output.
Distributes the work across multiple AIE tiles for parallel execution.

With ``num_channels=2`` (and no extra ``*params``), the design also drives
both shim DMA channels per column — one worker per (column, channel) pair
— which is the right shape for DDR-bandwidth-bound element-wise kernels
like ReLU/GELU/SiLU/eltwise_add.  The single-channel default (``num_channels=1``)
reproduces the original one-worker-per-column behaviour bit-for-bit.
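One way to picture this worker layout is a contiguous split of the element range across (column, channel) pairs. The sketch below assumes equal, contiguous sub-ranges and a caller-chosen column count; ``partition_work`` is a hypothetical helper, and the actual design may order or size the sub-ranges differently.

```python
def partition_work(num_elements, num_columns, num_channels=1, tile_size=16):
    """Hypothetical contiguous split of elements across (column, channel) workers."""
    if num_channels not in (1, 2):
        raise ValueError("num_channels must be 1 or 2")
    num_workers = num_columns * num_channels
    # Assumption: every worker must receive a whole number of tiles.
    if num_elements % (num_workers * tile_size) != 0:
        raise ValueError("each worker must receive a whole number of tiles")
    per_worker = num_elements // num_workers
    ranges = {}
    for col in range(num_columns):
        for chan in range(num_channels):
            worker = col * num_channels + chan
            start = worker * per_worker
            # Disjoint sub-range handled by this (column, channel) worker.
            ranges[(col, chan)] = range(start, start + per_worker)
    return ranges


layout = partition_work(1024, num_columns=4, num_channels=2)
print(layout[(0, 1)])  # range(128, 256)
```

With ``num_channels=1`` the same call yields one 256-element range per column, matching the one-worker-per-column layout.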

Args:
    func: Function to apply, either a lambda/callable or ExternalFunction.
          For ExternalFunction, arg_types should be [*input_tiles, output_tile, *params]
    inputs: List of input tensors (will be tiled automatically)
    output: Output tensor (will be tiled automatically)
    *params: Additional parameters for ExternalFunction only.
             Scalar dtypes (np.int32, etc.) are passed as MLIR constants;
             array types are transferred via ObjectFifos.
    tile_size: Number of elements in each tile processed by a worker (default: 16)
    trace_size: When > 0, enable per-column Worker core trace and a
        ``trace_size``-byte runtime trace buffer (default: 0).  Same
        event0()/event1() expectation as :func:`_transform_gen`.
    num_channels: Shim DMA channels per column to drive, 1 or 2 (default: 1).
        With 2, two workers per column run in parallel on disjoint
        sub-ranges, which can up to double DDR throughput.  Not compatible
        with shared tensor ``*params``, because each (column, channel)
        worker would need its own param ObjectFifo; use ``num_channels=1``
        if you need ``*params``.
    pass_size_to_kernel: When True (default), the kernel receives an extra
        trailing ``int`` argument equal to ``tile_size``.  Set False for
        kernels whose signature is just ``(*in_tiles, out_tile)`` (e.g.
        ``iron.kernels.relu``, ``iron.kernels.add``).
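The argument convention above (``[*input_tiles, output_tile, *params]`` plus an optional trailing size) can be written out as a list of NumPy type aliases. ``expected_arg_types`` is a hypothetical helper, not part of IRON; it assumes each tile is a 1-D buffer of ``tile_size`` elements and that the trailing size argument is ``np.int32``.

```python
import typing

import numpy as np


def expected_arg_types(tensor_ty, num_inputs, tile_size=16,
                       param_types=(), pass_size_to_kernel=True):
    """Hypothetical helper listing the arg_types a kernel would declare."""
    _, dtype_alias = typing.get_args(tensor_ty)
    # Assumption: each tile is a 1-D buffer of tile_size elements.
    tile_ty = np.ndarray[(tile_size,), dtype_alias]
    arg_types = [tile_ty] * num_inputs + [tile_ty, *param_types]
    if pass_size_to_kernel:
        # Assumption: the trailing size argument is a 32-bit int.
        arg_types.append(np.int32)
    return arg_types


vec_ty = np.ndarray[(1024,), np.dtype[np.float32]]
# Binary kernel with signature (in0, in1, out, size):
print(len(expected_arg_types(vec_ty, num_inputs=2)))  # 4
# Bare (in, out) kernel, as with pass_size_to_kernel=False:
print(len(expected_arg_types(vec_ty, num_inputs=1, pass_size_to_kernel=False)))  # 2
```

A mismatch between this list and the compiled kernel's real signature is a likely cause of link-time or runtime errors, which is why ``pass_size_to_kernel`` must match the kernel.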

◆ make_param_descriptor()

iron.algorithms.transform.make_param_descriptor ( tensor_ty )

Build a fake-tensor descriptor (``.shape``, ``.size``, ``.dtype``) for
use as an extra param to :func:`transform` and friends.

Mirrors :func:`_make_fake_tensor` but skips the tile-divisibility check
because params (e.g. a 1-element ``factor`` tensor) are passed through
a dedicated ObjectFifo and aren't tiled.
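A minimal sketch of such a descriptor, and of how ``_expand_param`` might route each kind of param, assuming ndarray type aliases are detected with ``typing.get_origin``. Both helper names are hypothetical. Only the routing idea (type descriptors are expanded, real tensors pass through) follows the text; passing numpy scalar types through unchanged is an assumption.

```python
import math
import typing
from types import GenericAlias, SimpleNamespace

import numpy as np


def make_param_descriptor_sketch(tensor_ty):
    """Hypothetical mirror of make_param_descriptor: no tile-divisibility check."""
    shape, dtype_alias = typing.get_args(tensor_ty)
    (scalar_ty,) = typing.get_args(dtype_alias)
    return SimpleNamespace(shape=tuple(shape), size=math.prod(shape),
                           dtype=np.dtype(scalar_ty))


def expand_param_sketch(param):
    """Hypothetical mirror of _expand_param: expand ndarray type aliases only."""
    if isinstance(param, GenericAlias) and typing.get_origin(param) is np.ndarray:
        return make_param_descriptor_sketch(param)
    # Assumption: real tensors and numpy scalar types pass through unchanged.
    return param


# A 1-element "factor" param is accepted even though 1 is not a multiple of 16.
factor = expand_param_sketch(np.ndarray[(1,), np.dtype[np.int32]])
print(factor.shape, factor.size, factor.dtype)  # (1,) 1 int32
```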

◆ transform()

iron.algorithms.transform.transform ( func, tensor_ty, *params, tile_size = 16, trace_size = 0 )

Apply ``func`` element-wise over a tensor described by *tensor_ty*.

Like :func:`_transform_gen` but accepts a numpy ``ndarray`` type descriptor
instead of real input and output tensors.  Intended for use inside
``@iron.jit`` generator bodies where the tensor's shape and dtype are
expressed as ``CompileTime[T]`` parameters and the actual tensors are not
yet available::

    @iron.jit
    def my_design(inp: In, out: Out,
                  N: CompileTime[int], dtype: CompileTime[type] = np.int32):
        tensor_ty = np.ndarray[(N,), np.dtype[dtype]]
        return iron.algorithms.transform(lambda x: x + 1, tensor_ty)

Args:
    func: Function or :class:`~aie.iron.kernel.ExternalFunction` to apply.
    tensor_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``). Shape and dtype are inferred from this.
    *params: Additional parameters forwarded to ``func`` (ExternalFunction
        only).  Each ``param`` may be a real tensor, a numpy ``ndarray``
        type descriptor (transparently expanded via
        :func:`make_param_descriptor`), or a numpy scalar type.
    tile_size (int, optional): Number of elements per tile. Defaults to 16.
    trace_size (int, optional): When > 0, enable Worker core trace and a
        ``trace_size``-byte runtime trace buffer. Defaults to 0 (off).

Returns:
    mlir.ir.Module: The compiled MLIR module.

◆ transform_binary()

iron.algorithms.transform.transform_binary ( func, tensor_ty, tile_size = 16, trace_size = 0 )

Apply ``func`` element-wise over two tensors described by *tensor_ty*.

Like :func:`_transform_gen` with two input tensors, but accepts a numpy
``ndarray`` type descriptor instead of real tensors.  Intended for use
inside ``@iron.jit`` generator bodies.

Args:
    func: Function or :class:`~aie.iron.kernel.ExternalFunction` to apply.
    tensor_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``). Shape and dtype are inferred from this.
    tile_size (int, optional): Number of elements per tile. Defaults to 16.
    trace_size (int, optional): When > 0, enable Worker core trace and a
        ``trace_size``-byte runtime trace buffer. Defaults to 0 (off).

Returns:
    mlir.ir.Module: The compiled MLIR module.

◆ transform_parallel()

iron.algorithms.transform.transform_parallel ( func, tensor_ty, *params, tile_size = 16, trace_size = 0, num_channels = 1, pass_size_to_kernel = True )

Apply ``func`` element-wise in parallel using a tensor type descriptor.

Like :func:`_transform_parallel_gen` but accepts a numpy ``ndarray`` type
descriptor instead of real tensors.  Intended for use inside
``@iron.jit`` generator bodies.

Args:
    func: Function or :class:`~aie.iron.kernel.ExternalFunction` to apply.
    tensor_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``). Shape and dtype are inferred from this.
    *params: Additional parameters forwarded to ``func`` (ExternalFunction
        only), accepted in the same forms as for :func:`transform`.
    tile_size (int, optional): Number of elements per tile per worker.
        Defaults to 16.
    trace_size (int, optional): When > 0, enable per-column Worker core
        trace and a ``trace_size``-byte runtime trace buffer.
        Defaults to 0 (off).
    num_channels (int, optional): Shim DMA channels per column to drive,
        1 or 2.  ``num_channels=2`` runs one worker per (column, channel),
        which can up to double DDR throughput for bandwidth-bound
        element-wise kernels.  Not compatible with shared tensor
        ``*params``.  Defaults to 1.
    pass_size_to_kernel (bool, optional): Append ``tile_size`` as a
        trailing ``int`` argument on every kernel call.  Defaults to True;
        set False for kernels with bare ``(in, out)`` signatures.

Returns:
    mlir.ir.Module: The compiled MLIR module.

◆ transform_parallel_binary()

iron.algorithms.transform.transform_parallel_binary ( func, tensor_ty, tile_size = 16, trace_size = 0, num_channels = 1, pass_size_to_kernel = True )

Apply ``func`` over two tensors in parallel using a tensor type descriptor.

Like :func:`_transform_parallel_gen` with two input tensors, but accepts a
numpy ``ndarray`` type descriptor instead of real tensors.  Intended for
use inside ``@iron.jit`` generator bodies.

Args:
    func: Function or :class:`~aie.iron.kernel.ExternalFunction` to apply.
    tensor_ty: A numpy ``ndarray`` type (e.g. ``np.ndarray[(1024,),
        np.dtype[np.int32]]``). Shape and dtype are inferred from this.
    tile_size (int, optional): Number of elements per tile per worker.
        Defaults to 16.
    trace_size (int, optional): When > 0, enable per-column Worker core
        trace and a ``trace_size``-byte runtime trace buffer.
        Defaults to 0 (off).
    num_channels (int, optional): Shim DMA channels per column to drive,
        1 or 2.  Defaults to 1.  See :func:`transform_parallel`.
    pass_size_to_kernel (bool, optional): Append ``tile_size`` as a
        trailing ``int`` argument on every kernel call.  Defaults to True.

Returns:
    mlir.ir.Module: The compiled MLIR module.
