IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
iron.kernels.activation Namespace Reference

Functions

ExternalFunction _create_lut_kernel (str func_name, str kernel_filename, list arg_types, list[str]|None compile_flags=None)
 
ExternalFunction _bf16_lut_factory (str factory_name, str func_name, str kernel_filename, int tile_size, int arg_arity)
 
ExternalFunction softmax (int tile_size=1024)
 
ExternalFunction gelu (int tile_size=1024)
 
ExternalFunction silu (int tile_size=1024)
 
ExternalFunction swiglu (int tile_size=1024)
 
ExternalFunction bf16_exp (int tile_size=1024)
 
 relu_ref (x)
 
 silu_ref (x)
 
 gelu_ref (x)
 
 bf16_exp_ref (x)
 
 softmax_ref (x, *, int tile_size=1024)
 

Variables

int _LUT_FIXED_TILE = 1024
 

Detailed Description

Activation kernel factories + numpy reference implementations.

Factories (each returns an :class:`ExternalFunction`):
  softmax, gelu, silu, swiglu, bf16_exp.

Companion numpy reference implementations for host-side verification:
  :func:`relu_ref`, :func:`silu_ref`, :func:`gelu_ref`,
  :func:`bf16_exp_ref`, :func:`softmax_ref`.  These compute the AIE
  kernel's op in float32 so designs don't each reimplement the math
  in their verify path.  Pair with
  :func:`aie.utils.verify.count_mismatches` (rtol=0.128 is the
  canonical LUT-tolerance default; see each ref's docstring for
  per-op recommendations).
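
The pairing described above can be sketched in plain numpy. The helper ``count_rel_mismatches`` below is a hypothetical stand-in written for illustration only; it is not the signature of :func:`aie.utils.verify.count_mismatches`, which this page does not document. Only the ``rtol=0.128`` value is taken from the text, and the sample values are made up.

```python
import numpy as np


def count_rel_mismatches(actual, expected, rtol=0.128, atol=0.0):
    """Hypothetical comparator (NOT the aie.utils.verify API).

    Counts elements where |actual - expected| > atol + rtol * |expected|.
    """
    actual = np.asarray(actual, dtype=np.float32)
    expected = np.asarray(expected, dtype=np.float32)
    bad = np.abs(actual - expected) > (atol + rtol * np.abs(expected))
    return int(np.count_nonzero(bad))


# Made-up values standing in for a float32 reference and a device result.
expected = np.array([1.0, 2.0, 4.0], dtype=np.float32)
device_out = np.array([1.1, 2.0, 5.0], dtype=np.float32)

# Only the last element (25% off) exceeds the 12.8% relative tolerance.
mismatches = count_rel_mismatches(device_out, expected)
```

A design's verify path would presumably compare the device output against one of the ``*_ref`` functions in the same way, using the per-op tolerances given in each docstring.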

Function Documentation

◆ _bf16_lut_factory()

ExternalFunction iron.kernels.activation._bf16_lut_factory ( str  factory_name,
str  func_name,
str  kernel_filename,
int  tile_size,
int  arg_arity 
)
protected
Build a LUT-backed bf16 kernel whose arg list is N copies of the same tile type.

◆ _create_lut_kernel()

ExternalFunction iron.kernels.activation._create_lut_kernel ( str  func_name,
str  kernel_filename,
list  arg_types,
list[str] | None   compile_flags = None 
)
protected
Create an ExternalFunction for a LUT-dependent kernel.

Handles the aie2/aie2p split:
- aie2: combines kernel source with lut_based_ops.cpp in a single TU.
- aie2p: uses source_file directly (no LUT dependency).

◆ bf16_exp()

ExternalFunction iron.kernels.activation.bf16_exp ( int   tile_size = 1024)
Element-wise exponential kernel for bf16 tiles (tile_size must be 1024).

◆ bf16_exp_ref()

iron.kernels.activation.bf16_exp_ref (   x)
numpy reference for :func:`bf16_exp` — element-wise ``exp(x)``.

LUT approximation territory; the AIE kernel saturates on large inputs.
Pair with the canonical 12.8% relative tolerance and
``stop_at_nonfinite=True`` (the default in
:func:`aie.utils.verify.count_mismatches`) when verifying.
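
As a rough illustration of why non-finite values matter here, the sketch below computes a float32 ``exp`` and shows that a large input overflows to ``inf`` in the reference, while the kernel is described as saturating. The name ``exp_ref_sketch`` and the finite-mask step are assumptions for illustration; how ``stop_at_nonfinite`` actually treats such elements is not specified on this page.

```python
import numpy as np


def exp_ref_sketch(x):
    """Illustrative float32 exp; the real bf16_exp_ref may differ in detail."""
    x = np.asarray(x, dtype=np.float32)
    # float32 exp overflows to inf for inputs above roughly 88.7;
    # silence the overflow warning so the inf is returned quietly.
    with np.errstate(over="ignore"):
        return np.exp(x)


x = np.array([0.0, 1.0, 100.0], dtype=np.float32)
ref = exp_ref_sketch(x)

# Elements where the reference is finite and a relative tolerance is meaningful.
finite_mask = np.isfinite(ref)
```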

◆ gelu()

ExternalFunction iron.kernels.activation.gelu ( int   tile_size = 1024)
GELU activation kernel (tanh approximation) for bf16 tiles (tile_size must be 1024).

◆ gelu_ref()

iron.kernels.activation.gelu_ref (   x)
numpy reference for :func:`gelu` — tanh approximation
``0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))``.

Matches the C++ kernel's tanh-GELU formula; pair with ``rtol=0.128,
atol=0.05`` when verifying.
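
The formula above can be written out in numpy as follows. This is a minimal sketch of the stated tanh approximation computed in float32; ``gelu_tanh_sketch`` is a hypothetical name, and the actual body of ``gelu_ref`` may be organized differently.

```python
import numpy as np


def gelu_tanh_sketch(x):
    """0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))) in float32."""
    x = np.asarray(x, dtype=np.float32)
    c = np.float32(np.sqrt(2.0 / np.pi))
    inner = c * (x + np.float32(0.044715) * x**3)
    return np.float32(0.5) * x * (np.float32(1.0) + np.tanh(inner))


y = gelu_tanh_sketch(np.array([-10.0, 0.0, 1.0, 10.0], dtype=np.float32))
```

For large positive inputs the result approaches ``x``, and for large negative inputs it approaches 0, which is a quick sanity check on any implementation of this formula.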

◆ relu_ref()

iron.kernels.activation.relu_ref (   x)
numpy reference for :func:`relu` — element-wise ``max(x, 0)``.

Exact; tolerance comparison is not needed.  See ``aie.utils.verify``
for the relaxed bf16/LUT-style comparators most kernels here want.

◆ silu()

ExternalFunction iron.kernels.activation.silu ( int   tile_size = 1024)
SiLU (Swish) activation kernel for bf16 tiles (tile_size must be 1024).

◆ silu_ref()

iron.kernels.activation.silu_ref (   x)
numpy reference for :func:`silu` (Swish) — ``x * sigmoid(x)``.

LUT-approximation territory; pair with ``rtol=0.128`` (the default
in :func:`aie.utils.verify.count_mismatches`) when verifying.
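
A minimal numpy sketch of ``x * sigmoid(x)`` in float32 is shown below. The name ``silu_sketch`` is hypothetical, and the real ``silu_ref`` may differ in how it evaluates the sigmoid.

```python
import numpy as np


def silu_sketch(x):
    """SiLU (Swish): x * sigmoid(x), with sigmoid(x) = 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=np.float32)
    return x / (np.float32(1.0) + np.exp(-x))


y = silu_sketch(np.array([-1.0, 0.0, 1.0], dtype=np.float32))
```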

◆ softmax()

ExternalFunction iron.kernels.activation.softmax ( int   tile_size = 1024)
Softmax activation kernel for bf16 tiles (tile_size must be 1024).

Args:
    tile_size: Number of elements per tile.

Returns:
    ExternalFunction configured for the softmax kernel.

◆ softmax_ref()

iron.kernels.activation.softmax_ref (   x,
*, int   tile_size = 1024 
)
numpy reference for :func:`softmax`.

The AIE kernel computes softmax independently per ``tile_size``-element
tile (no cross-tile reduction), so the reference splits ``x`` the same
way before applying the float32 softmax.  ``x.size`` must be a
multiple of ``tile_size``.
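
The per-tile behaviour described above can be sketched as follows. ``tiled_softmax_sketch`` is a hypothetical name; subtracting each tile's maximum before exponentiating is a standard numerical-stability step assumed here, and returning the input's shape is likewise an assumption rather than documented behaviour of ``softmax_ref``.

```python
import numpy as np


def tiled_softmax_sketch(x, *, tile_size=1024):
    """Softmax applied independently to each tile_size-element tile of x."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % tile_size != 0:
        raise ValueError("x.size must be a multiple of tile_size")
    # One row per tile; no reduction crosses a row boundary.
    tiles = x.reshape(-1, tile_size)
    shifted = tiles - tiles.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    out = e / e.sum(axis=1, keepdims=True)
    return out.reshape(x.shape)


x = np.linspace(-4.0, 4.0, 2048, dtype=np.float32)
y = tiled_softmax_sketch(x)

# Each 1024-element tile sums to 1 on its own.
tile_sums = y.reshape(-1, 1024).sum(axis=1)
```

Because each tile is normalized separately, the whole output sums to the number of tiles rather than to 1, which is the main difference from a softmax taken over the full array.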

◆ swiglu()

ExternalFunction iron.kernels.activation.swiglu ( int   tile_size = 1024)
SwiGLU gated activation kernel for bf16 tiles (tile_size must be 1024).

Variable Documentation

◆ _LUT_FIXED_TILE

int iron.kernels.activation._LUT_FIXED_TILE = 1024
protected