IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
Functions | |
| ExternalFunction | _create_lut_kernel (str func_name, str kernel_filename, list arg_types, list[str]|None compile_flags=None) |
| ExternalFunction | _bf16_lut_factory (str factory_name, str func_name, str kernel_filename, int tile_size, int arg_arity) |
| ExternalFunction | softmax (int tile_size=1024) |
| ExternalFunction | gelu (int tile_size=1024) |
| ExternalFunction | silu (int tile_size=1024) |
| ExternalFunction | swiglu (int tile_size=1024) |
| ExternalFunction | bf16_exp (int tile_size=1024) |
| | relu_ref (x) |
| | silu_ref (x) |
| | gelu_ref (x) |
| | bf16_exp_ref (x) |
| | softmax_ref (x, *, int tile_size=1024) |
Variables | |
| int | _LUT_FIXED_TILE = 1024 |
Activation kernel factories and NumPy reference implementations.
Factories (each returns an :class:`ExternalFunction`): softmax, gelu, silu, swiglu, bf16_exp.
Companion NumPy reference implementations for host-side verification: :func:`relu_ref`, :func:`silu_ref`, :func:`gelu_ref`, :func:`bf16_exp_ref`, :func:`softmax_ref`. These compute the AIE kernel's op in float32 so that designs don't each reimplement the math in their verify path.
Pair them with :func:`aie.utils.verify.count_mismatches`. ``rtol=0.128`` is the canonical LUT-tolerance default; see each ref's docstring for per-op recommendations.
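To make the relative-tolerance idea concrete, here is a minimal NumPy-only sketch. `count_mismatches_sketch` is a hypothetical stand-in written for illustration; it is not the `aie.utils.verify.count_mismatches` API, whose exact signature and behavior are not shown on this page. It assumes the usual `|actual - expected| <= atol + rtol * |expected|` rule that `numpy.isclose` implements.

```python
import numpy as np

def count_mismatches_sketch(actual, expected, rtol=0.128, atol=0.0):
    """Hypothetical helper: count elements outside a relative tolerance.

    Not the aie.utils.verify API; it only illustrates what rtol=0.128 means.
    """
    actual = np.asarray(actual, dtype=np.float32)
    expected = np.asarray(expected, dtype=np.float32)
    # np.isclose checks |actual - expected| <= atol + rtol * |expected|
    ok = np.isclose(actual, expected, rtol=rtol, atol=atol)
    return int(np.count_nonzero(~ok))

expected = np.array([1.0, 2.0, 4.0], dtype=np.float32)
actual = np.array([1.1, 2.0, 5.0], dtype=np.float32)  # 10% off, exact, 25% off
n_bad = count_mismatches_sketch(actual, expected)
```

With a 12.8% relative tolerance, the 10% deviation passes and only the 25% deviation counts as a mismatch.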
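| ExternalFunction iron.kernels.activation._bf16_lut_factory | ( | str factory_name, str func_name, str kernel_filename, int tile_size, int arg_arity | ) | protected |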
Build a LUT-backed bf16 kernel whose arg list is N copies of the same tile type.
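| ExternalFunction iron.kernels.activation._create_lut_kernel | ( | str func_name, str kernel_filename, list arg_types, list[str]|None compile_flags = None | ) | protected |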
Create an ExternalFunction for a LUT-dependent kernel. Handles the aie2/aie2p split:
- aie2: combines the kernel source with lut_based_ops.cpp in a single translation unit (TU).
- aie2p: uses source_file directly (no LUT dependency).
| ExternalFunction iron.kernels.activation.bf16_exp | ( | int | tile_size = 1024 | ) |
Element-wise exponential kernel for bf16 tiles (tile_size must be 1024).
| iron.kernels.activation.bf16_exp_ref | ( | x | ) |
numpy reference for :func:`bf16_exp`: element-wise ``exp(x)``. LUT-approximation territory; the AIE kernel saturates on large inputs. Pair with the canonical 12.8% relative tolerance (``rtol=0.128``) and ``stop_at_nonfinite=True`` (the default in :func:`aie.utils.verify.count_mismatches`) when verifying.
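The sketch below shows why non-finite values matter for this op: a float32 ``exp`` overflows to ``inf`` once ``x`` exceeds roughly 88.7, whereas the AIE kernel is described as saturating. `exp_ref_sketch` is a hypothetical name used for illustration, and the finite-value mask is only one possible way to handle the overflow region; how `count_mismatches` actually treats it is not documented here.

```python
import numpy as np

def exp_ref_sketch(x):
    """Hypothetical float32 reference: element-wise exp(x)."""
    x = np.asarray(x, dtype=np.float32)
    # Silence the overflow warning; large inputs intentionally become inf.
    with np.errstate(over="ignore"):
        return np.exp(x)

x = np.array([0.0, 1.0, 100.0], dtype=np.float32)
ref = exp_ref_sketch(x)

# exp(100) does not fit in float32, so the reference is inf there.
finite = np.isfinite(ref)
```

Elements where the reference is non-finite cannot be compared meaningfully against a saturating kernel, which is why the verification step needs a policy for them.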
| ExternalFunction iron.kernels.activation.gelu | ( | int | tile_size = 1024 | ) |
GELU activation kernel (tanh approximation) for bf16 tiles (tile_size must be 1024).
| iron.kernels.activation.gelu_ref | ( | x | ) |
numpy reference for :func:`gelu` — tanh approximation ``0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))``. Matches the C++ kernel's tanh-GELU formula; pair with ``rtol=0.128, atol=0.05`` when verifying.
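A minimal float32 sketch of the tanh-GELU formula quoted above. The name `gelu_tanh_sketch` is hypothetical and the body is an illustration of the formula, not a copy of the library's `gelu_ref`.

```python
import numpy as np

def gelu_tanh_sketch(x):
    """Hypothetical float32 tanh-approximation GELU."""
    x = np.asarray(x, dtype=np.float32)
    c = np.float32(np.sqrt(2.0 / np.pi))
    k = np.float32(0.044715)
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(c * (x + k * x**3)))

x = np.array([-10.0, 0.0, 1.0, 10.0], dtype=np.float32)
y = gelu_tanh_sketch(x)
```

For large positive inputs the result approaches ``x``, and for large negative inputs it approaches 0, which is a quick sanity check on any implementation of this formula.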
| iron.kernels.activation.relu_ref | ( | x | ) |
numpy reference for :func:`relu` — element-wise ``max(x, 0)``. Exact; tolerance comparison is not needed. See ``aie.utils.verify`` for the relaxed bf16/LUT-style comparators most kernels here want.
| ExternalFunction iron.kernels.activation.silu | ( | int | tile_size = 1024 | ) |
SiLU (Swish) activation kernel for bf16 tiles (tile_size must be 1024).
| iron.kernels.activation.silu_ref | ( | x | ) |
numpy reference for :func:`silu` (Swish) — ``x * sigmoid(x)``. LUT-approximation territory; pair with ``rtol=0.128`` (the default in :func:`aie.utils.verify.count_mismatches`) when verifying.
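As an illustration of ``x * sigmoid(x)`` in float32, here is a short sketch. `silu_sketch` is a hypothetical name, and the plain ``1 / (1 + exp(-x))`` sigmoid is an assumption about how one might write it, not the library's implementation.

```python
import numpy as np

def silu_sketch(x):
    """Hypothetical float32 SiLU: x * sigmoid(x)."""
    x = np.asarray(x, dtype=np.float32)
    # exp(-x) may overflow to inf for very negative x; the sigmoid is then 0.
    with np.errstate(over="ignore"):
        sigmoid = 1.0 / (1.0 + np.exp(-x))
    return x * sigmoid

x = np.array([-100.0, 0.0, 1.0], dtype=np.float32)
y = silu_sketch(x)
```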
| ExternalFunction iron.kernels.activation.softmax | ( | int | tile_size = 1024 | ) |
Softmax activation kernel for bf16 tiles (tile_size must be 1024).
Args:
tile_size: Number of elements per tile.
Returns:
ExternalFunction configured for the softmax kernel.
| iron.kernels.activation.softmax_ref | ( | x, *, int tile_size = 1024 | ) |
numpy reference for :func:`softmax`. The AIE kernel computes softmax independently per ``tile_size``-element tile (no cross-tile reduction), so the reference splits ``x`` the same way before applying the float32 softmax. ``x.size`` must be a multiple of ``tile_size``.
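The per-tile behavior can be sketched as follows. `softmax_tiled_sketch` is a hypothetical name; subtracting the per-tile maximum before exponentiating is a standard numerical-stability step assumed here, and it is not confirmed to match what `softmax_ref` does internally.

```python
import numpy as np

def softmax_tiled_sketch(x, *, tile_size=1024):
    """Hypothetical float32 softmax applied independently to each tile."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % tile_size != 0:
        raise ValueError("x.size must be a multiple of tile_size")
    # One row per tile; no reduction crosses a tile boundary.
    tiles = x.reshape(-1, tile_size)
    # Assumed stability trick: shift each tile by its own maximum.
    shifted = tiles - tiles.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    out = e / e.sum(axis=1, keepdims=True)
    return out.reshape(x.shape)

# Small tile size purely for illustration; the kernel itself requires 1024.
x = np.arange(8, dtype=np.float32)
y = softmax_tiled_sketch(x, tile_size=4)
```

Because each tile is normalized on its own, every tile sums to 1, so the full output sums to the number of tiles rather than to 1.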
| ExternalFunction iron.kernels.activation.swiglu | ( | int | tile_size = 1024 | ) |
SwiGLU gated activation kernel for bf16 tiles (tile_size must be 1024).