IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
Functions | |
| list | _i32s (int n) |
| ExternalFunction | conv2dk1 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8) |
| ExternalFunction | conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None weight_output_channels=None) |
| ExternalFunction | conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8) |
| ExternalFunction | conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64) |
| ExternalFunction | conv2dk14 (int input_width=224, int input_channels=16, int output_channels=16, int kernel_width=14) |
| ExternalFunction | conv2dk1_skip_init (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None skip_input_channels=None) |
| ExternalFunction | bn_conv2dk1_relu (int input_width=32, int input_channels=64, int output_channels=64) |
| ExternalFunction | bn_conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64) |
| ExternalFunction | bn_conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64) |
| ExternalFunction | bn_conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, skip_dtype=np.uint8) |
| ExternalFunction | bn_conv2dk3_dw (int input_width=32, int input_channels=64, int output_channels=64, int stride=1) |
| ExternalFunction | bn_conv2dk1_relu_xy_pool_padded (int input_width=7, int input_channels=80, int output_channels=1280, int|None weight_chunk_count=None) |
| None | _validate_bn_block_index (int block_index, str factory_name) |
| ExternalFunction | bn_conv2dk1_partial_put_i8 (int input_width=7, int input_channels=80, int weight_count=4800, *, int block_index=13) |
| ExternalFunction | bn_conv2dk1_partial_get_relu_i8 (int input_width=7, int input_channels=80, int output_channels=480, int weight_count=4800, *, int block_index=13) |
| ExternalFunction | bn_conv2dk3_dw_out_split (int input_width=7, int input_channels=480, int output_split_channels=240, *, int block_index=13) |
| ExternalFunction | bn_conv2dk1_input_split_partial_put_ui8 (int input_width=7, int input_channels=240, int weight_count=9600, *, int block_index=13) |
| ExternalFunction | bn_conv2dk1_input_split_partial_skip_get (int input_width=7, int input_channels=240, int output_channels=80, int weight_count=9600, *, int block_index=13) |
| ExternalFunction | bn_fc_relu_ui16_pad (int input_channels=1280, int output_channels=16, int|None weight_chunk_count=None) |
Convolution kernel factories: conv2dk1/3/14, bottleneck (bn_*) variants.
list iron.kernels.conv._i32s (int n) [protected]
Return a list of *n* ``np.int32`` types — for trailing scalar conv args.
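The one-line description above suggests a helper that repeats a type object. A minimal sketch under that assumption follows; the name `i32s_sketch` is hypothetical and this is not the module's actual implementation.

```python
import numpy as np


def i32s_sketch(n: int) -> list:
    """Hypothetical stand-in for _i32s: the np.int32 type object repeated n times."""
    return [np.int32] * n


# Example: type list for three trailing scalar arguments of a conv kernel.
trailing_arg_types = i32s_sketch(3)
```

A list like this would typically be appended to the buffer types of a kernel's argument list, one entry per scalar runtime argument.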
None iron.kernels.conv._validate_bn_block_index (int block_index, str factory_name) [protected]
ExternalFunction iron.kernels.conv.bn_conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64)
Bottleneck 1x1 conv kernel (uint8 in, int8 out).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
Returns:
ExternalFunction configured for the bn_conv2dk1_i8 kernel.
ExternalFunction iron.kernels.conv.bn_conv2dk1_input_split_partial_put_ui8 (int input_width=7, int input_channels=240, int weight_count=9600, *, int block_index=13)
Input-split cascade-PUT half of a 1x1 conv on uint8 activations.
Like :func:`bn_conv2dk1_partial_put_i8` but consumes a CHANNEL slice
(input-split) of a uint8 activation instead of a width slice of int8.
Used by MobileNet V3's bn13 / bn14 L3 stage.
Args:
input_width: Spatial width of the input slice.
input_channels: Number of input channels (one half of the
full input after split).
weight_count: Per-call weight chunk size in elements.
block_index: ``13`` or ``14``; selects the per-block C++ wrapper.
Returns:
ExternalFunction configured for the input-split PUT tile.
Raises:
ValueError: When ``block_index`` is not 13 or 14.
ExternalFunction iron.kernels.conv.bn_conv2dk1_input_split_partial_skip_get (int input_width=7, int input_channels=240, int output_channels=80, int weight_count=9600, *, int block_index=13)
Input-split cascade-GET half of a 1x1 conv + skip-add (uint8 in, int8 out).
The GET tile completes the cascade-split 1x1 + ReLU + residual add
pattern: consumes the partial sum from its sister PUT tile, finishes
the dot product, adds a skip row of int8 activations, and writes int8
output. Sister of :func:`bn_conv2dk1_input_split_partial_put_ui8`.
Args:
input_width: Spatial width of the input slice.
input_channels: Number of input channels (one half after split).
output_channels: Final output channels.
weight_count: Per-call weight chunk size in elements.
block_index: ``13`` or ``14``; selects the per-block C++ wrapper.
Returns:
ExternalFunction configured for the input-split skip-GET tile.
Raises:
ValueError: When ``block_index`` is not 13 or 14.
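The PUT/GET pairing described above can be illustrated with plain integer arithmetic. This is a sketch only: it assumes the split is over the input-channel (reduction) dimension, works on a single pixel, and leaves out requantization scaling, saturation, and any activation function. All function names are hypothetical.

```python
def put_partial_sums(act_half, weight_half):
    """PUT tile: partial dot products over its half of the input channels.

    act_half holds one pixel's channel values; weight_half[oc][ic] holds
    the weights for the same channel slice.
    """
    return [sum(a * w for a, w in zip(act_half, row)) for row in weight_half]


def get_finish_with_skip(partial, act_half, weight_half, skip):
    """GET tile: add its own half to the cascade partial sum, then the skip row."""
    finished = [
        p + sum(a * w for a, w in zip(act_half, row))
        for p, row in zip(partial, weight_half)
    ]
    return [f + s for f, s in zip(finished, skip)]


# One pixel, 4 input channels split 2 + 2, 2 output channels.
activation = [1, 2, 3, 4]
weights = [[1, 0, -1, 2], [2, 1, 0, -1]]  # weights[oc][ic]
skip = [10, -10]

cascade = put_partial_sums(activation[:2], [row[:2] for row in weights])
output = get_finish_with_skip(
    cascade, activation[2:], [row[2:] for row in weights], skip
)

# Reference: the same 1x1 conv computed without splitting, plus the skip add.
reference = [
    sum(a * w for a, w in zip(activation, row)) + s
    for row, s in zip(weights, skip)
]
```

The split result matches the unsplit reference because the dot product is a sum over input channels, so each tile can own half of the terms.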
ExternalFunction iron.kernels.conv.bn_conv2dk1_partial_get_relu_i8 (int input_width=7, int input_channels=80, int output_channels=480, int weight_count=4800, *, int block_index=13)
Cascade-GET half of a width-split 1x1 conv + ReLU on int8 activations.
The GET tile of a two-tile cascade-split pointwise conv: consumes
the cascade partial sum from its sister PUT tile, finishes the dot
product against its weight half, applies ReLU, and writes the full
output buffer. Sister of :func:`bn_conv2dk1_partial_put_i8`.
Currently defined in the .cc only for MobileNet V3's bn13 / bn14
(one wrapper symbol per block); ``block_index`` selects which.
Args:
input_width: Spatial width of the input slice.
input_channels: Number of input channels.
output_channels: Number of output channels (full L1 output width).
weight_count: Per-call weight chunk size in elements.
block_index: ``13`` or ``14``; selects the per-block C++ wrapper.
Returns:
ExternalFunction configured for the GET tile.
Raises:
ValueError: When ``block_index`` is not 13 or 14.
ExternalFunction iron.kernels.conv.bn_conv2dk1_partial_put_i8 (int input_width=7, int input_channels=80, int weight_count=4800, *, int block_index=13)
Cascade-PUT half of a width-split 1x1 conv on int8 activations.
The PUT tile of a two-tile cascade-split pointwise conv: consumes a
width slice of the activation, multiplies against its weight half,
and emits the partial sum onto the cascade stream (no separate
output buffer — cascade-only). Sister of
:func:`bn_conv2dk1_partial_get_relu_i8`.
Currently defined in the .cc only for MobileNet V3's bn13 / bn14
(one wrapper symbol per block); ``block_index`` selects which.
Generalising this to arbitrary block names would require adding a
non-prefixed wrapper to ``bn_conv2dk1_i8.cc``.
Args:
input_width: Spatial width of the input slice.
input_channels: Number of input channels.
weight_count: Per-call weight chunk size in elements (the design
streams weights in chunks; the full weight tensor is shared
across multiple kernel invocations).
block_index: ``13`` or ``14``; selects the per-block C++ wrapper.
Returns:
ExternalFunction configured for the PUT tile.
Raises:
ValueError: When ``block_index`` is not 13 or 14.
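The ``block_index`` check shared by the per-block factories can be sketched as follows. The helper names and the ``bn13_`` / ``bn14_`` symbol prefix format are assumptions for illustration; the text above only states that each block has its own wrapper symbol and that values other than 13 or 14 raise ``ValueError``.

```python
def validate_bn_block_index_sketch(block_index: int, factory_name: str) -> None:
    """Raise ValueError unless block_index names one of the two supported blocks."""
    if block_index not in (13, 14):
        raise ValueError(
            f"{factory_name}: block_index must be 13 or 14, got {block_index}"
        )


def wrapper_symbol_sketch(base_name: str, *, block_index: int = 13) -> str:
    """Build a per-block wrapper symbol name (prefix format is hypothetical)."""
    validate_bn_block_index_sketch(block_index, "wrapper_symbol_sketch")
    return f"bn{block_index}_{base_name}"


symbol_13 = wrapper_symbol_sketch("conv2dk1_partial_put_i8")
symbol_14 = wrapper_symbol_sketch("conv2dk1_partial_put_i8", block_index=14)
```

Making ``block_index`` keyword-only, as the ``*`` in the signatures suggests, keeps it from being confused with the positional size arguments.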
ExternalFunction iron.kernels.conv.bn_conv2dk1_relu (int input_width=32, int input_channels=64, int output_channels=64)
Bottleneck 1x1 conv + ReLU kernel (int8 in, uint8 out).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
Returns:
ExternalFunction configured for the bn_conv2dk1_relu kernel.
ExternalFunction iron.kernels.conv.bn_conv2dk1_relu_xy_pool_padded (int input_width=7, int input_channels=80, int output_channels=1280, int|None weight_chunk_count=None)
Fused 1x1 conv + ReLU + xy-pool with channel padding (int8 in, uint16 out).
A post-stage kernel that fuses a pointwise (1x1) convolution, ReLU
activation, and global xy avg-pool into a single pass, with output
channels padded to a DMA-friendly multiple. Sized for MobileNet V3's
post-bottleneck stage where the final 1x1 expand-conv collapses the
7x7 feature map into a 1x1 vector.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Logical output channels (e.g. 1280). Sets both
the output buffer length AND, when ``weight_chunk_count`` is
None, the weight buffer length (``input_channels * output_channels``).
weight_chunk_count: Override the weight buffer's element count when
the design streams weights in chunks (cascade/output-split).
``None`` means use the full ``input_channels * output_channels``
tile.
Returns:
ExternalFunction configured for the fused conv+relu+xy_pool kernel.
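The weight buffer sizing rule described for ``weight_chunk_count`` reduces to a small piece of arithmetic. The sketch below restates that rule only; the function name is hypothetical and no other buffer is modelled.

```python
from typing import Optional


def weight_buffer_length(
    input_channels: int,
    output_channels: int,
    weight_chunk_count: Optional[int] = None,
) -> int:
    """Element count of the weight buffer under the documented rule."""
    if weight_chunk_count is None:
        # Full tile: every input channel paired with every output channel.
        return input_channels * output_channels
    # Chunked streaming: the caller states the per-call chunk size directly.
    return weight_chunk_count


full = weight_buffer_length(80, 1280)
chunked = weight_buffer_length(80, 1280, weight_chunk_count=6400)
```

With the documented defaults (80 input channels, 1280 output channels) the full tile holds 102400 weight elements; the chunk size of 6400 is an arbitrary illustrative value.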
ExternalFunction iron.kernels.conv.bn_conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, skip_dtype=np.uint8)
Bottleneck 1x1 conv with skip connection (uint8 in).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
skip_dtype: Skip connection data type (``np.uint8`` or ``np.int8``).
Returns:
ExternalFunction configured for the bn_conv2dk1_skip kernel.
Raises:
ValueError: When ``skip_dtype`` is not ``np.uint8`` or ``np.int8``.
ExternalFunction iron.kernels.conv.bn_conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64)
Bottleneck 3x3 stride-2 conv kernel (int8 in, uint8 out).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
Returns:
ExternalFunction configured for the bn_conv2dk3 kernel.
ExternalFunction iron.kernels.conv.bn_conv2dk3_dw (int input_width=32, int input_channels=64, int output_channels=64, int stride=1)
Bottleneck depthwise 3x3 conv + ReLU kernel (uint8 in/out).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
stride: Convolution stride (1 or 2).
Returns:
ExternalFunction configured for the bn_conv2dk3_dw kernel.
Raises:
ValueError: When ``stride`` is not 1 or 2.
ExternalFunction iron.kernels.conv.bn_conv2dk3_dw_out_split (int input_width=7, int input_channels=480, int output_split_channels=240, *, int block_index=13)
Depthwise 3x3 stride-1 conv with split output stream (uint8 in/out).
A variant of :func:`bn_conv2dk3_dw` (stride=1) that writes its output
to TWO separate buffers — the channel dimension is split in half so
downstream cascade-PUT tiles can each consume one slice. Used by
MobileNet V3's bn13 / bn14 depthwise stage to feed the L3 cascade.
Currently defined in the .cc only via per-block extern wrappers
(BN13 or BN14 macro picks the symbol prefix); ``block_index`` selects
which.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels (== output channels —
depthwise).
output_split_channels: Channels per output slice (half of
``input_channels`` for the typical 2-way split).
block_index: ``13`` or ``14``; selects the per-block C++ wrapper.
Returns:
ExternalFunction configured for the split-output DW kernel.
Raises:
ValueError: When ``block_index`` is not 13 or 14.
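The two-buffer output can be pictured as cutting each pixel's channel vector in two. The sketch below assumes the slices are contiguous halves (first ``output_split_channels`` channels to one buffer, the rest to the other); the actual channel-to-buffer assignment is not stated above, and the function name is hypothetical.

```python
def split_channels(pixel_channels, output_split_channels):
    """Split one pixel's channel vector into two equal-width output slices."""
    if len(pixel_channels) != 2 * output_split_channels:
        raise ValueError("expected exactly two slices of output_split_channels")
    first = pixel_channels[:output_split_channels]
    second = pixel_channels[output_split_channels:]
    return first, second


# Depthwise output for one pixel with 6 channels, split 3 + 3.
dw_output = [5, 0, 7, 1, 9, 2]
slice_a, slice_b = split_channels(dw_output, 3)
```

Each slice would then feed one of the downstream cascade-PUT tiles.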
ExternalFunction iron.kernels.conv.bn_fc_relu_ui16_pad (int input_channels=1280, int output_channels=16, int|None weight_chunk_count=None)
Fully-connected layer (1x1 conv on (1,1,C)) + ReLU, uint16 in/out, with padding.
A post-stage FC kernel used by MobileNet V3's classifier head. Input is
a (1,1,input_channels) feature vector held as uint16; output is
``output_channels`` uint16 logits. Weights are stored in a padded layout
(the ``input_channels_pad`` runtime arg selects the actual stride).
Args:
input_channels: Number of input channels (e.g. 1280).
output_channels: Number of output channels per call (slice width,
since the full FC is split across multiple tiles).
weight_chunk_count: Override the weight buffer's element count when
the design streams weights in chunks (cascade/ping-pong).
``None`` means use the full ``input_channels * output_channels``
tile.
Returns:
ExternalFunction configured for the post-L2 FC kernel.
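As a reminder of what a fully-connected layer followed by ReLU computes on a (1,1,C) vector, here is a plain-integer sketch. It ignores the padded weight layout, quantization scaling, and uint16 saturation, so it illustrates the arithmetic only and not the kernel's behaviour; ``fc_relu`` is a hypothetical name.

```python
def fc_relu(features, weights):
    """One output per weight row: dot product with the features, clamped at zero."""
    return [
        max(0, sum(x * w for x, w in zip(features, row)))
        for row in weights
    ]


features = [3, 0, 2]                  # input vector, one value per input channel
weights = [[1, 1, 1], [-2, 0, 1], [0, 5, 0]]  # weights[oc][ic]
logits = fc_relu(features, weights)
```

Each tile in a split FC would run the same computation over its own slice of weight rows.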
ExternalFunction iron.kernels.conv.conv2dk1 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8)
1x1 convolution kernel.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
Returns:
ExternalFunction configured for the conv2dk1 kernel.
Raises:
ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
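The ``act_dtype`` restriction can be sketched as a small validation step. This is an assumed shape for such a check, not the module's code; ``check_act_dtype`` is a hypothetical name.

```python
import numpy as np

_ALLOWED_ACT_DTYPES = (np.dtype(np.int8), np.dtype(np.uint8))


def check_act_dtype(act_dtype=np.int8) -> np.dtype:
    """Normalise act_dtype to a np.dtype and reject anything but int8 / uint8."""
    dtype = np.dtype(act_dtype)
    if dtype not in _ALLOWED_ACT_DTYPES:
        raise ValueError(f"act_dtype must be np.int8 or np.uint8, got {dtype}")
    return dtype


signed = check_act_dtype(np.int8)
unsigned = check_act_dtype(np.uint8)
```

Normalising through ``np.dtype`` lets callers pass either the scalar type (``np.int8``) or an equivalent dtype object.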
ExternalFunction iron.kernels.conv.conv2dk14 (int input_width=224, int input_channels=16, int output_channels=16, int kernel_width=14)
14x14 convolution kernel (aie2p only).
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
kernel_width: Width (and height) of the convolution kernel.
Returns:
ExternalFunction configured for the conv2dk14 kernel.
ExternalFunction iron.kernels.conv.conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64)
1x1 convolution kernel with int8 activations/weights/output.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
Returns:
ExternalFunction configured for the conv2dk1_i8 kernel.
ExternalFunction iron.kernels.conv.conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8)
1x1 convolution kernel with skip (residual) connection.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
Returns:
ExternalFunction configured for the conv2dk1_skip kernel.
Raises:
ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
ExternalFunction iron.kernels.conv.conv2dk1_skip_init (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None skip_input_channels=None)
1x1 convolution kernel with skip-init connection.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels.
act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
skip_input_channels: Number of input channels for the skip-projection
1x1 conv whose weights are concatenated after the main conv
weights in the same buffer. Defaults to ``input_channels``.
Returns:
ExternalFunction configured for the conv2dk1_skip_init kernel.
Raises:
ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
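The concatenated weight buffer described for ``skip_input_channels`` can be laid out as two back-to-back regions. The sketch assumes each 1x1 conv stores ``in_channels * output_channels`` weights and that the skip projection produces the same ``output_channels`` as the main conv; neither size is stated above, and the function name is hypothetical.

```python
from typing import Optional


def skip_init_weight_layout(
    input_channels: int,
    output_channels: int,
    skip_input_channels: Optional[int] = None,
) -> dict:
    """Return (start, end) element offsets for the main and skip weight regions."""
    if skip_input_channels is None:
        # Documented default: the skip projection sees input_channels channels.
        skip_input_channels = input_channels
    main_count = input_channels * output_channels
    skip_count = skip_input_channels * output_channels
    return {
        "main": (0, main_count),
        "skip": (main_count, main_count + skip_count),
        "total": main_count + skip_count,
    }


default_layout = skip_init_weight_layout(64, 64)
narrow_skip_layout = skip_init_weight_layout(64, 64, skip_input_channels=32)
```

Under these assumptions the skip region starts where the main region ends, so a single buffer offset locates the projection weights.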
ExternalFunction iron.kernels.conv.conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None weight_output_channels=None)
3x3 convolution kernel.
Args:
input_width: Spatial width of the input.
input_channels: Number of input channels.
output_channels: Number of output channels produced by this call.
act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
weight_output_channels: Total number of output channels stored in the
weights buffer. Defaults to ``output_channels``. Set higher than
``output_channels`` when the weights buffer is shared across
multiple workers that each produce a slice of the output (the
``channel_offset`` runtime arg selects a worker's slice).
Returns:
ExternalFunction configured for the conv2dk3 kernel.
Raises:
ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
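The relationship between ``output_channels``, ``weight_output_channels``, and the ``channel_offset`` runtime arg can be sketched with simple arithmetic. The weight count formula (3 x 3 x input channels x stored output channels) and the evenly spaced offsets are assumptions for illustration; both function names are hypothetical.

```python
from typing import Optional


def conv2dk3_weight_elements(
    input_channels: int,
    output_channels: int,
    weight_output_channels: Optional[int] = None,
) -> int:
    """Assumed weight buffer size: 3x3 taps per (input, stored output) channel pair."""
    if weight_output_channels is None:
        # Documented default: the buffer holds only this call's output channels.
        weight_output_channels = output_channels
    return 3 * 3 * input_channels * weight_output_channels


def worker_channel_offsets(output_channels: int, weight_output_channels: int) -> list:
    """Assumed channel_offset value per worker when slices are packed back to back."""
    return list(range(0, weight_output_channels, output_channels))


private_weights = conv2dk3_weight_elements(64, 64)
shared_weights = conv2dk3_weight_elements(64, 64, weight_output_channels=128)
offsets = worker_channel_offsets(64, 128)
```

In this picture two workers share one 128-channel weight buffer, and each uses its offset to pick the 64 output channels it is responsible for.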