IRON 1a5eed49d3c0721a318ac369f725acc96b7c4584
iron.kernels.conv Namespace Reference

Functions

list _i32s (int n)
 
ExternalFunction conv2dk1 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8)
 
ExternalFunction conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None weight_output_channels=None)
 
ExternalFunction conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8)
 
ExternalFunction conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64)
 
ExternalFunction conv2dk14 (int input_width=224, int input_channels=16, int output_channels=16, int kernel_width=14)
 
ExternalFunction conv2dk1_skip_init (int input_width=32, int input_channels=64, int output_channels=64, act_dtype=np.int8, int|None skip_input_channels=None)
 
ExternalFunction bn_conv2dk1_relu (int input_width=32, int input_channels=64, int output_channels=64)
 
ExternalFunction bn_conv2dk3 (int input_width=32, int input_channels=64, int output_channels=64)
 
ExternalFunction bn_conv2dk1_i8 (int input_width=32, int input_channels=64, int output_channels=64)
 
ExternalFunction bn_conv2dk1_skip (int input_width=32, int input_channels=64, int output_channels=64, skip_dtype=np.uint8)
 
ExternalFunction bn_conv2dk3_dw (int input_width=32, int input_channels=64, int output_channels=64, int stride=1)
 
ExternalFunction bn_conv2dk1_relu_xy_pool_padded (int input_width=7, int input_channels=80, int output_channels=1280, int|None weight_chunk_count=None)
 
None _validate_bn_block_index (int block_index, str factory_name)
 
ExternalFunction bn_conv2dk1_partial_put_i8 (int input_width=7, int input_channels=80, int weight_count=4800, *, int block_index=13)
 
ExternalFunction bn_conv2dk1_partial_get_relu_i8 (int input_width=7, int input_channels=80, int output_channels=480, int weight_count=4800, *, int block_index=13)
 
ExternalFunction bn_conv2dk3_dw_out_split (int input_width=7, int input_channels=480, int output_split_channels=240, *, int block_index=13)
 
ExternalFunction bn_conv2dk1_input_split_partial_put_ui8 (int input_width=7, int input_channels=240, int weight_count=9600, *, int block_index=13)
 
ExternalFunction bn_conv2dk1_input_split_partial_skip_get (int input_width=7, int input_channels=240, int output_channels=80, int weight_count=9600, *, int block_index=13)
 
ExternalFunction bn_fc_relu_ui16_pad (int input_channels=1280, int output_channels=16, int|None weight_chunk_count=None)
 

Detailed Description

Convolution kernel factories: conv2dk1, conv2dk3, conv2dk14, and bottleneck (bn_*) variants.

Function Documentation

◆ _i32s()

list iron.kernels.conv._i32s ( int  n)
protected
Return a list of *n* ``np.int32`` types, used for trailing scalar conv args.

◆ _validate_bn_block_index()

None iron.kernels.conv._validate_bn_block_index ( int  block_index,
str  factory_name 
)
protected
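
No description is generated for this helper. Judging only from the ``Raises:`` entries of the factories that accept ``block_index``, it presumably rejects any value other than 13 or 14. A minimal sketch of such a check follows; the function name, the constant, and the error message format are hypothetical and not taken from the IRON source.

```python
# Hypothetical stand-in for the documented behaviour; not the IRON implementation.
SUPPORTED_BN_BLOCKS = (13, 14)  # assumption: taken from the factories' Raises sections


def validate_bn_block_index_sketch(block_index: int, factory_name: str) -> None:
    """Raise ValueError unless block_index is 13 or 14."""
    if block_index not in SUPPORTED_BN_BLOCKS:
        raise ValueError(
            f"{factory_name}: block_index must be 13 or 14, got {block_index!r}"
        )
```

Passing the factory name lets the error message identify which factory received the unsupported block.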

◆ bn_conv2dk1_i8()

ExternalFunction iron.kernels.conv.bn_conv2dk1_i8 ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64 
)
Bottleneck 1x1 conv kernel (uint8 in, int8 out).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.

Returns:
    ExternalFunction configured for the bn_conv2dk1_i8 kernel.

◆ bn_conv2dk1_input_split_partial_put_ui8()

ExternalFunction iron.kernels.conv.bn_conv2dk1_input_split_partial_put_ui8 ( int   input_width = 7,
int   input_channels = 240,
int   weight_count = 9600,
*,
int   block_index = 13 
)
Input-split cascade-PUT half of a 1x1 conv on uint8 activations.

Like :func:`bn_conv2dk1_partial_put_i8` but consumes a CHANNEL slice
(input-split) of a uint8 activation instead of a width slice of int8.
Used by MobileNet V3's bn13 / bn14 L3 stage.

Args:
    input_width: Spatial width of the input slice.
    input_channels: Number of input channels (one half of the
        full input after split).
    weight_count: Per-call weight chunk size in elements.
    block_index: ``13`` or ``14``; selects the per-block C++ wrapper.

Returns:
    ExternalFunction configured for the input-split PUT tile.

Raises:
    ValueError: When ``block_index`` is not 13 or 14.

◆ bn_conv2dk1_input_split_partial_skip_get()

ExternalFunction iron.kernels.conv.bn_conv2dk1_input_split_partial_skip_get ( int   input_width = 7,
int   input_channels = 240,
int   output_channels = 80,
int   weight_count = 9600,
*,
int   block_index = 13 
)
Input-split cascade-GET half of a 1x1 conv + skip-add (uint8 in, int8 out).

The GET tile completes the cascade-split 1x1 + ReLU + residual add
pattern: consumes the partial sum from its sister PUT tile, finishes
the dot product, adds a skip row of int8 activations, and writes int8
output.  Sister of :func:`bn_conv2dk1_input_split_partial_put_ui8`.

Args:
    input_width: Spatial width of the input slice.
    input_channels: Number of input channels (one half after split).
    output_channels: Final output channels.
    weight_count: Per-call weight chunk size in elements.
    block_index: ``13`` or ``14``; selects the per-block C++ wrapper.

Returns:
    ExternalFunction configured for the input-split skip-GET tile.

Raises:
    ValueError: When ``block_index`` is not 13 or 14.
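
The PUT/GET pairing described above relies on a dot product over input channels being separable into two partial sums. The following plain-Python sketch shows that arithmetic for a single output value; it is not the kernel itself. The function names are hypothetical, and requantization, scaling, and saturation (which a real int8 kernel needs but which are not described on this page) are left out.

```python
def put_partial(acts_half, weights_half):
    """PUT tile: partial dot product over its half of the input channels."""
    return sum(a * w for a, w in zip(acts_half, weights_half))


def get_finish(cascade_in, acts_half, weights_half, skip):
    """GET tile: add its own half to the cascade partial sum, then the skip value."""
    return cascade_in + sum(a * w for a, w in zip(acts_half, weights_half)) + skip


# Toy data: 4 input channels split 2 + 2 (the documented default is 240 per half).
acts = [3, 1, 4, 1]
weights = [2, -1, 0, 5]
skip = -7

partial = put_partial(acts[:2], weights[:2])
result = get_finish(partial, acts[2:], weights[2:], skip)

# Unsplit reference: full dot product plus the skip value.
reference = sum(a * w for a, w in zip(acts, weights)) + skip
assert result == reference
```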

◆ bn_conv2dk1_partial_get_relu_i8()

ExternalFunction iron.kernels.conv.bn_conv2dk1_partial_get_relu_i8 ( int   input_width = 7,
int   input_channels = 80,
int   output_channels = 480,
int   weight_count = 4800,
*,
int   block_index = 13 
)
Cascade-GET half of a width-split 1x1 conv + ReLU on int8 activations.

The GET tile of a two-tile cascade-split pointwise conv: consumes
the cascade partial sum from its sister PUT tile, finishes the dot
product against its weight half, applies ReLU, and writes the full
output buffer.  Sister of :func:`bn_conv2dk1_partial_put_i8`.

Currently defined in the .cc only for MobileNet V3's bn13 / bn14
(one wrapper symbol per block); ``block_index`` selects which.

Args:
    input_width: Spatial width of the input slice.
    input_channels: Number of input channels.
    output_channels: Number of output channels (full L1 output width).
    weight_count: Per-call weight chunk size in elements.
    block_index: ``13`` or ``14``; selects the per-block C++ wrapper.

Returns:
    ExternalFunction configured for the GET tile.

Raises:
    ValueError: When ``block_index`` is not 13 or 14.

◆ bn_conv2dk1_partial_put_i8()

ExternalFunction iron.kernels.conv.bn_conv2dk1_partial_put_i8 ( int   input_width = 7,
int   input_channels = 80,
int   weight_count = 4800,
*,
int   block_index = 13 
)
Cascade-PUT half of a width-split 1x1 conv on int8 activations.

The PUT tile of a two-tile cascade-split pointwise conv: consumes a
width slice of the activation, multiplies against its weight half,
and emits the partial sum onto the cascade stream (no separate
output buffer — cascade-only).  Sister of
:func:`bn_conv2dk1_partial_get_relu_i8`.

Currently defined in the .cc only for MobileNet V3's bn13 / bn14
(one wrapper symbol per block); ``block_index`` selects which.
Generalising this to arbitrary block names would require adding a
non-prefixed wrapper to ``bn_conv2dk1_i8.cc``.

Args:
    input_width: Spatial width of the input slice.
    input_channels: Number of input channels.
    weight_count: Per-call weight chunk size in elements (the design
        streams weights in chunks; the full weight tensor is shared
        across multiple kernel invocations).
    block_index: ``13`` or ``14``; selects the per-block C++ wrapper.

Returns:
    ExternalFunction configured for the PUT tile.

Raises:
    ValueError: When ``block_index`` is not 13 or 14.
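
To illustrate the chunked weight streaming that ``weight_count`` refers to, the arithmetic below counts how many fixed-size chunks cover a weight tensor. It is a sketch only: ``chunks_needed`` is a hypothetical helper, and the 80 x 480 tensor shape is an assumption drawn from the default arguments of the sister GET factory. How the chunks are divided between the two tiles is not stated on this page.

```python
def chunks_needed(total_elements: int, weight_count: int) -> int:
    """Number of weight_count-sized chunks needed to cover total_elements."""
    if weight_count <= 0:
        raise ValueError("weight_count must be positive")
    return -(-total_elements // weight_count)  # ceiling division


# Assumed 1x1 conv weight tensor: 80 input channels x 480 output channels.
total = 80 * 480
n_chunks = chunks_needed(total, 4800)
```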

◆ bn_conv2dk1_relu()

ExternalFunction iron.kernels.conv.bn_conv2dk1_relu ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64 
)
Bottleneck 1x1 conv + ReLU kernel (int8 in, uint8 out).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.

Returns:
    ExternalFunction configured for the bn_conv2dk1_relu kernel.

◆ bn_conv2dk1_relu_xy_pool_padded()

ExternalFunction iron.kernels.conv.bn_conv2dk1_relu_xy_pool_padded ( int   input_width = 7,
int   input_channels = 80,
int   output_channels = 1280,
int | None   weight_chunk_count = None 
)
Fused 1x1 conv + ReLU + xy-pool with channel padding (int8 in, uint16 out).

A post-stage kernel that fuses a pointwise (1x1) convolution, ReLU
activation, and global xy avg-pool into a single pass, with output
channels padded to a DMA-friendly multiple.  Sized for MobileNet V3's
post-bottleneck stage where the final 1x1 expand-conv collapses the
7x7 feature map into a 1x1 vector.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Logical output channels (e.g. 1280).  Sets both
        the output buffer length AND, when ``weight_chunk_count`` is
        None, the weight buffer length (``input_channels * output_channels``).
    weight_chunk_count: Override the weight buffer's element count when
        the design streams weights in chunks (cascade/output-split).
        ``None`` means use the full ``input_channels * output_channels``
        tile.

Returns:
    ExternalFunction configured for the fused conv+relu+xy_pool kernel.
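
The interaction between ``output_channels`` and ``weight_chunk_count`` described above amounts to a simple fallback. A sketch of that rule, using a hypothetical helper name rather than anything defined in IRON:

```python
from typing import Optional


def weight_buffer_len(input_channels: int, output_channels: int,
                      weight_chunk_count: Optional[int] = None) -> int:
    """Weight buffer length in elements, following the rule described above."""
    if weight_chunk_count is None:
        # Full 1x1 conv weight tile.
        return input_channels * output_channels
    return weight_chunk_count


full = weight_buffer_len(80, 1280)           # documented defaults
chunked = weight_buffer_len(80, 1280, 6400)  # 6400 is an arbitrary example value
```

With the default arguments the full tile holds 80 * 1280 = 102400 weight elements, which is the case the override is meant to avoid when weights are streamed in chunks.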

◆ bn_conv2dk1_skip()

ExternalFunction iron.kernels.conv.bn_conv2dk1_skip ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
  skip_dtype = np.uint8 
)
Bottleneck 1x1 conv with skip connection (uint8 in).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    skip_dtype: Skip connection data type (``np.uint8`` or ``np.int8``).

Returns:
    ExternalFunction configured for the bn_conv2dk1_skip kernel.

Raises:
    ValueError: When ``skip_dtype`` is not ``np.uint8`` or ``np.int8``.

◆ bn_conv2dk3()

ExternalFunction iron.kernels.conv.bn_conv2dk3 ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64 
)
Bottleneck 3x3 stride-2 conv kernel (int8 in, uint8 out).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.

Returns:
    ExternalFunction configured for the bn_conv2dk3 kernel.

◆ bn_conv2dk3_dw()

ExternalFunction iron.kernels.conv.bn_conv2dk3_dw ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
int   stride = 1 
)
Bottleneck depthwise 3x3 conv + ReLU kernel (uint8 in/out).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    stride: Convolution stride (1 or 2).

Returns:
    ExternalFunction configured for the bn_conv2dk3_dw kernel.

Raises:
    ValueError: When ``stride`` is not 1 or 2.
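
For reference, the effect of ``stride`` on the output width follows the standard convolution size formula. The sketch below assumes a padding of 1 on each side, which this page does not state, and uses a hypothetical helper name.

```python
def dw3x3_output_width(input_width: int, stride: int = 1, padding: int = 1) -> int:
    """Output width of a 3x3 conv via the standard formula (padding is assumed)."""
    if stride not in (1, 2):
        raise ValueError(f"stride must be 1 or 2, got {stride}")
    return (input_width + 2 * padding - 3) // stride + 1


same = dw3x3_output_width(32, stride=1)    # width preserved under the assumed padding
halved = dw3x3_output_width(32, stride=2)  # width halved under the assumed padding
```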

◆ bn_conv2dk3_dw_out_split()

ExternalFunction iron.kernels.conv.bn_conv2dk3_dw_out_split ( int   input_width = 7,
int   input_channels = 480,
int   output_split_channels = 240,
*,
int   block_index = 13 
)
Depthwise 3x3 stride-1 conv with split output stream (uint8 in/out).

A variant of :func:`bn_conv2dk3_dw` (stride=1) that writes its output
to TWO separate buffers — the channel dimension is split in half so
downstream cascade-PUT tiles can each consume one slice.  Used by
MobileNet V3's bn13 / bn14 depthwise stage to feed the L3 cascade.

Currently defined in the .cc only via per-block extern wrappers
(BN13 or BN14 macro picks the symbol prefix); ``block_index`` selects
which.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels (== output channels —
        depthwise).
    output_split_channels: Channels per output slice (half of
        ``input_channels`` for the typical 2-way split).
    block_index: ``13`` or ``14``; selects the per-block C++ wrapper.

Returns:
    ExternalFunction configured for the split-output DW kernel.

Raises:
    ValueError: When ``block_index`` is not 13 or 14.
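
The two-buffer output can be pictured as slicing the channel axis in half. This is a conceptual sketch on Python lists with a hypothetical helper. It assumes the lower half of the channels goes to the first buffer and the upper half to the second; the actual memory layout of the kernel's output buffers is not described on this page.

```python
def split_channels(channels, output_split_channels):
    """Split a per-pixel channel vector into two consecutive slices."""
    if len(channels) != 2 * output_split_channels:
        raise ValueError("expected exactly two slices of output_split_channels")
    return channels[:output_split_channels], channels[output_split_channels:]


# One output pixel with 480 depthwise channels, split 240 + 240.
pixel = list(range(480))
first, second = split_channels(pixel, 240)
```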

◆ bn_fc_relu_ui16_pad()

ExternalFunction iron.kernels.conv.bn_fc_relu_ui16_pad ( int   input_channels = 1280,
int   output_channels = 16,
int | None   weight_chunk_count = None 
)
Fully-connected layer (1x1 conv on (1,1,C)) + ReLU, uint16 in/out, with padding.

A post-stage FC kernel used by MobileNet V3's classifier head.  Input is
a (1,1,input_channels) feature vector held as uint16; output is
``output_channels`` uint16 logits.  Weights are stored in a padded layout
(the ``input_channels_pad`` runtime arg selects the actual stride).

Args:
    input_channels: Number of input channels (e.g. 1280).
    output_channels: Number of output channels per call (slice width,
        since the full FC is split across multiple tiles).
    weight_chunk_count: Override the weight buffer's element count when
        the design streams weights in chunks (cascade/ping-pong).
        ``None`` means use the full ``input_channels * output_channels``
        tile.

Returns:
    ExternalFunction configured for the post-L2 FC kernel.

◆ conv2dk1()

ExternalFunction iron.kernels.conv.conv2dk1 ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
  act_dtype = np.int8 
)
1x1 convolution kernel.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    act_dtype: Activation data type (``np.int8`` or ``np.uint8``).

Returns:
    ExternalFunction configured for the conv2dk1 kernel.

Raises:
    ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.

◆ conv2dk14()

ExternalFunction iron.kernels.conv.conv2dk14 ( int   input_width = 224,
int   input_channels = 16,
int   output_channels = 16,
int   kernel_width = 14 
)
14x14 convolution kernel (aie2p only).

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    kernel_width: Width (and height) of the convolution kernel.

Returns:
    ExternalFunction configured for the conv2dk14 kernel.

◆ conv2dk1_i8()

ExternalFunction iron.kernels.conv.conv2dk1_i8 ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64 
)
1x1 convolution kernel with int8 activations/weights/output.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.

Returns:
    ExternalFunction configured for the conv2dk1_i8 kernel.

◆ conv2dk1_skip()

ExternalFunction iron.kernels.conv.conv2dk1_skip ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
  act_dtype = np.int8 
)
1x1 convolution kernel with skip (residual) connection.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    act_dtype: Activation data type (``np.int8`` or ``np.uint8``).

Returns:
    ExternalFunction configured for the conv2dk1_skip kernel.

Raises:
    ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.

◆ conv2dk1_skip_init()

ExternalFunction iron.kernels.conv.conv2dk1_skip_init ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
  act_dtype = np.int8,
int | None   skip_input_channels = None 
)
1x1 convolution kernel with skip-init connection.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels.
    act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
    skip_input_channels: Number of input channels for the skip-projection
        1x1 conv whose weights are concatenated after the main conv
        weights in the same buffer. Defaults to ``input_channels``.

Returns:
    ExternalFunction configured for the conv2dk1_skip_init kernel.

Raises:
    ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
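
The shared weight buffer described for ``skip_input_channels`` can be sized as two concatenated 1x1 weight blocks. The sketch below is an illustration with a hypothetical helper name; it assumes the skip projection produces ``output_channels`` channels as well, which this page implies but does not state.

```python
from typing import Optional, Tuple


def skip_init_weight_layout(input_channels: int, output_channels: int,
                            skip_input_channels: Optional[int] = None) -> Tuple[int, int]:
    """Return (offset of skip weights, total elements) for the shared buffer."""
    if skip_input_channels is None:
        skip_input_channels = input_channels  # documented default
    main = input_channels * output_channels
    # Assumption: the skip projection also produces output_channels channels.
    skip = skip_input_channels * output_channels
    return main, main + skip


offset, total = skip_init_weight_layout(64, 64)
offset2, total2 = skip_init_weight_layout(64, 64, skip_input_channels=32)
```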

◆ conv2dk3()

ExternalFunction iron.kernels.conv.conv2dk3 ( int   input_width = 32,
int   input_channels = 64,
int   output_channels = 64,
  act_dtype = np.int8,
int | None   weight_output_channels = None 
)
3x3 convolution kernel.

Args:
    input_width: Spatial width of the input.
    input_channels: Number of input channels.
    output_channels: Number of output channels produced by this call.
    act_dtype: Activation data type (``np.int8`` or ``np.uint8``).
    weight_output_channels: Total number of output channels stored in the
        weights buffer. Defaults to ``output_channels``. Set higher than
        ``output_channels`` when the weights buffer is shared across
        multiple workers that each produce a slice of the output (the
        ``channel_offset`` runtime arg selects a worker's slice).

Returns:
    ExternalFunction configured for the conv2dk3 kernel.

Raises:
    ValueError: When ``act_dtype`` is not ``np.int8`` or ``np.uint8``.
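
The relationship between ``output_channels``, ``weight_output_channels``, and the ``channel_offset`` runtime arg can be sketched as follows. Both helper names are hypothetical. The sketch assumes that ``channel_offset`` is expressed in output channels and that the shared buffer divides evenly among workers; neither is confirmed on this page.

```python
def conv2dk3_worker_offsets(output_channels: int, weight_output_channels: int):
    """Channel offsets for workers that share one 3x3 weights buffer."""
    if weight_output_channels % output_channels != 0:
        raise ValueError("weight_output_channels must be a multiple of output_channels")
    return list(range(0, weight_output_channels, output_channels))


def conv2dk3_weight_elements(input_channels: int, weight_output_channels: int) -> int:
    """Elements in a 3x3 weights buffer covering all stored output channels."""
    return 3 * 3 * input_channels * weight_output_channels


# Four workers, each producing 16 of the 64 output channels held in one buffer.
offsets = conv2dk3_worker_offsets(output_channels=16, weight_output_channels=64)
elements = conv2dk3_weight_elements(input_channels=64, weight_output_channels=64)
```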