transform.air.herd_vectorize (transform::AIRHerdVectorizeOp)
Vectorize operations inside air.herd operations
Syntax:
operation ::= `transform.air.herd_vectorize` $target attr-dict
This transform takes a handle to air.herd operations and vectorizes the operations inside their bodies using the same logic as the AIRHerdVectorizePass. It walks the body of each herd operation and applies vectorization patterns to linalg operations and other vectorizable operations.
The transform supports the same options as the AIRHerdVectorizePass; they are listed in the attribute table for this operation.
Example:
%herd = transform.structured.match ops{["air.herd"]} in %f : (!pdl.operation) -> !pdl.operation
%vectorized = transform.air.herd_vectorize %herd {
vectorize_nd_extract = false,
flatten_1d_depthwise_conv = false,
vectorize_padding = true
} : (!pdl.operation) -> !pdl.operation
Returns a handle to the transformed air.herd operations.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| vectorize_nd_extract | ::mlir::BoolAttr | bool attribute |
| flatten_1d_depthwise_conv | ::mlir::BoolAttr | bool attribute |
| disable_transfer_permutation_map_lowering_patterns | ::mlir::BoolAttr | bool attribute |
| disable_multi_reduction_to_contract_patterns | ::mlir::BoolAttr | bool attribute |
| vectorize_padding | ::mlir::BoolAttr | bool attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.hoist_static_alloc (transform::AIRHoistStaticAllocOp)
Hoist static allocations.
Syntax:
operation ::= `transform.air.hoist_static_alloc` $target attr-dict `:` functional-type(operands, results)
Moves certain statically-sized memref.alloc operations from inner blocks
to the entry block of the target function. This shortens and unifies buffer
lifetimes, which can unlock reuse and downstream optimizations.
Only memref.alloc buffers with static shapes are hoisted. Allocations whose uses include certain operations (scf.yield, func.return) are not rewritten; such allocations are skipped.
Before:
func.func @foo(%arg0: memref<64xi32>) {
scf.for %i = %c0 to %c4 step %c1 {
%tmp = memref.alloc() : memref<64xi32>
linalg.fill ins(%cst : i32) outs(%tmp : memref<64xi32>)
memref.dealloc %tmp : memref<64xi32>
}
return
}
After:
func.func @foo(%arg0: memref<64xi32>) {
%tmp.hoisted = memref.alloc() : memref<64xi32>
scf.for %i = %c0 to %c4 step %c1 {
linalg.fill ins(%cst : i32) outs(%tmp.hoisted : memref<64xi32>)
}
memref.dealloc %tmp.hoisted : memref<64xi32>
return
}
Transform usage:
transform.sequence %arg0 : !pdl.operation failures(propagate) {
^bb0(%f: !pdl.operation):
transform.air.hoist_static_alloc %f
: (!pdl.operation) -> ()
}
Traits: ReportTrackingListenerFailuresOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |
transform.air.convert_memref_copy_to_linalg_copy (transform::ConvertMemrefCopyToLinalgCopyOp)
Convert memref.copy operations to linalg.copy operations
Syntax:
operation ::= `transform.air.convert_memref_copy_to_linalg_copy` $target attr-dict
This transform converts memref.copy operations to linalg.copy operations.
This can be useful for enabling further linalg-based optimizations and transformations.
The transformation replaces:
memref.copy %source, %dest : memref<...> to memref<...>
With:
linalg.copy ins(%source : memref<...>) outs(%dest : memref<...>)
Returns a handle to the modified operation containing the transformed copies.
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.copy_to_dma (transform::CopyToDmaOp)
Syntax:
operation ::= `transform.air.copy_to_dma` $target attr-dict
Transform a memref.copy operation into an air.dma_memcpy_nd operation.
Returns the new air.dma_memcpy_nd operation.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.eliminate_cascade_memcpy (transform::EliminateCascadeMemcpyOp)
Eliminate intermediate memref buffers in cascaded DMA operations
Syntax:
operation ::= `transform.air.eliminate_cascade_memcpy` $target attr-dict
This transform identifies and eliminates intermediate memref buffers in cascaded air.dma_memcpy_nd operations. It looks for the pattern where an intermediate buffer is used exactly twice: once as the destination of a DMA operation and once as the source of another DMA operation, with both operations using default access patterns (empty offsets, sizes, and strides).
The transformation replaces:
air.dma_memcpy_nd (%intermediate[] [] [], %source[] [] []) : (memref<...>, memref<...>)
air.dma_memcpy_nd (%dest[] [] [], %intermediate[] [] []) : (memref<...>, memref<...>)
With:
air.dma_memcpy_nd (%dest[] [] [], %source[] [] []) : (memref<...>, memref<...>)
This optimization eliminates unnecessary intermediate memory allocations and reduces memory traffic, which is particularly beneficial for cascade patterns in AIR programs.
Returns a handle to the modified operation.
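The cascade pattern described above can be modelled outside of MLIR. The following is a minimal Python sketch, not the transform's implementation: each DMA is a hypothetical `(dst, src)` pair of buffer names, every DMA is assumed to use the default access pattern, and buffers are assumed to have no uses other than the listed DMAs.

```python
from collections import Counter


def eliminate_cascade(dmas):
    """Collapse dst <- tmp <- src chains into dst <- src.

    `dmas` is a list of (dst, src) buffer-name pairs in program order.
    This is a toy model: all DMAs are assumed to use empty offsets,
    sizes and strides, and buffers have no other uses.
    """
    writes = Counter(dst for dst, _ in dmas)
    reads = Counter(src for _, src in dmas)
    forwarded = {}  # intermediate buffer -> buffer the data originally came from
    result = []
    for i, (dst, src) in enumerate(dmas):
        src = forwarded.get(src, src)
        later_reads = sum(1 for _, s in dmas[i + 1:] if s == dst)
        # An intermediate buffer is written once and read once, by a later DMA.
        if writes[dst] == 1 and reads[dst] == 1 and later_reads == 1:
            forwarded[dst] = src  # drop this DMA and remember its source
        else:
            result.append((dst, src))
    return result


cascade = [("intermediate", "source"), ("dest", "intermediate")]
print(eliminate_cascade(cascade))
```

A buffer that is read by two DMAs does not match the "used exactly twice" condition, so the sketch leaves such sequences unchanged.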
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.eliminate_redundant_vector_transfers (transform::EliminateRedundantVectorTransfersOp)
Eliminate redundant vector.transfer_read operations
Syntax:
operation ::= `transform.air.eliminate_redundant_vector_transfers` $target attr-dict
This transform identifies and eliminates redundant vector.transfer_read operations within the target operation. Two vector.transfer_read operations are considered redundant when:
The transformation walks through all vector.transfer_read operations in the target, compares each pair, and when a redundant read is found, replaces all uses of the second read with the result of the first read, then erases the redundant operation.
This optimization is particularly useful after loop unrolling or other transformations that may duplicate read operations unnecessarily, reducing memory traffic and register pressure.
Example:
// Before:
%0 = vector.transfer_read %memref[%i, %j], %pad : memref<8x8xi32>, vector<4xi32>
%1 = arith.addi %0, %cst : vector<4xi32>
%2 = vector.transfer_read %memref[%i, %j], %pad : memref<8x8xi32>, vector<4xi32> // Redundant!
%3 = arith.muli %2, %other : vector<4xi32>
// After:
%0 = vector.transfer_read %memref[%i, %j], %pad : memref<8x8xi32>, vector<4xi32>
%1 = arith.addi %0, %cst : vector<4xi32>
%3 = arith.muli %0, %other : vector<4xi32> // Uses %0 instead of redundant %2
Returns a handle to the transformed operation.
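The replace-and-erase step can be illustrated with a small Python model. This is a sketch under assumptions, not the transform's code: operations are hypothetical tuples, two reads are treated as redundant when they name the same memref and the same indices, and any write to a memref is assumed to invalidate earlier reads of it. The sketch uses a lookup table instead of the pairwise comparison described above.

```python
def eliminate_redundant_reads(ops):
    """Remove duplicate reads from a straight-line list of toy operations.

    Each op is one of:
      ("read", result, memref, indices)   # indices is a tuple
      ("write", memref)
      ("use", result, operands)           # operands is a tuple of names
    """
    available = {}  # (memref, indices) -> name of the first read
    replaced = {}   # redundant read name -> name of the read it maps to
    out = []
    for op in ops:
        kind = op[0]
        if kind == "read":
            _, name, memref, indices = op
            key = (memref, indices)
            if key in available:
                replaced[name] = available[key]  # erase the duplicate read
                continue
            available[key] = name
            out.append(op)
        elif kind == "write":
            _, memref = op
            # Assumption: a write may change the data, so earlier reads of
            # this memref can no longer stand in for later ones.
            available = {k: v for k, v in available.items() if k[0] != memref}
            out.append(op)
        else:
            _, name, operands = op
            out.append(("use", name, tuple(replaced.get(o, o) for o in operands)))
    return out


before = [
    ("read", "%0", "%memref", ("%i", "%j")),
    ("use", "%1", ("%0", "%cst")),
    ("read", "%2", "%memref", ("%i", "%j")),
    ("use", "%3", ("%2", "%other")),
]
for op in eliminate_redundant_reads(before):
    print(op)
```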
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.flatten_for_iter_args (transform::FlattenForIterArgsOp)
Flatten vector-typed iter_args of an scf.for loop using vector.shape_cast
Syntax:
operation ::= `transform.air.flatten_for_iter_args` $target attr-dict
This transform takes a handle to an scf.for loop and flattens all vector-typed iter_args by inserting vector.shape_cast operations. The transformation:
This is useful for ensuring that loop-carried dependencies use flattened vector types, which can be required by certain backend lowerings or optimization passes.
Example:
// Before:
%result:4 = scf.for %i = %c0 to %c4 step %c1
iter_args(%arg0 = %v0, %arg1 = %v1, %arg2 = %v2, %arg3 = %v3)
-> (vector<1x1x8x8xi16>, vector<1x1x8x8xi16>, vector<1x1x8x8xi16>, vector<1x1x8x8xi16>) {
// ... computation ...
scf.yield %r0, %r1, %r2, %r3 : vector<1x1x8x8xi16>, vector<1x1x8x8xi16>, vector<1x1x8x8xi16>, vector<1x1x8x8xi16>
}
// After:
%v0_flat = vector.shape_cast %v0 : vector<1x1x8x8xi16> to vector<64xi16>
%v1_flat = vector.shape_cast %v1 : vector<1x1x8x8xi16> to vector<64xi16>
%v2_flat = vector.shape_cast %v2 : vector<1x1x8x8xi16> to vector<64xi16>
%v3_flat = vector.shape_cast %v3 : vector<1x1x8x8xi16> to vector<64xi16>
%result:4 = scf.for %i = %c0 to %c4 step %c1
iter_args(%arg0 = %v0_flat, %arg1 = %v1_flat, %arg2 = %v2_flat, %arg3 = %v3_flat)
-> (vector<64xi16>, vector<64xi16>, vector<64xi16>, vector<64xi16>) {
%arg0_shaped = vector.shape_cast %arg0 : vector<64xi16> to vector<1x1x8x8xi16>
%arg1_shaped = vector.shape_cast %arg1 : vector<64xi16> to vector<1x1x8x8xi16>
%arg2_shaped = vector.shape_cast %arg2 : vector<64xi16> to vector<1x1x8x8xi16>
%arg3_shaped = vector.shape_cast %arg3 : vector<64xi16> to vector<1x1x8x8xi16>
// ... computation using %arg0_shaped, %arg1_shaped, etc. ...
%r0_flat = vector.shape_cast %r0 : vector<1x1x8x8xi16> to vector<64xi16>
%r1_flat = vector.shape_cast %r1 : vector<1x1x8x8xi16> to vector<64xi16>
%r2_flat = vector.shape_cast %r2 : vector<1x1x8x8xi16> to vector<64xi16>
%r3_flat = vector.shape_cast %r3 : vector<1x1x8x8xi16> to vector<64xi16>
scf.yield %r0_flat, %r1_flat, %r2_flat, %r3_flat : vector<64xi16>, vector<64xi16>, vector<64xi16>, vector<64xi16>
}
Returns a handle to the transformed loop.
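The shape arithmetic behind the casts in the example is easy to check by hand. The Python sketch below is illustrative only; the helper names are hypothetical, and it assumes that vector.shape_cast keeps elements in row-major order, so flattening and restoring the shape does not move any element.

```python
from math import prod


def flatten_shape(shape):
    """Shape of the 1-D vector holding the same number of elements."""
    return (prod(shape),)


def linear_index(index, shape):
    """Row-major position of a multi-dimensional index in the flat vector."""
    pos = 0
    for i, dim in zip(index, shape):
        pos = pos * dim + i
    return pos


shape = (1, 1, 8, 8)
print(flatten_shape(shape))               # shape of the flattened iter_arg
print(linear_index((0, 0, 1, 0), shape))  # position of element [0, 0, 1, 0]
```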
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.forall_with_reduce_to_parallel (transform::ForallWithReduceToParallelOp)
Converts a pattern of scf.forall and linalg.reduce to scf.parallel
Syntax:
operation ::= `transform.air.forall_with_reduce_to_parallel` $target attr-dict `:` functional-type(operands, results)
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| transformed | variadic of PDL handle to an mlir::Operation * |
transform.air.fuse_extf_linalg (transform::FuseExtfLinalgOp)
Fuse a linalg operation containing only arith.extf with its consumer
Syntax:
operation ::= `transform.air.fuse_extf_linalg` $first_op `,` $second_op attr-dict
This transform fuses two linalg operations where:
The fusion is performed by:
This optimization folds the arithmetic extensions into the linalg ops and enables the use of native intrinsics on narrower datatypes on targets such as AMD AIEs.
Example:
// Before fusion:
%0 = linalg.generic {
^bb0(%arg0: f16):
%1 = arith.extf %arg0 : f16 to f32
linalg.yield %1 : f32
} ins(%input : tensor<16xf16>) outs(%temp : tensor<16xf32>)
%result = linalg.generic {
^bb0(%arg0: f32, %arg1: f32):
%2 = arith.addf %arg0, %arg1 : f32
linalg.yield %2 : f32
} ins(%0, %other : tensor<16xf32>, tensor<16xf32>) outs(%output : tensor<16xf32>)
// After fusion:
%result = linalg.generic {
^bb0(%arg0: f16, %arg1: f32):
%1 = arith.extf %arg0 : f16 to f32
%2 = arith.addf %1, %arg1 : f32
linalg.yield %2 : f32
} ins(%input, %other : tensor<16xf16>, tensor<16xf32>) outs(%output : tensor<16xf32>)
Returns a handle to the fused operation (the second operation after modification).
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| first_op | PDL handle to an mlir::Operation * |
| second_op | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| fused_op | PDL handle to an mlir::Operation * |
transform.air.fuse_into_containing_op (transform::FuseIntoContainingMemrefOp)
Fuse a producer into a containing operation.
Syntax:
operation ::= `transform.air.fuse_into_containing_op` $producer_op `into` $containing_op attr-dict
Fuses the producer_op into the containing_op.
Returns a handle to the fused ops.
The producer is a subview slice of a tiled op. This transform computes the accessed producer slice inside of the containing op (“tile and fuse”).
The containing op handle must be associated with exactly one payload op. The producer op handle may be associated with multiple payload ops. This transform fuses exactly one producer.
If the producer could not be fused, this operation fails silently. This is the case when tiling fails or when the producer op has zero uses within the containing op. I.e., “producers” that are not consumed within the containing op are rejected by this operation.
This operation reads and frees the producer handle. This operation reads the containing op handle.
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| producer_op | PDL handle to an mlir::Operation * |
| containing_op | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| fused_op | PDL handle to an mlir::Operation * |
transform.air.fuse_truncf_linalg (transform::FuseTruncfLinalgOp)
Fuse a linalg operation containing only arith.truncf into its producer
Syntax:
operation ::= `transform.air.fuse_truncf_linalg` $truncf_op `,` $producer_op attr-dict
This transform fuses two linalg operations where:
The fusion is performed by:
This optimization folds the arithmetic truncations into the producer linalg ops, enabling the use of native intrinsics on narrower datatypes on targets such as AMD AIEs and reducing intermediate memory storage requirements.
Example:
// Before fusion:
%0 = linalg.generic {
^bb0(%arg0: f32, %arg1: f32):
%1 = arith.addf %arg0, %arg1 : f32
linalg.yield %1 : f32
} ins(%input1, %input2 : tensor<16xf32>, tensor<16xf32>) outs(%temp : tensor<16xf32>)
%result = linalg.generic {
^bb0(%arg0: f32):
%2 = arith.truncf %arg0 : f32 to f16
linalg.yield %2 : f16
} ins(%0 : tensor<16xf32>) outs(%output : tensor<16xf16>)
// After fusion:
%result = linalg.generic {
^bb0(%arg0: f32, %arg1: f32):
%1 = arith.addf %arg0, %arg1 : f32
%2 = arith.truncf %1 : f32 to f16
linalg.yield %2 : f16
} ins(%input1, %input2 : tensor<16xf32>, tensor<16xf32>) outs(%output : tensor<16xf16>)
Returns a handle to the fused operation (the producer operation after modification).
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| truncf_op | PDL handle to an mlir::Operation * |
| producer_op | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| fused_op | PDL handle to an mlir::Operation * |
transform.air.get_segment_for (transform::GetSegmentForOp)
Gets a handle to the parent ‘air.segment’ of the given operation
Syntax:
operation ::= `transform.air.get_segment_for` $target attr-dict
Produces a handle to the parent air.segment op for each payload IR
operation associated with the operand. Fails if a segment cannot be found.
The list of operations associated with the handle contains
parent operations in the same order as the list associated with the operand,
except for operations that are parents to more than one input which are only
present once.
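The ordering rule for the result handle amounts to an order-preserving de-duplication. A minimal Python sketch of that rule follows; the `parent_of` mapping and the operation names are hypothetical stand-ins for the payload IR.

```python
def parents_in_order(ops, parent_of):
    """Return each op's enclosing segment, first occurrence only, in input order.

    `parent_of` maps an op name to the name of its enclosing segment.
    A missing entry models an op with no enclosing segment, which is an error.
    """
    seen = set()
    parents = []
    for op in ops:
        parent = parent_of.get(op)
        if parent is None:
            raise LookupError(f"no enclosing segment for {op!r}")
        if parent not in seen:
            seen.add(parent)
            parents.append(parent)
    return parents


parent_of = {"copy_a": "segment_0", "matmul": "segment_1", "copy_b": "segment_0"}
print(parents_in_order(["copy_a", "matmul", "copy_b"], parent_of))
```

Here `segment_0` is the parent of two inputs but appears only once in the result, at the position of its first occurrence.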
Traits: NavigationTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| parent | PDL handle to an mlir::Operation * |
transform.air.hoist_cast_pair (transform::HoistCastPairOp)
Hoist extension/truncation operation pairs out of a loop
Syntax:
operation ::= `transform.air.hoist_cast_pair` $extension_op `,` $truncation_op `,` $loop_op attr-dict
This transform takes handles to an extension operation (arith.extsi, arith.extui, or arith.extf), a truncation operation (arith.trunci or arith.truncf), and their parent scf.for loop. It hoists the extension/truncation pair out of the loop by:
Supports the following extension/truncation pairs:
This optimization is beneficial when accumulator values are repeatedly extended to a wider type for computation and then truncated back to a narrow type at each iteration. By keeping the accumulator in the wide type throughout all loop iterations, we eliminate redundant extend/truncate operations.
Example (Integer):
// Before:
%init = ... : vector<64xi16>
%result = scf.for %i = %c0 to %c4 step %c1 iter_args(%arg = %init) -> (vector<64xi16>) {
%arg_shaped = vector.shape_cast %arg : vector<64xi16> to vector<1x1x8x8xi16>
%arg_ext = arith.extsi %arg_shaped : vector<1x1x8x8xi16> to vector<1x1x8x8xi32>
// ... computation using %arg_ext ...
%result_i32 = vector.contract ... : ... into vector<1x1x8x8xi32>
%result_i16 = arith.trunci %result_i32 : vector<1x1x8x8xi32> to vector<1x1x8x8xi16>
%result_flat = vector.shape_cast %result_i16 : vector<1x1x8x8xi16> to vector<64xi16>
scf.yield %result_flat : vector<64xi16>
}
// After:
%init = ... : vector<64xi16>
%init_shaped = vector.shape_cast %init : vector<64xi16> to vector<1x1x8x8xi16>
%init_ext = arith.extsi %init_shaped : vector<1x1x8x8xi16> to vector<1x1x8x8xi32>
%init_flat = vector.shape_cast %init_ext : vector<1x1x8x8xi32> to vector<64xi32>
%result_i32 = scf.for %i = %c0 to %c4 step %c1 iter_args(%arg = %init_flat) -> (vector<64xi32>) {
%arg_shaped = vector.shape_cast %arg : vector<64xi32> to vector<1x1x8x8xi32>
// ... computation using %arg_shaped directly (no extsi needed) ...
%acc_i32 = vector.contract ... : ... into vector<1x1x8x8xi32>
%result_flat = vector.shape_cast %acc_i32 : vector<1x1x8x8xi32> to vector<64xi32>
scf.yield %result_flat : vector<64xi32>
}
%result_shaped = vector.shape_cast %result_i32 : vector<64xi32> to vector<1x1x8x8xi32>
%result_i16 = arith.trunci %result_shaped : vector<1x1x8x8xi32> to vector<1x1x8x8xi16>
%result = vector.shape_cast %result_i16 : vector<1x1x8x8xi16> to vector<64xi16>
Example (Floating-point):
// Before:
%init = ... : vector<64xbf16>
%result = scf.for %i = %c0 to %c4 step %c1 iter_args(%arg = %init) -> (vector<64xbf16>) {
%arg_ext = arith.extf %arg : vector<64xbf16> to vector<64xf32>
// ... computation using %arg_ext ...
%result_f32 = vector.fma ... : vector<64xf32>
%result_bf16 = arith.truncf %result_f32 : vector<64xf32> to vector<64xbf16>
scf.yield %result_bf16 : vector<64xbf16>
}
// After:
%init = ... : vector<64xbf16>
%init_ext = arith.extf %init : vector<64xbf16> to vector<64xf32>
%result_f32 = scf.for %i = %c0 to %c4 step %c1 iter_args(%arg = %init_ext) -> (vector<64xf32>) {
// ... computation using %arg directly (no extf needed) ...
%acc_f32 = vector.fma ... : vector<64xf32>
scf.yield %acc_f32 : vector<64xf32>
}
%result = arith.truncf %result_f32 : vector<64xf32> to vector<64xbf16>
Requirements:
Returns a handle to the transformed loop.
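For the integer case, the two loop forms can be compared with ordinary arithmetic. The Python sketch below is a model under assumptions, not the transform itself: the loop body is a plain addition, `trunc_i16` is a hypothetical helper that models arith.trunci to i16 with two's-complement wrap-around, and Python's unbounded integers stand in for the wide type. It covers only the integer pairs.

```python
def trunc_i16(x):
    """Keep the low 16 bits and reinterpret them as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def accumulate_narrow(init, updates):
    """Extend and truncate on every iteration (the 'before' form)."""
    acc = init                  # i16 accumulator carried by the loop
    for u in updates:
        wide = acc              # arith.extsi: same value, wider type
        wide = wide + u         # computation in the wide type
        acc = trunc_i16(wide)   # arith.trunci back to i16
    return acc


def accumulate_wide(init, updates):
    """Extend once before the loop and truncate once after it (the 'after' form)."""
    acc = init
    for u in updates:
        acc = acc + u
    return trunc_i16(acc)


updates = [5000, 7000, -100]
print(accumulate_narrow(30000, updates), accumulate_wide(30000, updates))
```

Both forms give the same i16 result here even though the intermediate sum leaves the i16 range, because truncation only discards high bits that addition never feeds back into the low ones.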
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectOpInterface, MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| extension_op | PDL handle to an mlir::Operation * |
| truncation_op | PDL handle to an mlir::Operation * |
| loop_op | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.hoist_loop_invariant_transfers (transform::HoistLoopInvariantTransfersOp)
Hoist a pair of loop-invariant vector.transfer_read/write operations
Syntax:
operation ::= `transform.air.hoist_loop_invariant_transfers` $read_op `,` $write_op `,` $loop_op attr-dict
This transform takes handles to a vector.transfer_read, a vector.transfer_write, and their parent scf.for loop. If both operations have loop-invariant indices and operate on the same memref, it hoists them outside the loop along with any operations needed to compute their operands (like affine.apply operations).
The read is hoisted before the loop, and the write is hoisted after the loop. All necessary operand-producing operations (constants, affine.apply, etc.) are also hoisted to maintain SSA dominance.
Example:
// Before:
scf.for %i = %c0 to %c4 step %c1 {
%idx = affine.apply #map()[%x]
%val = vector.transfer_read %A[%x, %idx], %pad : memref<8x8xi32>, vector<4xi32>
// ... computation using %val ...
%result = ... // some computation
vector.transfer_write %result, %A[%x, %idx] : vector<4xi32>, memref<8x8xi32>
}
// After:
%idx = affine.apply #map()[%x]
%val = vector.transfer_read %A[%x, %idx], %pad : memref<8x8xi32>, vector<4xi32>
scf.for %i = %c0 to %c4 step %c1 {
// ... computation using %val ...
%result = ... // some computation
}
vector.transfer_write %result, %A[%x, %idx] : vector<4xi32>, memref<8x8xi32>
Requirements:
Returns a handle to the transformed loop.
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectOpInterface, MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| read_op | PDL handle to an mlir::Operation * |
| write_op | PDL handle to an mlir::Operation * |
| loop_op | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.hoist_vector_transfer_pointers (transform::HoistVectorTransferPointersOp)
Optimize vector transfers by hoisting pointer computations out of loops
Syntax:
operation ::= `transform.air.hoist_vector_transfer_pointers` $target attr-dict
This transform takes a handle to an scf.for loop and optimizes vector transfer operations (vector.transfer_read and vector.transfer_write) inside the loop by:
This optimization converts expensive multi-dimensional address calculations inside loops into simple “pointer + constant” arithmetic with iter_args, which is particularly beneficial for hardware accelerators with limited address computation capabilities.
Example with IV-dependent indices:
// Before:
scf.for %i = %c0 to %c8 step %c1 {
%val = vector.transfer_read %mem[%c0, %i], %pad
: memref<32x32xi16>, vector<8x8xi16>
// ... computation ...
vector.transfer_write %result, %mem[%c0, %i]
: vector<8x8xi16>, memref<32x32xi16>
}
// After:
%flat_mem = memref.collapse_shape %mem [[0, 1]] : memref<32x32xi16> into memref<1024xi16>
%base_ptr = affine.apply affine_map<(d0, d1) -> (d0 * 32 + d1)>(%c0, %c0)
%stride = arith.constant 1 : index
scf.for %i = %c0 to %c8 step %c1 iter_args(%ptr = %base_ptr) -> (index) {
%val_1d = vector.transfer_read %flat_mem[%ptr], %pad : memref<1024xi16>, vector<64xi16>
%val = vector.shape_cast %val_1d : vector<64xi16> to vector<8x8xi16>
// ... computation ...
%result_1d = vector.shape_cast %result : vector<8x8xi16> to vector<64xi16>
vector.transfer_write %result_1d, %flat_mem[%ptr] : vector<64xi16>, memref<1024xi16>
%next_ptr = arith.addi %ptr, %stride : index
scf.yield %next_ptr : index
}
Requirements:
Returns a handle to the transformed loop.
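The "pointer + constant" idea can be checked with plain arithmetic. The following Python sketch is illustrative only: it uses the row-major linearization `d0 * 32 + d1` from the example's affine map and compares recomputing the flat address on every iteration with carrying a pointer that is incremented by a constant stride. The function names are hypothetical.

```python
def linearize(row, col, num_cols=32):
    """Row-major flat index into a memref with `num_cols` columns."""
    return row * num_cols + col


def addresses_recomputed(trip_count):
    """Flat address of %mem[0, i], recomputed inside every iteration."""
    return [linearize(0, i) for i in range(trip_count)]


def addresses_with_pointer(trip_count, stride=1):
    """Same addresses, carried as a loop argument and bumped by a constant."""
    ptr = linearize(0, 0)  # base pointer, computed once before the loop
    out = []
    for _ in range(trip_count):
        out.append(ptr)
        ptr += stride      # models arith.addi %ptr, %stride
    return out


print(addresses_recomputed(8))
print(addresses_with_pointer(8))
```

The stride equals the change in flat address per loop step: an induction variable used in the row position of this memref would give a stride of 32 instead of 1.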
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.linalg_promote (transform::LinalgPromoteOp)
Syntax:
operation ::= `transform.air.linalg_promote` $target attr-dict
Promotes the specified operands of the target into a separate memory buffer
using the mlir::linalg::promoteSubViews utility.
This operation applies to Linalg ops that satisfy the
mlir::linalg::promoteSubviewsPrecondition, otherwise it fails.
When successful, several optimization passes are run on the resulting IR.
The return handle points to the target operation that was modified
in place.
The operation accepts as attributes the fields in
mlir::linalg::LinalgPromotionOptions. In addition, the memory space of the
allocated buffers can be specified with the memory_space attribute as
“L1”, “L2” or “L3”. The default memory space is L1.
Example:
%0 = transform.structured.match ops{["linalg.matmul"]} in %code : (!pdl.operation) -> !pdl.operation
%1 = transform.air.linalg_promote %0 {memory_space="L2", operands_to_promote=[0]}
The group_size attribute is used to apply promotion to multiple
linalg ops. When group_size=N, the operands_to_promote attribute refers to
N payload operations at a time and the operand indices apply to the
operands of the N operations in the order they appear in the target handle.
For example,
%m = transform.structured.match ops{["linalg.matmul"]} in %f : (!pdl.operation) -> !pdl.operation
%fill = transform.structured.match ops{["linalg.fill"]} in %f : (!pdl.operation) -> !pdl.operation
%h = transform.merge_handles %fill, %m : !pdl.operation
// promote the input of the fill operation and the output of the matmul operation to L1 memory
transform.air.linalg_promote %h {"group_size"=2, "operands_to_promote"=[1,4], "memory_space"="L1"}
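One way to read the group indexing is as a running count over the operands of the grouped ops. The Python sketch below illustrates that reading; it is an assumption about the numbering, not a description of the implementation, and `resolve_group_operands` is a hypothetical helper. The operand counts used are those of linalg.fill (2) and linalg.matmul (3).

```python
def resolve_group_operands(operand_counts, operands_to_promote):
    """Map flat operand indices over a group to (op position, local index).

    `operand_counts[k]` is the number of operands of the k-th op in the group,
    in the order the ops appear in the target handle.
    """
    resolved = []
    for flat in operands_to_promote:
        remaining = flat
        for op_pos, count in enumerate(operand_counts):
            if remaining < count:
                resolved.append((op_pos, remaining))
                break
            remaining -= count
        else:
            raise IndexError(f"operand index {flat} is out of range for the group")
    return resolved


# Group of two ops: a linalg.fill (2 operands) followed by a linalg.matmul (3 operands).
print(resolve_group_operands([2, 3], [1, 4]))
```

Under this reading, index 1 refers to operand 1 of the first op in the group and index 4 to operand 2 of the second op.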
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| operands_to_promote | ::mlir::ArrayAttr | 64-bit integer array attribute |
| group_size | ::mlir::IntegerAttr | 64-bit signless integer attribute |
| use_full_tile_buffers | ::mlir::ArrayAttr | 1-bit boolean array attribute |
| use_full_tiles_by_default | ::mlir::UnitAttr | unit attribute |
| use_alloca | ::mlir::UnitAttr | unit attribute |
| alignment | ::mlir::IntegerAttr | 64-bit signless integer attribute |
| memory_space | ::mlir::StringAttr | string attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| transformed | PDL handle to an mlir::Operation * |
transform.air.linalg_tile (transform::LinalgTileOp)
Tile a linalg operation with the given sizes. The new linalg
operation and the generated loop are returned. Tiling is
performed with transform::tileToForallOpImpl so that an
scf.forall loop is generated whenever possible.
This is a variant of transform.structured.tile_using_forall.
Interfaces: MemoryEffectOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| static_sizes | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |
| dynamic_sizes | variadic of PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| tiled_linalg_op | PDL handle to an mlir::Operation * |
| loops | PDL handle to an mlir::Operation * |
transform.air.linalg_to_library_call (transform::LinalgToLibraryCallOp)
Convert a linalg op to a function call (library call)
Syntax:
operation ::= `transform.air.linalg_to_library_call` $target attr-dict `:` functional-type(operands, results)
Replaces a linalg op with a call to a function. If the function_name
attribute is provided, it is used as the function name. Otherwise, the
linalg op’s library_call attribute is used. The function is created if
it does not exist. If the link_with attribute is provided, it is used
to link the function call to a prebuilt object that contains the
implementation of the function. If the linalg op is inside a herd, the
link_with attribute is propagated to the herd.
Example:
%matmul = transform.structured.match ops{["linalg.matmul"]} in %f : (!pdl.operation) -> !pdl.operation
%call = transform.air.linalg_to_library_call %matmul { function_name = "my_matmul", link_with = "extern_func.o" } : (!pdl.operation) -> !pdl.operation
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| function_name | ::mlir::StringAttr | string attribute |
| link_with | ::mlir::StringAttr | string attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.par_to_herd (transform::ParToHerdOp)
Syntax:
operation ::= `transform.air.par_to_herd` $target attr-dict
Transform an scf.parallel operation into an air.herd operation.
If the scf.parallel operation has more than two dimensions, then only
the last two are used and a new scf.parallel is created outside of the
herd. Returns the new air.herd operation.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| first_dim | ::mlir::IntegerAttr | 64-bit signless integer attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.par_to_launch (transform::ParToLaunchOp)
Syntax:
operation ::= `transform.air.par_to_launch` $target attr-dict
Transform an scf.parallel operation into an air.launch operation.
Returns the new air.launch operation.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| has_air_segment | ::mlir::BoolAttr | bool attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.par_to_segment (transform::ParToSegmentOp)
Syntax:
operation ::= `transform.air.par_to_segment` $target attr-dict
Transform an scf.parallel operation into an air.segment operation.
Returns the new air.segment operation.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| has_air_segment | ::mlir::BoolAttr | bool attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.pipeline_reduce (transform::PipelineReduceOp)
Syntax:
operation ::= `transform.air.pipeline_reduce` $target attr-dict
Experimental
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| tile_size | ::mlir::ArrayAttr | 64-bit integer array attribute |
| pipeline_depth | ::mlir::IntegerAttr | 64-bit signless integer attribute |
| direction | ::mlir::StringAttr | string attribute |
| promote | ::mlir::UnitAttr | unit attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.remove_uninitialized_copy (transform::RemoveUninitializedCopyOp)
Remove copy operations that copy from uninitialized memrefs
Syntax:
operation ::= `transform.air.remove_uninitialized_copy` $target attr-dict
This transform walks through a func.func operation and identifies memref.copy and linalg.copy operations where the source is an uninitialized memref (allocated but not written to). Such copy operations are erased as they copy undefined data.
The transform detects the pattern where:
Returns a handle to the modified function.
Examples:
// memref.copy case
%alloc = memref.alloc() : memref<2x16x8xi32, 1>
%subview = memref.subview %alloc[0, 0, 0] [1, 16, 8] [1, 1, 1] : ...
%target = memref.alloc() : memref<1x16x8xi32, 2>
memref.copy %subview, %target // <- This copy will be erased
// linalg.copy case
%alloc2 = memref.alloc() : memref<16x8xi32, 1>
%target2 = memref.alloc() : memref<16x8xi32, 2>
linalg.copy ins(%alloc2 : memref<16x8xi32, 1>) outs(%target2 : memref<16x8xi32, 2>) // <- This copy will be erased
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.segment_to_aie (transform::SegmentToAIEOp)
Syntax:
operation ::= `transform.air.segment_to_aie` $target attr-dict
Lower air.segment operations to mlir-aie modules.
Traits: FunctionalStyleTransformOpTrait, TransformEachOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| transformed | PDL handle to an mlir::Operation * |
transform.air.transpose_reduce (transform::TransposeReduceOp)
Transpose inputs of linalg.reduce ops to make reduction dimensions innermost
Syntax:
operation ::= `transform.air.transpose_reduce` $target attr-dict
This transform takes a handle to linalg.reduce operations and checks if the reduction dimensions are at the innermost (last/lowest) dimensions. If any reduction dimension has non-reduction dimensions to the right, it transposes the corresponding inputs to ensure all reduction dimensions are innermost.
For example, if a linalg.reduce operation reduces along dimension 1 in a 3D tensor (shape [M, N, K] reducing along N), this transform will transpose the input to [M, K, N] so that the reduction dimension N becomes innermost.
This optimization is beneficial for hardware accelerators that perform more efficient reductions when the reduction dimensions are contiguous and innermost.
The transformation:
Returns a handle to the transformed linalg.reduce operations.
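The permutation that makes the reduction dimensions innermost can be worked out with a few lines of Python. This is a sketch under one assumption: the relative order of the non-reduction dimensions, and of the reduction dimensions, is kept. The helper names are hypothetical.

```python
def reduction_innermost_permutation(rank, reduction_dims):
    """Permutation placing all reduction dimensions after the other dimensions."""
    reduced = sorted(set(reduction_dims))
    kept = [d for d in range(rank) if d not in reduced]
    return kept + reduced


def needs_transpose(rank, reduction_dims):
    """True when some reduction dimension has a non-reduction dimension to its right."""
    return reduction_innermost_permutation(rank, reduction_dims) != list(range(rank))


def permute(shape, perm):
    """Apply a permutation to a shape."""
    return [shape[p] for p in perm]


perm = reduction_innermost_permutation(3, [1])
print(perm, permute(["M", "N", "K"], perm))
```

For the [M, N, K] case from the text, reducing along dimension 1 yields the permutation [0, 2, 1] and the transposed shape [M, K, N]; a reduction along dimension 2 is already innermost and needs no transpose.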
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |
transform.air.vector_type_cast (transform::VectorTypeCastOp)
Cast vector operands and results of vector operations to a user-provided datatype
Syntax:
operation ::= `transform.air.vector_type_cast` $target attr-dict
This transform takes a handle to vector dialect operations and casts input operands and/or results of vector type to a user-provided datatype. By default, if none of input_indices or output_indices are specified, all vector operands and results are cast.
The transformation works by:
This optimization is useful for hardware accelerators that can perform vector operations natively on specific data types (e.g., bf16, f16) while maintaining compatibility with the original precision through selective casting.
Example 1 - Cast all inputs and outputs (default behavior):
// Before:
%result = vector.fma %a, %b, %c : vector<8xf32>
// After (with target_element_type = f16):
%a_cast = arith.truncf %a : vector<8xf32> to vector<8xf16>
%b_cast = arith.truncf %b : vector<8xf32> to vector<8xf16>
%c_cast = arith.truncf %c : vector<8xf32> to vector<8xf16>
%result_f16 = vector.fma %a_cast, %b_cast, %c_cast : vector<8xf16>
%result = arith.extf %result_f16 : vector<8xf16> to vector<8xf32>
Example 2 - Cast only specific inputs:
// Before:
%result = vector.fma %a, %b, %c : vector<8xf32>
// After (with target_element_type = f16, input_indices = [0, 1]):
%a_cast = arith.truncf %a : vector<8xf32> to vector<8xf16>
%b_cast = arith.truncf %b : vector<8xf32> to vector<8xf16>
%result_f16 = vector.fma %a_cast, %b_cast, %c : vector<8xf16, f32, f32>
%result = arith.extf %result_f16 : vector<8xf16> to vector<8xf32>
Example 3 - Cast only outputs:
// Transform only the output
transform.air.vector_type_cast %op {
target_element_type = f16,
output_indices = [0]
}
Attributes:
Returns a handle to the modified operations containing the transformed vector operations.
Traits: FunctionalStyleTransformOpTrait
Interfaces: MemoryEffectsOpInterface, TransformOpInterface
| Attribute | MLIR Type | Description |
|---|---|---|
| target_element_type | ::mlir::TypeAttr | any type attribute |
| input_indices | ::mlir::ArrayAttr | 64-bit integer array attribute |
| output_indices | ::mlir::ArrayAttr | 64-bit integer array attribute |

| Operand | Description |
|---|---|
| target | PDL handle to an mlir::Operation * |

| Result | Description |
|---|---|
| result | PDL handle to an mlir::Operation * |