‘air’ Dialect

The air dialect is used to describe cooperating arrays of AIE cores and the bulk data movement between AIE cores and a memory hierarchy. Its primary features are collective operations for executing work on groups of AIE cores and multi-dimensional DMA copy operations.

Herd

A herd in the air dialect is a one- or two-dimensional array of adjacent AIE cores executing the same code region. A herd operation defines an iteration space representing the spatial parallelism of the array of cores. Code within a herd can directly access L1 memory but must use DMA operations or channels to access other levels of memory.

Segment

A segment in the air dialect represents a physically contiguous grouping of AIE cores, L1 and L2 memory resources, and controllers sufficient to implement the herds, memory allocations, data movement, and synchronization contained in the segment code region. Code within a segment can allocate L2 memory but must use DMA operations or channels to access other levels of memory. A segment can optionally define an iteration space which represents a spatial unroll of the segment. That is, it allows the segment to be “stamped out” multiple times, also multiplying the physical resources required by the segment.

Launch

A launch in the air dialect groups segments and L3 allocations that must be co-resident within a device when execution of the launch code region begins. A launch operation can optionally define a parallel iteration space.
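The nesting of the three hierarchy levels can be sketched as follows. This is an illustrative fragment only: the `(...) in (...) args(...)` spelling for air.launch, air.segment, and air.herd is assumed from the dialect's custom assembly format, which this page does not reproduce, and the names @hierarchy, @segment_0, and @herd_0 are hypothetical.

```mlir
func.func @hierarchy(%arg0: memref<64x64xi32>) {
  %c2 = arith.constant 2 : index
  // Launch with a 2 x 2 parallel iteration space; %arg0 is an L3 buffer.
  air.launch (%lx, %ly) in (%lsx = %c2, %lsy = %c2) args(%l3 = %arg0) : memref<64x64xi32> {
    air.segment @segment_0 args(%s3 = %l3) : memref<64x64xi32> {
      // Regions are isolated from above, so constants are defined locally.
      %c2_0 = arith.constant 2 : index
      // A 2 x 2 herd: every core runs this region with its own (%tx, %ty).
      air.herd @herd_0 tile (%tx, %ty) in (%hsx = %c2_0, %hsy = %c2_0) args(%h3 = %s3) : memref<64x64xi32> {
        %c1 = arith.constant 1 : index
        %c16 = arith.constant 16 : index
        %c64 = arith.constant 64 : index
        // Each core selects a 16 x 16 tile from its herd coordinates.
        %row = arith.muli %tx, %c16 : index
        %col = arith.muli %ty, %c16 : index
        // L1 is directly accessible; L3 data must arrive through a DMA copy.
        %l1 = memref.alloc() : memref<16x16xi32, 2>
        air.dma_memcpy_nd (%l1[] [] [], %h3[%row, %col] [%c16, %c16] [%c64, %c1])
            : (memref<16x16xi32, 2>, memref<64x64xi32>)
        memref.dealloc %l1 : memref<16x16xi32, 2>
      }
    }
  }
  return
}
```

The terminators are omitted because each hierarchy operation carries the SingleBlockImplicitTerminator trait.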

Memory Hierarchy

There are three levels of memory in the air dialect:

  • L1 memory corresponds to the AIE tile memory.
  • L2 memory corresponds to pools of on-device memory such as URAMs, memory tiles, or other memory resources.
  • L3 memory corresponds to off-chip memory such as DDR.

Each memory space has an enum defined in the air::MemorySpace namespace.
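As a minimal sketch, assuming the memory space is encoded as the integer memory-space parameter of the memref type (using the MemorySpace enum values listed at the end of this page), buffers at each level could be written as:

```mlir
// L3 (value 0): off-chip memory. 0 is the default memref memory space,
// so it is normally left out of the type.
%l3 = memref.alloc() : memref<64x64xi32>
// L2 (value 1): on-device memory pools such as memory tiles.
%l2 = memref.alloc() : memref<32x32xi32, 1>
// L1 (value 2): AIE tile-local memory.
%l1 = memref.alloc() : memref<16x16xi32, 2>
```

Note that the numbering runs opposite to the level names: L1 has the largest value and L3 the smallest.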

DMA Memory Copies

The dma_memcpy_nd operation in the air dialect describes asynchronous multi-dimensional copy operations between levels of memory.

Channels

A channel in the air dialect is a representation of data movement…

Asynchronous operations

Operations in the air dialect can be asynchronous. An asynchronous token interface is provided to synchronize between operations.

Operations

air.channel (xilinx::air::ChannelOp)

Channel for data movement.

Syntax:

operation ::= `air.channel` $sym_name $size attr-dict

Operation to represent a communication channel as a point-to-point connection between two memrefs. The array following the channel name symbol gives the channel’s size in each dimension. With an empty size array, the size defaults to 1. The data movement mechanism that the channel uses is controlled by the channel_type attribute.

Channel Types

The channel_type attribute is a string that determines the mechanism used for data movement. Values are namespaced by backend: NPU (AIE) channels use the npu_ prefix; GPU channels use the gpu_ prefix.

NPU (AIE) channel types:

  • “npu_dma_stream” (default): Use DMA engines to send and receive data, with routing performed over a streaming interconnect.
  • “npu_dma_packet”: Use DMA engines to send and receive data, with routing performed over a packet-switched network.
  • “npu_cascade”: Use processor cores to send and receive data via cascade connections between adjacent tiles.
  • “npu_mmio”: Use host-side MMIO writes (e.g. aiex.npu.blockwrite) issued from the runtime sequence to deliver a constant payload directly into a tile-local L1 buffer. No DMA channel, no shim allocation, no flow is reserved. Verifier-enforced constraints on the put/get sites:
    • the put source memref must live in L3 (memory_space=0);
    • the get destination memref must live in L1 (memory_space=2).
    The lowering further requires the put source to be a constant memref.get_global. The consumer-side get lowers to a no-op because the L1 buffer is already populated when the core begins executing.

GPU channel types:

  • “gpu_symmetric_heap”: Cross-GPU messaging through the symmetric heap runtime (runtime_lib/airgpu/symmetric_heap.{h,cpp}). The channel must be enclosed by an air.rank op; the put/get sites use rank indices to address peer heaps. Lowering is planned for a future GPU pass (air-gpu-channel-to-mgpu) that expands put/get to peer-mapped mgpuMemcpy calls plus a barrier; at present only the IR surface and verifier rules exist.

Broadcasting

If a channel broadcasts to multiple destinations, the optional broadcast_shape attribute annotates the output sizes after broadcasting. Broadcasting follows NumPy’s broadcasting rules.

Example:

// An array of 4 x 4 streaming DMA channels (NPU)
air.channel @channel_0 [4, 4] {channel_type = "npu_dma_stream"}

// A streaming DMA channel broadcasting to 4 destinations (NPU)
air.channel @channel_1 [1, 1] {broadcast_shape = [1, 4], channel_type = "npu_dma_stream"}

// An array of 1 x 4 streaming DMA channels broadcasting to 4 x 4 destinations (NPU).
// Broadcasting follows NumPy's rules.
air.channel @channel_2 [1, 4] {broadcast_shape = [4, 4], channel_type = "npu_dma_stream"}

// A packet-switched DMA channel (NPU)
air.channel @channel_3 [] {channel_type = "npu_dma_packet"}

// A cascade channel using core-to-core cascade connections (NPU)
air.channel @channel_4 [] {channel_type = "npu_cascade"}

// An MMIO channel: the put writes a constant from host into L1 of each
// get's destination tile via runtime-sequence blockwrites (NPU)
air.channel @channel_5 [] {channel_type = "npu_mmio"}

// A cross-GPU channel through the symmetric heap (GPU). Must appear inside
// an air.rank scope; the indices on put/get encode the peer rank.
air.channel @channel_6 [] {channel_type = "gpu_symmetric_heap"}

Interfaces: Symbol

Attributes:

Attribute MLIR Type Description
sym_name ::mlir::StringAttr string attribute
size ::mlir::ArrayAttr 64-bit integer array attribute
channel_type ::mlir::StringAttr string attribute

air.channel.get (xilinx::air::ChannelGetOp)

Get for air channels.

Syntax:

operation ::= `air.channel.get` custom<AsyncDependencies>(type($async_token), $async_dependencies)
              $chan_name `[` ($indices^)? `]`
              `(` $dst `[` ($dst_offsets^)? `]``[` ($dst_sizes^)? `]``[` ($dst_strides^)? `]` `)` attr-dict `:`
              `(` type($dst) `)`

The air.channel.get operation represents a pull (receive) operation that copies data from a specified channel into a destination memref.

This operation models one-way data movement from a channel endpoint into memory, enabling asynchronous communication where data previously sent by a corresponding air.channel.put becomes available to the consumer.

Semantics

  • The destination buffer is specified by the dst memref, along with its associated dst_offsets, dst_sizes, and dst_strides which describe the subview being written to.
  • The channel being read is identified by the symbol referenced by chan_name.
  • The channel must have been declared earlier via an air.channel operation.
  • The operation may be asynchronous: if an async token is produced, it can be used to synchronize with subsequent dependent operations.
  • When chan_name references an array of channels, indices selects the specific channel the operation acts on.
  • Optionally, pad_before and pad_after specify constant zero-padding to apply per dimension during the DMA transfer. This maps to hardware DMA buffer descriptor padding on AIE memtile DMAs.

Interfaces

  • Implements air_AsyncOpInterface, enabling participation in async dependency chains.
  • Implements air_MemcpyInterface, allowing it to behave like a DMA/memcpy operation.
  • Implements air_ChannelInterface, allowing inspection of channel properties.

Example

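// Receive a 4x4 tile into %dst from channel @chan_0.
// The index list after the channel name is required even when empty.
// Strides are in elements: 16 between rows of the 16x16 memref, 1 between columns.
air.channel.get @chan_0[] (%dst[%c0, %c0] [%c4, %c4] [%c16, %c1]) : (memref<16x16xf32>)

// Asynchronous get with dependency on %t1
%t2 = air.channel.get async [%t1] @chan_1[] (%dst[%c8, %c0] [%c4, %c4] [%c16, %c1]) : (memref<16x16xf32>)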

Traits: AttrSizedOperandSegments

Interfaces: TilingInterface, air_AsyncOpInterface, air_ChannelInterface, air_MemcpyInterface

Attributes:

Attribute MLIR Type Description
chan_name ::mlir::FlatSymbolRefAttr flat symbol reference attribute
pad_before ::mlir::DenseI32ArrayAttr i32 dense array attribute
pad_after ::mlir::DenseI32ArrayAttr i32 dense array attribute

Operands:

Operand Description
async_dependencies variadic of async token type
indices variadic of index
dst ranked or unranked memref of any type values
dst_offsets variadic of index
dst_sizes variadic of index
dst_strides variadic of index

Results:

Result Description
async_token async token type

air.channel.put (xilinx::air::ChannelPutOp)

Push for air channels.

Syntax:

operation ::= `air.channel.put` custom<AsyncDependencies>(type($async_token), $async_dependencies)
              $chan_name `[` ($indices^)? `]`
              `(` $src `[` ($src_offsets^)? `]``[` ($src_sizes^)? `]``[` ($src_strides^)? `]` `)` attr-dict `:`
              `(` type($src) `)`

The air.channel.put operation represents a push (send) operation that copies data from a source memref into a specified channel.

This operation models one-way data movement into a channel endpoint, enabling asynchronous communication between producer and consumer operations. It is typically paired with air.channel.get operations on the receiving side.

Semantics

  • The source data is specified by the src memref, along with its associated src_offsets, src_sizes, and src_strides which describe the subview being transferred.
  • The channel being targeted is identified by the symbol referenced by chan_name.
  • The channel must have been declared earlier via an air.channel operation.
  • The operation may be asynchronous: if an async token is produced, it can be used to synchronize with subsequent dependent operations.
  • When chan_name references an array of channels, indices selects the specific channel the operation acts on.
  • Optionally, pad_before and pad_after specify constant zero-padding to apply per dimension during the DMA transfer. This maps to hardware DMA buffer descriptor padding on AIE memtile DMAs.

Interfaces

  • Implements air_AsyncOpInterface, allowing it to participate in async dependency chains.
  • Implements air_MemcpyInterface, enabling it to behave like a DMA/memcpy operation.
  • Implements air_ChannelInterface, allowing inspection of channel properties.

Example

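// Send a 4x4 tile from %src into channel @chan_0.
// The index list after the channel name is required even when empty.
// Strides are in elements: 16 between rows of the 16x16 memref, 1 between columns.
air.channel.put @chan_0[] (%src[%c0, %c0] [%c4, %c4] [%c16, %c1]) : (memref<16x16xf32>)

// Asynchronous put with dependency on %t0
%t1 = air.channel.put async [%t0] @chan_1[] (%src[%c8, %c0] [%c4, %c4] [%c16, %c1]) : (memref<16x16xf32>)

// Put with padding: read 13 elements, pad 2 before and 1 after.
// Sizes are index operands, so 13 is passed as the SSA value %c13.
air.channel.put @chan_2[] (%src[%c0] [%c13] [%c1])
    {pad_before = array<i32: 2>, pad_after = array<i32: 1>} : (memref<16xi32>)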

Traits: AttrSizedOperandSegments

Interfaces: TilingInterface, air_AsyncOpInterface, air_ChannelInterface, air_MemcpyInterface

Attributes:

Attribute MLIR Type Description
chan_name ::mlir::FlatSymbolRefAttr flat symbol reference attribute
pad_before ::mlir::DenseI32ArrayAttr i32 dense array attribute
pad_after ::mlir::DenseI32ArrayAttr i32 dense array attribute

Operands:

Operand Description
async_dependencies variadic of async token type
indices variadic of index
src ranked or unranked memref of any type values
src_offsets variadic of index
src_sizes variadic of index
src_strides variadic of index

Results:

Result Description
async_token async token type

air.custom (xilinx::air::CustomOp)

A handle to a user-customized op

A placeholder operation for a user-customized op. Given a user-specified latency value, AIR Runner can simulate system-level performance with this op in place.

Traits: AttrSizedOperandSegments

Interfaces: air_AsyncOpInterface

Attributes:

Attribute MLIR Type Description
symbol ::mlir::SymbolRefAttr symbol reference attribute

Operands:

Operand Description
async_dependencies variadic of async token type
custom_operands variadic of any type

Results:

Result Description
async_token async token type

air.dma_memcpy_nd (xilinx::air::DmaMemcpyNdOp)

DMA operator

Syntax:

operation ::= `air.dma_memcpy_nd` custom<AsyncDependencies>(type($async_token), $async_dependencies)
              `(` $dst `[` ($dst_offsets^)? `]``[` ($dst_sizes^)? `]``[` ($dst_strides^)? `]` `,`
              $src `[` ($src_offsets^)? `]``[` ($src_sizes^)? `]``[` ($src_strides^)? `]` `)`  attr-dict `:`
              `(` type($dst) `,` type($src) `)`

N-dimensional strided bulk copy between two memrefs.
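The assembly format above can be illustrated with a small sketch. The function name @copy_tile and the buffer shapes are hypothetical, and leaving the offset, size, and stride lists empty is assumed here to select the whole memref.

```mlir
func.func @copy_tile(%src: memref<64x64xf32>, %dst: memref<16x16xf32, 2>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c16 = arith.constant 16 : index
  %c32 = arith.constant 32 : index
  %c64 = arith.constant 64 : index
  // Copy the 16 x 16 tile starting at row 32, column 0 of %src into %dst.
  // Source strides are in elements: 64 between rows, 1 between columns.
  air.dma_memcpy_nd (%dst[] [] [], %src[%c32, %c0] [%c16, %c16] [%c64, %c1])
      : (memref<16x16xf32, 2>, memref<64x64xf32>)
  // Asynchronous form: the result is a token that later operations can depend on.
  %t = air.dma_memcpy_nd async (%dst[] [] [], %src[%c0, %c0] [%c16, %c16] [%c64, %c1])
      : (memref<16x16xf32, 2>, memref<64x64xf32>)
  air.wait_all [%t]
  return
}
```

The destination always comes first, both in the operand list and in the trailing type list.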

Optional src_rank / dst_rank integer attributes name a peer rank in the enclosing air.rank scope. When present, the corresponding memref is interpreted as living on the named rank’s symmetric heap rather than on the local process. These attributes are only valid for air.symmetric-tagged memref allocations and require an enclosing air.rank. Lowering for these attributes is planned for a future GPU pass (air-cross-rank-dma-to-mgpu); at present only the IR surface and verifier rules exist.

Traits: AttrSizedOperandSegments

Interfaces: air_AsyncOpInterface, air_MemcpyInterface

Attributes:

Attribute MLIR Type Description
pad_before ::mlir::DenseI32ArrayAttr i32 dense array attribute
pad_after ::mlir::DenseI32ArrayAttr i32 dense array attribute
src_rank ::mlir::IntegerAttr 64-bit signless integer attribute
dst_rank ::mlir::IntegerAttr 64-bit signless integer attribute

Operands:

Operand Description
async_dependencies variadic of async token type
dst ranked or unranked memref of any type values
dst_offsets variadic of index
dst_sizes variadic of index
dst_strides variadic of index
src ranked or unranked memref of any type values
src_offsets variadic of index
src_sizes variadic of index
src_strides variadic of index

Results:

Result Description
async_token async token type

air.execute (xilinx::air::ExecuteOp)

Asynchronous code region

Syntax:

operation ::= `air.execute` (` ``[` $async_dependencies^ `]`)?
              (`->` `(` type($results)^ `)`)? regions attr-dict

Defines a code region to be dispatched asynchronously at runtime. All operations in the region must be executed sequentially.
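A minimal sketch of two chained regions, following the syntax above. The value names are hypothetical; the first result of air.execute is the async token and any further results are the values passed to air.execute_terminator.

```mlir
// Allocate an L1 buffer asynchronously. %buf may be used by operations
// that depend on %t_alloc.
%t_alloc, %buf = air.execute -> (memref<16x16xi32, 2>) {
  %alloc = memref.alloc() : memref<16x16xi32, 2>
  air.execute_terminator %alloc : memref<16x16xi32, 2>
}

// A second region that is dispatched only after the first has completed.
// It returns no values, so the terminator is left implicit.
%t_free = air.execute [%t_alloc] {
  memref.dealloc %buf : memref<16x16xi32, 2>
}
```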

Traits: SingleBlockImplicitTerminator<ExecuteTerminatorOp>, SingleBlock

Interfaces: MemoryEffectOpInterface, air_AsyncOpInterface

Operands:

Operand Description
async_dependencies variadic of async token type

Results:

Result Description
async_token async token type
results variadic of any type

air.execute_terminator (xilinx::air::ExecuteTerminatorOp)

Terminator for air execute.

Syntax:

operation ::= `air.execute_terminator` attr-dict ($results^ `:` type($results))?

A terminator operation for code regions that appear in the body of an air.execute operation. The operation takes a variable number of operands and produces no results. The number and types of the operands must match the signature of the air.execute that contains the operation.

Traits: AlwaysSpeculatableImplTrait, HasParent<ExecuteOp>, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

Operands:

Operand Description
results variadic of any type

air.herd (xilinx::air::HerdOp)

Herd

Define and run a 1D or 2D array of tiles as an AIR Herd.

Traits: AffineScope, AttrSizedOperandSegments, IsolatedFromAbove, SingleBlockImplicitTerminator<HerdTerminatorOp>, SingleBlock

Interfaces: RegionBranchOpInterface, air_AsyncOpInterface, air_HierarchyInterface

Attributes:

Attribute MLIR Type Description
sym_name ::mlir::StringAttr string attribute
link_with ::mlir::StringAttr string attribute

Operands:

Operand Description
async_dependencies variadic of async token type
sizes variadic of index
herd_operands variadic of any type

Results:

Result Description
async_token async token type

air.herd_terminator (xilinx::air::HerdTerminatorOp)

Terminator for air herd regions.

Syntax:

operation ::= `air.herd_terminator` attr-dict

A terminator operation for the body of air.herd operations. air.herd operations are not expected to return any value so the terminator takes no operands.

Traits: AlwaysSpeculatableImplTrait, HasParent<HerdOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

air.launch (xilinx::air::LaunchOp)

Launch

Traits: AffineScope, AttrSizedOperandSegments, IsolatedFromAbove, SingleBlockImplicitTerminator<LaunchTerminatorOp>, SingleBlock

Interfaces: RegionBranchOpInterface, air_AsyncOpInterface, air_HierarchyInterface

Attributes:

Attribute MLIR Type Description
sym_name ::mlir::StringAttr string attribute

Operands:

Operand Description
async_dependencies variadic of async token type
sizes variadic of index
launch_operands variadic of any type

Results:

Result Description
async_token async token type

air.launch_terminator (xilinx::air::LaunchTerminatorOp)

Terminator for air.launch.

Syntax:

operation ::= `air.launch_terminator` attr-dict

A terminator operation for the body of air.launch operations. air.launch operations are not expected to return any value so the terminator takes no operands.

Traits: AlwaysSpeculatableImplTrait, HasParent<LaunchOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

air.rank (xilinx::air::RankOp)

Multi-device rank

Represents a communicating world of rank instances, where each instance corresponds to a complete GPU device or a CPU host process. air.rank is the outermost hierarchy level, sitting above air.launch.

The operation defines an N-dimensional iteration space. Each point is a rank instance. The body is IsolatedFromAbove; values are passed via explicit kernel operands.

An optional universe operand of type !air.universe constrains the physical pool from which rank instances are scheduled.

Traits: AffineScope, AttrSizedOperandSegments, IsolatedFromAbove, SingleBlockImplicitTerminator<RankTerminatorOp>, SingleBlock

Interfaces: RegionBranchOpInterface, air_AsyncOpInterface, air_HierarchyInterface

Attributes:

Attribute MLIR Type Description
sym_name ::mlir::StringAttr string attribute

Operands:

Operand Description
async_dependencies variadic of async token type
universe universe type
sizes variadic of index
rank_operands variadic of any type

Results:

Result Description
async_token async token type

air.rank_terminator (xilinx::air::RankTerminatorOp)

Terminator for air.rank.

Syntax:

operation ::= `air.rank_terminator` attr-dict

A terminator operation for the body of air.rank operations. air.rank operations are not expected to return any value so the terminator takes no operands.

Traits: AlwaysSpeculatableImplTrait, HasParent<RankOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

air.segment (xilinx::air::SegmentOp)

Segment

Traits: AffineScope, AttrSizedOperandSegments, IsolatedFromAbove, SingleBlockImplicitTerminator<SegmentTerminatorOp>, SingleBlock

Interfaces: RegionBranchOpInterface, air_AsyncOpInterface, air_HierarchyInterface

Attributes:

Attribute MLIR Type Description
sym_name ::mlir::StringAttr string attribute

Operands:

Operand Description
async_dependencies variadic of async token type
sizes variadic of index
segment_operands variadic of any type

Results:

Result Description
async_token async token type

air.segment_terminator (xilinx::air::SegmentTerminatorOp)

Terminator for air segment regions.

Syntax:

operation ::= `air.segment_terminator` attr-dict

A terminator operation for the body of air.segment operations. air.segment operations are not expected to return any value so the terminator takes no operands.

Traits: AlwaysSpeculatableImplTrait, HasParent<SegmentOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

air.translate (xilinx::air::TranslateOp)

Re-express a symmetric-heap memref in another rank’s address space

Syntax:

operation ::= `air.translate` $source `,` $from_rank `,` $to_rank `,` $heap_bases
              attr-dict `:` type($source) `,` type($heap_bases)

Produces a memref of the same type as $source whose underlying pointer references the corresponding allocation on $to_rank. The $source memref is assumed to live on $from_rank’s symmetric heap. The translation is the pointer rebase

peer_va = bases[to_rank] + (source_ptr - bases[from_rank])

where $heap_bases is a 1-D memref of index-typed pointer values (per-rank symmetric-heap base addresses) obtained from the mgpuGetHeapBases() runtime hook. The host typically wraps the raw runtime pointer as a memref<?xindex> once and threads it through gpu.launch_func as a kernel argument. No data is moved; this op produces a value-level “view” of peer memory.

Folds to $source when $from_rank and $to_rank are statically equal.

Both ranks must address the same collective allocation on the symmetric heap (i.e. $source must trace back to a memref.alloc {air.symmetric}). Using this op outside that contract is undefined.
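As a sketch of the syntax and the rebase arithmetic, with hypothetical value names and addresses: suppose the heap bases are 0x1000 for rank 0 and 0x9000 for rank 1, and %local points at 0x1040 on rank 0. Translating from rank 0 to rank 1 then gives peer_va = 0x9000 + (0x1040 - 0x1000) = 0x9040.

```mlir
// %local : memref<1024xf32>, assumed to live on the symmetric heap of rank %me.
// %bases : memref<?xindex>, holding one heap base address per rank.
%peer = air.translate %local, %me, %other, %bases : memref<1024xf32>, memref<?xindex>
// %peer has the same type as %local but refers to rank %other's copy of
// the same collective allocation. No data has been moved.
```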

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand Description
source memref of any type values
from_rank index
to_rank index
heap_bases 1D memref of index values

Results:

Result Description
result memref of any type values

air.universe.alloc (xilinx::air::UniverseAllocOp)

Allocate a universe of devices

Syntax:

operation ::= `air.universe.alloc` `(` $capacity `)` attr-dict

Creates an !air.universe value representing a bounded pool of capacity devices or hosts. The universe value is consumed by air.rank to constrain the physical pool from which rank instances are scheduled.
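For instance, a pool of four devices could be requested as in the sketch below. The value names are hypothetical; per the syntax above, the result type is not spelled out in the assembly.

```mlir
%c4 = arith.constant 4 : index
// %u has type !air.universe and can be passed to air.rank as its universe operand.
%u = air.universe.alloc(%c4)
```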

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand Description
capacity index

Results:

Result Description
universe universe type

air.wait_all (xilinx::air::WaitAllOp)

Wait for all operator

Syntax:

operation ::= `air.wait_all` custom<AsyncDependencies>(type($async_token), $async_dependencies) attr-dict

Wait for all async tokens before proceeding.
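A short sketch of the two forms, assuming %t0 and %t1 are tokens produced by earlier asynchronous operations. The form without `async` produces no token and is assumed here to act as a blocking wait.

```mlir
// Join two outstanding tokens into a single token.
%t_all = air.wait_all async [%t0, %t1]

// No token is produced: wait here until %t_all has signaled.
air.wait_all [%t_all]
```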

Interfaces: air_AsyncOpInterface

Operands:

Operand Description
async_dependencies variadic of async token type

Results:

Result Description
async_token async token type

Attributes

SymmetricHeapMemorySpaceAttr

Symmetric-heap memory space (cross-rank XGMI-accessible HBM)

Syntax: #air.symmetric_heap

Type constraints

async token type

universe type

Enums

MemorySpace

AIR Memory Space IDs

Cases:

Symbol Value String
L1 2 L1
L2 1 L2
L3 0 L3