-aievec-split-load-ups-chains

Split vector.load + aievec.ups chains to reduce shuffle operations
This pass optimizes chains in which a vector.load feeds aievec.ups operations on AIE2p targets. Without the pass, a 1024-bit vector is loaded and then shuffled into two halves for separate UPS operations, which takes 3 shuffles in total. The pass instead splits both the load and the UPS into two 512-bit halves, so only 1 shuffle is needed, to concatenate the two results.
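The shuffle counts above can be checked with a small toy model. The sketch below is not the pass implementation and does not use any mlir-aie API. It models a vector as a Python list and makes several assumptions that are not stated in the pass description: 16-bit lanes (so 64 lanes make 1024 bits), the 3 shuffles being two half extractions plus one concatenation, and UPS behaving as a per-lane left shift. The names `ups`, `unsplit_chain`, and `split_chain` are hypothetical.

```python
# Toy model only: a "vector" is a list of lane values.
LANES = 64         # assumed: 64 x 16-bit lanes = 1024 bits
HALF = LANES // 2  # 32 x 16-bit lanes = 512 bits


def ups(lanes, shift=0):
    """Assumed model of an upshift: widen each lane and shift it left."""
    return [x << shift for x in lanes]


def unsplit_chain(memory, shift=0):
    """One 1024-bit load, then shuffles to feed two 512-bit UPS ops."""
    shuffles = 0
    full = memory[:LANES]  # single 1024-bit load
    lo = full[:HALF]       # shuffle: extract the low half
    shuffles += 1
    hi = full[HALF:]       # shuffle: extract the high half
    shuffles += 1
    result = ups(lo, shift) + ups(hi, shift)  # shuffle: concatenate
    shuffles += 1
    return result, shuffles


def split_chain(memory, shift=0):
    """Two 512-bit loads, each feeding its own UPS, then one concat."""
    shuffles = 0
    lo = memory[:HALF]       # first 512-bit load, no shuffle needed
    hi = memory[HALF:LANES]  # second 512-bit load, no shuffle needed
    result = ups(lo, shift) + ups(hi, shift)  # shuffle: concatenate
    shuffles += 1
    return result, shuffles
```

Under these assumptions both functions produce the same lanes for the same input, and the shuffle counters come out as 3 for the unsplit chain and 1 for the split chain, matching the description.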