[llvm][ARM]Add widen global arrays pass #107120

nasherm · 2024-09-03T15:16:51Z

This pass optimizes memcpy calls by padding destinations and sources out to a full word so that the backend generates full-word loads instead of single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the destination when it is a stack-allocated, constant-size array, and the source when it is a constant array. The heuristic that decides whether to pad is very basic and could be improved so that more cases are padded.
The pass runs within GlobalOpt but is disabled by default on all targets except ARM.
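
As a rough illustration of the padding arithmetic described above, the following sketch rounds a copy size up to the next word and refuses to widen past the inlining limit. The helper names `bytesToPad` and `widenedCopySize` are hypothetical; the word size of 4 and the limit of 64 bytes are the constants used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Constants taken from the patch: pad to a 4-byte word and stay within the
// 64-byte limit below which memcpy is lowered to inline loads/stores.
constexpr std::uint64_t WordSize = 4;
constexpr std::uint64_t MemcpyInliningLimit = 64;

// Hypothetical helper: number of bytes needed to reach the next multiple of
// Boundary (0 if Size is already a multiple).
std::uint64_t bytesToPad(std::uint64_t Size, std::uint64_t Boundary) {
  assert(Boundary != 0 && "boundary must be non-zero");
  std::uint64_t Rem = Size % Boundary;
  return Rem == 0 ? 0 : Boundary - Rem;
}

// Hypothetical helper: the size the copy would have after widening, or the
// original size when widening is unnecessary or would exceed the limit.
std::uint64_t widenedCopySize(std::uint64_t NumBytesToCopy) {
  std::uint64_t Total = NumBytesToCopy + bytesToPad(NumBytesToCopy, WordSize);
  return Total > MemcpyInliningLimit ? NumBytesToCopy : Total;
}
```

Under these assumptions a 45-byte copy becomes 48 bytes, while a 65-byte copy is left alone because 68 bytes would exceed the limit, which matches the expectations in the tests added by the patch.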

llvmbot · 2024-09-03T15:17:27Z

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Nashe Mncube (nasherm)

Changes

This pass optimizes memcpy calls by padding destinations and sources out to a full word so that the ARM backend generates full-word loads instead of single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the destination when it is a stack-allocated, constant-size array, and the source when it is a constant string. The heuristic that decides whether to pad is very basic and could be improved so that more cases are padded.
The pass runs at the midend level instead of being added in the overridden method ARMPassConfig::addIRPasses(), because the passes added by addIRPasses run right at the end, just before the LLVM midend IR is lowered to SelectionDAG IR. The pass works better in the midend because other optimizations, such as dead code elimination, can run afterwards and delete the old, unreferenced global string that has been replaced with the padded version. Placing it in the midend also makes the tests easier to write, since opt can run midend-level passes. Nonetheless, the pass checks whether it is being run on code targeting an ARM triple; if not, it does not run.
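
The conditions under which the pass pads can be summarized as a single predicate. This is a minimal sketch rather than the patch's code: `MemcpyCandidate` and `isSafeToWiden` are hypothetical names, and the checks paraphrase those in ARMWidenStrings.cpp (static char-array alloca as destination, private constant char array as source, non-volatile copy, and matching sizes).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one memcpy call site; the field names are
// illustrative and do not appear in the patch.
struct MemcpyCandidate {
  bool DestIsStaticCharArrayAlloca; // fixed-size i8 array on the stack
  bool SrcIsPrivateConstCharArray;  // constant, local-linkage i8 array
  bool IsVolatile;
  std::uint64_t NumBytesToCopy;
  std::uint64_t DestSize;
  std::uint64_t SrcSize;
};

// Returns true only when padding both ends cannot change observable behavior
// under the conservative rules described above.
bool isSafeToWiden(const MemcpyCandidate &C) {
  if (!C.DestIsStaticCharArrayAlloca || !C.SrcIsPrivateConstCharArray)
    return false;
  if (C.IsVolatile)
    return false;
  // Only pad when the copy covers the whole source and the whole destination.
  return C.NumBytesToCopy == C.DestSize && C.DestSize == C.SrcSize;
}
```

A 17-byte copy into a 20-byte destination is rejected by the size check, which is the situation exercised by the lengths-dont-match test.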

Patch is 25.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107120.diff

13 Files Affected:

(added) llvm/include/llvm/Transforms/Scalar/ARMWidenStrings.h (+28)
(modified) llvm/lib/Passes/PassBuilder.cpp (+1)
(modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6)
(modified) llvm/lib/Passes/PassRegistry.def (+1)
(added) llvm/lib/Transforms/Scalar/ARMWidenStrings.cpp (+236)
(modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-lengths-dont-match.ll (+29)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-more-than-64-bytes.ll (+30)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-ptrtoint.ll (+47)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-struct-test.ll (+53)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-test1.ll (+28)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-test2.ll (+24)
(added) llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-volatile.ll (+30)

diff --git a/llvm/include/llvm/Transforms/Scalar/ARMWidenStrings.h b/llvm/include/llvm/Transforms/Scalar/ARMWidenStrings.h
new file mode 100755
index 00000000000000..d78f0219c03037
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Scalar/ARMWidenStrings.h
@@ -0,0 +1,28 @@
+//===- ARMWidenStrings.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file provides the interface for the ArmWidenStrings pass
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_SCALAR_ARMWIDENSTRINGS_H
+#define LLVM_TRANSFORMS_SCALAR_ARMWIDENSTRINGS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Module;
+
+struct ARMWidenStringsPass : PassInfoMixin<ARMWidenStringsPass> {
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_SCALAR_ARMWIDENSTRINGS_H
\ No newline at end of file
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 1df1449fce597c..6b989231cb9861 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -207,6 +207,7 @@
 #include "llvm/Transforms/Instrumentation/ThreadSanitizer.h"
 #include "llvm/Transforms/ObjCARC.h"
 #include "llvm/Transforms/Scalar/ADCE.h"
+#include "llvm/Transforms/Scalar/ARMWidenStrings.h"
 #include "llvm/Transforms/Scalar/AlignmentFromAssumptions.h"
 #include "llvm/Transforms/Scalar/AnnotationRemarks.h"
 #include "llvm/Transforms/Scalar/BDCE.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 9c3d49cabbd38c..b75612c410f07d 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -80,6 +80,7 @@
 #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Scalar/ADCE.h"
+#include "llvm/Transforms/Scalar/ARMWidenStrings.h"
 #include "llvm/Transforms/Scalar/AlignmentFromAssumptions.h"
 #include "llvm/Transforms/Scalar/AnnotationRemarks.h"
 #include "llvm/Transforms/Scalar/BDCE.h"
@@ -1513,6 +1514,11 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   // from the TargetLibraryInfo.
   OptimizePM.addPass(InjectTLIMappings());
 
+  bool IsARM = TM && TM->getTargetTriple().isARM();
+  // Optimizes memcpy by padding arrays to exploit alignment
+  if (IsARM && Level.getSizeLevel() == 0 && Level.getSpeedupLevel() > 1)
+    OptimizePM.addPass(ARMWidenStringsPass());
+
   addVectorPasses(Level, OptimizePM, /* IsFullLTO */ false);
 
   // LoopSink pass sinks instructions hoisted by LICM, which serves as a
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index d6067089c6b5c1..55566f43e5435d 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -489,6 +489,7 @@ FUNCTION_PASS("view-dom-only", DomOnlyViewer())
 FUNCTION_PASS("view-post-dom", PostDomViewer())
 FUNCTION_PASS("view-post-dom-only", PostDomOnlyViewer())
 FUNCTION_PASS("wasm-eh-prepare", WasmEHPreparePass())
+FUNCTION_PASS("arm-widen-strings", ARMWidenStringsPass())
 #undef FUNCTION_PASS
 
 #ifndef FUNCTION_PASS_WITH_PARAMS
diff --git a/llvm/lib/Transforms/Scalar/ARMWidenStrings.cpp b/llvm/lib/Transforms/Scalar/ARMWidenStrings.cpp
new file mode 100644
index 00000000000000..5a3c470861cf45
--- /dev/null
+++ b/llvm/lib/Transforms/Scalar/ARMWidenStrings.cpp
@@ -0,0 +1,236 @@
+// ARMWidenStrings.cpp - Widen strings to word boundaries to speed up
+// programs that use simple strcpy's with constant strings as source
+// and stack allocated array for destination.
+
+#define DEBUG_TYPE "arm-widen-strings"
+
+#include "llvm/Transforms/Scalar/ARMWidenStrings.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Operator.h"
+#include "llvm/IR/ValueSymbolTable.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/TargetParser/Triple.h"
+#include "llvm/Transforms/Scalar.h"
+
+using namespace llvm;
+
+cl::opt<bool> DisableARMWidenStrings("disable-arm-widen-strings");
+
+namespace {
+
+class ARMWidenStrings {
+public:
+  /*
+  Max number of bytes that memcpy allows for lowering to load/stores before it
+  uses library function (__aeabi_memcpy).  This is the same value returned by
+  ARMSubtarget::getMaxInlineSizeThreshold which I would have called in place of
+  the constant int but can't get access to the subtarget info class from the
+  midend.
+  */
+  const unsigned int MemcpyInliningLimit = 64;
+
+  bool run(Function &F);
+};
+
+static bool IsCharArray(Type *t) {
+  const unsigned int CHAR_BIT_SIZE = 8;
+  return t && t->isArrayTy() && t->getArrayElementType()->isIntegerTy() &&
+         t->getArrayElementType()->getIntegerBitWidth() == CHAR_BIT_SIZE;
+}
+
+bool ARMWidenStrings::run(Function &F) {
+  if (DisableARMWidenStrings) {
+    return false;
+  }
+
+  if (!Triple(F.getParent()->getTargetTriple()).isARM()) {
+    LLVM_DEBUG(
+        dbgs() << "Pass only runs on ARM as hasn't been benchmarked on other "
+                  "targets\n");
+    return false;
+  }
+  LLVM_DEBUG(dbgs() << "Running ARMWidenStrings on function " << F.getName()
+                    << "\n");
+
+  for (Function::iterator b = F.begin(); b != F.end(); ++b) {
+    for (BasicBlock::iterator i = b->begin(); i != b->end(); ++i) {
+      CallInst *CI = dyn_cast<CallInst>(i);
+      if (!CI) {
+        continue;
+      }
+
+      Function *CallMemcpy = CI->getCalledFunction();
+      // find out if the current call instruction is a call to llvm memcpy
+      // intrinsics
+      if (CallMemcpy == NULL || !CallMemcpy->isIntrinsic() ||
+          CallMemcpy->getIntrinsicID() != Intrinsic::memcpy) {
+        continue;
+      }
+
+      LLVM_DEBUG(dbgs() << "Found call to strcpy/memcpy:\n" << *CI << "\n");
+
+      auto *Alloca = dyn_cast<AllocaInst>(CI->getArgOperand(0));
+      auto *SourceVar = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
+      auto *BytesToCopy = dyn_cast<ConstantInt>(CI->getArgOperand(2));
+      auto *IsVolatile = dyn_cast<ConstantInt>(CI->getArgOperand(3));
+
+      if (!BytesToCopy) {
+        LLVM_DEBUG(dbgs() << "Number of bytes to copy is null\n");
+        continue;
+      }
+
+      uint64_t NumBytesToCopy = BytesToCopy->getZExtValue();
+
+      if (!Alloca) {
+        LLVM_DEBUG(dbgs() << "Destination isn't an alloca\n");
+        continue;
+      }
+
+      if (!SourceVar) {
+        LLVM_DEBUG(dbgs() << "Source isn't a global constant variable\n");
+        continue;
+      }
+
+      if (!IsVolatile || IsVolatile->isOne()) {
+        LLVM_DEBUG(
+            dbgs() << "Not widening strings for this memcpy because it's "
+                      "a volatile operation\n");
+        continue;
+      }
+
+      if (NumBytesToCopy % 4 == 0) {
+        LLVM_DEBUG(dbgs() << "Bytes to copy in strcpy/memcpy is already word "
+                             "aligned so nothing to do here.\n");
+        continue;
+      }
+
+      if (!SourceVar->hasInitializer() || !SourceVar->isConstant() ||
+          !SourceVar->hasLocalLinkage() || !SourceVar->hasGlobalUnnamedAddr()) {
+        LLVM_DEBUG(dbgs() << "Source is not constant global, thus it's "
+                             "mutable therefore it's not safe to pad\n");
+        continue;
+      }
+
+      ConstantDataArray *SourceDataArray =
+          dyn_cast<ConstantDataArray>(SourceVar->getInitializer());
+      if (!SourceDataArray || !IsCharArray(SourceDataArray->getType())) {
+        LLVM_DEBUG(dbgs() << "Source isn't a constant data array\n");
+        continue;
+      }
+
+      if (!Alloca->isStaticAlloca()) {
+        LLVM_DEBUG(dbgs() << "Destination allocation isn't a static "
+                             "constant which is locally allocated in this "
+                             "function, so skipping.\n");
+        continue;
+      }
+
+      // Make sure destination is definitely a char array.
+      if (!IsCharArray(Alloca->getAllocatedType())) {
+        LLVM_DEBUG(dbgs() << "Destination doesn't look like a constant char (8 "
+                             "bits) array\n");
+        continue;
+      }
+      LLVM_DEBUG(dbgs() << "With Alloca: " << *Alloca << "\n");
+
+      uint64_t DZSize = Alloca->getAllocatedType()->getArrayNumElements();
+      uint64_t SZSize = SourceDataArray->getType()->getNumElements();
+
+      // For safety purposes let's add a constraint and only pad when
+      // num bytes to copy == destination array size == source string
+      // which is a constant
+      LLVM_DEBUG(dbgs() << "Number of bytes to copy is: " << NumBytesToCopy
+                        << "\n");
+      LLVM_DEBUG(dbgs() << "Size of destination array is: " << DZSize << "\n");
+      LLVM_DEBUG(dbgs() << "Size of source array is: " << SZSize << "\n");
+      if (NumBytesToCopy != DZSize || DZSize != SZSize) {
+        LLVM_DEBUG(dbgs() << "Size of number of bytes to copy, destination "
+                             "array and source string don't match, so "
+                             "skipping\n");
+        continue;
+      }
+      LLVM_DEBUG(dbgs() << "Going to widen.\n");
+      unsigned int NumBytesToPad = 4 - (NumBytesToCopy % 4);
+      LLVM_DEBUG(dbgs() << "Number of bytes to pad by is " << NumBytesToPad
+                        << "\n");
+      unsigned int TotalBytes = NumBytesToCopy + NumBytesToPad;
+
+      if (TotalBytes > MemcpyInliningLimit) {
+        LLVM_DEBUG(
+            dbgs() << "Not going to pad because total number of bytes is "
+                   << TotalBytes
+                   << " which would be greater than the inlining "
+                      "limit for memcpy which is "
+                   << MemcpyInliningLimit << "\n");
+        continue;
+      }
+
+      // update destination char array to be word aligned (memcpy(X,...,...))
+      IRBuilder<> BuildAlloca(Alloca);
+      AllocaInst *NewAlloca = cast<AllocaInst>(BuildAlloca.CreateAlloca(
+          ArrayType::get(Alloca->getAllocatedType()->getArrayElementType(),
+                         NumBytesToCopy + NumBytesToPad)));
+      NewAlloca->takeName(Alloca);
+      NewAlloca->setAlignment(Alloca->getAlign());
+      Alloca->replaceAllUsesWith(NewAlloca);
+
+      LLVM_DEBUG(dbgs() << "Updating users of destination stack object to use "
+                        << "new size\n");
+
+      // update source to be word aligned (memcpy(...,X,...))
+      // create replacement string with padded null bytes.
+      StringRef Data = SourceDataArray->getRawDataValues();
+      std::vector<uint8_t> StrData(Data.begin(), Data.end());
+      for (unsigned int p = 0; p < NumBytesToPad; p++)
+        StrData.push_back('\0');
+      auto Arr = ArrayRef(StrData.data(), TotalBytes);
+
+      // create new padded version of global variable string.
+      Constant *SourceReplace = ConstantDataArray::get(F.getContext(), Arr);
+      GlobalVariable *NewGV = new GlobalVariable(
+          *F.getParent(), SourceReplace->getType(), true,
+          SourceVar->getLinkage(), SourceReplace, SourceReplace->getName());
+
+      // copy any other attributes from original global variable string
+      // e.g. unnamed_addr
+      NewGV->copyAttributesFrom(SourceVar);
+      NewGV->takeName(SourceVar);
+
+      // replace intrinsic source.
+      CI->setArgOperand(1, NewGV);
+
+      // Update number of bytes to copy (memcpy(...,...,X))
+      CI->setArgOperand(2,
+                        ConstantInt::get(BytesToCopy->getType(), TotalBytes));
+      LLVM_DEBUG(dbgs() << "Padded dest/source and increased number of bytes:\n"
+                        << *CI << "\n"
+                        << *NewAlloca << "\n");
+    }
+  }
+  return true;
+}
+
+} // end of anonymous namespace
+
+PreservedAnalyses ARMWidenStringsPass::run(Function &F,
+                                           FunctionAnalysisManager &AM) {
+  bool Changed = ARMWidenStrings().run(F);
+  if (!Changed)
+    return PreservedAnalyses::all();
+
+  PreservedAnalyses Preserved;
+  Preserved.preserveSet(CFGAnalyses::ID());
+  Preserved.preserve<LoopAnalysis>();
+  return Preserved;
+}
diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt
index 939a1457239567..a9607e4ebc6583 100644
--- a/llvm/lib/Transforms/Scalar/CMakeLists.txt
+++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt
@@ -2,6 +2,7 @@ add_llvm_component_library(LLVMScalarOpts
   ADCE.cpp
   AlignmentFromAssumptions.cpp
   AnnotationRemarks.cpp
+  ARMWidenStrings.cpp
   BDCE.cpp
   CallSiteSplitting.cpp
   ConstantHoisting.cpp
diff --git a/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-lengths-dont-match.ll b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-lengths-dont-match.ll
new file mode 100644
index 00000000000000..a34ddc2ae2a29a
--- /dev/null
+++ b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-lengths-dont-match.ll
@@ -0,0 +1,29 @@
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -O2 -S | FileCheck %s
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -passes="default<O2>" -S | FileCheck %s
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv6m-arm-none-eabi"
+
+; CHECK: [17 x i8]
+@.str = private unnamed_addr constant [17 x i8] c"aaaaaaaaaaaaaaaa\00", align 1
+
+; Function Attrs: nounwind
+define hidden void @foo() local_unnamed_addr #0 {
+entry:
+  %something = alloca [20 x i8], align 1
+  call void @llvm.lifetime.start(i64 20, ptr nonnull %something) #3
+  call void @llvm.memcpy.p0i8.p0i8.i32(ptr align 1 nonnull %something, ptr align 1 @.str, i32 17, i1 false)
+  %call2 = call i32 @bar(ptr nonnull %something) #3
+  call void @llvm.lifetime.end(i64 20, ptr nonnull %something) #3
+  ret void
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start(i64, ptr nocapture) #1
+
+declare i32 @bar(...) local_unnamed_addr #2
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end(i64, ptr nocapture) #1
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i32(ptr nocapture writeonly, ptr nocapture readonly, i32, i1) #1
diff --git a/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-more-than-64-bytes.ll b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-more-than-64-bytes.ll
new file mode 100644
index 00000000000000..15c196b62bc9b2
--- /dev/null
+++ b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-more-than-64-bytes.ll
@@ -0,0 +1,30 @@
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -O3 -S | FileCheck %s
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -passes="default<O3>" -S | FileCheck %s
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv6m-arm-none-eabi"
+
+; CHECK: [65 x i8]
+; CHECK-NOT: [68 x i8]
+@.str = private unnamed_addr constant [65 x i8] c"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzz\00", align 1
+
+; Function Attrs: nounwind
+define hidden void @foo() local_unnamed_addr #0 {
+entry:
+  %something = alloca [65 x i8], align 1
+  call void @llvm.lifetime.start(i64 65, ptr nonnull %something) #3
+  call void @llvm.memcpy.p0i8.p0i8.i32(ptr align 1 nonnull %something, ptr align 1 @.str, i32 65, i1 false)
+  %call2 = call i32 @bar(ptr nonnull %something) #3
+  call void @llvm.lifetime.end(i64 65, ptr nonnull %something) #3
+  ret void
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start(i64, ptr nocapture) #1
+
+declare i32 @bar(...) local_unnamed_addr #2
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end(i64, ptr nocapture) #1
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i32(ptr nocapture writeonly, ptr nocapture readonly, i32, i1) #1
diff --git a/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-ptrtoint.ll b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-ptrtoint.ll
new file mode 100644
index 00000000000000..b4cb1beee92535
--- /dev/null
+++ b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-ptrtoint.ll
@@ -0,0 +1,47 @@
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -O2 -S | FileCheck %s
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -passes="default<O2>" -S | FileCheck %s
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv7m-arm-none-eabi"
+
+; This test uses ptrtoint, but still should be handled correctly.
+; The [45 x i8] string should be optimised away (i.e unused)
+; CHECK: [48 x i8]
+; CHECK-NOT: [45 x i8]
+@f.string1 = private unnamed_addr constant [45 x i8] c"The quick brown dog jumps over the lazy fox.\00", align 1
+
+; Function Attrs: nounwind
+define hidden i32 @f() {
+entry:
+  %string1 = alloca [45 x i8], align 1
+  %pos = alloca i32, align 4
+  %token = alloca ptr, align 4
+  call void @llvm.lifetime.start.p0i8(i64 45, ptr %string1)
+  call void @llvm.memcpy.p0i8.p0i8.i32(ptr align 1 %string1, ptr align 1 @f.string1, i32 45, i1 false)
+  call void @llvm.lifetime.start.p0i8(i64 4, ptr %pos)
+  call void @llvm.lifetime.start.p0i8(i64 4, ptr %token)
+  %call = call ptr @strchr(ptr %string1, i32 101)
+  store ptr %call, ptr %token, align 4
+  %0 = load ptr, ptr %token, align 4
+  %sub.ptr.lhs.cast = ptrtoint ptr %0 to i32
+  %sub.ptr.rhs.cast = ptrtoint ptr %string1 to i32
+  %sub.ptr.sub = sub i32 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
+  %add = add nsw i32 %sub.ptr.sub, 1
+  store i32 %add, ptr %pos, align 4
+  %1 = load i32, ptr %pos, align 4
+  call void @llvm.lifetime.end.p0i8(i64 4, ptr %token)
+  call void @llvm.lifetime.end.p0i8(i64 4, ptr %pos)
+  call void @llvm.lifetime.end.p0i8(i64 45, ptr %string1)
+  ret i32 %1
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, ptr nocapture)
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i32(ptr nocapture writeonly, ptr nocapture readonly, i32, i1)
+
+; Function Attrs: nounwind
+declare ptr @strchr(ptr, i32)
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, ptr nocapture)
diff --git a/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-struct-test.ll b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-struct-test.ll
new file mode 100644
index 00000000000000..b852944c3f876f
--- /dev/null
+++ b/llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-struct-test.ll
@@ -0,0 +1,53 @@
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -O3 -S | FileCheck %s
+; RUN: opt < %s -mtriple=arm-arm-none-eabi -passes="default<O3>" -S | FileCheck %s
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv6m-arm-none-eabi"
+
+%struct.P = type { i32, [13 x i8] }
+
+; CHECK-NOT: [16 x i8]
+@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1
+@.str.1 = private unnamed_addr constant [4 x i8] c"%s\0A\00", align 1
+@__ARM_use_no_argv = global i32 1, section ".ARM.use_no_argv", align 4
+@llvm.used = appending global [1 x ptr] [ptr @__ARM_use_no_argv], section "llvm.metadata"
+
+; Function Attrs: nounwind
+define hidden i32 @main() local_unnamed_addr #0 {
+entry:
+  %p = alloca %struct.P, align 4
+  call void @llvm.lifetime.start(i64 20, ptr nonnull %p) #2
+  store i32 10, ptr %p, align 4, !tbaa !3
+  %arraydecay = getelementptr inbounds %struct.P, ptr %p, i32 0, i32 1, i32 0
+  call void @llvm.memcpy.p0i8.p0i8.i32(ptr align 1 %arraydecay, ptr align 1 @.str, i32 13, i1 false)
+  %puts = call i32 @puts(ptr %arraydecay)
+  call void @llvm.lifetime.end(i64 20, ptr nonnull %p) #2
+  ret i32 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start(i64, ptr nocapture) #1
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end(i64, ptr nocapture) #1
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i32(ptr nocapture writeonly, ptr noca...
[truncated]

aeubanks · 2024-09-03T17:57:41Z

if this is actually arm-specific, please use the registerPassBuilderCallbacks framework

some performance numbers in the description would be helpful

adding some arm people

aeubanks · 2024-09-03T17:59:46Z

also I would split PRs to implement the pass and add it to the pipeline

davemgreen · 2024-09-04T08:58:55Z

I believe there was talk a long time ago about adding this to an existing pass such as the GlobalOpts pass or CGP. It sounds like CGP is too late for it, could it be a part of GlobalOpt or some other pass?

nasherm · 2024-09-04T15:35:56Z

if this is actually arm-specific, please use the registerPassBuilderCallbacks framework

some performance numbers in the description would be helpful

I intend to rework the patch to make use of this and benchmark

also I would split PRs to implement the pass and add it to the pipeline

Sure, no problem

efriedma-quic · 2024-09-04T17:55:46Z

Can you give a brief example of Arm asm before/after this optimization?

I suspect this generalizes to other targets, at least in some cases.

Is there some reason we can't pad globals that aren't strings?

Padding out strings probably affects string merging in the linker, so the codesize tradeoff here is sort of hard to compute.
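
A rough illustration of the string-merging concern, under the assumption that a constant is only eligible for a mergeable C-string section when it ends in a NUL and contains no other NUL bytes. The helper `isSingleNulTerminated` is hypothetical and is not an LLVM or linker API; it only shows how NUL padding changes that property.

```cpp
#include <cassert>
#include <string>

// Hypothetical check: true when Bytes has exactly one NUL and that NUL is the
// final byte. This is assumed here to be the shape that a mergeable string
// section expects.
bool isSingleNulTerminated(const std::string &Bytes) {
  if (Bytes.empty() || Bytes.back() != '\0')
    return false;
  // The first NUL found must be the last byte of the data.
  return Bytes.find('\0') == Bytes.size() - 1;
}
```

With this model, the 10-byte string "123456789" plus its terminator passes the check, but the same string padded with two more NUL bytes does not, so the padded copy might no longer be deduplicated against identical strings elsewhere.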

nasherm · 2024-09-09T14:33:12Z

I've reduced this patch down to adding the pass, as well as tests, without enabling it.

With respect to performance gain I've seen a jump of around 1% on some of our benchmarks.

I used the following (truncated) IR to show the difference in generated assembly

# example.ll
@.str = private unnamed_addr constant [10 x i8] c"123456789\00", align 1

; Function Attrs: nounwind
define hidden void @foo() #0 {
entry:
  %something = alloca [10 x i8], align 1
  %arraydecay = getelementptr inbounds [10 x i8], ptr %something, i32 0, i32 0
  %call = call ptr @strcpy(ptr %arraydecay, ptr @.str)
  %arraydecay1 = getelementptr inbounds [10 x i8], ptr %something, i32 0, i32 0
  %call2 = call i32 @bar(ptr %arraydecay1)
  ret void

Optimization off

$ opt example.ll -O2 -S | llc -mtriple=arm-arm-none-eabi -o -
..........
foo:
	.fnstart
@ %bb.0:                                @ %entry
	.save	{r4, lr}
	push	{r4, lr}
	.pad	#24
	sub	sp, sp, #24
	ldr	r12, .LCPI0_0
	add	r0, sp, #4
	mov	lr, r0
	ldm	r12!, {r1, r2, r3, r4}
	stm	lr!, {r1, r2, r3, r4}
	ldrb	r1, [r12]
	strb	r1, [lr]
	bl	bar
	add	sp, sp, #24
	pop	{r4, lr}
	mov	pc, lr
	.p2align	2
@ %bb.1:
.LCPI0_0:
	.long	.L.str
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.fnend
                                        @ -- End function
	.type	.L.str,%object                  @ @.str
	.section	.rodata.str1.4,"aMS",%progbits,1
	.p2align	2, 0x0
.L.str:
	.asciz	"1234567891234567"
	.size	.L.str, 17

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1	@ Tag_ABI_optimization_goals

Optimization on

$ opt example.ll  -passes="default<O2>,arm-widen-strings" -S | llc -mtriple=arm-arm-none-eabi -o -
foo:
	.fnstart
@ %bb.0:                                @ %entry
	.save	{r4, r5, r11, lr}
	push	{r4, r5, r11, lr}
	.pad	#40
	sub	sp, sp, #40
	ldr	r12, .LCPI0_0
	add	r0, sp, #20
	mov	r2, r0
	ldm	r12, {r1, r3, r4, r5, lr}
	stm	r2, {r1, r3, r4, r5, lr}
	bl	bar
	add	sp, sp, #40
	pop	{r4, r5, r11, lr}
	mov	pc, lr
	.p2align	2
@ %bb.1:
.LCPI0_0:
	.long	.L.str
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.fnend
                                        @ -- End function
	.type	.L__unnamed_1,%object           @ @0
	.section	.rodata.str1.1,"aMS",%progbits,1
.L__unnamed_1:
	.asciz	"1234567891234567"
	.size	.L__unnamed_1, 17

	.type	.L.str,%object                  @ @.str
	.section	.rodata,"a",%progbits
	.p2align	2, 0x0
.L.str:
	.asciz	"1234567891234567\000\000\000"
	.size	.L.str, 20

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1	@ Tag_ABI_optimization_goals
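
In the listings above, the unpadded 17-byte copy uses an ldm/stm of four registers followed by an ldrb/strb pair for the last byte, while the padded 20-byte copy is a single ldm/stm of five registers. A simplified model of that tail handling is sketched below; `CopyShape` and `modelCopy` are hypothetical names, and real lowering may choose different sequences.

```cpp
#include <cassert>
#include <cstdint>

// Counts of word, halfword and byte accesses in a simplified model where a
// copy uses as many 4-byte accesses as possible and finishes the tail with at
// most one 2-byte and one 1-byte access.
struct CopyShape {
  std::uint64_t Words;
  std::uint64_t Halfwords;
  std::uint64_t Bytes;
};

CopyShape modelCopy(std::uint64_t NumBytes) {
  CopyShape S{};
  S.Words = NumBytes / 4;
  std::uint64_t Tail = NumBytes % 4; // 0..3 bytes left after whole words
  S.Halfwords = Tail / 2;
  S.Bytes = Tail % 2;
  return S;
}
```

In this model a 17-byte copy needs four word accesses plus one byte access, and a 20-byte copy needs five word accesses and nothing else.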

Diff of assembly for readability

24,27c24,27
< 	.save	{r4, lr}
< 	push	{r4, lr}
< 	.pad	#24
< 	sub	sp, sp, #24
---
> 	.save	{r4, r5, r11, lr}
> 	push	{r4, r5, r11, lr}
> 	.pad	#40
> 	sub	sp, sp, #40
29,34c29,32
< 	add	r0, sp, #4
< 	mov	lr, r0
< 	ldm	r12!, {r1, r2, r3, r4}
< 	stm	lr!, {r1, r2, r3, r4}
< 	ldrb	r1, [r12]
< 	strb	r1, [lr]
---
> 	add	r0, sp, #20
> 	mov	r2, r0
> 	ldm	r12, {r1, r3, r4, r5, lr}
> 	stm	r2, {r1, r3, r4, r5, lr}
36,37c34,35
< 	add	sp, sp, #24
< 	pop	{r4, lr}
---
> 	add	sp, sp, #40
> 	pop	{r4, r5, r11, lr}
46a45,50
> 	.type	.L__unnamed_1,%object           @ @0
> 	.section	.rodata.str1.1,"aMS",%progbits,1
> .L__unnamed_1:
> 	.asciz	"1234567891234567"
> 	.size	.L__unnamed_1, 17
> 
48c52
< 	.section	.rodata.str1.4,"aMS",%progbits,1
---
> 	.section	.rodata,"a",%progbits
51,52c55,56
< 	.asciz	"1234567891234567"
< 	.size	.L.str, 17
---
> 	.asciz	"1234567891234567\000\000\000"
> 	.size	.L.str, 20

nasherm · 2024-09-09T14:35:04Z

Is there some reason we can't pad globals that aren't strings?

I don't think so, but there might be a reason this wasn't investigated. The work in this patch was originally authored by someone who is no longer at Arm.

efriedma-quic

Probably worth investigating if we can fit this easily into some pass that's already examining the uses of globals, like GlobalOpt; iterating over the whole module isn't cheap.

llvm/lib/Transforms/Scalar/ARMWidenStrings.cpp

nasherm · 2024-09-11T15:55:03Z

My most recent patch addresses the comments.

Probably worth investigating if we can fit this easily into some pass that's already examining the uses of globals, like GlobalOpt; iterating over the whole module isn't cheap.

I've had a brief look at GlobalOpt and I have a few questions: if this pass were added there, wouldn't the investigation also need to cover whether it improves performance on other targets? I can see restricting the pass to ARM cores, but it seems like that would make it a poor fit for GlobalOpt. Is there something I'm missing?

github-actions · 2024-09-11T15:58:39Z

✅ With the latest revision this PR passed the C/C++ code formatter.

efriedma-quic · 2024-09-11T16:19:09Z

If we're going to make this transform target-independent, we'll need some target-specific tuning from TargetTransformInfo or something like that. Even if the transform is profitable, the exact thresholds where it's profitable are likely to be different. (The maximum size of the global where it's relevant, and whether the best alignment boundary is 2/4/8/16 bytes, is going to vary.)

Not sure we need extensive performance measurements for other targets... if you could get measurements for some big x86 or Arm64 core, that would be nice. But you can basically see what happens on other targets by just compiling a simple example. And if we have a TTI hook, targets could easily opt-out.
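
The kind of per-target tuning described here could look roughly like the sketch below. It is only an illustration: `WidenTuning` and `paddedSizeFor` are hypothetical names that do not exist in LLVM, and the example values (boundary 4, maximum 64) are simply the constants from the original Arm-only patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-target tuning knobs.
struct WidenTuning {
  bool Enabled;           // targets can opt out entirely
  std::uint64_t Boundary; // pad sizes up to a multiple of this (2/4/8/16)
  std::uint64_t MaxSize;  // largest padded size still considered profitable
};

// Returns the padded size, or the original size when the target opts out or
// the padded size would exceed the target's threshold.
std::uint64_t paddedSizeFor(const WidenTuning &T, std::uint64_t Size) {
  if (!T.Enabled)
    return Size;
  assert(T.Boundary != 0 && "an enabled target must set a boundary");
  std::uint64_t Padded = (Size + T.Boundary - 1) / T.Boundary * T.Boundary;
  return Padded <= T.MaxSize ? Padded : Size;
}
```

Keeping the boundary and the size threshold as separate fields lets each target pick both independently, and a target that sets `Enabled` to false is never affected.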

nasherm · 2024-09-13T11:28:23Z

I've rewritten the pass to be platform independent and added it to GlobalOpt. By default it's disabled for all targets except ARM.

nasherm · 2024-09-13T11:29:18Z

I haven't had a chance to investigate performance on AArch64 or x86 machines and will not be able to until next week.

davemgreen

Would it be possible to make the size a little more generic, not just 4? It could possibly have a cost model based on the expected expansion of memcpys, but that might not save much on many architectures other than the cost of an unaligned load/store.

Maybe the target could return the size to align to for a given array size, as opposed to the boolean useWidenGlobalStrings. So 4 for Arm (for ldm).
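
A minimal sketch of that idea, assuming a hook that returns a byte count instead of a boolean. `TargetHooks`, `ArmLikeHooks` and `numBytesToPad` are hypothetical stand-ins for a TargetTransformInfo-style interface, not the names used in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a TargetTransformInfo-style hook.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Number of extra bytes to append to an array of the given size; 0 means
  // "leave it alone", which is the default for targets that do not opt in.
  virtual unsigned numBytesToPad(unsigned /*Size*/) const { return 0; }
};

struct ArmLikeHooks : TargetHooks {
  unsigned numBytesToPad(unsigned Size) const override {
    constexpr unsigned Boundary = 4;       // word size, e.g. for ldm/stm
    constexpr unsigned MaxInlineSize = 64; // limit used in the original patch
    unsigned Rem = Size % Boundary;
    if (Rem == 0)
      return 0; // already a whole number of words
    unsigned Pad = Boundary - Rem;
    return Size + Pad > MaxInlineSize ? 0 : Pad;
  }
};
```

Returning a count folds the opt-out, the alignment choice and the size threshold into one query, so the transform itself needs no target-specific constants.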

llvm/lib/Transforms/IPO/GlobalOpt.cpp

llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-struct-test.ll

llvm/test/Transforms/ARMWidenStrings/arm-widen-strings-1.ll

nasherm · 2024-09-27T11:53:34Z

Maybe the target could return the size to align to for a given array size, as opposed to the boolean useWidenGlobalStrings. So 4 for Arm

Opted for this approach. Would it be worth making the chosen alignment value correspond to a target feature?

davemgreen

Thanks - I think this seems OK to me. I had some comments inline, but what currently happens when the global has multiple uses?

llvm/lib/Transforms/IPO/GlobalOpt.cpp

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/test/Transforms/GlobalOpt/ARM/arm-widen-strings-1.ll

llvm/test/Transforms/GlobalOpt/ARM/arm-widen-strings-2.ll

llvm/lib/Transforms/IPO/GlobalOpt.cpp

nasherm · 2024-10-02T13:24:03Z

.. but what currently happens when the global has multiple uses?

I've added an IR test that shows this. I believe the global variable only gets padded once while every destination array gets padded if using the variable

davemgreen

I've added an IR test that shows this. I believe the global variable only gets padded once while every destination array gets padded if using the variable

It looks like multiple globals get created. Can we prevent that from happening?

Can we replace all the uses of the old global with the new one, to make sure we don't end up with many repeated globals? i.e. use the first n bytes of the now wider global in the "other" uses, and hopefully re-use the already widened global in other allocas that are also widened. (Or widen all uses at once, or maybe just limit it to a single use if it isn't very useful.)
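A toy model of the "widen once, redirect every use" idea is sketched below. It uses plain structs rather than LLVM types, and the names (`Global`, `Use`, `widenOnce`, the `.widened` suffix) are all hypothetical; it only illustrates the bookkeeping, not how GlobalOpt implements it.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for IR objects; none of these are LLVM types.
struct Global {
  std::string Name;
  std::size_t Size; // size in bytes
};

struct Use {
  std::string GlobalName; // which global this memcpy-like use reads from
  std::size_t Bytes;      // how many bytes it copies
};

// Widen each global at most once, then point every use at the widened copy.
// Uses that are not themselves widened keep their original byte count, so
// they read a prefix of the wider global. Align is assumed to be non-zero.
std::vector<Global> widenOnce(const std::vector<Global> &Globals,
                              std::vector<Use> &Uses, std::size_t Align) {
  std::map<std::string, std::string> Widened; // old name -> new name
  std::vector<Global> Result;
  for (const Global &G : Globals) {
    std::size_t NewSize = (G.Size + Align - 1) / Align * Align;
    if (NewSize == G.Size) {
      Result.push_back(G); // already a multiple of Align
      continue;
    }
    std::string NewName = G.Name + ".widened"; // hypothetical naming scheme
    Widened[G.Name] = NewName;
    Result.push_back({NewName, NewSize});
  }
  for (Use &U : Uses) {
    auto It = Widened.find(U.GlobalName);
    if (It != Widened.end())
      U.GlobalName = It->second;
  }
  return Result;
}
```

Because the old-to-new mapping is built before any use is rewritten, three copies from the same 10-byte global end up sharing one 12-byte global instead of producing three.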

llvm/lib/Transforms/IPO/GlobalOpt.cpp

llvm/test/Transforms/GlobalOpt/ARM/arm-widen-non-byte-array.ll

nasherm · 2024-10-04T12:36:12Z

> Can we replace all the uses of the old global with the new one, to make sure we don't end up with many repeated globals? i.e use the first n bytes of the now wider global in the "other" uses, and hopefully re-use the already widened global in other allocas that are also widened. (Or widen all uses at once, or maybe just limit it to a single use if it isn't very useful)

I have something that works without creating multiple globals. Instead, it creates the new global once and updates all the references to it, but doesn't update the original source global. I'm not sure if this is an ideal solution. The multi-use test demonstrates this.

nasherm · 2024-10-07T11:12:07Z

With the updated pass, I get the following output for the (truncated) IR below.

IR

@.str = private unnamed_addr constant [10 x i8] c"123456789\00", align 1

; Function Attrs: nounwind
define hidden void @foo() #0 {
entry:
  %something = alloca [10 x i8], align 1
  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(10) %something, ptr noundef nonnull align 1 dereferenceable(10) @.str, i32 10, i1 false)
  %call2 = call i32 @bar(ptr nonnull %something)
  ret void
}


declare i32 @bar(...)

nasherm · 2024-10-07T11:12:49Z

ARM

ARM (no pass)

$ opt -mtriple=arm-none-eabi -S | llc  -o -
foo:
	.fnstart
@ %bb.0:                                @ %entry
	.save	{r11, lr}
	push	{r11, lr}
	.pad	#16
	sub	sp, sp, #16
	mov	r0, #57
	strh	r0, [sp, #12]
	ldr	r0, .LCPI0_0
	str	r0, [sp, #8]
	ldr	r0, .LCPI0_1
	str	r0, [sp, #4]
	add	r0, sp, #4
	bl	bar
	add	sp, sp, #16
	pop	{r11, lr}
	mov	pc, lr
	.p2align	2
@ %bb.1:
.LCPI0_0:
	.long	943142453                       @ 0x38373635
.LCPI0_1:
	.long	875770417                       @ 0x34333231
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.fnend
                                        @ -- End function
	.type	.L.str,%object                  @ @.str
	.section	.rodata.str1.4,"aMS",%progbits,1
	.p2align	2, 0x0
.L.str:
	.asciz	"123456789"
	.size	.L.str, 10

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1	@ Tag_ABI_optimization_goals

ARM (with pass)

$ opt -passes=globalopt -mtriple=arm-none-eabi -S | llc  -o -
foo:
	.fnstart
@ %bb.0:                                @ %entry
	.save	{r11, lr}
	push	{r11, lr}
	.pad	#16
	sub	sp, sp, #16
	mov	r0, #57
	str	r0, [sp, #12]
	ldr	r0, .LCPI0_0
	str	r0, [sp, #8]
	ldr	r0, .LCPI0_1
	str	r0, [sp, #4]
	add	r0, sp, #4
	bl	bar
	add	sp, sp, #16
	pop	{r11, lr}
	mov	pc, lr
	.p2align	2
@ %bb.1:
.LCPI0_0:
	.long	943142453                       @ 0x38373635
.LCPI0_1:
	.long	875770417                       @ 0x34333231
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.fnend
                                        @ -- End function
	.type	.L.str,%object                  @ @.str
	.section	.rodata,"a",%progbits
	.p2align	2, 0x0
.L.str:
	.asciz	"123456789\000\000"
	.size	.L.str, 12

	.section	".note.GNU-stack","",%progbits
	.eabi_attribute	30, 1	@ Tag_ABI_optimization_goals
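The change to `.L.str` in the listing above (size 10 to 12, with two extra `\000` bytes) amounts to zero-padding the initializer up to a multiple of the chosen alignment. A minimal sketch of that step, assuming a non-zero alignment and using the hypothetical helper name `padInitializer`:

```cpp
#include <cstddef>
#include <string>

// Append zero bytes until the initializer length is a multiple of Align.
// Align is assumed to be non-zero.
std::string padInitializer(std::string Bytes, std::size_t Align) {
  std::size_t Rem = Bytes.size() % Align;
  if (Rem != 0)
    Bytes.append(Align - Rem, '\0');
  return Bytes;
}
```

The original bytes are left untouched, so any use that still reads only the first 10 bytes sees the same data.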

nasherm · 2024-10-07T11:13:55Z

AArch64

AArch64 (no pass)

$ opt -mtriple=aarch64-none-eabi -S | llc -o -
	.text
	.file	"example.ll"
	.hidden	foo                             // -- Begin function foo
	.globl	foo
	.p2align	2
	.type	foo,@function
foo:                                    // @foo
	.cfi_startproc
// %bb.0:                               // %entry
	sub	sp, sp, #32
	str	x30, [sp, #16]                  // 8-byte Folded Spill
	.cfi_def_cfa_offset 32
	.cfi_offset w30, -16
	adrp	x9, .L.str
	add	x9, x9, :lo12:.L.str
	mov	w8, #57                         // =0x39
	ldr	x9, [x9]
	mov	x0, sp
	strh	w8, [sp, #8]
	str	x9, [sp]
	bl	bar
	ldr	x30, [sp, #16]                  // 8-byte Folded Reload
	add	sp, sp, #32
	ret
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        // -- End function
	.type	.L.str,@object                  // @.str
	.section	.rodata.str1.1,"aMS",@progbits,1
.L.str:
	.asciz	"123456789"
	.size	.L.str, 10

	.section	".note.GNU-stack","",@progbits

AArch64 (with pass)

$ opt -passes=globalopt -mtriple=aarch64-none-eabi | llc -o - 
foo:                                    // @foo
	.cfi_startproc
// %bb.0:                               // %entry
	sub	sp, sp, #32
	str	x30, [sp, #16]                  // 8-byte Folded Spill
	.cfi_def_cfa_offset 32
	.cfi_offset w30, -16
	adrp	x9, .L.str
	add	x9, x9, :lo12:.L.str
	mov	w8, #57                         // =0x39
	ldr	x9, [x9]
	mov	x0, sp
	str	w8, [sp, #8]
	str	x9, [sp]
	bl	bar
	ldr	x30, [sp, #16]                  // 8-byte Folded Reload
	add	sp, sp, #32
	ret
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        // -- End function
	.type	.L.str,@object                  // @.str
	.section	.rodata,"a",@progbits
.L.str:
	.asciz	"123456789\000\000"
	.size	.L.str, 12

	.section	".note.GNU-stack","",@progbits

nasherm · 2024-10-07T11:15:06Z

x86

x86 (no pass)

$ opt -mtriple=x86_64-unknown-linux-gnu -S | llc -o -
	.text
	.file	"example.ll"
	.hidden	foo                             # -- Begin function foo
	.globl	foo
	.p2align	4, 0x90
	.type	foo,@function
foo:                                    # @foo
	.cfi_startproc
# %bb.0:                                # %entry
	subq	$24, %rsp
	.cfi_def_cfa_offset 32
	movabsq	$4050765991979987505, %rax      # imm = 0x3837363534333231
	movq	%rax, 8(%rsp)
	movw	$57, 16(%rsp)
	leaq	8(%rsp), %rdi
	callq	bar@PLT
	addq	$24, %rsp
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        # -- End function
	.type	.L.str,@object                  # @.str
	.section	.rodata.str1.1,"aMS",@progbits,1
.L.str:
	.asciz	"123456789"
	.size	.L.str, 10

	.section	".note.GNU-stack","",@progbits

x86 (with pass)

$ opt -mtriple=x86_64-unknown-linux-gnu  -passes=globalopt -S | llc -o -
	.text
	.file	"example.ll"
	.hidden	foo                             # -- Begin function foo
	.globl	foo
	.p2align	4, 0x90
	.type	foo,@function
foo:                                    # @foo
	.cfi_startproc
# %bb.0:                                # %entry
	subq	$24, %rsp
	.cfi_def_cfa_offset 32
	movabsq	$4050765991979987505, %rax      # imm = 0x3837363534333231
	movq	%rax, 8(%rsp)
	movl	$57, 16(%rsp)
	leaq	8(%rsp), %rdi
	callq	bar@PLT
	addq	$24, %rsp
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        # -- End function
	.type	.L.str,@object                  # @.str
	.section	.rodata,"a",@progbits
.L.str:
	.asciz	"123456789\000\000"
	.size	.L.str, 12

	.section	".note.GNU-stack","",@progbits
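In all three listings above the only code difference is the final store: a halfword store (`strh`, `movw`) becomes a full-word store (`str`, `movl`). One way to see why is to split the copy size into power-of-two chunks. The helper below is a simplified model for illustration only (`copyChunks` is a hypothetical name); it is not how any backend actually lowers `memcpy`.

```cpp
#include <vector>

// Greedily split a copy of Size bytes into power-of-two chunks no wider than
// MaxWidth bytes. MaxWidth is assumed to be a non-zero power of two.
std::vector<unsigned> copyChunks(unsigned Size, unsigned MaxWidth) {
  std::vector<unsigned> Chunks;
  for (unsigned Width = MaxWidth; Width >= 1; Width /= 2) {
    while (Size >= Width) {
      Chunks.push_back(Width);
      Size -= Width;
    }
  }
  return Chunks;
}
```

With a 4-byte maximum (as in the ARM listing), 10 bytes splits into 4 + 4 + 2 and 12 bytes into 4 + 4 + 4. With an 8-byte maximum (as in the AArch64 and x86 listings), 10 bytes splits into 8 + 2 and 12 bytes into 8 + 4. In both cases padding removes the 2-byte tail.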

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

llvm/lib/Transforms/IPO/GlobalOpt.cpp

- Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded.
- Pass works at the midend level

Change-Id: I1c6371f0962e7ad3c166602b800d041ac1cc7b04

davemgreen

Thanks for the updates. I ran some tests and as far as I can tell they ran OK now. LGTM if there are no other comments.

llvm/lib/Transforms/IPO/GlobalOpt.cpp

nasherm requested review from snehasish, teresajohnson and aeubanks and removed request for snehasish and teresajohnson September 3, 2024 15:16

llvmbot added the llvm:transforms label Sep 3, 2024

nasherm requested review from snehasish and teresajohnson September 3, 2024 15:17

aeubanks requested a review from davemgreen September 3, 2024 17:58

davemgreen requested a review from efriedma-quic September 4, 2024 08:57

nasherm force-pushed the nashe/widen-strings branch from b3bca66 to cc8bf21 Compare September 9, 2024 14:32

efriedma-quic reviewed Sep 10, 2024

nasherm force-pushed the nashe/widen-strings branch from f7e220d to 3b0405b Compare September 13, 2024 11:27

llvmbot added backend:ARM llvm:analysis labels Sep 13, 2024

nasherm changed the title from "[llvm][ARM]Add ARM widen strings pass" to "[llvm][ARM]Add widen strings pass" Sep 13, 2024

nasherm requested a review from efriedma-quic September 16, 2024 14:01

nasherm requested a review from davemgreen September 19, 2024 12:48

davemgreen reviewed Sep 23, 2024

davemgreen reviewed Oct 1, 2024

nasherm changed the title from "[llvm][ARM]Add widen strings pass" to "[llvm][ARM]Add widen global arrays pass" Oct 2, 2024

davemgreen reviewed Oct 3, 2024

davemgreen reviewed Oct 9, 2024

nasherm added 12 commits October 16, 2024 11:10

Responding to review comments

d802267

Change-Id: I492ea4e5b6f589e5d877eeb6be31f7ab4720be9b

Making ARMWidenStrings to be target independent

2a2c6c9

Change-Id: Ic6ed9a549e39020e8c04b38bc21ba8162b4ebfd9

Review comments

2a2cfdc

Updating patch so that when attempting to widen global strings we only check whether the variable is being called by a memcpy intrinsic. Change-Id: I088403636c2ed0acc231af77b399b1b95f1abbc2

Review comments

72567c4

Responding to review comments

21ca2ba

Review comments: eliminating generation of multiple globals

41aec6f

Correcting and refactoring elimination

c17ff0d

Fix bug when copying to global dest

72be4ca

The case in which copying from a global source to a global dest wasn't handled and caused opt to crash. This is now handled and a new test has been added to check Change-Id: Ieb0467797fcee888f6e95e68af4dac9c05d70a4d

Addressing review comments

f55c239

Change-Id: I029312362f9dd714b2e9bc206cc002883d761b8b

Addressing review comments

e1218a0

Change-Id: Idc7b14cc785eb88552dd72947eb0df128baa7e90

Review comments

9e92588

- Removed handling of global variable destinations. We simply don't pad these for now
- Added check that destination array is an array type and added test.

Change-Id: Ifc53051952ef69c4af64827402baf7d69cab4824

davemgreen approved these changes Oct 16, 2024

nasherm force-pushed the nashe/widen-strings branch from bbe246e to 86ee9ad Compare October 16, 2024 10:51

Rebasing

2815d59

Change-Id: Iad0539e526fb0fc116217dcbd033f8297fa5ef5f

nasherm force-pushed the nashe/widen-strings branch from 86ee9ad to 2815d59 Compare October 16, 2024 10:53

Responding to review comments

75f951a

- Added test showing behaviour of attempting to widen non-const globals
- Refactoring

Change-Id: I566214331bf3d889bd1409d3148aa6eab2530ed5
