Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ #95802

Merged
merged 7 commits into from
Jun 20, 2024

Fznamznon

This commit implements the entirety of the now-accepted N3017 - Preprocessor Embed and its sister C++ paper p1967. It implements everything in the specification and includes an implementation that drastically reduces the time it takes to embed data in specific scenarios (the initialization of character type arrays). The mechanisms used to do this are applied under the "as-if" rule. In general, when the system cannot detect that it is initializing an array object in a variable declaration, it will either generate an EmbedExpr AST node, which is expanded by AST consumers (CodeGen or constant expression evaluators), or expand the embed directive as a comma expression.
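
For readers unfamiliar with the directive, here is a rough model in plain C++ of what `#embed` stands for: a comma-separated list of integer literals, one per byte of the resource. It deliberately does not use `#embed`, so it builds with compilers that lack the feature. `expandEmbedModel` and its arguments are hypothetical names for illustration, not Clang APIs. The `limit`, `prefix`, `suffix` and `if_empty` behaviour follows N3017 as understood here, including the assumption that a limit of 0 makes the resource count as empty.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the token text an #embed directive stands for.
// Bytes   : contents of the embedded resource.
// Limit   : models limit(N); at most N elements are produced.
// Prefix  : models prefix(...); emitted only when the result is non-empty.
// Suffix  : models suffix(...); emitted only when the result is non-empty.
// IfEmpty : models if_empty(...); replaces the directive when it is empty.
std::string expandEmbedModel(const std::vector<unsigned char> &Bytes,
                             std::optional<std::size_t> Limit = std::nullopt,
                             const std::string &Prefix = "",
                             const std::string &Suffix = "",
                             const std::string &IfEmpty = "") {
  std::size_t Count = Bytes.size();
  if (Limit && *Limit < Count)
    Count = *Limit;

  // An empty resource (or a limit of 0) yields only the if_empty tokens.
  if (Count == 0)
    return IfEmpty;

  std::string Out = Prefix;
  for (std::size_t I = 0; I < Count; ++I) {
    if (I != 0)
      Out += ", ";
    Out += std::to_string(static_cast<unsigned>(Bytes[I]));
  }
  Out += Suffix;
  return Out;
}
```

The fast path mentioned above avoids materializing this list for character arrays; the model only describes the observable result that the "as-if" rule has to preserve.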

This reverts commit 682d461.


Co-authored-by: The Phantom Derpstorm phdofthehouse@gmail.com
Co-authored-by: Aaron Ballman aaron@aaronballman.com
Co-authored-by: cor3ntin corentinjabot@gmail.com
Co-authored-by: H. Vetinari h.vetinari@gmx.com

Fznamznon and others added 2 commits June 17, 2024 07:01
This commit implements the entirety of the now-accepted
[N3017 - Preprocessor Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm)
and its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification and includes an implementation that
drastically reduces the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are applied under the "as-if" rule. In general, when the
system cannot detect that it is initializing an array object in a
variable declaration, it will either generate an EmbedExpr AST node,
which is expanded by AST consumers (CodeGen or constant expression
evaluators), or expand the embed directive as a comma expression.

This reverts commit 682d461.

---------

Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
EmbedAnnotationData was allocated by a bump allocator, which does not
run destructors, but it held a SmallString that can grow onto the heap.
This commit fixes the resulting memory leak by removing the filename
fields from EmbedAnnotationData and EmbedExpr, since they were not used
anyway.
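
To illustrate the kind of leak described in this commit message, here is a minimal sketch with a toy arena. Every name in it (`ToyBumpAllocator`, `OwningMember`, `AnnotationWithName`, `countUndestroyedMembers`) is hypothetical and none of it is LLVM code. `OwningMember` stands in for a grown SmallString: instead of really allocating, it counts constructions and destructions, so the sketch itself leaks nothing.

```cpp
#include <cstddef>
#include <new>

// Toy stand-in for a bump allocator: objects are carved out of one slab and
// the slab is released all at once. Destructors of the objects never run.
class ToyBumpAllocator {
  alignas(std::max_align_t) unsigned char Slab[256];
  std::size_t Used = 0;

public:
  template <typename T> T *create() {
    // Round the current position up to T's alignment, then bump past T.
    std::size_t Start = (Used + alignof(T) - 1) / alignof(T) * alignof(T);
    if (Start + sizeof(T) > sizeof(Slab))
      return nullptr;
    Used = Start + sizeof(T);
    return new (Slab + Start) T();
  }
};

// Stand-in for a member that owns heap memory once it grows.
struct OwningMember {
  static int Constructed;
  static int Destroyed;
  OwningMember() { ++Constructed; }
  ~OwningMember() { ++Destroyed; }
};
int OwningMember::Constructed = 0;
int OwningMember::Destroyed = 0;

// Hypothetical annotation record with an owning field, as before the fix.
struct AnnotationWithName {
  OwningMember FileName;
  const char *Data = nullptr;
};

// Returns how many owning members were constructed but never destroyed.
int countUndestroyedMembers() {
  OwningMember::Constructed = 0;
  OwningMember::Destroyed = 0;
  {
    ToyBumpAllocator Arena;
    Arena.create<AnnotationWithName>();
  } // The arena goes away here; ~AnnotationWithName is never called.
  return OwningMember::Constructed - OwningMember::Destroyed;
}
```

Dropping the owning field, as this commit does, leaves the record trivially destructible, so skipping the destructor no longer loses anything.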
@llvmbot llvmbot added clang Clang issues not falling into any other category clang-tools-extra clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:codegen clang:static analyzer labels Jun 17, 2024
llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Mariya Podchishchaeva (Fznamznon)

Changes

Patch is 184.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95802.diff

96 Files Affected:

  • (modified) clang-tools-extra/test/pp-trace/pp-trace-macro.cpp (+9)
  • (modified) clang/docs/LanguageExtensions.rst (+24)
  • (modified) clang/include/clang/AST/Expr.h (+158)
  • (modified) clang/include/clang/AST/RecursiveASTVisitor.h (+5)
  • (modified) clang/include/clang/AST/TextNodeDumper.h (+1)
  • (modified) clang/include/clang/Basic/DiagnosticCommonKinds.td (+3)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+12)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (-2)
  • (modified) clang/include/clang/Basic/FileManager.h (+7-4)
  • (modified) clang/include/clang/Basic/StmtNodes.td (+1)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Driver/Options.td (+6)
  • (modified) clang/include/clang/Frontend/PreprocessorOutputOptions.h (+3)
  • (modified) clang/include/clang/Lex/PPCallbacks.h (+54)
  • (added) clang/include/clang/Lex/PPDirectiveParameter.h (+33)
  • (added) clang/include/clang/Lex/PPEmbedParameters.h (+94)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+67-2)
  • (modified) clang/include/clang/Lex/PreprocessorOptions.h (+3)
  • (modified) clang/include/clang/Parse/Parser.h (+3)
  • (modified) clang/include/clang/Sema/Sema.h (+4)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/lib/AST/Expr.cpp (+12)
  • (modified) clang/lib/AST/ExprClassification.cpp (+5)
  • (modified) clang/lib/AST/ExprConstant.cpp (+58-5)
  • (modified) clang/lib/AST/Interp/ByteCodeExprGen.cpp (+18-3)
  • (modified) clang/lib/AST/Interp/ByteCodeExprGen.h (+1)
  • (modified) clang/lib/AST/ItaniumMangle.cpp (+1)
  • (modified) clang/lib/AST/StmtPrinter.cpp (+4)
  • (modified) clang/lib/AST/StmtProfile.cpp (+2)
  • (modified) clang/lib/AST/TextNodeDumper.cpp (+5)
  • (modified) clang/lib/Basic/FileManager.cpp (+6-1)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-2)
  • (modified) clang/lib/CodeGen/CGExprAgg.cpp (+32-8)
  • (modified) clang/lib/CodeGen/CGExprConstant.cpp (+93-25)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+7)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5-1)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+8)
  • (modified) clang/lib/Frontend/DependencyFile.cpp (+25)
  • (modified) clang/lib/Frontend/DependencyGraph.cpp (+23-1)
  • (modified) clang/lib/Frontend/InitPreprocessor.cpp (+8)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+115-7)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+474-2)
  • (modified) clang/lib/Lex/PPExpressions.cpp (+36-13)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+111)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+4-1)
  • (modified) clang/lib/Parse/ParseExpr.cpp (+36-1)
  • (modified) clang/lib/Parse/ParseInit.cpp (+30)
  • (modified) clang/lib/Parse/ParseTemplate.cpp (+29-12)
  • (modified) clang/lib/Sema/SemaExceptionSpec.cpp (+1)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+12-3)
  • (modified) clang/lib/Sema/SemaInit.cpp (+100-13)
  • (modified) clang/lib/Sema/TreeTransform.h (+5)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+14)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+10)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+4)
  • (added) clang/test/C/C2x/Inputs/bits.bin (+1)
  • (added) clang/test/C/C2x/Inputs/boop.h (+1)
  • (added) clang/test/C/C2x/Inputs/i.dat (+1)
  • (added) clang/test/C/C2x/Inputs/jump.wav (+1)
  • (added) clang/test/C/C2x/Inputs/s.dat (+1)
  • (added) clang/test/C/C2x/n3017.c (+216)
  • (added) clang/test/Preprocessor/Inputs/jk.txt (+1)
  • (added) clang/test/Preprocessor/Inputs/media/art.txt (+9)
  • (added) clang/test/Preprocessor/Inputs/media/empty ()
  • (added) clang/test/Preprocessor/Inputs/null_byte.bin ()
  • (added) clang/test/Preprocessor/Inputs/numbers.txt (+1)
  • (added) clang/test/Preprocessor/Inputs/single_byte.txt (+1)
  • (added) clang/test/Preprocessor/embed___has_embed.c (+60)
  • (added) clang/test/Preprocessor/embed___has_embed_parsing_errors.c (+240)
  • (added) clang/test/Preprocessor/embed___has_embed_supported.c (+24)
  • (added) clang/test/Preprocessor/embed_art.c (+104)
  • (added) clang/test/Preprocessor/embed_codegen.cpp (+84)
  • (added) clang/test/Preprocessor/embed_constexpr.cpp (+97)
  • (added) clang/test/Preprocessor/embed_dependencies.c (+20)
  • (added) clang/test/Preprocessor/embed_ext_compat_diags.c (+16)
  • (added) clang/test/Preprocessor/embed_feature_test.cpp (+7)
  • (added) clang/test/Preprocessor/embed_file_not_found_chevron.c (+4)
  • (added) clang/test/Preprocessor/embed_file_not_found_quote.c (+4)
  • (added) clang/test/Preprocessor/embed_init.c (+29)
  • (added) clang/test/Preprocessor/embed_parameter_if_empty.c (+24)
  • (added) clang/test/Preprocessor/embed_parameter_limit.c (+94)
  • (added) clang/test/Preprocessor/embed_parameter_offset.c (+89)
  • (added) clang/test/Preprocessor/embed_parameter_prefix.c (+38)
  • (added) clang/test/Preprocessor/embed_parameter_suffix.c (+39)
  • (added) clang/test/Preprocessor/embed_parameter_unrecognized.c (+9)
  • (added) clang/test/Preprocessor/embed_parsing_errors.c (+130)
  • (added) clang/test/Preprocessor/embed_path_chevron.c (+8)
  • (added) clang/test/Preprocessor/embed_path_quote.c (+8)
  • (added) clang/test/Preprocessor/embed_preprocess_to_file.c (+39)
  • (added) clang/test/Preprocessor/embed_single_entity.c (+7)
  • (added) clang/test/Preprocessor/embed_weird.cpp (+98)
  • (modified) clang/test/Preprocessor/init-aarch64.c (+3)
  • (modified) clang/test/Preprocessor/init.c (+3)
  • (added) clang/test/Preprocessor/single_byte.txt (+1)
  • (modified) clang/tools/libclang/CXCursor.cpp (+1)
  • (modified) clang/www/c_status.html (+1-1)
diff --git a/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp b/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
index 1d85607e86b7f..7c2a231101070 100644
--- a/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
+++ b/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
@@ -31,6 +31,15 @@ X
 // CHECK:        MacroNameTok: __STDC_UTF_32__
 // CHECK-NEXT:   MacroDirective: MD_Define
 // CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_NOT_FOUND__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_FOUND__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_EMPTY__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
 // CHECK:      - Callback: MacroDefined
 // CHECK-NEXT:   MacroNameTok: MACRO
 // CHECK-NEXT:   MacroDirective: MD_Define
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 92e6025c95a8c..9830b35faae12 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -1502,6 +1502,7 @@ Attributes on Structured Bindings            __cpp_structured_bindings        C+
 Designated initializers (N494)                                                C99           C89
 Array & element qualification (N2607)                                         C23           C89
 Attributes (N2335)                                                            C23           C89
+``#embed`` (N3017)                                                            C23           C89, C++
 ============================================ ================================ ============= =============
 
 Type Trait Primitives
@@ -5664,3 +5665,26 @@ Compiling different TUs depending on these flags (including use of
 ``std::hardware_destructive_interference``)  with different compilers, macro
 definitions, or architecture flags will lead to ODR violations and should be
 avoided.
+
+``#embed`` Parameters
+=====================
+
+``clang::offset``
+-----------------
+The ``clang::offset`` embed parameter may appear zero or one time in the
+embed parameter sequence. Its preprocessor argument clause shall be present and
+have the form:
+
+.. code-block:: text
+
+  ( constant-expression )
+
+and shall be an integer constant expression. The integer constant expression
+shall not evaluate to a value less than 0. The token ``defined`` shall not
+appear within the constant expression.
+
+The offset will be used when reading the contents of the embedded resource to
+specify the starting offset to begin embedding from. The resource is treated
+as being empty if the specified offset is larger than the number of bytes in
+the resource. The offset will be applied *before* any ``limit`` parameters are
+applied.
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index f2bf667636dc9..3bc8cae4d8c86 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4799,6 +4799,164 @@ class SourceLocExpr final : public Expr {
   friend class ASTStmtReader;
 };
 
+/// Stores data related to a single #embed directive.
+struct EmbedDataStorage {
+  StringLiteral *BinaryData;
+  size_t getDataElementCount() const { return BinaryData->getByteLength(); }
+};
+
+/// Represents a reference to #embed data. By default, this references the whole
+/// range. Otherwise it represents a subrange of data imported by #embed
+/// directive. Needed to handle nested initializer lists with #embed directives.
+/// Example:
+///  struct S {
+///    int x, y;
+///  };
+///
+///  struct T {
+///    int x[2];
+///    struct S s;
+///  };
+///
+///  struct T t[] = {
+///  #embed "data" // data contains 10 elements;
+///  };
+///
+/// The resulting semantic form of initializer list will contain (EE stands
+/// for EmbedExpr):
+///  { {EE(first two data elements), {EE(3rd element), EE(4th element) }},
+///  { {EE(5th and 6th element), {EE(7th element), EE(8th element) }},
+///  { {EE(9th and 10th element), { zeroinitializer }}}
+///
+/// EmbedExpr inside of a semantic initializer list and referencing more than
+/// one element can only appear for arrays of scalars.
+class EmbedExpr final : public Expr {
+  SourceLocation EmbedKeywordLoc;
+  IntegerLiteral *FakeChildNode = nullptr;
+  const ASTContext *Ctx = nullptr;
+  EmbedDataStorage *Data;
+  unsigned Begin = 0;
+  unsigned NumOfElements;
+
+public:
+  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, EmbedDataStorage *Data,
+            unsigned Begin, unsigned NumOfElements);
+  explicit EmbedExpr(EmptyShell Empty) : Expr(EmbedExprClass, Empty) {}
+
+  SourceLocation getLocation() const { return EmbedKeywordLoc; }
+  SourceLocation getBeginLoc() const { return EmbedKeywordLoc; }
+  SourceLocation getEndLoc() const { return EmbedKeywordLoc; }
+
+  StringLiteral *getDataStringLiteral() const { return Data->BinaryData; }
+  EmbedDataStorage *getData() const { return Data; }
+
+  unsigned getStartingElementPos() const { return Begin; }
+  size_t getDataElementCount() const { return NumOfElements; }
+
+  // Allows accessing every byte of EmbedExpr data and iterating over it.
+  // An Iterator knows the EmbedExpr that it refers to, and an offset value
+  // within the data.
+  // Dereferencing an Iterator results in construction of IntegerLiteral AST
+  // node filled with byte of data of the corresponding EmbedExpr within offset
+  // that the Iterator currently has.
+  template <bool Const>
+  class ChildElementIter
+      : public llvm::iterator_facade_base<
+            ChildElementIter<Const>, std::random_access_iterator_tag,
+            std::conditional_t<Const, const IntegerLiteral *,
+                               IntegerLiteral *>> {
+    friend class EmbedExpr;
+
+    EmbedExpr *EExpr = nullptr;
+    unsigned long long CurOffset = ULLONG_MAX;
+    using BaseTy = typename ChildElementIter::iterator_facade_base;
+
+    ChildElementIter(EmbedExpr *E) : EExpr(E) {
+      if (E)
+        CurOffset = E->getStartingElementPos();
+    }
+
+  public:
+    ChildElementIter() : CurOffset(ULLONG_MAX) {}
+    typename BaseTy::reference operator*() const {
+      assert(EExpr && CurOffset != ULLONG_MAX &&
+             "trying to dereference an invalid iterator");
+      IntegerLiteral *N = EExpr->FakeChildNode;
+      StringRef DataRef = EExpr->Data->BinaryData->getBytes();
+      N->setValue(*EExpr->Ctx,
+                  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+                              N->getType()->isSignedIntegerType()));
+      // We want to return a reference to the fake child node in the
+      // EmbedExpr, not the local variable N.
+      return const_cast<typename BaseTy::reference>(EExpr->FakeChildNode);
+    }
+    typename BaseTy::pointer operator->() const { return **this; }
+    using BaseTy::operator++;
+    ChildElementIter &operator++() {
+      assert(EExpr && "trying to increment an invalid iterator");
+      assert(CurOffset != ULLONG_MAX &&
+             "Already at the end of what we can iterate over");
+      if (++CurOffset >=
+          EExpr->getDataElementCount() + EExpr->getStartingElementPos()) {
+        CurOffset = ULLONG_MAX;
+        EExpr = nullptr;
+      }
+      return *this;
+    }
+    bool operator==(ChildElementIter Other) const {
+      return (EExpr == Other.EExpr && CurOffset == Other.CurOffset);
+    }
+  }; // class ChildElementIter
+
+public:
+  using fake_child_range = llvm::iterator_range<ChildElementIter<false>>;
+  using const_fake_child_range = llvm::iterator_range<ChildElementIter<true>>;
+
+  fake_child_range underlying_data_elements() {
+    return fake_child_range(ChildElementIter<false>(this),
+                            ChildElementIter<false>());
+  }
+
+  const_fake_child_range underlying_data_elements() const {
+    return const_fake_child_range(
+        ChildElementIter<true>(const_cast<EmbedExpr *>(this)),
+        ChildElementIter<true>());
+  }
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == EmbedExprClass;
+  }
+
+  ChildElementIter<false> begin() { return ChildElementIter<false>(this); }
+
+  ChildElementIter<true> begin() const {
+    return ChildElementIter<true>(const_cast<EmbedExpr *>(this));
+  }
+
+  template <typename Call, typename... Targs>
+  bool doForEachDataElement(Call &&C, unsigned &StartingIndexInArray,
+                            Targs &&...Fargs) const {
+    for (auto It : underlying_data_elements()) {
+      if (!std::invoke(std::forward<Call>(C), const_cast<IntegerLiteral *>(It),
+                       StartingIndexInArray, std::forward<Targs>(Fargs)...))
+        return false;
+      StartingIndexInArray++;
+    }
+    return true;
+  }
+
+private:
+  friend class ASTStmtReader;
+};
+
 /// Describes an C or C++ initializer list.
 ///
 /// InitListExpr describes an initializer list, which can be used to
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index aa55e2e7e8718..2785afd59bf21 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -2864,6 +2864,11 @@ DEF_TRAVERSE_STMT(ShuffleVectorExpr, {})
 DEF_TRAVERSE_STMT(ConvertVectorExpr, {})
 DEF_TRAVERSE_STMT(StmtExpr, {})
 DEF_TRAVERSE_STMT(SourceLocExpr, {})
+DEF_TRAVERSE_STMT(EmbedExpr, {
+  for (IntegerLiteral *IL : S->underlying_data_elements()) {
+    TRY_TO_TRAVERSE_OR_ENQUEUE_STMT(IL);
+  }
+})
 
 DEF_TRAVERSE_STMT(UnresolvedLookupExpr, {
   TRY_TO(TraverseNestedNameSpecifierLoc(S->getQualifierLoc()));
diff --git a/clang/include/clang/AST/TextNodeDumper.h b/clang/include/clang/AST/TextNodeDumper.h
index abfafcaef271b..39dd1f515c9eb 100644
--- a/clang/include/clang/AST/TextNodeDumper.h
+++ b/clang/include/clang/AST/TextNodeDumper.h
@@ -409,6 +409,7 @@ class TextNodeDumper
   void VisitHLSLBufferDecl(const HLSLBufferDecl *D);
   void VisitOpenACCConstructStmt(const OpenACCConstructStmt *S);
   void VisitOpenACCLoopConstruct(const OpenACCLoopConstruct *S);
+  void VisitEmbedExpr(const EmbedExpr *S);
 };
 
 } // namespace clang
diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 1e44bc4ad09b6..de758cbe679dc 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -275,6 +275,9 @@ def err_too_large_for_fixed_point : Error<
 def err_unimplemented_conversion_with_fixed_point_type : Error<
   "conversion between fixed point and %0 is not yet supported">;
 
+def err_requires_positive_value : Error<
+  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
+
 // SEH
 def err_seh_expected_handler : Error<
   "expected '__except' or '__finally' block">;
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 25fbfe83fa2bc..12d7b8c0205ee 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -436,6 +436,14 @@ def warn_cxx23_compat_warning_directive : Warning<
 def warn_c23_compat_warning_directive : Warning<
   "#warning is incompatible with C standards before C23">,
   InGroup<CPre23Compat>, DefaultIgnore;
+def ext_pp_embed_directive : ExtWarn<
+  "#embed is a %select{C23|Clang}0 extension">,
+  InGroup<C23>;
+def warn_compat_pp_embed_directive : Warning<
+  "#embed is incompatible with C standards before C23">,
+  InGroup<CPre23Compat>, DefaultIgnore;
+def err_pp_embed_dup_params : Error<
+  "cannot specify parameter '%0' twice in the same '#embed' directive">;
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
@@ -505,6 +513,8 @@ def err_pp_invalid_directive : Error<
   "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;
 def warn_pp_invalid_directive : Warning<
   err_pp_invalid_directive.Summary>, InGroup<DiagGroup<"unknown-directives">>;
+def err_pp_unknown_parameter : Error<
+  "unknown%select{ | embed}0 preprocessor parameter '%1'">;
 def err_pp_directive_required : Error<
   "%0 must be used within a preprocessing directive">;
 def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;
@@ -719,6 +729,8 @@ def err_pp_module_build_missing_end : Error<
   "no matching '#pragma clang module endbuild' for this '#pragma clang module build'">;
 
 def err_defined_macro_name : Error<"'defined' cannot be used as a macro name">;
+def err_defined_in_pp_embed : Error<
+  "'defined' cannot appear within this context">;
 def err_paste_at_start : Error<
   "'##' cannot appear at start of macro expansion">;
 def err_paste_at_end : Error<"'##' cannot appear at end of macro expansion">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 9b8f5b7e80e7e..833e8b51c0257 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -1097,8 +1097,6 @@ def note_surrounding_namespace_starts_here : Note<
   "surrounding namespace with visibility attribute starts here">;
 def err_pragma_loop_invalid_argument_type : Error<
   "invalid argument of type %0; expected an integer type">;
-def err_pragma_loop_invalid_argument_value : Error<
-  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
 def err_pragma_loop_compatibility : Error<
   "%select{incompatible|duplicate}0 directives '%1' and '%2'">;
 def err_pragma_loop_precedes_nonloop : Error<
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index e1f33d57a8980..527bbef24793e 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -286,12 +286,15 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true) const {
-    return getBufferForFileImpl(Filename, /*FileSize=*/-1, isVolatile,
-                                RequiresNullTerminator);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt) const {
+    return getBufferForFileImpl(Filename,
+                                /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1),
+                                isVolatile, RequiresNullTerminator);
   }
 
 private:
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index 6ca08abdb14f0..c59a17be7808f 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -204,6 +204,7 @@ def OpaqueValueExpr : StmtNode<Expr>;
 def TypoExpr : StmtNode<Expr>;
 def RecoveryExpr : StmtNode<Expr>;
 def BuiltinBitCastExpr : StmtNode<ExplicitCastExpr>;
+def EmbedExpr : StmtNode<Expr>;
 
 // Microsoft Extensions.
 def MSPropertyRefExpr : StmtNode<Expr>;
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 9c4b17465e18a..37d570ca5e75b 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -126,6 +126,9 @@ PPKEYWORD(error)
 // C99 6.10.6 - Pragma Directive.
 PPKEYWORD(pragma)
 
+// C23 & C++26 #embed
+PPKEYWORD(embed)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -999,6 +1002,9 @@ ANNOTATION(header_unit)
 // Annotation for end of input in clang-repl.
 ANNOTATION(repl_input_end)
 
+// Annotation for #embed
+ANNOTATION(embed)
+
 #undef PRAGMA_ANNOTATION
 #undef ANNOTATION
 #undef TESTING_KEYWORD
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 15f62c5c1a6ab..0c04d272c1ac7 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -880,6 +880,9 @@ will be ignored}]>;
 def L : JoinedOrSeparate<["-"], "L">, Flags<[RenderJoined]>, Group<Link_Group>,
     Visibility<[ClangOption, FlangOption]>,
     MetaVarName<"<dir>">, HelpText<"Add directory to library search path">;
+def embed_dir_EQ : Joined<["--"], "embed-dir=">, Group<Preprocessor_Group>,
+    Visibility<[ClangOption, CC1Option]>, MetaVarName<"<dir>">,
+    HelpText<"Add directory to embed search path">;
 def MD : Flag<["-"], "MD">, Group<M_Group>,
     HelpText<"Write a depfile containing user and system headers">;
 def MMD : Flag<["-"], "MMD">, Group<M_Group>,
@@ -1473,6 +1476,9 @@ def dD : Flag<["-"], "dD">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>
 def dI : Flag<["-"], "dI">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">,
   MarshallingInfoFlag<PreprocessorOutputOpts<"ShowIncludeDirectives">>;
+def dE : Flag<["-"], "dE">, Group<d_Group>, Visibility<[CC1Option]>,
+  HelpText<"Print embed directives in -E mode in addition to normal output">,
+  MarshallingInfoFlag<PreprocessorOutputOpts<"ShowEmbedDirectives">>;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;
diff --git a/clang/include/clang/Frontend/PreprocessorOutputOptions.h b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
index 6e19cae33cf28..654cf22f010f7 100644
--- a/clang/include/clang/Frontend/PreprocessorOutputOptions.h
+++ b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
@@ -32,6 +32,8 @@ class PreprocessorOutputOptions {
   LLVM_PREFERRED_TYPE(bool)
   unsigned ShowIncludeDirectives : 1;  ///< Print includes, imports etc. within preprocessed output.
   LLVM_PREFERRED_TYPE(bool)
+  unsigned ShowEmbedDirectives : 1; ///< Print embeds, etc. within preprocessed output.
+  LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteIncludes : 1;    ///< Preprocess include directives only.
   LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteImports  : 1;    ///< Include contents of transitively-imported modules.
@@ -51,6 +53,7 @@ class PreprocessorOutputOptions {
     ShowMacroComments = 0;
     ShowMacros = 0;
     ShowIncludeDirectives = 0;
+    ShowEmbedDirectives = 0;
     RewriteIncludes = 0;
     RewriteImports = 0;
     MinimizeWhitespace = 0;
diff --git a/clang/include/clang/Lex/PPCallbacks.h b/clang/include/clang/Lex/PPCallbacks.h
index dfc74b52686f1..46cc564086f1c 100644
--- a/clang/include/clang/Lex/PPCallbacks.h
+++ b/clang/include/clang/Lex/PPCallbacks.h
@@ -27,6 +27,7 @@ class IdentifierInfo;
 class MacroDefinition;
 class MacroDirective;
 class MacroArgs;
+struct LexEmbedParametersResult;
 
 /// This interface provides a way to observe the actions of the
 /// preprocessor as it does its thing.
@@ -83,6 +84,34 @@ class PPCallbacks {
                            const Token &FilenameTok,
                            SrcMgr::CharacteristicKind FileType) {}
 
+  /// Callback invoked whenever the preprocessor cannot find a file for an
+  /// embed directive.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \returns true to indicate that the preprocessor should skip this file
+  /// and not issue any diagnostic.
+  virtual bool EmbedFileNotFound(StringRef FileName) { return false; }
+
+  /// Callback invoked whenever an embed directive has been processed,
+  /// regardless of whether the embed will actually find a file.
+  ///
+  /// \param HashLoc The location of the '#' that starts the embed directive.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whet...
[truncated]
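
The `clang::offset` documentation added to LanguageExtensions.rst in the patch above says the offset is applied before any `limit`, and that an offset past the end leaves the resource empty. A minimal sketch of that windowing rule follows. `embedWindow` is a hypothetical helper written for illustration (C++17), not a function from the patch. It also treats an offset equal to the resource size as empty, since no bytes remain in that case.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model of which bytes survive clang::offset(N) and limit(M).
// The offset is applied first; the limit then caps what is left.
std::string_view embedWindow(std::string_view Resource, std::size_t Offset,
                             std::optional<std::size_t> Limit = std::nullopt) {
  // Nothing remains once the offset reaches or passes the end.
  if (Offset >= Resource.size())
    return {};

  std::string_view Rest = Resource.substr(Offset);
  if (Limit && *Limit < Rest.size())
    Rest = Rest.substr(0, *Limit);
  return Rest;
}
```

For example, under this model a ten-byte resource `0123456789` with `clang::offset(3) limit(4)` would contribute the four bytes `3456`.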

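The `EmbedExpr` comment in Expr.h above describes how ten embedded elements are spread over an array of `struct T`. The same distribution can be seen with ordinary brace elision. In the sketch below, ten literal values stand in for `#embed "data"`, and `Table` and `TableCount` are names made up for this example. Compilers may warn about missing braces here, which is expected.

```cpp
#include <cstddef>

struct S {
  int x, y;
};

struct T {
  int x[2];
  S s;
};

// Ten flat values stand in for an #embed of a ten-element resource.
// Each T consumes four of them, so the array gets three elements and the
// last one is only half filled; the rest is zero-initialized.
constexpr T Table[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
constexpr std::size_t TableCount = sizeof(Table) / sizeof(Table[0]);

static_assert(TableCount == 3, "ten values fill two and a half T objects");
static_assert(Table[0].x[1] == 2 && Table[0].s.x == 3, "first element");
static_assert(Table[2].x[0] == 9 && Table[2].x[1] == 10, "last array part");
static_assert(Table[2].s.x == 0 && Table[2].s.y == 0, "zero-filled tail");
```

This is why a single `EmbedExpr` has to be able to refer to a subrange of the data: the semantic initializer list needs two elements for `x` and then one element each for `s.x` and `s.y`.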
  • (modified) clang/lib/Frontend/DependencyGraph.cpp (+23-1)
  • (modified) clang/lib/Frontend/InitPreprocessor.cpp (+8)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+115-7)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+474-2)
  • (modified) clang/lib/Lex/PPExpressions.cpp (+36-13)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+111)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+4-1)
  • (modified) clang/lib/Parse/ParseExpr.cpp (+36-1)
  • (modified) clang/lib/Parse/ParseInit.cpp (+30)
  • (modified) clang/lib/Parse/ParseTemplate.cpp (+29-12)
  • (modified) clang/lib/Sema/SemaExceptionSpec.cpp (+1)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+12-3)
  • (modified) clang/lib/Sema/SemaInit.cpp (+100-13)
  • (modified) clang/lib/Sema/TreeTransform.h (+5)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+14)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+10)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+4)
  • (added) clang/test/C/C2x/Inputs/bits.bin (+1)
  • (added) clang/test/C/C2x/Inputs/boop.h (+1)
  • (added) clang/test/C/C2x/Inputs/i.dat (+1)
  • (added) clang/test/C/C2x/Inputs/jump.wav (+1)
  • (added) clang/test/C/C2x/Inputs/s.dat (+1)
  • (added) clang/test/C/C2x/n3017.c (+216)
  • (added) clang/test/Preprocessor/Inputs/jk.txt (+1)
  • (added) clang/test/Preprocessor/Inputs/media/art.txt (+9)
  • (added) clang/test/Preprocessor/Inputs/media/empty ()
  • (added) clang/test/Preprocessor/Inputs/null_byte.bin ()
  • (added) clang/test/Preprocessor/Inputs/numbers.txt (+1)
  • (added) clang/test/Preprocessor/Inputs/single_byte.txt (+1)
  • (added) clang/test/Preprocessor/embed___has_embed.c (+60)
  • (added) clang/test/Preprocessor/embed___has_embed_parsing_errors.c (+240)
  • (added) clang/test/Preprocessor/embed___has_embed_supported.c (+24)
  • (added) clang/test/Preprocessor/embed_art.c (+104)
  • (added) clang/test/Preprocessor/embed_codegen.cpp (+84)
  • (added) clang/test/Preprocessor/embed_constexpr.cpp (+97)
  • (added) clang/test/Preprocessor/embed_dependencies.c (+20)
  • (added) clang/test/Preprocessor/embed_ext_compat_diags.c (+16)
  • (added) clang/test/Preprocessor/embed_feature_test.cpp (+7)
  • (added) clang/test/Preprocessor/embed_file_not_found_chevron.c (+4)
  • (added) clang/test/Preprocessor/embed_file_not_found_quote.c (+4)
  • (added) clang/test/Preprocessor/embed_init.c (+29)
  • (added) clang/test/Preprocessor/embed_parameter_if_empty.c (+24)
  • (added) clang/test/Preprocessor/embed_parameter_limit.c (+94)
  • (added) clang/test/Preprocessor/embed_parameter_offset.c (+89)
  • (added) clang/test/Preprocessor/embed_parameter_prefix.c (+38)
  • (added) clang/test/Preprocessor/embed_parameter_suffix.c (+39)
  • (added) clang/test/Preprocessor/embed_parameter_unrecognized.c (+9)
  • (added) clang/test/Preprocessor/embed_parsing_errors.c (+130)
  • (added) clang/test/Preprocessor/embed_path_chevron.c (+8)
  • (added) clang/test/Preprocessor/embed_path_quote.c (+8)
  • (added) clang/test/Preprocessor/embed_preprocess_to_file.c (+39)
  • (added) clang/test/Preprocessor/embed_single_entity.c (+7)
  • (added) clang/test/Preprocessor/embed_weird.cpp (+98)
  • (modified) clang/test/Preprocessor/init-aarch64.c (+3)
  • (modified) clang/test/Preprocessor/init.c (+3)
  • (added) clang/test/Preprocessor/single_byte.txt (+1)
  • (modified) clang/tools/libclang/CXCursor.cpp (+1)
  • (modified) clang/www/c_status.html (+1-1)
diff --git a/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp b/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
index 1d85607e86b7f..7c2a231101070 100644
--- a/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
+++ b/clang-tools-extra/test/pp-trace/pp-trace-macro.cpp
@@ -31,6 +31,15 @@ X
 // CHECK:        MacroNameTok: __STDC_UTF_32__
 // CHECK-NEXT:   MacroDirective: MD_Define
 // CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_NOT_FOUND__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_FOUND__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
+// CHECK-NEXT:   MacroNameTok: __STDC_EMBED_EMPTY__
+// CHECK-NEXT:   MacroDirective: MD_Define
+// CHECK:      - Callback: MacroDefined
 // CHECK:      - Callback: MacroDefined
 // CHECK-NEXT:   MacroNameTok: MACRO
 // CHECK-NEXT:   MacroDirective: MD_Define
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 92e6025c95a8c..9830b35faae12 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -1502,6 +1502,7 @@ Attributes on Structured Bindings            __cpp_structured_bindings        C+
 Designated initializers (N494)                                                C99           C89
 Array & element qualification (N2607)                                         C23           C89
 Attributes (N2335)                                                            C23           C89
+``#embed`` (N3017)                                                            C23           C89, C++
 ============================================ ================================ ============= =============
 
 Type Trait Primitives
@@ -5664,3 +5665,26 @@ Compiling different TUs depending on these flags (including use of
 ``std::hardware_destructive_interference``)  with different compilers, macro
 definitions, or architecture flags will lead to ODR violations and should be
 avoided.
+
+``#embed`` Parameters
+=====================
+
+``clang::offset``
+-----------------
+The ``clang::offset`` embed parameter may appear zero or one time in the
+embed parameter sequence. Its preprocessor argument clause shall be present and
+have the form:
+
+.. code-block:: text
+
+  ( constant-expression )
+
+and shall be an integer constant expression. The integer constant expression
+shall not evaluate to a value less than 0. The token ``defined`` shall not
+appear within the constant expression.
+
+The offset will be used when reading the contents of the embedded resource to
+specify the starting offset to begin embedding from. The resource is treated
+as being empty if the specified offset is larger than the number of bytes in
+the resource. The offset will be applied *before* any ``limit`` parameters are
+applied.
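
The ordering rule in the documentation above (offset first, then limit, with an out-of-range offset giving an empty resource) can be sketched as follows. This is only a model of the documented behavior under the assumption that both parameters count bytes; `selectEmbedBytes` is a hypothetical name and is not how Clang implements the parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model (not Clang's implementation) of how `clang::offset` and
// `limit` select bytes from a resource: the offset is applied first, then the
// limit caps whatever remains.
std::string_view selectEmbedBytes(std::string_view Resource, std::size_t Offset,
                                  std::optional<std::size_t> Limit) {
  // An offset larger than the resource size makes the resource empty.
  if (Offset > Resource.size())
    return {};
  std::string_view Rest = Resource.substr(Offset);
  // The limit, if present, applies to the bytes left after the offset.
  if (Limit)
    Rest = Rest.substr(0, std::min(*Limit, Rest.size()));
  return Rest;
}
```

With a six-byte resource `abcdef`, an offset of 2 and a limit of 3 select `cde`, while an offset of 10 selects nothing regardless of the limit.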
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index f2bf667636dc9..3bc8cae4d8c86 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4799,6 +4799,164 @@ class SourceLocExpr final : public Expr {
   friend class ASTStmtReader;
 };
 
+/// Stores data related to a single #embed directive.
+struct EmbedDataStorage {
+  StringLiteral *BinaryData;
+  size_t getDataElementCount() const { return BinaryData->getByteLength(); }
+};
+
+/// Represents a reference to #embed data. By default, this references the whole
+/// range. Otherwise it represents a subrange of data imported by an #embed
+/// directive. Needed to handle nested initializer lists with #embed directives.
+/// Example:
+///  struct S {
+///    int x, y;
+///  };
+///
+///  struct T {
+///    int x[2];
+///    struct S s;
+///  };
+///
+///  struct T t[] = {
+///  #embed "data" // data contains 10 elements;
+///  };
+///
+/// The resulting semantic form of initializer list will contain (EE stands
+/// for EmbedExpr):
+///  { {EE(first two data elements), {EE(3rd element), EE(4th element)}},
+///    {EE(5th and 6th element), {EE(7th element), EE(8th element)}},
+///    {EE(9th and 10th element), {zeroinitializer}} }
+///
+/// EmbedExpr inside of a semantic initializer list and referencing more than
+/// one element can only appear for arrays of scalars.
+class EmbedExpr final : public Expr {
+  SourceLocation EmbedKeywordLoc;
+  IntegerLiteral *FakeChildNode = nullptr;
+  const ASTContext *Ctx = nullptr;
+  EmbedDataStorage *Data;
+  unsigned Begin = 0;
+  unsigned NumOfElements;
+
+public:
+  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, EmbedDataStorage *Data,
+            unsigned Begin, unsigned NumOfElements);
+  explicit EmbedExpr(EmptyShell Empty) : Expr(EmbedExprClass, Empty) {}
+
+  SourceLocation getLocation() const { return EmbedKeywordLoc; }
+  SourceLocation getBeginLoc() const { return EmbedKeywordLoc; }
+  SourceLocation getEndLoc() const { return EmbedKeywordLoc; }
+
+  StringLiteral *getDataStringLiteral() const { return Data->BinaryData; }
+  EmbedDataStorage *getData() const { return Data; }
+
+  unsigned getStartingElementPos() const { return Begin; }
+  size_t getDataElementCount() const { return NumOfElements; }
+
+  // Allows accessing every byte of EmbedExpr data and iterating over it.
+  // An Iterator knows the EmbedExpr that it refers to, and an offset value
+  // within the data.
+  // Dereferencing an Iterator results in construction of IntegerLiteral AST
+  // node filled with byte of data of the corresponding EmbedExpr within offset
+  // that the Iterator currently has.
+  template <bool Const>
+  class ChildElementIter
+      : public llvm::iterator_facade_base<
+            ChildElementIter<Const>, std::random_access_iterator_tag,
+            std::conditional_t<Const, const IntegerLiteral *,
+                               IntegerLiteral *>> {
+    friend class EmbedExpr;
+
+    EmbedExpr *EExpr = nullptr;
+    unsigned long long CurOffset = ULLONG_MAX;
+    using BaseTy = typename ChildElementIter::iterator_facade_base;
+
+    ChildElementIter(EmbedExpr *E) : EExpr(E) {
+      if (E)
+        CurOffset = E->getStartingElementPos();
+    }
+
+  public:
+    ChildElementIter() : CurOffset(ULLONG_MAX) {}
+    typename BaseTy::reference operator*() const {
+      assert(EExpr && CurOffset != ULLONG_MAX &&
+             "trying to dereference an invalid iterator");
+      IntegerLiteral *N = EExpr->FakeChildNode;
+      StringRef DataRef = EExpr->Data->BinaryData->getBytes();
+      N->setValue(*EExpr->Ctx,
+                  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+                              N->getType()->isSignedIntegerType()));
+      // We want to return a reference to the fake child node in the
+      // EmbedExpr, not the local variable N.
+      return const_cast<typename BaseTy::reference>(EExpr->FakeChildNode);
+    }
+    typename BaseTy::pointer operator->() const { return **this; }
+    using BaseTy::operator++;
+    ChildElementIter &operator++() {
+      assert(EExpr && "trying to increment an invalid iterator");
+      assert(CurOffset != ULLONG_MAX &&
+             "Already at the end of what we can iterate over");
+      if (++CurOffset >=
+          EExpr->getDataElementCount() + EExpr->getStartingElementPos()) {
+        CurOffset = ULLONG_MAX;
+        EExpr = nullptr;
+      }
+      return *this;
+    }
+    bool operator==(ChildElementIter Other) const {
+      return (EExpr == Other.EExpr && CurOffset == Other.CurOffset);
+    }
+  }; // class ChildElementIter
+
+public:
+  using fake_child_range = llvm::iterator_range<ChildElementIter<false>>;
+  using const_fake_child_range = llvm::iterator_range<ChildElementIter<true>>;
+
+  fake_child_range underlying_data_elements() {
+    return fake_child_range(ChildElementIter<false>(this),
+                            ChildElementIter<false>());
+  }
+
+  const_fake_child_range underlying_data_elements() const {
+    return const_fake_child_range(
+        ChildElementIter<true>(const_cast<EmbedExpr *>(this)),
+        ChildElementIter<true>());
+  }
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == EmbedExprClass;
+  }
+
+  ChildElementIter<false> begin() { return ChildElementIter<false>(this); }
+
+  ChildElementIter<true> begin() const {
+    return ChildElementIter<true>(const_cast<EmbedExpr *>(this));
+  }
+
+  template <typename Call, typename... Targs>
+  bool doForEachDataElement(Call &&C, unsigned &StartingIndexInArray,
+                            Targs &&...Fargs) const {
+    for (auto It : underlying_data_elements()) {
+      if (!std::invoke(std::forward<Call>(C), const_cast<IntegerLiteral *>(It),
+                       StartingIndexInArray, std::forward<Targs>(Fargs)...))
+        return false;
+      StartingIndexInArray++;
+    }
+    return true;
+  }
+
+private:
+  friend class ASTStmtReader;
+};
+
 /// Describes an C or C++ initializer list.
 ///
 /// InitListExpr describes an initializer list, which can be used to
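
The iteration scheme used by `ChildElementIter` above (a sentinel offset for the end state, and one reusable node that is overwritten on every dereference) can be sketched in isolation. The types below are simplified stand-ins invented for this illustration: `FakeLiteral` replaces `IntegerLiteral`, `ByteRangeView` replaces `EmbedExpr`, and `collectValues` is a hypothetical helper. None of this is the patch's code.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Simplified stand-in for clang's IntegerLiteral: it only holds a value.
struct FakeLiteral {
  unsigned Value = 0;
};

// Simplified stand-in for EmbedExpr: a view of Count bytes starting at Begin
// inside shared storage, plus one reusable node handed out on dereference.
class ByteRangeView {
  const std::string *Data;
  unsigned Begin;
  unsigned Count;
  mutable FakeLiteral Scratch;

public:
  ByteRangeView(const std::string &D, unsigned B, unsigned N)
      : Data(&D), Begin(B), Count(N) {
    assert(B <= D.size() && N <= D.size() - B && "range out of bounds");
  }

  class Iter {
    const ByteRangeView *View = nullptr;
    unsigned long long Offset = ULLONG_MAX; // ULLONG_MAX marks the end state.

  public:
    Iter() = default;
    explicit Iter(const ByteRangeView *V) {
      // Unlike the patch, this sketch also maps an empty range to the end
      // state so that begin() == end() holds for it.
      if (V && V->Count != 0) {
        View = V;
        Offset = V->Begin;
      }
    }
    FakeLiteral *operator*() const {
      assert(View && Offset != ULLONG_MAX && "dereferencing an end iterator");
      // Overwrite the shared node with the byte at the current offset.
      View->Scratch.Value = static_cast<unsigned char>((*View->Data)[Offset]);
      return &View->Scratch;
    }
    Iter &operator++() {
      assert(View && "incrementing an end iterator");
      unsigned long long End =
          static_cast<unsigned long long>(View->Begin) + View->Count;
      if (++Offset >= End) {
        Offset = ULLONG_MAX;
        View = nullptr;
      }
      return *this;
    }
    bool operator==(const Iter &O) const {
      return View == O.View && Offset == O.Offset;
    }
    bool operator!=(const Iter &O) const { return !(*this == O); }
  };

  Iter begin() const { return Iter(this); }
  Iter end() const { return Iter(); }
};

// Hypothetical helper: copies each value out immediately, because the node
// returned by dereferencing is overwritten by the next dereference.
std::vector<unsigned> collectValues(const ByteRangeView &V) {
  std::vector<unsigned> Out;
  for (FakeLiteral *L : V)
    Out.push_back(L->Value);
  return Out;
}
```

One consequence of this design worth keeping in mind when reviewing callers: every dereference returns the same node, so a pointer obtained from one element must not be held across a later dereference. In the sketch, `collectValues` copies the value right away for that reason.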
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index aa55e2e7e8718..2785afd59bf21 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -2864,6 +2864,11 @@ DEF_TRAVERSE_STMT(ShuffleVectorExpr, {})
 DEF_TRAVERSE_STMT(ConvertVectorExpr, {})
 DEF_TRAVERSE_STMT(StmtExpr, {})
 DEF_TRAVERSE_STMT(SourceLocExpr, {})
+DEF_TRAVERSE_STMT(EmbedExpr, {
+  for (IntegerLiteral *IL : S->underlying_data_elements()) {
+    TRY_TO_TRAVERSE_OR_ENQUEUE_STMT(IL);
+  }
+})
 
 DEF_TRAVERSE_STMT(UnresolvedLookupExpr, {
   TRY_TO(TraverseNestedNameSpecifierLoc(S->getQualifierLoc()));
diff --git a/clang/include/clang/AST/TextNodeDumper.h b/clang/include/clang/AST/TextNodeDumper.h
index abfafcaef271b..39dd1f515c9eb 100644
--- a/clang/include/clang/AST/TextNodeDumper.h
+++ b/clang/include/clang/AST/TextNodeDumper.h
@@ -409,6 +409,7 @@ class TextNodeDumper
   void VisitHLSLBufferDecl(const HLSLBufferDecl *D);
   void VisitOpenACCConstructStmt(const OpenACCConstructStmt *S);
   void VisitOpenACCLoopConstruct(const OpenACCLoopConstruct *S);
+  void VisitEmbedExpr(const EmbedExpr *S);
 };
 
 } // namespace clang
diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 1e44bc4ad09b6..de758cbe679dc 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -275,6 +275,9 @@ def err_too_large_for_fixed_point : Error<
 def err_unimplemented_conversion_with_fixed_point_type : Error<
   "conversion between fixed point and %0 is not yet supported">;
 
+def err_requires_positive_value : Error<
+  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
+
 // SEH
 def err_seh_expected_handler : Error<
   "expected '__except' or '__finally' block">;
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 25fbfe83fa2bc..12d7b8c0205ee 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -436,6 +436,14 @@ def warn_cxx23_compat_warning_directive : Warning<
 def warn_c23_compat_warning_directive : Warning<
   "#warning is incompatible with C standards before C23">,
   InGroup<CPre23Compat>, DefaultIgnore;
+def ext_pp_embed_directive : ExtWarn<
+  "#embed is a %select{C23|Clang}0 extension">,
+  InGroup<C23>;
+def warn_compat_pp_embed_directive : Warning<
+  "#embed is incompatible with C standards before C23">,
+  InGroup<CPre23Compat>, DefaultIgnore;
+def err_pp_embed_dup_params : Error<
+  "cannot specify parameter '%0' twice in the same '#embed' directive">;
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
@@ -505,6 +513,8 @@ def err_pp_invalid_directive : Error<
   "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;
 def warn_pp_invalid_directive : Warning<
   err_pp_invalid_directive.Summary>, InGroup<DiagGroup<"unknown-directives">>;
+def err_pp_unknown_parameter : Error<
+  "unknown%select{ | embed}0 preprocessor parameter '%1'">;
 def err_pp_directive_required : Error<
   "%0 must be used within a preprocessing directive">;
 def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;
@@ -719,6 +729,8 @@ def err_pp_module_build_missing_end : Error<
   "no matching '#pragma clang module endbuild' for this '#pragma clang module build'">;
 
 def err_defined_macro_name : Error<"'defined' cannot be used as a macro name">;
+def err_defined_in_pp_embed : Error<
+  "'defined' cannot appear within this context">;
 def err_paste_at_start : Error<
   "'##' cannot appear at start of macro expansion">;
 def err_paste_at_end : Error<"'##' cannot appear at end of macro expansion">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 9b8f5b7e80e7e..833e8b51c0257 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -1097,8 +1097,6 @@ def note_surrounding_namespace_starts_here : Note<
   "surrounding namespace with visibility attribute starts here">;
 def err_pragma_loop_invalid_argument_type : Error<
   "invalid argument of type %0; expected an integer type">;
-def err_pragma_loop_invalid_argument_value : Error<
-  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
 def err_pragma_loop_compatibility : Error<
   "%select{incompatible|duplicate}0 directives '%1' and '%2'">;
 def err_pragma_loop_precedes_nonloop : Error<
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index e1f33d57a8980..527bbef24793e 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -286,12 +286,15 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true) const {
-    return getBufferForFileImpl(Filename, /*FileSize=*/-1, isVolatile,
-                                RequiresNullTerminator);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt) const {
+    return getBufferForFileImpl(Filename,
+                                /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1),
+                                isVolatile, RequiresNullTerminator);
   }
 
 private:
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index 6ca08abdb14f0..c59a17be7808f 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -204,6 +204,7 @@ def OpaqueValueExpr : StmtNode<Expr>;
 def TypoExpr : StmtNode<Expr>;
 def RecoveryExpr : StmtNode<Expr>;
 def BuiltinBitCastExpr : StmtNode<ExplicitCastExpr>;
+def EmbedExpr : StmtNode<Expr>;
 
 // Microsoft Extensions.
 def MSPropertyRefExpr : StmtNode<Expr>;
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 9c4b17465e18a..37d570ca5e75b 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -126,6 +126,9 @@ PPKEYWORD(error)
 // C99 6.10.6 - Pragma Directive.
 PPKEYWORD(pragma)
 
+// C23 & C++26 #embed
+PPKEYWORD(embed)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -999,6 +1002,9 @@ ANNOTATION(header_unit)
 // Annotation for end of input in clang-repl.
 ANNOTATION(repl_input_end)
 
+// Annotation for #embed
+ANNOTATION(embed)
+
 #undef PRAGMA_ANNOTATION
 #undef ANNOTATION
 #undef TESTING_KEYWORD
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 15f62c5c1a6ab..0c04d272c1ac7 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -880,6 +880,9 @@ will be ignored}]>;
 def L : JoinedOrSeparate<["-"], "L">, Flags<[RenderJoined]>, Group<Link_Group>,
     Visibility<[ClangOption, FlangOption]>,
     MetaVarName<"<dir>">, HelpText<"Add directory to library search path">;
+def embed_dir_EQ : Joined<["--"], "embed-dir=">, Group<Preprocessor_Group>,
+    Visibility<[ClangOption, CC1Option]>, MetaVarName<"<dir>">,
+    HelpText<"Add directory to embed search path">;
 def MD : Flag<["-"], "MD">, Group<M_Group>,
     HelpText<"Write a depfile containing user and system headers">;
 def MMD : Flag<["-"], "MMD">, Group<M_Group>,
@@ -1473,6 +1476,9 @@ def dD : Flag<["-"], "dD">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>
 def dI : Flag<["-"], "dI">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">,
   MarshallingInfoFlag<PreprocessorOutputOpts<"ShowIncludeDirectives">>;
+def dE : Flag<["-"], "dE">, Group<d_Group>, Visibility<[CC1Option]>,
+  HelpText<"Print embed directives in -E mode in addition to normal output">,
+  MarshallingInfoFlag<PreprocessorOutputOpts<"ShowEmbedDirectives">>;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;
diff --git a/clang/include/clang/Frontend/PreprocessorOutputOptions.h b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
index 6e19cae33cf28..654cf22f010f7 100644
--- a/clang/include/clang/Frontend/PreprocessorOutputOptions.h
+++ b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
@@ -32,6 +32,8 @@ class PreprocessorOutputOptions {
   LLVM_PREFERRED_TYPE(bool)
   unsigned ShowIncludeDirectives : 1;  ///< Print includes, imports etc. within preprocessed output.
   LLVM_PREFERRED_TYPE(bool)
+  unsigned ShowEmbedDirectives : 1; ///< Print embeds, etc. within preprocessed output.
+  LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteIncludes : 1;    ///< Preprocess include directives only.
   LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteImports  : 1;    ///< Include contents of transitively-imported modules.
@@ -51,6 +53,7 @@ class PreprocessorOutputOptions {
     ShowMacroComments = 0;
     ShowMacros = 0;
     ShowIncludeDirectives = 0;
+    ShowEmbedDirectives = 0;
     RewriteIncludes = 0;
     RewriteImports = 0;
     MinimizeWhitespace = 0;
diff --git a/clang/include/clang/Lex/PPCallbacks.h b/clang/include/clang/Lex/PPCallbacks.h
index dfc74b52686f1..46cc564086f1c 100644
--- a/clang/include/clang/Lex/PPCallbacks.h
+++ b/clang/include/clang/Lex/PPCallbacks.h
@@ -27,6 +27,7 @@ class IdentifierInfo;
 class MacroDefinition;
 class MacroDirective;
 class MacroArgs;
+struct LexEmbedParametersResult;
 
 /// This interface provides a way to observe the actions of the
 /// preprocessor as it does its thing.
@@ -83,6 +84,34 @@ class PPCallbacks {
                            const Token &FilenameTok,
                            SrcMgr::CharacteristicKind FileType) {}
 
+  /// Callback invoked whenever the preprocessor cannot find a file for an
+  /// embed directive.
+  ///
+  /// \param FileName The name of the file being embedded, as written in the
+  /// source code.
+  ///
+  /// \returns true to indicate that the preprocessor should skip this file
+  /// and not issue any diagnostic.
+  virtual bool EmbedFileNotFound(StringRef FileName) { return false; }
+
+  /// Callback invoked whenever an embed directive has been processed,
+  /// regardless of whether the embed will actually find a file.
+  ///
+  /// \param HashLoc The location of the '#' that starts the embed directive.
+  ///
+  /// \param FileName The name of the file being embedded, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whet...
[truncated]

@llvmbot
Collaborator

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Mariya Podchishchaeva (Fznamznon)

diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 92e6025c95a8c..9830b35faae12 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -1502,6 +1502,7 @@ Attributes on Structured Bindings            __cpp_structured_bindings        C+
 Designated initializers (N494)                                                C99           C89
 Array & element qualification (N2607)                                         C23           C89
 Attributes (N2335)                                                            C23           C89
+``#embed`` (N3017)                                                            C23           C89, C++
 ============================================ ================================ ============= =============
 
 Type Trait Primitives
@@ -5664,3 +5665,26 @@ Compiling different TUs depending on these flags (including use of
 ``std::hardware_destructive_interference``)  with different compilers, macro
 definitions, or architecture flags will lead to ODR violations and should be
 avoided.
+
+``#embed`` Parameters
+=====================
+
+``clang::offset``
+-----------------
+The ``clang::offset`` embed parameter may appear zero or one time in the
+embed parameter sequence. Its preprocessor argument clause shall be present and
+have the form:
+
+.. code-block:: text
+
+  ( constant-expression )
+
+and shall be an integer constant expression. The integer constant expression
+shall not evaluate to a value less than 0. The token ``defined`` shall not
+appear within the constant expression.
+
+The offset will be used when reading the contents of the embedded resource to
+specify the starting offset to begin embedding from. The resource is treated
+as being empty if the specified offset is larger than the number of bytes in
+the resource. The offset will be applied *before* any ``limit`` parameters are
+applied.
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index f2bf667636dc9..3bc8cae4d8c86 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4799,6 +4799,164 @@ class SourceLocExpr final : public Expr {
   friend class ASTStmtReader;
 };
 
+/// Stores data related to a single #embed directive.
+struct EmbedDataStorage {
+  StringLiteral *BinaryData;
+  size_t getDataElementCount() const { return BinaryData->getByteLength(); }
+};
+
+/// Represents a reference to #embed data. By default, this references the whole
+/// range. Otherwise it represents a subrange of data imported by #embed
+/// directive. Needed to handle nested initializer lists with #embed directives.
+/// Example:
+///  struct S {
+///    int x, y;
+///  };
+///
+///  struct T {
+///    int x[2];
+///    struct S s;
+///  };
+///
+///  struct T t[] = {
+///  #embed "data" // data contains 10 elements;
+///  };
+///
+/// The resulting semantic form of initializer list will contain (EE stands
+/// for EmbedExpr):
+///  { {EE(first two data elements), {EE(3rd element), EE(4th element)}},
+///    {EE(5th and 6th element), {EE(7th element), EE(8th element)}},
+///    {EE(9th and 10th element), {zeroinitializer}} }
+///
+/// EmbedExpr inside of a semantic initializer list and referencing more than
+/// one element can only appear for arrays of scalars.
+class EmbedExpr final : public Expr {
+  SourceLocation EmbedKeywordLoc;
+  IntegerLiteral *FakeChildNode = nullptr;
+  const ASTContext *Ctx = nullptr;
+  EmbedDataStorage *Data;
+  unsigned Begin = 0;
+  unsigned NumOfElements;
+
+public:
+  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, EmbedDataStorage *Data,
+            unsigned Begin, unsigned NumOfElements);
+  explicit EmbedExpr(EmptyShell Empty) : Expr(EmbedExprClass, Empty) {}
+
+  SourceLocation getLocation() const { return EmbedKeywordLoc; }
+  SourceLocation getBeginLoc() const { return EmbedKeywordLoc; }
+  SourceLocation getEndLoc() const { return EmbedKeywordLoc; }
+
+  StringLiteral *getDataStringLiteral() const { return Data->BinaryData; }
+  EmbedDataStorage *getData() const { return Data; }
+
+  unsigned getStartingElementPos() const { return Begin; }
+  size_t getDataElementCount() const { return NumOfElements; }
+
+  // Allows accessing every byte of EmbedExpr data and iterating over it.
+  // An Iterator knows the EmbedExpr that it refers to, and an offset value
+  // within the data.
+  // Dereferencing an Iterator results in construction of IntegerLiteral AST
+  // node filled with byte of data of the corresponding EmbedExpr within offset
+  // that the Iterator currently has.
+  template <bool Const>
+  class ChildElementIter
+      : public llvm::iterator_facade_base<
+            ChildElementIter<Const>, std::random_access_iterator_tag,
+            std::conditional_t<Const, const IntegerLiteral *,
+                               IntegerLiteral *>> {
+    friend class EmbedExpr;
+
+    EmbedExpr *EExpr = nullptr;
+    unsigned long long CurOffset = ULLONG_MAX;
+    using BaseTy = typename ChildElementIter::iterator_facade_base;
+
+    ChildElementIter(EmbedExpr *E) : EExpr(E) {
+      if (E)
+        CurOffset = E->getStartingElementPos();
+    }
+
+  public:
+    ChildElementIter() : CurOffset(ULLONG_MAX) {}
+    typename BaseTy::reference operator*() const {
+      assert(EExpr && CurOffset != ULLONG_MAX &&
+             "trying to dereference an invalid iterator");
+      IntegerLiteral *N = EExpr->FakeChildNode;
+      StringRef DataRef = EExpr->Data->BinaryData->getBytes();
+      N->setValue(*EExpr->Ctx,
+                  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+                              N->getType()->isSignedIntegerType()));
+      // We want to return a reference to the fake child node in the
+      // EmbedExpr, not the local variable N.
+      return const_cast<typename BaseTy::reference>(EExpr->FakeChildNode);
+    }
+    typename BaseTy::pointer operator->() const { return **this; }
+    using BaseTy::operator++;
+    ChildElementIter &operator++() {
+      assert(EExpr && "trying to increment an invalid iterator");
+      assert(CurOffset != ULLONG_MAX &&
+             "Already at the end of what we can iterate over");
+      if (++CurOffset >=
+          EExpr->getDataElementCount() + EExpr->getStartingElementPos()) {
+        CurOffset = ULLONG_MAX;
+        EExpr = nullptr;
+      }
+      return *this;
+    }
+    bool operator==(ChildElementIter Other) const {
+      return (EExpr == Other.EExpr && CurOffset == Other.CurOffset);
+    }
+  }; // class ChildElementIter
+
+public:
+  using fake_child_range = llvm::iterator_range<ChildElementIter<false>>;
+  using const_fake_child_range = llvm::iterator_range<ChildElementIter<true>>;
+
+  fake_child_range underlying_data_elements() {
+    return fake_child_range(ChildElementIter<false>(this),
+                            ChildElementIter<false>());
+  }
+
+  const_fake_child_range underlying_data_elements() const {
+    return const_fake_child_range(
+        ChildElementIter<true>(const_cast<EmbedExpr *>(this)),
+        ChildElementIter<true>());
+  }
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == EmbedExprClass;
+  }
+
+  ChildElementIter<false> begin() { return ChildElementIter<false>(this); }
+
+  ChildElementIter<true> begin() const {
+    return ChildElementIter<true>(const_cast<EmbedExpr *>(this));
+  }
+
+  template <typename Call, typename... Targs>
+  bool doForEachDataElement(Call &&C, unsigned &StartingIndexInArray,
+                            Targs &&...Fargs) const {
+    for (auto It : underlying_data_elements()) {
+      if (!std::invoke(std::forward<Call>(C), const_cast<IntegerLiteral *>(It),
+                       StartingIndexInArray, std::forward<Targs>(Fargs)...))
+        return false;
+      StartingIndexInArray++;
+    }
+    return true;
+  }
+
+private:
+  friend class ASTStmtReader;
+};
+
 /// Describes an C or C++ initializer list.
 ///
 /// InitListExpr describes an initializer list, which can be used to
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index aa55e2e7e8718..2785afd59bf21 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -2864,6 +2864,11 @@ DEF_TRAVERSE_STMT(ShuffleVectorExpr, {})
 DEF_TRAVERSE_STMT(ConvertVectorExpr, {})
 DEF_TRAVERSE_STMT(StmtExpr, {})
 DEF_TRAVERSE_STMT(SourceLocExpr, {})
+DEF_TRAVERSE_STMT(EmbedExpr, {
+  for (IntegerLiteral *IL : S->underlying_data_elements()) {
+    TRY_TO_TRAVERSE_OR_ENQUEUE_STMT(IL);
+  }
+})
 
 DEF_TRAVERSE_STMT(UnresolvedLookupExpr, {
   TRY_TO(TraverseNestedNameSpecifierLoc(S->getQualifierLoc()));
diff --git a/clang/include/clang/AST/TextNodeDumper.h b/clang/include/clang/AST/TextNodeDumper.h
index abfafcaef271b..39dd1f515c9eb 100644
--- a/clang/include/clang/AST/TextNodeDumper.h
+++ b/clang/include/clang/AST/TextNodeDumper.h
@@ -409,6 +409,7 @@ class TextNodeDumper
   void VisitHLSLBufferDecl(const HLSLBufferDecl *D);
   void VisitOpenACCConstructStmt(const OpenACCConstructStmt *S);
   void VisitOpenACCLoopConstruct(const OpenACCLoopConstruct *S);
+  void VisitEmbedExpr(const EmbedExpr *S);
 };
 
 } // namespace clang
diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 1e44bc4ad09b6..de758cbe679dc 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -275,6 +275,9 @@ def err_too_large_for_fixed_point : Error<
 def err_unimplemented_conversion_with_fixed_point_type : Error<
   "conversion between fixed point and %0 is not yet supported">;
 
+def err_requires_positive_value : Error<
+  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
+
 // SEH
 def err_seh_expected_handler : Error<
   "expected '__except' or '__finally' block">;
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 25fbfe83fa2bc..12d7b8c0205ee 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -436,6 +436,14 @@ def warn_cxx23_compat_warning_directive : Warning<
 def warn_c23_compat_warning_directive : Warning<
   "#warning is incompatible with C standards before C23">,
   InGroup<CPre23Compat>, DefaultIgnore;
+def ext_pp_embed_directive : ExtWarn<
+  "#embed is a %select{C23|Clang}0 extension">,
+  InGroup<C23>;
+def warn_compat_pp_embed_directive : Warning<
+  "#embed is incompatible with C standards before C23">,
+  InGroup<CPre23Compat>, DefaultIgnore;
+def err_pp_embed_dup_params : Error<
+  "cannot specify parameter '%0' twice in the same '#embed' directive">;
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
@@ -505,6 +513,8 @@ def err_pp_invalid_directive : Error<
   "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;
 def warn_pp_invalid_directive : Warning<
   err_pp_invalid_directive.Summary>, InGroup<DiagGroup<"unknown-directives">>;
+def err_pp_unknown_parameter : Error<
+  "unknown%select{ | embed}0 preprocessor parameter '%1'">;
 def err_pp_directive_required : Error<
   "%0 must be used within a preprocessing directive">;
 def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;
@@ -719,6 +729,8 @@ def err_pp_module_build_missing_end : Error<
   "no matching '#pragma clang module endbuild' for this '#pragma clang module build'">;
 
 def err_defined_macro_name : Error<"'defined' cannot be used as a macro name">;
+def err_defined_in_pp_embed : Error<
+  "'defined' cannot appear within this context">;
 def err_paste_at_start : Error<
   "'##' cannot appear at start of macro expansion">;
 def err_paste_at_end : Error<"'##' cannot appear at end of macro expansion">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 9b8f5b7e80e7e..833e8b51c0257 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -1097,8 +1097,6 @@ def note_surrounding_namespace_starts_here : Note<
   "surrounding namespace with visibility attribute starts here">;
 def err_pragma_loop_invalid_argument_type : Error<
   "invalid argument of type %0; expected an integer type">;
-def err_pragma_loop_invalid_argument_value : Error<
-  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
 def err_pragma_loop_compatibility : Error<
   "%select{incompatible|duplicate}0 directives '%1' and '%2'">;
 def err_pragma_loop_precedes_nonloop : Error<
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index e1f33d57a8980..527bbef24793e 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -286,12 +286,15 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true) const {
-    return getBufferForFileImpl(Filename, /*FileSize=*/-1, isVolatile,
-                                RequiresNullTerminator);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt) const {
+    return getBufferForFileImpl(Filename,
+                                /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1),
+                                isVolatile, RequiresNullTerminator);
   }
 
 private:
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index 6ca08abdb14f0..c59a17be7808f 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -204,6 +204,7 @@ def OpaqueValueExpr : StmtNode<Expr>;
 def TypoExpr : StmtNode<Expr>;
 def RecoveryExpr : StmtNode<Expr>;
 def BuiltinBitCastExpr : StmtNode<ExplicitCastExpr>;
+def EmbedExpr : StmtNode<Expr>;
 
 // Microsoft Extensions.
 def MSPropertyRefExpr : StmtNode<Expr>;
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 9c4b17465e18a..37d570ca5e75b 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -126,6 +126,9 @@ PPKEYWORD(error)
 // C99 6.10.6 - Pragma Directive.
 PPKEYWORD(pragma)
 
+// C23 & C++26 #embed
+PPKEYWORD(embed)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -999,6 +1002,9 @@ ANNOTATION(header_unit)
 // Annotation for end of input in clang-repl.
 ANNOTATION(repl_input_end)
 
+// Annotation for #embed
+ANNOTATION(embed)
+
 #undef PRAGMA_ANNOTATION
 #undef ANNOTATION
 #undef TESTING_KEYWORD
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 15f62c5c1a6ab..0c04d272c1ac7 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -880,6 +880,9 @@ will be ignored}]>;
 def L : JoinedOrSeparate<["-"], "L">, Flags<[RenderJoined]>, Group<Link_Group>,
     Visibility<[ClangOption, FlangOption]>,
     MetaVarName<"<dir>">, HelpText<"Add directory to library search path">;
+def embed_dir_EQ : Joined<["--"], "embed-dir=">, Group<Preprocessor_Group>,
+    Visibility<[ClangOption, CC1Option]>, MetaVarName<"<dir>">,
+    HelpText<"Add directory to embed search path">;
 def MD : Flag<["-"], "MD">, Group<M_Group>,
     HelpText<"Write a depfile containing user and system headers">;
 def MMD : Flag<["-"], "MMD">, Group<M_Group>,
@@ -1473,6 +1476,9 @@ def dD : Flag<["-"], "dD">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>
 def dI : Flag<["-"], "dI">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">,
   MarshallingInfoFlag<PreprocessorOutputOpts<"ShowIncludeDirectives">>;
+def dE : Flag<["-"], "dE">, Group<d_Group>, Visibility<[CC1Option]>,
+  HelpText<"Print embed directives in -E mode in addition to normal output">,
+  MarshallingInfoFlag<PreprocessorOutputOpts<"ShowEmbedDirectives">>;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;
diff --git a/clang/include/clang/Frontend/PreprocessorOutputOptions.h b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
index 6e19cae33cf28..654cf22f010f7 100644
--- a/clang/include/clang/Frontend/PreprocessorOutputOptions.h
+++ b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
@@ -32,6 +32,8 @@ class PreprocessorOutputOptions {
   LLVM_PREFERRED_TYPE(bool)
   unsigned ShowIncludeDirectives : 1;  ///< Print includes, imports etc. within preprocessed output.
   LLVM_PREFERRED_TYPE(bool)
+  unsigned ShowEmbedDirectives : 1; ///< Print embeds, etc. within preprocessed output.
+  LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteIncludes : 1;    ///< Preprocess include directives only.
   LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteImports  : 1;    ///< Include contents of transitively-imported modules.
@@ -51,6 +53,7 @@ class PreprocessorOutputOptions {
     ShowMacroComments = 0;
     ShowMacros = 0;
     ShowIncludeDirectives = 0;
+    ShowEmbedDirectives = 0;
     RewriteIncludes = 0;
     RewriteImports = 0;
     MinimizeWhitespace = 0;
diff --git a/clang/include/clang/Lex/PPCallbacks.h b/clang/include/clang/Lex/PPCallbacks.h
index dfc74b52686f1..46cc564086f1c 100644
--- a/clang/include/clang/Lex/PPCallbacks.h
+++ b/clang/include/clang/Lex/PPCallbacks.h
@@ -27,6 +27,7 @@ class IdentifierInfo;
 class MacroDefinition;
 class MacroDirective;
 class MacroArgs;
+struct LexEmbedParametersResult;
 
 /// This interface provides a way to observe the actions of the
 /// preprocessor as it does its thing.
@@ -83,6 +84,34 @@ class PPCallbacks {
                            const Token &FilenameTok,
                            SrcMgr::CharacteristicKind FileType) {}
 
+  /// Callback invoked whenever the preprocessor cannot find a file for an
+  /// embed directive.
+  ///
+  /// \param FileName The name of the file being embedded, as written in the
+  /// source code.
+  ///
+  /// \returns true to indicate that the preprocessor should skip this file
+  /// and not issue any diagnostic.
+  virtual bool EmbedFileNotFound(StringRef FileName) { return false; }
+
+  /// Callback invoked whenever an embed directive has been processed,
+  /// regardless of whether the embed will actually find a file.
+  ///
+  /// \param HashLoc The location of the '#' that starts the embed directive.
+  ///
+  /// \param FileName The name of the file being embedded, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whet...
[truncated]
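
The `clang::offset` rules documented in the LanguageExtensions.rst hunk above (offset applied before `limit`, an offset past the end yielding an empty resource) can be modeled in a few lines. This is only a sketch: `embeddedBytes` is a hypothetical helper, not a Clang API, and it assumes the resource is already in memory as a byte string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model (not part of Clang): returns the bytes that remain
// after applying clang::offset first and limit second.
std::string embeddedBytes(const std::string &Resource, std::size_t Offset,
                          std::size_t Limit = std::string::npos) {
  // An offset at or past the end of the resource leaves nothing to embed.
  if (Offset >= Resource.size())
    return std::string();
  // substr clamps the count, so a limit larger than the remainder is fine.
  return Resource.substr(Offset, Limit);
}
```

Under this model, a directive such as `#embed "digits.txt" clang::offset(2) limit(3)` on a hypothetical file containing `0123456789` would be expected to embed the bytes `234`, because the offset skips two bytes before the limit is counted.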

@Fznamznon
Contributor Author

Fznamznon commented Jun 17, 2024

This fixes #68620 (comment).
There was also #68620 (comment) reported, but I'm not able to access proper logs. The link points to sanitizer buildbots, so I suppose it might be the same memory leak.
With this patch and an address sanitizer build, only Clang-Unit :: Lex/./LexTests/67/120 fails, and it also fails on main.

Contributor

@cor3ntin cor3ntin left a comment

The commit to fix the leak LGTM

Collaborator

Please don't commit binary files if it isn't absolutely necessary. You can generate whatever files you need in a RUN line.

Contributor Author

Ok, removed null byte file.

@@ -2422,6 +2422,10 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
      Bldr.addNodes(Dst);
      break;
    }

    case Stmt::EmbedExprClass:
      llvm_unreachable("Support for EmbedExpr is not implemented.");
Collaborator

Please don't use llvm_unreachable for things which are actually reachable. At the very least, use report_fatal_error. Prefer a real diagnostic when possible.

Contributor Author

Used report_fatal_error instead.

constexpr unsigned char ch =
#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
;
static_assert(ch == 0);
Collaborator

More weird cases to consider:

void f(float x, char y, char z);
void g() { f((float)
#embed "three_character_file"
);
}

struct S { S(char x); ~S(); };
void f() {
  S s[] = {
#embed "file"
  };
}

Contributor Author

The first is interesting: the cast to float makes #embed be treated as a comma expression, so there are actually not enough arguments. That said, I think this is correct behavior; otherwise it is not quite clear to me which data element the cast should apply to.

Contributor Author

Added cases.

Collaborator

Because embed produces tokens from the preprocessor, the first example should be the same as:

void f(float x, char y, char z);
void g() { f((float)
1, 2, 3
);
}

which should be accepted (the cast applies to the first argument). You can run into the same with something like:


void f(float x, char y, char z);
void g() {
  f(
#embed "three_character_file" prefix((float))
  );
}
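
The token-level equivalence described here can be checked on a compiler without `#embed` support by substituting a literal token sequence. In the sketch below, `THREE_BYTES` is a hypothetical macro standing in for the tokens that `#embed "three_character_file"` would produce, and `Call` is an illustrative struct that only records the arguments; neither comes from the patch.

```cpp
#include <cassert>

// Illustrative record of the arguments that reached f.
struct Call {
  float x;
  char y;
  char z;
};

// Same parameter list as the f in the review comment.
Call f(float x, char y, char z) { return {x, y, z}; }

// Hypothetical stand-in for the comma-separated tokens of a 3-byte resource.
#define THREE_BYTES 1, 2, 3

// Expands to f((float) 1, 2, 3): the cast binds to the first token only,
// and the remaining tokens are ordinary second and third arguments.
Call g() { return f((float)THREE_BYTES); }
```

If `#embed` is treated purely as a token sequence, the call is well-formed and the cast affects only the first element, which is the behavior the comment argues for.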

Contributor Author

This is one more case that would be solved by injecting tokens back into the stream. Right now it is quite complex to detect that an embed encountered by ParseCastExpression should be expanded in a special way.

Contributor

This is one more case that would be solved by injecting tokens back into the stream. Right now it is quite complex to detect that an embed encountered by ParseCastExpression should be expanded in a special way.

+1. I can work on that when I come back from St. Louis if you don't get to it first.

Unless you want to special-case it in way too many spots, I'd think it would be far easier to optimize just the inner part of the integer sequence, i.e. everything except the first and last sequence element (maybe with an exception when the last prefix token is , or the first suffix token is ,).
Because one can use arbitrary tokens before and after the #embed, it can be

const unsigned char a[] = {
-400 + 4 * 
#embed __FILE__
- 27 };

(or with tokens from prefix/suffix), and at least the current patchset mishandles many such cases. For the inner part of the sequence you know there is a , before it and a , after it, which simplifies a lot of things.
The above is handled correctly by GCC and by clang -save-temps, but not by clang without -save-temps.
And there are tons of other cases like that, e.g. even designated initializer [26] =
before the sequence, etc.
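
To make the expected result concrete, the initializer above can be written with a literal sequence in place of the directive. This is a sketch under an assumption: `EMBED_TOKENS` is a hypothetical macro standing in for a three-byte resource, since the real example embeds `__FILE__`, whose contents are not fixed.

```cpp
#include <cassert>

// Hypothetical stand-in for the tokens produced by embedding a 3-byte file.
#define EMBED_TOKENS 100, 101, 102

// Expands to { -400 + 4 * 100, 101, 102 - 27 }: the tokens before the
// sequence combine with its first element and the tokens after it combine
// with its last element, while the inner element is untouched.
const unsigned char a[] = {
-400 + 4 *
EMBED_TOKENS
- 27 };
```

Only the first and last elements take part in the surrounding expressions, which is why treating just the inner part of the sequence specially is the simpler strategy.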

Contributor Author

I have a prototype of injecting tokens that helps. It also removes all the "whack a mole" around template arguments. The only downside is that it now yields int instead of unsigned char, but I guess that is fine?

Should I push it to this PR, or does it make sense to land this first and make a separate PR? NOTE: I'm on vacation next week, so I will not be available.

Contributor

I think we should land this PR and iterate from there.

Collaborator

I have a prototype of injecting tokens that helps. It also removes all the "whack a mole" around template arguments. The only downside is that it now yields int instead of unsigned char, but I guess that is fine?

Nice! Yes, it's fine to yield an int; that's how the feature is defined to behave in C and we need the semantics to be the same in C and C++.

Should I push it to this PR, or does it make sense to land this first and make a separate PR? NOTE: I'm on vacation next week, so I will not be available.

IMO, it would be easier for reviewers to land the current changes and then push fixes and improvements separately. This patch is already really hard to review due to size. What do folks think about landing the changes as-is today/tomorrow and then doing follow-up work once @Fznamznon is back from vacation?

Collaborator

@AaronBallman AaronBallman left a comment

Leak fix LGTM, I think it's ready to re-land and try again.

@Fznamznon Fznamznon merged commit 41c6e43 into llvm:main Jun 20, 2024
9 of 10 checks passed
@Fznamznon
Contributor Author

Fznamznon commented Jun 20, 2024

@tbaederr, I noticed that all buildbot failures relate to the run with the new constant interpreter. Could you please check whether I did something wrong? For example, embed by default yields values of type unsigned char, but when expanding in ByteCodeExprGen.cpp I did not insert any casts. I think this could matter, since I had to insert casts in other places, and from the log (as @AaronBallman noticed):

Line 18: in call to 'value(1778384896, 1795162112)'
The first value is 0x6A000000 in hex and the second is 0x6B000000. The ASCII value for j is 0x6A and k is 0x6B

So the values are actually correct
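
The arithmetic behind that observation can be reproduced portably. The sketch below is only an illustration of how a correct byte can end up in the most significant position, not a diagnosis of the interpreter; `readBigEndian`, `readLittleEndian`, `SlotJ`, and `SlotK` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Reads four bytes with the first byte as the most significant (big endian).
std::uint32_t readBigEndian(const unsigned char (&B)[4]) {
  return (std::uint32_t(B[0]) << 24) | (std::uint32_t(B[1]) << 16) |
         (std::uint32_t(B[2]) << 8) | std::uint32_t(B[3]);
}

// Reads the same four bytes with the first byte as the least significant.
std::uint32_t readLittleEndian(const unsigned char (&B)[4]) {
  return std::uint32_t(B[0]) | (std::uint32_t(B[1]) << 8) |
         (std::uint32_t(B[2]) << 16) | (std::uint32_t(B[3]) << 24);
}

// A single byte ('j' or 'k') stored at the lowest address of a zeroed
// four-byte slot, as if an 8-bit value were written where a 32-bit one
// is later read.
const unsigned char SlotJ[4] = {0x6A, 0, 0, 0};
const unsigned char SlotK[4] = {0x6B, 0, 0, 0};
```

Read as big endian, the two slots give 1778384896 and 1795162112, the numbers in the log; read as little endian they give 0x6A and 0x6B, which would explain why the mismatch goes unnoticed on little-endian hosts.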

@tbaederr
Contributor

Do you have a smaller reproducer? Are all the failing build bots big endian?

@Fznamznon
Contributor Author

Yes, all bots are big endian. Reproducer is

clang -cc1 %s -fsyntax-only -verify -fexperimental-new-constant-interpreter
constexpr int value(int a, int b) {
  return a + b;
}
constexpr int init_list_expr() {
  int vals[] = {
#embed "jk.txt"
  };
  return value(vals[0], vals[1]);
}
constexpr int ExpectedValue = 'j' + 'k';
static_assert(init_list_expr() == ExpectedValue);

The contents of "jk.txt" are simply "jk".

@Fznamznon
Contributor Author

I'm trying to insert a cast using emitCast:

--- a/clang/lib/AST/Interp/ByteCodeExprGen.cpp
+++ b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
@@ -1347,6 +1347,13 @@ bool ByteCodeExprGen<Emitter>::visitInitList(ArrayRef<const Expr *> Inits,
     }

     auto Eval = [&](Expr *Init, unsigned ElemIndex) {
+      auto ArrayTy = Ctx.getASTContext().getAsConstantArrayType(T);
+      std::optional<PrimType> FromT = classify(Init->getType());
+      std::optional<PrimType> ToT = classify(ArrayTy->getElementType());
+      if (FromT != ToT) {
+        if (!this->emitCast(*FromT, *ToT, Init))
+          return false;
+      }
       return visitArrayElemInit(ElemIndex, Init);
     };

But it fails with an assertion

source/llvm-project/clang/lib/AST/Interp/InterpStack.h:46: T clang::interp::InterpStack::pop() [with T = clang::interp::Integral<8, false>]: Assertion `ItemTypes.back() == toPrimType<T>()' failed.

@tbaederr
Contributor

Here's a quick patch with the cast inserted:

diff --git a/clang/lib/AST/Interp/ByteCodeExprGen.cpp b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
index 731153a6ead9..e7fa1a62c277 100644
--- a/clang/lib/AST/Interp/ByteCodeExprGen.cpp
+++ b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
@@ -1346,13 +1346,30 @@ bool ByteCodeExprGen<Emitter>::visitInitList(ArrayRef<const Expr *> Inits,
       }
     }

-    auto Eval = [&](Expr *Init, unsigned ElemIndex) {
-      return visitArrayElemInit(ElemIndex, Init);
-    };
-
+    E->dump();
     unsigned ElementIndex = 0;
     for (const Expr *Init : Inits) {
-      if (auto *EmbedS = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+      if (const auto *EmbedS = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+        QualType TargetType = Init->getType();
+        PrimType TargetT = classifyPrim(Init->getType());
+        TargetType->dump();
+
+
+    auto Eval = [&](const Expr *Init, unsigned ElemIndex) {
+      PrimType InitT = classifyPrim(Init->getType());
+      if (!this->visit(Init))
+        return false;
+      if (InitT != TargetT) {
+        if (!this->emitCast(InitT, TargetT, E))
+          return false;
+      }
+    return this->emitInitElem(TargetT, ElemIndex, Init);
+    };
+
+
+
+
+
         if (!EmbedS->doForEachDataElement(Eval, ElementIndex))
           return false;
       } else {

Can you check if that fixes the problem?
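
The shape of the patch above (visit each embedded element, cast it to the target element type if needed, then store it at a running index) can be modelled outside the compiler. This is only a sketch: `forEachDataElement` and `initFromEmbed` are hypothetical names, the embedded data is modelled as a `std::string`, and each element is assumed to arrive as an `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical analogue of a per-element callback driver: calls Callback for
// every byte of Data and advances the caller's running index after each one.
bool forEachDataElement(const std::string &Data,
                        const std::function<bool(int, unsigned)> &Callback,
                        unsigned &StartIndex) {
  for (unsigned char C : Data) {
    if (!Callback(static_cast<int>(C), StartIndex))
      return false;
    ++StartIndex;
  }
  return true;
}

// Initializes Out from Data, converting each int-typed element to the target
// element type before storing it.
template <typename TargetT>
bool initFromEmbed(const std::string &Data, std::vector<TargetT> &Out,
                   unsigned &ElementIndex) {
  auto Eval = [&](int Init, unsigned ElemIndex) {
    if (ElemIndex >= Out.size())
      return false; // more data than array elements
    Out[ElemIndex] = static_cast<TargetT>(Init); // explicit cast, then store
    return true;
  };
  return forEachDataElement(Data, Eval, ElementIndex);
}
```

Passing the index by reference lets elements from an embed and ordinary initializers share one running position in the same initializer list.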

@Fznamznon
Contributor Author

I can't, because I don't have a big-endian machine to verify with. We can try pushing it speculatively if it doesn't break existing tests.
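
The thread does not spell out why byte order matters here. One plausible explanation, stated as an assumption rather than a confirmed diagnosis: if a 32-bit value is stored and its first byte in memory is then read as an 8-bit value without a conversion, a little-endian host happens to see the low byte and a big-endian host sees the high byte. The sketch below simulates both layouts explicitly, so it behaves the same on any host. All names in it are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class ByteOrder { Little, Big };

// Lays out a 32-bit value in memory the way a host of the given byte order
// would.
std::array<std::uint8_t, 4> storeU32(std::uint32_t V, ByteOrder Order) {
  std::array<std::uint8_t, 4> Mem{};
  for (unsigned I = 0; I < 4; ++I) {
    // I-th least significant byte of V.
    std::uint8_t B = static_cast<std::uint8_t>(V >> (8 * I));
    Mem[Order == ByteOrder::Little ? I : 3 - I] = B;
  }
  return Mem;
}

// Missing cast: the first byte of the 32-bit storage is taken as the value.
std::uint8_t readWithoutCast(std::uint32_t V, ByteOrder Order) {
  return storeU32(V, Order)[0];
}

// With cast: the value is converted first, independent of byte order.
std::uint8_t readWithCast(std::uint32_t V) {
  return static_cast<std::uint8_t>(V);
}
```

For the value 0x41, the uncast read yields 0x41 under the little-endian layout and 0 under the big-endian one, while the cast read yields 0x41 in both cases.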

@Fznamznon
Contributor Author

Fznamznon commented Jun 20, 2024

Would it be OK to disable the new embed interpreter tests for now?
Then I'll try to figure out a fix tomorrow.

@tbaederr
Contributor

I just pushed 99f5fcb - I don't have time to run all the tests though, so this is a bit of a long shot. If that doesn't fix it, then disabling them for now sounds fine to me.

@Fznamznon
Contributor Author

Thank you!

@Fznamznon
Contributor Author

According to the bots that worked!

@AaronBallman
Collaborator

Thank you both for collaborating to get that solved!

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…m#95802)

@jakubjelinek

@ThePhD @AaronBallman @cor3ntin
Joseph Myers raised an interesting question: are the tokens in embed-parameter-sequence macro expanded or not?
Consider

#define FILE "/etc/passwd"
#define LIMIT limit(1)
#define THIS , 1, 2, 3
#define PRE prefix (42,
ONE
#embed FILE LIMIT suffix(THIS) PRE )
TWO
#embed "/etc/passwd" LIMIT suffix(THIS) PRE )
THREE
#define limit prefix
#embed "/etc/passwd" limit (4) suffix (THIS)

The first #embed is, I hope, clear: it is the #embed pp-tokens new-line case, where everything gets macro expanded.
The second case is less clear: the filename part matches the " q-char-sequence " case, but LIMIT suffix(THIS) PRE ) is not a valid embed-parameter-sequence, so shouldn't it be expanded too?
And the last case, by a strict reading, shouldn't be macro expanded because it is a valid embed-parameter-sequence; still,
the the.phd branch, clang trunk, and my GCC patchset all handle it as prefix (4) suffix(, 1, 2, 3).
Under that reading, would #embed "/etc/passwd" LIMIT suffix(THIS) be valid too, and thus not macro expanded, and thus invalid later?
And if embed-parameter-sequence tokens are macro expanded only sometimes, what should happen with, say,
#define ARG 2) if_empty (1
#embed "file" limit (ARG)
?
On

#define A "/etc/passwd" limit (1) ) + (0
#if __has_embed (A)
int i;
#endif

all 3 compilers also agree: they happily take the ) from the macro expansion as the closing ) of __has_embed and continue through the rest of the expression. Or is __has_embed supposed to always be macro expanded?
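
To make the observed behaviour concrete, here is a toy model of object-like macro expansion over pre-split tokens. It is a sketch only: it ignores function-like macros, stringification, and real tokenization, and all names (`Tokens`, `Macros`, `expand`, `join`) are hypothetical. It does not settle what the standard requires; it only reproduces the result the three implementations are reported to agree on for the third #embed above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
using Macros = std::map<std::string, Tokens>;

// Expands object-like macros in In, rescanning each replacement list. A macro
// that is currently being expanded is not expanded again inside its own
// replacement.
void expandInto(const Tokens &In, const Macros &Defs,
                std::set<std::string> &Active, Tokens &Out) {
  for (const std::string &Tok : In) {
    auto It = Defs.find(Tok);
    if (It == Defs.end() || Active.count(Tok)) {
      Out.push_back(Tok); // not a macro, or already being expanded
      continue;
    }
    Active.insert(Tok);
    expandInto(It->second, Defs, Active, Out);
    Active.erase(Tok);
  }
}

Tokens expand(const Tokens &In, const Macros &Defs) {
  std::set<std::string> Active;
  Tokens Out;
  expandInto(In, Defs, Active, Out);
  return Out;
}

// Joins tokens with single spaces for display.
std::string join(const Tokens &T) {
  std::string S;
  for (const std::string &Tok : T) {
    if (!S.empty())
      S += ' ';
    S += Tok;
  }
  return S;
}
```

With limit defined as prefix and THIS defined as , 1, 2, 3, expanding the tokens limit ( 4 ) suffix ( THIS ) gives prefix ( 4 ) suffix ( , 1 , 2 , 3 ), the reading described above.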

Labels
clang:codegen, clang:driver, clang:frontend, clang:modules, clang:static analyzer, clang, clang-tools-extra