diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 0e1c72e72..ebab56ad2 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -2,23 +2,27 @@ -> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing -> refactoring, so some of the links in this chapter may be broken. +> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all +> undergoing refactoring, so some of the links in this chapter may be broken. -Rust has a very powerful macro system. In the previous chapter, we saw how the -parser sets aside macros to be expanded (it temporarily uses [placeholders]). +Rust has a very powerful macro system. In the previous chapter, we saw how +the parser sets aside macros to be expanded (using temporary [placeholders]). This chapter is about the process of expanding those macros iteratively until -we have a complete AST for our crate with no unexpanded macros (or a compile -error). +we have a complete [*Abstract Syntax Tree* (AST)][ast] for our crate with no +unexpanded macros (or a compile error). +[ast]: ./ast-validation.md +[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html +[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html +[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html -First, we will discuss the algorithm that expands and integrates macro output -into ASTs. Next, we will take a look at how hygiene data is collected. Finally, -we will look at the specifics of expanding different types of macros. +First, we discuss the algorithm that expands and integrates macro output into +ASTs. Next, we take a look at how hygiene data is collected. Finally, we look +at the specifics of expanding different types of macros. 
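As a rough mental model of the iterative expansion this chapter describes, the toy sketch below keeps a queue of unresolved invocations, expands whatever it can resolve, puts the rest back, and reports an error once a full pass over the queue makes no progress. It is a hypothetical model, not `rustc` code. The `Expansion` type and the `fully_expand` function are made up for illustration, plain macro names stand in for real invocations, and "resolving" a macro here only means that its name was defined up front or by an earlier expansion.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical toy model of what expanding one macro invocation yields.
#[derive(Clone, Debug, Default)]
struct Expansion {
    /// Names of macros this expansion defines (e.g. a macro whose output
    /// contains another `macro_rules!` definition).
    defines: Vec<String>,
    /// Further macro invocations that appear in the expanded output.
    invokes: Vec<String>,
}

/// Toy fixpoint loop, loosely modeled on the algorithm in this chapter
/// (not the real `MacroExpander::fully_expand_fragment`).
/// `roots` are the invocations found in the crate, `available` are macros
/// that resolve from the start, and `bodies` says what each macro expands to.
/// Returns the order in which invocations were expanded, or the invocations
/// that could never be resolved.
fn fully_expand(
    roots: &[&str],
    available: &[&str],
    bodies: &HashMap<&str, Expansion>,
) -> Result<Vec<String>, Vec<String>> {
    let mut queue: VecDeque<String> = roots.iter().map(|s| s.to_string()).collect();
    let mut defined: HashSet<String> = available.iter().map(|s| s.to_string()).collect();
    let mut order = Vec::new();
    // Number of consecutive dequeues that failed to resolve anything.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        if defined.contains(&name) {
            // Resolved: "expand" it and integrate its output.
            let exp = bodies.get(name.as_str()).cloned().unwrap_or_default();
            defined.extend(exp.defines);
            queue.extend(exp.invokes);
            order.push(name);
            stalled = 0;
        } else {
            // Unresolved: put it back in the queue and try again later.
            queue.push_back(name);
            stalled += 1;
            // Every queued invocation has been tried since the last success,
            // so no further progress is possible: report a compile error.
            if stalled == queue.len() {
                return Err(queue.into_iter().collect());
            }
        }
    }
    Ok(order)
}
```

For instance, if `outer` is available from the start and its expansion defines `inner`, an invocation of `inner` that is dequeued before `outer` is put back, and it resolves on a later pass once `outer` has been expanded. An invocation of a name that nothing ever defines stalls the loop and is reported as an error.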
Many of the algorithms and data structures described below are in [`rustc_expand`], -with basic data structures in [`rustc_expand::base`][base]. +with fundamental data structures in [`rustc_expand::base`][base]. Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are handled in [`rustc_expand::config`][cfg]. @@ -29,7 +33,7 @@ handled in [`rustc_expand::config`][cfg]. ## Expansion and AST Integration -First of all, expansion happens at the crate level. Given a raw source code for +Firstly, expansion happens at the crate level. Given the raw source code for a crate, the compiler will produce a massive AST with all macros expanded, all modules inlined, etc. The primary entry point for this process is the [`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we @@ -40,7 +44,7 @@ below for more detailed discussion of edge case expansion issues). [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a -queue of unresolved macro invocations (that is, macros we haven't found the +queue of unresolved macro invocations (i.e. macros we haven't found the definition of yet). We repeatedly try to pick a macro from the queue, resolve it, expand it, and integrate it back. If we can't make progress in an iteration, this represents a compile error. Here is the [algorithm][original]: @@ -53,26 +57,27 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 1. [Resolve](./name-resolution.md) imports in our partially built crate as much as possible. 2. Collect as many macro [`Invocation`s][inv] as possible from our - partially built crate (fn-like, attributes, derives) and add them to the + partially built crate (`fn`-like, attributes, derives) and add them to the queue. - 3. Dequeue the first element, and attempt to resolve it. + 3. Dequeue the first element and attempt to resolve it. 4. If it's resolved: 1.
Run the macro's expander function that consumes a [`TokenStream`] or AST and produces a [`TokenStream`] or [`AstFragment`] (depending on - the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt], + the macro kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt], each of which are a token (punctuation, identifier, or literal) or a delimited group (anything inside `()`/`[]`/`{}`)). - At this point, we know everything about the macro itself and can - call `set_expn_data` to fill in its properties in the global data; - that is the hygiene data associated with `ExpnId`. (See [the - "Hygiene" section below][hybelow]). - 2. Integrate that piece of AST into the big existing partially built - AST. This is essentially where the "token-like mass" becomes a - proper set-in-stone AST with side-tables. It happens as follows: + call [`set_expn_data`] to fill in its properties in the global + data; that is the [hygiene] data associated with [`ExpnId`] (see + [Hygiene][hybelow] below). + 2. Integrate that piece of AST into the existing, though only + partially built, AST. This is essentially where the "token-like mass" + becomes a proper set-in-stone AST with side-tables. It happens as + follows: - If the macro produces tokens (e.g. a proc macro), we parse into an AST, which may produce parse errors. - - During expansion, we create `SyntaxContext`s (hierarchy 2). (See - [the "Hygiene" section below][hybelow]) + - During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see + [Hygiene][hybelow] below). - These three passes happen one after another on every AST fragment freshly expanded from a macro: - [`NodeId`]s are assigned by [`InvocationCollector`]. This @@ -85,30 +90,33 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 3. After expanding a single macro and integrating its output, continue to the next iteration of [`fully_expand_fragment`][fef]. 5. If it's not resolved: - 1. Put the macro back in the queue + 1.
Put the macro back in the queue. 2. Continue to next iteration... -[defpath]: hir.md#identifiers-in-the-hir -[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html -[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html -[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html -[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html +[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html -[hybelow]: #hygiene-and-hierarchies -[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html +[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html +[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html +[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html +[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html +[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html +[`set_expn_data`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.LocalExpnId.html#method.set_expn_data +[`SyntaxContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html [`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[defpath]: hir.md#identifiers-in-the-hir +[hybelow]: #hygiene-and-hierarchies +[hygiene]: 
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html -[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html +[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html ### Error Recovery -If we make no progress in an iteration, then we have reached a compilation -error (e.g. an undefined macro). We attempt to recover from failures -(unresolved macros or imports) for the sake of diagnostics. This allows -compilation to continue past the first error, so that we can report more errors -at a time. Recovery can't cause compilation to succeed. We know that it will -fail at this point. The recovery happens by expanding unresolved macros into -[`ExprKind::Err`][err]. +If we make no progress in an iteration, we have reached a compilation error +(e.g. an undefined macro). We attempt to recover from failures (i.e. +unresolved macros or imports) with the intent of generating diagnostics. +Failure recovery happens by expanding unresolved macros into +[`ExprKind::Err`][err] and allows compilation to continue past the first error +so that `rustc` can report more errors, though compilation will still fail. [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err @@ -117,20 +125,20 @@ fail at this point. The recovery happens by expanding unresolved macros into Notice that name resolution is involved here: we need to resolve imports and macro names in the above algorithm. This is done in [`rustc_resolve::macros`][mresolve], which resolves macro paths, validates -those resolutions, and reports various errors (e.g. "not found" or "found, but -it's unstable" or "expected x, found y"). However, we don't try to resolve -other names yet. This happens later, as we will see in the [next -chapter](./name-resolution.md).
+those resolutions, and reports various errors (e.g. "not found", "found, but +it's unstable", "expected x, found y"). However, we don't try to resolve +other names yet. This happens later, as we will see in the [Name +Resolution](./name-resolution.md) chapter. [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html ### Eager Expansion -_Eager expansion_ means that we expand the arguments of a macro invocation -before the macro invocation itself. This is implemented only for a few special +_Eager expansion_ means we expand the arguments of a macro invocation before +the macro invocation itself. This is implemented only for a few special built-in macros that expect literals; expanding arguments first for some of -these macro results in a smoother user experience. As an example, consider the -following: +these macros results in a smoother user experience. As an example, consider +the following: ```rust,ignore macro bar($i: ident) { $i } @@ -139,35 +147,37 @@ macro foo($i: ident) { $i } foo!(bar!(baz)); ``` -A lazy expansion would expand `foo!` first. An eager expansion would expand +A lazy expansion would expand `foo!` first. An eager expansion would expand `bar!` first. -Eager expansion is not a generally available feature of Rust. Implementing -eager expansion more generally would be challenging, but we implement it for a -few special built-in macros for the sake of user experience.
The built-in macros are implemented in [`rustc_builtin_macros`], along with some other early code generation facilities like injection of standard library imports or +Eager expansion is not a generally available feature of Rust. Implementing +eager expansion more generally would be challenging, so we implement it for a +few special built-in macros for the sake of user experience. The built-in +macros are implemented in [`rustc_builtin_macros`], along with some other +early code generation facilities like injection of standard library imports or generation of test harness. There are some additional helpers for building -their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally -performs a subset of the things that lazy (normal) expansion does. It is done by -invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to -the whole crate, like we normally do). +AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally +performs a subset of the things that lazy (normal) expansion does. It is done +by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed +to the whole crate, like we normally do). ### Other Data Structures -Here are some other notable data structures involved in expansion and integration: -- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the +Here are some other notable data structures involved in expansion and +integration: +- [`ResolverExpand`] - a `trait` used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. -- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion - infrastructure in the process of its work +- [`ExtCtxt`]/[`ExpansionData`] - hold various intermediate data used by the + expansion infrastructure.
+- [`Annotatable`] - a piece of AST that can be an attribute target, almost the same + thing as [`AstFragment`] except for types and patterns that can be produced by + macros but cannot be annotated with attributes. +- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into + a different [`AstFragment`] depending on its [`AstFragmentKind`] (e.g. an item, + expression, pattern, etc.). +[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html [`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html [`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html [`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html @@ -179,7 +189,7 @@ Here are some other notable data structures involved in expansion and integratio ## Hygiene and Hierarchies -If you have ever used C/C++ preprocessor macros, you know that there are some +If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code: ```c @@ -252,10 +262,10 @@ crate. All of these hierarchies need some sort of "macro ID" to identify individual elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive an integer ID, assigned continuously starting from 0 as we discover new macro -calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own +calls. All hierarchies start at [`ExpnId::root`][rootid], which is its own parent. -[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms +The [`rustc_span::hygiene`][hy] module contains all of the hygiene-related algorithms (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) and structures related to hygiene and expansion that are kept in global data. @@ -279,12 +289,12 @@ invocation is in the output of another macro.
Here, the children in the hierarchy will be the "innermost" tokens. The [`ExpnData`] struct itself contains a subset of properties from both macro definition and macro call available through global data. -[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy. +[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy. [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent -For example, +For example: ```rust,ignore macro_rules! foo { () => { println!(); } } @@ -303,14 +313,14 @@ one is a bit tricky and more complex than the other two hierarchies. [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. [`SyntaxContextData`][scd] contains data associated with the given -`SyntaxContext`; mostly it is a cache for results of filtering that chain in -different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent +[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in +different ways. [`SyntaxContextData::parent`][scdp] is the child-to-parent link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual -elements in the chain. The "chaining operator" is +elements in the chain. The "chaining operator" is [`SyntaxContext::apply_mark`][am] in compiler code. A [`Span`][span], mentioned above, is actually just a compact representation of -a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned +a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned [`Symbol`] + `Span` (i.e. an interned string + hygiene data). [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html @@ -321,9 +331,11 @@ a code location and `SyntaxContext`.
Likewise, an [`Ident`] is just an interned [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark For built-in macros, we use the context: -`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to -be defined at the hierarchy root. We do the same for proc-macros because we -haven't implemented cross-crate hygiene yet. +[`SyntaxContext::empty().apply_mark(expn_id)`], and such macros are +considered to be defined at the hierarchy root. We do the same for proc +macros because we haven't implemented cross-crate hygiene yet. + +[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark If the token had context `X` before being produced by a macro then after being produced by the macro it has context `X -> macro_id`. Here are some examples: @@ -336,12 +348,11 @@ macro m() { ident } m!(); ``` -Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has +Here `ident`, which initially has context [`SyntaxContext::root`][scr], has context `ROOT -> id(m)` after it's produced by `m`. [scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root - Example 1: ```rust,ignore @@ -350,13 +361,14 @@ macro m() { macro n() { ident } } m!(); n!(); ``` -In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)` + +In this example, `ident` has context `ROOT` initially, then `ROOT -> id(m)` after the first expansion, then `ROOT -> id(m) -> id(n)`. Example 2: Note that these chains are not entirely determined by their last element, in -other words `ExpnId` is not isomorphic to `SyntaxContext`. +other words [`ExpnId`] is not isomorphic to [`SyntaxContext`][sc].
```rust,ignore macro m($i: ident) { macro n() { ($i, bar) } } @@ -367,19 +379,21 @@ m!(foo); After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context `ROOT -> id(m) -> id(n)`. -Finally, one last thing to mention is that currently, this hierarchy is subject -to the ["context transplantation hack"][hack]. Basically, the more modern (and -experimental) `macro` macros have stronger hygiene than the older MBE system, -but this can result in weird interactions between the two. The hack is intended -to make things "just work" for now. +Currently, this hierarchy for tracking macro definitions is subject to the +so-called ["context transplantation hack"][hack]. The modern (and experimental) +`macro` macros have stronger hygiene than the legacy "Macros By Example" (MBE) +system, which can result in weird interactions between the two. The hack is +intended to make things "just work" for now. +[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 ### The Call-site Hierarchy The third and final hierarchy tracks the location of macro invocations. -In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link. +In this hierarchy, [`ExpnData::call_site`][callsite] is the child-to-parent +link. [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site @@ -409,22 +423,24 @@ Above, we saw how the output of a macro is integrated into the AST for a crate, and we also saw how the hygiene data for a crate is generated. But how do we actually produce the output of a macro? It depends on the type of macro. -There are two types of macros in Rust: -`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros -(or "proc macros"; including custom derives). During the parsing phase, the normal -Rust parser will set aside the contents of macros and their invocations.
Later, -macros are expanded using these portions of the code. +There are two types of macros in Rust: + 1. `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)), and + 2. procedural macros ("proc macros"), including custom derives. + +During the parsing phase, the normal Rust parser will set aside the contents of +macros and their invocations. Later, macros are expanded using these +portions of the code. Some important data structures/interfaces here: - [`SyntaxExtension`] - a lowered macro representation, contains its expander - function, which transforms a `TokenStream` or AST into another `TokenStream` - or AST + some additional data like stability, or a list of unstable features - allowed inside the macro. + function, which transforms a [`TokenStream`] or AST into another + [`TokenStream`] or AST + some additional data like stability, or a list of + unstable features allowed inside the macro. - [`SyntaxExtensionKind`] - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc). This is - an enum that lists them. + an `enum` that lists them. - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] - - traits representing the expander function signatures. + `trait`s representing the expander function signatures. [`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html [`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html @@ -435,18 +451,15 @@ Some important data structures/interfaces here: ## Macros By Example -MBEs have their own parser distinct from the normal Rust parser. When macros -are expanded, we may invoke the MBE parser to parse and expand a macro. The -MBE parser, in turn, may call the normal Rust parser when it needs to bind a -metavariable (e.g.
`$my_expr`) while parsing the contents of a macro +MBEs have their own parser distinct from the Rust parser. When macros are +expanded, we may invoke the MBE parser to parse and expand a macro. The +MBE parser, in turn, may call the Rust parser when it needs to bind a +metavariable (e.g. `$my_expr`) while parsing the contents of a macro invocation. The code for macro expansion is in [`compiler/rustc_expand/src/mbe/`][code_dir]. ### Example -It's helpful to have an example to refer to. For the remainder of this chapter, -whenever we refer to the "example _definition_", we mean the following: - ```rust,ignore macro_rules! printer { (print $mvar:ident) => { @@ -459,41 +472,41 @@ macro_rules! printer { } ``` -`$mvar` is called a _metavariable_. Unlike normal variables, rather than -binding to a value in a computation, a metavariable binds _at compile time_ to -a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an +Here `$mvar` is called a _metavariable_. Unlike normal variables, rather than +binding to a value _at runtime_, a metavariable binds _at compile time_ to a +tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other -special tokens, such as `EOF`, which indicates that there are no more tokens. -Token trees resulting from paired parentheses-like characters (`(`...`)`, -`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens -in between (we do require that parentheses-like characters be balanced). Having -macro expansion operate on token streams rather than the raw bytes of a source -file abstracts away a lot of complexity. The macro expander (and much of the -rest of the compiler) doesn't really care that much about the exact line and -column of some syntactic construct in the code; it cares about what constructs -are used in the code. Using tokens allows us to care about _what_ without -worrying about _where_. 
For more information about tokens, see the -[Parsing][parsing] chapter of this book. - -Whenever we refer to the "example _invocation_", we mean the following snippet: +special tokens, such as `EOF`, which indicates that there are no more +tokens. Paired parentheses-like characters (`(`...`)`, `[`...`]`, and +`{`...`}`) form token trees that include the opening and closing delimiters +and all the tokens in between (Rust requires that parentheses-like +characters be balanced). Having macro expansion operate on token streams +rather than the raw bytes of a source file abstracts away a lot of complexity. +The macro expander (and much of the rest of the compiler) doesn't consider +the exact line and column of some syntactic construct in the code; it considers +which constructs are used in the code. Using tokens allows us to care about +_what_ without worrying about _where_. For more information about tokens, see +the [Parsing][parsing] chapter of this book. ```rust,ignore -printer!(print foo); // Assume `foo` is a variable defined somewhere else... +printer!(print foo); // Assume `foo` is a variable defined elsewhere ``` The process of expanding the macro invocation into the syntax tree -`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is -called _macro expansion_, and it is the topic of this chapter. +`println!("{}", foo)` and then expanding the syntax tree into a call to +`Display::fmt` is one common example of _macro expansion_. ### The MBE parser -There are two parts to MBE expansion: parsing the definition and parsing the -invocations. Interestingly, both are done by the macro parser. +There are two parts to MBE expansion done by the macro parser: + 1. parsing the definition, and + 2. parsing the invocations. -Basically, the MBE parser is like an NFA-based regex parser. It uses an -algorithm similar in spirit to the [Earley parsing -algorithm](https://en.wikipedia.org/wiki/Earley_parser).
The macro parser is -defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. +We can think of the MBE parser as a regex parser based on a nondeterministic +finite automaton (NFA); it uses an algorithm similar in spirit to the [Earley +parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro +parser is defined in +[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. The interface of the macro parser is as follows (this is slightly simplified): @@ -507,58 +520,61 @@ fn parse_tt( We use these items in macro parser: -- `parser` is a reference to the state of a normal Rust parser, including the - token stream and parsing session. The token stream is what we are about to - ask the MBE parser to parse. We will consume the raw stream of tokens and - output a binding of metavariables to corresponding token trees. The parsing - session can be used to report parser errors. -- `matcher` is a sequence of `MatcherLoc`s that we want to match +- the `parser` argument is a reference to the state of a normal Rust parser, + including the token stream and parsing session. The token stream is what we + are about to ask the MBE parser to parse. We will consume the raw stream of + tokens and output a binding of metavariables to corresponding token trees. + The parsing session can be used to report parser errors. +- the `matcher` argument is a sequence of [`MatcherLoc`]s that we want to match the token stream against. They're converted from token trees before matching. -In the analogy of a regex parser, the token stream is the input and we are matching it -against the pattern `matcher`. Using our examples, the token stream could be the stream of -tokens containing the inside of the example invocation `print foo`, while `matcher` -might be the sequence of token (trees) `print $mvar:ident`.
+[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html + +In the analogy of a regex parser, the token stream is the input and we are +matching it against the pattern defined by `matcher`. Using our examples, the +token stream could be the stream of tokens containing the inside of the example +invocation `print foo`, while `matcher` might be the sequence of token (trees) +`print $mvar:ident`. The output of the parser is a [`ParseResult`], which indicates which of three cases has occurred: -- Success: the token stream matches the given `matcher`, and we have produced a binding - from metavariables to the corresponding token trees. -- Failure: the token stream does not match `matcher`. This results in an error message such as - "No rule expected token _blah_". -- Error: some fatal error has occurred _in the parser_. For example, this - happens if there is more than one pattern match, since that indicates - the macro is ambiguous. +- **Success**: the token stream matches the given `matcher` and we have produced a + binding from metavariables to the corresponding token trees. +- **Failure**: the token stream does not match `matcher`, which results in an error + message such as "No rule expected token ...". +- **Error**: some fatal error has occurred _in the parser_. For example, this + happens if there is more than one pattern match, since that indicates the + macro is ambiguous. The full interface is defined [here][code_parse_int]. -The macro parser does pretty much exactly the same as a normal regex parser with -one exception: in order to parse different types of metavariables, such as -`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the -normal Rust parser. - -As mentioned above, both definitions and invocations of macros are parsed using -the macro parser. This is extremely non-intuitive and self-referential.
The code -to parse macro _definitions_ is in -[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for -matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, -a `macro_rules` definition should have in its body at least one occurrence of a -token tree followed by `=>` followed by another token tree. When the compiler -comes to a `macro_rules` definition, it uses this pattern to match the two token -trees per rule in the definition of the macro _using the macro parser itself_. -In our example definition, the metavariable `$lhs` would match the patterns of -both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` -would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ -println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this -knowledge around for when it needs to expand a macro invocation. +The macro parser does pretty much exactly the same as a normal regex parser +with one exception: in order to parse different types of metavariables, such as +`ident`, `block`, `expr`, etc., the macro parser must call back to the normal +Rust parser. Both the definitions and the invocations of macros are parsed +using the macro parser, which is unintuitive and self-referential. + +The code to parse macro _definitions_ is in +[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the +pattern for matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In +other words, a `macro_rules` definition should have in its body at least one +occurrence of a token tree followed by `=>` followed by another token tree. +When the compiler comes to a `macro_rules` definition, it uses this pattern to +match the two token trees of each rule in the macro definition, _using the +macro parser itself_. In our example definition, the +metavariable `$lhs` would match the patterns of both arms: `(print +$mvar:ident)` and `(print twice $mvar:ident)`.
And `$rhs` would match the +bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar); +println!("{}", $mvar); }`. The parser keeps this knowledge around for when it +needs to expand a macro invocation. When the compiler comes to a macro invocation, it parses that invocation using -the same NFA-based macro parser that is described above. However, the matcher +the same NFA-based macro parser described above. However, the matcher variable used is the first token tree (`$lhs`) extracted from the arms of the macro _definition_. Using our example, we would try to match the token stream `print foo` from the invocation against the matchers `print $mvar:ident` and `print -twice $mvar:ident` that we previously extracted from the definition. The +twice $mvar:ident` that we previously extracted from the definition. The algorithm is exactly the same, but when the macro parser comes to a place in the current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), it calls back to the normal Rust parser to get the contents of that @@ -572,32 +588,21 @@ error. For more information about the macro parser's implementation, see the comments in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. -### `macro`s and Macros 2.0 - -There is an old and mostly undocumented effort to improve the MBE system, give -it more hygiene-related features, better scoping and visibility rules, etc. There -hasn't been a lot of work on this recently, unfortunately. Internally, `macro` -macros use the same machinery as today's MBEs; they just have additional -syntactic sugar and are allowed to be in namespaces. - ## Procedural Macros -Procedural macros are also expanded during parsing, as mentioned above. -However, they use a rather different mechanism. Rather than having a parser in -the compiler, procedural macros are implemented as custom, third-party crates. -The compiler will compile the proc macro crate and specially annotated -functions in them (i.e.
the proc macro itself), passing them a stream of tokens. - -The proc macro can then transform the token stream and output a new token -stream, which is synthesized into the AST. - -It's worth noting that the token stream type used by proc macros is _stable_, -so `rustc` does not use it internally (since our internal data structures are -unstable). The compiler's token stream is -[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is -converted into the stable [`proc_macro::TokenStream`][stablets] and back in +Procedural macros are also expanded during parsing. However, rather than +having a parser in the compiler, proc macros are implemented as custom, +third-party crates. The compiler will compile the proc macro crate and +specially annotated functions in it (i.e. the proc macro itself), passing +them a stream of tokens. A proc macro can then transform the token stream and +output a new token stream, which is synthesized into the AST. + +The token stream type used by proc macros is _stable_, so `rustc` does not +use it internally (the compiler's internal data structures are unstable). +The compiler's own token stream is defined in +[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the +stable [`proc_macro::TokenStream`][stablets] and back in +[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms]. -Because the Rust ABI is unstable, we use the C ABI for this conversion. +Since the Rust ABI is currently unstable, we use the C ABI for this conversion. [tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html [rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html @@ -606,10 +611,17 @@ Because the Rust ABI is unstable, we use the C ABI for this conversion.
[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html [`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html -TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) + ### Custom Derive Custom derives are a special type of proc macro. -TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) +### Macros By Example and Macros 2.0 + +There is a legacy and mostly undocumented effort to improve the MBE system +by giving it more hygiene-related features, better scoping and visibility +rules, etc. Internally, `macro` macros use the same machinery as today's MBEs; +they just have additional syntactic sugar and are allowed to be in namespaces. + +