Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow any file to make use of custom descriptor.proto #97

Merged
merged 8 commits into from
Feb 23, 2023

Conversation

jhump
Copy link
Member

@jhump jhump commented Feb 21, 2023

Before this PR, if an override version of the descriptor protos/options were in use, the file making use of them had to import "google/protobuf/descriptor.proto". But this is counter-intuitive since files typically do not import that file, unless they have explicit references to types therein or unless they define custom options. So this PR allows override descriptors to be used, even for files that have no such import statement.

Note that even with the new behavior, protocompile varies from protoc in how override descriptors are supported. This is intentional as protoc's behavior relies on the fact that it is single-threaded.

Behavior of protoc

With protoc, a file need merely define the relevant type (e.g. google.protobuf.FieldOptions) and be compiled before the file that needs it (meaning it appears before the file that needs it on the command line.

For example, let's say I have two files:

// test1.proto
syntax = "proto3";

message Foo {
    string name = 1 [some_new_option = "bar"];
}
// test2.proto
syntax = "proto3";
package google.protobuf;
message FieldOptions {
    string some_new_option = 100;
}

If I specify them in this order to protoc, it fails!

> protoc test1.proto test2.proto -o /dev/null
test1.proto:5:22: Option "some_new_option" unknown. Ensure that your proto definition file imports the proto which defines the option.

But if I reverse the order it works!

> protoc test2.proto test1.proto -o /dev/null

Because it is single-threaded, it compiles the files in order (though if a file imports another, it will recursively compile the imports first, even if they also appear later on the command-line). So after test2.proto is compiled, it is now aware of the override definition of google.protobuf.FieldOptions and then allows the some_new_option declaration in test1.proto.

Note that if test1.proto explicitly imported test2.proto, then it would work every time.

Behavior of protocompile

The compiler in this repo is inherently parallel. So it cannot behave like protoc, compiling the files sequentially in order to discover where override descriptors/options are defined.

Previously, the only way to use override descriptors/options was to explicitly import the definition. So, like the example above, it would work if test1.proto imported test2.proto. But it would always fail otherwise.

With this branch, it will now work without an explicit import, but only if the override descriptors/options are in a file named "google/protobuf/descriptor.proto". So we renamed "test2.proto" to "google/protobuf/descriptor.proto" in the above example, it will work!

This constraint (to name the file with override definition "google/protobuf/descriptor.proto") should be fine since it is likely that 100% of real world cases where descriptors are overridden, they are in a file named thusly -- usually a fork of the descriptor.proto, accompanied by a fork of a runtime and/or a code gen plugin.

@jhump jhump force-pushed the jh/use-descriptor-if-possible branch from 5baf893 to debf947 Compare February 21, 2023 17:17
@@ -316,6 +320,18 @@ func (e errFailedToResolve) Unwrap() error {
return e.err
}

func (e *executor) hasOverrideDescriptorProto() bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this as a way to cheaply detect if descriptors are overridden. That way, if the file provided is not overridden, we can avoid the cost of wrapping the descriptor in a linker.File (which is not terribly expensive, but also not free since it must create an index of all symbols in the file).

linker/files.go Outdated Show resolved Hide resolved
@jhump jhump requested a review from pkwarren February 21, 2023 17:32
@jhump
Copy link
Member Author

jhump commented Feb 21, 2023

@pkwarren, this is needed so that buf can provide an override version of descriptor.proto (such as the new v22.0 version, with new field option debug_redact) and it will get used even for files that do not import descriptor.proto.

cc @bufdev

Copy link
Member

@pkwarren pkwarren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good - only had a couple of questions. Thanks for providing all the context to make it easier to review.

linker/resolve.go Show resolved Hide resolved
compiler.go Outdated Show resolved Hide resolved
compiler.go Outdated Show resolved Hide resolved
compiler.go Outdated Show resolved Hide resolved
compiler.go Outdated Show resolved Hide resolved
select {
case <-descriptorProtoRes.ready:
// descriptor.proto wasn't explicitly imported, so we can ignore a failure
if descriptorProtoRes.err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I walked through this in the debugger and saw one non-nil error reported in the linker tests - search result for "google/protobuf/descriptor.proto" returned descriptor for "foo.proto". Is that expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. What test? If you look at the resolver it's using, it may be hard-coded to return a particular descriptor, regardless of the requested file name.

In any event, since this is not an explicit dependency, I don't think we should fail here if the resolver can't supply something. If the explicitly import this file, then it will fail as expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can gather more details. I agree with the behavior just wanted to make sure there wasn't something we were missing.

@jhump jhump enabled auto-merge (squash) February 23, 2023 03:24
@jhump jhump merged commit 7c5114e into main Feb 23, 2023
@jhump jhump deleted the jh/use-descriptor-if-possible branch February 23, 2023 03:24
jhump added a commit that referenced this pull request Mar 7, 2023
…roto` (#109)

An issue was introduced in #97 -- if a custom `descriptor.proto` is
used, then custom options may come across in the final compilation
result as unrecognized fields. The first commit in this PR is a repro case.

This happens when the options in question come from _public_
imports and when an _override_ set of options are used. We have
to use a dynamic message to represent the override definition of
the options. But the result ultimately needs to be the generated
type (not a dynamic message). So we end serializing the dynamic
message to bytes and then de-serializing that into the generated
type. During de-serialization, we were supplying an extension
resolver that was failing to find the custom option, because it
was visible via a public import.

To fix the issue, I found several code paths that could be unified
and consolidated, which ultimately resulted in removing a few
no-longer-used methods from interfaces defined in the linker
sub-package. This also changes the way we handle an override
descriptor.proto by making it an interpreter option instead of
providing it during linking (since it is really only used by the
option interpreter).

These API/interface changes are not technically backwards
compatible. But this repo is still pre-v1.0, and these changes
are in interfaces that we don't actually expect any users to
implement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants