Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Associate cli arguments with executables and refactor llvm/gcc c/c++ toolchain selection #6217

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Jul 23, 2018

Problem

#5951 explains the problem addressed by moving CLI arguments to individual Executable objects -- this reduces greatly the difficulty in generating appropriate command lines for the executables invoked. In this PR, it can be seen to remove a significant amount of repeated boilerplate.

Additionally, we weren't distinguishing between a Linker to link the compiled object files of gcc or g++ vs clang or clang++. We were attempting to generate a linker object which would work with any of gcc, g++, clang, or clang++, and this wasn't really feasible. Along with the above, this made it extremely difficult and error-prone to generate correct command lines / environments for executing the linker, which led to e.g. not being able to find crti.o (as one symptom addressed by this problem).

Solution

  • Introduce CToolchain and CppToolchain in environment.py, which can be generated from LLVMCToolchain, LLVMCppToolchain, GCCCToolchain, or GCCCppToolchain. These toolchain datatypes are created in native_toolchain.py, where a single @rule for each ensures that no C or C++ compiler that someone can request was made without an accompanying linker, which will be configured to work with the compiler.
  • Introduce the extra_args property to the Executable mixin in environment.py, which Executable subclasses can just declare a datatype field named extra_args in order to override. This is used in native_toolchain.py to ensure platform-specific arguments and environment variables are set in the same @rule which produces a paired compiler and linker -- there is a single place to look at to see where all the process invocation environment variables and command-line arguments are set for a given toolchain.
  • Introduce the ArchiveFileMapper subsystem and use it to declare sets of directories to resolve within our BinaryTool archives GCC and LLVM. This subsystem allows globbing (and checks that there is a unique expansion), which makes it robust to e.g. platform-specific paths to things like include or lib directories.

Result

Removes several FIXMEs, including heavily-commented parts of test_native_toolchain.py. Partially addresses #5951 -- setup_py.py still generates its own execution environment from scratch, and this could be made more hygienic in the future. As noted in #6179 and #6205, this PR seems to immediately fix the CI failures in those PRs.

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is alot to grok at once, so a bit of a superficial skim.

def c_compiler(self):
exe_filename = 'gcc'
path_entries = self.path_entries()
_PLATFORM_INTERMEDIATE_DIRNAME = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these solid for all 64 linux and osx? They look a bit specific. If they are just enough to get CI tests working that's a step forward, but deserve a TODO with issue pointer. On the other hand, if they are solid - they don't look so and desrve a comment pointing off to GCC docs that show they are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is actually only a path into the specific gcc archive we provide, which is why I was thinking it was ok to lean on. However, the ParseSearchDirs approach actually may be able to get this -- doing it manually results in:

> PATH="${HOME}/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/bin" g++ -print-search-dirs | grep -P '^libraries' | sed -re 's#^libraries: =##g' | tr ':' '\n' | sort | uniq | parallel -n1 readlink -f
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib/gcc/x86_64-apple-darwin17.5.0/7.3.0
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib/gcc

This looks like it can be manipulated to find the include dirs we want instead of generating those paths ourselves, which would allow us to revert many of the changes to this file.

Copy link
Contributor

@jsirois jsirois Jul 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that would be excellent.

But this clarification also points to this issue pantsbuild/binaries#78; ie: x86_64-apple-darwin17.5.0 is a bit specific, which would be OK iff it corresponded to 10.11.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this entire class refers to the specifics of our GCC build is worth pointing out in a class level doc most likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 100% agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, 17.5.0 is from the LLVM binary release for OSX -- this is the package we provide for all OSX (because it's the package LLVM provides for all OSX), and is not an artifact of any build process we perform ourselves. I am making the ParseSearchDirs approach work regardless, just wanted to clear up why I thought that path was ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having difficulty getting the ParseSearchDirs solution to work, so e67044c scraps that entirely and uses a helper method _get_check_single_path_by_glob() which is used to locate directories within our provided gcc distribution.

I was annoyed that we couldn't just invoke the binary -- although there may be a better way in the future, adding /lib and /include to directory paths (as I mentioned could be doable above) ended up getting way too long and needlessly complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process of using globs was split off into another pants.backend.native.utils subsystem named ArchiveFileMapper, and docstrings describing that GCC and LLVM rely on the current specific layout of the archive were added as well, in 3051235.


@abstractproperty
def as_c_toolchain(self):
"""???"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please doc or pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be a pass, will make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 3441e74!


@abstractproperty
def as_cpp_toolchain(self):
"""???"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto doc or pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 3441e74!

@@ -92,12 +92,11 @@ class SetupPyNativeTools(datatype([
"""


@rule(SetupPyNativeTools, [Select(CCompiler), Select(CppCompiler), Select(Linker), Select(Platform)])
def get_setup_py_native_tools(c_compiler, cpp_compiler, linker, platform):
@rule(SetupPyNativeTools, [Select(LLVMCToolchain), Select(LLVMCppToolchain), Select(Platform)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a step back - SetupPyNativeTools which only cares about a CppToolchain and a CToolchain is not LLVM specific and cannot be produced for GCC. Am I missing something or is a TODO/comment missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally right! In 3441e74 I moved all the rules and datatypes into python_native_code.py. The rules select_c_toolchain_for_local_dist_compilation and select_cpp_toolchain_for_local_dist_compilation mean that LLVM is used as the default toolchain, but SetupPyNativeTools only requires an injected CToolchain and CppToolchain.

class LLVMCToolchain(datatype([
('llvm_c_compiler', LLVMCCompiler),
('llvm_c_linker', LLVMCLinker),
]), CToolchainProvider):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding why class LLVMCCompiler(datatype([('c_compiler', CCompiler)])): pass vs just class LLVMCCompiler(CCompiler): pass, etc. It seems to me this would eliminate the need for both CToolchainProvider and CppToolchainProvider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're completely right. 3441e74 does exactly this. That removed ~70 lines from this diff.

yield GCCCCompiler(gcc.c_compiler())
@rule(GCCCCompiler, [Select(GCC), Select(Platform)])
def get_gcc(gcc, platform):
yield GCCCCompiler(gcc.c_compiler(platform))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a yield here and below instead of a return? Afaict that just causes a little extra work looping through generator send gymnastics for what will be a blocking call either way:

deps
.then(move |deps_result| match deps_result {
Ok(deps) => externs::call(&externs::val_for(&func.0), &deps),
Err(failure) => Err(failure),
})
.then(move |task_result| match task_result {
Ok(val) => {
if externs::satisfied_by(&context.core.types.generator, &val) {
Self::generate(context, entry, val)
} else {
ok(val)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had slipped my mind that return was allowed for an @rule -- this is using return now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It absolutely makes sense, but I wasn't aware before that yield was interpreted to mean this (hadn't thought about it in a while) -- this makes the execution process of @rules more clear to me.

library_dirs=[])
library_dirs=[],
linking_library_dirs=[],
extra_args=['-mmacosx-version-min=10.11'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 10.11 seems arbitrarty / tied to Pants current min version support. Does this deserve to be lifted out to a constant used to generate all such flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does! And it was, in MIN_OSX_VERSION_ARG from b2a750b.

return NativeToolchain.scoped_instance(self)

def get_compile_settings(self):
return CppCompileSettings.scoped_instance(self)

@memoized_property
def _cpp_toolchain(self):
llvm_cpp_toolchain = self._request_single(LLVMCppToolchain, self._native_toolchain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another instance of regressing? to LLVM lock in here in a presumably generic CppCompile class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! This was fixed in 6843579! Thanks for describing the regressions!

return NativeToolchain.scoped_instance(self)

@memoized_property
def _cpp_toolchain(self):
llvm_cpp_toolchain = self._request_single(LLVMCppToolchain, self._native_toolchain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More lock in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should be fixed by 6843579!

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John has provided a very useful review for this, so I'll stay out of it. But will say that what you've done to inline things into src/python/pants/backend/native/subsystems/native_toolchain.py makes a lot of sense to me, and removes a bit of unnecessary abstraction.

def c_compiler(self):
exe_filename = 'gcc'
path_entries = self.path_entries()
_PLATFORM_INTERMEDIATE_DIRNAME = {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this entire class refers to the specifics of our GCC build is worth pointing out in a class level doc most likely.

@@ -30,7 +30,9 @@ def linker(self):
return Linker(
path_entries=self.path_entries(),
exe_filename='ld',
library_dirs=[])
library_dirs=[],
Copy link
Contributor

@CMLivingston CMLivingston Jul 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly confused in the distinction between this param and the one below it. This is slightly ambiguous to me as well:
@abstractproperty def library_dirs(self): """Directories containing shared libraries required for a subprocess to run."""

Maybe add an abstract property with a docstring to the Linker class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes things a lot neater! Added LinkerMixin in b2a750b!

libc_dev = yield Get(LibcDev, NativeToolchain, native_toolchain)
working_linker = Linker(
path_entries=(base_linker.path_entries + working_c_compiler.path_entries),
exe_filename=working_c_compiler.exe_filename,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are still linking through the compiler frontend? Based on path_entries, it looks like the we will hit our linker first, however why not explicitly invoke it? I know you explained this to me once but I'm having a tough time recalling...I think it had to do with critical search dirs that we are unable to find by directly executing the linker?

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been very difficult to invoke the linker directly when I've tried, for C or C++. If you try adding -v to the extra_args for a Linker somewhere you can see the (extremely long) generated command line -- I have not been able to make it work yet, after trying for a while, and as far as I can tell there is no reason we would to invoke the linker directly, unless the compiler frontend is pulling in something we don't want (which is what we are using the extra_args to avoid, with arguments such as -nostdinc++).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(to clarify, I would love to invoke the linker directly, but it's not something that has ever borne fruit every time I've tried. if there's a way to do this that I'm missing I'm all ears)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(what I said to you before was a condensed version of that)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I don't think this is a problem if we are hitting our linker regardless. If we find that the frontend is making things too difficult to build/debug, we can lean into direct invocation.

'-x', 'c++', '-std=c++11',
# These mean we don't use any of the headers from our LLVM distribution.
'-nobuiltininc',
'-nostdinc++',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that we will always be relying on system headers/includes to be there for any calls to std::*, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the opposite -- these flags stop clang from searching for or using the system headers (because we don't control what it finds in that way) except for the ones we provide via *_INCLUDE_PATH. We add the system headers that we control / know about specifically later in the @rule, either from XCodeCLITools or GCC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded the definition of these more in 973f26e. Also, -nobuiltininc may not exist, or may not exist on Linux? I'm not sure, but it's not being recognized now -- will see if it's required on OSX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang++ --help is the only way I've been getting information about what any of these options do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

extra_args=llvm_cpp_compiler_args)
linking_library_dirs = provided_gpp.library_dirs + provided_clang.library_dirs
# Ensure we use libstdc++, provided by g++, during the linking stage.
linker_extra_args=['-stdlib=libstdc++']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in that we are using the provided lib dirs for linking, but not using them for compilation? This is for LLVM-based toolchains only, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linking_library_dirs aren't used in compilation, only the compiler object is used. This is only for LLVM toolchains -- this @rule is producing the single LLVMCppToolchain. The -stdlib argument is only recognized on clang, see the libcxx site (the new LLVM C++ standard library that we are turning off with this option, because it is not complete on Linux yet). See the LinkerMixin introduced in b2a750b which was added in response to your other review comment, which makes the difference between these directories more clear.

Copy link
Contributor

@CMLivingston CMLivingston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for circling back to fix these TODOs. A few questions from me, but I would like to hear your take on the llvm-locking happening in the native tasks (w.r.t. John's comments) before I approve these changes.

It would also be a great idea to get one more member of the Build team to give this a look to promote knowledge sharing of the native compilation support.

@cosmicexplorer
Copy link
Contributor Author

I removed some arguments to clang and clang++ while iterating on Linux that turned out to be necessary for OSX, but it should all be sorted out now (we'll see what travis has to say about that). All comments above should have been addressed.

@cosmicexplorer
Copy link
Contributor Author

Green, and OP updated.

@cosmicexplorer
Copy link
Contributor Author

Also broke out #6224 for further documentation of what we are doing and why we do what we are doing in native_toolchain.py.


if self.include_dirs:
ret['C_INCLUDE_PATH'] = create_path_env_var(self.include_dirs)
ret = super(CCompiler, self).as_invocation_environment_dict.copy()

ret['CC'] = self.exe_filename
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you been able to verify that setup.py compilation is taking place with the correct exe? I recall we hit some cases where it wasn't actually respecting this variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the correct exe_filename? There is no testing that the setup.py compilation does anything other than succeed, currently, but if you make the build fail (e.g. by inserting a syntax error into a C/C++ source), pex will print the stdout and stderr to the terminal stderr, and you can see that modifying exe_filenames (or e.g. selecting GCCCToolchain instead of the llvm one in python_native_code.py) will change the compiler used for the generated command line to build setup.py native sources.

At this point, I'm not sure what we should be testing wrt setup.py compilation other than success. If you have some itches you want to scratch, an issue just listing what tests we should add, or a PR doing that, would be great so I or someone else can address it in full. This wouldn't need to more than a few sentences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that may have been the case (wrt not respecting CC), but I don't remember the situation, and the issue may have been e.g. that we were adding the wrong entries to the PATH.

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general unease I have with new and existing code here is the proliferation of tiny rules that don't obviously do anything expensive. Some definitely just construct a datatype instance from other datatype instances - which is not good. Others are less clear. The problem though is that each of these tiny rules will run on a different machine in the future! That's completely un-necessary and although it would be nice to not have to think about that when writing rules, I think - currently at least - you do have to consider that.

@@ -129,3 +139,157 @@ def check_build_for_current_platform_only(self, targets):
'native code. Please ensure that the platform arguments in all relevant targets and build '
'options are compatible with the current platform. Found targets for platforms: {}'
.format(str(platforms_with_sources)))


@rule(CToolchain, [Select(PythonNativeCode)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CToolchain but you limit to LLVMCToolchain here (GCCCToolchain is ruled out). Why is this OK? Is this required to avoid the central problem that you have 2 producers for CToolchain components in GCC and LLVM registered rules with the engine, and to request a CCompiler straight up would thus blow up with an ambiguous producer? A comment / TODO to address would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% correct on why this is required. This concern was vaguely addressed with the TODO(#4020): These classes are performing the work of variants. I can add your comment to the usage here to make it clear how variants would improve the ux.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be great, but this is not just UX, it's a bug. There is currently no way to select GCC.

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, my comment was under the assumption that it was possible to select GCC for C/C++ elsewhere (this is done in test_native_toolchain.py) -- we just currently elect to use the LLVM toolchain when compiling python_dist()s without making that configurable yet. Am I missing something?

I just added 4f5194c which removes the rules which provide a bare CToolchain or CppToolchain, and expands the explanation of why the wrappers are necessary/how to request them. EDIT: the right commit is ac543fe now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but in the test you select GCC explicitly. What's the point of the generic CToolchain if you know you're getting an LLVMCToolchain or GCCCToolchain toolchain because you ask for one of those directly by type. There is perhaps more abstraction than needed.

class CToolchain(datatype([('c_compiler', CCompiler), ('c_linker', Linker)])): pass


class LLVMCToolchain(datatype([('c_toolchain', CToolchain)])): pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still missing the need for indirection in CToolchain and CppToolchain variants. Why not just subtype directly?:

class CToolchain(datatype([('c_compiler', CCompiler), ('c_linker', Linker)])): pass


class LLVMCToolchain(CToolchain): pass


class GCCCToolchain(CToolchain): pass

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then I can't figure out how to do e.g. this in python_native_code.py:

@rule(CToolchain, [Select(PythonNativeCode)])
def select_c_toolchain_for_local_dist_compilation(python_native_code):
  return Get(LLVMCToolchain, NativeToolchain, python_native_code.native_toolchain)


@rule(CppToolchain, [Select(PythonNativeCode)])
def select_cpp_toolchain_for_local_dist_compilation(python_native_code):
  return Get(LLVMCppToolchain, NativeToolchain, python_native_code.native_toolchain)

because afaik the engine matches on an Exactly constraint when type-checking @rule return types. Just now, when trying the edits in your comment and changing python_native_code.py to the above, I get this error message excerpt when running ./pants test tests/python/pants_test/backend/python/tasks:python_native_code_testing -- -vsk test_python_create_platform_specific_distribution:

E         ExecutionError: Received unexpected Throw state(s):
                     E         Computing Select(<pants.backend.python.subsystems.python_native_code.PythonNativeCode object at 0x11141ad90>, =SetupPyNativeTools)
                     E           Noop(No task was available to compute the value.)

cc @stuhood: is it correct to say this is happening because of an Exactly type constraint on rule results? Or is there something else I'm missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume by return Get(... you mean yield Get(.... Gets are only meant to work with yields.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. This will give a more useful error post #5788... but currently returning the wrong type from a rule triggers that Noop behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct -- I had tried both, and yield Get(...) also fails, with a StopIteration error, but I had neglected to mention that above.

# TODO: could this kind of @rule be automatically generated?
@rule(SetupPyNativeTools, [Select(CToolchain), Select(CppToolchain), Select(Platform)])
def get_setup_py_native_tools(c_toolchain, cpp_toolchain, platform):
yield SetupPyNativeTools(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule was removed entirely in 9846475!

# only creating an environment at the very end.
native_tools = self.setup_py_native_tools
if native_tools:
# TODO: an as_tuple() method for datatypes could make this destructuring cleaner!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a sign you should eliminate the middleman. See comment below RE SetupPyNativeTools rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have so far ended up keeping the SetupPyExecutionEnvironment for now, but removing all the rules associated with it. This is because the right environment is difficult to create simply by composing together other executable environments (for now) -- but I did expand the comment here and remove the TODO in 7ebfe51.

"""


# TODO: could this kind of @rule be automatically generated?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time you have a rule like this I think its a sign you don't need a rule like this - for 2 reasons:

  1. You're likely creating a middleman wrapper struct - if so, let the folks needing the struct components ask for them directly.
  2. You're just running non-io python code - this is generally an abuse of parallelism and in-efficient, just run that code where you need to run it, not off in a rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a very helpful comment! Along these lines, I was actually able to remove all the rules from the python backend in 9846475 and request LLVMCToolchain/etc directly.

@cosmicexplorer cosmicexplorer force-pushed the refactor-llvm-gcc-toolchain-selection branch from 4f5194c to ac543fe Compare July 24, 2018 20:32
@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Jul 24, 2018

In response to the above and other comments on individual lines in the diff, I was able to remove all of the rules from the python backend. I would strongly appreciate not having to mix concerns of the execution model with the @rule interface -- I'm really digging the use as a generic typed dependency injection facility. I will see if it would be reasonable to intersperse a non-futurized/parallelized execution model at some point in the future into the one we have right now to avoid having to run certain logic in the coroutine execution model, if only because, incredibly, that is something I was thinking a lot about before I ever started working on pants.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Py3 compliant to me.

@cosmicexplorer cosmicexplorer merged commit 2fc57cb into pantsbuild:master Jul 24, 2018
cosmicexplorer added a commit that referenced this pull request Jul 28, 2018
…toolchain selection (#6217)

### Problem

#5951 explains the problem addressed by moving CLI arguments to individual `Executable` objects -- this reduces greatly the difficulty in generating appropriate command lines for the executables invoked. In this PR, it can be seen to remove a significant amount of repeated boilerplate.

Additionally, we weren't distinguishing between a `Linker` to link the compiled object files of `gcc` or `g++` vs `clang` or `clang++`. We were attempting to generate a linker object which would work with *any of* `gcc`, `g++`, `clang`, or `clang++`, and this wasn't really feasible. Along with the above, this made it extremely difficult and error-prone to generate correct command lines / environments for executing the linker, which led to e.g. not being able to find `crti.o` (as one symptom addressed by this problem).

### Solution

- Introduce `CToolchain` and `CppToolchain` in `environment.py`, which can be generated from `LLVMCToolchain`, `LLVMCppToolchain`, `GCCCToolchain`, or `GCCCppToolchain`. These toolchain datatypes are created in `native_toolchain.py`, where a single `@rule` for each ensures that no C or C++ compiler that someone can request was made without an accompanying linker, which will be configured to work with the compiler.
- Introduce the `extra_args` property to the `Executable` mixin in `environment.py`, which `Executable` subclasses can just declare a datatype field named `extra_args` in order to override. This is used in `native_toolchain.py` to ensure platform-specific arguments and environment variables are set in the same `@rule` which produces a paired compiler and linker -- there is a single place to look at to see where all the process invocation environment variables and command-line arguments are set for a given toolchain.
- Introduce the `ArchiveFileMapper` subsystem and use it to declare sets of directories to resolve within our BinaryTool archives `GCC` and `LLVM`. This subsystem allows globbing (and checks that there is a unique expansion), which makes it robust to e.g. platform-specific paths to things like include or lib directories.

### Result

Removes several FIXMEs, including heavily-commented parts of `test_native_toolchain.py`. Partially addresses #5951 -- `setup_py.py` still generates its own execution environment from scratch, and this could be made more hygienic in the future. As noted in #6179 and #6205, this PR seems to immediately fix the CI failures in those PRs.
CMLivingston pushed a commit to CMLivingston/pants that referenced this pull request Aug 27, 2018
…toolchain selection (pantsbuild#6217)

### Problem

pantsbuild#5951 explains the problem addressed by moving CLI arguments to individual `Executable` objects -- this reduces greatly the difficulty in generating appropriate command lines for the executables invoked. In this PR, it can be seen to remove a significant amount of repeated boilerplate.

Additionally, we weren't distinguishing between a `Linker` to link the compiled object files of `gcc` or `g++` vs `clang` or `clang++`. We were attempting to generate a linker object which would work with *any of* `gcc`, `g++`, `clang`, or `clang++`, and this wasn't really feasible. Along with the above, this made it extremely difficult and error-prone to generate correct command lines / environments for executing the linker, which led to e.g. not being able to find `crti.o` (as one symptom addressed by this problem).

### Solution

- Introduce `CToolchain` and `CppToolchain` in `environment.py`, which can be generated from `LLVMCToolchain`, `LLVMCppToolchain`, `GCCCToolchain`, or `GCCCppToolchain`. These toolchain datatypes are created in `native_toolchain.py`, where a single `@rule` for each ensures that no C or C++ compiler that someone can request was made without an accompanying linker, which will be configured to work with the compiler.
- Introduce the `extra_args` property to the `Executable` mixin in `environment.py`, which `Executable` subclasses can just declare a datatype field named `extra_args` in order to override. This is used in `native_toolchain.py` to ensure platform-specific arguments and environment variables are set in the same `@rule` which produces a paired compiler and linker -- there is a single place to look at to see where all the process invocation environment variables and command-line arguments are set for a given toolchain.
- Introduce the `ArchiveFileMapper` subsystem and use it to declare sets of directories to resolve within our BinaryTool archives `GCC` and `LLVM`. This subsystem allows globbing (and checks that there is a unique expansion), which makes it robust to e.g. platform-specific paths to things like include or lib directories.

### Result

Removes several FIXMEs, including heavily-commented parts of `test_native_toolchain.py`. Partially addresses pantsbuild#5951 -- `setup_py.py` still generates its own execution environment from scratch, and this could be made more hygienic in the future. As noted in pantsbuild#6179 and pantsbuild#6205, this PR seems to immediately fix the CI failures in those PRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants