Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added infrastructure for Ruby plugin loading. #116

Open
wants to merge 2 commits into
base: stable-2.0
Choose a base branch
from

Conversation

Kerilk
Copy link

@Kerilk Kerilk commented Dec 3, 2020

Hello,

This is a proposal (working) implementation of ruby plugin support.

It goes hand in hand with the ruby bindings for babeltrace2 babeltrace2-ruby, where most of the tests are done. The bindings are not documented yet, and for now the test are the documentation. Nonetheless I kept as close as possible to the babeltrace2 design and anybody familiar with the API shouldn't have too much trouble navigating around the sources or understanding the tests. For now this is a pure Ruby (ffi) binding with no native code.

I am making this pull request to kickstart the discussion around plugin-provider support in babeltrace2, and the result of the discussion may end up shaping the final contribution, so I am making it sooner rather than later. All the remarks I am making are only representing my current understanding of the API and I could very well be wrong on some or all of the following points, sorry in advance if that is the case.

The work done here and on the bindings does raise a few questions and remarks:

  • the APIs to create and populate bt_plugin and bt_plugin_set are not exported currently:
    • This forces the plugin instantiation to be done inside the plugin-provider. If those APIs could be used inside the python or ruby library, the plugin-provider code could be greatly reduced.
    • If those APIs are exported and a slightly modified plugin-provider loading mechanism is implemented, it should become possible to implement plugin providers out of babeltrace2 source tree. plugin-providers would effectively become plugins themselves. (And it should be still possible to embed those that have their sources in the tree).
  • About bt_graph:
    • The bt_graph object lacks introspection. I would have expected to be able to print a full graph from an opaque bt_graph handle, this could be done if one had access to the components and connections inside the graph.
    • The bt_graph object does not have an bt_graph_interrupted API whereas it can be associated to interrupter objects. I understand that from a C application perspective you just provided the interrupter, so you can check it, but from a high level language you may want your run loop to look like:
    def run
      while ((res = Babeltrace2.bt_graph_run(@handle)) == :BT_GRAPH_RUN_STATUS_AGAIN)
        raise "interrupted" if interrupted?
        sleep BT_SLEEP_TIME
      end
      raise Babeltrace2.process_error(res) if res != :BT_GRAPH_RUN_STATUS_OK
      self
    end
  • Classes that have callback method associated (bt_component_class, bt_message_iterator_class) have finalize callbacks for their instance but not themselves. In C this is often not an issue as callback are rarely dynamically allocated. In bindings though, the callback are often dynamically allocated closures and would need to be freed once their classes are destroyed, which is not possible in the current implementation. In the grand scheme of things this is of little consequence, but the memory leak exists.
    • This is also true for bt_graph objects with associated callbacks.
    • On a side note, the order in which the finalize callbacks are called is not specified in the documentation, so one can't assume they are called in the reverse order they are added, which would greatly simplify some bookkeeping operations.

This more or less sums up my feedback. I would also like to state that working with/within the library is great, the job done on Babeltrace2 is awesome.

How would you like to proceed with this? Would you be willing to accept the ruby plugin-provider in the tree (with any fixes you would deem needed of course), or would you rather go the plugin-provider plugin route?

Thanks.

PS: the rest of those are small issues I noticed and I am putting there in case I forget to submit them at the appropriate place:

  • Documentation for Integer Field Class bt_field_class_integer_set_field_value_range states that signed integer range is: [-2^N, 2^N - 1] whereas from my testing (and it makes more sense to me) it should be [-2^(n-1), 2^(n-1)-1]
  • bt_field_string_clear does not set the NULL terminator byte in position 0, nor does it reset the set state, so reading from a cleared string using bt_field_string_get_value is possible even in developer mode and returns a pointer to the previous data. One needs to check the size to know that there is an issue. This could be the expected behavior, but it surprised me so I thought I would mention it.

@jgalar jgalar requested a review from eepp December 8, 2020 20:38
@eepp
Copy link
Member

eepp commented Dec 9, 2020

Hello,

Thank you for your interest in contributing to the project.

This kind of review is much appreciated by the team.

This is a proposal (working) implementation of ruby plugin support.

It goes hand in hand with the ruby bindings for babeltrace2 babeltrace2-ruby, where most of the tests are done. The bindings are not documented yet, and for now the test are the documentation. Nonetheless I kept as close as possible to the babeltrace2 design and anybody familiar with the API shouldn't have too much trouble navigating around the sources or understanding the tests. For now this is a pure Ruby (ffi) binding with no native code.

I'll let @jgalar answer to this part (Ruby bindings).

I am making this pull request to kickstart the discussion around plugin-provider support in babeltrace2, and the result of the discussion may end up shaping the final contribution, so I am making it sooner rather than later. All the remarks I am making are only representing my current understanding of the API and I could very well be wrong on some or all of the following points, sorry in advance if that is the case.

The work done here and on the bindings does raise a few questions and remarks:

  • the APIs to create and populate bt_plugin and bt_plugin_set are not exported currently:

    • This forces the plugin instantiation to be done inside the plugin-provider. If those APIs could be used inside the python or ruby library, the plugin-provider code could be greatly reduced.
    • If those APIs are exported and a slightly modified plugin-provider loading mechanism is implemented, it should become possible to implement plugin providers out of babeltrace2 source tree. plugin-providers would effectively become plugins themselves. (And it should be still possible to embed those that have their sources in the tree).

Yes, the plugin provider mechanism is not intended to be public as of Babeltrace 2.0.

We made this so that the Python plugin support can be distributed as a separate package to avoid having the base Babeltrace 2 depend on Python.

In fact, it's pretty hard coded now.

I'm open to making some plugin provider API public, which is totally possible (making plugin providers "plugins" themselves too, as you wrote), but it's not a priority.

  • About bt_graph:

    • The bt_graph object lacks introspection. I would have expected to be able to print a full graph from an opaque bt_graph handle, this could be done if one had access to the components and connections inside the graph.

This was not planned for Babeltrace 2.0, but it's always in the back of our minds.

Having an API which makes it possible to visualize the graph components/iterators and see the messages flowing would be nice, like Windows's GraphEdit:

image

  • The bt_graph object does not have an bt_graph_interrupted API whereas it can be associated to interrupter objects. I understand that from a C application perspective you just provided the interrupter, so you can check it, but from a high level language you may want your run loop to look like:
    def run
      while ((res = Babeltrace2.bt_graph_run(@handle)) == :BT_GRAPH_RUN_STATUS_AGAIN)
        raise "interrupted" if interrupted?
        sleep BT_SLEEP_TIME
      end
      raise Babeltrace2.process_error(res) if res != :BT_GRAPH_RUN_STATUS_OK
      self
    end

Why can't you keep that interrupter object, like you suggest?

What's to understand here is that, conceptually, there's no such thing as an interrupted graph: each message iterator and each sink component has a list of interrupters which can be shared. For Babeltrace 2.0, we made a hack: a graph has a default interrupter which is "shared" by all the message iterators/sinks (current and future), and you can also add one to all the current/future message iterators/sinks of a given graph. But the graph itself never owns/checks interrupters.

This design will make it possible in the future for a component to interrupt one of its message iterators, for any reason, while potentially leaving the other ones running.

  • Classes that have callback method associated (bt_component_class, bt_message_iterator_class) have finalize callbacks for their instance but not themselves. In C this is often not an issue as callback are rarely dynamically allocated. In bindings though, the callback are often dynamically allocated closures and would need to be freed once their classes are destroyed, which is not possible in the current implementation. In the grand scheme of things this is of little consequence, but the memory leak exists.

    • This is also true for bt_graph objects with associated callbacks.

At one point during Babeltrace 2's development, you could set a "listener removed" listener when adding a listener to a graph. I then removed that (f3d6b4c) when I realized that, unlike a trace class for example, a graph is not shared by design: sharing a graph after you create it is outside the scope of Babeltrace 2.

This is why, in the Python bindings, the bt2.Graph object keeps a list of listener callables itself so that they live as long as the Python object exists, and will be naturally destroyed when the object is garbage-collected.

I advise you to consider a similar approach (for component classes and message iterator classes as well).

  • On a side note, the order in which the finalize callbacks are called is not specified in the documentation, so one can't assume they are called in the reverse order they are added, which would greatly simplify some bookkeeping operations.

Which finalization callbacks are you talking about?

If you're talking about message iterator finalization, for example, then the C API doc says:

The library guarantees that all message iterators are destroyed before their component is destroyed.

Also, the finalization order is natural: if a message iterator A owns another message iterator B, then B is necessarily finalized before A. The same applies for sink components owning message iterators (they are finalized before the component owning them).

This more or less sums up my feedback. I would also like to state that working with/within the library is great, the job done on Babeltrace2 is awesome.

Thank you!

How would you like to proceed with this? Would you be willing to accept the ruby plugin-provider in the tree (with any fixes you would deem needed of course), or would you rather go the plugin-provider plugin route?

@jgalar will answer this too.

Thanks.

PS: the rest of those are small issues I noticed and I am putting there in case I forget to submit them at the appropriate place:

  • Documentation for Integer Field Class bt_field_class_integer_set_field_value_range states that signed integer range is: [-2^N, 2^N - 1] whereas from my testing (and it makes more sense to me) it should be [-2^(n-1), 2^(n-1)-1]

You're right; doc is wrong. I'll fix that.

  • bt_field_string_clear does not set the NULL terminator byte in position 0, nor does it reset the set state, so reading from a cleared string using bt_field_string_get_value is possible even in developer mode and returns a pointer to the previous data. One needs to check the size to know that there is an issue. This could be the expected behavior, but it surprised me so I thought I would mention it.

I believe that's a bug. Doc says:

Clears the string field field, making its value an empty string.

Code doesn't do that:

static inline
void clear_string_field(struct bt_field *field)
{
	struct bt_field_string *string_field = (void *) field;

	BT_ASSERT_DBG(field);
	string_field->length = 0;
	bt_field_set_single(field, true);
}

Moreover, bt_field_set_single(field, true) makes the field have a value in the eye of precondition checks, so yes that's a bug. Will fix.

@eepp
Copy link
Member

eepp commented Dec 9, 2020

Fixes posted here and here.

@Kerilk
Copy link
Author

Kerilk commented Dec 9, 2020

Thanks for the great feedback.

I will try to sum up some of the points and clear some things I might not have expressed correctly.

A lot of the previous discussion indeed flows from the fact that from your point of view graphs shouldn't be shared. In this case you can safely assume that when your python or ruby graph is garbage collected you can free the closures, and of course have access to the interrupters. All those assumption drop as soon as the graph handle is shared.

The component classes, are meant to be shared. For now I have to keep their closures alive, because even if they are garbage collected, I cannot assume someone else doesn't have the handle. They would, I think, benefit from a destruction listener. This would allow to move component class handle from ruby to C, and ensure that all the closures would be freed once the component class is destroyed, long after the ruby class ceased to exist.

  On a side note, the order in which the finalize callbacks are called is not
  specified in the documentation, so one can't assume they are called in
  the reverse order they are added, which would greatly simplify some bookkeeping operations.

Which finalization callbacks are you talking about?

I was talking about bt_trace_class_add_destruction_listener and friends, the documentation states:

All the destruction listener user functions of a trace class are called when it's being destroyed.

It doesn't state in which order those are called, having a reverse calling order would allow to enqueue a first callback to cleanup all the closures associated with the trace class. This is also true for all classes that have those.

One aspect I forgot to mention in my earlier post was the frozen status of a Babeltrace2 object. Once you start using high level languages like Ruby or Python you more or less forfeit high performance. I wanted my ruby binding to enforce the preconditions of the library, so you couldn't get a stop error, even (especially?) in developer mode. I hit a roadblock when trying to enforce the frozen preconditions. Tracking events which change objects to frozen is possible, but then this state needs to be propagated, or inferred, for subsequent objects created from the same handle. And those can come from a lot of different places. Introspection into the frozen status of a Babeltrace2 object would allow this to be done with minimal effort and simplify the design fo bindings wanting to enforce preconditions.

Thanks again.

@jgalar
Copy link
Member

jgalar commented Dec 9, 2020

First, thanks for this contribution, and for the great comments and questions.

I'll let @jgalar answer to this part (Ruby bindings).

I like that you kept the design close to the C and Python APIs. This way the concepts that are already documented remain clear for everyone.

I think it would make sense to refer to your bindings in the babeltrace documentation once you have the documentation and a proper release done.

@jgalar will answer this too.

As @eepp said, it's not a priority for us right now, but I like the plugin-provider idea.

Introspection into the frozen status of a Babeltrace2 object would allow this to be done with minimal effort and simplify the design fo bindings wanting to enforce preconditions.

The actual interface needed for this is indeed simple, but we don't keep track of that state outside of developer builds for performance reasons. For instance, the frozen state has to be set for each field and cleared when the objects are recycled.

I also suspect you will need to track whether or not a field has been set, and other state relevant to pre-conditions that we don't keep track of in "normal" builds.

I really think your best course of action is to track those on the bindings' side.
Do you have specific interrogations or concerns we could help with?

@Kerilk
Copy link
Author

Kerilk commented Dec 9, 2020

@jgalar thanks for the nice comment and feedback.

I think it would make sense to refer to your bindings in the babeltrace documentation once you have the documentation and a proper release done.

We will be experimenting internally with the bindings to iron out any issues I will have left in at this stage. I already pushed a babeltrace2 ruby gem in the wild (just for convenient installation), but I am not ready to publicize it yet. The lack of documentation is also a blocker, as you noted.

The actual interface needed for this is indeed simple, but we don't keep track of that state outside of developer builds for performance reasons. For instance, the frozen state has to be set for each field and cleared when the objects are recycled.

I also suspect you will need to track whether or not a field has been set, and other state relevant to pre-conditions that we don't keep track of in "normal" builds.

I was expecting something like this. This is not a problem, I'll try to enforce as much preconditions as I can.

@jgalar will answer this too.

As @eepp said, it's not a priority for us right now, but I like the plugin-provider idea.

So would you go with the pull request as is for now (minus corrections, comments and I will make another pass on it just to make sure everything is validated correctly)?

Thanks

@jgalar
Copy link
Member

jgalar commented Dec 9, 2020

So would you go with the pull request as is for now (minus corrections, comments and I will make another pass on it just to make sure everything is validated correctly)?

No. I would prefer something in line with @eepp's proposal:

I'm open to making some plugin provider API public, which is totally possible (making plugin providers "plugins" themselves too, as you wrote), but it's not a priority.

Essentially, I'd rather we modify the plug-in discovery and loading code to expose an API that accommodates out-of-tree plug-in providers than adding another "loader".

Both the Python and Ruby plug-in loaders could use this interface to discover plug-ins and allow the instantiation of their respective components.

@Kerilk
Copy link
Author

Kerilk commented Dec 9, 2020

That was also my second (and preferred sugestion):

If those APIs are exported and a slightly modified plugin-provider loading mechanism is implemented, it should become possible to implement plugin providers out of babeltrace2 source tree. plugin-providers would effectively become plugins themselves. (And it should be still possible to embed those that have their sources in the tree).

If you are otherwise busy with something else, I could make a proposal to transform the existing plugin support to allow external plugin providers as well as publicizing the relevant APIs so that they can be built out of tree. Tell me if that would be OK with you or if you'd rather do it yourself (I understand it wont be immediately of course).

@jgalar
Copy link
Member

jgalar commented Dec 9, 2020

I am busy on other things, but feel free to propose a first draft of the necessary changes and we'll comment on them :)

@eepp
Copy link
Member

eepp commented Dec 9, 2020

Also: our review process happens on Gerrit.

Please create one or more changes there when you're ready.

If you're not familiar with the Gerrit workflow, see its user guide.

@Kerilk
Copy link
Author

Kerilk commented Dec 9, 2020

Regarding the changes, for now, I have been working on the stable branch because I deploy the modified branches in production. What branch would you want me to target for this work?

@eepp
Copy link
Member

eepp commented Dec 10, 2020

What branch would you want me to target for this work?

master

@Kerilk
Copy link
Author

Kerilk commented Jan 15, 2021

I uploaded a first patch for review on Gerrit:
https://review.lttng.org/c/babeltrace/+/4642
It deals with the necessary opening of the API of the plugin and pluginset objects.

@Kerilk
Copy link
Author

Kerilk commented Jul 7, 2021

Fixes posted here and here.

@eepp
We encountered a use case of a valid lttng trace that is not read properly by babeltrace2 due to:
https://review.lttng.org/c/babeltrace/+/4528
String fields display the value of a previous event, despite their length being set to zero. Applying the patch to a stable 2.0 branch fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants