Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track publisher confirmations automatically #1687

Merged
merged 1 commit into from
Oct 8, 2024

Conversation

lukebakken
Copy link
Contributor

Fixes #1682

  • Remove ConfirmSelectAsync from IChannel
  • Add parameters to enable confirmations on IConnection.CreateChannelAsync

@Tornhoof
Copy link
Contributor

IConnection.CreateChannelAsync

I wonder if an enum wouldn't be better, something like PublisherConfirmation.None, Automatic, Manual.
Mostly because two bools can model 4 states and I think we only need three and two bools closely related with similar names always mean, I mix them up and use them wrongly. ;)

@lukebakken
Copy link
Contributor Author

lukebakken commented Sep 23, 2024

The current API is ONLY to facilitate removal of ConfirmSelectAsync, that's it! I'm sure @danielmarbach or @bording will have something different.

@danielmarbach / @bording - how should we coordinate work on this feature?

@danielmarbach
Copy link
Collaborator

@lukebakken what was your plan with this PR? Was your idea to get this going and then eventually get reviews etc from our end?

Regarding the options, I think it might be best to have something like @Tornhoof mentioned to better represent the states of options that are possible if the current configurability needs to be preserved.

Since it is nowhere explicitly stated, I wanted to ask if you and your team have discussed why confirms and tracking should be individual options or even allowed to be turned off? Assuming confirms is the default, why would someone then not want to have tracking enabled? Is it purely to preserve options that allow to opt in to not so safe options to get more speed?

@lukebakken
Copy link
Contributor Author

Is it purely to preserve options that allow to opt in to not so safe options to get more speed?

Yes, that's it. I'll get started on this today.

@bording
Copy link
Collaborator

bording commented Sep 23, 2024

I can see allowing publisher confirms to be enabled or not, but I would argue that once you've enabled them, the client should just handle tracking them for you now that the async API allows for that to be done in a per-message way.

@danielmarbach
Copy link
Collaborator

@lukebakken how do you intend to expose the number of outstanding confirms? Or is this something you want to follow up in another PR?

@lukebakken
Copy link
Contributor Author

🤷 I'm just now switching gears from working on our AMQP 1.0 Go client 😵‍💫

@danielmarbach
Copy link
Collaborator

Looking at all the things required to pass at channel creation I think it might be better to have a channel creation options type/class similar to Azure SDK options types to configure various client types. Adding more overloads with primitive types seems to burry things and make the usage unnecessary complex

@danielmarbach
Copy link
Collaborator

@lukebakken I pushed the options type as a small contribution since you thumbed that up

Regarding

There will be a maximum count of outstanding confirmations allowed (also configurable). When this maximum is reached, calls to BasicPublishAsync will immediately return an exception.

What is the relationship of PublisherConfirmationsEnabled and PublisherConfirmationTrackingEnabled to the number of outstanding confirms? Is the idea that there will be some sort of MaximumNumberOfOutstandingPublisherConfirms that is set to an initial value and get be set to null which then means infinite?

You wrote something about the publish starting throwing an exception. Why not simply let them wait until new slots are available? A cancellation token passed to the publish would allow enforcing some sort of timeout anyway? Or does it have to be more sophisticated?

@danielmarbach
Copy link
Collaborator

I wanted to use init only members on the option type but realized that would require something like Polyfill or Polysharp to fill the gaps.

@lukebakken
Copy link
Contributor Author

@danielmarbach - thanks for taking a look and contributing. I have been sidelined by RabbitMQ support this week.

What is the relationship of PublisherConfirmationsEnabled and PublisherConfirmationTrackingEnabled to the number of outstanding confirms? Is the idea that there will be some sort of MaximumNumberOfOutstandingPublisherConfirms that is set to an initial value and get be set to null which then means infinite?

I haven't thought this through all the way but that sounds right.

  • A way to totally disable publisher confirmations
  • A way to enable confirmations, but disable internal tracking
  • When internal tracking is enabled, allow setting a limit to the outstanding number of confirms

You wrote something about the publish starting throwing an exception. Why not simply let them wait until new slots are available? A cancellation token passed to the publish would allow enforcing some sort of timeout anyway? Or does it have to be more sophisticated?

A wait with a timeout sounds great to me. If a token isn't passed to BasicPublishAsync we should have a default timeout value, like 5 seconds.

@danielmarbach
Copy link
Collaborator

danielmarbach commented Sep 26, 2024

we should have a default timeout value, like 5 seconds.

Why is that? In the world of cooperative cancellation, the notion of a timeout is usually expressed by passing a token that is attached to a token source that enforces the timeout. Then everything is in the hand of the caller. If you introduce another arbitrary way of setting a timeout you need to deal with different overloads of the method.

Or is the intent to build a similar API to SemaphoreSlim.WaitAsync that allows having a timeout overload which makes the method return after the timeout without throwing (then you would have to have some kind of return value indicating the publish was not successful within the defined timeout which also sounds a bit weird but maybe I'm not understanding things clearly)

@lukebakken
Copy link
Contributor Author

Why is that? In the world of cooperative cancellation, the notion of a timeout is usually expressed by passing a token that is attached to a token source that enforces the timeout.

Ah, OK. We have places in the API where a TimeSpan can be passed for a timeout, but maybe we should re-re-review the public API for that and just expect users to know to use CancellationTokens.

@paulomorgado
Copy link
Contributor

paulomorgado commented Sep 27, 2024

I wanted to use init only members on the option type but realized that would require something like Polyfill or Polysharp to fill the gaps.

We shouldn't be depending on any polyfill, specially when there are more than one known to exist, because it can create conflicts for users..

The compiler depends only on fully qualified names (that's why those libraries can exist) and the types don't even need to be public. The compiler itself generates some internal types rather than relying on a runtime specific types.

We can just grab the source of whatever types we need and add them as internal to the client.

That being said, init members are great for when you are initializing them, but there's no guarantee that they are initialized when consuming (you can still create instances with reflection). But like them for controlled use cases and I'd go for it.

But it's not that hard to use create these kind of types with mandatory constructor parameters.

@danielmarbach
Copy link
Collaborator

danielmarbach commented Sep 27, 2024

We shouldn't be depending on any polyfill, specially when there are more than one known to exist, because it can create conflicts for users..

The compiler depends only on fully qualified names (that's why those libraries can exist) and the types don't even need to be public. The compiler itself generates some internal types rather than relying on a runtime specific types

We can just grab the source of whatever types we need and add them as internal to the client

Those statements seem to contradict each other. The polyfil libraries I linked are source only dependencies and by default are internal. There is no clash with public types and effectively the copying is no longer necessary because that has already been done by the authors of the polyfil libraries in a considerate and consumable way.

I'm not saying copying is not an option. In fact I mentioned that on the issue I opened. I'm commenting directly on the points you brought up as a counter argument to not take a Dependency. There are other downsides of taking a Dependency to those source packages that are worth discussing though.

FYI following through your reasoning it would also be necessary to remove the private assets only NuGet Dependency to Nullable which the client takes a Dependency on today

@danielmarbach
Copy link
Collaborator

Why is that? In the world of cooperative cancellation, the notion of a timeout is usually expressed by passing a token that is attached to a token source that enforces the timeout.

Ah, OK. We have places in the API where a TimeSpan can be passed for a timeout, but maybe we should re-re-review the public API for that and just expect users to know to use CancellationTokens.

I think there are valid reasons for timeouts when the underlying mechanism uses / enforces timeouts like HTTP, TCP etc. In the example with publisher confirms it feels forced but again I might be misunderstanding things

@paulomorgado
Copy link
Contributor

We shouldn't be depending on any polyfill, specially when there are more than one known to exist, because it can create conflicts for users..

The compiler depends only on fully qualified names (that's why those libraries can exist) and the types don't even need to be public. The compiler itself generates some internal types rather than relying on a runtime specific types

We can just grab the source of whatever types we need and add them as internal to the client

Those statements seem to contradict each other. The polyfil libraries I linked are source only dependencies and by default are internal. There is no clash with public types and effectively the copying is no longer necessary because that has already been done by the authors of the polyfil libraries in a considerate and consumable way.

I'm not saying copying is not an option. In fact I mentioned that on the issue I opened. I'm commenting directly on the points you brought up as a counter argument to not take a Dependency. There are other downsides of taking a Dependency to those source packages that are worth discussing though.

FYI following through your reasoning it would also be necessary to remove the private assets only NuGet Dependency to Nullable which the client takes a Dependency on today

Sorry, I assumed they were libs, without checking.

Polyfill looks like an all or nothing with the only option being making the code public or not.

I haven't tested Polysharp, but if it works as advertised, looks good.

@lukebakken
Copy link
Contributor Author

@danielmarbach @paulomorgado we won't be adding any polyfill libraries to this project unless they are absolutely necessary for a feature. I hope to continue with this PR soon. Thanks!

@paulomorgado
Copy link
Contributor

@danielmarbach @paulomorgado we won't be adding any polyfill libraries to this project unless they are absolutely necessary for a feature. I hope to continue with this PR soon. Thanks!

@lukebakken,

Those are not libraries (that was my mistake). They are tools.

Polyfill adds everything as source:

image

The downside there, as I see it, is that it requires C" 12.0.

Polysharp generates the polyfills:

image

And can be used with C# 10.0.

But, in the end, all we need is:

namespace System.Runtime.CompilerServices
{
    [global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
    [global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]
    internal static class IsExternalInit
    {
    }
}

With a few lines of MSBuild it's possible to remove the extra files from either tool

@lukebakken
Copy link
Contributor Author

"init only members" seems like a "nice to have" code feature, and not a "must have".

@lukebakken
Copy link
Contributor Author

lukebakken commented Oct 3, 2024

@danielmarbach @bording I've just committed support for basic.return and changed the BasicPublishAsync return value to be ValueTask<bool> - true if the message was routed and ack-ed, false otherwise.

I'd appreciate a review, thanks!

This API is not ideal for a library user, because it's up to them to correlate publishes with the associated ValueTask<bool> instances. However, it's not like we have a "message ID" other than publish sequence number to do that ourselves.

Thoughts? Leave that up to users?

@danielmarbach
Copy link
Collaborator

@lukebakken I see what I can do tomorrow before going on vacation for a week. Today it's already a bit late for me to review things

@lukebakken
Copy link
Contributor Author

@danielmarbach no rush! Enjoy your time off 🌴

@danielmarbach
Copy link
Collaborator

@lukebakken I have submitted a different approach which I think would be way more usable #1698. This is not done done because of my limited time but a start I guess

@danielmarbach
Copy link
Collaborator

A few more generic comments:

  • Overall it seems the complexity is non trivial and there might be multiple ways to reduce down the complexity of the bool checks a bit to reduce the cognitive load
  • Maybe it is also worth considering to always use a TaskCompletionSource in BasicPublish to streamline a bit the method implementation (of course there are some tradeoffs there to consider due to the additional allocation
  • Why doesn't the MaybeAddPublishSequenceNumberToHeaders use the NetworkOrderSerializer?
  • It was hard for me to follow the flow of when the sequence number is actually added to the headers due to the complicated logic flow and I wasn't sure if it really gets added in all necessary cases. It seems it should be added in all cases when publisherConfirms is enabled independent of wether the activity is present or not
  • It seems to be possible when publisher confirms are enabled for races to occurs around the sequence number. For example one operation tries to acquire the next publish sequence number over the new method while another operation does a basic publish. The basic publish might hold up the acquiring of the sequence number but then when the model send happens the operation of trying to acquire the sequence number is released, increments the sequence number, meanwhile an exception happens during model send which then decrements the sequence number again. Aren't then things out of sync? Should the confirmSemaphore be held longer for more safety? (Again I would probably need more time to really think this through so I might be wrong)

@lukebakken
Copy link
Contributor Author

@danielmarbach thank you!

Overall it seems the complexity is non trivial

Yes, I agree. I was just about to pull all the confirm-related code into its own class to at least keep everything in the same place.

Maybe it is also worth considering to always use a TaskCompletionSource in BasicPublish

I'll check your new PR

Why doesn't the MaybeAddPublishSequenceNumberToHeaders use the NetworkOrderSerializer?

I wasn't 100% sure of the best way to create a buffer to write the sequence number to use that method.

It was hard for me to follow the flow of when the sequence number is actually added to the headers

Yes, that is exactly when it is added - in all cases when confirms are enabled, whether or not tracking is enabled.

It seems to be possible when publisher confirms are enabled for races to occurs around the sequence number. For example one operation tries to acquire the next publish sequence number over the new method while another operation does a basic publish. The basic publish might hold up the acquiring of the sequence number but then when the model send happens the operation of trying to acquire the sequence number is released, increments the sequence number, meanwhile an exception happens during model send which then decrements the sequence number again. Aren't then things out of sync? Should the confirmSemaphore be held longer for more safety?

Yes, I think you're correct.

@danielmarbach
Copy link
Collaborator

Yes, that is exactly when it is added - in all cases when confirms are enabled, whether or not tracking is enabled.

It seems though it opts out when the activity is null

@lukebakken
Copy link
Contributor Author

lukebakken commented Oct 4, 2024

It seems though it opts out when the activity is null

When activity is null and confirmations are disabled. Yes, the logic is confusing so I'll add a comment.

@danielmarbach
Copy link
Collaborator

It seems though it opts out when the activity is null

When activity is null and confirmations are disabled. Yes, the logic is confusing so I'll add a comment.

I always knew I would fail any job interview when being presented with bool flags 😂

@lukebakken
Copy link
Contributor Author

I always knew I would fail any job interview when being presented with bool flags 😂

A-men! I got it wrong the first time... "Why aren't my headers being added???" ... I used || instead of && 🤦

Fixes #1682

* Remove `ConfirmSelectAsync` from `IChannel`
* Add parameters to enable confirmations on `IConnection.CreateChannelAsync`
* Remove / comment out all use of `WaitForConfirms...` methods.
* Dispose -> DisposeAsync
* Implement confirmation tracking and await-ing in `BasicPublishAsync`
* Ensure exceptions make into inner exception for `HardProtocolException`
* Remove commented-out code related to `WaitForConfirms...` methods.
* Unblock so that `CloseAsync` can succeed.
* Introduce channel creation options
* Allow cancellation of final await for publisher confirmation in `BasicPublishAsync`
* Fix `dotnet format` verification error
* Make `ConfirmSelectAsync` `private` and assume that semaphore is held.
* Track sequence number for basic.return
* Implement `basic.return` support.
* Fix the code that adds OTel and publish sequence number headers.
* Add publish sequence number as long as `_publisherConfirmationsEnabled` is true
* Fix the `PublisherConfirms` test program
* Add license headers
* Enable publisher confirms
* Spike an exception based approach (misses removing the bool value return type)
* Extend the use of `_confirmSemaphore` to the duration of when exceptions could be caught.
* Restore how @danielmarbach serialized the publish sequence number.
* Fix bug in how headers are added to `BasicProperties` that don't already have them.
* Use `ValueTask` as the `BasicPublishAsync` return value.

---------

Co-authored-by: Daniel Marbach <danielmarbach@users.noreply.github.com>
Co-authored-by: Luke Bakken <luke@bakken.io>

* Add `PublishException` class.

* Test that non-routable messages result in `PublishException` with `IsReturn = true`

* Code documentation

Simplify code.
@lukebakken lukebakken marked this pull request as ready for review October 8, 2024 18:31
@lukebakken lukebakken merged commit e84b658 into main Oct 8, 2024
17 checks passed
@lukebakken lukebakken deleted the rabbitmq-dotnet-client-1682 branch October 8, 2024 18:56
@lukebakken
Copy link
Contributor Author

@danielmarbach I'm planning to make the code more readable before the final release, at least by moving the publisher confirmation related code to a partial class or its own class.

I also haven't begun work on blocking publishers when there are too many outstanding confirms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make handling of publisher confirmations transparent to the user
5 participants