Turn nrefs into atomic_int #3370

ffesti · 2024-10-11T06:38:51Z

Throughout the code base reference counting is done in a thread-unsave way. A few places use the content lock (aka mutex) to also protect the reference counter. This is wrong and complicates freeing interconnected data structures. This lock should only protect the content of the instance.

Turn all nrefs members into atomic_int so they are thread-save themselves.

Make sure to only use ++ and -- operators and avoid non atomic combinations of reading and writing. Drop all the static *Unlink functions that are involved in this.

Make sure the data lock is not aquired while updating the nrefs members.

Technically the *Free() functions should *Link() the instance after the ref count has reached 0 and the decision to free it has been made. Otherwise the cleanup code linking and freeing to will lead to a recusions. Right now only rpmts needs that. SO it is omitted everywhere else.

FD_t is special in that fdFree return the instance after free iff it still has not reached a ref count of 0. Unfortunately code in rpmShowProgress() relies on that.

This does not make the data structures thread save on its own. But it gives a foundation on which data locking can be implemented.

pmatilai · 2024-10-11T06:50:09Z

Whoa. Unfortunately it's not quite that simple, the increment is atomic but the comparison is not.

It's a step in the right direction though and we can take that step, but slice this up a bit: the places where the change is just a straightforward and obvious Unlink -> --nrefs change can go in one bigger commit, but do each of the more special cases and involved changes as a separate commits, each with their own explanations. Makes it easier to bisect it if something goes wrong.

Mind, this is not a review, just took a quick overview of it.

pmatilai · 2024-10-11T07:10:14Z

And to make it clear: let's NOT get into the atomic compare-and-exchange stuff just here and now, this is a good intermediate step to that direction.

Other than that, I'll leave the review to @dmnks as per GH request, just commenting as I happened to notice this on the PR list 😅

ffesti · 2024-10-11T13:20:10Z

To me https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith reads like the value returned from the ++ and -- operators are indeed obtained atomically:

    operator++() performs atomic pre-increment. Equivalent to return fetch_add(1) + 1;.
    operator++(int) performs atomic post-increment. Equivalent to return fetch_add(1);.
    operator--() performs atomic pre-decrement. Equivalent to return fetch_sub(1) - 1;.
    operator--(int) performs atomic post-decrement. Equivalent to return fetch_sub(1);.

But then there is all this talk about fences and memory_order that makes my head spin. So what do I know?

pmatilai · 2024-10-14T06:02:00Z

But then there is all this talk about fences and memory_order that makes my head spin. So what do I know?

I thought we basically agreed to learn the ropes with the keyring+keys first, before changing anything else. I seem to recall that even being your suggestion 😅

pmatilai · 2024-10-14T06:37:20Z

To me https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith reads like the value returned from the ++ and -- operators are indeed obtained atomically:

Yes, that much is atomic. But the number of actual references can change between the returned information and the comparison, it's a separate instruction already. It's not like I eat this stuff for breakfast either, but AIUI handling this requires the atomic compare-exchange thing. Of the memory models I've no idea either.

ffesti · 2024-10-14T07:29:00Z

Yes, this was an issue in the previous code as it did the comparison first and then decided what to do. This here is different. If nrefs reaches 0 it is no longer allowed to change as no one should have a reference. So if the -- returns zero there is no longer a race going on. So we can do the comparison later and then free the instance in peace.

If the -- does not return 0 it will for some other call to *Free. So if we decide not to free here it is still ensured someone else will. So even if the value of nrefs changes to 0 between the change and the comparison everything works out in the end.

pmatilai · 2024-10-14T07:43:47Z

Oh... right 💡 Reaching zero is indeed special for our purposes here, that part I missed. Would be worth explicitly explaining in the commit message - it kinda touches on it in the "decision to free has been made" but it's not clear.

pmatilai · 2024-10-14T07:45:44Z

This stuff really puts your brain through a twister 😄

pmatilai · 2024-10-14T08:18:19Z

From the commit message

Technically the *Free() functions should *Link() the instance after the
ref count has reached 0 and the decision to free it has been made.
Otherwise the cleanup code linking and freeing to will lead to a
recusions. This is omitted here as it is not needed.

This is actually rather confusing, because a destructor should not be mucking around with reference counts. If the reference count reached zero and we decided that it's time to kill it, then we are the last thing that can be accessing it, no?

If the rpmts stuff really needs these gimmicks then that's a special case needing the explanation indeed, but drop it from elsewhere.

ffesti · 2024-10-14T08:26:37Z

Yes, this is confusing, still it is not wrong. If the destructor code itself e.g. uses an iterator to free up sub components that iterator may add and then free a reference. This leads to a loop that tries to free the item over and over again unless the destructor code holds a proper reference.

From an conceptual POV it is also correct that the destructor basically takes the last reference from the caller and holds it until it is finished with the clean up.

pmatilai · 2024-10-14T08:38:34Z

IFF the destructor does such things - it shouldn't really. A destructor certainly should not be allocating anything new.
That rpm may do so in someplaces doesn't make it right. I do remember rpmts cleanup being super funky though, so no surprises in that requiring special care - been there.

ffesti · 2024-10-14T08:50:27Z

I'd rather say the destrutors of most of our data types are super simple so they don't run into this issue. If you do something (serious) with an instance you need a valid reference - even if you are the destructor.

pmatilai · 2024-10-14T09:26:25Z

No, the point is that a destructor is not allowed to fail, so basically any destructor that needs to do allocations or references is broken. Like the rpmtsFree() path.

All I'm asking here is to drop the claim of linking being the right thing to do in any general case because it's not.

ffesti · 2024-10-14T09:29:57Z

Added a "reaching zero" sentence to the commit messages.

dmnks · 2024-10-14T16:53:15Z

If nrefs reaches 0 it is no longer allowed to change as no one should have a reference.

In other words, thread synchronization for the an object's lifecycle is still supposed to be handled explicitly by the programmer. This PR only addresses the nrefs counter so that it can't be racy. Is that correct?

dmnks · 2024-10-14T16:56:03Z

If that's the case, how does the change from the unlink functions to the inline --nrefs > 0 statements relate to making nrefs atomic? The cleanup functions themselves are still racy.

ffesti · 2024-10-14T17:02:10Z

Yes, this is just about nrefs and does not add thread safety for the content of the data types.

--nrefs decreases the nrefs and returns the new value atomically. As reaching 0 means no one has a reference it "can not" increase again. So checking the value later is fine. Note that we get a copy of the value returned from --nrefs so other changes to the nrefs attibute won't change the return value of the -- operator.

So during the -- operator the decision if made whether we need to free the instance. And we get a copy of that decision. So the *Free() function is not racy.

dmnks · 2024-10-15T07:05:12Z

My point is, if two threads are running e.g. headerFree() while nrefs is 1, then both threads will fail the condition --nrefs > 0, even if the nrefs decrement itself is atomic, and thus proceed to the cleanup.

I guess your point is that you can't have two threads accessing a header object while nrefs is 1 because that's a programming error. That's of course true, but then, isn't that the point of reference counting in the first place that it prevents such programming errors?

It's likely that I just misunderstand the purpose of nrefs altogether here, so take that with a grain of salt...

ffesti · 2024-10-15T07:46:14Z

Yes, this all only works if everyone is using *Link and *Free as pairs.

dmnks · 2024-10-15T07:49:05Z

Yes, this all only works if everyone is using *Link and *Free as pairs.

OK, paired with some background on "why reference counting" (in rpm) from @pmatilai now, somehow this makes a lot more sense.

dmnks · 2024-10-15T08:03:23Z

With that part clarified, the patches LGTM. I've made a tiny code style fixup so I guess I'll squash it and then merge.

dmnks · 2024-10-15T08:04:30Z

Oh, and @pmatilai and @ffesti thanks for the patience (in explaining the refcount stuff to me)! 😄

Throughout the code base reference counting is done in a thread-unsave way. Turn nrefs members into atomic_int so they are thread-save themselves. Make sure to only use ++ and -- operators and avoid non atomic combinations of reading and writing. As the nref can't change again after reaching zero this is thread-save even as we look at the result an instruction later. Drop all the static *Unlink functions. Technically the *Free() functions should *Link() the instance after the ref count has reached 0 and the decision to free it has been made. Otherwise the cleanup code linking and freeing to will lead to a recusions. This is omitted here as it is not needed. This does not make the data structures thread save on its own. But it gives a foundation on which further thread safety work can be based on.

Throughout the code base reference counting is done in a thread-unsave way. Turn nrefs into atomic_int so it is thread-save. Make sure to only use ++ and -- operators and avoid non atomic combinations of reading and writing. As the nref can't change again after reaching zero this is thread-save even as we look at the result an instruction later. Drop all the static *Unlink functions. Technically all the *Free() functions should *Link() the instance after the ref count has reached 0 and the decision to free it has been made. Otherwise the cleanup code linking and freeing to will lead to a recusions. Right now only rpmts needs that. So we only do that here. This does not make the data structures thread save on its own. But it gives a good foundation.

Throughout the code base reference counting is done in a thread-unsave way. Turn nrefs into atomic_int so they are thread-save themselves. As the nref can't change again after reaching zero this is thread-save even as we look at the result an instruction later. FD_t is special in that fdFree return the instance after free iff it still has not reached a ref count of 0. Unfortunately code in rpmShowProgress() relies on that. This does not make the data structures thread save on its own. But it gives a foundation on which data locking can be implemented.

These data structures use the content lock (aka mutex) to also protect the reference counter. This is wrong and complicates freeing interconnected data structures. This lock should only protect the content of the instance. Turn nrefs members into atomic_int so they are thread-save themselves. Make sure to only use ++ and -- operators and avoid non atomic combinations of reading and writing. As the nref can't change again after reaching zero this is thread-save even as we look at the result an instruction later. Make sure the data lock is not aquired for updating the nrefs members. Technically the *Free() functions should *Link() the instance after the ref count has reached 0 and the decision to free it has been made. Otherwise the cleanup code linking and freeing to will lead to a recusions. This is not neccessary here so it is omitted.

pmatilai · 2024-10-15T08:22:36Z

And so, rpm enters the atomic age

ffesti requested a review from a team as a code owner October 11, 2024 06:38

ffesti requested review from dmnks and removed request for a team October 11, 2024 06:38

ffesti mentioned this pull request Oct 11, 2024

Add a rpmKeyring iterator #3365

Closed

ffesti force-pushed the refs branch from 3c811ae to 1fde5cc Compare October 11, 2024 09:31

ffesti force-pushed the refs branch from 1fde5cc to 8863e0d Compare October 11, 2024 19:26

ffesti force-pushed the refs branch from 8863e0d to 81417e1 Compare October 14, 2024 09:29

ffesti added 3 commits October 15, 2024 10:09

dmnks force-pushed the refs branch from a933401 to 87aad15 Compare October 15, 2024 08:12

dmnks approved these changes Oct 15, 2024

View reviewed changes

dmnks merged commit a695776 into rpm-software-management:master Oct 15, 2024
1 check passed

ffesti deleted the refs branch October 15, 2024 08:31

ffesti mentioned this pull request Oct 15, 2024

rpmKeyringAddKey() should add subkeys too #3350

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Turn nrefs into atomic_int #3370

Turn nrefs into atomic_int #3370

ffesti commented Oct 11, 2024

pmatilai commented Oct 11, 2024 •

edited

Loading

pmatilai commented Oct 11, 2024

ffesti commented Oct 11, 2024

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024

ffesti commented Oct 14, 2024 •

edited

Loading

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024 •

edited

Loading

ffesti commented Oct 14, 2024

pmatilai commented Oct 14, 2024 •

edited

Loading

ffesti commented Oct 14, 2024

pmatilai commented Oct 14, 2024

ffesti commented Oct 14, 2024

dmnks commented Oct 14, 2024

dmnks commented Oct 14, 2024 •

edited

Loading

ffesti commented Oct 14, 2024

dmnks commented Oct 15, 2024 •

edited

Loading

ffesti commented Oct 15, 2024

dmnks commented Oct 15, 2024

dmnks commented Oct 15, 2024

dmnks commented Oct 15, 2024

pmatilai commented Oct 15, 2024 •

edited

Loading

Turn nrefs into atomic_int #3370

Turn nrefs into atomic_int #3370

Conversation

ffesti commented Oct 11, 2024

pmatilai commented Oct 11, 2024 • edited Loading

pmatilai commented Oct 11, 2024

ffesti commented Oct 11, 2024

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024

ffesti commented Oct 14, 2024 • edited Loading

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024

pmatilai commented Oct 14, 2024 • edited Loading

ffesti commented Oct 14, 2024

pmatilai commented Oct 14, 2024 • edited Loading

ffesti commented Oct 14, 2024

pmatilai commented Oct 14, 2024

ffesti commented Oct 14, 2024

dmnks commented Oct 14, 2024

dmnks commented Oct 14, 2024 • edited Loading

ffesti commented Oct 14, 2024

dmnks commented Oct 15, 2024 • edited Loading

ffesti commented Oct 15, 2024

dmnks commented Oct 15, 2024

dmnks commented Oct 15, 2024

dmnks commented Oct 15, 2024

pmatilai commented Oct 15, 2024 • edited Loading

pmatilai commented Oct 11, 2024 •

edited

Loading

ffesti commented Oct 14, 2024 •

edited

Loading

pmatilai commented Oct 14, 2024 •

edited

Loading

pmatilai commented Oct 14, 2024 •

edited

Loading

dmnks commented Oct 14, 2024 •

edited

Loading

dmnks commented Oct 15, 2024 •

edited

Loading

pmatilai commented Oct 15, 2024 •

edited

Loading