Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convention, deinit doesn't invalidate #6322

Closed
marler8997 opened this issue Sep 11, 2020 · 16 comments
Closed

Convention, deinit doesn't invalidate #6322

marler8997 opened this issue Sep 11, 2020 · 16 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@marler8997
Copy link
Contributor

marler8997 commented Sep 11, 2020

This is a variation of #6299 that @andrewrk suggested be a separate proposal.

Proposal

This issue proposes that by convention, deinit never invalidates memory and always takes self as a value. So instead of this:

pub fn deinit(self: *@This()) void {
    // release resources
    self.* = undefined;
}

we use this:

pub fn deinit(self: @This()) void {
    // release resources
}

Furthermore, in the cases where invalidation is more than just self.* = undefined, it's proposed that a type can define a function named invalidate to contain type-specific invalidation logic, i.e.

pub fn invalidate(self: *@This()) void {
    self.file_open = false;
}

Why?

The reasoning for this convention appeals to the Single Responsibility Principle. Currently in the standard library there is a convention that deinit perform 2 operations:

  1. release resources
  2. invalidate memory
pub fn deinit(self: *@This()) void {
    self.foo.deinit(); // example of releasing resources
    // ...
    self.* = undefined; // invalidate memory
}

However, there are use cases where code may want to release resources, but not invalidate memory. Since deinit always does both, programs may not be able to use the struct in the way that makes sense in their situation. For example, this convention prevents programs calling deinit on const instances. Say we have a Page type that wraps a page address:

const Page = struct {
    ptr: *u8,
    pub fn deinit(self: *@This()) void {
        std.os.unmap(self.ptr);
        self.* = undefined;
    }
};

Because deinit invalidates memory, it must take a mutable reference. This prevents a program from being able to call deinit on a const version of Page:

const page = try Page.init();
defer page.deinit(); // compile error

For nested types, this can also cause the same memory to be invalidated multiple times when a type calls deinit on its members but also invalidate its own memory with self.* = undefined. Foo contains Bar, which contains Baz, which contains Buz, all of which call deinit on all their members and invalidate themselves, this would mean Foo would invalidate Buz 4 times.

You will also find that throughout the standard library, some types choose to invalidate memory inside deinit while other do not. These variations are likely caused by differing programmer sensibilities and different cost/benefit analyses for each case. The issue here is that the type definition itself doesn't have the full picture, only the user will. By establishing that types never force invalidation in deinit, this puts the "responsibility" for deciding whether to invalidate on the party with the all the information, the user. This is the same reasoning that makes Zig's allocators work so well, it puts the responsibility of deciding an allocation strategy on the user, not the functions that only need to allocate memory.

Note that some types in the standard library have already addressed these issues. For example HashMap defines the deallocate function, StreamServer defines the close function, which both release resources without invalidating. This proposal aims to help developers do the "right" thing by default, by allowing the user to make this decision insteading of forcing one on them. By establishing that deinit should always take self as a value type, it keeps invalidation and resource deallocation separate.

What about Invalidation?

In the common case that a user wants to release resources and invalidate memory, a common function can be implemented to do both, i.e.

pub fn destroy(o: var) void {
    o.deinit();
    if (@hasDecl(o, "invalidate")) {
        o.invalidate();
    } else {
        o.* = undefined;
    }
}

Usage:

var foo = Foo.init();
defer std.destroy(&foo);

Some may not like that this function is no longer a member function. If this is the case, we could also define it like this:

pub fn defineDestroy(comptime T: type) type {
    return struct {
        pub fn destroy(self: *T) void {
            self.deinit();
            if (@hasDecl(o, "invalidate")) {
                self.invalidate();
            } else {
                self.* = undefined;
            }
        }
    };
}


pub const Foo = struct {
    pub usingnamespace defineDestroy(@This());
};

Usage:

var foo = Foo.init();
defer foo.destroy();
@Tetralux
Copy link
Contributor

It's worth noting that init acquires those resources, and thus, you might expect deinit to release them.
I'd also consider destroy over deinitAndInvalidate.

@marler8997
Copy link
Contributor Author

@Tetralux I like the term destroy. I'm going to ammend the proposal to use that instead.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 12, 2020
@Vexu Vexu added this to the 0.8.0 milestone Sep 12, 2020
@eriksik2
Copy link

should a const reference to a resource be able to release it?

@marler8997
Copy link
Contributor Author

marler8997 commented Sep 14, 2020

should a const reference to a resource be able to release it?

This proposal suggests that all deinit functions (that release resources) take self by value, which is functionally the same as a const reference. The only real difference between the two is that a const reference forces the compiler to pass it by reference, whereas passing it by value allows the compiler to decide whether it should be passed by value or reference under the hood.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
ibokuri pushed a commit to getty-zig/json that referenced this issue Oct 23, 2021
This follows the convention described here: ziglang/zig#6322.
@andrewrk
Copy link
Member

andrewrk commented Nov 23, 2021

I'm happy with deinit doing invalidation. There can be otherly named methods that do things such as reset or clear.

@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Nov 23, 2021
@marler8997
Copy link
Contributor Author

marler8997 commented Nov 23, 2021

I'm happy with deinit doing invalidation.

There are already types that have a way to free resources without invalidation. For example, HashMap has deallocate which will release resources without invalidating. Why does HashMap support this case but you've decided not to support this for other types? If you're happy without this separation, are you happy with removing it from HashMap?

@andrewrk
Copy link
Member

HashMap has

        pub fn deinit(self: *Self, allocator: *Allocator) void {
            self.deallocate(allocator);
            self.* = undefined;
        }

so it is doing the convention that deinit invalidates. I'm confused what you're trying to communicate.

@marler8997
Copy link
Contributor Author

marler8997 commented Nov 23, 2021

The proposal is to establish a convention that allows you to "cleanup type resources" without invalidating them. HashMap has done this by adding a deallocate function. If everything had a deallocate function like HashMap then this proposal would be satisfied. Are you ok with all the types in std using this interface? If not, then why does HashMap provide a way to do this while others do not?

@andrewrk
Copy link
Member

HashMap does not have a deallocate function. It is private.

Also the deallocate function does invalidate the memory, it just does not additionally have the safety of setting to undefined. If you tried to use the memory after deallocate, it would be just as broken as if the memory had been set to undefined, just slightly harder to debug.

@marler8997
Copy link
Contributor Author

HashMap does not have a deallocate function. It is private.

Oh shit I didn't realize that. Well then at least you're consistent, wrong in my opinion, but consistent :)

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 23, 2021

From an api perspective, a freeing function must do one of two things:

  1. free memory and leave the object in an unusable state (deinit).
  2. free memory and leave the object in a reusable state (clear).

From a safety perspective, a freeing function of the first form should always leave pointers as undefined. There is no use for a freeing function without this safety on a public api, except as an optimization. SRP doesn't really apply.
As an optimization, this isn't a particularly good one. It saves repeated writing of a few words to cached memory when in debug mode, which is pretty cheap. If this is important, libraries can fix it pretty easily:

const MyType = struct {
    x: std.ArrayList(u32),
    y: std.AutoHashMap(u32, u32),
    z: u32,
    pub fn deinit(self: *@This()) void {
        self.x.deinit();
        self.y.deinit();
        self.z = undefined;
    }
};

Additionally, writing undefined communicates additional information that enables optimizations. So removing the assignment may be a pessimization.

The only remaining potential downside of this api is that it requires a pointer to mutable memory. However, this is not really a problem in practice.

  • Data structures which have a meaningful deinit() must be using some sort of runtime init(), so the underlying memory will not be a static constant.
  • Data structures which are intended to be used as constant data (like your Page example) do not need to implement this api this way.
  • In the extreme case, const can be cast away or the value can be copied into mutable memory.

In the specific cases we're looking at, like ArrayList and HashMap, an immutable one is useless because you can't modify the len field. So there's no need for the deinitializer to operate on const data.

@marler8997
Copy link
Contributor Author

Thanks for the very thorough analysis/explanation here @SpexGuy. I concede.

@marler8997
Copy link
Contributor Author

There's one more thing I thought of. As an example, I use "const file" alot. Currently std.fs.File doesn't invalidate (but is supposed to change) so I will have to use "var file" from now on, do you think that could have some downsides in optimization? I've had many similar cases that will no longer be able to use const if I start invalidating in all my of deinit functions. How sure are we that this is the right direction?

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 23, 2021

This depends a lot on what const x = T.init(); actually means for structs. Because of RLS, x is definitely mutable for part of its lifetime. Whether x has immutable provenance is kind of up for debate at this point, especially once the result location becomes accessible through #2765. But even assuming it does, in the case of std.fs.file that really doesn't mean much. That provenance is probably not invalidated often, and if this is really important for a specific optimization you can copy the file into a const temporary, use that, and then deinit the original. So I think the safety wins out here in my mind. I'm not really worried that this is going to significantly inhibit the optimizer in meaningful ways.

That said, I think we can take this on a case by case basis. If std.fs.File is intended to be used as a const, then deinit shouldn't invalidate. Whether a type is meant to be stored in const is part of the api design of that type, so deinit can be adjusted accordingly.

To be clear, the decision here isn't "everything should invalidate", it's just "invalidation should be preferred, but decided on a case by case basis".

@daurnimator
Copy link
Contributor

If std.fs.File is intended to be used as a const, then deinit shouldn't invalidate. Whether a type is meant to be stored in const is part of the api design of that type, so deinit can be adjusted accordingly.

To be clear, the decision here isn't "everything should invalidate", it's just "invalidation should be preferred, but decided on a case by case basis".

The problem I've found is that non-constness proliferates.
--> I'll add a new field to my struct (e.g. a AutoHashMap), and because it's .deinit() is non-const, now my .deinit() needs to be non-const; and users of my struct all need to convert to var.. which potentially has API issues for them.
I think this happens often enough that we should never invalidate.


Off the cuff idea: what if assigning undefined was allowed to const fields? It wouldn't actually write 0xaaaa, but it could still result in e.g. valgrind annotations that the memory area is undefined.

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 23, 2021

The problem I've found is that non-constness proliferates.

Yeah, this phenomenon is sometimes uncharitably called "const poisoning", and it's one reason that extending #224 to pointers is a really bad idea. However, unlike languages like C++ or D where const is transitive through pointers/structures, in Zig it only applies to specific memory regions. So these changes can only proliferate through value inclusions, which limits their scope.

But also, the only real objection I'm seeing here is about backwards compatibility, which is not really a reason yet for std lib code. None of it is stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

7 participants