Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge EnumSymbol and EnumValue into Enum field #2044

Merged
merged 3 commits into from
Sep 15, 2022
Merged

Merge EnumSymbol and EnumValue into Enum field #2044

merged 3 commits into from
Sep 15, 2022

Conversation

lafrech
Copy link
Member

@lafrech lafrech commented Sep 4, 2022

On second thought, this fits more with our fields naming rationale. Generally, the field name represents the type in object world and the serialization type is defined at init (with an argument such as format, etc.).

I'd like to merge this before publishing 3.18. If this needs more thought and we want to ship a bugfix before, we can revert the merge of #2017.

@lafrech lafrech mentioned this pull request Sep 4, 2022
@sloria
Copy link
Member

sloria commented Sep 7, 2022

i agree that Enum fits our field naming convention better. would it make more sense to rename EnumSymbol->Enum and keep EnumValue as is?

i'm lukewarm on merging the two together and using a parameter to determine the internal type, as this makes it more difficult to add correct typings. wdyt?

@deckar01
Copy link
Member

deckar01 commented Sep 8, 2022

The only difference between these two use cases is how the members are accessed from the enum, which seems congruent with the data_key and attribute as constructor arguments. Switching the behavior from symbol to value based on the presence of the field argument seems cryptic. I can understand the desire to make one of the use cases the default to simplify the interface. I am not sure which behavior should be the default.

foo = Enum()
bar = Enum(from_symbol=True)

@lafrech
Copy link
Member Author

lafrech commented Sep 8, 2022

Thanks for the feedback.

The only difference between these two use cases is how the members are accessed from the enum, which seems congruent with the data_key and attribute as constructor arguments.

Yes, exactly, the object is always the same (an Enum member). Hence the will to call the field Enum. This plus the fact that is "looks" simpler (not really a great argument). A non-great argument the other way is that the field name may collide with the Python Enum class but that shouldn't be an issue.

i'm lukewarm on merging the two together and using a parameter to determine the internal type, as this makes it more difficult to add correct typings. wdyt?

In object world it is always an Enum member, so no problem. In serialized world, it is either a string (symbol) or whatever the field serializes to (value). I don't see this as an issue. Splitting the two fields would guarantee the output is a string in the symbol case but leave it open in the value case, IIUC. I see little benefit to the split in terms of typing and I wouldn't trade simplicity for that. But maybe I'm missing something.

Switching the behavior from symbol to value based on the presence of the field argument seems cryptic.

I understand that. OTOH, the parameter seems redundant with the field and accepting both means we'd need to manage all cases.

  • as value + field => OK
  • as symbol + no field => OK
  • as value + no field => error or default to String (I'm fine with String as default, both in terms of usage and implementation)
  • as symbol + field => error or actually try to serialize symbol name (string) with the field, which sounds wrong in most cases, although it might allow twisted stuff like passing a String field with validators to apply between symbol string deserialization and Enum member instantiation (I don't see the point of this).

I can understand the desire to make one of the use cases the default to simplify the interface. I am not sure which behavior should be the default.

From my perspective (usage), default would be symbol, but YMMV. Symbol is generally a snake-case Python variable name which suits an API (some like them camel-case but oh well), while value may be anything. Typically, I'll have

class Enum:
    item_one = "The first item"
    item_two = "The second item"

Although both are strings, I serialize as symbol (code) rather than value (human readable string).

I can add an as_value parameter (defaulting to False) but I'm not sure how to deal with odd field / as_value combinations. as_symbol + field defaulting to String is fine, nice actually. as_symbol + field not String sounds definitely wrong and should probably error.

Or we state explicitly that field is only used if as_value is set. We may even warn or error otherwise.

@deckar01
Copy link
Member

deckar01 commented Sep 8, 2022

Enum allows arbitrary value types like lists and dicts for member values and the types don't have to be homogeneous. The default could be Field.

@lafrech
Copy link
Member Author

lafrech commented Sep 8, 2022

Yeah, I wondered if we should support (i.e. test) Enum(Nested(...)), Enum(List(Nested)) and such, and figured it was nice already to support simple types, but we can add tests with more complicated ones. Still I'd rather support simple types only, maybe documenting the limitation, than drop the whole feature if advanced ones can't be supported.

String is a good default for most cases in practice but won't work for advanced ones. I'm afraid Field will be useless most of the time, shouldn't we make the field argument mandatory when by_value is set, then?

@deckar01
Copy link
Member

deckar01 commented Sep 8, 2022

As far as I can tell the current code handles exotic enums fine. I am suggesting that strict value type validation does not need to be the default, because enum membership validation works regardless of value type. The error message providing the member choices seems like a more useful default than an error message telling me the expected value type. Actually, even if the value field is specialized, it might be more useful show the member choices error message.

@lafrech
Copy link
Member Author

lafrech commented Sep 8, 2022

I just pushed an updated version introducing by_value argument. If True, field defaults to Field. Otherwise an exception is raised if passing a field.

Tests are incomplete (I didn't test the exception and the default value of field) but at least it gives us an idea of what it would look like.

I'm not really fan of the signature, but I'm afraid it is either this or the two classes.

    def __init__(
        self,
        enum: type[EnumType],
        by_value: bool = False,
        field: Field | type | None = None,
        **kwargs,
    ):
# Call using kwargs
field = fields.Enum(HairColorEnum, by_value=True, field=fields.String)

# Call using args. I don't like True arguments in function calls. Not explicit enough.
# (Yet, I didn't prevent it in the signature by making it kwargs only.)
field = fields.Enum(HairColorEnum, True, fields.String)

# Field would be explicit enough to be passed as positional but we can't do this.
# And we don't want to put field first because one may pass only by_value and rely on default Field.
field = fields.Enum(HairColorEnum, by_value=True, fields.String)

@deckar01
Copy link
Member

deckar01 commented Sep 8, 2022

Switching the behavior from symbol to value based on the presence of the field argument seems cryptic.

My main concern was that field did not imply value. What if by_value was keyword only and could be a boolean or a field?

    def __init__(
        self,
        enum: type[EnumType],
        *,
        by_value: bool | Field | type = False,
        **kwargs,
    ):
# Default: by_value=False, field=String
field = fields.Enum(HairColorEnum)

# Convenience: field=Field
field = fields.Enum(HairColorEnum, by_value=True)

# Custom: field=by_value
field = fields.Enum(HairColorEnum, by_value=fields.Int)

@lafrech
Copy link
Member Author

lafrech commented Sep 8, 2022

This is technically better as it makes one less argument and avoids the forbidden case of by_symbol + field.

I'm not a huge fan of arguments that can be multiple types (strings or list, boolean or field) but I still think it is better.

Yet, True is just a shortcut for fields.Field and I while I understand the intent, I believe this convenience will almost never be used because in practice users will want to pass a field, most of the time String or Integer, right? So is it worth keeping the multiple types argument if it (almost) always receives a field anyway?

I also understand the argument about the name conveying the meaning, and by_value expecting only a field (not a boolean) would sound weird, while field implying by value is not obvious.

OK, I just tested and now I get your point. If I set Field instead of String or Integer, it works the same. The purpose of using those fields generally is to validate the input (plus marginal stuff like ensure type), which is useless in fields.Enum because enum.Enum does it just as well if not better (better error message). So your point is that in most cases people may leave the field empty and rely on enum.Enum to do the job. Now I agree.

Note that there may be slight differences. For instance, for enum integer values, Field will let enum.Enum choke on integers passed as strings, while Integer will cast them (it is not strict by default). I don't think that's a problem, though.

OK, lemme try to do that.

@lafrech lafrech force-pushed the rework_enum branch 2 times, most recently from eb1b77d to 8359b1f Compare September 9, 2022 06:32
@lafrech
Copy link
Member Author

lafrech commented Sep 9, 2022

Thanks for the help. I like current implementation.

I modified the serialization of value and choices to always use the String field when serializing as name, for consistency, although this may be useless in practice and therefore not the most efficient. The perf impact should be minimal, though.

Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good to me. just a few small things

src/marshmallow/fields.py Outdated Show resolved Hide resolved
src/marshmallow/fields.py Outdated Show resolved Hide resolved
src/marshmallow/fields.py Outdated Show resolved Hide resolved
src/marshmallow/fields.py Outdated Show resolved Hide resolved
@lafrech
Copy link
Member Author

lafrech commented Sep 11, 2022

Thanks for the review. Good catches.

Fixed!

@lafrech
Copy link
Member Author

lafrech commented Sep 11, 2022

I'll merge, update CHANGELOG and release soonish (this week).

@lafrech
Copy link
Member Author

lafrech commented Sep 15, 2022

Good! I just fixed a typo and merged. Now releasing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants