IUtf8SpanFormattable and IUtf8SpanParsable #81500

Open
tannergooding opened this issue Feb 1, 2023 · 15 comments
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Runtime

@tannergooding
Member

tannergooding commented Feb 1, 2023

Background and Motivation

We currently support UTF-16 based formatting and parsing, and we expose common interfaces through which any developer can declare that their types support the same. However, we have no equivalent support for UTF-8.

With UTF-8 being ever more prevalent for various scenarios, it would be ideal if similar interfaces could be exposed so users can express that their own types support the functionality.

As such, I propose we expose two new interfaces that support parsing/formatting types using UTF-8. These interfaces would only support spans today, as we do not have a corresponding Utf8String type that would make exposing IUtf8Formattable or IUtf8Parsable viable. We could express those in terms of byte[], but that is less ideal and would block us from supporting any future UTF-8 string type.

Proposed API

namespace System;

public interface IUtf8SpanFormattable
{
    bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
}

public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);

    static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}

Initial types that will implement the interface

namespace System
{
    public partial struct Byte : IUtf8SpanFormattable, IUtf8SpanParsable<byte>;
    public partial struct Char : IUtf8SpanFormattable, IUtf8SpanParsable<char>;
    public partial struct Decimal : IUtf8SpanFormattable, IUtf8SpanParsable<decimal>;
    public partial struct Double : IUtf8SpanFormattable, IUtf8SpanParsable<double>;
    public partial struct Half : IUtf8SpanFormattable, IUtf8SpanParsable<Half>;
    public partial struct Int16 : IUtf8SpanFormattable, IUtf8SpanParsable<short>;
    public partial struct Int32 : IUtf8SpanFormattable, IUtf8SpanParsable<int>;
    public partial struct Int64 : IUtf8SpanFormattable, IUtf8SpanParsable<long>;
    public partial struct Int128 : IUtf8SpanFormattable, IUtf8SpanParsable<Int128>;
    public partial struct IntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nint>;
    public partial struct SByte : IUtf8SpanFormattable, IUtf8SpanParsable<sbyte>;
    public partial struct Single : IUtf8SpanFormattable, IUtf8SpanParsable<float>;
    public partial struct UInt16 : IUtf8SpanFormattable, IUtf8SpanParsable<ushort>;
    public partial struct UInt32 : IUtf8SpanFormattable, IUtf8SpanParsable<uint>;
    public partial struct UInt64 : IUtf8SpanFormattable, IUtf8SpanParsable<ulong>;
    public partial struct UInt128 : IUtf8SpanFormattable, IUtf8SpanParsable<UInt128>;
    public partial struct UIntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nuint>;
    
    public partial struct DateOnly : IUtf8SpanFormattable, IUtf8SpanParsable<DateOnly>;
    public partial struct DateTime : IUtf8SpanFormattable, IUtf8SpanParsable<DateTime>;
    public partial struct DateTimeOffset : IUtf8SpanFormattable, IUtf8SpanParsable<DateTimeOffset>;
    public partial struct Guid : IUtf8SpanFormattable, IUtf8SpanParsable<Guid>;
    public partial struct TimeOnly : IUtf8SpanFormattable, IUtf8SpanParsable<TimeOnly>;
    public partial struct TimeSpan : IUtf8SpanFormattable, IUtf8SpanParsable<TimeSpan>;
}

namespace System.Numerics
{
    public partial struct Complex : IUtf8SpanFormattable, IUtf8SpanParsable<Complex>;
    public partial struct BigInteger : IUtf8SpanFormattable, IUtf8SpanParsable<BigInteger>;
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat : IUtf8SpanFormattable, IUtf8SpanParsable<NFloat>;
}

System.Enum, System.Rune, and System.Version all implement ISpanFormattable today. They could optionally implement IUtf8SpanFormattable as well.
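
As a rough illustration of what this enables for the primitive types listed above, the sketch below parses and formats without going through a string. It assumes the shape that was later approved in this thread and shipped in .NET 8 (a `ReadOnlySpan<char>` format and public overloads on `Int32` and `Double`); the variable names are illustrative only.

```csharp
using System;
using System.Globalization;
using System.Text;

// Parse an Int32 directly from UTF-8 bytes (C# 11 u8 literal), no intermediate string.
ReadOnlySpan<byte> input = "12345"u8;
int value = int.Parse(input, CultureInfo.InvariantCulture);

// Format a Double directly into a caller-provided UTF-8 buffer.
double d = 3.5;
Span<byte> buffer = stackalloc byte[32];
bool ok = d.TryFormat(buffer, out int bytesWritten, "F2", CultureInfo.InvariantCulture);

Console.WriteLine(value);
Console.WriteLine(ok ? Encoding.UTF8.GetString(buffer[..bytesWritten]) : "buffer too small");
```

If the destination is too small, `TryFormat` returns false rather than throwing, so callers can retry with a larger buffer.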

We should ideally have System.Numerics.INumberBase<TSelf> implement both IUtf8SpanFormattable and IUtf8SpanParsable<TSelf>. Doing this would require a default interface method (DIM) that defers to the UTF-16 variant.
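
To make "defers to the UTF-16 variant" concrete, here is a minimal sketch of the idea for formatting: produce the UTF-16 text first, then transcode it into the UTF-8 destination. `TryFormatViaUtf16` is a hypothetical helper written for illustration, not the actual DIM, and it allocates a string, which a real implementation would likely avoid.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Globalization;
using System.Text;
using System.Text.Unicode;

// Hypothetical helper (not a BCL API): format as UTF-16, then transcode to UTF-8.
static bool TryFormatViaUtf16<T>(T value, Span<byte> utf8Destination, out int bytesWritten,
    ReadOnlySpan<char> format, IFormatProvider? provider) where T : IFormattable
{
    // Simplest possible deferral to the existing UTF-16 path (allocates a string).
    string utf16 = value.ToString(format.IsEmpty ? null : format.ToString(), provider);

    // Transcode; anything other than Done means the destination was too small
    // or the text could not be converted.
    OperationStatus status = Utf8.FromUtf16(utf16, utf8Destination, out _, out bytesWritten);
    if (status != OperationStatus.Done)
    {
        bytesWritten = 0; // do not report a partial write on failure
        return false;
    }
    return true;
}

Span<byte> buffer = stackalloc byte[16];
bool ok = TryFormatViaUtf16(1234, buffer, out int written, "N0", CultureInfo.InvariantCulture);
Console.WriteLine(ok ? Encoding.UTF8.GetString(buffer[..written]) : "destination too small");
```

The trade-off is that every type gets UTF-8 support for free, at the cost of an extra transcoding pass compared with a dedicated UTF-8 implementation.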

Additional Considerations

It may be desirable to provide some API that lets users know the longest potential formatted output so they can have a "fail safe" way of formatting their value. For many types this is a well-defined upper bound or can be trivially computed.
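
For example, without such an API a caller has to know the bound themselves. The sketch below hard-codes it for Int32 under stated assumptions: default format and invariant culture, where the longest text is "-2147483648" (11 ASCII characters, hence 11 UTF-8 bytes). `MaxInt32Utf8Length` is a name invented here, and the overload used is the one that shipped in .NET 8.

```csharp
using System;
using System.Globalization;
using System.Text;

// Assumption: default format + invariant culture, so the worst case is "-2147483648".
const int MaxInt32Utf8Length = 11;

Span<byte> buffer = stackalloc byte[MaxInt32Utf8Length];
bool ok = int.MinValue.TryFormat(buffer, out int bytesWritten, default, CultureInfo.InvariantCulture);

Console.WriteLine($"{ok} {bytesWritten} {Encoding.UTF8.GetString(buffer[..bytesWritten])}");
```

Custom format strings or cultures with multi-byte digits or separators would change the bound, which is part of why a dedicated API could be useful.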

These APIs operate like ISpanFormattable and ISpanParsable and not like Utf8Formatter or Utf8Parser. That is, they fail if they encounter unrecognized or unsupported data, whereas the latter treat it as effectively "end of data to parse". There are both pros and cons to this approach, but I believe that the latter's functionality is better expressed via a different API, one that could also apply to UTF-16.
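
The difference can be seen with an input that has trailing non-numeric data. This is a small sketch using the existing `System.Buffers.Text.Utf8Parser` and the `Int32` UTF-8 overload as it shipped in .NET 8; the input value is arbitrary.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;

ReadOnlySpan<byte> input = "123abc"u8;

// Utf8Parser: stops at the first byte that is not part of the number and
// reports how much it consumed.
bool prefixOk = Utf8Parser.TryParse(input, out int prefixValue, out int bytesConsumed);

// IUtf8SpanParsable-style parsing: the whole input must be a valid number.
bool strictOk = int.TryParse(input, CultureInfo.InvariantCulture, out int strictValue);

Console.WriteLine($"Utf8Parser: {prefixOk}, value={prefixValue}, consumed={bytesConsumed}");
Console.WriteLine($"int.TryParse: {strictOk}");
```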

This doesn't account for number parsing, which would likely entail extending INumberBase<TSelf> with new UTF-8 APIs as well. If we expose such APIs, we'd also extend INumberBase<TSelf> with the following methods (which would be DIMs and defer to the UTF-16 variants):

static virtual TSelf Parse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider);
static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
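
A DIM of that kind could, in the simplest case, transcode the input and call the existing char-based overload. The sketch below shows that idea as a free-standing generic helper; `TryParseUtf8` is a hypothetical name, and the array allocation is only there to keep the example short.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.Numerics;
using System.Text;

// Hypothetical helper (not a BCL API): transcode UTF-8 to UTF-16, then defer to
// INumberBase<T>.TryParse(ReadOnlySpan<char>, NumberStyles, IFormatProvider?, out T).
static bool TryParseUtf8<T>(ReadOnlySpan<byte> utf8Text, NumberStyles style,
    IFormatProvider? provider, [MaybeNullWhen(false)] out T result)
    where T : INumberBase<T>
{
    // Upper bound on the number of chars the input can decode to.
    char[] utf16 = new char[Encoding.UTF8.GetMaxCharCount(utf8Text.Length)];
    int charCount = Encoding.UTF8.GetChars(utf8Text, utf16);

    // Invalid UTF-8 decodes to U+FFFD, which the UTF-16 parser then rejects.
    return T.TryParse(utf16.AsSpan(0, charCount), style, provider, out result);
}

bool ok = TryParseUtf8<int>("1F"u8, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out int value);
Console.WriteLine($"{ok} {value}");
```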

Should we take ReadOnlySpan<byte> format or string format? There are pros and cons to each approach.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 1, 2023
@ghost

ghost commented Feb 1, 2023

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime and removed area-System.Memory untriaged New issue has not been triaged by the area owner labels Feb 1, 2023
@ghost

ghost commented Feb 1, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@KTSnowy

KTSnowy commented Feb 1, 2023

Hi @tannergooding, would the new Decimal128 type from #81376 be able to support this API?

You mentioned that this API proposal doesn't account for number parsing, so I'm assuming that more work is needed for this to be compatible with the new decimal types, right?

Is there anything I can do to help with this?

@tannergooding
Member Author

It specifically doesn't account for the overloads that take NumberStyles; those would be a separate consideration that we make either as part of this PR review or in a separate one.

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Feb 3, 2023
@tannergooding tannergooding added this to the 8.0.0 milestone Feb 3, 2023
@terrajobst
Member

terrajobst commented Mar 16, 2023

Video

  • We should add utf8 in the parameter names
  • We should use ReadOnlySpan<char> as the format to aid with composition with compiler features around string formatting

namespace System;

public interface IUtf8SpanFormattable
{
    bool TryFormat(Span<byte> utf8Destination, out int bytesWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}

public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> utf8, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> utf8, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}

namespace System.Numerics;

public interface INumberBase<TSelf>
{
    static virtual TSelf Parse(ReadOnlySpan<byte> utf8Text, NumberStyles style, IFormatProvider? provider);
    static virtual bool TryParse(ReadOnlySpan<byte> utf8Text, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
}
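
As a sketch of how a user-defined type might adopt the approved shape, and of why a `ReadOnlySpan<char>` format composes with interpolated strings, consider the example below. `Celsius` is a hypothetical type invented for illustration; it simply forwards to the `Double` UTF-8 overloads as they shipped in .NET 8, and `Utf8.TryWrite` passes the `F1` from the interpolation hole through to `TryFormat`.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text;
using System.Text.Unicode;

Celsius parsed = Celsius.Parse("21.5"u8, CultureInfo.InvariantCulture);

// The interpolated-string handler writes UTF-8 directly into the buffer and
// calls IUtf8SpanFormattable.TryFormat with the "F1" format from the hole.
Span<byte> buffer = stackalloc byte[32];
bool ok = Utf8.TryWrite(buffer, CultureInfo.InvariantCulture, $"t={parsed:F1}", out int bytesWritten);
Console.WriteLine(ok ? Encoding.UTF8.GetString(buffer[..bytesWritten]) : "buffer too small");

// Hypothetical user type; it forwards to the Double implementations.
public readonly struct Celsius : IUtf8SpanFormattable, IUtf8SpanParsable<Celsius>
{
    public Celsius(double degrees) => Degrees = degrees;

    public double Degrees { get; }

    public bool TryFormat(Span<byte> utf8Destination, out int bytesWritten,
        ReadOnlySpan<char> format, IFormatProvider? provider)
        => Degrees.TryFormat(utf8Destination, out bytesWritten, format, provider);

    public static Celsius Parse(ReadOnlySpan<byte> utf8Text, IFormatProvider? provider)
        => new Celsius(double.Parse(utf8Text, provider));

    public static bool TryParse(ReadOnlySpan<byte> utf8Text, IFormatProvider? provider, out Celsius result)
    {
        bool ok = double.TryParse(utf8Text, provider, out double degrees);
        result = ok ? new Celsius(degrees) : default;
        return ok;
    }
}
```

Had the format stayed a `ReadOnlySpan<byte>`, the format text in the interpolation hole would have needed transcoding before each call.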

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 16, 2023
@Sergio0694
Contributor

Is the parameter name here meant to be just "utf8", or should it be "utf8Text" like in INumberBase<TSelf>? 🤔

public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> utf8, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> utf8, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}

@davidfowl
Member

davidfowl commented Mar 17, 2023

Does this make Utf8Formatter and Utf8Parser obsolete?

cc @DamianEdwards

@stephentoub
Member

I expect there will be little need for Utf8Formatter.

Utf8Parser diverged from the standard number parsing behavior. When it encounters something that's not part of the number and stops parsing, it returns what it has so far rather than failing. Analogous to StartsWith rather than Equals. That behavior is sometimes what you want, so it still has use, and I expect at some point we'll want to actually add the char equivalent, though I think it more likely we'd do so via NumberStyles so that it's integrated with generic math... at which point Utf8Parser would also no longer have much value.

@mellinoe
Contributor

Suggestion: include the System.Numerics vector, matrix, and quaternion types as well. From a gamedev perspective it would be very nice to format these types directly to a Utf8 buffer without allocating.

@tannergooding
Member Author

We'll plan on expanding the list of types as necessary. We're just looking at covering the most core types in the first pass.

Please feel free to open API proposals for other types as appropriate.

@stephentoub
Member

stephentoub commented Apr 8, 2023

Implementation progress...

Interfaces:

IUtf8SpanFormattable implementations:

IUtf8SpanParsable implementations:

  • BigInteger : IUtf8SpanParsable
  • Byte : IUtf8SpanParsable
  • Char : IUtf8SpanParsable
  • Complex : IUtf8SpanParsable
  • DateOnly : IUtf8SpanParsable
  • DateTime : IUtf8SpanParsable
  • DateTimeOffset : IUtf8SpanParsable
  • Decimal : IUtf8SpanParsable
  • Double : IUtf8SpanParsable
  • Enum : IUtf8SpanParsable
  • Guid : IUtf8SpanParsable
  • Half : IUtf8SpanParsable
  • Int16 : IUtf8SpanParsable
  • Int32 : IUtf8SpanParsable
  • Int64 : IUtf8SpanParsable
  • Int128 : IUtf8SpanParsable
  • IntPtr : IUtf8SpanParsable
  • IPAddress : IUtf8SpanParsable
  • IPNetwork : IUtf8SpanParsable
  • NFloat : IUtf8SpanParsable
  • Rune : IUtf8SpanParsable
  • SByte : IUtf8SpanParsable
  • Single : IUtf8SpanParsable
  • TimeOnly : IUtf8SpanParsable
  • TimeSpan : IUtf8SpanParsable
  • UInt16 : IUtf8SpanParsable
  • UInt32 : IUtf8SpanParsable
  • UInt64 : IUtf8SpanParsable
  • UInt128 : IUtf8SpanParsable
  • UIntPtr : IUtf8SpanParsable
  • Version : IUtf8SpanParsable

@stephentoub stephentoub self-assigned this Apr 8, 2023
@stephentoub
Member

This issue covers adding UTF-8 support to types that already implement ISpanFormattable. Please open separate issues for other types. #83201 exists for PhysicalAddress.

@bartonjs bartonjs added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Apr 13, 2023
@bartonjs
Member

bartonjs commented Apr 13, 2023

Video

This came up in review to discuss whether the types implementing this interface should implement the methods implicitly (public on the type) or explicitly (requiring the interface cast/coercion).

The answer was "match what we did for ISpanParsable/ISpanFormattable", which seems to be implicit everywhere except System.Char (explicit there).
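
In practice the distinction affects how callers reach the method. The sketch below assumes the behavior described above as it shipped in .NET 8: `Int32` exposes the UTF-8 `TryFormat` publicly, while `Char` only exposes it through the interface, so it is reached here via a generic constraint. `TryFormatUtf8` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: calls through the interface without boxing, so it works
// for both implicit and explicit implementations.
static bool TryFormatUtf8<T>(T value, Span<byte> utf8Destination, out int bytesWritten)
    where T : IUtf8SpanFormattable
    => value.TryFormat(utf8Destination, out bytesWritten, default, CultureInfo.InvariantCulture);

Span<byte> buffer = stackalloc byte[16];

// Implicit implementation: the method is public on Int32 itself.
int number = 42;
bool intOk = number.TryFormat(buffer, out int intBytes);

// Explicit implementation: Char has no public UTF-8 TryFormat, so go through the interface.
char letter = 'A';
bool charOk = TryFormatUtf8(letter, buffer, out int charBytes);

Console.WriteLine($"{intOk} {intBytes} {charOk} {charBytes}");
```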

@tannergooding
Member Author

We landed IUtf8SpanFormattable and IUtf8SpanParsable, and implemented both on the primitive numeric types.

Some types didn't get support for both interfaces; we'll hopefully land those early in .NET 9.

@tannergooding
Member Author

Pending work here

IUtf8SpanFormattable implementations:

  • BigInteger : IUtf8SpanFormattable, indirectly supported in generic contexts via INumberBase DIM
  • Enum : IUtf8SpanFormattable

IUtf8SpanParsable implementations:
