[RFC] Simplify integers and floats #8111
Comments
A part of this RFC includes an older one #6626, about making integer type depend on the platform. |
I'm perfectly happy with the end result of this change, but I wonder how best to stage it into the language. There doesn't seem to be a way to apply it incrementally; the only way is to have a single release break all existing programs and libraries. I'm fine with that, since there doesn't seem to be an alternative. |
Yeah... it's even hard to develop because In any case I think this can be delayed to the future, after we get parallelism and windows. But it's something I would definitely like to have before 1.0 because it's a big change. |
It's curious that I'm also repeating myself (#6626) but I'm glad what I wrote here is what we ended up concluding there (though I don't know why I said it's impossible to do so). |
I'll happily welcome the change. I grew to really dislike Swift also has distinct Yet, I can't find a language with architecture dependent floats. Swift has |
Good catch! Yeah, I think for float we should have |
If this is going to be a huge breaking change surely it makes sense to get this out the way as soon as possible, not delay it until the language has even more users. First, we could move to free up the Could probably introduce the change behind a flag at the same time as the aliases, so that libraries can test for compliance, but the same code still compiles without the flag. |
I like that idea! Just note that:
So I guess the first thing for me will be to try this out and see how it works. |
One thing to think about: when you want to map an integer to a database you usually want Another problem: the literal |
Those are great counter-examples of having architecture-specific
Literals
If I use something higher than Int32::MAX then I actually expect an Int64, not an Int, and it just happens to work on 64-bit targets. Having a compile time error for 32-bit targets seems appropriate? It means Crystal can't infer 2147483648 as an Int64 or 9223372036854775808 as an Int128, and we'll have to manually type them (oh no), but does it happen much? Maybe some explicitness ain't that bad?
Database
I believe database columns should be explicit, that is either Int32 or Int64, but if integers are usually an |
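To make the literal question concrete, here is a minimal sketch of how the compiler types unsuffixed integer literals today, which is the inference that an architecture-specific default would complicate. The variable names are made up for illustration.

```crystal
# An unsuffixed literal that fits in 32 bits is typed as Int32.
small = 2147483647 # Int32::MAX

# One past Int32::MAX no longer fits, so the literal is typed as Int64.
large = 2147483648

# An explicit suffix removes any dependence on inference.
explicit = 2147483647_i64

puts typeof(small)    # => Int32
puts typeof(large)    # => Int64
puts typeof(explicit) # => Int64
```

With explicit suffixes, the same source would mean the same thing on every target, at the cost of a little extra typing.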
Another point to consider is to separate the notion of base integers and native integers. Currently, there are some operations and overloads that work only with native, but since The current alias to a union for primitives works on overloads but not on definitions in the base class. |
I think that we could make all the std work with That seems kind of bad but if |
How will that work when math shard A uses Int (now fixed at Int32), math shard B uses Int64 and serialized formats (for example protobuf) are a mix of Int8|16|32|64, UInt8|16|32|64? Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I have need to check manually with each conversion? |
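As a rough illustration of where overflow checking happens at a conversion boundary today: the checked conversion methods raise, and the variants ending in `!` wrap. This is only a sketch of current behavior, not of anything proposed in this issue; the value is arbitrary.

```crystal
big = 3_000_000_000_i64 # does not fit in Int32

# Checked conversion: raises OverflowError when the value does not fit.
begin
  big.to_i32
  puts "converted without overflow"
rescue ex : OverflowError
  puts "checked conversion failed: #{ex.message}"
end

# Unchecked conversion: keeps only the low 32 bits, so the value wraps.
wrapped = big.to_i32!
puts wrapped # => -1294967296

# Widening never overflows, so it needs no special handling.
widened = 42_i32.to_i64
puts typeof(widened) # => Int64
```

So the check happens at each explicit conversion call, and the caller chooses between raising and wrapping by choosing the method.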
I think that's also a problem right now with
The answer is |
Looks like I can pass any type of
Output:
|
@didactic-drunk alias Int = Int8 | Int16 | Int32 | Int64 We can rename the alias |
Based on my example doesn't that mean math functions (or most functions) should use A major complaint when working in physics with c++ is They probably should have used a template but that's beyond them. They tend to use the default. If Why?
|
@didactic-drunk Names are exchangeable. I won't go into details about pro and contra of which name. The problem isn't names. It's default behaviour. A union type can't be used as type of an instance variable. But some type must be specified everywhere you need to store integers. Currently, we advocate to use Even your non-programmer algorithm writers need to pick data types for their integers. And it can't always be a union type, no matter whether it's called |
+1, (in my opinion as a novice to Crystal) this would be a good change. I just asked this question how to hack crystal to use Int and Float everywhere and got a link to this issue. Clean, readable, compact code is one of the key features of Ruby. Hard to justify |
+1 for making them the same on all platforms. Less confusion porting (and debugging somebody else's code). If they want to interface with C...maybe create a new type called "NativeInt" or something, that can be used as the parameter? |
I attempted to ask here: https://forum.crystal-lang.org/t/int32-and-float64-why-the-defaults/1797 why are Int32 and float64 the defaults? Curious, since one is "32" and the other "64", thanks :) |
I think it makes sense to have this before 1.0 |
@cyangle This is not going to happen before 1.0. No other major changes are expected before 1.0 |
Really? I think this and #8872 are just as important as overflow checks. It changes everything about numbers in the language... |
The thing is that @waj just showed me a couple of benchmarks. For example this:

require "benchmark"

puts 1
a = Array(Int32).new(50_000_000) { rand(Int32) }
puts 2
b = Array(Int64).new(50_000_000) { rand(Int64) }
sa = 0_i32
sb = 0_i64
Benchmark.ips do |ips|
  ips.report("Int32") { sa = a.reduce(0_i32) { |s, i| s &+ i } }
  ips.report("Int64") { sb = b.reduce(0_i64) { |s, i| s &+ i } }
end
puts sa
puts sb

It's slower for Int64. The reason is that even though math operations take probably the same time, the data that you can put on a cache line or bus is smaller, so there's that performance loss with Int64. What we are considering, though, is adding a |
I'm not sold on that. When 32-bit vs 64-bit performance matters (and 32-bits are big enough to hold the data) you can simply optimize your code by using Int32 explicitly. But that's actually an edge case for heavy math operations. |
Does the Rust way fit Crystal? I think Crystal is closer to Go and Swift: abstract details but give access to low-level when needed. In that benchmark, if Int32's are enough, then you can optimize (cool), though we're talking of 190MB vs 380MB arrays. That's kinda big, and the performance hit ain't so bad (1.28× slower) given that the CPU caches are busted twice as many times. Having a specific |
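The array sizes quoted above follow directly from the element width. A small sketch of the arithmetic, assuming a 64-byte cache line (a common size, but an assumption here and not something stated in the thread):

```crystal
count = 50_000_000_i64 # element count used in the benchmark

int32_bytes = count * sizeof(Int32) # 4 bytes per element
int64_bytes = count * sizeof(Int64) # 8 bytes per element

mib = 1024 * 1024
puts int32_bytes // mib # => 190
puts int64_bytes // mib # => 381

# Assumed cache line size; real hardware may differ.
cache_line = 64
puts cache_line // sizeof(Int32) # => 16 values per line
puts cache_line // sizeof(Int64) # => 8 values per line
```

Half as many values per cache line means roughly twice as many lines fetched for the same element count, which is in keeping with the benchmark being memory-bound and not arithmetic-bound.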
Personally I think discussing new integer types right now is entirely missing the point of 1.0. The original plan was to release 1.0-pre1 as 0.35.0+bugfixes and now we're discussing this? Even #9357 can be implemented after 1.0 by adding a |
I personally wouldn't mind having a default integer type that's We've also been talking about making the However, nothing is set in stone yet, this is what we've been discussing so far. |
That's even worse 😭 |
I think being specific about the type in a statically typed language is a positive. It shouldn't feel redundant, it should feel good because it's explicit. Not against an |
I agree with this proposal, but it may change too much and be difficult to implement. Definition
Principles
Step 1: Modify method signature and document description
Change the return value type of Change the return value type of For floating-point numbers, the same applies. What about At this step, we only modify the method signatures and documentation and unify the coding style, without any breaking change. Therefore, old code compiles normally without being affected, so this step can be completed in 1.x.
Step 2: Unify the coding style related to operating numerical values
Whether it is the standard library or a third-party library, follow the coding style determined in step 1 and gradually unify the code related to numerical operations. This step will take a considerable amount of time, not only because it covers a wide range of code but, more importantly, because it takes time to cultivate user habits. This step is only about better unifying the code related to numerical operations, without any breaking change, so it can be completed in 1.x.
Step 3: Add practical methods
If possible, these two methods can be added under the Top Level namespace:
This is more convenient to use than Add Add There was no breaking change in this step, so it can be completed in 1.x.
Step 4: Simplify overflow handling strategies
Although there are not many cases of overflow, we cannot ignore it. There are two strategies for handling overflow:
I suggest referring to rust-lang strategy here. Deprecate all wrap operations and only retain regular operations. For which overflow handling strategy, specify it in the compilation options:
We can even deprecate methods For very few cases where users need to specify overflow handling strategies in their code, we can refer to the approach in C#. There was no breaking change in this step, so it can be completed in 1.x.
Step 5: Improve non primitive numerical types
There is an undeniable fact here: I personally strongly recommend lifting their inheritance relationships with In this way, In addition, the
NOTE: There are some breaking changes in this step, but these are all related to specific application areas and the impact will not be significant.
Step 6: Change default numerical types
Change the default integer type to platform specific. On the 64-bit platform is The internal implementation of types such as The changes in this step are a bit significant, but considering that it has been quite some time since Step 1 and the user's coding habits have formed, it is safe to make breaking changes at this time.
Step 7: Automatic numerical type conversion
What type of result is obtained when performing operations on different integer types? There are several different strategies:
I think the third strategy is more suitable for 'static typed languages that write like dynamically typed languages'. When users mix different types of integers, it indicates that they do not care about specific numerical types, and we should automatically elevate them to appropriate numerical types. Unsigned and signed operations promoted to signed: For different types of floating-point operations, the same applies.
The signature for division is as follows: Assignment operations for class variables and instance variables, automatic type conversion. The assignment operation of a local variable remains the same as the current implementation, that is, union type. For overflow during automatic type conversion, refer to the overflow handling strategy section. Bit arithmetic is not considered here, and the rules of bit arithmetic need to be considered separately. The changes in this step are significant and have a wide range of impacts, and can only be implemented in 2.0.
Step 8: Clean up deprecated code
This step is done in 2.0, and we can freely clean up deprecated code in 1.x. At this point, all the simplification of the numerical system has been completed. |
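For comparison with the promotion rules proposed in Step 7, this sketch shows what mixed-type arithmetic returns in current Crystal: integer operations keep the type of the left operand, and mixing an integer with a float yields the float type. It illustrates existing behavior only and is not an implementation of the proposal.

```crystal
# Integer + integer: the result takes the type of the left operand.
puts typeof(1_i32 + 1_i64) # => Int32
puts typeof(1_i64 + 1_i32) # => Int64

# Integer + float: the result is a float.
puts typeof(1 + 1.5) # => Float64

# Because the left operand decides the result type, operand order
# can decide whether a large value overflows.
wide = 1_i64 + Int32::MAX
puts wide # => 2147483648
```

Under the third strategy described in the proposal, both orderings would presumably yield the wider type, which is the behavioral difference at stake.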
Thanks @erdian718 for this detailed proposal. I think it might be a bit more extensive than the scope of this individual issue though. Step 3 and 4 may be good ideas but they don't seem directly related to the simplification of integer and float types (either set of changes could be implemented independently). |
Yes, there are currently no What I mainly want to express is that
This will simplify the code in many places, we don't need to always consider: can this code handle non primitive types correctly? Especially for user-defined types. |
Minor note: something like #14393 adds the possibility of a platform-specific |
To be honest the C types for AVR are challenging: the CPU registers are 8-bits, so native integers are also 8-bits, but pointers are 16-bits (for up to 64KB of memory) while some boards have 128KB of flash memory (?!); Anyway, AVR having a very limited program space, only the bare types are interesting, and so far I'm still pondering whether I'd like the default integer to be 16 or 32-bits. |
ISO C mandates |
Please solve this problem in some way because it's scary to use. I'm not kidding it's really scary how many errors there can be. My friend and I are very interested in your language, but these problems with numeric types quickly put us off in the beginning. We'll just keep watching for now. Good luck to you. You have made a very interesting programming language. |
@nerzh Could you elaborate on what you find "scary" about using number types? This hasn't been brought up yet in the discussion, so it's really not clear what you're referring to. |
This came up on the Discord and the gist of it, as I followed it, was:

a = 1
b = 9223372036854775805
pp a + b # => Unhandled exception: Arithmetic overflow (OverflowError)

Whereas in other languages they're used to, this would be a compile time error vs runtime. So more so #8872 than this issue itself I'd say. |
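The overflow in that snippet comes from the result taking the type of the left operand (Int32). A minimal sketch of ways to sidestep it in current Crystal, reusing the same two values:

```crystal
a = 1
b = 9223372036854775805

# Put the wider operand on the left: the result is Int64 and fits.
sum1 = b + a
pp sum1 # => 9223372036854775806

# Or widen the narrow operand explicitly before adding.
sum2 = a.to_i64 + b
pp sum2 # => 9223372036854775806

# The original ordering still raises at runtime, and can be rescued.
begin
  pp a + b
rescue ex : OverflowError
  puts "overflow: #{ex.message}"
end
```

None of this turns the runtime error into a compile-time one; it only shows that the outcome depends on operand order and explicit widening.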
@straight-shoota sorry, I got my issues mixed up, this is in response to #8872 |
Right now Crystal has a variety of integer and float types:
Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64
Float32, Float64
The default integer type when you don't use a suffix is Int32 and the default float type is Float64.
This kind of works but I imagine something better.
Int32 and Float64
Given that Int32 and Float64 are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.
So here's an initial idea: what if we name those types Int and Float? We would of course need to rename the existing base types Int and Float but that's not a problem, we can maybe call them IntBase and FloatBase or Integral and Floating, it doesn't matter much because those names won't be used a lot.
Then talking about ints and floats is so much simpler: just use Int and Float everywhere. In the case where you do need a specific limit, which is rare and usually only useful in low-level code such as interfacing with C or writing binary protocols, you can still use the names Int32, Int64, Float32 or whatever you need.
What to alias to
Now, we could make Int be an alias of Int32 and Float an alias of Float64, but maybe it's better if we make Int depend on the architecture. That means Int would be equivalent to Int64 in 64 bits architectures.
This is also how Go works. They recommend using int everywhere unless you have good reasons to use a specific size. It's probably the case that using Int64 by default instead of Int32 works equally fine (maybe even better because the range is bigger so overflow is less possible) without a real performance degradation.
Another nice thing is that if eventually 128 bit architectures appear all programs will automatically start using this bigger range (if we want to) without needing to change any code.
To alias or not
Now, we could make Int be an alias of the respective underlying type, but I don't think that's a good idea. The reason is that if you have a program that does:
that would compile in 32 bits but would stop compiling in 64 bits. Ideally we'd like our programs to always compile regardless of the architecture.
So, we could make Int and Float be different types. To assign Int32 or Int64 to them you would need to call to_i first. Then programs on 32 and 64 bits will go through that explicit conversion process.
Another benefit is that we could start making collections use Int as the size. This increases their amount a bit but I think it's fine: it's probably not a huge performance/memory penalty (most of the memory is in the actual data). But then their limit becomes the limit of the architecture's memory (well, half of it if we use signed integers, but it's still a lot more than we can do right now). And, like before, this limit will automatically increase when the architectures improve (well, if the Amazon burning doesn't mean our imminent doom 😞).
Friction?
If we need these conversions between Int and all other integer types, and same for Float, wouldn't it make it really hard to write programs, having to convert between integer types all the time?
No, I don't think so. Because Int will be the default type everywhere, except for the few cases I mentioned before (C bindings and binary protocols) there would be no reason to use another integer type.
More benefits
Right now when we parse JSON and YAML we use Int64 because it would be a shame to parse to Int32 because we might lose some precision.
With this change the type would be Int, as everywhere else, and this can be assigned to everything else too if we stick to Int as a default. I know in 32 bits the limit will be smaller, but 32 bits machines are starting to become obsolete (for example I think Mac is dropping support for 32 bit apps).
Breaking change?
This is probably a breaking change, but a good one.
Summary
In summary if we do this change we get:
Int and Float
Int