Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Great Line Ending & Cursor Range Cleanup #362

Closed
cessen opened this issue Jun 23, 2021 · 12 comments
Closed

The Great Line Ending & Cursor Range Cleanup #362

cessen opened this issue Jun 23, 2021 · 12 comments
Assignees
Labels
A-core Area: Helix core improvements E-hard Call for participation: Experience needed to fix: Hard / a lot

Comments

@cessen
Copy link
Contributor

cessen commented Jun 23, 2021

Update:

As discussed in the comments, I'll be switching Helix to gap indexing (what I called the "between-character model" in this RFC), but without any behavior changes to Helix. A minimum 1-width cursor range will still be enforced, for example.

At the same time, I'll be addressing the file-end line ending issue discussed in #309.


In this issue I'd like to propose what I am (overly grandiosely) calling The Great Line Ending & Cursor Range Cleanup.

As I see it, Helix currently has two issues with how it internally thinks about text:

  1. It assumes that documents always end with a line break (discussed in Don't assume all text files end with a line break #309).
  2. It views cursors as sitting on top of characters rather than between characters.

I would like to change both of these things in one big PR, because the former creates special cases that... well, aren't special cases anymore if we also do the latter. But I want to get buy-off on both of these things before proceeding, because they're both changes to the underlying way that Helix thinks about text.

Since I already got buy-off about line endings in #309, I'm going to focus on cursor ranges here.

Proposal

Helix considers a cursor and a contiguous selection range to be the same thing. A cursor is simply a selection range where the start and end of the range are equal. This is a great architecture, and we should keep it. Helix calls this combined concept a Range, and it can be found in selection.rs in helix-core.

Right now, Helix considers the start and end indices of a range to sit on top of characters. So, for example, a range of [1, 1] looks like this (bold highlights the range ends, and square brackets mark the selected text):

S[o]me text.

And a range of [0, 3] looks like this:

[Some] text.

I propose to change the start and end indices to sit between characters instead. A range of [1, 1] would then look like this (square brackets mark the range ends and the selected text):

S[]ome text.

And a range of [0, 3] would look like this:

[Som]e text

(Side note: the on-character model can equivalently be formulated as "end-inclusive", and the between-character model as "end-exclusive". But with text, I think the "on-character" and "between-character" formulation is much more intuitive, and makes it more obvious how each model works, as well as how to code for them.)

Quick justification

The quick justification for changing to the between-character model is that it's strictly more powerful: it can represent everything the on-character model can, plus zero-width ranges. So it gives us more flexibility.

If we want to, we could still make everything in Helix function identically to how it does now by ensuring that ranges are always at least one character wide, and keeping all the command behavior the same with respect to those one-character ranges. But it gives us the flexibility to have zero-width ranges if we decide we want that.

Full justification

Zero-width ranges aren't just abstractly more powerful, they actually let us represent important things more precisely. I'll illustrate this with a few representative examples.

Example 1: text insertion

Consider the following two situations:

  1. The user wants to insert text between two characters.
  2. The user wants to insert text, overwriting a single-character selection.

With the current on-character model, the two situations look like this:

  1. "So[m]e text." → insert 'zz' → "Sozz[m]e text."
  2. "So[m]e text." → insert 'zz' → "Sozz[e] text."

The range itself can't distinguish between these two cases at all, so the distinction has to be at the operation/command level. Whether we want the distinction to be at the operation/command level or not is a reasonable discussion we can have. But the point is that the on-character model traps us into that choice because it cannot distinguish a non-selection vs a single-character selection.

In contrast, the same two situations can be distinguished easily with the between-character model:

  1. "So[]me text." → insert 'zz' → "Sozz[]me text."
  2. "So[m]e text." → insert 'zz' → "Sozz[]e text."

Example 2: the extend-line command

The extend-line command (currently on key x in Helix) expands the current range to select the entire line, unless the entire line is already fully encompassed by the range, in which case it extends the range to select more and more lines on subsequent invocations.

This is a super useful command. For example, deleting a line is just the key combo xd... unless you're on a blank line. If you're on a blank line, then xd will actually delete both the current and next line. The reason why is obvious if we visualize the on-character ranges involved:

First line.\n
[\n]
Third line.\n

x

First line.\n
[\n
Third line.\n]

d

First line.[\n]

The problem is that there is no way to distinguish between the cursor simply being on the line vs selecting the line. This is easily solved with the proposed between-character model:

First line.\n
[]\n
Third line.\n

x

First line.\n
[\n]
Third line.\n

d

First line.\n
[]Third line.\n

This way x always behaves consistently: first selecting the current line, then on subsequent invocations extending the selection to more and more lines. This also makes xd a muscle-memory shortcut for deleting the current line: it will never delete two lines, even if the current line is blank.

Example 3: empty files

When we move away from requiring a file-end line ending (as per #309), there will sometimes be buffers with literally no text in them.

In such cases, the on-character model becomes ill-defined (or we have a weird "one-past-the-end" case) and we'll need to handle it as a special case in various places.

The between-character model, on the other hand, represents this gracefully with a zero-width range. Of course, that doesn't mean there won't be special cases: commands that require a non-zero-width range will still need to check for zero-width. But those checks fall naturally out of the relationship between those commands and the range representation itself.

More generally, that's one of the main differences between the on-character and between-character models: the on-character model has special cases because of ambiguity and things it can't represent, whereas the between-character model has special cases because of command/operation requirements. The latter is a lot easier to reason about.

Other benefits

I think we probably want to support plugins/addons in the future. And the proposed between-character model is generally a simpler programming API, so we probably want to present that model to plugins for range manipulation and text edits.

Having that model consistent between the internal APIs and the plugin APIs will likely make everything simpler and less error prone.

If we want to enforce a minimum range-width on the user-facing side of things, that can easily be a validation step after plugin code runs: just expand all zero-width ranges to single-width. Such a validation pass will likely be needed anyway, for e.g. resolving overlapping ranges, etc.

Drawbacks

Of course, this change won't be all sunshine and rainbows:

How to visualize this in a terminal?

The main challenge with the proposed between-character model is that we can't draw vertical-line cursors in a terminal, we can only draw block-style cursors. (Some terminals do support vertical-line cursors, but only for the terminal's single "real" cursor, which isn't useful in a multi-cursor editor like Helix. And not all terminals support that anyway.)

At the moment, I can think of three ways to tackle this limitation:

  1. Simply draw zero-width ranges and single-character ranges identically. This could potentially be confusing. However, if we design the behavior of commands so that there isn't a behavioral distinction when it doesn't matter, then I think it could work. For example:
    • Commands like replace (r) would operate on next-character (if it exists) when the range is zero-width.
    • Commands like extend-line (x) would take advantage of the zero-width distinction and gain the nicer behavior outlined earlier.
    • Selection mode (v) would automatically expand zero-width ranges to single-width when you enter it, and range manipulation will handle range-end crossing in a way that matches current behavior.
  2. Give zero-width cursors a distinct visual style. Even with this approach, we will likely still want commands like r to operate on next-character in the case of zero-width ranges, because that keeps them useful at all times.
  3. Disallow zero-width cursors except in cases like an empty buffer, and have everything continue to behave identically to how it does now from the user's perspective. This doesn't give us any user-facing improvements, but it at least gives us the other technical and API benefits.

Assuming people generally agree on making the switch to between-character at all, this is definitely the aspect of things that I think needs the most consideration and discussion.

Introducing bugs

An obvious drawback of doing a "cleanup" like this is that it will inevitably (re-)introduce indexing and command-behavior bugs, and for a while we'll be (re-)fixing them.

Personally, I think that's acceptable at this stage in Helix's development. Helix is still (comparatively) in its early days, so if we're going to make this kind of switch, now is the time to do it. And I think such bugs will actually be easier to identify and fix with the between-character model, because the special cases involved are more localized and "natural", rather than arising from ambiguities in the range model itself.

But I definitely understand if people are hesitant about this risk.

Wrapping up

So that's my proposal. To summarize my arguments in favor of between-character ranges:

  • The between-character model is strictly more powerful, and handles corner cases more naturally.
  • It presents a simpler and less ambiguous API both to Helix's internals and (in the future) to plugins.
  • It pushes the tricky questions out of the core APIs and into the "how do we want Helix to behave for users" realm.
  • It gives us the flexibility to try things out while attempting to answer those tricky questions, without impacting core APIs as we do so.
  • Depending on our answers to those tricky questions, it may bring direct benefits to the user as well, in the form of more powerful/flexible features, etc.

Thoughts? Opinions?

@cessen cessen added the C-discussion Category: Discussion or questions that doesn't represent real issues label Jun 23, 2021
@kellytk
Copy link
Contributor

kellytk commented Jun 24, 2021

Potentially related: #136

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

I don't think #136 is strictly related to this, although I certainly would like a decision on this RFC before I tackle #136, just to avoid code churn.

@pickfire
Copy link
Contributor

Commands like replace (r) would operate on next-character (if it exists) when the range is zero-width.

This is confusing.

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

This RFC kind of has two proposals in it:

  1. Switching to a between-character range representation internally.
  2. Possibly altering the behavior of Helix for end users via that representation.

I just want to make clear that the second point is very optional, and has a lot of open questions. And I'm happy to stick to just the first point during the cleanup, keeping the behavior of Helix the same. I do want to discuss the second point as well, however, if people have ideas.

@pickfire pickfire added A-core Area: Helix core improvements E-hard Call for participation: Experience needed to fix: Hard / a lot labels Jun 24, 2021
@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

Commands like replace (r) would operate on next-character (if it exists) when the range is zero-width.

This is confusing.

Ah, sorry. To clarify, what I mean is that if you have this situation:

So[]me text.

And hit r (or a similar command that only makes sense with a selection), then it operates as if it were this situation:

So[m]e text.

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

Re: "How to visualize this in a terminal?"

I had some more thoughts about this. My general feeling now is that (assuming we want to change behavior at all), a mix of option 1 and option 2 might be best:

  • The cursor changes style for zero-width ranges. However, users don't need to know about the distinction between zero-width and non-zero width per se. Instead, we call zero-width ranges "cursors" and non-zero-width ranges "selections". So from the user perspective, the styling difference indicates normal cursors vs a selected area.
  • Other than that, we follow option 1, where most behaviors stay the same, and only things like x where the distinction matters change their behavior.

This way there is a visual change when e.g. first hitting x on an empty line. But also there is a visual change when entering visual/selection mode, as all the ranges are expanded to a min-width of one.

@pickfire
Copy link
Contributor

pickfire commented Jun 24, 2021

Ah, sorry. To clarify, what I mean is that if you have this situation:

So[]me text.

And hit r (or a similar command that only makes sense with a selection), then it operates as if it were this situation:

So[m]e text.

Ah, I don't mean the text is confusing but I mean the behavior is confusing. If So[]me how can m be replaced? It's not even covered by the cursor. It's like you toggle the "Insert" key on your keyboard, confusing.

Then doesn't So[]me, for both a and i insert at the same position?

And [Some], pressing aa will make it [Somae] rather than [Somea]?

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

@pickfire
I don't think it's any more confusing than the behavior of Del or Backspace...? They function properly even without a selection, operating on the character behind/in-front of the cursor in that case. This would just extend that behavior to other operations that would also be a no-op without a selection.

Then doesn't So[]me, for both a and i insert at the same position?

My general proposal for how to handle operations like that is to operate as if the range were a minimum width of one, so that behavior is the same as now from the user perspective, and still visually makes sense in a terminal. So I don't think it would end up being confusing for the user.

@archseer
Copy link
Member

Let's call this approach gap indexing, based on this comment on the LSP spec:

	 * Character offset on a line in a document (zero-based). Assuming that
	 * the line is represented as a string, the `character` value represents
	 * the gap between the `character` and `character + 1`.

(As discussed on Matrix:) I think we should change the Range to use gap indexing, it would simplify some of the code. The transactions already use this approach, which is why a lot of the ranges have to be extended with + 1 to fit.

I think we could also truncate the value inside the transaction so we no longer have to do the max_to clipping.

I'm a bit unsure on the approach to cursors, so I think we should start by enforcing that ranges are always at least 1-width. I don't think there's a good approach to solve this in the terminal (it makes i/a behavior a bit more confusing because it looks like you're over a character but you're positioned right before it, both i and a will insert in the same spot), but it would be interesting in a proper UI because we could render 0-width cursors.

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

Awesome. In that case, I'll put this on my todo list. I'll switch Helix to gap indexing, but keep all current behavior the same and enforce a minimum-1-width range.

Behavioral changes can be a separate discussion, and dealt with later as a separate project if we decide to change anything.

@cessen cessen removed the C-discussion Category: Discussion or questions that doesn't represent real issues label Jun 24, 2021
@cessen cessen self-assigned this Jun 24, 2021
@cessen cessen changed the title RFC: The Great Line Ending & Cursor Range Cleanup The Great Line Ending & Cursor Range Cleanup Jun 24, 2021
@cessen
Copy link
Contributor Author

cessen commented Jun 25, 2021

Started work on this in #376.

@archseer
Copy link
Member

Resolved by #376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements E-hard Call for participation: Experience needed to fix: Hard / a lot
Projects
None yet
Development

No branches or pull requests

4 participants