Git source control: "Stage Selected Ranges" creates a "Staged Changes" file of type UTF-8 when the original file is ANSI (Windows 1252) #111915

Open
jekitf opened this issue Dec 4, 2020 · 17 comments
Labels: bug (issue identified by VS Code Team member as probable bug), git (GIT issues), help wanted (issues identified as good community contribution opportunities)

jekitf commented Dec 4, 2020

Issue Type: Bug

  1. Create a file encoded as ANSI (Windows 1252). In Notepad++: Encoding > ANSI, or Encoding > Character sets > Western European > Windows 1252.
  2. Add the Scandinavian letters ØÆÅ to the file (hex D8 C6 C5).
  3. Add the file to a git repo
    --
  4. Edit the file and add "\r\n-" (hex 0D 0A 2D)
  5. Git source control: "Stage Selected Ranges" for the one line containing "-" (hex 2D)
  6. Commit the file from "Staged Changes"
  7. Delete the file on disk
  8. Git source control: Select the file and "Discard changes" (to restore the file from git)
    --
    The file has the content: C3 98 C3 86 C3 85 0D 0A 2D
    Expected content: D8 C6 C5 0D 0A 2D

The file has been changed from ANSI to UTF-8 in git.
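
The two byte sequences above can be reproduced with a small sketch. It assumes the text is held as a string and is then written out in one of two encodings. The helper names `toHex` and `encodeSingleByte` are hypothetical and not part of VS Code, and the single-byte encoder only handles code points up to U+00FF, where ISO-8859-1 and Windows 1252 agree for Ø, Æ and Å.

```typescript
// Hypothetical helpers for illustration; none of these names come from VS Code.

/** Format bytes as upper-case hex pairs, e.g. "D8 C6 C5". */
function toHex(bytes: Uint8Array): string {
    const parts: string[] = [];
    for (let i = 0; i < bytes.length; i++) {
        const h = bytes[i].toString(16).toUpperCase();
        parts.push(h.length < 2 ? "0" + h : h);
    }
    return parts.join(" ");
}

/**
 * Encode text as one byte per character. Assumption: every character is at
 * most U+00FF. Windows 1252 matches ISO-8859-1 in the range 0xA0-0xFF,
 * which covers the three letters used in the report.
 */
function encodeSingleByte(text: string): Uint8Array {
    const out = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        if (code > 0xff) {
            throw new RangeError("character outside the single-byte range");
        }
        out[i] = code;
    }
    return out;
}

// "ØÆÅ\r\n-" written with escapes so the example does not depend on
// the encoding of this source file.
const sample = "\u00D8\u00C6\u00C5\r\n-";

const asAnsi = toHex(encodeSingleByte(sample));
const asUtf8 = toHex(new TextEncoder().encode(sample));

console.log(asAnsi); // D8 C6 C5 0D 0A 2D
console.log(asUtf8); // C3 98 C3 86 C3 85 0D 0A 2D
```

The second line matches the content found after the restore, which is consistent with the staged text having been encoded as UTF-8 somewhere along the way.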

In C++ projects built with "Use Multi-Byte Character Set", this is a fatal bug.

VS Code version: Code 1.51.1 (e5a624b, 2020-11-10T23:34:32.027Z)
OS version: Windows_NT x64 10.0.19041

System Info
Item             Value
CPUs             Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz (8 x 3602)
GPU Status       2d_canvas: enabled
                 flash_3d: enabled
                 flash_stage3d: enabled
                 flash_stage3d_baseline: enabled
                 gpu_compositing: enabled
                 multiple_raster_threads: enabled_on
                 oop_rasterization: disabled_off
                 opengl: enabled_on
                 protected_video_decode: unavailable_off
                 rasterization: enabled
                 skia_renderer: disabled_off_ok
                 video_decode: enabled
                 vulkan: disabled_off
                 webgl: enabled
                 webgl2: enabled
Load (avg)       undefined
Memory (System)  15.98GB (4.12GB free)
Process Argv     --crash-reporter-id 48361099-29ee-4112-bdbe-983a488188b4
Screen Reader    no
VM               0%

Extensions (9)
Extension           Author (truncated)  Version
xml                 Dot                 2.5.1
gc-excelviewer      Gra                 3.0.40
csharp              ms-                 1.23.6
cpptools            ms-                 1.1.3
hexeditor           ms-                 1.3.0
powershell          ms-                 2020.6.0
vetur               oct                 0.30.3
vscode-zipexplorer  sle                 0.3.1
pdf                 tom                 1.1.0

jekitf commented Dec 4, 2020

Is this the problem?

\extensions\git\src\git.ts

image

The value, if specified in settings, should be used.

image

eamodio commented Jan 6, 2021

@jekitf I think that utf8 is OK, but the line two lines above it might be the issue; I'm not sure. I won't be able to look at this in this iteration at least, so any testing or PR that you can provide would be greatly appreciated.

eamodio added the git, bug, and help wanted labels on Jan 6, 2021
eamodio added this to the Backlog milestone on Jan 6, 2021

MadsAdrian commented Mar 19, 2021

Facing a similar issue. Related to #36219?

Edit:
I suppose it is relevant that the file encoding differs from the OS default. I specify the encoding in settings:

"sas.files.encoding": "windows1252",

@hkcomori

This problem also occurs with Shift-JIS files.

  • VSCode Version 1.54.3
  • Windows 10 Pro x64 Version 1909

Steps to Reproduce:

  1. Set up the user settings as follows:
{
    "files.encoding": "shiftjis"
}
  2. Initialize a git repository
  3. Create a file encoded in Shift-JIS
  4. Open the diff of the created file
  5. Select some lines and run "Stage Selected Ranges"

Dub1shu commented Apr 6, 2021

I'm facing the same problem with Shift-JIS.
Therefore, I tried to verify it.
As eamodio says, this line seems to be the cause of the bug.

child.stdin!.end(data, 'utf8');

Verification Code

\extensions\git\src\git.ts
verification

Verification Process

  1. Create a non-UTF-8 file (Shift-JIS in this case), add some text and commit it.
  2. Set the workspace encoding setting to Shift-JIS.
  3. Add a new line.
  4. In source control view, select the new line and execute "Stage Selected Ranges".

Current Stable
The Shift-JIS file is converted to UTF-8 by "Stage Selected Ranges".
Before

Verification code result
Shift-JIS files are staged as is.
After

This verification code is an ad hoc fix and I'm not sure if it's the right way to fix the bug.
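
To illustrate why that line matters, here is a minimal sketch under stated assumptions. It is not the verification code from the screenshot. A Node.js writable stream encodes a string chunk with the encoding passed to `end`, while a `Buffer` chunk is written byte for byte. The function names are hypothetical, a `PassThrough` stream stands in for `child.stdin`, and `latin1` stands in for a legacy encoding because Node.js has no built-in Shift-JIS encoder (real code would need a conversion library).

```typescript
import { PassThrough } from "node:stream";

// Hypothetical helper: write a string the way the quoted line does and
// return the bytes that reach the other end, as hex.
function viaStringWrite(text: string): string {
    const sink = new PassThrough(); // stands in for child.stdin
    sink.end(text, "utf8"); // same shape as child.stdin!.end(data, 'utf8')
    return (sink.read() as Buffer).toString("hex");
}

// Hypothetical helper: encode explicitly first, then write raw bytes.
// The stream does not re-encode a Buffer chunk.
function viaBufferWrite(text: string, encoding: BufferEncoding): string {
    const sink = new PassThrough();
    sink.end(Buffer.from(text, encoding));
    return (sink.read() as Buffer).toString("hex");
}

// "ØÆÅ" written with escapes so the source file encoding does not matter.
const letters = "\u00D8\u00C6\u00C5";

console.log(viaStringWrite(letters)); // c398c386c385
console.log(viaBufferWrite(letters, "latin1")); // d8c6c5
```

The sketch only shows where the choice of encoding takes effect. Which encoding to pass in is the open question in this thread.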

ankostis commented Dec 3, 2021

Isn't this one supposed to have already been fixed by #84130 (as reported on #55110 --> #36219)?
Maybe that fix is the root cause?

@lszomoru

#84130 addressed the encoding problem that occurred when files are shown in the diff editor. Unfortunately, it did not address the encoding issue in the "Stage Selected Ranges" command. Fixing that command is currently blocked on #824, because extensions do not yet have access to the encoding of a text document.

gdh1995 commented Apr 6, 2022

I also ran into this today. Why does this issue still exist?

ankostis commented Apr 6, 2022

The viciousness of this bug is that the corrupted files manifest themselves very late in git history, e.g. when bisecting old commits and discovering differences in irrelevant but huge ranges of text due to EOLs.
I've been bitten by this bug, and only discovered it roughly a year later!

ams-tschoening commented May 8, 2022

Is working-tree-encoding in .gitattributes a workaround? With that, Git is expected to store UTF-8 internally while converting to a different encoding in the working tree. In that case it might be OK if VSCode forwards UTF-8 at some point.

*			text=auto
*.bat		text eol=crlf	working-tree-encoding=cp850
*.c			text eol=crlf	working-tree-encoding=windows-1252

Kiuchi commented Oct 25, 2022

I tried to create an extension to add a "Stage Selected Range (ANSI)" command for ANSI to work around this problem, but it was not possible because the registerDiffInformationCommand is still in the proposed stage. (#84899)
We hope this will be corrected as soon as possible.

Dub1shu commented Oct 30, 2022

This issue requires an encoding API, but the issue for adding that API (#824) has remained open for 6 years. Also, adding such a core API would be difficult for the average contributor.
Until the API is added, how about getting the encoding from the configuration and staging based on that, instead of the file's actual encoding?
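
A rough sketch of that idea follows, assuming the settings are available as a plain object. The type `Settings` and the function `resolveConfiguredEncoding` are hypothetical and not part of the VS Code extension API. The lookup order (language-specific block first, then the global `files.encoding`, then `utf8`) is an assumption modelled on the settings quoted in this thread.

```typescript
// Hypothetical shapes and names for illustration; not the VS Code API.
type Settings = { [key: string]: unknown };

/**
 * Pick the configured encoding for a language: a language-specific
 * "[languageId]" block wins over the global "files.encoding" value,
 * and "utf8" is used when nothing is configured.
 */
function resolveConfiguredEncoding(settings: Settings, languageId: string): string {
    const scoped = settings["[" + languageId + "]"];
    if (typeof scoped === "object" && scoped !== null) {
        const value = (scoped as Settings)["files.encoding"];
        if (typeof value === "string") {
            return value;
        }
    }
    const fallback = settings["files.encoding"];
    return typeof fallback === "string" ? fallback : "utf8";
}

const exampleSettings: Settings = {
    "files.autoGuessEncoding": false,
    "[vbs]": { "files.encoding": "windows1252" },
};

console.log(resolveConfiguredEncoding(exampleSettings, "vbs")); // windows1252
console.log(resolveConfiguredEncoding(exampleSettings, "typescript")); // utf8
```

The configured value can differ from the encoding a file was actually opened with, for example after "Reopen with Encoding", so this would be an approximation rather than a full fix.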

h8nor commented Jul 28, 2023

It is true that GIT always stores files in UTF-8 encoding on a remote server.
With the setting from the comment posted a year ago, GitHub Desktop works fine, but VS Code doesn't use that setting when comparing commits. The solution was not very obvious.
https://code.visualstudio.com/updates/v1_48#_browser-support
https://learn.microsoft.com/en-us/powershell/scripting/dev-cross-plat/vscode/understanding-file-encoding#configuring-vs-code

@ams-tschoening

"It is true that GIT always stores files in UTF-8 encoding on a remote server."

Don't be misleading: that's not "always" the case, it depends on the settings in .gitattributes. Git itself is fine storing files in an arbitrary encoding as-is, depending on various client settings. Remember that it's able to store binary files as well.

ErikSteiner commented Nov 9, 2023

Is there an update to this issue? I also opened an issue on VSCodium's GitHub: VSCodium/vscodium#1418

I have reproduced and documented the current problem in two examples. The difference between the two use cases is the use of a .gitattributes. The post has become a little longer than planned.

Case 1

  1. mkdir .\test-repo
  2. cd test-repo
  3. git init
  4. git config --local --list
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
  5. Create script.vbs with the content:
# Test file
Msg "Hello"
  6. Commit script.vbs in VSCode with "Stage Changes".
  7. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "Änderungen"
  8. Commit script.vbs in VSCode with "Stage Changes".
  9. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "Überlegung"
Msg "Änderungen"
  10. Commit script.vbs in VSCode with "Stage Changes".
  11. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "ÄÖ first commit"
Msg "Überlegung"
Msg "ÜÖ later commit"
Msg "Änderungen"
  12. In VSCode under Source Control, highlight the row with Msg "ÄÖ first commit" and select "Stage Selected Ranges".
  13. Commit the staged change.
  14. After that:
  • Source Control shows the following state:
    image
  • The explorer shows the following state (just ignore the file name). Note the lines marked in blue:
    image

Case 2

  1. mkdir .\test-repo2
  2. cd test-repo2
  3. git init
  4. git config --local --list
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
  5. Create script.vbs with the content:
# Test file
Msg "Hello"
  6. Create .gitattributes with the content:
*			text=auto
*.vbs		text eol=crlf	working-tree-encoding=windows-1252
  7. Commit .gitattributes in VSCode with "Stage Changes".
  8. Commit script.vbs in VSCode with "Stage Changes".
  9. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "Änderungen"
  10. Commit script.vbs in VSCode with "Stage Changes".
  11. From now on, the explorer shows a change in line three, even though Source Control shows no changes:
    image
  12. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "Überlegung"
Msg "Änderungen"
  13. Commit script.vbs in VSCode with "Stage Changes".
  14. The explorer shows a change in line three, even though Source Control shows no changes. Interestingly, the text shows "1 of 1 change":
    image
  15. Edit script.vbs and add some umlauts:
# Test file
Msg "Hello"
Msg "ÄÖ first commit"
Msg "Überlegung"
Msg "ÜÖ later commit"
Msg "Änderungen"
  16. In VSCode under Source Control, highlight the row with Msg "ÄÖ first commit" and select "Stage Selected Ranges".
  17. Commit the staged change.
  18. After that:
  • Source Control shows the following state:
    image
  • The explorer shows the following state (just ignore the file name). Note the lines marked in blue:
    image

Details

settings.json

"files.autoGuessEncoding": false,
"[vbs]": {
        "files.encoding": "windows1252"
    }

@rasmussehlin

I had this problem today. Running version 1.85.1. Thought I'd make a comment to give this issue a kick, since this issue is three years old. :)

@michaelmesser

Can this bug be prioritized higher? Silently changing files is a significant bug. Many users are likely running into this bug without realizing it. If it breaks something later, they might not find the original cause.
