PhpSpreadsheet generates a faulty Xls file when cells contain emojis #642

Closed
rzds opened this issue Aug 23, 2018 · 6 comments · Fixed by #3696

Comments

@rzds

rzds commented Aug 23, 2018

This is an old story, and it only happens when generating Xls files; it doesn't happen with Xlsx files. The same issue was reported for the deprecated PHPExcel: PHPOffice/PHPExcel#1320
No one said why this is happening.
The reader works fine with emojis.
Isn't this issue fixable?

@chandon

chandon commented Sep 26, 2018

I've got the same issue

@PowerKiKi
Member

I believe Xls cannot support emoji. And even if it did, you really should use Xlsx, which has been around for more than 10 years already.

@xuanskyer

Thanks all, I've got the same question. :)

@Daizygod

Still not supported 💀💀💀

@oleibman
Collaborator

The problem is not restricted to emojis; it affects any text which uses characters not in the Unicode BMP. For Xls, Excel encodes its characters in UTF-16, generally a double-byte character set. But, for characters outside the BMP, UTF-16 uses "surrogates" to encode the character as 2 double-byte characters. Excel stores the number of 2-byte characters (so each non-BMP character counts as 2), but PhpSpreadsheet is storing the number of characters (so each non-BMP character counts as 1). Expect a fix in a day or two.
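
To make the mismatch concrete, here is a minimal Python sketch (not PhpSpreadsheet code; the function names are made up for illustration) that compares the number of Unicode code points in a string with the number of 2-byte UTF-16 code units that Excel expects in the length field:

```python
def count_code_points(text: str) -> int:
    # One per Unicode character, which is what was being stored.
    return len(text)


def count_utf16_units(text: str) -> int:
    # One per 2-byte UTF-16 code unit; a non-BMP character needs a
    # surrogate pair and therefore counts as 2.
    return len(text.encode("utf-16-le")) // 2


sample = "ok\U0001F600"  # two BMP letters plus one non-BMP emoji
print(count_code_points(sample))  # 3
print(count_utf16_units(sample))  # 4
```

For text that stays inside the BMP the two counts agree, which would explain why the problem only shows up with emojis and other non-BMP characters.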

oleibman reopened this Aug 30, 2023
@Daizygod

Can it really be fixed? I've been thinking about switching to XLSWriter. I will wait, thanks.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Aug 30, 2023
Fix PHPOffice#642. Opened over 5 years ago, probably the oldest problem I've worked on. And PHPOffice/PHPExcel#1320, opened a year before that. And SpartnerNL/Laravel-Excel#1521.

Shared/StringHelper::UTF8toBIFF8UnicodeLong calculates incorrect length for strings when they contain characters outside Unicode BMP. Xls uses UTF-16 to encode its strings, and characters outside BMP require a surrogate pair to encode. PhpSpreadsheet (and PhpExcel before it) have been counting these as a single character, but Excel counts them as 2. Change to compute the length as half the number of bytes in the UTF-16 string, as Excel does.

A formal test is added, but it's a bit difficult to follow. So I also added a non-BMP emoji to 27template.xls, which will cause it to be both read by Xls reader and written by Xls writer. This would previously have created a corrupt worksheet. The emoji is now handled correctly.
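
The length calculation described in the commit message can be sketched roughly as follows. This is a Python illustration, not the actual PhpSpreadsheet change: the function name is hypothetical, and the record layout (a 2-byte little-endian length, then a 1-byte flag marking 16-bit characters, then the UTF-16LE data) is an assumption about the BIFF8 long Unicode string format rather than something stated in this thread.

```python
import struct


def biff8_unicode_long(text: str) -> bytes:
    # Hypothetical helper; the layout below is an assumed BIFF8 format.
    data = text.encode("utf-16-le")
    # Length is half the number of UTF-16 bytes, so a surrogate pair
    # counts as 2, matching how Excel counts.
    unit_count = len(data) // 2
    # "<HB": 2-byte little-endian length, then a 1-byte option flag
    # (0x01 is assumed to mean uncompressed 16-bit characters).
    return struct.pack("<HB", unit_count, 0x01) + data


record = biff8_unicode_long("A\U0001F600")
print(record[:2])  # b'\x03\x00', i.e. a length of 3, not 2
```

Storing 2 in the length field here would make a reader stop one code unit early, in the middle of the surrogate pair, which is consistent with the corrupt files reported above. Note that `struct.pack` raises an error if the count exceeds 65535; the sketch does not handle that case.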
PowerKiKi pushed a commit that referenced this issue Aug 31, 2023
Fix #642. Opened over 5 years ago, probably the oldest problem I've worked on. And PHPOffice/PHPExcel#1320, opened a year before that. And SpartnerNL/Laravel-Excel#1521.
