forked from PHPOffice/PhpSpreadsheet
Commit
Writer Xls Handle Characters Outside Unicode BMP
Fix PHPOffice#642. Opened over 5 years ago, it is probably the oldest problem I've worked on. Also fixes PHPOffice/PHPExcel#1320, opened a year before that, and SpartnerNL/Laravel-Excel#1521.

Shared/StringHelper::UTF8toBIFF8UnicodeLong calculates an incorrect length for strings that contain characters outside the Unicode BMP. Xls uses UTF-16 to encode its strings, and characters outside the BMP require a surrogate pair (two 16-bit code units) to encode. PhpSpreadsheet (and PHPExcel before it) has been counting each of these as a single character, but Excel counts them as 2. Change to compute the length as half the number of bytes in the UTF-16 string, as Excel does.

A formal test is added, but it's a bit difficult to follow. So I also added a non-BMP emoji to 27template.xls, which causes it to be both read by the Xls reader and written by the Xls writer. This would previously have created a corrupt worksheet. The emoji is now handled correctly.
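The difference between the two counts can be seen with a minimal sketch. The helper name `utf16CodeUnitCount` is hypothetical and is not the code in `StringHelper`; it only illustrates the "half the number of UTF-16 bytes" rule described above, using the same sample string as the test.

```php
<?php

// Hypothetical helper (not the library's implementation): returns the
// number of UTF-16 code units needed to encode a UTF-8 string.
function utf16CodeUnitCount(string $utf8): int
{
    // UTF-16LE output carries no byte-order mark, so every 2 bytes is one code unit.
    $utf16 = (string) iconv('UTF-8', 'UTF-16LE', $utf8);

    return intdiv(strlen($utf16), 2);
}

$text = "Hello\u{1f600}goodbye";

// Code points: the emoji counts once.
echo mb_strlen($text, 'UTF-8'), PHP_EOL;   // 13

// UTF-16 code units: the emoji is a surrogate pair and counts twice.
echo utf16CodeUnitCount($text), PHP_EOL;   // 14
```

For strings made only of BMP characters the two counts agree, which is presumably why the problem only showed up with emoji and other supplementary-plane characters.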
Showing 4 changed files with 34 additions and 4 deletions.
Binary file not shown.
@@ -0,0 +1,32 @@
<?php

namespace PhpOffice\PhpSpreadsheetTests\Writer\Xls;

use PhpOffice\PhpSpreadsheet\Shared\File;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xls;
use PHPUnit\Framework\TestCase;

class Issue642Test extends TestCase
{
    public function testCharOutsideBMP(): void
    {
        $spreadsheet = new Spreadsheet();
        $sheet = $spreadsheet->getActiveSheet();
        $stringUtf8 = "Hello\u{1f600}goodbye";
        self::assertSame(13, mb_strlen($stringUtf8)); // 13 code points
        $stringUtf16 = (string) iconv('UTF-8', 'UTF-16LE', $stringUtf8);
        self::assertSame(28, strlen($stringUtf16)); // each character requires 2 bytes except for non-BMP which requires 4
        $sheet->getCell('A1')->setValue($stringUtf8);
        $outputFilename = File::temporaryFilename();
        $writer = new Xls($spreadsheet);
        $writer->save($outputFilename);
        $spreadsheet->disconnectWorksheets();
        $contents = (string) file_get_contents($outputFilename);
        unlink($outputFilename);
        $expected = "\x00\x0e\x00\x01" . $stringUtf16; // length is 14 (0e), not 13
        self::assertStringContainsString($expected, $contents);
        $unexpected = "\x00\x0d\x00\x01" . $stringUtf16; // the old, incorrect length 13 (0d) must not appear
        self::assertStringNotContainsString($unexpected, $contents);
    }
}
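The byte pattern the test searches for may be easier to follow with a small sketch. It assumes that the string is stored as a 2-byte little-endian length, then a 1-byte flag (`0x01`, 16-bit characters), then the UTF-16LE text, which is what the `\x0e\x00\x01` in the expected value suggests. The function name `biff8LongStringSketch` is hypothetical, and the leading `\x00` in the test's pattern is assumed to be the last byte of whatever field precedes the string in the file.

```php
<?php

// Hypothetical illustration only: builds length + flag + UTF-16LE text
// in the layout the test's expected bytes imply.
function biff8LongStringSketch(string $utf8): string
{
    $utf16 = (string) iconv('UTF-8', 'UTF-16LE', $utf8);
    $codeUnits = intdiv(strlen($utf16), 2);

    // 'v' = unsigned 16-bit little-endian, 'C' = unsigned 8-bit.
    return pack('vC', $codeUnits, 0x01) . $utf16;
}

$record = biff8LongStringSketch("Hello\u{1f600}goodbye");

// First three bytes: length 14 (0e 00) followed by the flag byte (01).
echo bin2hex(substr($record, 0, 3)), PHP_EOL; // 0e0001
```

With the old code-point count the first byte would have been `0d`, which is the value the test asserts is absent from the written file.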