Skip to content

Commit

Permalink
Writer Xls Handle Characters Outside Unicode BMP
Browse files Browse the repository at this point in the history
Fix PHPOffice#642. Opened over 5 years ago, probably the oldest problem I've worked on. And PHPOffice/PHPExcel#1320, opened a year before that. And SpartnerNL/Laravel-Excel#1521.

Shared/StringHelper::UTF8toBIFF8UnicodeLong calculates incorrect length for strings when they contain characters outside Unicode BMP. Xls uses UTF-16 to encode its strings, and characters outside BMP require a surrogate pair to encode. PhpSpreadsheet (and PhpExcel before it) have been counting these as a single character, but Excel counts them as 2. Change to compute the length as half the number of bytes in the UTF-16 string, as Excel does.

A formal test is added, but it's a bit difficult to follow. So I aso added a non-BMP emoji to 27template.xls, which will cause it to be both read by Xls reader and written by Xls writer. This would previously have created a corrupt worksheet. The emoji is now handled correctly.
  • Loading branch information
oleibman committed Aug 30, 2023
1 parent 4971801 commit acf3708
Show file tree
Hide file tree
Showing 4 changed files with 34 additions and 4 deletions.
2 changes: 1 addition & 1 deletion samples/Basic/27_Images_Xls.php
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
require __DIR__ . '/../Header.php';

// Read from Xls (.xls) template
$helper->log('Load Xlsx template file');
$helper->log('Load Xls template file');
$reader = IOFactory::createReader('Xls');
$spreadsheet = $reader->load(__DIR__ . '/../templates/27template.xls');

Expand Down
Binary file modified samples/templates/27template.xls
Binary file not shown.
4 changes: 1 addition & 3 deletions src/PhpSpreadsheet/Shared/StringHelper.php
Original file line number Diff line number Diff line change
Expand Up @@ -412,11 +412,9 @@ public static function UTF8toBIFF8UnicodeShort(string $textValue, array $arrcRun
*/
public static function UTF8toBIFF8UnicodeLong(string $textValue): string
{
// character count
$ln = self::countCharacters($textValue, 'UTF-8');

// characters
$chars = self::convertEncoding($textValue, 'UTF-16LE', 'UTF-8');
$ln = (int) (strlen($chars) / 2); // N.B. - strlen, not mb_strlen issue #642

return pack('vC', $ln, 0x0001) . $chars;
}
Expand Down
32 changes: 32 additions & 0 deletions tests/PhpSpreadsheetTests/Writer/Xls/Issue642Test.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
<?php

namespace PhpOffice\PhpSpreadsheetTests\Writer\Xls;

use PhpOffice\PhpSpreadsheet\Shared\File;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xls;
use PHPUnit\Framework\TestCase;

class Issue642Test extends TestCase
{
public function testCharOutsideBMP(): void
{
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$stringUtf8 = "Hello\u{1f600}goodbye";
self::assertSame(13, mb_strlen($stringUtf8));
$stringUtf16 = (string) iconv('UTF-8', 'UTF-16LE', $stringUtf8);
self::assertSame(28, strlen($stringUtf16)); // each character requires 2 bytes except for non-BMP which requires 4
$sheet->getCell('A1')->setValue($stringUtf8);
$outputFilename = File::temporaryFilename();
$writer = new Xls($spreadsheet);
$writer->save($outputFilename);
$spreadsheet->disconnectWorksheets();
$contents = (string) file_get_contents($outputFilename);
unlink($outputFilename);
$expected = "\x00\x0e\x00\x01" . $stringUtf16; // length is 14 (0e), not 13
self::assertStringContainsString($expected, $contents);
$unexpected = "\x00\x0d\x00\x01" . $stringUtf16; // length is 14 (0e), not 13
self::assertStringNotContainsString($unexpected, $contents);
}
}

0 comments on commit acf3708

Please sign in to comment.