Use Only mb_convert_encoding in StringHelper sanitizeUTF8 (#2994)
* Test if UConverter Exists Without Autoload

Fix #2982. That issue is actually closed, but it did expose a problem. Our test environments all enable php-intl, but that extension isn't a formal requirement for PhpSpreadsheet. Perhaps it ought to be. Nevertheless ...

Using UConverter for string translation solved some problems for us. However, it is only available when php-intl is enabled. The code tests whether the class exists before using it, so no big deal ... except it seems likely that the people reporting the issue not only lack php-intl, but also have their own autoloader, which throws an exception when a class isn't found. By default, the existence test (class_exists) attempts to autoload the class if it isn't already loaded. So, on a system without php-intl but with such a custom autoloader, the test itself fails. The code is changed to suppress autoloading when testing for UConverter.

Pending this fix, the workaround for this issue is to enable php-intl.
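
To make the failure mode above concrete, here is a minimal sketch. The throwing autoloader and the class name `Hypothetical\MissingClass` are invented for illustration and are not taken from the issue report; the only real API relied on is the optional second (`$autoload`) argument of `class_exists`.

```php
<?php

declare(strict_types=1);

// Hypothetical strict autoloader, standing in for the kind of custom
// autoloader described above: it throws when asked for an unknown class.
spl_autoload_register(static function (string $class): void {
    throw new RuntimeException("Cannot autoload {$class}");
});

// Made-up class name that is defined nowhere.
$missing = 'Hypothetical\\MissingClass';

// Default call: autoloading is attempted, so the strict autoloader throws.
$threw = false;
try {
    class_exists($missing);
} catch (RuntimeException $e) {
    $threw = true;
}

// Second argument false: only already-loaded classes are checked, the
// autoloader is never invoked, and the call simply returns false.
$foundWithoutAutoload = class_exists($missing, false);

var_dump($threw);                // bool(true)
var_dump($foundWithoutAutoload); // bool(false)
```

Passing `false` keeps the existence check free of side effects, which is presumably what "suppress autoload" refers to in the first step of this commit.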

* Minor Improvement

Make mb_convert_encoding use the same substitution character as UConverter (U+FFFD), ensuring consistent results whatever the user's environment.
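
The effect of the substitution character can be sketched as follows. This is an illustration under the assumption that mbstring is available; the sample input `"abc\xFFdef"` is made up, and the save/set/restore pattern mirrors the one in the changed method.

```php
<?php

// Requires the mbstring extension.
$previous = mb_substitute_character(); // remember the current setting
mb_substitute_character(0xFFFD);       // U+FFFD REPLACEMENT CHARACTER (65533)

// "\xFF" can never appear in valid UTF-8, so this input is malformed.
$dirty = "abc\xFFdef";
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

mb_substitute_character($previous);    // restore the caller's setting

// The result is valid UTF-8 and contains the replacement character
// rather than the default question mark.
var_dump(mb_check_encoding($clean, 'UTF-8'));
var_dump(strpos($clean, "\u{FFFD}") !== false);
```

Restoring the previous value matters because `mb_substitute_character` changes a process-wide setting; leaving it altered could affect unrelated calls elsewhere in the application.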

* And Now That I Figured That Out

Since mb_convert_encoding can now return the same output as UConverter, we don't need UConverter (or iconv) after all in sanitizeUTF8.
oleibman committed Aug 13, 2022
1 parent d13b07b commit 0492ea6
Showing 1 changed file with 3 additions and 17 deletions.
20 changes: 3 additions & 17 deletions src/PhpSpreadsheet/Shared/StringHelper.php
@@ -3,7 +3,6 @@
 namespace PhpOffice\PhpSpreadsheet\Shared;

 use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
-use UConverter;

 class StringHelper
 {
@@ -334,26 +333,13 @@ public static function controlCharacterPHP2OOXML($textValue)
     public static function sanitizeUTF8(string $textValue): string
     {
         $textValue = str_replace(["\xef\xbf\xbe", "\xef\xbf\xbf"], "\xef\xbf\xbd", $textValue);
-        if (class_exists(UConverter::class)) {
-            $returnValue = UConverter::transcode($textValue, 'UTF-8', 'UTF-8');
-            if ($returnValue !== false) {
-                return $returnValue;
-            }
-        }
-        // @codeCoverageIgnoreStart
-        // I don't think any of the code below should ever be executed.
-        if (self::getIsIconvEnabled()) {
-            $returnValue = @iconv('UTF-8', 'UTF-8', $textValue);
-            if ($returnValue !== false) {
-                return $returnValue;
-            }
-        }
-
+        $subst = mb_substitute_character(); // default is question mark
+        mb_substitute_character(65533); // Unicode substitution character
         // Phpstan does not think this can return false.
         $returnValue = mb_convert_encoding($textValue, 'UTF-8', 'UTF-8');
+        mb_substitute_character($subst);

         return $returnValue;
-        // @codeCoverageIgnoreEnd
     }

     /**
