fix: Fix unicode Regex miscounting emoji length (#2942)
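Many emojis are two or more UTF-16 code units long (a surrogate pair), so `String.prototype.length` counts them as 2+. The regex `u` flag, which allows matching Unicode punctuation, treats each code point, including such an emoji, as a single character. Spreading the strings into arrays counts code points instead, which restores a character count consistent with the regex.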
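A minimal sketch of the miscount, assuming an ES2015+ TypeScript target. `lDelimSketch` is a hypothetical, simplified stand-in for a left-delimiter rule (a run of `*` plus the character after it); it is not marked's actual `emStrong` regex. It shows why `match[0].length - 1` over-counts the delimiter run when the next character is an emoji, and how spreading fixes it.

```typescript
// Hypothetical, simplified stand-in for a left-delimiter rule: a run of `*`
// followed by one non-space, non-`*` character. This is NOT marked's real regex.
const lDelimSketch = /^\*+([^\s*])/u;

// "**🤓 test**": U+1F913 is one code point but two UTF-16 code units.
const src = '**\uD83E\uDD13 test**';
const match = lDelimSketch.exec(src);
if (!match) throw new Error('expected a left delimiter match');

// With the `u` flag, the capture group holds the whole emoji (both surrogates).
const capturedWholeEmoji = match[1] === '\uD83E\uDD13';

// Old count: UTF-16 length minus the trailing character; over-counts by one.
const oldLLength = match[0].length - 1; // 4 - 1 = 3

// New count: spread into code points first, so the emoji counts once.
const newLLength = [...match[0]].length - 1; // 3 - 1 = 2, the real number of `*`

console.log(capturedWholeEmoji, oldLLength, newLLength); // true 3 2
```

Note that spreading splits on code points, not grapheme clusters, so an emoji written as a base character plus the variation selector U+FE0F still counts as two. That matches how a `u`-flag regex sees the same string, which is the consistency this fix needs.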
calculuschild authored Aug 15, 2023
1 parent 8fb1711 commit f3af23e
Showing 3 changed files with 36 additions and 3 deletions.
7 changes: 4 additions & 3 deletions src/Tokenizer.ts
@@ -625,7 +625,8 @@ export class _Tokenizer {
const nextChar = match[1] || match[2] || '';

if (!nextChar || !prevChar || this.rules.inline.punctuation.exec(prevChar)) {
-const lLength = match[0].length - 1;
+// unicode Regex counts emoji as 1 char; spread into array for proper count (used multiple times below)
+const lLength = [...match[0]].length - 1;
let rDelim, rLength, delimTotal = lLength, midDelimTotal = 0;

const endReg = match[0][0] === '*' ? this.rules.inline.emStrong.rDelimAst : this.rules.inline.emStrong.rDelimUnd;
@@ -639,7 +640,7 @@ export class _Tokenizer {

if (!rDelim) continue; // skip single * in __abc*abc__

-rLength = rDelim.length;
+rLength = [...rDelim].length;

if (match[3] || match[4]) { // found another Left Delim
delimTotal += rLength;
@@ -658,7 +659,7 @@ export class _Tokenizer {
// Remove extra characters. *a*** -> *a*
rLength = Math.min(rLength, rLength + delimTotal + midDelimTotal);

-const raw = src.slice(0, lLength + match.index + rLength + 1);
+const raw = [...src].slice(0, lLength + match.index + rLength + 1).join('');

// Create `em` if smallest delimiter has odd char count. *a***
if (Math.min(lLength, rLength) % 2) {
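A sketch of why the `raw` slice in the last hunk also works on code points, assuming an ES2015+ target. `sliceCodePoints` is a hypothetical helper that mirrors the `[...src].slice(...).join('')` expression; it is not part of marked. Once `lLength` and `rLength` are code-point counts, slicing by UTF-16 code unit with the same numbers can stop short of the closing delimiter or cut a surrogate pair in half.

```typescript
// Hypothetical helper (not part of marked): slice by code point rather than by
// UTF-16 code unit, mirroring `[...src].slice(0, n).join('')` in the diff.
function sliceCodePoints(text: string, end: number): string {
  return [...text].slice(0, end).join('');
}

// "*🤓*": three code points, four UTF-16 code units.
const input = '*\uD83E\uDD13*';
const end = 3; // a length counted in code points, like the new lLength/rLength

const byCodeUnit = input.slice(0, end);          // "*🤓": closing `*` is lost
const byCodePoint = sliceCodePoints(input, end); // "*🤓*": the full span

// Code-unit slicing can also stop inside a surrogate pair.
const cutPair = input.slice(0, 2); // "*" plus a lone high surrogate

console.log(byCodeUnit === '*\uD83E\uDD13');     // true
console.log(byCodePoint === input);              // true
console.log(cutPair.charCodeAt(1).toString(16)); // d83e
```

Spreading allocates an array covering the whole string on each call, so this pattern favors correctness and simplicity over allocation cost.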
11 changes: 11 additions & 0 deletions test/specs/new/emoji_inline.html
@@ -0,0 +1,11 @@
<p>Situations where it fails:</p>
<p><strong>test πŸ’</strong></p>
<p><strong>πŸ’ test</strong></p>
<p><strong>πŸ€“ test</strong></p>
<p><strong>πŸ–οΈ test</strong></p>
<p><strong>πŸ–οΈπŸ€“πŸ’ test</strong></p>
<p>Situations where it works:</p>
<p>**πŸ’ **</p>
<p><strong>⚠️ test</strong></p>
<p>Here, the emoji rendering works, but the text doesn't get rendered in italic.</p>
<p><em>πŸ’ test</em></p>
21 changes: 21 additions & 0 deletions test/specs/new/emoji_inline.md
@@ -0,0 +1,21 @@
Situations where it fails:

**test πŸ’**

**πŸ’ test**

**πŸ€“ test**

**πŸ–οΈ test**

**πŸ–οΈπŸ€“πŸ’ test**

Situations where it works:

**πŸ’ **

**⚠️ test**

Here, the emoji rendering works, but the text doesn't get rendered in italic.

*πŸ’ test*

1 comment on commit f3af23e


@vercel vercel bot commented on f3af23e Aug 15, 2023


Successfully deployed to the following URLs:

marked-website – ./

markedjs.vercel.app
marked-website-git-master-markedjs.vercel.app
marked-website-markedjs.vercel.app
marked.js.org
