Add support for rendering CJK glyphs top-to-bottom #3402

Closed
wants to merge 26 commits into from

Contributor

@lucaswoj lucaswoj commented Oct 18, 2016

fixes #1246
see also mapbox/mapbox-gl-native#1682

Requirements

  • GL JS must render CJK glyphs top-to-bottom along vertical lines, as is conventional in cartographic design

[image: vertical-label-rotation]

(this PR does not add support for top-to-bottom point labels or mixed orientation glyphs within a single label)

Specifications

  • we will use naïve "balanced" breaking
  • we will enable / disable top-to-bottom labels based on language detection
  • we will use character ranges for language detection (data)
  • if a single label has mixed CJK/non-CJK glyphs, we will determine the appropriate glyph orientation.
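One possible reading of the naïve "balanced" breaking above, as a minimal sketch only. It assumes a break may fall between any two characters and that the limit is a character count rather than a measured glyph width. The name `balancedBreak` and the `maxPerLine` parameter are hypothetical and do not come from this PR.

```javascript
// Hypothetical helper: split text into the fewest lines that fit within
// maxPerLine characters, then even out the line lengths.
// Assumes maxPerLine >= 1.
function balancedBreak(text, maxPerLine) {
    // Array.from splits by code point, so surrogate pairs stay together.
    const chars = Array.from(text);
    if (chars.length === 0) return [];

    // Fewest lines that can hold the text at the given limit.
    const lineCount = Math.ceil(chars.length / maxPerLine);
    // Spread the characters evenly across that many lines.
    const perLine = Math.ceil(chars.length / lineCount);

    const lines = [];
    for (let i = 0; i < chars.length; i += perLine) {
        lines.push(chars.slice(i, i + perLine).join(''));
    }
    return lines;
}

console.log(balancedBreak('abcdefg', 5)); // two lines: 'abcd' and 'efg'
```

A greedy breaker with the same limit would give 'abcde' and 'fg'; the balanced variant avoids the short trailing line.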

Launch Checklist

  • remove all dead code
  • define all jargon relevant to this project
  • do a code review (@lucaswoj)
  • refactor code to use "top-to-bottom" terminology rather than "vertical" terminology
  • fix glyph centerline alignment for top-to-bottom labels
  • fix bug causing "top-to-bottom" labels to disappear at certain bearings (code)
  • fix failing unit tests
  • support automatically enabling / disabling top-to-bottom labels based on script detection
  • write test-suite tests for top-to-bottom labels
  • document any changes to public APIs
  • post benchmark scores
  • manually rebase onto master (copy-paste style 🍝 )

@lucaswoj lucaswoj changed the title from "Add support for orienting CJK glyphs along north-south lines vertically" to "Add support for rendering CJK glyphs top-to-bottom along north-south lines" Oct 18, 2016
lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset);
lineHeight, horizontalAlign, verticalAlign, justify, spacing, textOffset, oneEm, verticalOrientation);

if (layout['text-rotation-alignment'] === 'map' && layout['symbol-placement'] === 'line') {
Contributor

@1ec5 1ec5 Oct 18, 2016

To avoid verticalizing non-CJK* text, match textFeatures[k] against this regular expression, courtesy of Wiktionary:

&& !/[^ᄀ-ᇿ가-힣ㄱ-ㆎ一-鿌㐀-䶵 -〿𠀀-𬺯!-○ぁ-ゟ゠-ヿㇰ-ㇿꀀ-꓆᠀-ᢪ]/.exec(textFeatures[k])

* For the purpose of this PR, “CJK” is Hangul, Hanzi, Hiragana, Katakana, Mongolian, and Yi scripts. However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm.

Contributor

(You’ll need to make the regular expression a little more lenient to allow numerals and punctuation.)

Contributor

@1ec5 1ec5 Oct 18, 2016

JavaScript can’t handle characters above U+FFFF in character classes – like emoji! 😛 – so the regular expression will need to be a tad more complicated to detect Hanzi from 𠀀 onwards. Specifically, we’ll need to capture anything from \uD840\uDC00 to \uD873\uDEAF, inclusive. (Here’s a very handy tool for calculating surrogate pairs.) My weary Tuesday-evening eyes aren’t helping me come up with the correct regex.
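For reference, a minimal sketch of the surrogate-pair arithmetic and of one regular expression that covers the range named above without relying on the `u` flag. The names `toSurrogatePair` and `astralHanzi` are hypothetical, and this is not necessarily the expression the PR ended up using.

```javascript
// Convert a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
    const offset = codePoint - 0x10000;
    const high = 0xD800 + (offset >> 10);   // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
    return [high, low];
}

// Matches one character in U+20000..U+2CEAF, written as surrogate pairs:
// every low surrogate under the high surrogates D840..D872, then a
// partial low-surrogate range under the final high surrogate D873.
const astralHanzi = /[\uD840-\uD872][\uDC00-\uDFFF]|\uD873[\uDC00-\uDEAF]/;

console.log(toSurrogatePair(0x20000).map((n) => n.toString(16)).join(' ')); // d840 dc00
console.log(toSurrogatePair(0x2CEAF).map((n) => n.toString(16)).join(' ')); // d873 deaf
console.log(astralHanzi.test('\uD840\uDC00')); // true
```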

@1ec5 thanks, will look at this tomorrow.

Contributor Author

@1ec5 Can we use plain ol' inequalities rather than a regex?

Contributor

@1ec5 1ec5 Oct 19, 2016

Sure, whichever is easier to maintain and more performant. It probably doesn’t make a big difference either way.

Character classes can contain \uxxxx character references instead of Unicode literals, if that’s your concern. The surrogate pair issue remains because of JavaScript’s string encoding.
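A minimal sketch of the inequality-based approach, for comparison. The range table is illustrative and deliberately incomplete: it leaves out Hangul, Mongolian, Yi, fullwidth forms, numerals and punctuation, all of which the discussion above says need handling. The names `VERTICAL_SCRIPT_RANGES`, `isVerticalCodePoint` and `allowsVerticalOrientation` are hypothetical.

```javascript
// Inclusive code point ranges. An illustrative subset only.
const VERTICAL_SCRIPT_RANGES = [
    [0x3040, 0x309F],   // Hiragana
    [0x30A0, 0x30FF],   // Katakana
    [0x3400, 0x4DBF],   // CJK Unified Ideographs Extension A
    [0x4E00, 0x9FFF],   // CJK Unified Ideographs
    [0x20000, 0x2CEAF]  // Hanzi above U+FFFF, the range discussed above
];

function isVerticalCodePoint(codePoint) {
    return VERTICAL_SCRIPT_RANGES.some(
        ([lo, hi]) => codePoint >= lo && codePoint <= hi
    );
}

// True only if the text is non-empty and every character is in a listed range.
function allowsVerticalOrientation(text) {
    // for...of iterates by code point, so a surrogate pair arrives as one
    // character and codePointAt(0) returns its full code point.
    for (const char of text) {
        if (!isVerticalCodePoint(char.codePointAt(0))) return false;
    }
    return text.length > 0;
}

console.log(allowsVerticalOrientation('東京'));      // true
console.log(allowsVerticalOrientation('東京Tower')); // false
```

Working on code points rather than UTF-16 code units sidesteps the surrogate-pair problem, at the cost of needing `codePointAt` and string iteration from ES2015.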

Member

Note that since we switched to Buble, we can now use the u RegExp flag, which allows using any Unicode characters in regexps (they get transpiled to \u....).

Contributor

But Babel doesn’t change the fact that JavaScript strings (and therefore regular expressions) must represent characters above U+FFFF as surrogate pairs.
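Both points can be seen side by side in a small sketch, assuming a runtime (or transpiler output) that supports the ES2015 `u` flag and `\u{...}` escapes. The variable names are hypothetical.

```javascript
// With the `u` flag the pattern is matched by code point, so a range that
// extends above U+FFFF can be written directly.
const astralHanziU = /^[\u{20000}-\u{2CEAF}]$/u;

const hanzi = '\u{20000}'; // the character 𠀀

console.log(astralHanziU.test(hanzi)); // true

// The string itself is still stored as a surrogate pair:
console.log(hanzi.length);             // 2 (UTF-16 code units)
console.log(Array.from(hanzi).length); // 1 (code points)
console.log(hanzi.charCodeAt(0).toString(16)); // d840
```

So the flag changes how the pattern is interpreted, not how strings are encoded. Any code that indexes into a label by `length` or `charCodeAt` still sees two units for such a character.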

@1ec5 1ec5 changed the title from "Add support for rendering CJK glyphs top-to-bottom along north-south lines" to "Add support for rendering CJK glyphs top-to-bottom" Oct 18, 2016
0xb7: true, // middle dot
0x200b: true, // zero-width space
0x2010: true, // hyphen
0x2013: true // en dash
Contributor Author

We need to re-add these ☝️ to the breakable lookup

@1ec5
Contributor

1ec5 commented Oct 18, 2016

support automatically enabling / disabling vertical labels based on language detection

FYI, we don’t need language detection, only script detection. (As far as I can tell, there isn’t a major script that was traditionally written vertically for one language but never vertically for another.) So looking at Unicode codepoints is sufficient for this task.

@lucaswoj
Contributor Author

lucaswoj commented Oct 18, 2016

Some notes as I try to grok the symbol placement code in GL JS

Glossary

Classes & Interfaces

Anchor

The geographic coordinate and, optionally, line segment that determines the position of a symbol.

interface Anchor {
    x: number;
    y: number;
    angle: number;
    segment: ?number; // ?
}

Glyph

The bitmap and dimensions of a particular character in a particular font.

interface Glyph {
    id: number;
    bitmap: any; // ?
    width: number;
    height: number;
    left: number;
    top: number;
    advance: number; // glyph-specific right-edge padding
}

CollisionBox

A rectangular area of the map that is covered by a source feature. Each source feature may have multiple CollisionBoxes.

interface CollisionBox {
    anchorPointX: number;
    anchorPointY: number;
    x1: number;
    y1: number;
    x2: number;
    y2: number;
    maxScale: number;
    featureIndex: number;
    sourceLayerIndex: number;
    bucketIndex: number;
    bbox0: number;
    bbox1: number;
    bbox2: number;
    bbox3: number;
    placementScale: number;
}

CollisionFeature

The set of all collision boxes for a source feature.

interface CollisionFeature {
    boxStartIndex: number;
    boxEndIndex: number;
    boxes: any; // ?
}

CollisionTile

The set of all CollisionFeatures for a map tile

interface CollisionTile {
    grid: GridIndex;
    ignoredGrid: GridIndex;
    angle: number;
    pitch: number;
    rotationMatrix: [number, number, number, number];
    reverseRotationMatrix: [number, number, number, number];
    yStretch: number;
    collisionBoxArray: StructArray<CollisionBox>;
    tempCollisionBox: CollisionBox;
    edges: [CollisionBox, CollisionBox, CollisionBox, CollisionBox];
    minScale: number;
}

SymbolQuad in SymbolQuadsArray

interface SymbolQuad {
    anchorPointX: number;
    anchorPointY: number;
    tlX: number;
    tlY: number;
    trX: number;
    trY: number;
    blX: number;
    blY: number;
    brX: number;
    brY: number;
    texH: number;
    texW: number;
    texX: number;
    texY: number;
    anchorAngle: number;
    glyphAngle: number;
    maxScale: number;
    minScale: number;
}

SymbolQuad otherwise

An icon or glyph, its coordinates, and its size for rendering

interface SymbolQuad {
    anchorPoint: Point; // ?
    tl: Point;
    tr: Point;
    bl: Point;
    br: Point;
    tex: Object; // ?
    anchorAngle: number;
    glyphAngle: number;
    minScale: number;
    maxScale: number;
}

Shaping

A collection of positioned glyphs and their position on screen. Contains multiple orientations of each glyph. The best orientation for the current map bearing is chosen at render time.

interface Shaping {
    positionedGlyphs: Array<PositionedGlyph>;
    text: string;
    top: number;
    bottom: number;
    left: number;
    right: number;
}

PositionedGlyph

interface PositionedGlyph {
    codePoint: number;
    x: number;
    y: number;
    glyph: Glyph;
}

PositionedIcon

interface PositionedIcon {
    image: any; // ?
    top: number;
    bottom: number;
    left: number;
    right: number;
}

SymbolBucket

The collision tile, symbol quads, and symbol instances for a map tile

interface SymbolBucket {
    :grimacing:
}

SymbolInstance

interface SymbolInstance {
    textBoxStartIndex: number;
    textBoxEndIndex: number;
    iconBoxStartIndex: number;
    iconBoxEndIndex: number;
    glyphQuadStartIndex: number;
    glyphQuadEndIndex: number;
    iconQuadStartIndex: number;
    iconQuadEndIndex: number;
    anchorPointX: number;
    anchorPointY: number;
    index: number;
}

Misc Terminology

  • box scale: zoom-specific scaling factor used to convert between glyph units and geometry units
  • CJK: acronym for "Chinese, Japanese, and Korean", the three languages that require top-to-bottom orientation
  • leading: distance between the baselines of subsequent lines of text (see also advance)
  • orientation: the direction in which a text is rendered: top-to-bottom (CJK-only) or left-to-right
  • shaped icon: a Shaping containing an icon
  • shaped text: a Shaping containing glyphs
  • text feature: the text string associated with a particular feature

@1ec5
Contributor

1ec5 commented Oct 18, 2016

CJK: acronym for "Chinese, Japanese, and Korean", the three languages that require top-to-bottom orientation

Crossposted from #3402 (comment): For the purpose of this PR, “CJK” is Hangul, Hanzi/Hanja/Kanji, Hiragana, Katakana, Mongolian, and Yi scripts (roughly corresponding to the Chinese, Japanese, Korean, Mongolian, and Yi languages). However, note that Hangul and Mongolian words are delimited by spaces and thus should retain the Latin-style line breaking algorithm. If vertical, space-delimited Hangul and Mongolian is a problem, they can be horizontal for now.

@friedbunny
Contributor

Per chat w/@1ec5, for Japanese it would probably be reasonable to exclude names that include romaji (roman characters) from verticalization. It can be done, but horizontal seems to be generally preferred (especially if a name only contains romaji).

We’ll have to contend with fullwidth variants, as well.

@1ec5
Contributor

1ec5 commented Oct 19, 2016

GL JS must render CJK glyphs top-to-bottom when appropriate

#3402 (comment) addresses horizontal/vertical switching based on scripts.

Even in the context of a Chinese-only map, laying out all labels vertically only makes sense for an archaic-looking style. (Think yellowed background and calligraphic fonts.) A modern-looking style would typically lay out point-placed labels horizontally but fall back to a vertical layout to avoid collision. Meanwhile, line-placed labels would be laid out horizontally or vertically based on the angle of the road, in an attempt to avoid rotating glyphs beyond 45°. (See mapbox/mapbox-gl-native#1682 for further discussion.)

If there’s a need to land this feature before handling those nuances, I recommend placing vertical layout behind a style specification property such as writing-mode: traditional. Per-character line breaking (i.e., word-break: break-all), as described in mapbox/mapbox-gl-native#1223, remains the highest priority for general-purpose Chinese text, beyond vertical text fallback.

@nickidlugash

Even in the context of a Chinese-only map, laying out all labels vertically only makes sense for an archaic-looking style. (Think yellowed background and calligraphic fonts.) A modern-looking style would typically lay out point-placed labels horizontally but fall back to a vertical layout to avoid collision.

We are not implementing vertical labels for point placement.

Meanwhile, line-placed labels would be laid out horizontally or vertically based on the angle of the road, in an attempt to avoid rotating glyphs beyond 45°.

Yes, this is what this PR does.
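To make the 45° rule concrete, here is a minimal sketch of choosing an orientation from the direction of a line segment. It is an illustration of the idea only, not the code in this PR. The function name `chooseOrientation`, its `dx`/`dy` parameters and the returned strings are all hypothetical.

```javascript
// Pick the orientation that keeps glyph rotation at or below 45 degrees.
// dx, dy: direction of the line segment in any consistent unit.
function chooseOrientation(dx, dy) {
    // Angle between the segment and the horizontal axis, in [0, 180] degrees.
    const angle = Math.abs(Math.atan2(dy, dx)) * 180 / Math.PI;
    // Direction of travel does not matter, so fold into [0, 90] degrees.
    const folded = angle > 90 ? 180 - angle : angle;
    // Closer to vertical than horizontal: lay the glyphs out top-to-bottom.
    return folded > 45 ? 'top-to-bottom' : 'left-to-right';
}

console.log(chooseOrientation(1, 0));    // left-to-right (east-west line)
console.log(chooseOrientation(0, 1));    // top-to-bottom (north-south line)
console.log(chooseOrientation(1, 0.5));  // left-to-right (about 27 degrees)
```

In practice this choice would only apply to labels that passed script detection. Everything else stays left-to-right regardless of the line angle.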

continue;
// ^^^ this check is where all the vertical labels are being skipped
}
}*/

@lucaswoj this is a check that I temporarily commented out because it was behaving strangely with vertical labels, but should be kept.

Contributor Author

Thanks for the heads up! I'll restore this in a sec.

@jfirebaugh jfirebaugh mentioned this pull request Oct 19, 2016
77 tasks
@lucaswoj
Contributor Author

Rebased and debugged 👉 #3438

@lucaswoj lucaswoj closed this Oct 21, 2016
@lucaswoj lucaswoj deleted the cjk-vertical-labels-2 branch October 21, 2016 20:32
@lucaswoj lucaswoj restored the cjk-vertical-labels-2 branch October 21, 2016 20:32
@jfirebaugh jfirebaugh deleted the cjk-vertical-labels-2 branch February 3, 2017 18:01
Successfully merging this pull request may close these issues.

Orient CJK glyphs vertically along vertically oriented lines
6 participants