Punctuation regexp is incomplete #108
PS: that said, the example above is a fragment from Chinese text here:
I don't know the language, but they don't seem to use whitespace at all. Maybe the CommonMark rules don't quite work with Chinese. Still, commonmark.js behaves differently from what is written in the spec, so I opened a bug report here. |
Thanks! |
@jgm i'd recommend to |
markdown-it does not use commonmark.js.
cinty8b <notifications@github.com> writes:
… It seems the problem is still there.
![snipaste_2018-09-29_00-45-15](https://user-images.githubusercontent.com/5980459/46221831-1bd5e400-c381-11e8-97da-0a629f66df38.jpg)
|
That's correct, but it uses the existing spec for its tests. The current sample from the commit passes without issues. @cinty8b, I'd recommend making your reports more detailed. Screenshots are very inconvenient and almost useless:
Maybe that's a new issue. @jgm IMHO the existing fix with a hardcoded regexp is neither human-readable nor maintainable: nobody knows the data source or how up to date it is. If you'd rather not use external packages, it's worth adding comments with a link to the original and the Unicode version number. |
Sorry for the screenshot. I tried https://spec.commonmark.org/dingus/, and it works the same as markdown-it. |
This behavior accords with the spec. (So, it is
"expected" in that sense.)
brown*fox.*jumps
Here the second `*` delimiter is not "right-flanking."
Notice that this gives you emphasis:
brown*fox.*.jumps
because now the second `*` is both right- and left-
flanking. If you really need the `fox.` to be
emphasized here, you could try inserting a zero-width
space:
the brown*fox.*​jumps
If you think this is a flaw in the spec, you could
bring it up on talk.commonmark.org. But be aware that
there are always tradeoffs; the question is whether
there's an improvement that could be made without
messing up other things we currently get right.
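
The flanking test described above can be sketched in a few lines. This is only an illustrative reading of the spec's left-/right-flanking definitions, not code from commonmark.js or markdown-it: the names `flanking`, `PUNCT` and `SPACE` are hypothetical, and treating "punctuation" as ASCII punctuation plus Unicode general category P is an assumption (it needs a runtime with `\p{...}` support).

```javascript
// Assumption: punctuation = ASCII punctuation + Unicode general category P.
const PUNCT = /^[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E\p{P}]$/u;
// Unicode whitespace as used for flanking: category Zs, tab, LF, FF, CR.
const SPACE = /^[\p{Zs}\t\n\f\r]$/u;

// Classify the delimiter run that contains position `index`.
// `index` counts code points, which equals the string index for BMP-only text.
function flanking(text, index) {
  const chars = Array.from(text);
  const delim = chars[index];

  // Expand to the whole run of identical delimiter characters.
  let start = index;
  let end = index;
  while (start > 0 && chars[start - 1] === delim) start--;
  while (end < chars.length - 1 && chars[end + 1] === delim) end++;

  // The beginning and end of the line count as whitespace.
  const before = start > 0 ? chars[start - 1] : '\n';
  const after = end < chars.length - 1 ? chars[end + 1] : '\n';

  const beforeSpace = SPACE.test(before);
  const afterSpace = SPACE.test(after);
  const beforePunct = PUNCT.test(before);
  const afterPunct = PUNCT.test(after);

  return {
    left: !afterSpace && (!afterPunct || beforeSpace || beforePunct),
    right: !beforeSpace && (!beforePunct || afterSpace || afterPunct),
  };
}

// Second "*" (index 10) in each of the first two examples:
console.log(flanking('brown*fox.*jumps', 10));  // left: true, right: false
console.log(flanking('brown*fox.*.jumps', 10)); // left: true, right: true
```

Under these assumptions the second `*` in the first example can only open emphasis, never close it, which is why `fox.` is not emphasized there.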
|
@jgm @puzrin Thanks for your explanation and the possible workaround. In English there are always spaces and punctuation to separate words, so it is rare that part of a continuous string has to be emphasized or italicized together with its leading or trailing punctuation, like But it's different in Chinese. We seldom use spaces in sentences. Punctuation does almost all the separating work in a paragraph. As a result, it's common in Chinese that I want to emphasize a sentence together with its period ( I hope I've made it clear. |
@cinty8b AFAIK, there are some known spec issues with Asian languages (no spaces) that have no good resolution. Most likely this one has already been discussed on the CommonMark forum. Try posting there. It is probably worth raising the topic again. |
Here are some relevant links:
https://talk.commonmark.org/t/emphasis-and-east-asian-text/2491
commonmark/cmark#208 (comment)
|
This is because the definitions of left- and right-flanking delimiter runs introduced in CM 0.14+ were designed under the erroneous assumption that all languages (including Chinese and Japanese!) put spaces around punctuation marks. 当社の**[製品A](https://example.com/product-a)**をぜひお試しください! (roughly: "Please do try our **Product A**!") If the spec were revised along the lines of commonmark/commonmark-spec#650 (comment), most cases would be improved. |
I came across a discrepancy between cmark and commonmark.js output:
So, according to spec v26,
The character "。" (U+3002) belongs to the class
Punctuation, Other [Po]
(see http://www.fileformat.info/info/unicode/char/3002/index.htm), but it's not included here: https://github.com/jgm/commonmark.js/blob/3587c91c62128e54a236648ff1ac4a1ad1cd5ad8/lib/inlines.js#L41
For reference, here's the regexp from the unicode-8.0.0 package (we're using that in markdown-it), which includes this character (and appears to be a lot larger):
https://github.com/mathiasbynens/unicode-8.0.0/blob/master/General_Category/Punctuation/regex.js
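
For anyone who wants to check a character's class without reading through a generated regexp, here is a minimal sketch. It assumes a runtime with ES2018 Unicode property escapes (for example a recent Node.js), which is not what either library in this thread relies on; the helper names `isUnicodePunctuation` and `isOtherPunctuation` are made up for illustration.

```javascript
// Assumption: the engine supports \p{...} with the "u" flag (ES2018+).
// Matches any character in Unicode general category P (Pc, Pd, Pe, Pf, Pi, Po, Ps).
const isUnicodePunctuation = (ch) => /^\p{P}$/u.test(ch);
// Matches only the "Punctuation, Other" subcategory.
const isOtherPunctuation = (ch) => /^\p{Po}$/u.test(ch);

const fullStop = '\u3002'; // "。" IDEOGRAPHIC FULL STOP

console.log(isOtherPunctuation(fullStop));   // true
console.log(isUnicodePunctuation(fullStop)); // true
console.log(isUnicodePunctuation('a'));      // false
```

One tradeoff to keep in mind: with property escapes the data comes from whatever Unicode version the engine ships, whereas a package such as unicode-8.0.0 pins the version explicitly.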