Unicode.Char.General
General character property related functions.
Types of Code Points
data CodePointType Source #
Types of Code Points.
These classes are defined in the section 2.4 “Code Points and Characters” of the Unicode standard.
Since: 0.4.1
Constructors
GraphicType | Graphic: defined by the following general categories:
|
FormatType | Format: invisible but affects neighboring characters. Defined by the following general categories:
|
ControlType | Control: usage defined by protocols or standards outside the Unicode Standard. Defined by the general category |
PrivateUseType | Private-use: usage defined by private agreement outside the Unicode Standard. Defined by the general category |
SurrogateType | Surrogate: Permanently reserved for UTF-16. Defined by the general category |
NoncharacterType | Noncharacter: a code point that is permanently reserved for
internal use (see definition D14 in the section
3.4 “Characters and Encoding”
of the Unicode Standard).
Noncharacters consist of the values They are a subset of the general category |
ReservedType | Reserved: any code point of the Unicode Standard that is reserved for future assignment (see definition D15 in the section 3.4 “Characters and Encoding” of the Unicode Standard). Also known as an unassigned code point. They are a subset of the general category |
Instances
codePointType :: Char -> CodePointType Source #
Returns the CodePointType
of a character.
Since: 0.6.0
Unicode general categories
data GeneralCategory Source #
Unicode General Categories.
These classes are defined in the Unicode Character Database, part of the Unicode standard.
Note: the classes must be in the same order they are listed in the Unicode Standard,
because some functions (e.g. generalCategory
) rely on the Enum
instance.
Since: 0.3.0
Constructors
UppercaseLetter |
|
LowercaseLetter |
|
TitlecaseLetter |
|
ModifierLetter |
|
OtherLetter |
|
NonSpacingMark |
|
SpacingCombiningMark |
|
EnclosingMark |
|
DecimalNumber |
|
LetterNumber |
|
OtherNumber |
|
ConnectorPunctuation |
|
DashPunctuation |
|
OpenPunctuation |
|
ClosePunctuation |
|
InitialQuote |
|
FinalQuote |
|
OtherPunctuation |
|
MathSymbol |
|
CurrencySymbol |
|
ModifierSymbol |
|
OtherSymbol |
|
Space |
|
LineSeparator |
|
ParagraphSeparator |
|
Control |
|
Format |
|
Surrogate |
|
PrivateUse |
|
NotAssigned |
|
Instances
generalCategoryAbbr :: GeneralCategory -> String Source #
Abbreviation of GeneralCategory
used in the Unicode standard.
Since: 0.3.0
generalCategory :: Char -> GeneralCategory Source #
The Unicode general category of the character.
This property is defined in the column 2 of the UnicodeData
table.
This relies on the Enum
instance of GeneralCategory
, which must remain in the
same order as the categories are presented in the Unicode standard.
show (generalCategory c) == show (Data.Char.generalCategory c)
Since: 0.3.0
Character classification
isAlphabetic :: Char -> Bool Source #
Returns True
for alphabetic Unicode characters (lower-case, upper-case
and title-case letters, plus letters of caseless scripts and modifiers
letters).
Note: this function is not equivalent to
isAlpha
/ isLetter
:
isAlpha
matches the following general categories:UppercaseLetter
(Lu
)LowercaseLetter
(Ll
)TitlecaseLetter
(Lt
)ModifierLetter
(Lm
)OtherLetter
(Lo
)
whereas
isAlphabetic
matches:Uppercase
propertyLowercase
propertyTitlecaseLetter
(Lt
)ModifierLetter
(Lm
)OtherLetter
(Lo
)LetterNumber
(Nl
)Other_Alphabetic
property
Since: 0.3.0
isAlphaNum :: Char -> Bool Source #
Deprecated: Use Unicode.Char.General.Compat.isAlphaNum instead.
Selects alphabetic or numeric Unicode characters.
This function returns True
if its argument has one of the
following GeneralCategory
s, or False
otherwise:
UppercaseLetter
LowercaseLetter
TitlecaseLetter
ModifierLetter
OtherLetter
DecimalNumber
LetterNumber
OtherNumber
isAlphaNum c == Data.Char.isAlphaNum c
Note: this function is incompatible with isAlphabetic
:
>>>
isAlphabetic '\x345'
True>>>
isAlphaNum '\x345'
False
Since: 0.3.0
isControl :: Char -> Bool Source #
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
This function returns True
if its argument has the GeneralCategory
Control
.
isControl c == Data.Char.isControl c
Since: 0.3.0
isMark :: Char -> Bool Source #
Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.
This function returns True
if its argument has one of the
following GeneralCategory
s, or False
otherwise:
isMark c == Data.Char.isMark c
Since: 0.3.0
isPrint :: Char -> Bool Source #
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
This function returns False
if its argument has one of the
following GeneralCategory
s, or True
otherwise:
isPrint c == Data.Char.isPrint c
Since: 0.3.0
isPunctuation :: Char -> Bool Source #
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
This function returns True
if its argument has one of the
following GeneralCategory
s, or False
otherwise:
ConnectorPunctuation
DashPunctuation
OpenPunctuation
ClosePunctuation
InitialQuote
FinalQuote
OtherPunctuation
isPunctuation c == Data.Char.isPunctuation c
Since: 0.3.0
isSeparator :: Char -> Bool Source #
Selects Unicode space and separator characters.
This function returns True
if its argument has one of the
following GeneralCategory
s, or False
otherwise:
isSeparator c == Data.Char.isSeparator c
Since: 0.3.0
isSymbol :: Char -> Bool Source #
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns True
if its argument has one of the
following GeneralCategory
s, or False
otherwise:
* MathSymbol
* CurrencySymbol
* ModifierSymbol
* OtherSymbol
isSymbol c == Data.Char.isSymbol c
Since: 0.3.0
isWhiteSpace :: Char -> Bool Source #
Returns True
for any whitespace characters, and the control
characters \t
, \n
, \r
, \f
, \v
.
See: Unicode White_Space
.
Note: isWhiteSpace
is not equivalent to isSpace
.
isWhiteSpace
selects the same characters from isSpace
plus the following:
U+0085
NEXT LINE (NEL)U+2028
LINE SEPARATORU+2029
PARAGRAPH SEPARATOR
Since: 0.3.0
isNoncharacter :: Char -> Bool Source #
Returns True
for any noncharacter.
A noncharacter is a code point that is permanently reserved for internal use (see definition D14 in the section 3.4 “Characters and Encoding” of the Unicode Standard).
Noncharacters consist of the values U+nFFFE
and U+nFFFF
(where n
is from 0 to 10₁₆) and the values U+FDD0..U+FDEF
.
Since: 0.6.0
Re-export
isAscii :: Char -> Bool Source #
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> Bool Source #
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
isAsciiUpper :: Char -> Bool Source #
isAsciiLower :: Char -> Bool Source #
Korean Hangul Characters
The Hangul script used in the Korean writing system consists of individual consonant and vowel letters (jamo) that are visually combined into square display cells to form entire syllable blocks. Hangul syllables may be encoded directly as precomposed combinations of individual jamo or as decomposed sequences of conjoining jamo. Modern Hangul syllable blocks can be expressed with either two or three jamo, either in the form consonant + vowel or in the form consonant + vowel + consonant. The leading consonant is represented as L, the vowel as V and the trailing consonant as T.
The Unicode Standard contains both a large set of precomposed modern Hangul syllables and a set of conjoining Hangul jamo, which can be used to encode archaic Korean syllable blocks as well as modern Korean syllable blocks.
Hangul characters can be composed or decomposed algorithmically instead of via mappings. These APIs are used mainly for Unicode normalization of Hangul text.
Please refer to the following resources for more information:
- The
Hangul
section of theEast Asia
chapter of the Unicode Standard - Conformance chapter of the Unicode Standard
- Unicode® Standard Annex #15 - Unicode Normalization Forms
- UCD file
HangulSyllableType.txt
- https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block)
- https://en.wikipedia.org/wiki/List_of_Hangul_jamo
Conjoining Jamo
Jamo L, V and T letters.
isJamo :: Char -> Bool Source #
Determine whether a character is a jamo L, V or T character.
Since: 0.1.0
jamoNCount :: Int Source #
Total count of all jamo characters.
jamoNCount = jamoVCount * jamoTCount
Since: 0.1.0
Jamo Leading (L)
jamoLFirst :: Int Source #
First leading consonant jamo.
Since: 0.1.0
jamoLCount :: Int Source #
Total count of leading consonant jamo.
Since: 0.3.0
jamoLIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a leading jamo, return its index in
the list of leading jamo consonants, otherwise return Nothing
.
Since: 0.1.0
Jamo Vowel (V)
jamoVFirst :: Int Source #
First vowel jamo.
Since: 0.1.0
jamoVCount :: Int Source #
Total count of vowel jamo.
Since: 0.1.0
jamoVIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a vowel jamo, return its index in the
list of vowel jamo, otherwise return Nothing
.
Since: 0.1.0
Jamo Trailing (T)
jamoTFirst :: Int Source #
The first trailing consonant jamo.
Note that jamoTFirst
does not represent a valid T, it represents a missing
T i.e. LV without a T. See comments under jamoTIndex
.
Since: 0.1.0
jamoTCount :: Int Source #
Total count of trailing consonant jamo.
Since: 0.1.0
jamoTIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a trailing jamo consonant, return its
index in the list of trailing jamo consonants, otherwise return Nothing
.
Note that index 0 is not a valid index for a trailing consonant. Index 0 corresponds to an LV syllable, without a T. See "Hangul Syllable Decomposition" in the Conformance chapter of the Unicode standard for more details.
Since: 0.1.0
Hangul Syllables
Precomposed Hangul syllables.
hangulFirst :: Int Source #
Codepoint of the first pre-composed Hangul character.
Since: 0.1.0
hangulLast :: Int Source #
Codepoint of the last Hangul character.
Since: 0.1.0
isHangul :: Char -> Bool Source #
Determine if the given character is a precomposed Hangul syllable.
Since: 0.1.0
isHangulLV :: Char -> Bool Source #
Determine if the given character is a Hangul LV syllable.
Note: this function requires a precomposed Hangul syllable but does not
check it. Use isHangul
to check the input character before passing it to
isHangulLV
.
Since: 0.1.0