Unicode.Char.General

General character property related functions.

Types of Code Points

data CodePointType

Types of code points.

These classes are defined in section 2.4 “Code Points and Characters” of the Unicode Standard.

Since: 0.4.1

Constructors

GraphicType	Graphic: defined by the following general categories:
- Letters (L): `UppercaseLetter`, `LowercaseLetter`, `TitlecaseLetter`, `ModifierLetter`, `OtherLetter`.
- Marks (M): `NonSpacingMark`, `SpacingCombiningMark`, `EnclosingMark`.
- Numbers (N): `DecimalNumber`, `LetterNumber`, `OtherNumber`.
- Punctuation (P): `ConnectorPunctuation`, `DashPunctuation`, `OpenPunctuation`, `ClosePunctuation`, `InitialQuote`, `FinalQuote`, `OtherPunctuation`.
- Symbols (S): `MathSymbol`, `CurrencySymbol`, `ModifierSymbol`, `OtherSymbol`.
- Separators: `Space`.
FormatType	Format: invisible but affects neighboring characters. Defined by the following general categories: `LineSeparator`, `ParagraphSeparator`, `Format`.
ControlType	Control: usage defined by protocols or standards outside the Unicode Standard. Defined by the general category `Control`.
PrivateUseType	Private-use: usage defined by private agreement outside the Unicode Standard. Defined by the general category `PrivateUse`.
SurrogateType	Surrogate: permanently reserved for UTF-16. Defined by the general category `Surrogate`.
NoncharacterType	Noncharacter: a code point that is permanently reserved for internal use (see definition D14 in section 3.4 “Characters and Encoding” of the Unicode Standard). Noncharacters consist of the values `U+nFFFE` and `U+nFFFF` (where `n` is from 0 to 10₁₆) and the values `U+FDD0..U+FDEF`. They are a subset of the general category `NotAssigned`.
ReservedType	Reserved: any code point of the Unicode Standard that is reserved for future assignment (see definition D15 in section 3.4 “Characters and Encoding” of the Unicode Standard). Also known as an unassigned code point. Reserved code points are a subset of the general category `NotAssigned`.

Instances

Bounded CodePointType
Defined in Unicode.Char.General. Methods:
- minBound :: CodePointType
- maxBound :: CodePointType

Enum CodePointType
Defined in Unicode.Char.General. Methods:
- succ :: CodePointType -> CodePointType
- pred :: CodePointType -> CodePointType
- toEnum :: Int -> CodePointType
- fromEnum :: CodePointType -> Int
- enumFrom :: CodePointType -> [CodePointType]
- enumFromThen :: CodePointType -> CodePointType -> [CodePointType]
- enumFromTo :: CodePointType -> CodePointType -> [CodePointType]
- enumFromThenTo :: CodePointType -> CodePointType -> CodePointType -> [CodePointType]

Ix CodePointType
Defined in Unicode.Char.General. Methods:
- range :: (CodePointType, CodePointType) -> [CodePointType]
- index :: (CodePointType, CodePointType) -> CodePointType -> Int
- unsafeIndex :: (CodePointType, CodePointType) -> CodePointType -> Int
- inRange :: (CodePointType, CodePointType) -> CodePointType -> Bool
- rangeSize :: (CodePointType, CodePointType) -> Int
- unsafeRangeSize :: (CodePointType, CodePointType) -> Int

Show CodePointType
Defined in Unicode.Char.General. Methods:
- showsPrec :: Int -> CodePointType -> ShowS
- show :: CodePointType -> String
- showList :: [CodePointType] -> ShowS

Eq CodePointType
Defined in Unicode.Char.General. Methods:
- (==) :: CodePointType -> CodePointType -> Bool
- (/=) :: CodePointType -> CodePointType -> Bool

Ord CodePointType
Defined in Unicode.Char.General. Methods:
- compare :: CodePointType -> CodePointType -> Ordering
- (<) :: CodePointType -> CodePointType -> Bool
- (<=) :: CodePointType -> CodePointType -> Bool
- (>) :: CodePointType -> CodePointType -> Bool
- (>=) :: CodePointType -> CodePointType -> Bool
- max :: CodePointType -> CodePointType -> CodePointType
- min :: CodePointType -> CodePointType -> CodePointType

codePointType :: Char -> CodePointType

Returns the CodePointType of a character.

Since: 0.6.0
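
The classification described by the constructors above can be approximated with base alone. The following is a minimal sketch, not the library's implementation: the type `Kind`, its constructors, and the functions `isNonchar` and `kindOf` are hypothetical names introduced for illustration, and the general category data comes from `Data.Char`, whose Unicode version may differ from the one this module uses.

```haskell
import Data.Bits ((.&.))
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- Hypothetical local type mirroring the seven documented classes.
data Kind
  = KGraphic | KFormat | KControl | KPrivateUse
  | KSurrogate | KNoncharacter | KReserved
  deriving (Show, Eq)

-- Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
isNonchar :: Char -> Bool
isNonchar c = (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp .&. 0xFFFE) == 0xFFFE
  where cp = ord c

-- Map base's general category to the documented code point classes.
kindOf :: Char -> Kind
kindOf c = case generalCategory c of
  LineSeparator      -> KFormat
  ParagraphSeparator -> KFormat
  Format             -> KFormat
  Control            -> KControl
  PrivateUse         -> KPrivateUse
  Surrogate          -> KSurrogate
  NotAssigned
    | isNonchar c    -> KNoncharacter
    | otherwise      -> KReserved
  _                  -> KGraphic

main :: IO ()
main = mapM_ (print . kindOf) "a\n\x2028\xE000\xFFFE"
```

The only case that needs more than the general category is `NotAssigned`, which is split into noncharacters and reserved code points.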

Character classification

isAlphabetic :: Char -> Bool

Returns True for alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters).

Note: this function is not equivalent to isAlpha / isLetter:

isAlpha matches the following general categories:
- UppercaseLetter (Lu)
- LowercaseLetter (Ll)
- TitlecaseLetter (Lt)
- ModifierLetter (Lm)
- OtherLetter (Lo)
whereas isAlphabetic matches:
- Uppercase property
- Lowercase property
- TitlecaseLetter (Lt)
- ModifierLetter (Lm)
- OtherLetter (Lo)
- LetterNumber (Nl)
- Other_Alphabetic property

Since: 0.3.0

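isAlphaNum :: Char -> Bool

Deprecated: Use Unicode.Char.General.Compat.isAlphaNum instead.

Selects alphabetic or numeric Unicode characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- UppercaseLetter
- LowercaseLetter
- TitlecaseLetter
- ModifierLetter
- OtherLetter
- DecimalNumber
- LetterNumber
- OtherNumber

isAlphaNum c == Data.Char.isAlphaNum c

Note: this function does not select every character accepted by isAlphabetic. For example, U+0345 is a nonspacing mark with the Other_Alphabetic property, so it is alphabetic but not alphanumeric: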

>>> isAlphabetic '\x345'
True
>>> isAlphaNum '\x345'
False

Since: 0.3.0

isControl :: Char -> Bool

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

This function returns True if its argument has the GeneralCategory Control.

isControl c == Data.Char.isControl c

Since: 0.3.0

isMark :: Char -> Bool

Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- NonSpacingMark
- SpacingCombiningMark
- EnclosingMark

isMark c == Data.Char.isMark c

Since: 0.3.0

isPrint :: Char -> Bool

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

This function returns False if its argument has one of the following GeneralCategorys, or True otherwise:
- LineSeparator
- ParagraphSeparator
- Control
- Format
- Surrogate
- PrivateUse
- NotAssigned

isPrint c == Data.Char.isPrint c

Since: 0.3.0

isPunctuation :: Char -> Bool

Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- ConnectorPunctuation
- DashPunctuation
- OpenPunctuation
- ClosePunctuation
- InitialQuote
- FinalQuote
- OtherPunctuation

isPunctuation c == Data.Char.isPunctuation c

Since: 0.3.0

isSeparator :: Char -> Bool

Selects Unicode space and separator characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- Space
- LineSeparator
- ParagraphSeparator

isSeparator c == Data.Char.isSeparator c

Since: 0.3.0

isSymbol :: Char -> Bool

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:
- MathSymbol
- CurrencySymbol
- ModifierSymbol
- OtherSymbol

isSymbol c == Data.Char.isSymbol c

Since: 0.3.0

isWhiteSpace :: Char -> Bool

Returns True for any whitespace characters, and the control characters \t, \n, \r, \f, \v.

See: Unicode White_Space.

Note: isWhiteSpace is not equivalent to isSpace. isWhiteSpace selects the same characters as isSpace, plus the following:

U+0085 NEXT LINE (NEL)
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

Since: 0.3.0
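
The relationship stated in the note can be sketched with base alone. This is an illustration of the note, not the library's implementation: `isWhiteSpaceSketch` is a hypothetical name, and it assumes that `Data.Char.isSpace` plus the three listed characters covers the whole White_Space property for the Unicode version in use.

```haskell
import Data.Char (isSpace)

-- Sketch only: isSpace plus the three extra characters listed in the note.
isWhiteSpaceSketch :: Char -> Bool
isWhiteSpaceSketch c = isSpace c || c `elem` "\x85\x2028\x2029"

main :: IO ()
main = do
  -- Each pair shows (isSpace, isWhiteSpaceSketch) for one character.
  print (isSpace '\x85', isWhiteSpaceSketch '\x85')
  print (isSpace '\x2028', isWhiteSpaceSketch '\x2028')
  print (isSpace ' ', isWhiteSpaceSketch ' ')
```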

isNoncharacter :: Char -> Bool

Returns True for any noncharacter.

A noncharacter is a code point that is permanently reserved for internal use (see definition D14 in the section 3.4 “Characters and Encoding” of the Unicode Standard).

Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.

Since: 0.6.0
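
The ranges given above translate directly into a predicate. The sketch below uses only base; `isNoncharacterSketch` is a hypothetical name and is not claimed to match how the library implements isNoncharacter.

```haskell
import Data.Bits ((.&.))
import Data.Char (ord)

-- U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF for every plane n (0 to 0x10).
isNoncharacterSketch :: Char -> Bool
isNoncharacterSketch c =
  (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp .&. 0xFFFE) == 0xFFFE
  where
    cp = ord c

main :: IO ()
main =
  -- 32 code points in U+FDD0..U+FDEF plus 2 per plane for 17 planes: 66.
  print (length (filter isNoncharacterSketch [minBound .. maxBound]))
```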

Re-export

isAscii :: Char -> Bool

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isAsciiUpper :: Char -> Bool

Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool

Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.

Korean Hangul Characters

The Hangul script used in the Korean writing system consists of individual consonant and vowel letters (jamo) that are visually combined into square display cells to form entire syllable blocks. Hangul syllables may be encoded directly as precomposed combinations of individual jamo or as decomposed sequences of conjoining jamo. Modern Hangul syllable blocks can be expressed with either two or three jamo, either in the form consonant + vowel or in the form consonant + vowel + consonant. The leading consonant is represented as L, the vowel as V and the trailing consonant as T.

The Unicode Standard contains both a large set of precomposed modern Hangul syllables and a set of conjoining Hangul jamo, which can be used to encode archaic Korean syllable blocks as well as modern Korean syllable blocks.

Hangul characters can be composed or decomposed algorithmically instead of via mappings. These APIs are used mainly for Unicode normalization of Hangul text.
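
As an illustration of the algorithmic approach, here is a minimal sketch of Hangul syllable decomposition following the arithmetic in the Conformance chapter of the Unicode Standard. It uses only base. The constant names (`sBase`, `lBase`, and so on) and `decomposeSketch` are local to this example and are not this module's API.

```haskell
import Data.Char (chr, ord)

-- Constants of the Unicode Hangul algorithm; the names are local to this sketch.
sBase, lBase, vBase, tBase, vCount, tCount, nCount, sCount :: Int
sBase = 0xAC00            -- first precomposed syllable
lBase = 0x1100            -- first leading consonant
vBase = 0x1161            -- first vowel
tBase = 0x11A7            -- one before the first trailing consonant
vCount = 21
tCount = 28
nCount = vCount * tCount  -- 588 syllables per leading consonant
sCount = 19 * nCount      -- 11172 precomposed syllables

-- Decompose a precomposed syllable into L, V and (optionally) T jamo.
decomposeSketch :: Char -> Maybe String
decomposeSketch c
  | sIndex < 0 || sIndex >= sCount = Nothing
  | tIndex == 0 = Just [l, v]
  | otherwise = Just [l, v, chr (tBase + tIndex)]
  where
    sIndex = ord c - sBase
    l = chr (lBase + sIndex `div` nCount)
    v = chr (vBase + (sIndex `mod` nCount) `div` tCount)
    tIndex = sIndex `mod` tCount

main :: IO ()
main = print (fmap (map ord) (decomposeSketch '\xD55C'))
```

For U+D55C the syllable index is 10588, which gives L index 18, V index 0 and T index 4, that is, the jamo U+1112, U+1161 and U+11AB.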

Please refer to the following resources for more information:

The Hangul section of the East Asia chapter of the Unicode Standard
Conformance chapter of the Unicode Standard
Unicode® Standard Annex #15 - Unicode Normalization Forms
UCD file HangulSyllableType.txt
https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block)
https://en.wikipedia.org/wiki/List_of_Hangul_jamo

Conjoining Jamo

Jamo L, V and T letters.

isJamo :: Char -> Bool

Determine whether a character is a jamo L, V or T character.

Since: 0.1.0

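jamoNCount :: Int

Number of precomposed Hangul syllables that share one leading consonant, that is, the number of vowel and trailing consonant combinations (NCount in the Unicode Standard).

jamoNCount = jamoVCount * jamoTCount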

Since: 0.1.0

Jamo Leading (L)

jamoLFirst :: Int

Code point of the first leading consonant jamo.

Since: 0.1.0

jamoLCount :: Int

Total count of leading consonant jamo.

Since: 0.3.0

jamoLIndex :: Char -> Maybe Int

Given a Unicode character, if it is a leading jamo, return its index in the list of leading jamo consonants, otherwise return Nothing.

Since: 0.1.0

jamoLLast :: Int

Code point of the last leading consonant jamo.

Since: 0.1.0

Jamo Vowel (V)

jamoVFirst :: Int

Code point of the first vowel jamo.

Since: 0.1.0

jamoVCount :: Int

Total count of vowel jamo.

Since: 0.1.0

jamoVIndex :: Char -> Maybe Int

Given a Unicode character, if it is a vowel jamo, return its index in the list of vowel jamo, otherwise return Nothing.

Since: 0.1.0

jamoVLast :: Int

Code point of the last vowel jamo.

Since: 0.1.0

Jamo Trailing (T)

jamoTFirst :: Int

Code point of the first trailing consonant jamo.

Note that jamoTFirst does not represent a valid T; it represents a missing T, i.e. an LV syllable without a T. See the comments under jamoTIndex.

Since: 0.1.0

jamoTCount :: Int

Total count of trailing consonant jamo.

Since: 0.1.0

jamoTIndex :: Char -> Maybe Int

Given a Unicode character, if it is a trailing jamo consonant, return its index in the list of trailing jamo consonants, otherwise return Nothing.

Note that index 0 is not a valid index for a trailing consonant. Index 0 corresponds to an LV syllable, without a T. See "Hangul Syllable Decomposition" in the Conformance chapter of the Unicode Standard for more details.

Since: 0.1.0
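
The role of index 0 is easiest to see in composition. The sketch below uses only base and follows the Unicode Standard's composition arithmetic. `lIndex`, `vIndex`, `tIndex` and `composeSketch` are hypothetical local helpers, and the choice to reject index 0 in the local `tIndex` is an assumption of this example rather than a statement about jamoTIndex.

```haskell
import Data.Char (chr, ord)

-- Index of a character within a jamo range, if it falls inside [lo, hi).
indexIn :: Int -> Int -> Int -> Char -> Maybe Int
indexIn base lo hi c
  | i >= lo && i < hi = Just i
  | otherwise = Nothing
  where
    i = ord c - base

lIndex, vIndex, tIndex :: Char -> Maybe Int
lIndex = indexIn 0x1100 0 19
vIndex = indexIn 0x1161 0 21
tIndex = indexIn 0x11A7 1 28  -- index 0 is reserved for "no trailing consonant"

-- Compose L, V and an optional T into a precomposed syllable.
composeSketch :: Char -> Char -> Maybe Char -> Maybe Char
composeSketch l v mt = do
  li <- lIndex l
  vi <- vIndex v
  ti <- maybe (Just 0) tIndex mt  -- a missing T contributes index 0
  pure (chr (0xAC00 + (li * 21 + vi) * 28 + ti))

main :: IO ()
main = do
  print (fmap ord (composeSketch '\x1100' '\x1161' Nothing))
  print (fmap ord (composeSketch '\x1112' '\x1161' (Just '\x11AB')))
```

With no T the result is an LV syllable such as U+AC00; adding T index 4 to L index 18 and V index 0 yields U+D55C.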

jamoTLast :: Int

Code point of the last trailing consonant jamo.

Since: 0.1.0

Hangul Syllables

Precomposed Hangul syllables.

hangulFirst :: Int

Code point of the first precomposed Hangul character.

Since: 0.1.0

hangulLast :: Int

Code point of the last precomposed Hangul character.

Since: 0.1.0

isHangul :: Char -> Bool

Determine if the given character is a precomposed Hangul syllable.

Since: 0.1.0

isHangulLV :: Char -> Bool

Determine if the given character is a Hangul LV syllable.

Note: this function requires a precomposed Hangul syllable but does not check it. Use isHangul to check the input character before passing it to isHangulLV.

Since: 0.1.0
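
To see why the range check matters, consider a purely arithmetic LV test. The sketch below uses only base; it assumes an implementation based on the syllable index modulo the trailing consonant count, which is one plausible approach and not necessarily what isHangulLV does. All names are hypothetical.

```haskell
import Data.Char (ord)

-- Range of precomposed Hangul syllables: U+AC00..U+D7A3.
isHangulSketch :: Char -> Bool
isHangulSketch c = c >= '\xAC00' && c <= '\xD7A3'

-- Unchecked: only meaningful when the argument is a precomposed syllable.
isLVUnchecked :: Char -> Bool
isLVUnchecked c = (ord c - 0xAC00) `mod` 28 == 0

-- Checked: validate the range first, as the note recommends.
isLVChecked :: Char -> Bool
isLVChecked c = isHangulSketch c && isLVUnchecked c

main :: IO ()
main = do
  print (isLVUnchecked '\xAC00', isLVUnchecked '\xAC01')  -- an LV, then an LVT
  print (isLVUnchecked 'H', isLVChecked 'H')              -- not Hangul at all
```

In this sketch the unchecked test accepts the Latin letter 'H' because its code point happens to be congruent to U+AC00 modulo 28, which is the kind of false positive the range check prevents.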

Bounded GeneralCategory
Defined in Unicode.Char.General. Methods:
- minBound :: GeneralCategory
- maxBound :: GeneralCategory

Enum GeneralCategory
Defined in Unicode.Char.General. Methods:
- succ :: GeneralCategory -> GeneralCategory
- pred :: GeneralCategory -> GeneralCategory
- toEnum :: Int -> GeneralCategory
- fromEnum :: GeneralCategory -> Int
- enumFrom :: GeneralCategory -> [GeneralCategory]
- enumFromThen :: GeneralCategory -> GeneralCategory -> [GeneralCategory]
- enumFromTo :: GeneralCategory -> GeneralCategory -> [GeneralCategory]
- enumFromThenTo :: GeneralCategory -> GeneralCategory -> GeneralCategory -> [GeneralCategory]

Ix GeneralCategory
Defined in Unicode.Char.General. Methods:
- range :: (GeneralCategory, GeneralCategory) -> [GeneralCategory]
- index :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Int
- unsafeIndex :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Int
- inRange :: (GeneralCategory, GeneralCategory) -> GeneralCategory -> Bool
- rangeSize :: (GeneralCategory, GeneralCategory) -> Int
- unsafeRangeSize :: (GeneralCategory, GeneralCategory) -> Int

Show GeneralCategory
Defined in Unicode.Char.General. Methods:
- showsPrec :: Int -> GeneralCategory -> ShowS
- show :: GeneralCategory -> String
- showList :: [GeneralCategory] -> ShowS

Eq GeneralCategory
Defined in Unicode.Char.General. Methods:
- (==) :: GeneralCategory -> GeneralCategory -> Bool
- (/=) :: GeneralCategory -> GeneralCategory -> Bool

Ord GeneralCategory
Defined in Unicode.Char.General. Methods:
- compare :: GeneralCategory -> GeneralCategory -> Ordering
- (<) :: GeneralCategory -> GeneralCategory -> Bool
- (<=) :: GeneralCategory -> GeneralCategory -> Bool
- (>) :: GeneralCategory -> GeneralCategory -> Bool
- (>=) :: GeneralCategory -> GeneralCategory -> Bool
- max :: GeneralCategory -> GeneralCategory -> GeneralCategory
- min :: GeneralCategory -> GeneralCategory -> GeneralCategory

Unicode general categories

UppercaseLetter	`Lu`: Letter, Uppercase
LowercaseLetter	`Ll`: Letter, Lowercase
TitlecaseLetter	`Lt`: Letter, Titlecase
ModifierLetter	`Lm`: Letter, Modifier
OtherLetter	`Lo`: Letter, Other
NonSpacingMark	`Mn`: Mark, Non-Spacing
SpacingCombiningMark	`Mc`: Mark, Spacing Combining
EnclosingMark	`Me`: Mark, Enclosing
DecimalNumber	`Nd`: Number, Decimal
LetterNumber	`Nl`: Number, Letter
OtherNumber	`No`: Number, Other
ConnectorPunctuation	`Pc`: Punctuation, Connector
DashPunctuation	`Pd`: Punctuation, Dash
OpenPunctuation	`Ps`: Punctuation, Open
ClosePunctuation	`Pe`: Punctuation, Close
InitialQuote	`Pi`: Punctuation, Initial quote
FinalQuote	`Pf`: Punctuation, Final quote
OtherPunctuation	`Po`: Punctuation, Other
MathSymbol	`Sm`: Symbol, Math
CurrencySymbol	`Sc`: Symbol, Currency
ModifierSymbol	`Sk`: Symbol, Modifier
OtherSymbol	`So`: Symbol, Other
Space	`Zs`: Separator, Space
LineSeparator	`Zl`: Separator, Line
ParagraphSeparator	`Zp`: Separator, Paragraph
Control	`Cc`: Other, Control
Format	`Cf`: Other, Format
Surrogate	`Cs`: Other, Surrogate
PrivateUse	`Co`: Other, Private Use
NotAssigned	`Cn`: Other, Not Assigned
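
The Bounded and Enum instances make it easy to enumerate every category. The sketch below is an illustration under one assumption: it uses the `GeneralCategory` type from base's `Data.Char`, which has the same 30 constructor names as the table above, rather than this module's own type.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

main :: IO ()
main = do
  -- Enumerate all categories via the Bounded and Enum instances.
  let cats = [minBound .. maxBound] :: [GeneralCategory]
  print (length cats)
  print (minBound :: GeneralCategory, maxBound :: GeneralCategory)
  -- Look up the category of a few sample characters.
  print (map generalCategory "aA1 _")
```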