Streamly.Internal.Unicode.Stream
Setup
To execute the code examples provided in this module in ghci, please run the following commands first.
>>>
:m
>>>
import qualified Streamly.Data.Fold as Fold
>>>
import qualified Streamly.Data.Stream as Stream
>>>
import qualified Streamly.Unicode.Stream as Unicode
For APIs that have not been released yet.
>>>
:set -XMagicHash
>>>
import qualified Streamly.Internal.Unicode.Stream as Unicode
Construction (Decoding)
decodeLatin1 :: Monad m => Stream m Word8 -> Stream m Char Source #
Decode a stream of bytes to Unicode characters by mapping each byte to a
corresponding Unicode Char
in 0-255 range.
UTF-8 Byte Stream Decoding
data CodingFailureMode Source #
Instances
Show CodingFailureMode Source # | |
Defined in Streamly.Internal.Unicode.Stream |
parseCharUtf8With :: Monad m => CodingFailureMode -> Parser Word8 m Char Source #
decodeUtf8 :: Monad m => Stream m Word8 -> Stream m Char Source #
Decode a UTF-8 encoded bytestream to a stream of Unicode characters. Any invalid codepoint encountered is replaced with the unicode replacement character.
decodeUtf8' :: Monad m => Stream m Word8 -> Stream m Char Source #
Decode a UTF-8 encoded bytestream to a stream of Unicode characters. The function throws an error if an invalid codepoint is encountered.
decodeUtf8_ :: Monad m => Stream m Word8 -> Stream m Char Source #
Decode a UTF-8 encoded bytestream to a stream of Unicode characters. Any invalid codepoint encountered is dropped.
UTF-16 Byte Stream Decoding
decodeUtf16le' :: Monad m => Stream m Word16 -> Stream m Char Source #
Similar to decodeUtf16le
but throws an error if an invalid codepoint is
encountered.
decodeUtf16le :: Monad m => Stream m Word16 -> Stream m Char Source #
Decode a UTF-16 encoded stream to a stream of Unicode characters. Any invalid codepoint encountered is replaced with the unicode replacement character.
The Word16s are expected to be in the little-endian byte order.
Resumable UTF-8 Byte Stream Decoding
data DecodeError Source #
Constructors
DecodeError !DecodeState !CodePoint |
Instances
Show DecodeError Source # | |
Defined in Streamly.Internal.Unicode.Stream |
type DecodeState = Word8 Source #
decodeUtf8Either :: Monad m => Stream m Word8 -> Stream m (Either DecodeError Char) Source #
Pre-release
resumeDecodeUtf8Either :: Monad m => DecodeState -> CodePoint -> Stream m Word8 -> Stream m (Either DecodeError Char) Source #
Pre-release
UTF-8 Array Stream Decoding
decodeUtf8Chunks :: MonadIO m => Stream m (Array Word8) -> Stream m Char Source #
Like decodeUtf8
but for a chunked stream. It may be slightly faster than
flattening the stream and then decoding with decodeUtf8
.
decodeUtf8Chunks' :: MonadIO m => Stream m (Array Word8) -> Stream m Char Source #
Like 'decodeUtf8'' but for a chunked stream. It may be slightly faster than flattening the stream and then decoding with 'decodeUtf8''.
decodeUtf8Chunks_ :: MonadIO m => Stream m (Array Word8) -> Stream m Char Source #
Like decodeUtf8_
but for a chunked stream. It may be slightly faster
than flattening the stream and then decoding with decodeUtf8_
.
Elimination (Encoding)
Latin1 Encoding to Byte Stream
encodeLatin1 :: Monad m => Stream m Char -> Stream m Word8 Source #
Like encodeLatin1'
but silently maps input codepoints beyond 255 to
arbitrary Latin1 chars in 0-255 range. No error or exception is thrown when
such mapping occurs.
encodeLatin1' :: Monad m => Stream m Char -> Stream m Word8 Source #
Encode a stream of Unicode characters to bytes by mapping each character to a byte in 0-255 range. Throws an error if the input stream contains characters beyond 255.
encodeLatin1_ :: Monad m => Stream m Char -> Stream m Word8 Source #
Like encodeLatin1
but drops the input characters beyond 255.
UTF-8 Encoding to Byte Stream
encodeUtf8 :: Monad m => Stream m Char -> Stream m Word8 Source #
Encode a stream of Unicode characters to a UTF-8 encoded bytestream. Any Invalid characters (U+D800-U+D8FF) in the input stream are replaced by the Unicode replacement character U+FFFD.
encodeUtf8' :: Monad m => Stream m Char -> Stream m Word8 Source #
Encode a stream of Unicode characters to a UTF-8 encoded bytestream. When any invalid character (U+D800-U+D8FF) is encountered in the input stream the function errors out.
encodeUtf8_ :: Monad m => Stream m Char -> Stream m Word8 Source #
Encode a stream of Unicode characters to a UTF-8 encoded bytestream. Any Invalid characters (U+D800-U+D8FF) in the input stream are dropped.
encodeStrings :: MonadIO m => (Stream m Char -> Stream m Word8) -> Stream m String -> Stream m (Array Word8) Source #
Encode a stream of String
using the supplied encoding scheme. Each
string is encoded as an Array Word8
.
UTF-8 Encoding to Chunk Stream
UTF-16 Encoding to Byte Stream
encodeUtf16le' :: Monad m => Stream m Char -> Stream m Word16 Source #
Similar to encodeUtf16le
but throws an error if any invalid character is
encountered.
encodeUtf16le :: Monad m => Stream m Char -> Stream m Word16 Source #
Encode a stream of Unicode characters to a UTF-16 encoded stream. Any invalid characters in the input stream are replaced by the Unicode replacement character U+FFFD.
The resulting Word16s are encoded in little-endian byte order.
Transformation
stripHead :: Monad m => Stream m Char -> Stream m Char Source #
Remove leading whitespace from a string.
>>>
stripHead = Stream.dropWhile Char.isSpace
Pre-release
lines :: Monad m => Fold m Char b -> Stream m Char -> Stream m b Source #
Fold each line of the stream using the supplied Fold
and stream the result.
Definition:
>>>
lines f = Stream.foldMany (Fold.takeEndBy_ (== '\n') f)
Usage:
>>>
Stream.toList $ Unicode.lines Fold.toList (Stream.fromList "line1\nline2\nline3\n\n\n")
["line1","line2","line3","",""]
Pre-release
words :: Monad m => Fold m Char b -> Stream m Char -> Stream m b Source #
Fold each word of the stream using the supplied Fold
.
Definition:
>>>
words = Stream.wordsBy Char.isSpace
Usage:
>>>
Stream.toList $ Unicode.words Fold.toList (Stream.fromList " ab cd ef ")
["ab","cd","ef"]
Pre-release
unlines :: MonadIO m => Unfold m a Char -> Stream m a -> Stream m Char Source #
Unfold a stream to character streams using the supplied Unfold
and concat the results suffixing a newline character \n
to each stream.
Definition:
>>>
unlines = Stream.unfoldEachEndBy '\n'
>>>
unlines = Stream.unfoldEachEndBySeq "\n" Unfold.fromList
Pre-release
unwords :: MonadIO m => Unfold m a Char -> Stream m a -> Stream m Char Source #
Unfold the elements of a stream to character streams using the supplied
Unfold
and concat the results with a whitespace character infixed between
the streams.
>>>
unwords = Stream.unfoldEachSepBy ' '
>>>
unwords = Stream.unfoldEachSepBySeq " " Unfold.fromList
Pre-release
StreamD UTF8 Encoding / Decoding transformations.
encodeUtf8D :: Monad m => Stream m Char -> Stream m Word8 Source #
See section "3.9 Unicode Encoding Forms" in https://www.unicode.org/versions/Unicode13.0.0/UnicodeStandard-13.0.pdf
decodeUtf8EitherD :: Monad m => Stream m Word8 -> Stream m (Either DecodeError Char) Source #
resumeDecodeUtf8EitherD :: Monad m => DecodeState -> CodePoint -> Stream m Word8 -> Stream m (Either DecodeError Char) Source #
Decoding String Literals
fromStr# :: MonadIO m => Addr# -> Stream m Char Source #
Read UTF-8 encoded bytes as chars from an Addr#
until a 0 byte is
encountered, the 0 byte is not included in the stream.
Unsafe: The caller is responsible for safe addressing.
Note that this is completely safe when reading from Haskell string literals because they are guaranteed to be NULL terminated:
>>>
Stream.fold Fold.toList (Unicode.fromStr# "Haskell"#)
"Haskell"
Word16 Utilities
mkEvenW8Chunks :: Monad m => Stream m (Array Word8) -> Stream m (Array Word8) Source #
Ensure chunks of even length. This can be used before casting the arrays to Word16. Use this API when interacting with external data.
The chunks are split and merged accordingly to create arrays of even length. If the sum of length of all the arrays in the stream is odd then the trailing byte of the last array is dropped.
swapByteOrder :: Word16 -> Word16 Source #
Swap the byte order of Word16
swapByteOrder 0xABCD == 0xCDAB swapByteOrder . swapByteOrder == id
Deprecations
decodeUtf8Lax :: Monad m => Stream m Word8 -> Stream m Char Source #
Deprecated: Please use decodeUtf8
instead
Same as decodeUtf8
encodeLatin1Lax :: Monad m => Stream m Char -> Stream m Word8 Source #
Deprecated: Please use encodeLatin1
instead
Same as encodeLatin1
encodeUtf8Lax :: Monad m => Stream m Char -> Stream m Word8 Source #
Deprecated: Please use encodeUtf8
instead
Same as encodeUtf8