Streamly.Data.Scanl
Setup
To execute the code examples provided in this module in ghci, please run the following commands first.
>>>
:m
>>>
:set -XFlexibleContexts
>>>
import Control.Monad (void)
>>>
import qualified Data.Foldable as Foldable
>>>
import Data.Bifunctor(bimap)
>>>
import Data.Function ((&))
>>>
import Data.Functor.Identity (Identity, runIdentity)
>>>
import Data.IORef (newIORef, readIORef, writeIORef)
>>>
import Data.Maybe (fromJust, isJust)
>>>
import Data.Monoid (Endo(..), Last(..), Sum(..))
>>>
import Prelude hiding (length, sum, minimum, maximum)
>>>
import Streamly.Data.Array (Array)
>>>
import Streamly.Data.Fold (Fold, Tee(..))
>>>
import Streamly.Data.Stream (Stream)
>>>
import qualified Data.Map as Map
>>>
import qualified Data.Set as Set
>>>
import qualified Data.IntSet as IntSet
>>>
import qualified Streamly.Data.Array as Array
>>>
import qualified Streamly.Data.Fold as Fold
>>>
import qualified Streamly.Data.MutArray as MutArray
>>>
import qualified Streamly.Data.Parser as Parser
>>>
import qualified Streamly.Data.Scanl as Scanl
>>>
import qualified Streamly.Data.Stream as Stream
>>>
import qualified Streamly.Data.StreamK as StreamK
>>>
import qualified Streamly.Data.Unfold as Unfold
For APIs that have not been released yet.
>>>
import qualified Streamly.Internal.Data.Fold as Fold
>>>
import qualified Streamly.Internal.Data.RingArray as RingArray
>>>
import qualified Streamly.Internal.Data.Scanl as Scanl
>>>
import qualified Streamly.Internal.Data.Stream as Stream
Scanl Type
The type Fold m a b
represents a consumer of an input stream of values
of type a
and returning a final value of type b
in Monad
m
. The
constructor of a fold is Fold step initial extract final
.
The fold uses an internal state of type s
. The initial value of the state
s
is created by initial
. This function is called once and only once
before the fold starts consuming input. Any resource allocation can be done
in this function.
The step
function is called on each input, it consumes an input and
returns the next intermediate state (see Step
) or the final result b
if
the fold terminates.
If the fold is used as a scan, the extract
function is used by the scan
driver to map the current state s
of the fold to the fold result. Thus
extract
can be called multiple times. In some folds, where scanning does
not make sense, this function is left unimplemented; such folds cannot be
used as scans.
Before a fold terminates, final
is called once and only once (unless the
fold terminated in initial
itself). Any resources allocated by initial
can be released in final
. In folds that do not require any cleanup
extract
and final
are typically the same.
When implementing fold combinators, care should be taken to cleanup any
state of the argument folds held by the fold by calling the respective
final
at all exit points of the fold. Also, final
should not be called
more than once. Note that if a fold terminates by Done
constructor, there
is no state to cleanup.
NOTE: The constructor is not yet released, smart constructors are provided to create folds.
Instances
Monad m => Applicative (Scanl m a) Source # | |
Defined in Streamly.Internal.Data.Scanl.Type Methods pure :: a0 -> Scanl m a a0 Source # (<*>) :: Scanl m a (a0 -> b) -> Scanl m a a0 -> Scanl m a b Source # liftA2 :: (a0 -> b -> c) -> Scanl m a a0 -> Scanl m a b -> Scanl m a c Source # (*>) :: Scanl m a a0 -> Scanl m a b -> Scanl m a b Source # (<*) :: Scanl m a a0 -> Scanl m a b -> Scanl m a a0 Source # | |
Functor m => Functor (Scanl m a) Source # | Maps a function on the output of the fold (the type |
Constructors
mkScanlM :: Monad m => (b -> a -> m b) -> m b -> Scanl m a b Source #
Make a scan from a left fold style monadic step function and initial value of the accumulator.
mkScanl1 :: Monad m => (a -> a -> a) -> Scanl m a (Maybe a) Source #
Make a strict left scan, for non-empty streams, using first element as the starting value. Returns Nothing if the stream is empty.
Pre-release
mkScanl1M :: Monad m => (a -> a -> m a) -> Scanl m a (Maybe a) Source #
Like mkScanl1
but with a monadic step function.
Pre-release
mkScanr :: Monad m => (a -> b -> b) -> b -> Scanl m a b Source #
Make a scan using a right fold style step function and a terminal value. It performs a strict right fold via a left fold using function composition. Note that a strict right fold can only be useful for constructing strict structures in memory. For reductions this will be very inefficient.
Definitions:
>>>
mkScanr f z = fmap (flip appEndo z) $ Scanl.foldMap (Endo . f)
>>>
mkScanr f z = fmap ($ z) $ Scanl.mkScanl (\g x -> g . f x) id
Example:
>>>
Stream.toList $ Stream.scanl (Scanl.mkScanr (:) []) $ Stream.enumerateFromTo 1 5
[[],[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4,5]]
Scans
Accumulators
Scans that never terminate, these scans are much like strict left
folds. mconcat
is the fundamental accumulator. All other accumulators
can be expressed in terms of mconcat
using a suitable Monoid. Instead
of writing scans we could write Monoids and turn them into scans.
sconcat :: (Monad m, Semigroup a) => a -> Scanl m a a Source #
Semigroup concat. Append the elements of an input stream to a provided starting value.
Definition:
>>>
sconcat = Scanl.mkScanl (<>)
>>>
semigroups = fmap Data.Monoid.Sum $ Stream.enumerateFromTo 1 3
>>>
Stream.toList $ Stream.scanl (Scanl.sconcat 3) semigroups
[Sum {getSum = 3},Sum {getSum = 4},Sum {getSum = 6},Sum {getSum = 9}]
mconcat :: (Monad m, Monoid a) => Scanl m a a Source #
Monoid concat. Fold an input stream consisting of monoidal elements using
mappend
and mempty
.
Definition:
>>>
mconcat = Scanl.sconcat mempty
>>>
monoids = fmap Data.Monoid.Sum $ Stream.enumerateFromTo 1 3
>>>
Stream.toList $ Stream.scanl Scanl.mconcat monoids
[Sum {getSum = 0},Sum {getSum = 1},Sum {getSum = 3},Sum {getSum = 6}]
foldMap :: (Monad m, Monoid b) => (a -> b) -> Scanl m a b Source #
Definition:
>>>
foldMap f = Scanl.lmap f Scanl.mconcat
Make a fold from a pure function that folds the output of the function
using mappend
and mempty
.
>>>
sum = Scanl.foldMap Data.Monoid.Sum
>>>
Stream.toList $ Stream.scanl sum $ Stream.enumerateFromTo 1 3
[Sum {getSum = 0},Sum {getSum = 1},Sum {getSum = 3},Sum {getSum = 6}]
foldMapM :: (Monad m, Monoid b) => (a -> m b) -> Scanl m a b Source #
Definition:
>>>
foldMapM f = Scanl.lmapM f Scanl.mconcat
Make a fold from a monadic function that folds the output of the function
using mappend
and mempty
.
>>>
sum = Scanl.foldMapM (return . Data.Monoid.Sum)
>>>
Stream.toList $ Stream.scanl sum $ Stream.enumerateFromTo 1 3
[Sum {getSum = 0},Sum {getSum = 1},Sum {getSum = 3},Sum {getSum = 6}]
drain :: Monad m => Scanl m a () Source #
A scan that drains all its input, running the effects and discarding the results.
>>>
drain = Scanl.drainMapM (const (return ()))
>>>
drain = Scanl.mkScanl (\_ _ -> ()) ()
length :: Monad m => Scanl m a Int Source #
Determine the length of the input stream.
Definition:
>>>
length = Scanl.genericLength
>>>
length = fmap getSum $ Scanl.foldMap (Sum . const 1)
countDistinct :: (Monad m, Ord a) => Scanl m a Int Source #
Count non-duplicate elements in the stream.
Definition:
>>>
countDistinct = fmap Set.size Scanl.toSet
>>>
countDistinct = Scanl.postscanl Scanl.nub $ Scanl.catMaybes $ Scanl.length
The memory used is proportional to the number of distinct elements in the stream, to guard against using too much memory use it as a scan and terminate if the count reaches more than a threshold.
Space: \(\mathcal{O}(n)\)
Pre-release
countDistinctInt :: Monad m => Scanl m Int Int Source #
Like countDistinct
but specialized to a stream of Int
, for better
performance.
Definition:
>>>
countDistinctInt = fmap IntSet.size Scanl.toIntSet
>>>
countDistinctInt = Scanl.postscanl Scanl.nubInt $ Scanl.catMaybes $ Scanl.length
Pre-release
sum :: (Monad m, Num a) => Scanl m a a Source #
Determine the sum of all elements of a stream of numbers. Returns additive
identity (0
) when the stream is empty. Note that this is not numerically
stable for floating point numbers.
>>>
sum = Scanl.cumulativeScan Scanl.incrSum
Same as following but numerically stable:
>>>
sum = Scanl.mkScanl (+) 0
>>>
sum = fmap Data.Monoid.getSum $ Scanl.foldMap Data.Monoid.Sum
product :: (Monad m, Num a, Eq a) => Scanl m a a Source #
Determine the product of all elements of a stream of numbers. Returns
multiplicative identity (1
) when the stream is empty. The fold terminates
when it encounters (0
) in its input.
Same as the following but terminates on multiplication by 0
:
>>>
product = fmap Data.Monoid.getProduct $ Scanl.foldMap Data.Monoid.Product
mean :: (Monad m, Fractional a) => Scanl m a a Source #
Compute a numerically stable arithmetic mean of all elements in the input stream.
rollingHash :: (Monad m, Enum a) => Scanl m a Int64 Source #
Compute an Int
sized polynomial rolling hash of a stream.
>>>
rollingHash = Scanl.rollingHashWithSalt Scanl.defaultSalt
rollingHashWithSalt :: (Monad m, Enum a) => Int64 -> Scanl m a Int64 Source #
Compute an Int
sized polynomial rolling hash
H = salt * k ^ n + c1 * k ^ (n - 1) + c2 * k ^ (n - 2) + ... + cn * k ^ 0
Where c1
, c2
, cn
are the elements in the input stream and k
is a
constant.
This hash is often used in Rabin-Karp string search algorithm.
toList :: Monad m => Scanl m a [a] Source #
Scans the input stream building a list.
Warning! working on large lists accumulated as buffers in memory could be very inefficient, consider using Streamly.Data.Array instead.
>>>
toList = Scanl.mkScanr (:) []
toListRev :: Monad m => Scanl m a [a] Source #
Buffers the input stream to a list in the reverse order of the input.
Definition:
>>>
toListRev = Scanl.mkScanl (flip (:)) []
Warning! working on large lists accumulated as buffers in memory could be very inefficient, consider using Streamly.Array instead.
toSet :: (Monad m, Ord a) => Scanl m a (Set a) Source #
Scan the input adding it to a set.
Definition:
>>>
toSet = Scanl.mkScanl (flip Set.insert) Set.empty
toIntSet :: Monad m => Scanl m Int IntSet Source #
Scan the input adding it to an int set. For integer inputs this performs
better than toSet
.
Definition:
>>>
toIntSet = Scanl.mkScanl (flip IntSet.insert) IntSet.empty
topBy :: (MonadIO m, Unbox a) => (a -> a -> Ordering) -> Int -> Scanl m a (MutArray a) Source #
Get the top n
elements using the supplied comparison function.
To get bottom n elements instead:
>>>
bottomBy cmp = Scanl.topBy (flip cmp)
Example:
>>>
stream = Stream.fromList [2::Int,7,9,3,1,5,6,11,17]
>>>
Stream.toList (Stream.scanl (Scanl.topBy compare 3) stream) >>= mapM MutArray.toList
[[],[17],[17,11],[17,11,9],[17,11,9],[17,11,9],[17,11,9],[17,11,9],[17,11,9],[17,11,9]]
Pre-release
Non-Empty Accumulators
Accumulators that do not have a default value, therefore, return
Nothing
on an empty stream.
latest :: Monad m => Scanl m a (Maybe a) Source #
Returns the latest element of the input stream, if any.
>>>
latest = Scanl.mkScanl1 (\_ x -> x)
>>>
latest = fmap getLast $ Scanl.foldMap (Last . Just)
maximumBy :: Monad m => (a -> a -> Ordering) -> Scanl m a (Maybe a) Source #
Determine the maximum element in a stream using the supplied comparison function.
maximum :: (Monad m, Ord a) => Scanl m a (Maybe a) Source #
Determine the maximum element in a stream.
Definitions:
>>>
maximum = Scanl.maximumBy compare
>>>
maximum = Scanl.mkScanl1 max
Same as the following but without a default maximum. The Max
Monoid uses
the minBound
as the default maximum:
>>>
maximum = fmap Data.Semigroup.getMax $ Scanl.foldMap Data.Semigroup.Max
minimumBy :: Monad m => (a -> a -> Ordering) -> Scanl m a (Maybe a) Source #
Computes the minimum element with respect to the given comparison function
minimum :: (Monad m, Ord a) => Scanl m a (Maybe a) Source #
Determine the minimum element in a stream using the supplied comparison function.
Definitions:
>>>
minimum = Scanl.minimumBy compare
>>>
minimum = Scanl.mkScanl1 min
Same as the following but without a default minimum. The Min
Monoid uses the
maxBound
as the default maximum:
>>>
maximum = fmap Data.Semigroup.getMin $ Scanl.foldMap Data.Semigroup.Min
Filtering Scanners
Accumulators that are usually run as a scan using the potscanlMaybe
combinator.
findIndices :: Monad m => (a -> Bool) -> Scanl m a (Maybe Int) Source #
Returns the index of the latest element if the element satisfies the given predicate.
elemIndices :: (Monad m, Eq a) => a -> Scanl m a (Maybe Int) Source #
Returns the index of the latest element if the element matches the given value.
Definition:
>>>
elemIndices a = Scanl.findIndices (== a)
deleteBy :: Monad m => (a -> a -> Bool) -> a -> Scanl m a (Maybe a) Source #
Returns the latest element omitting the first occurrence that satisfies the given equality predicate.
Example:
>>>
input = Stream.fromList [1,3,3,5]
>>>
Stream.toList $ Stream.postscanlMaybe (Scanl.deleteBy (==) 3) input
[1,3,5]
uniqBy :: Monad m => (a -> a -> Bool) -> Scanl m a (Maybe a) Source #
Return the latest unique element using the supplied comparison function.
Returns Nothing
if the current element is same as the last element
otherwise returns Just
.
Example, strip duplicate path separators:
>>>
input = Stream.fromList "//a//b"
>>>
f x y = x == '/' && y == '/'
>>>
Stream.toList $ Stream.postscanlMaybe (Scanl.uniqBy f) input
"/a/b"
Space: O(1)
Pre-release
Terminating Folds
Transformations
Transformations are modifiers of scans. In the type Scan m a b
, a
is the input type and b
is the output type. Transformations can be
applied either on the input side (contravariant) or on the output side
(covariant). Therefore, transformations have one of the following
general shapes:
... -> Scanl m a b -> Scanl m c b
(input transformation)... -> Scanl m a b -> Scanl m a c
(output transformation)
The input side transformations are more interesting for scans. Most of
the following sections describe the input transformation operations on a
scan. When an operation makes sense on both input and output side we use
the prefix l
(for left) for input side operations and the prefix r
(for right) for output side operations.
Mapping on output
The Functor
instance of a scan maps on the output of the scan:
>>>
Stream.toList $ Stream.scanl (fmap show Scanl.sum) (Stream.enumerateFromTo 1 10)
["0","1","3","6","10","15","21","28","36","45","55"]
rmapM :: Monad m => (b -> m c) -> Scanl m a b -> Scanl m a c Source #
Map a monadic function on the output of a fold.
Mapping on Input
lmap :: (a -> b) -> Scanl m b r -> Scanl m a r Source #
lmap f scan
maps the function f
on the input of the scan.
Definition:
>>>
lmap = Scanl.lmapM return
Example:
>>>
sumSquared = Scanl.lmap (\x -> x * x) Scanl.sum
>>>
Stream.toList $ Stream.scanl sumSquared (Stream.enumerateFromTo 1 10)
[0,1,5,14,30,55,91,140,204,285,385]
lmapM :: Monad m => (a -> m b) -> Scanl m b r -> Scanl m a r Source #
lmapM f scan
maps the monadic function f
on the input of the scan.
Filtering
filter :: Monad m => (a -> Bool) -> Scanl m a r -> Scanl m a r Source #
Include only those elements that pass a predicate.
>>>
Stream.toList $ Stream.scanl (Scanl.filter (> 5) Scanl.sum) $ Stream.fromList [1..10]
[0,0,0,0,0,0,6,13,21,30,40]
>>>
filter p = Scanl.postscanlMaybe (Scanl.filtering p)
>>>
filter p = Scanl.filterM (return . p)
>>>
filter p = Scanl.mapMaybe (\x -> if p x then Just x else Nothing)
filterM :: Monad m => (a -> m Bool) -> Scanl m a r -> Scanl m a r Source #
Like filter
but with a monadic predicate.
>>>
f p x = p x >>= \r -> return $ if r then Just x else Nothing
>>>
filterM p = Scanl.mapMaybeM (f p)
mapMaybe :: Monad m => (a -> Maybe b) -> Scanl m b r -> Scanl m a r Source #
mapMaybe f fold
maps a Maybe
returning function f
on the input of
the fold, filters out Nothing
elements, and return the values extracted
from Just
.
>>>
mapMaybe f = Scanl.lmap f . Scanl.catMaybes
>>>
mapMaybe f = Scanl.mapMaybeM (return . f)
>>>
f x = if even x then Just x else Nothing
>>>
scn = Scanl.mapMaybe f Scanl.toList
>>>
Stream.toList $ Stream.scanl scn (Stream.enumerateFromTo 1 10)
[[],[],[2],[2],[2,4],[2,4],[2,4,6],[2,4,6],[2,4,6,8],[2,4,6,8],[2,4,6,8,10]]
catEithers :: Scanl m a b -> Scanl m (Either a a) b Source #
Remove the either wrapper and flatten both lefts and as well as rights in the output stream.
Definition:
>>>
catEithers = Scanl.lmap (either id id)
Pre-release
Trimming
take :: Monad m => Int -> Scanl m a b -> Scanl m a b Source #
Take at most n
input elements and scan them using the supplied scan. A
negative count is treated as 0.
>>>
Stream.toList $ Stream.scanl (Scanl.take 2 Scanl.toList) $ Stream.fromList [1..10]
[[],[1],[1,2]]
takeEndBy :: Monad m => (a -> Bool) -> Scanl m a b -> Scanl m a b Source #
Take the input, stop when the predicate succeeds taking the succeeding element as well.
Example:
>>>
input = Stream.fromList "hello\nthere\n"
>>>
line = Scanl.takeEndBy (== '\n') Scanl.toList
>>>
Stream.toList $ Stream.scanl line input
["","h","he","hel","hell","hello","hello\n"]
takeEndBy_ :: Monad m => (a -> Bool) -> Scanl m a b -> Scanl m a b Source #
Like takeEndBy
but drops the element on which the predicate succeeds.
Example:
>>>
input = Stream.fromList "hello\nthere\n"
>>>
line = Scanl.takeEndBy_ (== '\n') Scanl.toList
>>>
Stream.toList $ Stream.scanl line input
["","h","he","hel","hell","hello","hello"]
Key-value Scanners
classify :: (MonadIO m, Ord k) => (a -> k) -> Scanl m a b -> Scanl m a (Maybe (k, b)) Source #
Scans the values for each key using the supplied scan.
Once the scan for a key terminates, any future values of the key are ignored.
Equivalent to the following except that the scan is not restarted:
>>>
classify f fld = Scanl.demux f (const fld)
classifyIO :: (MonadIO m, Ord k) => (a -> k) -> Scanl m a b -> Scanl m a (Maybe (k, b)) Source #
Same as classify except that it uses mutable IORef cells in the Map, providing better performance.
Equivalent to the following except that the scan is not restarted:
>>>
classifyIO f fld = Scanl.demuxIO f (const fld)
Transforming the Monad
morphInner :: (forall x. m x -> n x) -> Scanl m a b -> Scanl n a b Source #
Change the underlying monad of a scan. Also known as hoist.
Pre-release
Combinators
Transformations that combine two or more scans.
Scanning
Parallel Distribution
The Applicative
instance distributes the input to both scans.
teeWith :: Monad m => (b -> c -> d) -> Scanl m a b -> Scanl m a c -> Scanl m a d Source #
teeWith k f1 f2
distributes its input to both f1
and f2
until any
one of them terminates. The outputs of the two scans are combined using the
function k
.
Definition:
>>>
teeWith k f1 f2 = fmap (uncurry k) (Scanl.tee f1 f2)
Example:
>>>
avg = Scanl.teeWith (/) Scanl.sum (fmap fromIntegral Scanl.length)
>>>
Stream.toList $ Stream.postscanl avg $ Stream.fromList [1.0..10.0]
[1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5]
Note that nested applications of teeWith do not fuse.
Pre-release
tee :: Monad m => Scanl m a b -> Scanl m a c -> Scanl m a (b, c) Source #
Distribute one copy of the stream to each scan and zip the results.
|-------Scanl m a b--------| ---stream m a---| |---m (b,c) |-------Scanl m a c--------|
Definition:
>>>
tee = Scanl.teeWith (,)
Example:
>>>
t = Scanl.tee Scanl.sum Scanl.length
>>>
Stream.toList $ Stream.scanl t (Stream.enumerateFromTo 1.0 10.0)
[(0.0,0),(1.0,1),(3.0,2),(6.0,3),(10.0,4),(15.0,5),(21.0,6),(28.0,7),(36.0,8),(45.0,9),(55.0,10)]
distribute :: Monad m => [Scanl m a b] -> Scanl m a [b] Source #
Distribute one copy of the stream to each fold and collect the results in a container.
|-------Scanl m a b--------| ---stream m a---| |---m [b] |-------Scanl m a b--------| | | ...
>>>
Stream.toList $ Stream.scanl (Scanl.distribute [Scanl.sum, Scanl.length]) (Stream.enumerateFromTo 1 5)
[[0,0],[1,1],[3,2],[6,3],[10,4],[15,5]]
>>>
distribute = Prelude.foldr (Scanl.teeWith (:)) (Scanl.const [])
This is the consumer side dual of the producer side sequence
operation.
Stops as soon as any of the scans stop.
Partitioning
Direct items in the input stream to different scans using a binary scan selector.
Unzipping
unzip :: Monad m => Scanl m a x -> Scanl m b y -> Scanl m (a, b) (x, y) Source #
Send the elements of tuples in a stream of tuples through two different folds.
|-------Scanl m a x--------| ---------stream of (a,b)--| |----m (x,y) |-------Scanl m b y--------|
Definition:
>>>
unzip = Scanl.unzipWith id
This is the consumer side dual of the producer side zip
operation.
Dynamic Combinators
The scan to be used is generated dynamically based on the input.
Key-value Scanners
demux :: (Monad m, Ord k) => (a -> k) -> (k -> m (Maybe (Scanl m a b))) -> Scanl m a (Maybe (k, b)) Source #
demux getKey getScan
: In a key value stream, scan values corresponding
to each key using a key specific scan. getScan
is invoked to generate a
key specific scan when a key is encountered for the first time in the
stream. If a scan does not exist corresponding to the key then Nothing
is
returned otherwise the result of the scan is returned.
If a scan terminates, another instance of the scan is started upon receiving
an input with that key, getScan
is invoked again whenever the key is
encountered again.
This can be used to scan a stream, splitting it based on different keys.
Since the scan generator function is monadic we can add scans dynamically. For example, we can maintain a Map of keys to scans in an IORef and lookup the scan from that corresponding to a key. This Map can be changed dynamically, scans for new keys can be added or scans for old keys can be deleted or modified.
Compare with classify
, the scan in classify
is a static scan.
Pre-release