Haskell conduit example app
2022-08-14
I was playing around with conduit, the Haskell streaming library. I wanted to create an example app that streams a file of newline-separated numbers, applies a mapping to each number (in my example I just double it; the actual arithmetic operation isn't important), and streams the result to an output file. The point is that the whole program should run in constant memory even though it processes a file containing 1 million numbers.
For an introduction, I recommend this talk by Michael Snoyman, the author of the library.
The pipeline
The streaming pipeline itself is pretty simple.
- We stream the input file as ByteString chunks (using sourceFile).
- We regroup those chunks into lines (using linesUnboundedAscii).
- Downstream, each line is converted from ByteString to Int (this is actually unsafe because I'm using the Prelude's read function, which throws on malformed input; don't do that in a production app!).
- The next stage does the arithmetic transformation, and the result is converted from Int back to ByteString.
- The output is streamed to the output file (using sinkFile).
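Since the conversion step relies on read, here is a minimal sketch of what a safer parse could look like. It assumes readInt from Data.ByteString.Char8 and mapMaybe from Data.Conduit.List; the parseInt helper is a hypothetical name, not part of the original app, and silently dropping bad lines is just one possible policy.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.Conduit.List as CL

-- | Hypothetical helper: parse a whole line as a decimal Int.
-- Returns Nothing if the line is not a number or has trailing bytes.
parseInt :: BS.ByteString -> Maybe Int
parseInt bs = case BS8.readInt bs of
  Just (n, rest) | BS.null rest -> Just n
  _                             -> Nothing

main :: IO ()
main = print . runConduitPure $
     yieldMany (["1", "oops", "21"] :: [BS.ByteString])
  .| CL.mapMaybe parseInt   -- lines that fail to parse are dropped
  .| mapC (* 2)
  .| sinkList
-- prints [2,42]
```

In a real app you would probably want to report the bad line instead of skipping it, but the sketch shows that the stage can stay total without changing the rest of the pipeline.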
Generate input file
python -c 'print("\n".join(list(map(str, range(0, 1_000_000)))))' > input-data.txt
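If you would rather stay in Haskell, the same file can be generated with conduit itself. This is only a sketch under the same assumptions as the command above (numbers 0 to 999999, one per line, trailing newline); numberLine is a hypothetical helper name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8

-- | Hypothetical helper: render one number as an ASCII line,
-- newline included.
numberLine :: Int -> BS.ByteString
numberLine n = BS8.pack (show n) <> "\n"

main :: IO ()
main = runConduitRes $
     yieldMany [0 .. 999999 :: Int]   -- the list is consumed lazily
  .| mapC numberLine
  .| sinkFile "input-data.txt"
```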
The final program
We need the conduit, bytestring and text dependencies.
executable haskell-conduit-example
    import:           warnings
    main-is:          Main.hs
    build-depends:    base ^>=4.14.3.0,
                      conduit ^>=1.3.4.2,
                      bytestring ^>=0.11.3.1,
                      text ^>=1.2.5
    hs-source-dirs:   app
    default-language: Haskell2010
The final example app can look something like this.
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.Conduit.Combinators as CC
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

-- Unsafe: `read` throws on input that is not a valid Int.
byteStringToInt :: BS.ByteString -> Int
byteStringToInt = read . T.unpack . T.decodeUtf8

intToByteString :: Int -> BS.ByteString
intToByteString = T.encodeUtf8 . T.pack . show

main :: IO ()
main = runConduitRes $
     sourceFile "input-data.txt"
  .| CC.linesUnboundedAscii              -- regroup chunks into lines
  .| mapC byteStringToInt
  .| mapC (* 2)
  .| mapC ((<> "\n") . intToByteString)  -- restore the line ending
  .| sinkFile "output-data.txt"
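To convince myself that the line splitting works even when a number is cut in half by a chunk boundary, the middle of the pipeline can be run purely on in-memory chunks. This is a rough sketch: doubleLines is a hypothetical name for the stages between sourceFile and sinkFile, and I use Data.ByteString.Char8 here instead of text just to keep it short.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.Conduit.Combinators as CC

-- | Hypothetical helper: the pipeline without any file I/O.
-- Still uses the unsafe `read`, as in the example app.
doubleLines :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
doubleLines =
     CC.linesUnboundedAscii
  .| mapC (read . BS8.unpack)
  .| mapC (* (2 :: Int))
  .| mapC ((<> "\n") . BS8.pack . show)

main :: IO ()
main = do
  -- "20" is deliberately split across the two chunks.
  let chunks = ["1\n2", "0\n30\n"] :: [BS.ByteString]
      out    = runConduitPure (yieldMany chunks .| doubleLines .| foldC)
  BS8.putStr out
-- prints:
-- 2
-- 40
-- 60
```

The same doubleLines conduit could be placed between sourceFile and sinkFile, since it is polymorphic in the monad.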