Haskell conduit example app

Haskell Streaming 2022-08-14

I was playing around with conduit, a streaming library for Haskell. I wanted to build an example app that streams a file of newline-separated numbers, maps over each number (in my example I just double it; the actual arithmetic isn't important) and streams the results to an output file. The point is that the whole program should run in constant memory, even though it processes a file containing 1 million numbers.

For an introduction I recommend this talk by Michael Snoyman, the author of the library.

The pipeline

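The streaming pipeline itself is pretty simple: read the input file in chunks, split it into lines, parse each line as a number, double it, render it back to bytes and write it to the output file.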
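To see what each stage does without touching the filesystem, the same stages can be run over in-memory data. This is a sketch rather than part of the original app: `doubleLines` is a hypothetical name, `yieldMany` and `sinkList` stand in for `sourceFile` and `sinkFile`, and the sample input is deliberately split into chunks that cut a line in half, to show that `linesUnboundedAscii` reassembles lines across chunk boundaries.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.Conduit.Combinators as CC

-- Hypothetical helper: the app's stages, fed from a list of ByteString
-- chunks and collected into a list so the result is easy to inspect.
doubleLines :: [BS.ByteString] -> [BS.ByteString]
doubleLines chunks = runConduitPure $
  yieldMany chunks                                    -- stand-in for sourceFile
  .| CC.linesUnboundedAscii                           -- one element per line
  .| mapC (read . BS8.unpack :: BS.ByteString -> Int) -- parse each line
  .| mapC (* 2)                                       -- the transformation
  .| mapC (BS8.pack . show)                           -- render back to bytes
  .| sinkList                                         -- stand-in for sinkFile

main :: IO ()
main = print (doubleLines ["1\n2", "\n3\n"])
```

Running it prints `["2","4","6"]`: the "2" that was split across the two chunks still comes out as a single line.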

Generate input file

python3 -c 'print("\n".join(map(str, range(1_000_000))))' > input-data.txt
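If you would rather stay in Haskell, the input file can also be generated with conduit itself. A minimal sketch, assuming the conduit and bytestring packages are available: `enumFromToC` yields the numbers one at a time, so this should also run in constant memory, and it writes each number followed by a newline, the same bytes the Python one-liner produces.

```haskell
module Main where

import Conduit
import qualified Data.ByteString.Char8 as BS8

-- Generate input-data.txt with conduit instead of Python:
-- the numbers 0 .. 999999, each followed by a newline.
main :: IO ()
main = runConduitRes $
  enumFromToC 0 (999999 :: Int)             -- yield the numbers one by one
  .| mapC (\n -> BS8.pack (show n ++ "\n")) -- render each number as a line
  .| sinkFile "input-data.txt"              -- write the bytes as they arrive
```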

The final program

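We need the conduit, bytestring and text packages as dependencies. The `import: warnings` line refers to the `common warnings` stanza that `cabal init` generates.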

executable haskell-conduit-example
    import:           warnings
    main-is:          Main.hs
    build-depends:    base ^>=4.14.3.0,
                      conduit ^>=1.3.4.2,
                      bytestring ^>=0.11.3.1,
                      text ^>=1.2.5
    hs-source-dirs:   app
    default-language: Haskell2010

The final example app can look something like this.

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Conduit
import qualified Data.Conduit.Combinators as CC
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

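-- Parse one line of input. This goes through Text and String, and `read`
-- throws an exception if the line is not a valid Int.
byteStringToInt :: BS.ByteString -> Int
byteStringToInt = read . T.unpack . T.decodeUtf8

-- Render an Int back to UTF-8 bytes (without the trailing newline).
intToByteString :: Int -> BS.ByteString
intToByteString = T.encodeUtf8 . T.pack . show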

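main :: IO ()
main = runConduitRes $
  sourceFile "input-data.txt"            -- stream the file in ByteString chunks
  .| CC.linesUnboundedAscii              -- re-chunk into one element per line
  .| mapC byteStringToInt                -- parse each line
  .| mapC (* 2)                          -- the actual transformation
  .| mapC ((<> "\n") . intToByteString)  -- render and re-add the newline
  .| sinkFile "output-data.txt"          -- write chunks as they arrive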
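The Text round trip in `byteStringToInt` works, but it converts every line to a `String`, and `read` crashes the whole program on the first malformed line. One possible alternative, sketched here as an assumption rather than something the original app does, parses directly with `Data.ByteString.Char8.readInt` and skips lines that fail to parse; `parseInt` is a hypothetical helper name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.Conduit.Combinators as CC

-- Hypothetical helper: parse a whole line as an Int with readInt.
-- Returns Nothing for an empty line or trailing garbage instead of throwing.
parseInt :: BS.ByteString -> Maybe Int
parseInt bs = case BS8.readInt bs of
  Just (n, rest) | BS.null rest -> Just n
  _                             -> Nothing

main :: IO ()
main = runConduitRes $
  sourceFile "input-data.txt"
  .| CC.linesUnboundedAscii
  .| concatMapC parseInt                    -- Nothing yields no element
  .| mapC (* 2)
  .| mapC (\n -> BS8.pack (show n) <> "\n") -- render and re-add the newline
  .| sinkFile "output-data.txt"
```

Whether to skip, log or abort on a bad line is a design choice: `concatMapC` over a `Maybe` silently drops the line, which may hide problems in the input.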