Haskell avoiding stack overflow in folds without sacrificing performance -


the following piece of code experiences stack overflow large inputs:

{-# language derivedatatypeable, overloadedstrings #-} import qualified data.bytestring.lazy.char8 l   gentweets :: l.bytestring -> l.bytestring gentweets text | l.null text = ""                | otherwise = l.intercalate "\n\n" $ gentweets' $ l.words text   gentweets' txt = foldr p [] txt           p word [] = [word]                 p word words@(w:ws) | l.length word + l.length w <= 139 =                                         (word `l.append` " " `l.append` w):ws                                     | otherwise = word:words 

i assume predicate building list of thunks, i'm not sure why, or how fix it.

the equivalent code using foldl' runs fine, takes forever, since appends constantly, , uses ton of memory.

import data.list (foldl')  gentweetsstrict :: l.bytestring -> l.bytestring gentweetsstrict text | l.null text = ""                       | otherwise = l.intercalate "\n\n" $ gentweetsstrict' $ l.words text   gentweetsstrict' txt = foldl' p [] txt           p [] word = [word]                 p words word | l.length word + l.length (last words) <= 139 =                                 init words ++ [last words `l.append` " " `l.append` word]                              | otherwise = words ++ [word] 

what causing first snippet build thunks, , can avoided? possible write second snippet doesn't rely on (++)?

l.length word + l.length (last words) <= 139 

this problem. on each iteration, you're traversing accumulator list, , then

init words ++ [last words `l.append` " " `l.append` word] 

appending @ end. going take long time (proportional length of accumulator list). better solution generate output list lazily, interleaving processing reading input stream (you don't need read whole input output first 140-character tweet).

the following version of program processes relatively large file (/usr/share/dict/words) in under 1 second time, while using o(1) space:

{-# language overloadedstrings, bangpatterns #-}  module main  import qualified data.bytestring.lazy.char8 l import data.int (int64)  gentweets :: l.bytestring -> l.bytestring gentweets text | l.null text = ""                | otherwise   = l.intercalate "\n\n" $ totweets $ l.words text        -- concatenate words 139-character tweets.     totweets :: [l.bytestring] -> [l.bytestring]     totweets []     = []     totweets [w]    = [w]     totweets (w:ws) = go (l.length w, w) ws      -- main loop. notice how output tweet (cur_str) generated     -- possible, enabling l.writefile consume before whole     -- input processed.     go :: (int64, l.bytestring) -> [l.bytestring] -> [l.bytestring]     go (_cur_len, !cur_str) []     = [cur_str]     go (!cur_len, !cur_str) (w:ws)       | lw + cur_len <= 139        = go (cur_len + lw + 1,                                          cur_str `l.append` " " `l.append` w) ws       | otherwise                  = cur_str : go (lw, w) ws               lw = l.length w  -- notice use of lazy i/o. main :: io () main = dict <- l.readfile "/usr/share/dict/words"           l.writefile "tweets" (gentweets dict) 

Comments

Popular posts from this blog

html - How to style widget with post count different than without post count -

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

IIS->Tomcat Redirect: multiple worker with default -