Purely Functional Sliding Window Aggregation Algorithm

byorgey.github.io

70 points by agnishom 4 months ago

mlochbaum 4 months ago

The queue method is popular, but there's a much faster (branch-free) and in my opinion simpler way, known as the van Herk/Gil-Werman algorithm in image processing. It splits the input into windows and pairs up a backward scan on one window with a forward scan on the next. This works for any associative function. I was very surprised when I learned about it that it's not taught more often (the name's not doing it any favors)! And I wrote a tutorial page on it for my SIMD-oriented language, mostly about vectorizing it which I didn't quite finish writing up, but with what I think is a reasonable presentation in the first part: https://github.com/mlochbaum/Singeli/blob/master/doc/minfilt...

I also found an interesting streaming version here recently: https://signalsmith-audio.co.uk/writing/2022/constant-time-p...

EDIT: On closer inspection, this method is equivalent to the one I described, and not the one I'm used to seeing with queues (that starts my tutorial). The stack-reversing step is what forms a backwards scan. The combination of turning it sequential by taking in one element at a time but then expressing this in functional programming makes for a complicated presentation, I think.

farazbabar 4 months ago

This is similar to an approach I use but instead of a queue, I accomplish this using a ring buffer that wraps around and overwrites entries older than window size. We maintain a global window aggregate, subtract ring buffer slot aggregate for entries dropping out and accumulate new entries into new slot aggregate while adding it to the global aggregate. Everything is o(1) including reads, which just returns the global window aggregate.

packetlost 4 months ago

I've implemented something similar! Except it was persistent and intended for non-volatile flash media. Was a lot of fun to implement.
gotoeleven 4 months ago

How does this work for max rather than sum?
- throwaway_1more 4 months ago
  
  Max is not like sum, you can just maintain one value over the window and update from new ones arriving immediately?
  - BoiledCabbage 4 months ago
    
    Yes, but how do you update your max when you drop old values. That's the issue with max forming a monoid and not a group.
    The whole point of the post is that this is easy to implement for sum, but is difficult for max. Posting how someone solves the problem for sum isn't really addressing anything new here.
    
    throwaway_1more 4 months ago
    
    If the value you dropped wasn't the max, no problem, if it was, you recompute over the window. It is amortized.

codebje 4 months ago

That was a well written and easily approachable blog post on what I found to be an interesting topic. Aside from the topic itself, I think I also learned a bit about structuring technical articles.

agnishom 4 months ago

This is a very interesting algorithm which is more or less known in the folklore, but is still relatively obscure. I have used it as a part of temporal logic monitoring procedure: https://github.com/Agnishom/lattice-mtl/blob/master/src/Moni...

Groxx 4 months ago

Not sure I'd call it obscure: https://leetcode.com/problems/maximum-subarray/
I've seen it in tech interviews for years.
- JohnKemeny 4 months ago
  
  The blog post discusses something else than just sum.
  Given an n length array of integers, and an integer k, output the max value for each k sized contiguous subarray.
  sum is much easier than max.
  - yorwba 4 months ago
    
    The Leetcode question for sliding window maximum is https://leetcode.com/problems/sliding-window-maximum/ Which can be solved using a specialized implementation for max.
    But the article even discusses a generic solution for any monoid.
    Either sum or max is easier than sum and max and a bunch of other operations all at the same time.

GistNoesis 4 months ago

I have made a quick c++ implementation for those unfamiliar with Haskell :

https://gist.github.com/unrealwill/5ca4db9beefafaa212465277b...

brianberns 4 months ago

I translated this to F# for my own edification. It's more verbose, but perhaps easier to understand for non-Haskellers.

https://github.com/brianberns/AnnotatedStack

belter 4 months ago

Competitive Programming in Haskell...I can only define this as Masochistic Aesthetics...

tikhonj 4 months ago

Counterpoint: competitions are games, games are about fun, Haskell is fun.
itishappy 4 months ago

Maybe. Anybody have a monoidally annotated queue in C++ to compare?
- GistNoesis 4 months ago
  
  See my above comment https://news.ycombinator.com/item?id=43161105
  - itishappy 4 months ago
    
    Nice, thanks!
javcasas 4 months ago

Uncommon solulutions are done better with uncommon tools.
- belter 4 months ago
  
  Competitive programming demands tight control over execution time and memory attributes best served by languages that offer strict evaluation and low-level data manipulation. Haskell has lazy evaluation, what can lead to unpredictable performance and space leaks. Monads are abstraction layers...
  - javcasas 4 months ago
    
    > Monads are abstraction layers...
    So is everything. BTW, read the article. No monads in there, unless you believe monoids are monads.
    
    belter 4 months ago
    
    My argument is about the practical trade-offs in competitive programming and Haskell being the wrong vehicle for it.
    
    javcasas 4 months ago
    
    Well, it depends on how you measure competitive programming. I think for measuring it in O() terms - as every paper out there on data structure does -, Haskell is quite good for the task.