... is nice :)
Test scene of 10,000 bridges, set up in a 100x100 grid. Red objects have been determined to be visible by core 0, green objects by core 1:
Freezing the scene and zooming out better shows the frustum, and what's going on:
The setup is a basic Parallel-For, where a bunch of threads in a pool sit around until they can steal work off a queue. Stealing work, instead of using a dispatcher or splitting the work up into N batches in advance, has the advantage that threads which run slower than expected for some reason automatically do less work. This is also why I chose this setup instead of, e.g., one core doing main scene culling and another core doing shadow frustum culling.
Another view:
You can clearly see the streaks of 16 objects that a thread processes in one go. You want to grab a number of items from the work queue at once to keep the overhead low, but not too many, because then you risk one thread gobbling up too much work while everybody else sits idle.
16 is probably too low, but using fancy-pants lock-free programming it's still high enough that the speedup is quite noticeable:
Depending on the point of view, the speedup is between 25% and 65%.
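For the curious, here is a rough sketch of the batch-grabbing idea in C++. It is not the actual engine code: `parallelFor` and its parameters are names made up for illustration, it spawns fresh threads per call instead of keeping a pool around, and it assumes the "queue" can be reduced to a single atomic index into an array of objects. Each thread claims the next batch with one `fetch_add`, so no lock is needed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parallel-for: every worker repeatedly claims the next
// `batchSize` indices from a shared atomic counter until [0, count) is used up.
// fn(i, threadIndex) is called exactly once for each index i.
template <typename Fn>
void parallelFor(std::size_t count, std::size_t batchSize,
                 unsigned threadCount, Fn fn) {
    assert(batchSize > 0 && threadCount > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&](unsigned threadIndex) {
        for (;;) {
            // One atomic add claims a whole batch; a slow thread simply
            // comes back less often and so ends up with fewer batches.
            const std::size_t begin =
                next.fetch_add(batchSize, std::memory_order_relaxed);
            if (begin >= count) break;
            const std::size_t end = std::min(begin + batchSize, count);
            for (std::size_t i = begin; i < end; ++i) fn(i, threadIndex);
        }
    };

    // Assumption: threads are created per call to keep the sketch short;
    // a real setup would reuse a pool of waiting threads.
    std::vector<std::thread> threads;
    for (unsigned t = 1; t < threadCount; ++t) threads.emplace_back(worker, t);
    worker(0);  // the calling thread takes part as well
    for (auto& th : threads) th.join();
}
```

In a culling pass, `fn` would run the frustum test for object `i` and store the result (plus `threadIndex`, if you want the red/green debug colouring) in a per-object slot, so threads never write to the same element. Calling it with `batchSize` set to 16 gives the streaks of 16 described above; a larger value means fewer atomic operations but a coarser split of the work.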