There is a great mean shift segmentation implementation in EDISON (Edge Detection and Image SegmentatiON system).

But it has some disadvantages:

  • The optimized version (HIGH_SPEEDUP) is not tuned for modern CPUs and runs even slower than the non-optimized one (NO_SPEEDUP)
  • No optimization for multi-core CPUs
  • No implementation for GPUs

So the implementation was modified with the following goals:

  • Results should be as close as possible to the original version (NO_SPEEDUP)
  • Multithreaded version for multi-core CPUs (OpenMP)
  • Adaptation for GPUs (OpenCL)
  • Ability to run on multiple GPUs and the CPU together (just for fun)
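Mean shift filtering parallelizes well because each output pixel is computed only from the read-only input image. The sketch below illustrates that per-pixel independence with an OpenMP loop. It is a simplified illustration, not EDISON's code. It assumes a grayscale image, a flat kernel over the joint spatial-range domain normalized by bandwidths `hs` and `hr`, and hypothetical names `Image` and `meanShiftFilter`. If OpenMP is not enabled (for example with `-fopenmp` in GCC/Clang), the pragma is ignored and the loop runs serially.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical grayscale image: row-major, one float intensity per pixel.
struct Image {
    int width = 0, height = 0;
    std::vector<float> data;
    float at(int x, int y) const { return data[y * width + x]; }
};

// Simplified mean shift filter (hypothetical, not EDISON's kernel):
// flat kernel over the joint (x, y, value) space, normalized by
// spatial bandwidth hs (pixels) and range bandwidth hr (intensity units).
Image meanShiftFilter(const Image& in, int hs, float hr,
                      int maxIter = 20, float eps = 1e-3f) {
    Image out = in;
    const int w = in.width, h = in.height;
    // Pixels are independent, so rows can be split across threads.
    // Dynamic scheduling because convergence time varies per pixel.
    #pragma omp parallel for schedule(dynamic)
    for (int y0 = 0; y0 < h; ++y0) {
        for (int x0 = 0; x0 < w; ++x0) {
            // Start the mode search at the pixel itself.
            float cx = x0, cy = y0, cv = in.at(x0, y0);
            for (int it = 0; it < maxIter; ++it) {
                // Clamp the spatial search window to the image bounds.
                int xs = std::max(0, (int)std::floor(cx - hs));
                int xe = std::min(w - 1, (int)std::ceil(cx + hs));
                int ys = std::max(0, (int)std::floor(cy - hs));
                int ye = std::min(h - 1, (int)std::ceil(cy + hs));
                float sx = 0, sy = 0, sv = 0;
                int n = 0;
                for (int y = ys; y <= ye; ++y) {
                    for (int x = xs; x <= xe; ++x) {
                        float dx = (x - cx) / hs, dy = (y - cy) / hs;
                        float dv = (in.at(x, y) - cv) / hr;
                        // Keep samples inside the unit ball of the normalized joint space.
                        if (dx * dx + dy * dy + dv * dv <= 1.0f) {
                            sx += x; sy += y; sv += in.at(x, y); ++n;
                        }
                    }
                }
                if (n == 0) break;  // empty window: keep the current estimate
                float nx = sx / n, ny = sy / n, nv = sv / n;
                // Normalized mean shift vector = window mean minus current estimate.
                float mx = (nx - cx) / hs, my = (ny - cy) / hs, mv = (nv - cv) / hr;
                cx = nx; cy = ny; cv = nv;
                if (mx * mx + my * my + mv * mv < eps) break;  // converged
            }
            // Filtering assigns the range component of the mode to the pixel.
            out.data[y0 * w + x0] = cv;
        }
    }
    return out;
}
```

Each thread writes only its own output pixels and reads only the input, so the multithreaded result is identical to the serial one. This matches the "as close to the original as possible" goal as far as this simplified filter is concerned.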

The resulting implementation, with benchmarks on multiple CPUs and GPUs, is open-sourced on GitHub.

Example result:

Unicorn from Blade Runner