Be careful with subnormal floats
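Subnormal (denormal) floating-point numbers are nonzero values whose magnitude is smaller than the smallest normal value; for float that means anything below std::numeric_limits<float>::min(), about 1.17549e-38. Many CPUs handle operations on them through a much slower path. Below is a timing of a simple loop that accumulates f * 0.1f; you can see that once subnormal values are involved, performance drops by a factor of 10 or more.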
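To make the boundary concrete, here is a minimal sketch that names the category of a float using std::fpclassify from <cmath>. The helper names float_category and check_known_values are made up for this illustration; they are not part of the original test program.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper: returns a readable name for the category of x.
std::string float_category(float x) {
    switch (std::fpclassify(x)) {
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        case FP_INFINITE:  return "infinite";
        case FP_NAN:       return "nan";
        default:           return "unknown";
    }
}

// Hypothetical self-check on a few well-known values.
void check_known_values() {
    const float smallest_normal = std::numeric_limits<float>::min();
    // The smallest normal float is still normal...
    assert(float_category(smallest_normal) == "normal");
    // ...but half of it is already subnormal.
    assert(float_category(smallest_normal / 2.0f) == "subnormal");
    // The smallest positive float of all is subnormal too.
    assert(float_category(std::numeric_limits<float>::denorm_min()) == "subnormal");
}
```

Note that ordinary small values such as 0.1f are perfectly normal; only values below std::numeric_limits<float>::min() fall into the subnormal range.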
http://coliru.stacked-crooked.com/a/6ffe763ab4b5ada0
// Example code for testing subnormal float efficiency
#include <chrono>
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    float f = 2.0f;
    // Value-initialize so the first ratio check reads zero, not garbage.
    std::chrono::steady_clock::duration prevdiff{};
    for (int k = 0; k < 10000; ++k) {
        float sum = .0f;
        auto start = std::chrono::steady_clock::now();
        for (int n = 0; n < 1e7; ++n) {
            sum += f * 0.1f; // interesting: simple summation alone shows no performance degradation
        }
        auto end = std::chrono::steady_clock::now();
        auto diff = end - start;
        std::cout << k << ", f=" << f
                  << ", isnormal=" << std::isnormal(f)
                  << ", elapsed=" << std::chrono::duration_cast<std::chrono::milliseconds>(diff).count() << "ms"
                  << ", ratio=" << (prevdiff.count() == 0 ? 0 : (diff.count() / prevdiff.count()))
                  << std::endl;
        prevdiff = diff;
        // Stop once f itself has dropped below the smallest normal float.
        if (f < std::numeric_limits<float>::min()) break;
        f /= 2.0f;
    }
}
Here is the interesting part of the output:
123, f=1.88079e-37, isnormal=1, elapsed=69ms, ratio=1
124, f=9.40395e-38, isnormal=1, elapsed=799ms, ratio=11 <-------- performance drops here
125, f=4.70198e-38, isnormal=1, elapsed=800ms, ratio=1
126, f=2.35099e-38, isnormal=1, elapsed=800ms, ratio=0
127, f=1.17549e-38, isnormal=1, elapsed=796ms, ratio=0
128, f=5.87747e-39, isnormal=0, elapsed=985ms, ratio=1
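What's interesting is that this drop happens while std::isnormal(f) still returns true. The reason is that the loop works with f * 0.1f, not f: from row 124 onward that product falls below std::numeric_limits<float>::min() (about 1.17549e-38), so it is already subnormal even though f is not. On x86 with SSE you can use _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); to make the CPU flush subnormal results to zero instead of computing with them.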
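A minimal sketch of both points, assuming an x86 target where float math goes through SSE (the default on x86-64) and where <xmmintrin.h> is available. ScopedFlushToZero, runtime_product and demo_flush_to_zero are hypothetical helpers written for this illustration, not library APIs. The sketch takes the value of f from row 124, checks that f is normal while f * 0.1f is subnormal, and then shows that the same product becomes zero once flush-to-zero is enabled.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE (x86 SSE only)

// Multiplies at run time; volatile keeps the compiler from folding the
// product at compile time, where the CPU's flush-to-zero flag has no effect.
float runtime_product(float a, float b) {
    volatile float x = a;
    volatile float y = b;
    volatile float r = x * y;
    return r;
}

// Hypothetical RAII guard: enables flush-to-zero for the current thread
// and restores the previous mode when it goes out of scope.
class ScopedFlushToZero {
public:
    ScopedFlushToZero() : saved_(_MM_GET_FLUSH_ZERO_MODE()) {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    }
    ~ScopedFlushToZero() { _MM_SET_FLUSH_ZERO_MODE(saved_); }
    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;
private:
    unsigned int saved_;
};

void demo_flush_to_zero() {
    const float f = 9.40395e-38f;  // value of f on row 124 of the output
    assert(std::isnormal(f));      // f itself is still normal
    // The product is below the smallest normal float, so it is subnormal.
    assert(std::fpclassify(runtime_product(f, 0.1f)) == FP_SUBNORMAL);
    {
        ScopedFlushToZero ftz;
        // With flush-to-zero on, the subnormal result is replaced by zero.
        assert(runtime_product(f, 0.1f) == 0.0f);
    }
    // The previous mode is restored when the guard is destroyed.
    assert(std::fpclassify(runtime_product(f, 0.1f)) == FP_SUBNORMAL);
}
```

Keep in mind that the flag lives in the MXCSR register, so it is a per-thread setting, and that flushing changes numerical results: values that would have been tiny but nonzero become exactly zero. The guard restores the old mode so that the change stays limited to the code that needs it.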
This post is based on the following Stack Overflow question: http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x