C++ rounding style
According to Cppreference - float_round_style, there are 4 available rounding styles used in C++ floating-point arithmetics.
Name | Definition |
---|---|
std::round_indeterminate |
Rounding style cannot be determined. |
std::round_toward_zero |
Rounding toward zero. |
std::round_to_nearest |
Rounding toward nearest representable value, i.e. round half to even. It is also known as convergent rounding. |
std::round_toward_infinity |
Rounding toward positive infinity. |
std::round_toward_neg_infinity |
Rounding toward negative infinity. |
I’m not sure why there is no std::round_away_from_zero
i.e. rounding toward ±infinity, which is what std::round
does. IEEE_754 defines all 5 rounding rules.
According to Cppreference - FE_round, floating-point to integer implicit conversion and casts always round toward zero. Meanwhile, integer to floating-point casts usually round to nearest.
Results of floating-point arithmetic operators in expressions executed at compile time always round to nearest.
The rounding style of the library functions std::nearbyint
, std::rint
, std::lrint
can be set, but the rounding style of std::round
, std::lround
, std::llround
, std::ceil
, std::floor
, std::trunc
cannot be set.
Sometimes one may see std::floor(x + 0.5)
being used as a rounding function, which does rounding toward positive infinity. Meanwhile, std::ceil(x - 0.5)
does rounding toward negative infinity.
Also, for std::fmod(x, y)
which returns x - n * y
, where n
is rounded toward zero. Meanwhile, for std::remainder(x, y)
, where n
is rounded to nearest, ties to even.
By the way, NumPy does rounding to the nearest even value.
For Xilinx FPGAs, HLS ap_fixed
datatype provides:
Name | Definition |
---|---|
AP_RND |
Rounding toward positive infinity. |
AP_RND_ZERO |
Rounding toward zero. |
AP_RND_MIN_INF |
Rounding toward negative infinity. |
AP_RND_INF |
Rounding toward ±infinity. |
AP_RND_CONV |
Convergent rounding. |
AP_TRN |
Truncation to negative infinity (default). |
AP_TRN_ZERO |
Truncation to zero. |
According to the UG902 manual, quantization and overflow modes that do more than the default behavior of standard hardware arithmetic (wrap and truncate) result in operators with more associated hardware. It costs logic (LUTs) to implement the more advanced modes, such as round to minus infinity or saturate symmetrically.