Optimizing std::bitset with noexcept - c++

I have some code which needs to check whether a bit in a bit field is set. I've been impressed with how well std::bitset::count compiles down, so I thought this would be a good use for it as well. I'm surprised that when I use bitset::test, the compiler still generates exception-handling code, even when I add noexcept.
I have other checks to ensure that I won't test a bit which is out of range. Is there anything else I can do to optimize this code?
#include <bitset>
#include <cstddef>

bool bitset_test(int n, size_t m) noexcept {
    // The caller already verifies m is within range
    return std::bitset<64>(n).test(m);
}
Compiled output: https://godbolt.org/g/uRggXD

Is there anything else I can do to optimize this code?
Yes! Just like other containers, std::bitset has functions to access a specific bit with and without bounds checking: test is the bounds-checked one, and it throws std::out_of_range for an invalid index.
But if you have other tests to make sure that m is not too high, then you can use std::bitset's operator[], which doesn't have any exception handling in the resulting assembly, because it assumes that you are passing a correct value:
bool bitset_test(int n, size_t m) noexcept {
    return std::bitset<64>(n)[m];
}
As others have mentioned, the compiler still generates exception-handling code because noexcept does not mean the function body cannot throw; it means that any exception escaping the function must result in a call to std::terminate. The compiler therefore has to keep enough machinery to make that call, rather than simply ignoring the exception.
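A related sketch (not from the answers above, and assuming a GCC/Clang-style optimizer): if you repeat the bounds check locally, the optimizer can often prove the throwing path inside test() unreachable and drop it. bitset_test_checked is a hypothetical name for illustration; whether the throw is actually elided depends on the compiler and optimization level.
#include <bitset>
#include <cstddef>

bool bitset_test_checked(int n, std::size_t m) noexcept {
    if (m >= 64) return false;           // makes the out-of-range path provably dead
    return std::bitset<64>(n).test(m);   // the optimizer may now elide the throw
}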

Related

Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?

Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful for writing micro-benchmarks that the optimizer won't simply nuke into meaninglessness.
void clobber() {
    asm volatile("" : : : "memory");
}

void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}
These use inline assembly statements to change the assumptions of the optimizer.
The assembly statement in clobber states that the assembly code in it can read and write anywhere in memory. The actual assembly code is empty, but the optimizer won't look into it because it's asm volatile. It believes it when we tell it the code might read and write everywhere in memory. This effectively prevents the optimizer from reordering or discarding memory writes prior to the call to clobber, and forces memory reads after the call to clobber†.
The one in escape additionally makes the pointer p visible to the assembly block. Again, because the optimizer won't look into the actual inline assembly, that code can be empty, and the optimizer will still assume that the block uses the address pointed to by p. This effectively forces whatever p points to to be in memory and not in a register, because the assembly block might perform a read from that address.
(This is important because the clobber function won't force reads nor writes for anything that the compilers decides to put in a register, since the assembly statement in clobber doesn't state that anything in particular must be visible to the assembly.)
All of this happens without any additional code being generated directly by these "barriers". They are purely compile-time artifacts.
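For context, a typical micro-benchmark loop using them looks roughly like this (a sketch in the spirit of the talk; the vector workload and iteration count are illustrative):
#include <vector>

void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
void clobber()       { asm volatile("" : : : "memory"); }

void bench_push_back() {
    for (int i = 0; i < 1000000; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());   // the allocation must now actually happen
        v.push_back(42);
        clobber();          // the write of 42 must now actually happen
    }
}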
These use language extensions supported in GCC and in Clang, though. Is there a way to have similar behaviour when using MSVC?
† To understand why the optimizer has to think this way, imagine if the assembly block were a loop adding 1 to every byte in memory.
Given your approximation of escape(), you should also be fine with the following approximation of clobber(). Note that this is a draft idea that defers part of the solution to the implementation of nextLocationToClobber():
// always returns false, but in an undeducible way
bool isClobberingEnabled();

// The challenge is to implement this function in a way
// that will make even the smartest optimizer believe that
// it can deliver a valid pointer pointing anywhere in the heap,
// the stack or the static memory.
volatile char* nextLocationToClobber();

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if ( clobberingIsEnabled ) {
        // This will never be executed, but the compiler
        // cannot know about it.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}
UPDATE
Question: How would you ensure that isClobberingEnabled returns false "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?
Answer: We can take advantage of a hard-to-prove property from number theory, for example, Fermat's Last Theorem:
#include <cstdint>
#include <cstdlib>
#include <ctime>

bool undeducible_false() {
    // It took mathematicians more than three centuries to prove Fermat's
    // Last Theorem in its most general form. That knowledge has hardly
    // been put into compilers (and no compiler will try hard enough to
    // check all one million possible combinations below).
    // Caveat: avoid integer overflow (Fermat's theorem
    // doesn't hold for modulo arithmetic).
    std::uint32_t a = std::clock() % 100 + 1;
    std::uint32_t b = std::rand() % 100 + 1;
    std::uint32_t c = reinterpret_cast<std::uintptr_t>(&a) % 100 + 1;
    return a*a*a + b*b*b == c*c*c;
}
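Wiring it into the sketch above is then just (same translation unit assumed):
const bool clobberingIsEnabled = undeducible_false();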
I have used the following in place of escape.
#ifdef _MSC_VER
#pragma optimize("", off)
template <typename T>
inline void escape(T* p) {
    *reinterpret_cast<char volatile*>(p) =
        *reinterpret_cast<char const volatile*>(p);  // thanks, @milleniumbug
}
#pragma optimize("", on)
#endif
It's not perfect but it's close enough, I think.
Sadly, I don't have a way to emulate clobber.

Is there optimization for constant function return value inside a loop? [duplicate]

This question already has answers here:
Is Loop Hoisting still a valid manual optimization for C code?
(8 answers)
Closed 7 years ago.
My question is specifically about the gcc compiler.
Often, inside a loop, I have to use a value returned by a function that is constant during the whole loop.
I would like to know whether it's better to store this constant return value in a variable beforehand (imagine a long loop), or whether a compiler like gcc is able to cache the constant value itself, because it recognizes it as constant throughout the loop.
For example when I loop over chars in a string, I often write something like that:
#include <cstddef>
#include <string>
using std::string;

bool find_something(string s, char something)
{
    size_t sz = s.size();
    for (size_t i = 0; i != sz; i++)
        if (s[i] == something) return true;
    return false;
}
but with a clever compiler, I could use the following (which is shorter and clearer):
bool find_something(string s, char something)
{
    for (size_t i = 0; i != s.size(); i++)
        if (s[i] == something) return true;
    return false;
}
then the compiler could detect that the code inside the loop doesn't change the string object, and would generate code that caches the value returned by s.size() instead of making a (slower) function call on each iteration.
Is there such optimization with gcc?
Generally there's nothing in your example that makes it impossible for the compiler to move the .size() computation before the loop. And in fact GCC 5.2.0 will produce exactly the same code for both implementations that you showed.
I would however strongly suggest against relying on optimizations like that (in really performance-critical code), because a small change somewhere (GCC's optimizer, the implementation details of std::string, ...) could break GCC's ability to do this optimization.
However I don't see a point in writing the more verbose version in the usual 90% of code that's not really performance-critical.
Given a current C++ compiler, though, I would go for the even more concise:
bool find_something(std::string s, char something)
{
    for (char ch : s)
        if (ch == something) return true;
    return false;
}
Which BTW also yields very similar machine code with GCC 5.2.0.
The compiler has to know the object is not modified in a different thread. It can tell that the function won't change if the object doesn't change, but it can't tell that the object won't change from some other stimulus.
The compiler will inline the call to the size member if you enable some form of whole-program optimization.
It depends on whether or not the compiler can identify the function call as constant. Consider the following function that may reside in an external library which cannot be analyzed by the compiler.
int odd_size(string s) {
    static int a = 0;
    return a++;
}
This function will return distinct values regardless of the input parameter. The compiler can therefore not assume a constant return value even if the passed string object remains constant. No optimization will be applied.
On the other hand, if the compiler detects a constant function call, which may be the case in your example, it probably moves the constant expression out of the loop.
Older versions of gcc had an explicit option -floop-optimize which was responsible for that task. From the gcc-3.4.5 documentation:
-floop-optimize
Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well.
Enabled at levels -O, -O2, -O3, -Os.
I cannot find this option in current versions of gcc, but I'm quite sure that they include this type of optimization as well.

use of assert and static assert functions

I am trying to understand the use of static_assert and assert and what the difference is between them; however, there are very few sources/explanations about this.
here is some code
// ConsoleApplication3.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <conio.h>
#include <cassert>
#include <iostream>

int main()
{
    assert(2+2==4);
    std::cout << "Execution continues past the first assert\n";
    assert(2+2==5);
    std::cout << "Execution continues past the second assert\n";
    _getch();
}
Comments on redundancy will be appreciated (since I am learning "how to C++").
output in cmd
Execution continues past the first assert
Assertion failed: 2+2==5, file c:\users\charles\documents\visual studio 2012\projects\consoleapplication3\consoleapplication3\consoleapplication3.cpp, line 14
I have been trying to find out the different methods and uses of it, but as far as I understand it is a runtime check and another "type" of if statement.
Could someone clarify what each one does and explain the difference?
You can think of assertions as sanity checks. You know that some condition should be true unless you've screwed something up, so the assertion should pass. If you have indeed screwed something up, the assertion will fail and you'll be told that something is wrong. It's just there to ensure the validity of your code.
A static_assert can be used when the condition is a constant expression. This basically means that the compiler is able to evaluate the assertion before the program ever actually runs. You will be alerted that a static_assert has failed at compile-time, whereas a normal assert will only fail at run time. In your example, you could have used a static_assert, because the expressions 2+2==4 and 2+2==5 are both constant expressions.
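For instance (a minimal sketch), the failing check would then stop the build instead of aborting at run time:
static_assert(2 + 2 == 4, "first check");   // compiles fine
static_assert(2 + 2 == 5, "second check");  // error: static assertion failed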
static_asserts are useful for checking compile-time constructs such as template parameters. For example, you could assert that a given template argument T must be a POD type with something like:
static_assert(std::is_pod<T>::value, "T must be a POD type");
Note that you generally only want run-time assertions to be checked during debugging, so you can disable assert by defining NDEBUG.
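For example (a sketch; the build commands are shown as comments):
// g++ -DNDEBUG main.cpp  -> assert() expands to nothing, no run-time check
// g++ main.cpp           -> assert() aborts the program if the condition is false
#include <cassert>

int main() {
    assert(2 + 2 == 5);  // only checked when NDEBUG is not defined
}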
assert() is a macro whose expansion depends on whether macro NDEBUG is defined. If so, assert() doesn't expand to anything - it's a no-op. When NDEBUG is not defined, assert(x) expands into a runtime check, something like this:
if (!x) cause_runtime_to_abort()
It is common to define NDEBUG in "release" builds and leave it undefined in "debug" builds. That way, assert()s are only executed in debug code and do not make it into release code at all. You normally use assert() to check things which should always be true - a function's preconditions and postconditions or a class's invariants, for example. A failed assert(x) should mean "the programmer thought that x holds, but a bug somewhere in the code (or in their reasoning) made that untrue."
static_assert() (which was introduced in C++11) is a keyword - similar to e.g. typedef. It can only be used on compile-time expressions, and if it fails, it results in a compilation error. It does not result in any object code and is not executed at all.
static_assert() is primarily useful in templates, if you want to prevent instantiating a template incorrectly. For example:
#include <type_traits>

template <class IntType>
IntType foo(IntType x)
{
    static_assert(std::is_integral<IntType>::value,
                  "foo() may be used with integral types only.");
    // rest of the code
}
That way, trying to call foo() with e.g. a float will result in a compile-time error with a sensible message.
Occasionally, static_assert() can also be useful outside of templates, e.g. like this:
static_assert(sizeof(void*) > 4, "This code does not work in 32 bits");
A static_assert is evaluated at compile time; an assert is evaluated at runtime.
I doubt that there really are no sources on this, but I'll nevertheless give some explanation.
assert is for runtime checks which should never ever fail. If they fail, the program will terminate ungracefully. You can disable assertions for a release build by specifying a compile time flag. As assertions should never fail (they are assertions after all), in a working program, that should not make any difference. However, the expressions inside an assert must not have side effects, because there is no guarantee that they are executed. Example:
#include <cassert>
#include <unistd.h>  // read()

unsigned int add_non_negative_numbers(int a, int b)
{
    // good
    assert(a >= 0);
    assert(b >= 0);
    return (unsigned int)a + (unsigned int)b;
}

void checked_read(int fd, char *buffer, int count)
{
    // BAD: if assertions are disabled, read() will _not_ be called
    assert(read(fd, buffer, count) == count);
    /* Better, though still not perfect, because read() can legitimately fail:
    int read_bytes = read(fd, buffer, count);
    assert(read_bytes == count);
    */
}
static_assert is new (C++11) and can be used to perform compile-time checks. A failing static_assert produces a compiler error and blocks the program from compiling:
static_assert(sizeof(char) == 1, "Compiler violates the standard");

What does static_assert do, and what would you use it for?

Could you give an example where static_assert(...) (C++11) would solve the problem at hand elegantly?
I am familiar with run-time assert(...). When should I prefer static_assert(...) over regular assert(...)?
Also, in boost there is something called BOOST_STATIC_ASSERT, is it the same as static_assert(...)?
Static assert is used to make assertions at compile time. When the static assertion fails, the program simply doesn't compile. This is useful in different situations, like, for example, if you implement some functionality in code that critically depends on an unsigned int object having exactly 32 bits. You can put a static assert like this
static_assert(sizeof(unsigned int) * CHAR_BIT == 32, "unsigned int must have exactly 32 bits");  // CHAR_BIT is in <climits>
in your code. On another platform, with differently sized unsigned int type the compilation will fail, thus drawing attention of the developer to the problematic portion of the code and advising them to re-implement or re-inspect it.
For another example, you might want to pass some integral value as a void * pointer to a function (a hack, but useful at times) and you want to make sure that the integral value will fit into the pointer
int i;
static_assert(sizeof(void *) >= sizeof i, "the integral value must fit into a pointer");
foo((void *) i);
You might want to assert that the char type is signed
static_assert(CHAR_MIN < 0, "char must be a signed type");  // CHAR_MIN is in <climits>
or that integral division with negative values rounds towards zero
static_assert(-5 / 2 == -2, "integer division must round towards zero");
And so on.
Run-time assertions can in many cases be used instead of static assertions, but run-time assertions only work at run time and only when control passes over the assertion. For this reason a failing run-time assertion may lie dormant, undetected, for extended periods of time.
Of course, the expression in a static assertion has to be a compile-time constant. It can't be a run-time value. For run-time values you have no other choice but to use the ordinary assert.
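A minimal side-by-side sketch of the two:
#include <cassert>

void process(int runtime_value) {
    static_assert(sizeof(int) >= 4, "int is too narrow");  // checked at compile time
    assert(runtime_value > 0);  // checked at run time, and only if control gets here
}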
Off the top of my head...
#include "SomeLibrary.h"
static_assert(SomeLibrary::Version > 2,
"Old versions of SomeLibrary are missing the foo functionality. Cannot proceed!");
class UsingSomeLibrary {
// ...
};
Assuming that SomeLibrary::Version is declared as a static const, rather than being #defined (as one would expect in a C++ library).
Contrast with having to actually compile SomeLibrary and your code, link everything, and run the executable only then to find out that you spent 30 minutes compiling an incompatible version of SomeLibrary.
@Arak, in response to your comment: yes, you can have static_assert just sitting out wherever, from the look of it:
class Foo
{
public:
    static const int bar = 3;
};

static_assert(Foo::bar > 4, "Foo::bar is too small :(");

int main()
{
    return Foo::bar;
}
$ g++ --std=c++0x a.cpp
a.cpp:7: error: static assertion failed: "Foo::bar is too small :("
I use it to ensure my assumptions about compiler behaviour, headers, libs and even my own code are correct. For example here I verify that the struct has been correctly packed to the expected size.
#include <boost/static_assert.hpp>

#pragma pack(push, 1)
struct LogicalBlockAddress
{
    Uint32 logicalBlockNumber;        // Uint32/Uint16: project typedefs for fixed-width integers
    Uint16 partitionReferenceNumber;
};
#pragma pack(pop)
BOOST_STATIC_ASSERT(sizeof(LogicalBlockAddress) == 6);
In a class wrapping stdio.h's fseek(), I have taken some shortcuts with enum Origin and check that those shortcuts align with the constants defined by stdio.h
uint64_t BasicFile::seek(int64_t offset, enum Origin origin)
{
    BOOST_STATIC_ASSERT(SEEK_SET == Origin::SET);
    // ...
You should prefer static_assert over assert when the behaviour is defined at compile time, and not at runtime, such as the examples I've given above. An example where this is not the case would include parameter and return code checking.
BOOST_STATIC_ASSERT is a pre-C++0x macro that generates illegal code if the condition is not satisfied. The intentions are the same, although static_assert is standardised and may provide better compiler diagnostics.
BOOST_STATIC_ASSERT is a cross platform wrapper for static_assert functionality.
Currently I am using static_assert in order to enforce "Concepts" on a class.
example:
#include <boost/static_assert.hpp>
#include <boost/type_traits/is_base_of.hpp>
#include <limits>

template <typename T, typename U>
struct Type
{
    BOOST_STATIC_ASSERT(boost::is_base_of<Interface, T>::value);  // T must derive from Interface
    BOOST_STATIC_ASSERT(std::numeric_limits<U>::is_integer);
    /* ... more code ... */
};
This will cause a compile time error if any of the above conditions are not met.
One use of static_assert might be to ensure that a structure (that is an interface with the outside world, such as a network or file) is exactly the size that you expect. This would catch cases where somebody adds or modifies a member from the structure without realising the consequences. The static_assert would pick it up and alert the user.
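A minimal sketch of that pattern (the struct, its fields, and the expected size are hypothetical):
#include <cstdint>

struct WireHeader {                  // hypothetical record shared with the outside world
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
};
static_assert(sizeof(WireHeader) == 8,
              "WireHeader no longer matches the external layout");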
In the absence of concepts one can use static_assert for simple and readable compile-time type checking, for example, in templates:
#include <type_traits>

template <class T>
void MyFunc(T value)
{
    static_assert(std::is_base_of<MyBase, T>::value,
                  "T must be derived from MyBase");
    // ...
}
This doesn't directly answer the original question, but it makes an interesting study of how to enforce these compile-time checks prior to C++11.
Chapter 2 (Section 2.1) of Modern C++ Design by Andrei Alexandrescu implements this idea of compile-time assertions like this:
template<int> struct CompileTimeError;
template<>    struct CompileTimeError<true> {};

#define STATIC_CHECK(expr, msg) \
    { CompileTimeError<((expr) != 0)> ERROR_##msg; (void)ERROR_##msg; }
Compare the macro STATIC_CHECK() and static_assert()
STATIC_CHECK(0, COMPILATION_FAILED);
static_assert(0, "compilation failed");
To add on to all the other answers, it can also be useful when using non-type template parameters.
Consider the following example.
Let's say you want to define some kind of function whose particular functionality can be somewhat determined at compile time, such as a trivial function below, which returns a random integer in the range determined at compile time. You want to check, however, that the minimum value in the range is less than the maximum value.
Without static_assert, you could do something like this:
#include <cstdlib>    // srand, rand
#include <ctime>      // time
#include <iostream>
#include <random>
#include <stdexcept>  // std::invalid_argument

template <int min, int max>
int get_number() {
    if constexpr (min >= max) {
        throw std::invalid_argument("Min. val. must be less than max. val.\n");
    }
    srand(time(nullptr));
    static std::uniform_int_distribution<int> dist{min, max};
    std::mt19937 mt{(unsigned int) rand()};
    return dist(mt);
}
If min < max, all is fine and the if constexpr branch gets rejected at compile time. However, if min >= max, the program still compiles, but now you have a function that, when called, will throw an exception with 100% certainty. Thus, in the latter case, even though the "error" (of min being greater than or equal to max) was present at compile-time, it will only be discovered at run-time.
This is where static_assert comes in.
Since static_assert is evaluated at compile-time, if the boolean constant expression it is testing is evaluated to be false, a compile-time error will be generated, and the program will not compile.
Thus, the above function can be improved as so:
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <random>

template <int min, int max>
int get_number() {
    static_assert(min < max, "Min. value must be less than max. value.\n");
    srand(time(nullptr));
    static std::uniform_int_distribution<int> dist{min, max};
    std::mt19937 mt{(unsigned int) rand()};
    return dist(mt);
}
Now, if the function template is instantiated with a value for min that is greater than or equal to max, static_assert will evaluate its boolean constant expression as false and produce a compile-time error, alerting you to the error immediately, without giving an exception the opportunity to occur at run time.
(Note: the above method is just an example and should not be used for generating random numbers. Repeated calls in quick succession will generate the same numbers, because the seed passed to the std::mt19937 constructor through rand() stays the same for as long as time(nullptr) returns the same value. Also, the range of values generated by std::uniform_int_distribution is a closed interval, so the same value can be passed to its constructor for the upper and lower bounds, though there wouldn't be much point.)
The static_assert can be used to forbid the use of the delete keyword this way:
#define delete static_assert(0, "The keyword \"delete\" is forbidden.");
Every modern C++ developer may want to do that if he or she wants to use a conservative garbage collector, using only classes and structs that overload operator new to invoke a function that allocates memory on the collector's conservative heap; the collector itself is initialized by invoking some function that does this at the beginning of the main function.
For example, every modern C++ developer who wants to use the Boehm-Demers-Weiser conservative garbage collector will write at the beginning of the main function:
GC_init();
And in every class and struct overload the operator new this way:
void* operator new(size_t size)
{
    return GC_malloc(size);  // GC_malloc comes from the collector's gc.h header
}
And now that operator delete is not needed anymore, because the Boehm-Demers-Weiser conservative garbage collector is responsible for freeing and deallocating every block of memory once it is no longer needed, the developer wants to forbid the delete keyword.
One way is overloading the delete operator this way:
void operator delete(void* ptr)
{
    assert(0);
}
But this is not recommended, because then the modern C++ developer would only learn at run time that he/she mistakenly invoked the delete operator; it is better to learn this sooner, at compile time.
So the best solution to this scenario in my opinion is to use the static_assert as shown in the beginning of this answer.
Of course this can also be done with BOOST_STATIC_ASSERT, but I think that static_assert is better and should generally be preferred.

Which, if any, C++ compilers do tail-recursion optimization?

It seems to me that it would work perfectly well to do tail-recursion optimization in both C and C++, yet while debugging I never seem to see a frame stack that indicates this optimization. That is kind of good, because the stack tells me how deep the recursion is. However, the optimization would be kind of nice as well.
Do any C++ compilers do this optimization? Why? Why not?
How do I go about telling the compiler to do it?
For MSVC: /O2 or /Ox
For GCC: -O2 or -O3
How about checking if the compiler has done this in a certain case?
For MSVC, enable PDB output to be able to trace the code, then inspect the code
For GCC..?
I'd still take suggestions for how to determine if a certain function is optimized like this by the compiler (even though I find it reassuring that Konrad tells me to assume it)
It is always possible to check whether the compiler does this at all by writing an infinite recursion and checking whether it results in an infinite loop or a stack overflow (I did this with GCC and found out that -O2 is sufficient), but I want to be able to check a certain function that I know will terminate anyway. I'd love to have an easy way of checking this :)
After some testing, I discovered that destructors ruin the possibility of making this optimization. It can sometimes be worth it to change the scoping of certain variables and temporaries to make sure they go out of scope before the return-statement starts.
If any destructor needs to be run after the tail-call, the tail-call optimization can not be done.
All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:
int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}
Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:
For MSVC, use /O2 or /Ox.
For GCC, Clang and ICC, use -O3
An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.
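For example, a quick sketch of the stack-overflow check (the function and the argument are illustrative):
// With the optimisation this compiles into a loop and returns;
// without it, a sufficiently large n overflows the stack.
long count_down(long n) {
    if (n == 0) return 0;
    return count_down(n - 1);   // tail call
}
// e.g. call count_down(100000000) and see whether it crashes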
As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.
As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.
Given something like:
int fn(int j, int i)
{
    if (i <= 0) return j;
    Funky cls(j, i);      // Funky: some class with a non-trivial destructor
    return fn(j, i - 1);
}
The compiler can't (in general) tail-call optimize this, because it needs to call the destructor of cls after the recursive call returns.
Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.
A particularly common form of this is where Funky is actually a std::vector or similar.
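A sketch of the scoping workaround mentioned in the question's update (assuming the destructor has no effects the later code depends on): end the object's lifetime before the return, so no destructor is pending after the tail call.
int fn(int j, int i)
{
    if (i <= 0) return j;
    {
        Funky cls(j, i);
    }                         // cls is destroyed here ...
    return fn(j, i - 1);      // ... so this call can be a tail call
}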
gcc 4.3.2 completely inlines this function (a crappy/trivial atoi() implementation) into main(). Optimization level is -O1. I notice that if I play around with it (even changing it from static to extern), the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.
#include <stdio.h>

static int atoi(const char *str, int n)
{
    if (str == 0 || *str == 0)
        return n;
    return atoi(str + 1, n * 10 + *str - '0');
}

int main(int argc, char **argv)
{
    for (int i = 1; i != argc; ++i)
        printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
    return 0;
}
Most compilers don't do any kind of optimisation in a debug build.
If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...
As Greg mentions, compilers won't do it in debug mode. It's OK for debug builds to be slower than a prod build, but they shouldn't crash more often: and if you depend on a tail call optimization, they may do exactly that. Because of this it is often best to rewrite the tail call as a normal loop. :-(