Related
Here's two example. Set float values on a vector every time within a loop:
static constexpr float kFMScale = 10.0f / k2PIf; // somewhere...
for (int i = 0; i < numValues; i++) {
paramKnob.v = _mm_mul_ps(_mm_sub_ps(paramKnob.v - _mm_set1_ps(0.5f)), _mm_set1_ps(2.0f));
paramKnob.v = _mm_mul_ps(paramKnob.v, _mm_set1_ps(kFMScale));
// code...
}
or do it once and "reload" the same register every time:
static constexpr float kFMScale = 10.0f / k2PIf; // somewhere...
__m128 v05 = _mm_set1_ps(0.5f);
__m128 v2 = _mm_set1_ps(2.0f);
__m128 vFMScale = _mm_set1_ps(kFMScale);
for (int i = 0; i < numValues; i++) {
paramKnob.v = _mm_mul_ps(_mm_sub_ps(paramKnob.v, v05), v2);
paramKnob.v = _mm_mul_ps(paramKnob.v, vFMScale);
// code...
}
which one is generally the best and suited approch? I'll bet on the second, but vectorization most of the time fool me.
And what if I use them as const in the whole project instead of "within a block"? "load everywhere" instead of "set everywhere" would be better?
i think its all about cache missing rather than latency/throughput used by the operations.
I'm on a windows/64 bit machine, using FLAGS += -O3 -march=nocona -funsafe-math-optimizations
In general, for best performance you want to make the loop body as concise as possible and move any code invariant to the iteration out of the loop body. Whether a given compiler is able to do it for you depends on its ability to perform this kind of optimization and on the currently enabled optimization level.
Modern gcc and clang versions are typically smart enough to convert _mm_set1_ps and similar intrinsics with constant arguments to in-memory constants, so even without hoisting this results in a fairly efficient binary code. On the other hand, MSVC is not very smart in SIMD optimizations.
As a rule of thumb I would still recommend to move constant generation (except for all-zero or all-ones constants) out of the loop body. There are several reasons to do so, even if your compiler is capable to do so on its own:
You are not relying on the compiler to do the optimization, which makes your code more portable across compilers. The code will also perform more similarly between different optimization levels, which may be useful.
Moving the constant out of the loop body may convince the compiler to pre-load the constant before entering the loop instead of referencing it in-memory inside the loop. Again, this is subject to compiler's optimization capabilities and current optimization level.
The constants can be reused in multiple places, including multiple functions. This reduces the binary size and makes your code more cache-friendly in run time. (Some linkers are able to merge equivalent constants when linking the binary, but this feature is not universally supported, it is subject to the optimization level, and in some cases makes the code non-compliant to the C/C++ standards, which can affect code correctness. For this reason, this feature is normally disabled by default, even when supported.)
If you are going to declare a constant in namespace scope, it is highly recommended to use an aligned array and/or a union to ensure static initialization. Compilers may not be able to perform static initialization if you use intrinsics to initialize the constant. You can use something like the following:
template< typename T >
union mm_constant128
{
T as_array[sizeof(__m128i) / sizeof(T)];
__m128i as_m128i;
__m128 as_m128;
__m128d as_m128d;
operator __m128i () const noexcept { return as_m128i; }
operator __m128 () const noexcept { return as_m128; }
operator __m128d () const noexcept { return as_m128d; }
};
constexpr mm_constant128< float > k05 = {{ 0.5f, 0.5f, 0.5f, 0.5f }};
void foo()
{
// Use as follows:
_mm_sub_ps(paramKnob.v, k05)
}
If you are targeting a specific compiler at a given optimization level, you can inspect the generated assembler code to see whether the compiler is doing good enough optimization on its own.
Or do this:
Declare a compile-time constant __m128 outside your function, preferably in a "constants" namespace.
inline constexpr __m128 _val = {0.5f,0.5f,0.5f,0.5f};
//Your function here
This will save you from using the _mm_set1_ps function, which does not evaluate to 1 instruction, it is a rather inefficient sequence of instructions.
Since templates are resolved at compile-time, there will be little to no performance difference between this and the other answer. However, I think this code is significantly simpler.
Consider the following code snippet
#include <vector>
#include <cstdlib>
void __attribute__ ((noinline)) calculate1(double& a, int x) { a += x; };
void __attribute__ ((noinline)) calculate2(double& a, int x) { a *= x; };
void wrapper1(double& a, int x) { calculate1(a, x); }
void wrapper2(double& a, int x) { calculate2(a, x); }
typedef void (*Func)(double&, int);
int main()
{
std::vector<std::pair<double, Func>> pairs = {
std::make_pair(0, (rand() % 2 ? &wrapper1 : &wrapper2)),
std::make_pair(0, (rand() % 2 ? &wrapper1 : &wrapper2)),
};
for (auto& [a, wrapper] : pairs)
(*wrapper)(a, 5);
return pairs[0].first + pairs[1].first;
}
With -O3 optimization the latest gcc and clang versions do not optimize the pointers to wrappers to pointers to underlying functions. See assembly here at line 22:
mov ebp, OFFSET FLAT:wrapper2(double&, int) # tmp118,
which results later in call + jmp, instead of just call had the compiler put a pointer to the calculate1 instead.
Note that I specifically asked for no-inlined calculate functions to illustrate; doing it without noinline results in another flavour of non-optimization where compiler will generate two identical functions to be called by pointer (so still won't optimize, just in a different fashion).
What am I missing here? Is there any way to guide the compiler short of manually plugging in the correct functions (without wrappers)?
Edit 1. Following suggestions in the comments, here is a disassembly with all functions declared static, with exactly the same result (call + jmp instead of call).
Edit 2. Much simpler example of the same pattern:
#include <vector>
#include <cstdlib>
typedef void (*Func)(double&, int);
static void __attribute__ ((noinline)) calculate(double& a, int x) { a += x; };
static void wrapper(double& a, int x) { calculate(a, x); }
int main() {
double a = 5.0;
Func f;
if (rand() % 2)
f = &wrapper; // f = &calculate;
else
f = &wrapper;
f(a, 0);
return 0;
}
gcc 8.2 successfully optimizes this code by throwing pointer to wrapper away and storing &calculate directly in its place (https://gcc.godbolt.org/z/nMIBeo). However changing the line as per comment (that is, performing part of the same optimization manually) breaks the magic and results in pointless jmp.
You seem to be suggesting that &calculate1 should be stored in the vector instead of &wrapper1. In general this is not possible: later code might try to compare the stored pointer against &calculate1 and that must compare false.
I further assume that your suggestion is that the compiler might try to do some static analysis and determine that the function pointers values in the vector are never compared for equality with other function pointers, and in fact that none of the other operations done on the vector elements would produce a change in observable behaviour; and therefore in this exact program it could store &calculate1 instead.
Usually the answer to "why does the compiler not perform some particular optimization" is that nobody has conceived of and implemented that idea. Another common reason is that the static analysis involved is, in the general case, quite difficult and might lead to a slowdown in compilation with no benefit in real programs where the analysis could not be guaranteed to succeed.
You are making a lot of assumptions here. Firstly, your syntax. The second is that compilers are perfect in the eye of the beholder and catch everything. The reality is that it is easy to find and hand optimize compiler output, it is not difficult to write small functions to trip up a compiler that you are well in tune with or write a decent size application and there will be places where you can hand tune. This is all known and expected. Then opinion comes in where on my machine my blah is faster than blah so it should have made these instructions instead.
gcc is not a great compiler for performance, on some targets it has been getting worse for a number of major revs. It is pretty good at what it does, better than pretty good, it deals with a number of pre processors/languages has a common middle and a number of backends. Some backends get better optimization applied front to back others are just hanging on for the ride.There were a number of other compilers that could produce code that could easily outperform gcc.
These were mostly pay-for compilers. More than an individual would pay out of pocket: used car prices, sometimes recurring annually.
There are things that gcc can optimize that are simply amazing and times that it totally goes in the wrong direction. Same goes for clang, often they do similar jobs with similar output, sometimes do some impressive things sometimes just go off into the weeds. I now find it more fun to manipulate the optimizer to make it do good or bad things rater than worry about why didn't it do what I "think" it should have done on a particular occasion. If I need that code faster I take the compiled output and hand fix it and use it as an assembly function.
You get what you pay for with gcc, if you were to look deep in its bowels you will find it is barely held together with duct tape and bailing wire (llvm is catching up). But for a free tool it does a simply amazing job, it is so widely used that you can get free support just about anywhere. We are sadly well into a time where folks think that because gcc interprets the language in a certain way that is how the language is defined and sadly that is not remotely true. But so many folks don't try other compilers to find out what "implementation defined" really means.
Last and most important, it's open source, if you want to "fix" an optimization then just do it. Keep that fix for yourself, post it, or try to push it upstream.
In C++, if value of a variable never gets changed once assigned in whole program VS If making that variable as const , In which case executable code is faster?
How compiler optimize executable code in case 1?
A clever compiler can understand that the value of a variable is never changed, thus optimizing the related code, even without the explicit const keyword by the programmer.
As for your second, question, when you mark a variable as const, then the follow might happen: the "compiler can optimize away this const by not providing storage to this variable rather add it in symbol table. So, subsequent read just need indirection into the symbol table rather than instructions to fetch value from memory". Read more in What kind of optimization does const offer in C/C++? (if any).
I said might, because const does not mean that this is a constant expression for sure, which can be done by using constexpr instead, as I explain bellow.
In general, you should think about safer code, rather than faster code when it comes to using the const keyword. So unless, you do it for safer and more readable code, then you are likely a victim of premature optimization.
Bonus:
C++ offers the constexpr keyword, which allows the programmer to mark a variable as what the Standard calls constant expressions. A constant expression is more than merely constant.
Read more in Difference between `constexpr` and `const` and When should you use constexpr capability in C++11?
PS: Constness prevents moving, so using const too liberally may turn your code to execute slower.
In which case executable code is faster?
The code is faster in case on using const, because compiler has more room for optimization. Consider this snippet:
int c = 5;
[...]
int x = c + 5;
If c is constant, it will simply assign 10 to x. If c is not a constant, it depend on compiler if it will be able to deduct from the code that c is de-facto constant.
How compiler optimize executable code in case 1?
Compiler has harder time to optimize the code in case the variable is not constant. The broader the scope of the variable, the harder for the compiler to make sure the variable is not changing.
For simple cases, like a local variables, the compiler with basic optimizations will be able to deduct that the variable is a constant. So it will treat it like a constant.
if (...) {
int c = 5;
[...]
int x = c + 5;
}
For broader scopes, like global variables, external variables etc., if the compiler is not able to analyze the whole scope, it will treat it like a normal variable, i.e. allocate some space, generate load and store operations etc.
file1.c
int c = 5;
file2.c
extern int c;
[...]
int x = c + 5;
There are more aggressive optimization options, like link time optimizations, which might help in such cases. But still, performance-wise, the const keyword helps, especially for variables with wide scopes.
EDIT:
Simple example
File const.C:
const int c = 5;
volatile int x;
int main(int argc, char **argv)
{
x = c + 5;
}
Compilation:
$ g++ const.C -O3 -g
Disassembly:
5 {
6 x = c + 5;
0x00000000004003e0 <+0>: movl $0xa,0x200c4a(%rip) # 0x601034 <x>
7 }
So we just move 10 (0xa) to x.
File nonconst.C:
int c = 5;
volatile int x;
int main(int argc, char **argv)
{
x = c + 5;
}
Compilation:
$ g++ nonconst.C -O3 -g
Disassembly:
5 {
6 x = c + 5;
0x00000000004003e0 <+0>: mov 0x200c4a(%rip),%eax # 0x601030 <c>
0x00000000004003e6 <+6>: add $0x5,%eax
0x00000000004003e9 <+9>: mov %eax,0x200c49(%rip) # 0x601038 <x>
7 }
We load c, add 5 and store to x.
So as you can see even with quite aggressive optimization (-O3) and the shortest program you can write, the effect of const is quite obvious.
g++ version 5.4.1
I'm doing a bit of hands on research surrounding the speed benefits of making a function inline. I don't have the book with me, but one text I was reading, was suggesting a fairly large overhead cost to making function calls; and when ever executable size is either negligible, or can be spared, a function should be declared inline, for speed.
I've written the following code to test this theory, and from what I can tell, there is no speed benifit from declaring a function as inline. Both functions, when called 4294967295 times, on my computer, execute in 196 seconds.
My question is, what would be your thoughts as to why this is happening? Is it modern compiler optimization? Would it be the lack of large calculations taking place in the function?
Any insight on the matter would be appreciated. Thanks in advance friends.
#include < iostream >
#include < time.h >
// RESEARCH Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
// Two functions that preform an identacle arbitrary floating point calculation
// one function is inline, the other is not.
double test(double a, double b, double c);
double inlineTest(double a, double b, double c);
double test(double a, double b, double c){
a = (3.1415 / 1.2345) / 4 + 5;
b = 9.999 / a + (a * a);
c = a *=b;
return c;
}
inline
double inlineTest(double a, double b, double c){
a = (3.1415 / 1.2345) / 4 + 5;
b = 9.999 / a + (a * a);
c = a *=b;
return c;
}
// ENTRY POINT Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
int main(){
const unsigned int maxUINT = -1;
clock_t start = clock();
//============================ NON-INLINE TEST ===============================//
for(unsigned int i = 0; i < maxUINT; ++i)
test(1.1,2.2,3.3);
clock_t end = clock();
std::cout << maxUINT << " calls to non inline function took "
<< (end - start)/CLOCKS_PER_SEC << " seconds.\n";
start = clock();
//============================ INLINE TEST ===================================//
for(unsigned int i = 0; i < maxUINT; ++i)
test(1.1,2.2,3.3);
end = clock();
std::cout << maxUINT << " calls to inline function took "
<< (end - start)/CLOCKS_PER_SEC << " seconds.\n";
getchar(); // Wait for input.
return 0;
} // Main.
Assembly Output
PasteBin
The inline keyword is basically useless. It is a suggestion only. The compiler is free to ignore it and refuse to inline such a function, and it is also free to inline a function declared without the inline keyword.
If you are really interested in doing a test of function call overhead, you should check the resultant assembly to ensure that the function really was (or wasn't) inlined. I'm not intimately familiar with VC++, but it may have a compiler-specific method of forcing or prohibiting the inlining of a function (however the standard C++ inline keyword will not be it).
So I suppose the answer to the larger context of your investigation is: don't worry about explicit inlining. Modern compilers know when to inline and when not to, and will generally make better decisions about it than even very experienced programmers. That's why the inline keyword is often entirely ignored. You should not worry about explicitly forcing or prohibiting inlining of a function unless you have a very specific need to do so (as a result of profiling your program's execution and finding that a bottleneck could be solved by forcing an inline that the compiler has for some reason not done).
Re: the assembly:
; 30 : const unsigned int maxUINT = -1;
; 31 : clock_t start = clock();
mov esi, DWORD PTR __imp__clock
push edi
call esi
mov edi, eax
; 32 :
; 33 : //============================ NON-INLINE TEST ===============================//
; 34 : for(unsigned int i = 0; i < maxUINT; ++i)
; 35 : blank(1.1,2.2,3.3);
; 36 :
; 37 : clock_t end = clock();
call esi
This assembly is:
Reading the clock
Storing the clock value
Reading the clock again
Note what's missing: calling your function a whole bunch of times
The compiler has noticed that you don't do anything with the result of the function and that the function has no side-effects, so it is not being called at all.
You can likely get it to call the function anyway by compiling with optimizations off (in debug mode).
Both the functions could be inlined. The definition of the non-inline function is in the same compilation unit as the usage point, so the compiler is within its rights to inline it even without you asking.
Post the assembly and we can confirm it for you.
EDIT: the MSVC compiler pragma for banning inlining is:
#pragma auto_inline(off)
void myFunction() {
// ...
}
#pragma auto_inline(on)
Two things could be happening:
The compiler may either be inlining both or neither functions. Check your compiler documentation for how to control that.
Your function may be complex enough that the overhead of doing the function call isn't big enough to make a big difference in the tests.
Inlining is great for very small functions but it's not always better. Code bloat can prevent the CPU from caching code.
In general inline getter/setter functions and other one liners. Then during performance tuning you can try inlining functions if you think you'll get a boost.
Your code as posted contains a couple oddities.
1) The math and output of your test functions are completely independent of the function parameters. If the compiler is smart enough to detect that those functions always return the same value, that might give it incentive to optimize them out entirely inline or not.
2) Your main function is calling test for both the inline and non-inline tests. If this is the actual code that you ran, then that would have a rather large role to play in why you saw the same results.
As others have suggested, you would do well to examine the actual assembly code generated by the compiler to determine that you're actually testing what you intended to.
Um, shouldn't
//============================ INLINE TEST ===================================//
for(unsigned int i = 0; i < maxUINT; ++i)
test(1.1,2.2,3.3);
be
//============================ INLINE TEST ===================================//
for(unsigned int i = 0; i < maxUINT; ++i)
inlineTest(1.1,2.2,3.3);
?
But if that was just a typo, would recommend that look at a dissassembler or reflector to see if the code is actually inline or still stack-ed.
If this test took 196 seconds for each loop, then you must not have turned optimizations on; with optimizations off, generally compilers don't inline anything.
With optimization on, however, the compiler is free to notice that your test function can be completely evaluated at compile time, and crush it down to "return [constant]" -- at which point, it may well decide to inline both functions since they're so trivial, and then notice that the loops are pointless since the function value is not used, and squash that out too! This is basically what I got when I tried it.
So either way, you're not testing what you thought you tested.
Function call overhead ain't what it used to be, compared to the overhead of blowing out the level-1 instruction cache, which is what aggressive inlining does to you. You can easily find reports online of gcc's -Os option (optimize for size) being a better default choice for large projects than -O2, and the big reason for that is that -O2 inlines a lot more aggressively. I would expect it is much the same with MSVC.
The only way I know of to guarantee a function is inline is to #define it
For example:
#define RADTODEG(x) ((x) * 57.29578)
That said, the only time I would bother with such a function would be in an embedded system. On a desktop/server the performance difference is negligible.
Run it in a debugger and have a look at the generated code to see if your function is always or never inlined. I think it's always a good idea to have a look at the assembler code when you want more knowledge about the optimization the compiler does.
Apologies for a small flame ...
Compilers think in assembly language. You should too. Whatever else you do, just step through the code at the assembler level. Then you'll know exactly what the compiler did.
Don't think of performance in absolute terms like "fast" or "slow". It's all relative, percentage-wise. The way software is made fast is by removing, in successive steps, things that take too large a percent of the time.
Here's the flame: If a compiler can do a pretty good job of inlining functions that clearly need it, and if it can do a really good job of managing registers, I think that's just what it should do. If it can do a reasonable job of unrolling loops that clearly could use it, I can live with that. If it's knocking itself out trying to outsmart me by removing function calls that I clearly wrote and intended to be called, or scrambling my code sanctimoniously trying to save a JMP when that JMP occupies 0.000001% of running time (the way Fortran does), I get annoyed, frankly.
There seems to be a notion in the compiler world that there's no such thing as an unhelpful optimization. No matter how smart the compiler is, real optimization is the programmer's job, and nobody else's.
In a C++ project I'm working on, I have a flag kind of value which can have four values. Those four flags can be combined. Flags describe the records in database and can be:
new record
deleted record
modified record
existing record
Now, for each record I wish to keep this attribute, so I could use an enum:
enum { xNew, xDeleted, xModified, xExisting }
However, in other places in code, I need to select which records are to be visible to the user, so I'd like to be able to pass that as a single parameter, like:
showRecords(xNew | xDeleted);
So, it seems I have three possible appoaches:
#define X_NEW 0x01
#define X_DELETED 0x02
#define X_MODIFIED 0x04
#define X_EXISTING 0x08
or
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
or
namespace RecordType {
static const uint8 xNew = 1;
static const uint8 xDeleted = 2;
static const uint8 xModified = 4;
static const uint8 xExisting = 8;
}
Space requirements are important (byte vs int) but not crucial. With defines I lose type safety, and with enum I lose some space (integers) and probably have to cast when I want to do a bitwise operation. With const I think I also lose type safety since a random uint8 could get in by mistake.
Is there some other cleaner way?
If not, what would you use and why?
P.S. The rest of the code is rather clean modern C++ without #defines, and I have used namespaces and templates in few spaces, so those aren't out of question either.
Combine the strategies to reduce the disadvantages of a single approach. I work in embedded systems so the following solution is based on the fact that integer and bitwise operators are fast, low memory & low in flash usage.
Place the enum in a namespace to prevent the constants from polluting the global namespace.
namespace RecordType {
An enum declares and defines a compile time checked typed. Always use compile time type checking to make sure arguments and variables are given the correct type. There is no need for the typedef in C++.
enum TRecordType { xNew = 1, xDeleted = 2, xModified = 4, xExisting = 8,
Create another member for an invalid state. This can be useful as error code; for example, when you want to return the state but the I/O operation fails. It is also useful for debugging; use it in initialisation lists and destructors to know if the variable's value should be used.
xInvalid = 16 };
Consider that you have two purposes for this type. To track the current state of a record and to create a mask to select records in certain states. Create an inline function to test if the value of the type is valid for your purpose; as a state marker vs a state mask. This will catch bugs as the typedef is just an int and a value such as 0xDEADBEEF may be in your variable through uninitialised or mispointed variables.
inline bool IsValidState( TRecordType v) {
switch(v) { case xNew: case xDeleted: case xModified: case xExisting: return true; }
return false;
}
inline bool IsValidMask( TRecordType v) {
return v >= xNew && v < xInvalid ;
}
Add a using directive if you want to use the type often.
using RecordType ::TRecordType ;
The value checking functions are useful in asserts to trap bad values as soon as they are used. The quicker you catch a bug when running, the less damage it can do.
Here are some examples to put it all together.
void showRecords(TRecordType mask) {
assert(RecordType::IsValidMask(mask));
// do stuff;
}
void wombleRecord(TRecord rec, TRecordType state) {
assert(RecordType::IsValidState(state));
if (RecordType ::xNew) {
// ...
} in runtime
TRecordType updateRecord(TRecord rec, TRecordType newstate) {
assert(RecordType::IsValidState(newstate));
//...
if (! access_was_successful) return RecordType ::xInvalid;
return newstate;
}
The only way to ensure correct value safety is to use a dedicated class with operator overloads and that is left as an exercise for another reader.
Forget the defines
They will pollute your code.
bitfields?
struct RecordFlag {
unsigned isnew:1, isdeleted:1, ismodified:1, isexisting:1;
};
Don't ever use that. You are more concerned with speed than with economizing 4 ints. Using bit fields is actually slower than access to any other type.
However, bit members in structs have practical drawbacks. First, the ordering of bits in memory varies from compiler to compiler. In addition, many popular compilers generate inefficient code for reading and writing bit members, and there are potentially severe thread safety issues relating to bit fields (especially on multiprocessor systems) due to the fact that most machines cannot manipulate arbitrary sets of bits in memory, but must instead load and store whole words. e.g the following would not be thread-safe, in spite of the use of a mutex
Source: http://en.wikipedia.org/wiki/Bit_field:
And if you need more reasons to not use bitfields, perhaps Raymond Chen will convince you in his The Old New Thing Post: The cost-benefit analysis of bitfields for a collection of booleans at http://blogs.msdn.com/oldnewthing/archive/2008/11/26/9143050.aspx
const int?
namespace RecordType {
static const uint8 xNew = 1;
static const uint8 xDeleted = 2;
static const uint8 xModified = 4;
static const uint8 xExisting = 8;
}
Putting them in a namespace is cool. If they are declared in your CPP or header file, their values will be inlined. You'll be able to use switch on those values, but it will slightly increase coupling.
Ah, yes: remove the static keyword. static is deprecated in C++ when used as you do, and if uint8 is a buildin type, you won't need this to declare this in an header included by multiple sources of the same module. In the end, the code should be:
namespace RecordType {
const uint8 xNew = 1;
const uint8 xDeleted = 2;
const uint8 xModified = 4;
const uint8 xExisting = 8;
}
The problem of this approach is that your code knows the value of your constants, which increases slightly the coupling.
enum
The same as const int, with a somewhat stronger typing.
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
They are still polluting the global namespace, though.
By the way... Remove the typedef. You're working in C++. Those typedefs of enums and structs are polluting the code more than anything else.
The result is kinda:
enum RecordType { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } ;
void doSomething(RecordType p_eMyEnum)
{
if(p_eMyEnum == xNew)
{
// etc.
}
}
As you see, your enum is polluting the global namespace.
If you put this enum in an namespace, you'll have something like:
namespace RecordType {
enum Value { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } ;
}
void doSomething(RecordType::Value p_eMyEnum)
{
if(p_eMyEnum == RecordType::xNew)
{
// etc.
}
}
extern const int ?
If you want to decrease coupling (i.e. being able to hide the values of the constants, and so, modify them as desired without needing a full recompilation), you can declare the ints as extern in the header, and as constant in the CPP file, as in the following example:
// Header.hpp
namespace RecordType {
extern const uint8 xNew ;
extern const uint8 xDeleted ;
extern const uint8 xModified ;
extern const uint8 xExisting ;
}
And:
// Source.hpp
namespace RecordType {
const uint8 xNew = 1;
const uint8 xDeleted = 2;
const uint8 xModified = 4;
const uint8 xExisting = 8;
}
You won't be able to use switch on those constants, though. So in the end, pick your poison...
:-p
Have you ruled out std::bitset? Sets of flags is what it's for. Do
typedef std::bitset<4> RecordType;
then
static const RecordType xNew(1);
static const RecordType xDeleted(2);
static const RecordType xModified(4);
static const RecordType xExisting(8);
Because there are a bunch of operator overloads for bitset, you can now do
RecordType rt = whatever; // unsigned long or RecordType expression
rt |= xNew; // set
rt &= ~xDeleted; // clear
if ((rt & xModified) != 0) ... // test
Or something very similar to that - I'd appreciate any corrections since I haven't tested this. You can also refer to the bits by index, but it's generally best to define only one set of constants, and RecordType constants are probably more useful.
Assuming you have ruled out bitset, I vote for the enum.
I don't buy that casting the enums is a serious disadvantage - OK so it's a bit noisy, and assigning an out-of-range value to an enum is undefined behaviour so it's theoretically possible to shoot yourself in the foot on some unusual C++ implementations. But if you only do it when necessary (which is when going from int to enum iirc), it's perfectly normal code that people have seen before.
I'm dubious about any space cost of the enum, too. uint8 variables and parameters probably won't use any less stack than ints, so only storage in classes matters. There are some cases where packing multiple bytes in a struct will win (in which case you can cast enums in and out of uint8 storage), but normally padding will kill the benefit anyhow.
So the enum has no disadvantages compared with the others, and as an advantage gives you a bit of type-safety (you can't assign some random integer value without explicitly casting) and clean ways of referring to everything.
For preference I'd also put the "= 2" in the enum, by the way. It's not necessary, but a "principle of least astonishment" suggests that all 4 definitions should look the same.
Here are couple of articles on const vs. macros vs. enums:
Symbolic Constants
Enumeration Constants vs. Constant Objects
I think you should avoid macros especially since you wrote most of your new code is in modern C++.
If possible do NOT use macros. They aren't too much admired when it comes to modern C++.
With defines I lose type safety
Not necessarily...
// signed defines
#define X_NEW 0x01u
#define X_NEW (unsigned(0x01)) // if you find this more readable...
and with enum I lose some space (integers)
Not necessarily - but you do have to be explicit at points of storage...
struct X
{
RecordType recordType : 4; // use exactly 4 bits...
RecordType recordType2 : 4; // use another 4 bits, typically in the same byte
// of course, the overall record size may still be padded...
};
and probably have to cast when I want to do bitwise operation.
You can create operators to take the pain out of that:
RecordType operator|(RecordType lhs, RecordType rhs)
{
return RecordType((unsigned)lhs | (unsigned)rhs);
}
With const I think I also lose type safety since a random uint8 could get in by mistake.
The same can happen with any of these mechanisms: range and value checks are normally orthogonal to type safety (though user-defined-types - i.e. your own classes - can enforce "invariants" about their data). With enums, the compiler's free to pick a larger type to host the values, and an uninitialised, corrupted or just miss-set enum variable could still end up interpretting its bit pattern as a number you wouldn't expect - comparing unequal to any of the enumeration identifiers, any combination of them, and 0.
Is there some other cleaner way? / If not, what would you use and why?
Well, in the end the tried-and-trusted C-style bitwise OR of enumerations works pretty well once you have bit fields and custom operators in the picture. You can further improve your robustness with some custom validation functions and assertions as in mat_geek's answer; techniques often equally applicable to handling string, int, double values etc..
You could argue that this is "cleaner":
enum RecordType { New, Deleted, Modified, Existing };
showRecords([](RecordType r) { return r == New || r == Deleted; });
I'm indifferent: the data bits pack tighter but the code grows significantly... depends how many objects you've got, and the lamdbas - beautiful as they are - are still messier and harder to get right than bitwise ORs.
BTW /- the argument about thread safety's pretty weak IMHO - best remembered as a background consideration rather than becoming a dominant decision-driving force; sharing a mutex across the bitfields is a more likely practice even if unaware of their packing (mutexes are relatively bulky data members - I have to be really concerned about performance to consider having multiple mutexes on members of one object, and I'd look carefully enough to notice they were bit fields). Any sub-word-size type could have the same problem (e.g. a uint8_t). Anyway, you could try atomic compare-and-swap style operations if you're desperate for higher concurrency.
Enums would be more appropriate as they provide "meaning to the identifiers" as well as type safety. You can clearly tell "xDeleted" is of "RecordType" and that represent "type of a record" (wow!) even after years. Consts would require comments for that, also they would require going up and down in code.
Even if you have to use 4 byte to store an enum (I'm not that familiar with C++ -- I know you can specify the underlying type in C#), it's still worth it -- use enums.
In this day and age of servers with GBs of memory, things like 4 bytes vs. 1 byte of memory at the application level in general don't matter. Of course, if in your particular situation, memory usage is that important (and you can't get C++ to use a byte to back the enum), then you can consider the 'static const' route.
At the end of the day, you have to ask yourself, is it worth the maintenance hit of using 'static const' for the 3 bytes of memory savings for your data structure?
Something else to keep in mind -- IIRC, on x86, data structures are 4-byte aligned, so unless you have a number of byte-width elements in your 'record' structure, it might not actually matter. Test and make sure it does before you make a tradeoff in maintainability for performance/space.
If you want the type safety of classes, with the convenience of enumeration syntax and bit checking, consider Safe Labels in C++. I've worked with the author, and he's pretty smart.
Beware, though. In the end, this package uses templates and macros!
Do you actually need to pass around the flag values as a conceptual whole, or are you going to have a lot of per-flag code? Either way, I think having this as class or struct of 1-bit bitfields might actually be clearer:
struct RecordFlag {
unsigned isnew:1, isdeleted:1, ismodified:1, isexisting:1;
};
Then your record class could have a struct RecordFlag member variable, functions can take arguments of type struct RecordFlag, etc. The compiler should pack the bitfields together, saving space.
I probably wouldn't use an enum for this kind of a thing where the values can be combined together, more typically enums are mutually exclusive states.
But whichever method you use, to make it more clear that these are values which are bits which can be combined together, use this syntax for the actual values instead:
#define X_NEW (1 << 0)
#define X_DELETED (1 << 1)
#define X_MODIFIED (1 << 2)
#define X_EXISTING (1 << 3)
Using a left-shift there helps to indicate that each value is intended to be a single bit, it is less likely that later on someone would do something wrong like add a new value and assign it something a value of 9.
Based on KISS, high cohesion and low coupling, ask these questions -
Who needs to know? my class, my library, other classes, other libraries, 3rd parties
What level of abstraction do I need to provide? Does the consumer understand bit operations.
Will I have have to interface from VB/C# etc?
There is a great book "Large-Scale C++ Software Design", this promotes base types externally, if you can avoid another header file/interface dependancy you should try to.
If you are using Qt you should have a look for QFlags.
The QFlags class provides a type-safe way of storing OR-combinations of enum values.
I would rather go with
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
Simply because:
It is cleaner and it makes the code readable and maintainable.
It logically groups the constants.
Programmer's time is more important, unless your job is to save those 3 bytes.
Not that I like to over-engineer everything but sometimes in these cases it may be worth creating a (small) class to encapsulate this information.
If you create a class RecordType then it might have functions like:
void setDeleted();
void clearDeleted();
bool isDeleted();
etc... (or whatever convention suits)
It could validate combinations (in the case where not all combinations are legal, eg if 'new' and 'deleted' could not both be set at the same time). If you just used bit masks etc then the code that sets the state needs to validate, a class can encapsulate that logic too.
The class may also give you the ability to attach meaningful logging info to each state, you could add a function to return a string representation of the current state etc (or use the streaming operators '<<').
For all that if you are worried about storage you could still have the class only have a 'char' data member, so only take a small amount of storage (assuming it is non virtual). Of course depending on the hardware etc you may have alignment issues.
You could have the actual bit values not visible to the rest of the 'world' if they are in an anonymous namespace inside the cpp file rather than in the header file.
If you find that the code using the enum/#define/ bitmask etc has a lot of 'support' code to deal with invalid combinations, logging etc then encapsulation in a class may be worth considering. Of course most times simple problems are better off with simple solutions...