InterlockedIncrement usage - c++

While reading about the function InterlockedIncrement, I saw the remark that the variable passed must be aligned on a 32-bit boundary. The code I have normally seen uses InterlockedIncrement like this:
#include <windows.h> // declares InterlockedIncrement
class A
{
public:
A();
void f();
private:
volatile long m_count;
};
A::A() : m_count(0)
{
}
void A::f()
{
::InterlockedIncrement(&m_count);
}
Does the above code work properly on multi-processor systems, or should I take more care with it?

It depends on your compiler settings. However, by default, anything eight bytes or smaller will be aligned on its natural boundary. Thus an "int" (or the 32-bit "long" used here) will be aligned on a 32-bit boundary.
Also, the "#pragma pack" directive can be used to change alignment inside a compile unit.
I would like to add that the answer assumes Microsoft C/C++ compiler. Packing rules might differ from compiler to compiler. But in general, I would assume that most C/C++ compilers for Windows use the same packing defaults just to make working with Microsoft SDK headers a bit easier.

The code looks fine (variables will be properly aligned unless you specifically do something to break that - usually involving casting or 'packed' structures).

Yes, this will work fine. Compilers usually do align unless instructed otherwise.

Strictly speaking, it really depends on your usage of A: for instance, if you pack an "A" object within a shell ITEMIDLIST, or within a struct with a bad "#pragma pack", the data may not be properly aligned.

Related

C++ Statement Reordering

This is a question about Chandler's answer here (I didn't have a high enough rep to comment): Enforcing statement order in C++
In his answer, suppose foo() has no input or output. It's a black box that does work that is observable eventually, but won't be needed immediately (e.g. executes some callback). So we don't have input/output data locally handy to tell the compiler not to optimize. But I know that foo() will modify the memory somewhere, and the result will be observable eventually. Will the following prevent statement reordering and get the correct timing in this case?
#include <chrono>
#include <iostream>
//I believe this tells the compiler that all memory everywhere will be clobbered?
//(from his cppcon talk: https://youtu.be/nXaxk27zwlk?t=2441)
__attribute__((always_inline)) inline void DoNotOptimize() {
asm volatile("" : : : "memory");
}
// The compiler has full knowledge of the implementation.
static int ugly_global = 1; //we print this to screen sometime later
static void foo(void) { ugly_global *= 2; }
auto time_foo() {
using Clock = std::chrono::high_resolution_clock;
auto t1 = Clock::now(); // Statement 1
DoNotOptimize();
foo(); // Statement 2
DoNotOptimize();
auto t2 = Clock::now(); // Statement 3
return t2 - t1;
}
Will the following prevent statement reordering and get the correct timing in this case?
It should not be necessary because the calls to Clock::now should, at the language-definition level, enforce enough ordering. (That is, the C++11 standard says that the high resolution clock ought to get as much information as the system can give here, in the way that is most useful here. See "secondary question" below.)
But there is a more general case. It's worth thinking about the question: How does whoever provides the C++ library implementation actually write this function? Or, take C++ itself out of the equation. Given a language standard, how does an implementor—a person or group writing an implementation of that language—get you what you need? Fundamentally, we need to make a distinction between what the language standard requires and how an implementation provider goes about implementing the requirements.
The language itself may be expressed in terms of an abstract machine, and the C and C++ languages are. This abstract machine is pretty loosely defined: it executes some kind of instructions, which access data, but in many cases we don't know how it does these things, or even how big the various data items are (with some exceptions for fixed-size integers like int64_t), and so on. The machine may or may not have "registers" that hold things in ways that cannot be addressed, as well as memory that can be addressed and whose addresses can be recorded in pointers:
p = &var
makes the value stored in p (in memory or a register) such that using *p accesses the value stored in var (in memory or a register; some machines, especially back in the olden days, have or had addressable registers).1
Nonetheless, despite all of this abstraction, we want to run real code on real machines. Real machines have real constraints: some instructions might require particular values in particular registers (think about all the bizarre stuff in the x86 instruction sets, or wide-result integer multipliers and dividers that use special-purpose registers, as on some MIPS processors), or cause CPU synchronizations, or whatever.
GCC in particular invented a system of constraints to express what you could or could not do on the machine itself, using the machine's instruction set. Over time, this evolved into user-accessible asm constructs with input, output, and clobber sections. The particular one you show:
__attribute__((always_inline)) inline void DoNotOptimize() {
asm volatile("" : : : "memory");
}
expresses the idea that "this instruction" (asm; the actual provided instruction is blank) "cannot be moved" (volatile) "and clobbers all of the computer's memory, but no registers" ("memory" as the clobber section).
This is not part of either C or C++ as a language. It's just a compiler construction, supported by GCC and now supported by clang as well. But it suffices to force the compiler to issue all stores-to-memory before the asm, and reload values from memory as needed after the asm, in case they changed when the computer executed the (nonexistent) instruction included in the asm line. There's no guarantee that this will work, or even compile at all, in some other compiler, but as long as we're the implementor, we choose the compiler we're implementing for/with.
C++ as a language now has support for ordered memory operations, which an implementor must implement. The implementor can use these asm volatile constructs to achieve the right result, provided they do actually achieve the right result. For instance, if we need to cause the machine itself to synchronize—to emit a memory barrier—we can stick the appropriate machine instruction, such as mfence or membar #sync or whatever it may be, in the asm's instruction-section clause. See also compiler reordering vs memory reordering as Klaus mentioned in a comment.
It is up to the implementor to find an appropriately effective trick, compiler-specific or not, to get the right semantics while minimizing any runtime slowdown: for instance, we might want to use lfence rather than mfence if that's sufficient, or membar #LoadLoad, or whatever the right thing is for the machine. If our implementation of Clock::now requires some sort of fancy inline asm, we write one. If not, we don't. We make sure that we produce what's required—and then all users of the system can just use it, without needing to know what sort of grubby implementation tricks we had to invoke.
There's a secondary question here: does the language specification really constrain the implementor the way we think/hope it does? Chris Dodd's comment says he thinks so, and he's usually right on these kinds of questions. A couple of other commenters think otherwise, but I'm with Chris Dodd on this one. I think it is not necessary. You can always compile to assembly, or disassemble the compiled program, to check, though!
If the compiler didn't do the right thing, that asm would force it to do the right thing, in GCC and clang. It probably wouldn't work in other compilers.
1 On the KA-10 in particular, the registers were just the first sixteen words of memory. As the Wikipedia page notes, this meant you could put instructions into them and call them. Because the first 16 words were the registers, these instructions ran much faster than other instructions.

Do GCC/Clang allow to access static member through null pointer?

#include <iostream>
struct Foo { static auto foo() -> int { return 123; } };
int main() {
std::cout << static_cast<Foo*>(nullptr)->foo() << std::endl;
return 0;
}
I DO KNOW that this is not allowed by the standard. But what about specific compilers?
I only care about GCC (g++) and Clang.
Is there any guarantee that these two compilers allow this as a compiler feature/specification/extension?
I can find this neither in the list of gcc's C++ extensions nor in the corresponding list for clang. So I would say that you have no guarantee that this works for either compiler.
There is nothing saying that this will work. It is up to you, if you decide to use it, to ascertain that it really does work for the actual architecture in which you are using it. And it's important to reflect on the fact that code generation, debug traps for null pointer usage (and such) and optimisation in all compilers is "target system dependent" - so it may well work on x86, but not on MIPS, as an example. Or it may stop working in one way or another in version X+0.0.1 of the compiler.
Having said that, I'd expect this particular example to work perfectly fine on any architecture, because the this pointer is not used anywhere.
Note that just because something is undefined, doesn't necessarily mean that it will crash, fail or even "not work exactly as you expect it to". It does, however, allow the compiler to generate whatever code the compiler would like, including something that blows up your computer, formats your hard-drive, etc, etc.

Size of C++ types with different compilers

I would like to avoid falling into the XY trap, so here is the original problem:
We have a small program which creates a shared memory segment on the PC. This program creates it by reading its structure from its header file (a bunch of individual and nested struct definitions). Basically just a .h and a .cpp file. This program will be compiled by g++.
We would like to create another program, a shared memory viewer, which displays the layout of this memory in a tree view. For that, we have to parse the previously mentioned header file and compute the offsets to read/manipulate the content of specific parts of the shared memory. We do not want to write a parser if it is not necessary, especially because the header file contains additional declarations and definitions too. This program will be compiled by the same version of g++ as the previous program.
Originally, we wanted to use gccxml in the second program to parse the header file, but it is based on GCC 4.2 and cannot parse the included header files, which contain C++11 code. Another idea is to use libclang to get the structure of that header file. libclang contains size information too, but I do not know if the sizes of the types and the padding/alignment are the same for g++ and clang.
My question is: can you assume that the size of the C++ types and the padding/alignment of the structs will be the same when you compile the code with clang and g++? The environment (PC, OS) is the same. I am afraid we cannot, because the C++ standard does not specify the exact sizes of the types.
Do you know another solution to the original problem?
Short answer: Since clang has as a goal to "be compatible with gcc" (for both C and C++), I would say that you can expect it to generate same offsets and sizes for the same code.
Long answer:
Assuming you are using only basic types (int, short, double, char and pointers to those types), and we're restricting to gcc and clang (and their C++ versions), keeping to the same OS and same bitness (32- or 64-bit on "both sides"), then subject to actual bugs in the compiler, it should have the same structure layout.
Of course, that is a long list of restrictions, and of course the "subject to actual bugs" is a never-ending concern in these cases.
You can make your case a bit easier if you use fixed-size types, such as uint32_t rather than int. Conversely, if you put a member in the structure whose class type has virtual functions, you'd be in serious trouble, but that doesn't work very well with shared memory anyway, as the vtable is not guaranteed to be at the same address in different applications.
Be wary of STL functionality - you may not get the same C++ library for the two compilers (you may, or may not, depending on how you installed it).
I would double check, by adding some code to print the offset and size of important members (and run with both compilers, of course) - don't forget to do this for the members deep inside some struct, since it could well be that the overall size of a struct could be identical and the content could be at different offsets.
(As others have said, I have seen projects where some code is generated with a script that prints the offsets of the struct members, and this is used as input for other programs in the project)
Actually, in this particular case, you should be fine.
The memory layout of data structures is part of the ABI (Application Binary Interface), and gcc and clang both follow the Itanium ABI on x86 (and x86_64). Therefore, barring bugs, and provided they both compile for x86 or x86_64, they should end up with binary-compatible types.
In the general case, you would typically cheat:
Use a packed data structure: struct X { ... } __attribute__((packed)) __attribute__((aligned (8))); and you completely control the structure's memory layout
As mentioned by Alf, have one compiler spew the offset of each member and use that to feed the generation of structures for the second compiler
Other ?
The sizes of data types vary from platform to platform. Instead of hardcoding them, use the sizeof operator to find out the size that applies to the target platform, for example,
sizeof(int)
sizeof(char)
sizeof(double)
etc.
If you use fixed width integer types (http://en.cppreference.com/w/cpp/types/integer) in a C-style struct and arrange members in decreasing order of size (i.e. largest members first), it should be pretty safe.
I think I understand your issue. This is what Chrome does:
COMPILE_ASSERT(sizeof(double) == 8, Double_size_not_8);
It assumes the sizes will match but checks just to make sure.
COMPILE_ASSERT is a macro. You can find the definition here but the short version is it's just what it says. An assert that happens at compile time.
If the sizes did not match, then one way to deal with it is to define your header in bytes only. For example, instead of
struct SomeBinaryFileHeader {
int version;
int width;
int height;
};
You might do this
struct SomeBinaryFileHeaderReadWriteVersion {
uint8_t version_0;
uint8_t version_1;
uint8_t version_2;
uint8_t version_3;
uint8_t width_0;
uint8_t width_1;
uint8_t width_2;
uint8_t width_3;
uint8_t height_0;
uint8_t height_1;
uint8_t height_2;
uint8_t height_3;
};
Etc., and then convert from one to the other, which will even work across endianness.

struct member alignment - is it possible to assume no padding

Imagine a struct made up of 32-bit, 16-bit, and 8-bit member values, ordered so that each member is on its natural boundary.
struct Foo
{
uint32_t a;
uint16_t b;
uint8_t c;
uint8_t d;
uint32_t e;
};
Member alignment and padding rules are documented for Visual C++. On VC++, sizeof(Foo) for the above struct is predictably 12.
Now, I'm pretty sure the rule is that no assumption should be made about padding and alignment, but in practice, do other compilers on other operating systems make similar guarantees?
If not, is there an equivalent of "#pragma pack(1)" on GCC?
In practice, on any system where the uintXX_t types exist, you will get the desired alignment with no padding. Don't throw in ugly gcc-isms to try to guarantee it.
Edit: To elaborate on why it may be harmful to use __attribute__((packed)) or __attribute__((aligned)): it may cause the whole struct to be misaligned when used as a member of a larger struct or on the stack. This will definitely hurt performance and, on non-x86 machines, will generate much larger code. It also means it's invalid to take a pointer to any member of the struct, since code that accesses the value through a pointer will not be aware that it could be misaligned and thus could fault.
As for why it's unnecessary, keep in mind that __attribute__ is specific to gcc and gcc-workalike compilers. The C standard does not leave alignment undefined or unspecified. It's implementation-defined, which means the implementation is required to further specify and document how it behaves. gcc's behavior is, and always has been, to align each struct member on the next boundary of its natural alignment (the same alignment it would have when used outside of a struct, which is necessarily a number that evenly divides the size of the type). Since __attribute__ is a gcc feature, if you use it you're already assuming a gcc-like compiler, but then by assumption you already have the alignment you want.
In general you are correct that it's not a safe assumption, although you will often get the packing you expect on many systems. You may want to use the packed attribute on your types when you use gcc.
E.g.
struct __attribute__((packed)) Blah { /* ... */ };
On systems that actually offer those types, it is highly likely to work. On, say, a 36-bit system those types would not be available in the first place.
GCC provides an attribute
__attribute__ ((packed))
with a similar effect.

How to limit the impact of implementation-dependent language features in C++?

The following is an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 4.6:
Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation-defined (§C.2). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that "the language is what my compiler implements." This is a narrow and shortsighted view. If your program is a success, it is likely to be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterwards.
It is relatively easy to limit the impact of implementation-dependent language features.
My question is: How to limit the impact of implementation-dependent language features? Please mention implementation-dependent language features then show how to limit their impact.
A few ideas:
Unfortunately you will have to use macros to avoid some platform specific or compiler specific issues. You can look at the headers of Boost libraries to see that it can quite easily get cumbersome, for example look at the files:
boost/config/compiler/gcc.hpp
boost/config/compiler/intel.hpp
boost/config/platform/linux.hpp
and so on
The integer types tend to be messy across different platforms; you will have to define your own typedefs or use something like Boost cstdint.hpp
If you decide to use any library, then check that the library is supported on the given platform
Use the libraries with good support and clearly documented platform support (for example Boost)
You can abstract yourself from some C++ implementation specific issues by relying heavily on libraries like Qt, which provide an "alternative" in sense of types and algorithms. They also attempt to make the coding in C++ more portable. Does it work? I'm not sure.
Not everything can be done with macros. Your build system will have to be able to detect the platform and the presence of certain libraries. Many would suggest autotools for project configuration; I, on the other hand, recommend CMake (rather nice language, no more M4)
Endianness and alignment might be an issue if you do some low-level meddling (e.g. reinterpret_cast and friends (friends was a bad word choice in a C++ context)).
Throw in a lot of warning flags for the compiler; for gcc I would recommend at least -Wall -Wextra. But there are many more; see the documentation of the compiler or this question.
You have to watch out for everything that is implementation-defined and implementation-dependent. If you want the truth, only the truth, nothing but the truth, then go to the ISO standard.
Well, the variable sizes issue you mentioned is a fairly well-known one, with the common workaround of providing typedef'd versions of the basic types that have well-defined sizes (normally advertised in the typedef name). This is done using preprocessor macros to give different code visibility on different platforms. E.g.:
#ifdef _WIN32 // defined for both 32- and 64-bit Windows targets
typedef int int32;
typedef char char8;
//etc
#endif
#ifdef __APPLE__ // defined by the compilers targeting macOS
//different typedefs to produce same results
#endif
Other issues are normally solved in the same way too (i.e. using preprocessor tokens to perform conditional compilation)
The most obvious implementation dependency is size of integer types. There are many ways to handle this. The most obvious way is to use typedefs to create ints of the various sizes:
typedef signed short int16_t;
typedef unsigned short uint16_t;
The trick here is to pick a convention and stick to it. Which convention is the hard part: INT16, int16, int16_t, t_int16, Int16, etc. C99 has the stdint.h file which uses the int16_t style. If your compiler has this file, use it.
Similarly, you should be pedantic about using other standard defines such as size_t, time_t, etc.
The other trick is knowing when not to use these typedefs. A loop control variable used to index an array should just use a plain int, so the compiler will generate the best code for your processor. for (int32_t i = 0; i < x; ++i) could generate a lot of needless code on a 64-bit processor, just like using int16_t's would on a 32-bit processor.
A good solution is to use common headers that define typedef'd types as necessary.
For example, including sys/types.h is an excellent way to deal with this, as is using portable libraries.
There are two approaches to this:
define your own types with a known size and use them instead of built-in types (like typedef int int32 #if-ed for various platforms)
use techniques which are not dependent on the type size
The first is very popular; however, the second, when possible, usually results in cleaner code. This includes:
do not assume pointer can be cast to int
do not assume you know the byte size of individual types, always use sizeof to check it
when saving data to files or transferring them across network, use techniques which are portable across changing data sizes (like saving/loading text files)
One recent example of this is writing code which can be compiled for both x86 and x64 platforms. The dangerous parts here are the sizes of pointers and size_t: be prepared for them to be 4 or 8 bytes depending on the platform. When casting or differencing pointers, never cast to int; use intptr_t and similar typedef'd types instead.
One of the key ways of avoiding dependency on particular data sizes is to read and write persistent data as text, not binary. If binary data must be used, then all read/write operations must be centralised in a few methods, using approaches like the typedefs already described here.
A second thing you can do is to enable all your compiler's warnings. For example, using the -pedantic flag with g++ will warn you of lots of potential portability problems.
If you're concerned about portability, things like the size of an int can be determined and dealt with without much difficulty. A lot of C++ compilers also support C99 features like the fixed-width int types: int8_t, uint8_t, int16_t, uint32_t, etc. If yours doesn't support them natively, you can always include <cstdint> or <sys/types.h>, which, more often than not, has those typedef'd. <limits.h> defines the value ranges (INT_MAX, LONG_MIN, etc.) for all the basic types.
The standard only guarantees minimum sizes, which you can always rely on: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). char must be at least 8 bits, short and int must be at least 16 bits, and long must be at least 32 bits.
Other things that might be implementation-defined include the ABI and name-mangling schemes (the behavior of extern "C++" specifically), but unless you're working with more than one compiler, that's usually a non-issue.
The following is also an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 10.4.9:
No implementation-independent guarantees are made about the order of construction of nonlocal objects in different compilation units. For example:
// file1.c:
Table tbl1;
// file2.c:
Table tbl2;
Whether tbl1 is constructed before tbl2 or vice versa is implementation-dependent. The order isn’t even guaranteed to be fixed in every particular implementation. Dynamic linking, or even a small change in the compilation process, can alter the sequence. The order of destruction is similarly implementation-dependent.
A programmer may ensure proper initialization by implementing the strategy that the implementations usually employ for local static objects: a first-time switch. For example:
class Zlib {
static bool initialized;
static void initialize() { /* initialize */ initialized = true; }
public:
// no constructor
void f()
{
if (initialized == false) initialize();
// ...
}
// ...
};
If there are many functions that need to test the first-time switch, this can be tedious, but it is often manageable. This technique relies on the fact that statically allocated objects without constructors are initialized to 0. The really difficult case is the one in which the first operation may be time-critical so that the overhead of testing and possible initialization can be serious. In that case, further trickery is required (§21.5.2).
An alternative approach for a simple object is to present it as a function (§9.4.1):
int& obj() { static int x = 0; return x; } // initialized upon first use
First-time switches do not handle every conceivable situation. For example, it is possible to create objects that refer to each other during construction. Such examples are best avoided. If such objects are necessary, they must be constructed carefully in stages.