SPI_IOC_MESSAGE(N) macro giving me fits - c++

I'm having trouble getting a SPI program I'm working on to behave correctly and it seems to be some issue with the SPI_IOC_MESSAGE(N) macro.
Here's sample code that DOESN'T work (ioctl fails and errno comes back as EINVAL (22)):
std::vector<spi_ioc_transfer> tr;
<code that fills tr with 1+ transfers>
// Hand the transmission(s) off to the SPI driver
if (tr.size() > 0)
{
int ret = ioctl(fd, SPI_IOC_MESSAGE(tr.size()), tr.data());
if (ret < 1)
{
int err = errno;
}
}
My test code right now is creating a vector of length 1.
If I explicitly change the code to:
int ret = ioctl(fd, SPI_IOC_MESSAGE( 1 ), tr.data());
...then ioctl(...) succeeds and my bits go down the pipe.
Looking at the expansion of the SPI_IOC_MESSAGE macro in Eclipse, I don't see why this isn't happy.
Suggestions?
I'm cross-compiling for Linux/ARM (BeagleBone Black) from a 64-bit Linux VM, but I can't see that affecting the macro.
EDIT:
Here are the two macro expansions out of the C pre-processor
int ret = ioctl(fd, (((1U) << (((0 +8)+8)+14)) | ((('k')) << (0 +8)) | (((0)) << 0) | ((((sizeof(char[((((tr.size())*(sizeof (struct spi_ioc_transfer))) < (1 << 14)) ? ((tr.size())*(sizeof (struct spi_ioc_transfer))) : 0)])))) << ((0 +8)+8))), tr.data());
and the literal:
int ret = ioctl(fd, (((1U) << (((0 +8)+8)+14)) | ((('k')) << (0 +8)) | (((0)) << 0) | ((((sizeof(char[((((1)*(sizeof (struct spi_ioc_transfer))) < (1 << 14)) ? ((1)*(sizeof (struct spi_ioc_transfer))) : 0)])))) << ((0 +8)+8))), tr.data());
Absolutely hideous, but I don't see anything surprising in how tr.size() would be getting used there.
Edit to include what seems to be the answer
#ifdef __cplusplus /* If this is a C++ compiler, use C linkage */
extern "C" {
#endif
#include <linux/spi/spidev.h>
#ifdef __cplusplus /* If this is a C++ compiler, use C linkage */
}
#endif
Wrapping the Linux SPI include file in extern "C" tells the compiler to treat that section as plain old C, and that seems to let me call SPI_IOC_MESSAGE( tr.size() ) or SPI_IOC_MESSAGE( an_int ) and have the correct thing happen (verified with a GDB step-through and a signal analyzer).

I suspect the problem might lie in this particular nugget buried in the macro soup:
...sizeof(char[...tr.size()...])...
Noting that Linux code is exclusively C: the C standard (I have C99 draft n1256 here) states for the sizeof operator that an expression operand is not evaluated and the result is a constant, unless the type of the operand is a variable-length array, in which case the operand is evaluated and the result is an ordinary (non-constant) integer.
The C++ standard (C++11 draft n3242), however, does not give any condition under which the operand is evaluated, and only states that the result is a constant.
So this looks like one of the corners where C and C++ differ, and compiling one as the other leads to undefined behaviour. In that case, I think the choice is either hacking up your own version of the macro, or putting the call behind a separate function compiled as C that wraps it.
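A minimal sketch of the second option (file and function names here are illustrative, not from the original project): keep the ioctl call in a translation unit compiled as C, so SPI_IOC_MESSAGE(n) expands with C semantics, and give the wrapper C linkage.
/* spi_xfer.c -- compiled as C */
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>
/* Returns whatever ioctl() returns: bytes transferred, or -1 on error. */
int spi_transfer(int fd, struct spi_ioc_transfer *xfers, unsigned count)
{
    return ioctl(fd, SPI_IOC_MESSAGE(count), xfers);
}
The C++ side then declares extern "C" int spi_transfer(int, spi_ioc_transfer*, unsigned); and calls spi_transfer(fd, tr.data(), static_cast<unsigned>(tr.size())).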

Related

Why are C++ (Visual Studio) Datatypes not stopping on their Limit?

Usually it would be more secure and better (to avoid UB, for example) if data types stopped working (the program crashed or similar) instead of giving no indication at all that their limit was exceeded.
For example, the limit of unsigned char is 255. Now let's say the value 3000 is stored as unsigned char in a text file and loaded into the C++ program. It will not give any error or anything; instead the unsigned char just does some automatic conversion (or UB?) and ends up with some other value below its limit. But why? What is that good for?
Is there any way to make programs built with Visual Studio stop working (crash, give an alert, etc.) if the data type used to handle the value is exceeded?
The same goes for signed/unsigned data types: if you use unsigned but somehow load the value "-1", the unsigned data type just "accepts" it and gives you some other value above 0.
The most important reason for the missing checks is, in most cases, performance. C++ follows the principle that you don't pay for what you don't use. If you need checks, write a custom type which performs them. You can even let it do the checks in debug mode only. (E.g. std::vector does it in its operator[].) If you want a safe language, C++ is your worst choice (right after C and assembly). ;-) But there are many other, higher-level languages you can choose from.
For debug checks, there is a macro in the standard library: assert():
#ifdef NDEBUG
#define assert(condition) ((void)0)
#else
#define assert(condition) /*implementation defined*/
#endif
Thereby, the /*implementation defined*/ part checks the condition. If it evaluates to false then usually abort() is called, which in turn aborts the process immediately (usually producing a core dump which can be examined in a debugger).
Is there any way, to make programs build with Visual Studio stop working (crashing or give an alert etc.) if the data type which is used to handle the value is exceeded?
Same with signed/unsigned data types, if you use unsigned, but you load "-1" value somehow, the unsigned data type is just "accepting" it but gives you some other value above 0.
Yes, there is a way – using custom types as wrappers around the originals.
A simple sample to demonstrate this:
#include <cassert>
#include <iostream>
struct UChar {
unsigned char value;
UChar(unsigned char value = 0): value(value) { }
UChar(char value): value((assert(value >= 0), (unsigned char)value)) { }
UChar(int value): value((assert(value >= 0 && value < 256), value)) { }
UChar(unsigned value): value((assert(value < 256), value)) { }
UChar(long value): value((assert(value >= 0 && value < 256), value)) { }
UChar(unsigned long value): value((assert(value < 256), value)) { }
UChar(long long value): value((assert(value >= 0 && value < 256), value)) { }
UChar(unsigned long long value): value((assert(value < 256), value)) { }
UChar(const UChar&) = default;
UChar& operator=(const UChar&) = default;
~UChar() = default;
operator unsigned char() { return value; }
};
#define PRINT_AND_DO(...) std::cout << #__VA_ARGS__ << ";\n"; __VA_ARGS__
int main()
{
// OK
PRINT_AND_DO(UChar c);
PRINT_AND_DO(std::cout << "c: " << (unsigned)c << '\n');
PRINT_AND_DO(c = 'A');
PRINT_AND_DO(std::cout << "c: " << c << '\n');
PRINT_AND_DO(UChar d = 'B');
PRINT_AND_DO(std::cout << "d: " << d << '\n');
PRINT_AND_DO(d = c);
PRINT_AND_DO(std::cout << "d: " << d << '\n');
// This will crash if NDEBUG not defined:
//PRINT_AND_DO(UChar e(3000));
//PRINT_AND_DO(c = 3000);
PRINT_AND_DO(d = -1);
}
Output:
a.out: main.cpp:9: UChar::UChar(int): Assertion `value >= 0 && value < 256' failed.
UChar c;
std::cout << "c: " << (unsigned)c << '\n';
c: 0
c = 'A';
std::cout << "c: " << c << '\n';
c: A
UChar d = 'B';
std::cout << "d: " << d << '\n';
d: B
d = c;
std::cout << "d: " << d << '\n';
d: A
d = -1;
bash: line 7: 31094 Aborted (core dumped) ./a.out
Live Demo on coliru
Note:
This is a demonstration only – not production-ready code. E.g. overflow in addition or underflow in subtraction is not caught, because the conversion operator operator unsigned char() causes the wrapper instance to be converted back to unsigned char whenever an operator is used which is not overloaded in struct UChar (but is supported for unsigned char). To fix this, more operators would have to be overloaded for UChar. However, I believe the demo is good enough to show the principle.
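For instance, a hedged sketch of what one such additional overload might look like (this operator is my addition, not part of the demo above): a free operator+ that widens first, checks, and only then narrows back.
// Hypothetical checked addition for UChar (not in the original demo).
UChar operator+(UChar lhs, UChar rhs)
{
    unsigned sum = unsigned(lhs.value) + unsigned(rhs.value); // widen so the true sum is representable
    assert(sum < 256 && "UChar addition overflowed");
    return UChar(sum); // UChar(unsigned) re-checks the range in debug builds
}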
What you describe is not managed by C++ (as you've seen). It is called wrap-around: the number is truncated to whatever value the variable can hold, as if it wrapped back to zero once its "limit" is exceeded (for positive numbers, at least).
The reason this is not caught is that it can be useful to have this feature.
That being said - the compiler should give you a warning if you assign a constant that is too large for the variable, or if you assign a "bigger-container" type into a smaller one (like uint16 into uint8).
You can configure the compiler to fail compilation on these kinds of warnings, so that you would have to explicitly convert types.
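Related to that (my addition, not from the answer above): C++11 brace initialization already turns a narrowing conversion from a constant into a compile error, with no special compiler flags needed.
#include <cstdint>
int main() {
    uint8_t ok{200};      // fine: 200 fits in an unsigned 8-bit type
    // uint8_t bad{3000}; // error: narrowing conversion of '3000' to 'uint8_t'
    // uint8_t neg{-1};   // error: narrowing conversion of '-1' to 'uint8_t'
    (void)ok;
}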
A programming language, particularly one as close to the hardware as C and C++ are, generally only exposes what a typical microprocessor provides in its instruction set. Taking x86 as an example, when you add two integers the processor will - typically - set the OV flag if an overflow happened and, IIRC, the OV and SIGN flags for underflows.
Normally, compilers do not generate extra instructions to check these flags after every arithmetic operation that could under- or overflow. That costs performance, and performance is king, even over correctness, it seems.
The concept you are looking for is two-fold:
Saturating arithmetic: where, no matter what you do, a variable is clamped to the min/max of its data type's range. No exceptions are raised. (A sketch follows after this list.)
Compiler explicitly checks for under-/over-flows: The closest example I can think of is checked { } construct in C#, which you should note is not enabled by default.
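A minimal sketch of the saturating-arithmetic idea for an 8-bit unsigned value (my illustration, assuming no library support):
#include <cstdint>
#include <limits>
// Saturating add: clamps at 255 instead of wrapping around.
uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = unsigned(a) + unsigned(b); // widen so the true sum is representable
    return sum > std::numeric_limits<uint8_t>::max()
               ? std::numeric_limits<uint8_t>::max()
               : uint8_t(sum);
}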
P.S.: I also recall that x86 might have added saturating arithmetic in its instruction set at some point, perhaps through the SIMD extensions.

How to define endian enum class waiting for C++20? [duplicate]

This question already has answers here:
Macro definition to determine big endian or little endian machine? (22 answers)
Is there a safe, portable way to determine (during compile time) the endianness of the platform that my program is being compiled on? I'm writing in C.
[EDIT]
Thanks for the answers, I decided to stick with the runtime solution!
To answer the original question of a compile-time check, there's no standardized way to do it that will work across all existing and all future compilers, because none of the existing C, C++, and POSIX standards define macros for detecting endianness.
But, if you're willing to limit yourself to some known set of compilers, you can look up each of those compilers' documentations to find out which predefined macros (if any) they use to define endianness. This page lists several macros you can look for, so here's some code which would work for those:
#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
defined(__BIG_ENDIAN__) || \
defined(__ARMEB__) || \
defined(__THUMBEB__) || \
defined(__AARCH64EB__) || \
defined(_MIBSEB) || defined(__MIBSEB) || defined(__MIBSEB__)
// It's a big-endian target architecture
#elif defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
defined(__LITTLE_ENDIAN__) || \
defined(__ARMEL__) || \
defined(__THUMBEL__) || \
defined(__AARCH64EL__) || \
defined(_MIPSEL) || defined(__MIPSEL) || defined(__MIPSEL__)
// It's a little-endian target architecture
#else
#error "I don't know what architecture this is!"
#endif
If you can't find what predefined macros your compiler uses from its documentation, you can also try coercing it to spit out its full list of predefined macros and guess from there what will work (look for anything with ENDIAN, ORDER, or the processor architecture name in it). This page lists a number of methods for doing that in different compilers:
Compiler                     C macros                          C++ macros
Clang/LLVM                   clang -dM -E -x c /dev/null       clang++ -dM -E -x c++ /dev/null
GNU GCC/G++                  gcc -dM -E -x c /dev/null         g++ -dM -E -x c++ /dev/null
Hewlett-Packard C/aC++       cc -dM -E -x c /dev/null          aCC -dM -E -x c++ /dev/null
IBM XL C/C++                 xlc -qshowmacros -E /dev/null     xlc++ -qshowmacros -E /dev/null
Intel ICC/ICPC               icc -dM -E -x c /dev/null         icpc -dM -E -x c++ /dev/null
Microsoft Visual Studio      (none)                            (none)
Oracle Solaris Studio        cc -xdumpmacros -E /dev/null      CC -xdumpmacros -E /dev/null
Portland Group PGCC/PGCPP    pgcc -dM -E                       (none)
Finally, to round it out, the Microsoft Visual C/C++ compilers are the odd ones out and don't have any of the above. Fortunately, they have documented their predefined macros here, and you can use the target processor architecture to infer the endianness. While all of the processors currently supported in Windows are little-endian (_M_IX86, _M_X64, _M_IA64, and _M_ARM), some historically supported processors like the PowerPC (_M_PPC) were big-endian. More relevantly, the Xbox 360 is a big-endian PowerPC machine, so if you're writing a cross-platform library header, it can't hurt to check for _M_PPC.
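A hedged sketch of that inference, using only the architecture macros Microsoft documents (whether you need the _M_PPC branch depends on how old a toolchain you care about):
#if defined(_M_PPC)
// big-endian target (e.g. Xbox 360)
#elif defined(_M_IX86) || defined(_M_X64) || defined(_M_IA64) || defined(_M_ARM) || defined(_M_ARM64)
// little-endian target
#endif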
This is for compile time checking
You could use information from the boost header file endian.hpp, which covers many platforms.
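A sketch of what that typically looks like, assuming the classic Boost header (newer Boost versions move this detection into Boost.Predef):
#include <boost/detail/endian.hpp>
#if defined(BOOST_LITTLE_ENDIAN)
// little-endian target
#elif defined(BOOST_BIG_ENDIAN)
// big-endian target
#else
#error "unable to determine endianness"
#endif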
edit for runtime checking
bool isLittleEndian()
{
short int number = 0x1;
char *numPtr = (char*)&number;
return (numPtr[0] == 1);
}
Create an integer, and read its first byte (least significant byte). If that byte is 1, then the system is little endian, otherwise it's big endian.
edit Thinking about it
Yes, you could run into a potential issue on some platforms (I can't think of any) where sizeof(char) == sizeof(short int). You could use the fixed-width multi-byte integer types available in <stdint.h>, or if your platform doesn't have them, again you could adapt a boost header for your use: stdint.hpp
With C99, you can perform the check as:
#define I_AM_LITTLE (((union { unsigned x; unsigned char c; }){1}).c)
Conditionals like if (I_AM_LITTLE) will be evaluated at compile-time and allow the compiler to optimize out whole blocks.
I don't have the reference right off for whether this is strictly speaking a constant expression in C99 (which would allow it to be used in initializers for static-storage-duration data), but if not, it's the next best thing.
Interesting read from the C FAQ:
You probably can't. The usual techniques for detecting endianness
involve pointers or arrays of char, or maybe unions, but preprocessor
arithmetic uses only long integers, and there is no concept of
addressing. Another tempting possibility is something like
#if 'ABCD' == 0x41424344
but this isn't reliable, either.
I would like to extend the answers for providing a constexpr function for C++
union Mix {
int sdat;
char cdat[4];
};
static constexpr Mix mix { 0x1 };
constexpr bool isLittleEndian() {
return mix.cdat[0] == 1;
}
Since mix is constexpr too it is compile time and can be used in constexpr bool isLittleEndian(). Should be safe to use.
Update
As #Cheersandhth pointed out below, this seems to be problematic.
The reason is that it is not conformant with the C++11 standard, where this kind of type punning is forbidden: only one union member can be active at a time. With a standard-conforming compiler you will get an error.
So, don't use it in C++. It seems you can do it in C, though. I leave my answer in for educational purposes :-) and because the question is about C...
Update 2
This assumes that int has the size of 4 chars, which is not always the case, as #PetrVepřek correctly pointed out below. To make your code truly portable you have to be cleverer here. This should suffice for many cases, though. Note that sizeof(char) is always 1, by definition. The code above assumes sizeof(int) == 4.
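For completeness, and since the question title mentions waiting for C++20: once C++20 is available, <bit> provides std::endian and the whole check collapses to a constexpr comparison with no type punning at all (this is the standard facility, not part of the answer above):
#include <bit>
constexpr bool isLittleEndian() {
    return std::endian::native == std::endian::little;
}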
Use CMake TestBigEndian as
INCLUDE(TestBigEndian)
TEST_BIG_ENDIAN(ENDIAN)
IF (ENDIAN)
# big endian
ELSE (ENDIAN)
# little endian
ENDIF (ENDIAN)
Not during compile time, but perhaps during runtime. Here's a C function I wrote to determine endianness:
/* Returns 1 if LITTLE-ENDIAN or 0 if BIG-ENDIAN */
#include <inttypes.h>
int endianness()
{
union { uint8_t c[4]; uint32_t i; } data;
data.i = 0x12345678;
return (data.c[0] == 0x78);
}
From Finally, one-line endianness detection in the C preprocessor:
#include <stdint.h>
#define IS_BIG_ENDIAN (*(uint16_t *)"\0\xff" < 0x100)
Any decent optimizer will resolve this at compile-time. gcc does at -O1.
Of course stdint.h is C99. For ANSI/C89 portability see Doug Gwyn's Instant C9x library.
I took it from rapidjson library:
#define BYTEORDER_LITTLE_ENDIAN 0 // Little endian machine.
#define BYTEORDER_BIG_ENDIAN 1 // Big endian machine.
//#define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
#ifndef BYTEORDER_ENDIAN
// Detect with GCC 4.6's macro.
# if defined(__BYTE_ORDER__)
# if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with GLIBC's endian.h.
# elif defined(__GLIBC__)
# include <endian.h>
# if (__BYTE_ORDER == __LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER == __BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with _LITTLE_ENDIAN and _BIG_ENDIAN macro.
# elif defined(_LITTLE_ENDIAN) && !defined(_BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
// Detect with architecture macros.
# elif defined(__sparc) || defined(__sparc__) || defined(_POWER) || defined(__powerpc__) || defined(__ppc__) || defined(__hpux) || defined(__hppa) || defined(_MIPSEB) || defined(_POWER) || defined(__s390__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# elif defined(__i386__) || defined(__alpha__) || defined(__ia64) || defined(__ia64__) || defined(_M_IX86) || defined(_M_IA64) || defined(_M_ALPHA) || defined(__amd64) || defined(__amd64__) || defined(_M_AMD64) || defined(__x86_64) || defined(__x86_64__) || defined(_M_X64) || defined(__bfin__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_MSC_VER) && (defined(_M_ARM) || defined(_M_ARM64))
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
#endif
I once used a construct like this one:
uint16_t HI_BYTE = 0,
LO_BYTE = 1;
uint16_t s = 1;
if(*(uint8_t *) &s == 1) {
HI_BYTE = 1;
LO_BYTE = 0;
}
pByte[HI_BYTE] = 0x10;
pByte[LO_BYTE] = 0x20;
gcc with -O2 was able to make it completely compile-time. That means the HI_BYTE and LO_BYTE variables were replaced entirely, and even the pByte access was replaced in the assembly by the equivalent of *(uint16_t *)pByte = 0x1020;.
It's as compile time as it gets.
To my knowledge no, not during compile time.
At run-time, you can do trivial checks such as setting a multi-byte value to a known bit string and inspect what bytes that results in. For instance using a union,
typedef union {
uint32_t word;
uint8_t bytes[4];
} byte_check;
or casting,
uint32_t word;
uint8_t *bytes = (uint8_t *) &word;
Please note that for completely portable endianness checks, you need to take into account big-endian, little-endian, and mixed-endian systems.
For my part, I decided to use an intermediate approach: try the macros, and if they don't exist, or if we can't find them, then do it in runtime. Here is one that works on the GNU-compiler:
#define II 0x4949 // arbitrary values != 1; examples are
#define MM 0x4D4D // taken from the TIFF standard
int
#if defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
const host_endian = II;
# elif defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
const host_endian = MM;
#else
#define _no_BYTE_ORDER
host_endian = 1; // plain "int", not "int const" !
#endif
and then, in the actual code:
int main(int argc, char **argv) {
#ifdef _no_BYTE_ORDER
host_endian = * (char *) &host_endian ? II : MM;
#undef _no_BYTE_ORDER
#endif
// .... your code here, for instance:
printf("Endedness: %s\n", host_endian == II ? "little-endian"
: "big-endian");
return 0;
}
On the other hand, as the original poster noted, the overhead of a runtime check is so little (two lines of code, and micro-seconds of time) that it's hardly worth the bother to try and do it in the preprocessor.

Why does the C preprocessor consider enum values as equal?

Why does the std::cout line in the following code run even though A and B are different?
#include <iostream>
enum T { A = 1, B = 2 };
// #define A 1
// #define B 2
int main() {
#if (A == B)
std::cout << A << B;
#endif
}
If I use #define instead (as commented out), I get no output as I expect.
Reason for the question:
I want to have a mode selector for some test code in which I can easily change modes by commenting/uncommenting lines on top:
enum T { MODE_RGB = 1, MODE_GREY = 2, MODE_CMYK = 3 };
// #define MODE MODE_RGB
#define MODE MODE_GREY
// #define MODE MODE_CMYK
int main() {
#if (MODE == MODE_RGB)
// do RGB stuff
#elif (MODE == MODE_GREY)
// do greyscale stuff
#else
// do CMYK stuff
#endif
// some common code
some_function(arg1, arg2,
#if (MODE == MODE_RGB)
// RGB calculation for arg3,
#elif (MODE == MODE_GREY)
// greyscale calculation for arg3,
#else
// CMYK calculation for arg3,
#endif
arg4, arg5);
}
I know I can use numeric values e.g.
#define MODE 1 // RGB
...
#if (MODE == 1) // RGB
but it makes the code less readable.
Is there an elegant solution for this?
There are no macros called A or B, so on your #if line, A and B get replaced by 0, so you actually have:
enum T { A = 1, B = 2 };
int main() {
#if (0 == 0)
std::cout << A << B;
#endif
}
The preprocessor runs before the compiler knows anything about your enum. The preprocessor only knows about macros (#define).
This is because the preprocessor works before compile time.
As the enum definitions are only seen at compile time, A and B are unknown to the preprocessor; both are replaced with the pp-number 0 - and are thus equal - at preprocessing time, so the output statement is included in the compiled code.
When you use #define, they are defined differently at preprocessing time and thus the condition evaluates to false.
In relation to your comment about what you want to do, you don't need to use pre-processor #if to do this. You can just use the standard if as both MODE and MODE_GREY (or MODE_RGB or MODE_CMYK) are all still defined:
#include <iostream>
enum T { MODE_RGB = 1, MODE_GREY = 2, MODE_CMYK = 3 };
#define MODE MODE_GREY
int main()
{
if( MODE == MODE_GREY )
std::cout << "Grey mode" << std::endl;
else if( MODE == MODE_RGB )
std::cout << "RGB mode" << std::endl;
else if( MODE == MODE_CMYK )
std::cout << "CMYK mode" << std::endl;
return 0;
}
The other option using only the pre-processor is to do this as #TripeHound correctly answered below.
Identifiers that are not defined as macros are interpreted as the value 0 in conditional preprocessor directives. Therefore, since you hadn't defined macros A and B, they are both considered 0, and two 0s are equal to each other.
The reason undefined (to the preprocessor) identifiers are treated as 0 is that it allows using undefined macros in conditionals without #ifdef.
As the other answers said, the C preprocessor doesn't see enums. It expects, and can only understand, macros.
Per the C99 standard, §6.10.1 (Conditional inclusion):
After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers are replaced with the pp-number 0
In other words, in an #if or #elif directive, any macros that cannot be expanded, because they don't exist/are undefined, will behave exactly as if they'd been defined as 0, and therefore will always be equal to each other.
You can catch likely unintended behavior like this in GCC/clang with the warning option -Wundef (you'll probably want to make it fatal with -Werror=undef).
The preprocessor runs before the compiler, which means that the preprocessor doesn't know anything about symbols defined by the compiler and therefore it can't act depending on them.
Other answers explain why what you're trying doesn't work; for an alternative, I'd probably go with:
#define RGB 1
#define GREY 2
#define CMYK 3
#define MODE RGB
#if MODE == RGB
//RGB-mode code
#elif MODE == GREY
//Greyscale code
#elif MODE == CMYK
//CMYK code
#else
# error Undefined MODE
#endif
You might want prefixes on the RGB/GREY/CMYK if there's a danger of clashes with "real" source code.
The posts have explained why, but a possible solution for you that keeps readability might be like this
#define MODE_RGB
int main()
{
#ifdef MODE_RGB
std::cout << "RGB mode" << std::endl;
#elif defined MODE_GREY
std::cout << "Grey mode" << std::endl;
#elif defined MODE_CMYK
std::cout << "CMYK mode" << std::endl;
#endif
}
You just then need to change the macro at the top so that only the macro you are interested in is defined. You could also include a check to make sure that one and only one is defined, and if not, fail with #error "You must define MODE_RGB, MODE_GREY or MODE_CMYK". A sketch of such a check follows.
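A hedged sketch of that guard (my illustration; it relies on defined(X) evaluating to 1 or 0 inside #if, so the sum counts how many modes are defined):
#if defined(MODE_RGB) + defined(MODE_GREY) + defined(MODE_CMYK) != 1
#error "You must define exactly one of MODE_RGB, MODE_GREY or MODE_CMYK"
#endif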

fputc perfomance and optimization

I have been seeing some strange behavior with a program I wrote that I cannot really explain and I was wondering if anybody could explain to me what is happening here. I have the feeling this is caused by some advanced optimization technique that g++ is using with -O3 but I am not sure.
I am running something similar to this (not a full example):
char* str = "(long AB string)"; // string _only_ consisting of As and Bs
size_t len = strlen(str);
for(unsigned long offset = 0; offset < len; offset++) {
if(offset % 100 == 0) fputc('\n', f);
fputc(str[offset], f);
}
This is fairly slow. However, when I additionally check the character like this, it suddenly becomes very fast:
char* str = "(long AB string)"; // string _only_ consisting of As and Bs
size_t len = strlen(str);
for(unsigned long offset = 0; offset < len; offset++) {
if(offset % 100 == 0) fputc('\n', f);
if(str[offset] != 'A' && str[offset] != 'B') exit(1);
fputc(str[offset], f);
}
This is despite the string only consisting of As and Bs, so the number of characters written does not change and the program always exits normally.
Can anybody explain to me what is happening here? Does the character check allow the optimizer to make some assumptions about str[offset] that it otherwise couldn't make, allowing it to optimize out some part of the fputc call?
The compiler optimizes pretty much everything away into a simple
exit(1)
since the compiler is smart enough to recognize that the string constant "(long string)" doesn't contain any 'A's or 'B's.
Frankly, I wouldn't have expected gcc to detect that ;)
In C, fputc(3) is mandated to be a function; it is equivalent to putc(3), which is typically implemented as a macro that accesses the FILE buffer directly. It is very hard to do much faster, except perhaps by using fwrite(3) to copy a stretch of characters at a time instead of going one by one (as sketched below). But such usage is exactly what putc(3) is supposed to be optimized for, so...
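A hedged sketch of that fwrite variant, keeping the same output format as the original loop (a newline before every 100-character chunk); str, len and f are the variables from the question:
for (size_t offset = 0; offset < len; offset += 100) {
    fputc('\n', f);
    size_t chunk = (len - offset < 100) ? (len - offset) : 100;
    fwrite(str + offset, 1, chunk, f); // write up to 100 characters in one call
}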
Analyze your loop at the base C level by only preprocessing (gcc -E) and compiling to assembler (gcc -S); that might give some clues.
Are you sure (i.e., do you have concrete measurements to say so) that this loop is performance-critical (or even relevant)? That would really be very strange.

How do I convert a value from host byte order to little endian?

I need to convert a short value from the host byte order to little endian. If the target was big endian, I could use the htons() function, but alas - it's not.
I guess I could do:
swap(htons(val))
But this could potentially cause the bytes to be swapped twice, rendering the result correct but giving me a performance penalty which is not alright in my case.
Here is an article about endianness and how to determine it from IBM:
Writing endian-independent code in C: Don't let endianness "byte" you
It includes an example of how to determine endianness at run time ( which you would only need to do once )
#include <stdio.h>
#include <stdlib.h>

const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )

int main(void) {
    int val;
    char *ptr;
    ptr = (char*) &val;
    val = 0x12345678;
    if (is_bigendian()) {
        printf("%X.%X.%X.%X\n", ptr[0], ptr[1], ptr[2], ptr[3]);
    } else {
        printf("%X.%X.%X.%X\n", ptr[3], ptr[2], ptr[1], ptr[0]);
    }
    exit(0);
}
The page also has a section on methods for reversing byte order:
short reverseShort (short s) {
unsigned char c1, c2;
if (is_bigendian()) {
return s;
} else {
c1 = s & 255;
c2 = (s >> 8) & 255;
return (c1 << 8) + c2;
}
}
short reverseShort (char *c) {
short s;
char *p = (char *)&s;
if (is_bigendian()) {
p[0] = c[0];
p[1] = c[1];
} else {
p[0] = c[1];
p[1] = c[0];
}
return s;
}
Then you should know your endianness and call htons() conditionally. Actually, not even htons, but just swap bytes conditionally. Compile-time, of course.
Something like the following:
unsigned short swaps( unsigned short val)
{
return ((val & 0xff) << 8) | ((val & 0xff00) >> 8);
}
/* host to little endian */
#define PLATFORM_IS_BIG_ENDIAN 1
#if PLATFORM_IS_LITTLE_ENDIAN
unsigned short htoles( unsigned short val)
{
/* no-op on a little endian platform */
return val;
}
#elif PLATFORM_IS_BIG_ENDIAN
unsigned short htoles( unsigned short val)
{
/* need to swap bytes on a big endian platform */
return swaps( val);
}
#else
unsigned short htoles( unsigned short val)
{
/* the platform hasn't been properly configured for the */
/* preprocessor to know if it's little or big endian */
/* use potentially less-performant, but always works option */
return swaps( htons(val));
}
#endif
If you have a system that's properly configured (such that the preprocessor knows whether the target is little- or big-endian) you get an 'optimized' version of htoles(). Otherwise you get the potentially non-optimized version that depends on htons(). In any case, you get something that works.
Nothing too tricky and more or less portable.
Of course, you can further improve the optimization possibilities by implementing this with inline or as macros as you see fit.
You might want to look at something like the "Portable Open Source Harness (POSH)" for an actual implementation that defines the endianness for various compilers. Note, getting to the library requires going through a pseudo-authentication page (though you don't need to register or give any personal details): http://hookatooka.com/poshlib/
This trick should work: at startup, use ntohs with a dummy value and then compare the resulting value to the original value. If both values are the same, then the machine uses big endian; otherwise it is little endian.
Then, use a ToLittleEndian method that either does nothing or invokes ntohs, depending on the result of the initial test.
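A hedged sketch of that approach (the function and variable names are mine, not from the answer):
#include <arpa/inet.h>
#include <stdint.h>

static int host_is_little_endian; /* set once at startup */

void init_endianness(void) {
    /* ntohs() is a no-op on big-endian hosts, so a changed value means little-endian */
    host_is_little_endian = (ntohs(0x1234) != 0x1234);
}

uint16_t to_little_endian(uint16_t v) {
    /* already little-endian: leave alone; big-endian: swap the two bytes */
    return host_is_little_endian ? v : (uint16_t)((v << 8) | (v >> 8));
}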
(Edited with the information provided in comments)
My rule-of-thumb performance guess is that depends whether you are little-endian-ising a big block of data in one go, or just one value:
If it's just one value, then the function call overhead is probably going to swamp the overhead of unnecessary byte swaps, even if the compiler doesn't optimise the unnecessary swaps away. Then you're probably going to write the value as the port number of a socket connection, and try to open or bind a socket, which takes an age compared with any sort of bit manipulation. So just don't worry about it.
If a large block, then you might worry the compiler won't handle it. So do something like this:
if (!is_little_endian()) {
for (int i = 0; i < size; ++i) {
vals[i] = swap_short(vals[i]);
}
}
Or look into SIMD instructions on your architecture which can do it considerably faster.
Write is_little_endian() using whatever trick you like. I think the one Robert S. Barnes provides is sound, but since you usually know for a given target whether it's going to be big- or little-endian, maybe you should have a platform-specific header file that defines it as a macro evaluating to either 1 or 0.
As always, if you really care about performance, then look at the generated assembly to see whether pointless code has been removed or not, and time the various alternatives against each other to see what actually goes fastest.
Unfortunately, there's not really a cross-platform way to determine a system's byte order at compile-time with standard C. I suggest adding a #define to your config.h (or whatever else you or your build system uses for build configuration).
A unit test to check for the correct definition of LITTLE_ENDIAN or BIG_ENDIAN could look like this:
#include <assert.h>
#include <limits.h>
#include <stdint.h>
void check_bits_per_byte(void)
{ assert(CHAR_BIT == 8); }
void check_sizeof_uint32(void)
{ assert(sizeof (uint32_t) == 4); }
void check_byte_order(void)
{
static const union { unsigned char bytes[4]; uint32_t value; } byte_order =
{ { 1, 2, 3, 4 } };
static const uint32_t little_endian = 0x04030201ul;
static const uint32_t big_endian = 0x01020304ul;
#ifdef LITTLE_ENDIAN
assert(byte_order.value == little_endian);
#endif
#ifdef BIG_ENDIAN
assert(byte_order.value == big_endian);
#endif
#if !defined LITTLE_ENDIAN && !defined BIG_ENDIAN
assert(!"byte order unknown or unsupported");
#endif
}
int main(void)
{
check_bits_per_byte();
check_sizeof_uint32();
check_byte_order();
}
On many Linux systems, there is a <endian.h> or <sys/endian.h> with conversion functions. man page for ENDIAN(3)
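For example, on glibc (and the BSDs, via <sys/endian.h>) there are htole16()/le16toh() and the 32/64-bit variants, so host-to-little-endian becomes a one-liner (a sketch; check your platform's man page for the exact header and feature-test macros):
#include <endian.h>
#include <stdint.h>

uint16_t host_to_little(uint16_t v) {
    return htole16(v); /* identity on little-endian hosts, byte swap on big-endian ones */
}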