Misaligned pointers are scaring me. When parsing a string, a helpful technique I normally use is treating a group of chars as one unit.
So if I am comparing a string against <!--, the 4 chars that begin a comment, I would do...
if( *(unsigned int*)string == tobe32('<!--') )
// This is possibly the beginning of a comment
As you can see, I handle the endianness problem. But will I still stumble upon the pointer alignment problem? That is, if I am at index 1 of the string, will casting it to an unsigned int pointer give me a 4-byte object on a 1-byte boundary?
Yes, it will almost certainly be a misaligned pointer(a). Casting the address does not actually change the address, just how it's treated when you dereference it.
However, that's not necessarily a problem. Some environments may actually raise a hardware exception if you do this (some early ARMs, from memory), some will run a little slower (some x86s), and no doubt some won't care at all. So it would depend on your underlying environment.
However, I'd really question the need for this trick, since the fact you have to do endian conversion means it may not be as efficient as you think.
My first inclination would be to just write an inline function that checks the four characters individually, time it, and only worry about optimisation if there's a real problem. That would be something like:
// Check first four characters match. Pre-condition is that both
// legacy-C-strings are at least four characters in length.
inline bool match4(const char *str, const char *match) {
if (*str++ != *match++) return false;
if (*str++ != *match++) return false;
if (*str++ != *match++) return false;
return *str == *match;
}
This would be my starting position rather than relying on possibly non-portable solutions using casting.
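If profiling does show that a word-at-a-time compare wins, a safer route than casting is std::memcpy, which has no alignment requirement and which compilers typically lower to a single (possibly unaligned) load on targets that allow it. A minimal sketch, assuming the pattern argument is whatever your tobe32() conversion above produces:
#include <cstdint>
#include <cstring>

// Well-defined unaligned 4-byte compare: memcpy may read from any address.
inline bool match4_word(const char *str, std::uint32_t pattern) {
    std::uint32_t word;
    std::memcpy(&word, str, sizeof word);  // no alignment requirement
    return word == pattern;                // caller passes e.g. tobe32('<!--')
}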
(a) If you want to know the alignment requirements of certain types, you can use the C++ alignof expression, such as alignof(int), assuming you have C++11 or better and, really, you should have :-)
Related
#include <cstddef>  // size_t

template <char terminator = '\0'>  // note: terminator is not used below yet -- that is the question
size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, himagic, lomagic;

    /* Check one character at a time until char_ptr is aligned on a
       longword boundary. */
    for (char_ptr = str;
         ((unsigned long int) char_ptr & (sizeof (longword) - 1)) != 0;
         ++char_ptr)
        if (*char_ptr == '\0')
            return char_ptr - str;

    longword_ptr = (const unsigned long int *) char_ptr;

    /* Simplified from glibc: these 32-bit magic constants (and the four
       byte checks below) only cover the low 4 bytes of each longword;
       glibc's real code widens them and checks cp[4..7] when unsigned
       long is 8 bytes. */
    himagic = 0x80808080L;
    lomagic = 0x01010101L;

    for (;;)
    {
        longword = *longword_ptr++;

        /* (x - lomagic) & himagic is nonzero if some byte of x is zero
           (with occasional false positives for high bytes, which the
           byte-by-byte recheck below filters out). */
        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);

            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}
The above is glibc's strlen() code. It relies on the trick from Determine if a word has a zero byte to make it fast.
However, I wish to make the function work with any terminating character, not just '\0', using a template. Is it possible to do something similar?
Use std::memchr to take advantage of libc's hand-written asm
It returns a pointer to the found byte, so you can get the length by subtracting. It returns NULL on not-found, but you said you can assume there will be a match, so we don't need to check except as a debug assert.
Even better, use rawmemchr if you can assume GNU functions are available, so you don't even have to pass a length. (Or not, since glibc 2.37 deprecates it.)
#include <cassert>
#include <cstring>

size_t lenx(const char *p, int c, size_t len)
{
    const void *match = std::memchr(p, c, len);  // old C functions take the char as an int
    assert(match != nullptr);                    // caller guarantees a match exists
    return static_cast<const char*>(match) - p;
}
Any decent libc implementation for a modern mainstream CPU will have a fast memchr implementation that checks multiple bytes at once, often hand-written in asm. Very similar to an strlen implementation, but with length-based loop exit condition in the unrolled loop, not just the match-finding loop exit condition.
memchr is somewhat cheaper than strchr, which has to check every byte for being a potential 0; an amount of work that doesn't go down with unrolling and SIMD. If data isn't hot in L1 cache, a good strchr can typically still keep up with available bandwidth, though, on most CPUs for most ISAs. Checking for 0s is also a correctness problem for arrays that contain a 0 byte before the byte you're looking for.
If available, it will even use SIMD instructions to check 16 or 32 bytes at once. A pure C bithack (with strict-aliasing UB) like the one you found is only used in portable fallback code in real C libraries (Why does glibc's strlen need to be so complicated to run quickly? explains this and has some links to glibc's asm implementations), or on targets where it compiles to asm as good as could be written by hand (e.g. MIPS for glibc). (But being wrapped up in a library function, the strict-aliasing UB is dealt with by some means, perhaps as simple as just not being able to inline into other code that accesses that data a different way. If you wanted to do it yourself, you'd want a typedef with something like GNU C __attribute__((may_alias)). See the link earlier in this paragraph.)
You certainly don't want a bithack that's only going to check 4 bytes at a time, especially if unsigned long is an 8-byte type on a 64-bit CPU!
If you don't know the buffer length, use len = -1 in C11 / C++17
Use rawmemchr if available, otherwise use memchr(ptr, c, -1).
That's equivalent to passing SIZE_MAX.
See Is it legal to call memchr with a too-long length, if you know the character will be found before reaching the end of the valid region?
It's guaranteed not to read past the match, or at least to behave as if it didn't, i.e. not faulting. So not reading into the next page just like an optimized strlen, and probably for performance reasons not reading into the next cache line. (At least since C++17 / C11, according to cppreference, but real implementations have almost certainly been safe to use this way for longer, at least for performance reasons.)
The ISO C++ standard itself defers to the C standard for <cstring> functions; C++17 and later defer to C11, which added this requirement that C99 didn't have. (I also don't know if there are real-world implementations that violate that standard; I'd guess not and that it was more a matter of documenting a guarantee that real implementations were already doing.)
The POSIX man page for memchr guarantees stopping on a match; I don't know how far back this guarantee goes for POSIX systems.
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).
Without a guarantee like this, it's hypothetically possible for an implementation to just use unaligned loads starting with the address you pass it, as long as it's far enough from the ptr[size-1] end of the buffer you told it about. That's unlikely for performance reasons, though.
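Under those guarantees, the unknown-length variant can be as simple as the following sketch (lenx_unbounded is an invented name; the caller must guarantee the byte occurs in the valid region):
#include <cstdint>   // SIZE_MAX
#include <cstring>

size_t lenx_unbounded(const char *p, int c)
{
    // Relies on the C11 / C++17 guarantee that memchr stops at the first match.
    return static_cast<const char*>(std::memchr(p, c, SIZE_MAX)) - p;
}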
rawmemchr()
If you're on a GNU system, glibc has rawmemchr which assumes there will be a match, not taking a size limit. So it can loop exactly like strlen does, not having a 2nd exit condition based on either length or checking every byte for a 0 as well as the given character.
Fun fact: AArch64 glibc implements it as memchr(ptr, c, -1), or as strlen if the character happens to be 0. On other ISAs, it might actually duplicate the memchr code but without checking against the end of a buffer.
Glibc 2.37 will deprecate it, so it's apparently not a good idea for new code to switch to it now.
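Tying this back to the original question, a hedged sketch of the templated function the asker wanted (strlen_until is an invented name), delegating to memchr so libc's hand-tuned implementation does the heavy lifting:
#include <cassert>
#include <cstdint>
#include <cstring>

template <char terminator = '\0'>
size_t strlen_until(const char *str)
{
    // Precondition: the terminator occurs somewhere in the valid region.
    const void *match = std::memchr(str, terminator, SIZE_MAX);
    assert(match != nullptr);
    return static_cast<const char *>(match) - str;
}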
There was a similar question here, but the user in that question seemed to have a much larger array, or vector. If I have:
bool boolArray[4];
And I want to check if all elements are false, I can check [ 0 ], [ 1 ] , [ 2 ] and [ 3 ] either separately, or I can loop through it. Since (as far as I know) false should have value 0 and anything other than 0 is true, I thought about simply doing:
if ( *(int*) boolArray) { }
This works, but I realize that it relies on bool being one byte and int being four bytes. If I cast to (std::uint32_t) would it be OK, or is it still a bad idea? I just happen to have 3 or 4 bools in an array and was wondering if this is safe, and if not if there is a better way to do it.
Also, in the case I end up with more than 4 bools but less than 8 can I do the same thing with a std::uint64_t or unsigned long long or something?
As πάντα ῥεῖ noticed in comments, std::bitset is probably the best way to deal with this in a UB-free manner.
std::bitset<4> boolArray {};
if(boolArray.any()) {
//do the thing
}
If you want to stick to arrays, you could use std::any_of, though this requires the (possibly peculiar-looking) use of a functor that just returns its argument:
bool boolArray[4];
// needs <algorithm> for std::any_of and <iterator> for std::begin/std::end
if(std::any_of(std::begin(boolArray), std::end(boolArray), [](bool b){return b;})) {
    //do the thing
}
Type-punning 4 bools to int might be a bad idea - you cannot be sure of the size of each of the types. It probably will work on most architectures, but std::bitset is guaranteed to work everywhere, under any circumstances.
Several answers have already explained good alternatives, particularly std::bitset and std::any_of(). I am writing separately to point out that, unless you know something we don't, it is not safe to type pun between bool and int in this fashion, for several reasons:
int might not be four bytes, as multiple answers have pointed out.
M.M points out in the comments that bool might not be one byte. I'm not aware of any real-world architectures in which this has ever been the case, but it is nevertheless spec-legal. It (probably) can't be smaller than a byte unless the compiler is doing some very elaborate hide-the-ball chicanery with its memory model, and a multi-byte bool seems rather useless. Note however that a byte need not be 8 bits in the first place.
int can have trap representations. That is, it is legal for certain bit patterns to cause undefined behavior when they are cast to int. This is rare on modern architectures, but might arise on (for example) ia64, or any system with signed zeros.
Regardless of whether you have to worry about any of the above, your code violates the strict aliasing rule, so compilers are free to "optimize" it under the assumption that the bools and the int are entirely separate objects with non-overlapping lifetimes. For example, the compiler might decide that the code which initializes the bool array is a dead store and eliminate it, because the bools "must have" ceased to exist* at some point before you dereferenced the pointer. More complicated situations can also arise relating to register reuse and load/store reordering. All of these infelicities are expressly permitted by the C++ standard, which says the behavior is undefined when you engage in this kind of type punning.
You should use one of the alternative solutions provided by the other answers.
* It is legal (with some qualifications, particularly regarding alignment) to reuse the memory pointed to by boolArray by casting it to int and storing an integer, although if you actually want to do this, you must then pass boolArray through std::launder if you want to read the resulting int later. Regardless, the compiler is entitled to assume that you have done this once it sees the read, even if you don't call launder.
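If you really do want to examine the object representation, std::memcpy is the defined-behavior tool for that. A sketch, with the loudly-stated caveats that it assumes sizeof(bool) == 1 (checked at compile time) and that false is represented as an all-zero byte (true on mainstream ABIs, but not guaranteed by the standard):
#include <cstdint>
#include <cstring>

bool any_set(const bool (&bs)[4]) {
    std::uint32_t word;
    static_assert(sizeof bs == sizeof word, "assumes sizeof(bool) == 1");
    std::memcpy(&word, bs, sizeof word);  // copies the object representation
    return word != 0;                     // assumes false is all-zero bytes
}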
You can use std::bitset<N>::any:
any() returns true if any of the bits are set to true, otherwise false.
#include <iostream>
#include <bitset>
int main ()
{
std::bitset<4> foo;
// modify foo here
if (foo.any())
std::cout << foo << " has " << foo.count() << " bits set.\n";
else
std::cout << foo << " has no bits set.\n";
return 0;
}
If you want to check whether all or none of the bits are set, you can use std::bitset<N>::all or std::bitset<N>::none respectively.
The standard library has what you need in the form of the std::all_of, std::any_of, std::none_of algorithms.
...And for the obligatory "roll your own" answer, we can provide a simple "or"-like function for any array bool[N], like so:
template<size_t N>
constexpr bool or_all(const bool (&bs)[N]) {
for (bool b : bs) {
if (b) { return b; }
}
return false;
}
This also has the benefit of both short-circuiting like ||, and being optimised out entirely if calculable at compile time.
Apart from that, if you want to pursue the original idea of type-punning bool[N] to some other type to simplify observation, I would very much recommend that you instead view it as char[N2], where N2 == (sizeof(bool) * N). This would allow you to provide a simple representation viewer that automatically scales to the viewed object's actual size, allows iteration over its individual bytes, and lets you more easily determine whether the representation matches specific values (such as, e.g., zero or non-zero). I'm not entirely sure off the top of my head whether such examination would invoke any UB, but I can say for certain that any such type's construction cannot be a viable constant expression, due to requiring a reinterpret_cast to char* or unsigned char* or similar (either explicitly, or inside std::memcpy()), and thus couldn't as easily be optimised out.
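A sketch of that byte-view idea: accessing an object's representation through unsigned char* is the one form of punning the aliasing rules do permit, so (unlike the int cast) the view itself is well-defined, even though what the bytes mean for bool remains implementation-specific. The names here are invented for illustration:
#include <cstddef>

// View any array's object representation as bytes.
template <typename T, std::size_t N>
const unsigned char *byte_view(const T (&arr)[N]) {
    return reinterpret_cast<const unsigned char *>(arr);
}

// usage: iterate sizeof(arr) bytes from byte_view(arr) and test each byte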
The code below generates a compiler warning:
void test()
{
byte buffer[100];
for (int i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}
warning: comparison between signed and unsigned integer expressions
[-Wsign-compare]
This is because sizeof() returns a size_t, which is unsigned.
I have seen a number of suggestions for how to deal with this, but none with a preponderance of support and none with any convincing logic nor any references to support one approach as clearly "better." The most common suggestions seem to be:
ignore the warnings
turn off the warnings
use a loop variable of type size_t
use a loop variable of type size_t with tricks to avoid decrementing past zero
cast sizeof(buffer) to an int
some extremely convoluted suggestions that I did not have the patience to follow because they involved unreadable code, generally involving vectors and/or iterators
libraries that I cannot load in the AVR / ARM embedded environments I often use.
free functions returning a valid int or long representing the byte count of T
Don't use loops (gotta love that advice)
Is there a "correct" way to approach this?
-- Begin Edit --
The example I gave is, of course, trivial, and meant only to demonstrate the type mismatch warning that can occur in an indexing situation.
#3 is not necessarily the obviously correct answer because size_t carries special risks in a decrementing loop such as
for (size_t i = myArray.size(); i > 0; --i)
(the array may someday have a size of zero).
#4 is a suggestion to deal with decrementing size_t indexes by including appropriate and necessary checks to avoid ever decrementing past zero. Since that makes the code harder to read, there are some cute shortcuts that are not particularly readable, hence my referring to them as "tricks."
#7 is a suggestion to use libraries that are not generalizable in the sense that they may not be available or appropriate in every setting.
#8 is a suggestion to keep the checks readable, but to hide them in a non-member function, sometimes referred to as a "free function."
#9 is a suggestion to use algorithms rather than loops. This was offered many times as a solution to the size_t indexing problem, and there were a lot of upvotes. I include it even though I can't use the STL in most of my environments and would have to write the code myself.
-- End Edit--
I am hoping for evidence-based guidance or references as to best practices for handling something like this. Is there a "standard text" or a style guide somewhere that addresses the question? A defined approach that has been adopted/endorsed internally by a major tech company? An emulatable solution forthcoming in a new language release? If necessary, I would be satisfied with an unsupported public recommendation from a single widely recognized expert.
None of the options on offer seem very appealing. The warnings drown out other things I want to see. I don't want to miss signed/unsigned comparisons in places where it might matter. Decrementing a loop variable of type size_t with the comparison >= 0 results in an infinite loop from unsigned integer wraparound, and even if we protect against that with something like for (size_t i = sizeof(buffer); i-->0 ;), there are other issues with incrementing/decrementing/comparing size_t variables. Subtracting 1 from a size_t value will yield a large positive 'oops' number when the value is unexpectedly zero (e.g. strlen(myEmptyString)). Casting size_t to an int is not guaranteed to preserve the value for large containers, and of course size_t could be bigger than an int.
Given that my arrays are of known sizes well below INT_MAX, it seems to me that casting size_t to a signed integer is the best of the bunch, but it makes me cringe a little bit. Especially if it has to be static_cast<int>. Easier to take if it's hidden in a function call with some size testing, but still...
Or perhaps there's a way to turn off the warnings, but just for loop comparisons?
I find any of the three following approaches equally good.
Use a variable of type int to store the size and compare the loop variable to it.
byte buffer[100];
int size = sizeof(buffer);
for (int i = 0; i < size; ++i)
{
buffer[i] = 0;
}
Use size_t as the type of the loop variable.
byte buffer[100];
for (size_t i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
Use a pointer.
byte buffer[100];
byte* end = buffer + sizeof(buffer);
for (byte* p = buffer; p < end; ++p)
{
*p = 0;
}
If you are able to use a C++11 compiler, you can also use a range-based for loop.
byte buffer[100];
for (byte& b : buffer)
{
b = 0;
}
The most appropriate solution will depend entirely on context. In the context of the code fragment in your question the most appropriate action is perhaps to have type-agreement - the third option in your bullet list. This is appropriate in this case because the usage of i throughout the code is only to index the array - in this case the use of int is inappropriate - or at least unnecessary.
On the other hand if i were an arithmetic object involved in some arithmetic expression that was itself signed, the int might be appropriate and a cast would be in order.
I would suggest that as a guideline, a solution that involves the fewest number of necessary type casts (explicit or implicit) is appropriate, or to look at it another way, the maximum possible type agreement. There is not one "authoritative" rule, because the purpose and usage of the variables involved is semantically rather than syntactically dependent. In this case also, as has been pointed out in other answers, newer language features supporting iteration may avoid this specific issue altogether.
To discuss the advice you say you have been given specifically:
ignore the warnings
Never a good idea - some warnings will be genuine semantic errors or maintenance issues, and by the time you have several hundred warnings you are ignoring, how will you spot the one warning that is an issue?
turn off the warnings
An even worse idea; the compiler is helping you to improve your code quality and reliability. Why would you disable that?
use a loop variable of type size_t
In this precise example, that is exactly what you should do; exact type agreement should always be the aim.
use a loop variable of type size_t with tricks to avoid decrementing past zero
This advice is irrelevant for the trivial example given. Moreover, I presume that by "tricks" the adviser in fact means checks, or just correct code. There is no need for "tricks", and the term is entirely ambiguous - who knows what the adviser means? It suggests something unconventional and a bit "dirty", when there is no need for any solution with such attributes.
cast sizeof(buffer) to an int
This may be necessary if the usage of i warrants the use of int for correct semantics elsewhere in the code. The example in the question does not, so this would not be an appropriate solution in this case. Essentially, if making i a size_t here causes type agreement warnings elsewhere that cannot themselves be resolved by universal type agreement for all operands in an expression, then a cast may be appropriate. The aim should be to achieve zero warnings and minimum type casts.
some extremely convoluted suggestions that I did not have the patience to follow, generally involving vectors and/or iterators
If you are not prepared to elaborate on or even consider such advice, you'd have been better off omitting it from your question. The use of STL containers is in any case not always appropriate for a large segment of embedded targets; excessive code size and non-deterministic heap management are reasons to avoid them on many platforms and in many applications.
libraries that I cannot load in an embedded environment.
Not all embedded environments have equal constraints. The restriction is on your embedded environment, not by any means all embedded environments. However the "loading of libraries" to resolve or avoid type agreement issues seems like a sledgehammer to crack a nut.
free functions returning a valid int or long representing the byte count of T
It is not clear what that means. What is a "free function"? Is that just a non-member function? Such a function would internally necessarily have a type cast, so what have you achieved other than hiding a type cast (see the sketch below)?
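For what it's worth, C++20 later standardized exactly this idea as std::ssize. A hedged pre-C++20 sketch of such a "free function" for built-in arrays (isize is an invented name; the cast is still there, but it lives in exactly one checked place):
#include <climits>
#include <cstddef>

template <typename T, std::size_t N>
constexpr int isize(T (&)[N]) noexcept {
    static_assert(N <= static_cast<std::size_t>(INT_MAX),
                  "array too large to index with int");
    return static_cast<int>(N);
}

// usage: for (int i = 0; i < isize(buffer); ++i) ...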
Don't use loops (gotta love that advice).
I doubt you needed to include that advice in your list. The problem is in any case not limited to loops; it is not because you are using a loop that you have the warning, it is because you have used < with mismatched types.
My favorite solution is to use C++11 or newer and skip the whole manual size bounding entirely like so:
// assuming byte is defined by something like using byte = std::uint8_t;
void test()
{
byte buffer[100];
for (auto&& b: buffer)
{
b = 0;
}
}
Alternatively, if I can't use the range-based for loop (but still can use C++11 or newer), my favorite syntax becomes:
void test()
{
byte buffer[100];
for (auto i = decltype(sizeof(buffer)){0}; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}
Or for iterating backwards:
void test()
{
byte buffer[100];
// relies on the defined modular-wrapping behavior of unsigned integers
for (auto i = sizeof(buffer) - 1; i < sizeof(buffer); --i)
{
buffer[i] = 0;
}
}
The correct generic way is to use a loop iterator of type size_t, simply because it is the most correct type for describing an array size.
There is not much need for "tricks to avoid decrementing past zero", because the size of an object can never be negative.
If you find yourself needing negative numbers to describe a variable size, it is probably because you have some special case where you are iterating across an array backwards. If so, the "trick" to deal with it is this:
for(size_t i=0; i<sizeof(array); i++)
{
size_t index = sizeof(array)-1 - i;
array[index] = something;
}
However, size_t is often an inconvenient type to use in embedded systems, because it may end up as a larger type than what your MCU can handle with one instruction, resulting in needlessly inefficient code. It may then be better to use a fixed width integer such as uint16_t, if you know the maximum size of the array in advance.
Using plain int in an embedded system is almost certainly incorrect practice. Your variables must be of deterministic size and signedness - most variables in an embedded system are unsigned. Signed variables also lead to major problems whenever you need to use bitwise operators.
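A sketch of the fixed-width uint16_t variant suggested above, assuming (as in this question) the array size is known in advance to fit in 16 bits; keeping the bound in a uint16_t too avoids reintroducing the signed/unsigned mismatch:
#include <cstdint>

void clear_buffer(std::uint8_t (&buffer)[100]) {
    constexpr std::uint16_t n = sizeof buffer;  // size known to fit in 16 bits
    for (std::uint16_t i = 0; i < n; ++i)       // both sides promote to int
        buffer[i] = 0;
}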
If you are able to use C++ 11, you could use decltype to obtain the actual type of what sizeof returns, for instance:
void test()
{
byte buffer[100];
// On macOS, decltype(sizeof(buffer)) is unsigned long; this passes
// the compiler without warnings.
for (decltype(sizeof(buffer)) i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}
Suppose, for example, there is a class that requires a pointer and a bool. For simplicity, an int pointer will be used in the examples, but the pointer type is irrelevant as long as it points to something whose size is more than 1.
Defining the class with {bool, int*} data members will result in the class having a size that is double the size of the pointer, and a lot of wasted space.
If the pointer does not point to a char (or other data of size 1), then presumably the low bit will always be zero. The class could be defined with {int*}, or for convenience: union { int *, uintptr_t }.
The bool is implemented by setting/clearing the low bit of the pointer as per the logical bool value and clearing the bit when you need to use the pointer.
The defined way:
struct myData
{
int * ptr;
bool flag;
};
myData x;
// initialize
x.ptr = new int;
x.flag = false;
// set flag true
x.flag = true;
// set flag false
x.flag = false;
// use ptr
*(x.ptr)=7;
// change ptr
x.ptr = y; // y is another int *
And the proposed way:
union tiny
{
int * ptr;
uintptr_t flag;
};
tiny x;
// initialize
x.ptr = new int;
// set flag true
x.flag |= 1;
// set flag false
x.flag &= ~1;
// use ptr
tiny clean=x; // note that clean will likely be optimized out
clean.flag &= ~1; // back to original value as assigned to ptr
*(clean.ptr)=7;
// change ptr
bool flag = x.flag & 1;
x.ptr = y; // y is another int *
x.flag |= flag;
This seems to be undefined behavior, but how portable is this?
As long as you restore the pointer's low-order bit before trying to use it as a pointer, it's likely to be "reasonably" portable, provided your system, your C++ implementation, and your code meet certain assumptions.
I can't necessarily give you a complete list of assumptions, but off the top of my head:
It assumes you're not pointing to anything whose size is 1 byte. This excludes char, unsigned char, signed char, int8_t, and uint8_t. (And that assumes CHAR_BIT == 8; on exotic systems with, say, 16-bit or 32-bit bytes, other types might be excluded.)
It assumes objects whose size is at least 2 bytes are always aligned at an even address. Note that x86 doesn't require this; you can access a 4-byte int at an odd address, but it will be slightly slower. But compilers typically arrange for objects to be stored at even addresses. Other architectures may have different requirements.
It assumes a pointer to an even address has its low-order bit set to 0.
For that last assumption, I actually have a concrete counterexample. On Cray vector systems (J90, T90, and SV1 are the ones I've used myself) a machine address points to a 64-bit word, but the C compiler under Unicos sets CHAR_BIT == 8. Byte pointers are implemented in software, with the 3-bit byte offset within a word stored in the otherwise unused high-order 3 bits of the 64-bit pointer. So a pointer to an 8-byte aligned object could easily have its low-order bit set to 1.
There have been Lisp implementations (example) that use the low-order 2 bits of pointers to store a type tag. I vaguely recall this causing serious problems during porting.
Bottom line: You can probably get away with it for most systems. Future architectures are largely unpredictable, and I can easily imagine your scheme breaking on the next Big New Thing.
Some things to consider:
Can you store the boolean values in a bit vector outside your class? (Maintaining the association between your pointer and the corresponding bit in the bit vector is left as an exercise).
Consider adding code to all pointer operations that fails with an error message if it ever sees a pointer with its low-order bit set to 1 (see the sketch after this list). Use #ifdef to remove the checking code in your production version. If you start running into problems on some platform, build a version of your code with the checks enabled and see what happens.
I suspect that, as your application grows (they seldom shrink), you'll want to store more than just a bool along with your pointer. If that happens, the space issue goes away, because you're already using that extra space anyway.
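A hedged sketch of that second suggestion: keep the tag manipulation behind one small wrapper with an assert at the entry point, so a platform where the alignment assumption fails announces itself loudly. All names here are invented for illustration:
#include <cassert>
#include <cstdint>

class TaggedIntPtr {
    static_assert(alignof(int) >= 2, "low bit must be free for the tag");
    std::uintptr_t bits_;
public:
    explicit TaggedIntPtr(int *p, bool flag = false)
        : bits_(reinterpret_cast<std::uintptr_t>(p)) {
        assert((bits_ & 1u) == 0 && "pointer is not 2-byte aligned");
        bits_ |= static_cast<std::uintptr_t>(flag);
    }
    int *ptr() const {  // mask the tag off before every use
        return reinterpret_cast<int *>(bits_ & ~std::uintptr_t(1));
    }
    bool flag() const { return (bits_ & 1u) != 0; }
    void set_flag(bool f) {
        bits_ = (bits_ & ~std::uintptr_t(1)) | static_cast<std::uintptr_t>(f);
    }
};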
In "theory": it's undefined behavior as far as I know.
In "reality": it'll work on everyday x86/x64 machines, and probably ARM too?
I can't really make a statement beyond that.
It's very portable, and furthermore, you can assert when you accept the raw pointer to make sure it meets the alignment requirement. This will insure against the unfathomable future compiler that somehow messes you up.
Only reasons not to do it are the readability cost and general maintenance associated with "hacky" stuff like that. I'd shy away from it unless there's a clear gain to be made. But it is sometimes totally worth it.
Conform to those rules and it should be very portable.
I've often seen array iterations using plain pointer arithmetic, even in newer C++ code. I wonder how safe they really are, and whether it's a good idea to use them. Consider this snippet (it compiles also in C if you use calloc in place of new):
int8_t *buffer = new int8_t[16];
for (int8_t *p = buffer; p < buffer + 16; p++) {
...
}
Wouldn't this kind of iteration result in an overflow, and the loop being skipped completely, when buffer happens to be allocated at address 0xFFFFFFF0 (in a 32-bit address space) or 0xFFFFFFFFFFFFFFF0 (64-bit)?
As far as I know, this would be an exceptionally unlucky, but still possible circumstance.
This is safe. The C and C++ standards explicitly allow you to calculate a pointer value that points one item beyond the end of an array, and to compare a pointer that points within the array to that value.
An implementation that had an overflow problem in the situation you describe would simply not be allowed to place an array right at the end of memory like that.
In practice, a more likely problem is buffer + 16 comparing equal to NULL, but this is not allowed either and again a conforming implementation would need to leave an empty place following the end of the array.
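For completeness, the same one-past-the-end guarantee is what makes the standard begin/end idiom (and range-for, which compiles down to it) well-defined. A sketch using std::begin/std::end, which work for built-in arrays too:
#include <cstdint>
#include <iterator>

void zero_buffer(std::int8_t (&buffer)[16]) {
    // std::end(buffer) is exactly the one-past-the-end pointer the
    // standard blesses; comparing against it is well-defined.
    for (std::int8_t *p = std::begin(buffer); p != std::end(buffer); ++p)
        *p = 0;
}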