I am using an STL map to store flow information extracted from pcap files. When a packet arrives, I use map.find to check whether the flow the packet belongs to exists. I have to call map.find twice, since a packet from A to B and a packet from B to A belong to the same flow.
struct FiveTuple
{
    unsigned short source_port;
    unsigned short dest_port;
    unsigned int   source_ip_addr;
    unsigned int   dest_ip_addr;
    unsigned char  transport_proto_type;
};
The FiveTuple identifies a flow, and I use it as the key in the map.
The map is map<FiveTuple, Flow, FlowCmp>, where FlowCmp is a comparator struct that uses memcmp to decide whether one FiveTuple is less than another, like operator<.
To find whether the packet's flow exists, I wrote the following code, where m is the map and five_tuple is a FiveTuple filled in from the packet:
auto it = m.find(five_tuple);
if (it == m.end())
{
    // swap source and dest ip/port in five_tuple
    it = m.find(five_tuple);
    if (it == m.end())
    {
        // do something
    }
}
In the debug build in VS2010 the results are reasonable. When I switched to the release build, I found that instead of returning the right iterator, the second m.find gave me m.end() most of the time. I have verified that there are no initialization problems. How can I fix the release build problem?
It seems you are doing memcmp() on FiveTuple objects. That is unreliable because FiveTuple contains trailing padding bytes with indeterminate (garbage) values, and memcmp compares those bytes too. The garbage differs between the debug and release builds, so you get different results. You should rewrite FlowCmp so that it doesn't use memcmp().
This is a guess based on the limited information provided, but if you want to test it, try cout << sizeof(FiveTuple);. I bet you'll see that sizeof(FiveTuple) > sizeof(short) + sizeof(short) + sizeof(int) + sizeof(int) + sizeof(char). In other words, there is padding in your struct, and you shouldn't use memcmp.
memcmp is bad for another reason too: it makes your code non-portable, because its behaviour depends on the endianness of your platform. That alone is reason enough not to use memcmp for this purpose.
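For example, a minimal sketch of a FlowCmp that compares the members themselves, so the padding bytes never participate (assumes std::tie from C++11's <tuple>):
#include <tuple>
struct FlowCmp
{
    bool operator()(const FiveTuple& a, const FiveTuple& b) const
    {
        // lexicographic member-wise comparison; padding is never read
        return std::tie(a.source_ip_addr, a.dest_ip_addr,
                        a.source_port, a.dest_port, a.transport_proto_type)
             < std::tie(b.source_ip_addr, b.dest_ip_addr,
                        b.source_port, b.dest_port, b.transport_proto_type);
    }
};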
I'm having trouble reading a binary file into a bitset and processing it.
std::ifstream is("data.txt", std::ifstream::binary);
if (is) {
    // get length of file:
    is.seekg(0, is.end);
    int length = is.tellg();
    is.seekg(0, is.beg);
    char *buffer = new char[length];
    is.read(buffer, length);
    is.close();
    // note: sizeof(buffer) is the size of the pointer, not the file length
    const int k = sizeof(buffer) * 8;
    std::bitset<k> tmp;
    memcpy(&tmp, buffer, sizeof(buffer));
    std::cout << tmp;
    delete[] buffer;
}
int a = 5;
std::bitset<32> bit;
memcpy(&bit, &a, sizeof(a));
std::cout << bit;
I want to get {05 00 00 00} (hex memory view), i.e. bitset[0..31] = {00000101 00000000 00000000 00000000}, but I get bitset[0..31] = {10100000 00000000 00000000 00000000}.
You need to learn to crawl before you can crawl on broken glass.
In short, computer memory is an opaque box, and you should stop making assumptions about its layout. Code that quietly depends on whatever the implementation happens to do today (Hyrum's law in action) is exactly the kind of code that rots into an unmaintainable mess.
What I'm about to write is common sense to every competent C++ programmer out there, as trivial as breathing and as important as breathing. It should be included in every C++ book and hammered into the heads of new programmers as early as possible, but for some reason it isn't.
The only thing you can rely on in what I'll loosely call "memory" is that the bits of a byte are never out of order. std::byte is the type for raw bytes; before it was added to the standard we used unsigned char. The two are more or less interchangeable, but you should prefer std::byte whenever you can.
So, what do I mean by this?
std::byte a{0b10101000};
assert(std::to_integer<int>((a >> 3) & std::byte{1}) == 1); // always true
That's it, everything else is up to the compiler, your machine architecture and stars in the sky.
Oh, you thought you could just write int a = 0b1010100000000010; and expect something good? That's just not how things work in these savage lands. If you expect any particular byte order, you will have to split the value into bytes yourself; you cannot just cast it to std::byte bytes[2] and expect bytes[0] == std::byte{0b10101000}. It is NEVER correct to assume anything here. If you do, one day your code will break, and by the time you realize it, it will be too late, because it will be yet another undebuggable 30-million-line legacy codebase, half of which lives in proprietary shared objects whose source code has been lost since 1997. Good luck.
So, what's the correct way? Luckily for us, binary shifts are architecture-independent. int is guaranteed to be at least 16 bits, and that's the only thing this example relies on, although most machines have sizeof(int) == 4. If you need more bytes, or an exact number of bytes, use the appropriate fixed-width integer type from <cstdint>.
int a = 0b1010100000000010;
std::byte bytes[2];            // always correct: int has at least 16 bits
// std::byte bytes[4];         // wrong: assumes sizeof(int) == 4
// std::byte bytes[sizeof a];  // flexible solution that needs more work
// we think in terms of 8 bits; we don't care about the rest
bytes[0] = static_cast<std::byte>(a & 0xFF);
// we may need to skip more than 8 bits to reach the next 8 bits, hence CHAR_BIT
bytes[1] = static_cast<std::byte>((a >> CHAR_BIT) & 0xFF);
This is the only strictly correct way to convert a type with sizeof(T) > 1 into an array of bytes; anything else is a subpar implementation that will stop working the moment you change compilers or machine architecture.
The reverse is true too: you need binary shifts to convert a byte array back into a type larger than 1 byte.
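A minimal sketch of that reverse step, under the same assumptions as the example above (a value that fits in 16 bits, low byte stored first):
int a = std::to_integer<int>(bytes[0])
      | (std::to_integer<int>(bytes[1]) << CHAR_BIT);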
On top of that, this only applies to primitive types: int, long, short, and so on. You can sometimes rely on it working for float or double, as long as you only ever need IEEE 754 and will never target a machine so old or bizarre that it doesn't support IEEE 754. That's it.
If you think really long and hard, you may realize that this is no different from structs.
struct x {
    int a;
    int b;
};
What can we rely on? Well, we know that x has the same address as its first member a. That's it. If we want to set b, we need to access it as x.b; every other assumption is ALWAYS wrong, with no ifs or buts. The only exception is if you wrote your own compiler and you are using your own compiler while ignoring the standard, at which point anything is possible; that's fine, but it's not C++ anymore.
So, what can we infer from what we know now? An array of bytes cannot simply be memcpy'd into a std::bitset. You don't know its internal layout and you cannot know it; it may change tomorrow, and if your code breaks because of that, then the code was wrong.
Want to convert an array of bytes to a bitset? Then iterate over every single bit in the byte array and set each bit of the bitset however you need it to be. That is the only correct and portable solution, now and until the C++ standard says otherwise.
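A minimal sketch of that loop, assuming the least significant bit of buf[0] should become bit 0 of the bitset (the helper name is illustrative):
#include <bitset>
#include <climits>
#include <cstddef>
template <std::size_t N>
std::bitset<N> bytes_to_bitset(const unsigned char* buf, std::size_t len)
{
    std::bitset<N> result;
    // walk every bit of the buffer and copy it into the bitset
    for (std::size_t i = 0; i < len * CHAR_BIT && i < N; ++i)
        result[i] = (buf[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
    return result;
}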
I'm trying to cast a struct into a char vector.
I want to send my struct, cast into a std::vector, through a UDP socket and cast it back on the other side. Here is my struct with the PACK attribute.
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )
PACK(struct Inputs
{
    uint8_t structureHeader;
    int16_t x;
    int16_t y;
    Key inputs[8];
});
Here is test code:
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
//send and receive via udp
Inputs* my_struct = reinterpret_cast<Inputs*>(&buffer[0]);
The issue is: everything works fine except my uint8_t or int8_t fields. I don't know why, but whenever and wherever I put a 1-byte value in the struct, the value is not readable after I cast it back (the others are). I tried using only 16-bit values and it works just fine, even with maximum values, so all the bits are OK. I think this is something to do with the alignment of the bytes in memory, but I can't figure out how to make it work. Thank you.
I'm trying to cast a struct into a char vector.
You cannot cast an arbitrary object to a vector. You can cast your object to an array of char and then copy that array into a vector (which is actually what your code is doing).
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
That second line defines a new vector and initializes it by copying the bytes that represent your object into it. This is reasonable, but it's distinct from what you said you were trying to do.
I think this is something with the alignment of the bytes in the memory
This is good intuition. If you hadn't told the compiler to pack the struct, it would have inserted padding bytes to ensure each field starts at its natural alignment. The fact that the operation isn't reversible suggests that somehow the receiving end isn't packed exactly the same way. Are you sure the receiving program has exactly the same packing directive and struct layout?
On x86, you can get by with unaligned data, but you may pay a large performance cost whenever you access an unaligned member variable. With the packing set to one, and the first field being odd-sized, you've guaranteed that the next fields will be unaligned. I'd urge you to reconsider this. Design the struct so that all the fields fall at their natural alignment boundaries and that you don't need to adjust the packing. This may make your struct a little bigger, but it will avoid all the alignment and performance problems.
If you want to omit the padding bytes in your wire format, you'll have to copy the relevant fields byte by byte into the wire format and then copy them back out on the receiving end.
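For illustration, a hedged sketch of such a field-by-field copy for the Inputs struct above, arbitrarily choosing little-endian byte order for the wire format (the Key elements are elided because Key's definition isn't shown):
#include <cstdint>
#include <vector>
std::vector<char> serialize(const Inputs& in)
{
    std::vector<char> buf;
    buf.push_back(static_cast<char>(in.structureHeader));
    buf.push_back(static_cast<char>(in.x & 0xFF));        // low byte first
    buf.push_back(static_cast<char>((in.x >> 8) & 0xFF));
    buf.push_back(static_cast<char>(in.y & 0xFF));
    buf.push_back(static_cast<char>((in.y >> 8) & 0xFF));
    // ...append the bytes of each in.inputs element the same way...
    return buf;
}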
An aside regarding:
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )
Identifiers that begin with underscore and a capital letter or with two underscores are reserved for "the implementation," so you probably shouldn't use __Declaration__ as the macro's parameter name. ("The implementation" refers to the compiler, the standard library, and any other runtime bits the compiler requires.)
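For example, the same macro with a non-reserved parameter name (still MSVC-specific, since __pragma is a Microsoft extension):
#define PACK(Declaration) __pragma(pack(push, 1)) Declaration __pragma(pack(pop))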
1. The vector class has dynamically allocated memory and uses pointers inside, so you can't send the vector itself (but you can send the underlying array).
2. SFML has a great class for doing this called sf::Packet. It's free, open source, and cross-platform. I was recently working on a personal cross-platform socket library for use in other personal projects, and I eventually abandoned it for SFML. There's just TOO much to test; I was spending all my time verifying that things worked and making no progress on the projects I actually wanted to build.
3. memcpy is your best friend. It is designed to be portable, and you can use that to your advantage. You can also use it to debug: memcpy the thing you want to inspect into a char array and check that the bytes match what you expect.
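A small sketch of that debugging trick (the helper name is illustrative):
#include <cstdio>
#include <cstring>
// copy an object's bytes into a char array, then print them in hex
void dump_bytes(const void* obj, std::size_t n)
{
    unsigned char buf[256];
    if (n > sizeof buf) n = sizeof buf;   // clamp to the scratch buffer
    std::memcpy(buf, obj, n);
    for (std::size_t i = 0; i < n; ++i)
        std::printf("%02x ", buf[i]);
    std::printf("\n");
}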
4. To save yourself tons of robustness testing, limit yourself to chars, 32-bit integers, and 64-bit doubles. You're using different compilers? Struct packing is compiler- and architecture-dependent. If you must use a packed struct, you need to guarantee that the packing behaves identically on every platform you target and that all platforms share the same endianness. That's evidently what you're having trouble with, and I'm sorry I can't help you more there. I would recommend proper serialization and would definitely avoid struct packing if I were trying to write portable sockets.
If you can make the guarantees mentioned above, sending is really easy on Linux.
// POSIX
void send(int fd, Inputs& input)
{
    int error = sendto(fd, &input, sizeof(input), ..., ..., ...);
    ...
}
winsock2 uses a char* instead of a void* :(
void send(int fd, Inputs& input)
{
    char buf[sizeof(input)];
    memcpy(buf, &input, sizeof(input));
    int error = sendto(fd, buf, sizeof(input), ..., ..., ...);
    ...
}
Have you tried the simplest approach?
std::vector<unsigned char> vecBuffer;
unsigned char *pBuff = (unsigned char*)&in;
for (unsigned int i = 0; i < sizeof(Inputs); i++) {
    vecBuffer.push_back(*pBuff);
    pBuff++;
}
This works for both packed and non-packed structs, since you iterate over sizeof(Inputs) bytes.
I need help. I have an unsigned char *, and say I have this struct:
struct {
    int a = 3;
    char b = 'd';
    double c = 3.14;
    char d = 'e';
} cmp;
unsigned char input[1000];
int l = recv(sockfd, input, sizeof(cmp), 0);
I want to compare cmp and input. What is the fastest way?
Thanks a lot in advance.
If the compiler guarantees there are no gaps between the fields in the struct (gaps usually appear due to alignment padding), or you can use a #pragma to eliminate them, then you can compare with either:
memcmp(&cmp, input, sizeof(cmp));
Or, my preferred:
cmp == *(struct TheStruct *)input // provided the struct is given a name, has an operator==, and contains no pointers
But a much safer way would be to compare on a field-by-field basis. Better yet, write dedicated functions for extracting ints, floats, etc. from the raw input. For example, extracting an int at index n may be as simple as
*(int *)&input[n]
but it might need to be more complicated, like assembling the value from chars shifted by 8, 16, and 24 bits.
In short, access communication data in the most robust way possible: check every basic element and assume nothing.
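For example, a hedged sketch of such an extraction helper, assuming a little-endian wire format (the function name is illustrative):
#include <cstdint>
#include <cstddef>
std::uint32_t extract_u32(const unsigned char* input, std::size_t n)
{
    // assemble from individual bytes; independent of host endianness
    return  static_cast<std::uint32_t>(input[n])
          | (static_cast<std::uint32_t>(input[n + 1]) << 8)
          | (static_cast<std::uint32_t>(input[n + 2]) << 16)
          | (static_cast<std::uint32_t>(input[n + 3]) << 24);
}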
Give reinterpret_cast a try. This will allow you to arbitrarily cast the char * to a cmp *
http://msdn.microsoft.com/en-us/library/e0w9f63b.aspx
In the general case, James Kanze's comment is correct: you can't compare like that. This is due to, among other things, byte padding.
However, it can work in the specific case with the following assumptions:
The sender is on the same CPU architecture as the receiver.
The sender is using the same compiler and linker as the receiver.
The applications are compiled with the same compiler/linker flags.
...other things... you get the gist.
The sender is sending it straight from the struct:
cmp c{ ...set variables... };
send(sockfd, (char*)&c, sizeof(c));
So in short, this is a very brittle way of transporting structs and you shouldn't do it for anything except simple tests or quick hacks.
What would be the most efficient way to read a UInt32 value from an arbitrary memory address in C++? (Assuming Windows x86 or Windows x64 architecture.)
For example, consider having a byte pointer that points somewhere in memory to a block that contains a combination of ints, string data, etc., all mixed together. The following sample shows reading the various fields from this block in a loop.
typedef unsigned char* BytePtr;
typedef unsigned int UInt32;
...
BytePtr pCurrent = ...;
while ( *pCurrent != 0 )
{
    ...
    if ( *pCurrent == ... )
    {
        UInt32 nValue = *( (UInt32*) ( pCurrent + 1 ) ); // line A
        ...
    }
    pCurrent += ...;
}
If at line A pCurrent happens to contain a 4-byte-aligned address, reading the UInt32 should be a single memory read. If it contains an unaligned address, more than one memory cycle may be needed, which slows the code down. Is there a faster way to read the value from unaligned addresses?
I'd recommend a memcpy into a temporary of type UInt32 inside your loop.
This takes advantage of the fact that a four-byte memcpy will be inlined by the compiler when building with optimization enabled, and it has a few other benefits (a sketch follows the list):
If you are on a platform where alignment matters (HP-UX, Solaris SPARC, ...), your code isn't going to trap.
On a platform where alignment matters, it may be worthwhile to check the address for alignment and then do either a regular aligned load or a set of four byte loads and bitwise ORs. Your compiler's memcpy very likely does this in the optimal way.
If you are on a platform where unaligned access is allowed and doesn't hurt performance (x86, x64, PowerPC, ...), you are pretty much guaranteed that such a memcpy is the cheapest way to do this access.
If your memory was originally a pointer to some other data structure, your code may be undefined because of aliasing problems: you are casting to another type and dereferencing that cast. Run-time problems caused by aliasing-related optimizations are very hard to track down, and even once found they can be very hard to fix in established code; you may have to resort to obscure options like -fno-strict-aliasing or -qansialias, which can significantly limit the compiler's ability to optimize.
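A minimal sketch of the recommended approach, using the question's UInt32 typedef (the function name is illustrative):
#include <cstring>
UInt32 ReadUnaligned32(const unsigned char* p)
{
    UInt32 value;
    std::memcpy(&value, p, sizeof value); // the compiler inlines this 4-byte copy
    return value;
}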
Your code is undefined behaviour.
Pretty much the only "correct" solution is to only read something as a type T if it is a type T, as follows:
uint32_t n;
char * p = point_me_to_random_memory();
std::copy(p, p + 4, reinterpret_cast<char*>(&n));
std::cout << "The value is: " << n << std::endl;
In this example, you want to read an integer, and the only way to do that is to have an integer. If you want it to hold a particular binary representation, you need to copy that data into the bytes of the variable, starting at its beginning.
Let the compiler do the optimizing!
UInt32 ReadU32(unsigned char *ptr)
{
return static_cast<UInt32>(ptr[0]) |
(static_cast<UInt32>(ptr[1])<<8) |
(static_cast<UInt32>(ptr[2])<<16) |
(static_cast<UInt32>(ptr[3])<<24);
}
Okay, allow me to re-ask the question, since none of the answers addressed what I was really interested in (apologies if wholesale editing of the question like this is a faux pas).
A few points:
This is offline analysis with a different compiler than the one I'm testing, so sizeof() or similar won't work for what I'm doing.
I know it's implementation-defined, but I happen to know the implementation that is of interest to me, which is below.
Let's make a function called pack, which takes as input an integer, called alignment, and a tuple of integers, called elements. It outputs another integer, called size.
The function works as follows:
int pack (int alignment, int[] elements)
{
    total_size = 0;
    foreach( element in elements )
    {
        while( total_size % min(alignment, element) != 0 ) { ++total_size; }
        total_size += element;
    }
    while( total_size % alignment != 0 ) { ++total_size; }
    return total_size;
}
I think what I want to ask is "what is the inverse of this function?", but I'm not sure whether inversion is the right term; I don't remember ever dealing with inverses of functions with multiple inputs, so I may be using a term that doesn't apply.
Something like what I want (sort of) exists; here is pseudocode for a function we'll call determine_align. The function is a little naive, though: it just calls pack over and over with different inputs until it gets an answer it expects (or fails).
int determine_align(int total_size, int[] elements)
{
    for( packing = 1,2,4,...,64 ) // expected answers
    {
        size_at_cur_packing = pack(packing, elements);
        if( total_size == size_at_cur_packing )
        {
            return packing;
        }
    }
    return unknown;
}
So the question is, is there a better implementation of determine_align?
Thanks,
Alignment of struct members in C/C++ is entirely implementation-defined. There are a few guarantees there, but I don't see how they would help you.
Thus, there's no generic way to do what you want. In the context of a particular implementation, you should refer to the documentation of that implementation that covers this (if it is covered).
When choosing how to pack members into a struct, an implementation doesn't have to follow the sort of scheme you describe in your algorithm, although it is a common one (i.e., the minimum of the sizeof of the type being aligned and the preferred machine alignment).
You don't have to compare overall struct sizes to determine the padding applied to individual members, though. The standard macro offsetof gives the byte offset of any member from the start of the struct.
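For example, a small sketch of probing a layout with offsetof (the struct here is purely illustrative):
#include <cstddef>
#include <cstdio>
struct S { char c; int i; };
int main()
{
    // typically prints 0 and 4 under natural alignment, 0 and 1 when packed to 1
    std::printf("offsetof(S, c) = %zu\n", offsetof(S, c));
    std::printf("offsetof(S, i) = %zu\n", offsetof(S, i));
}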
I let the compiler do the alignment for me.
In gcc,
// assuming u8/u16/u32 are typedefs for the <stdint.h> fixed-width types
typedef struct _foo
{
    u8  v1 __attribute__((aligned(4)));
    u16 v2 __attribute__((aligned(4)));
    u32 v3 __attribute__((aligned(8)));
    u8  v4 __attribute__((aligned(4)));
} foo;
Edit: Note that sizeof(foo) will return the correct value including any padding.
Edit2: And offsetof(foo, v2) also works. Given these two functions/macros, you can figure out everything you need to know about the layout of the struct in memory.
I'm honestly not sure what you're trying to do, and I'm probably completely misunderstanding what you're looking for, but if you want to simply determine what the alignment requirement of a struct is, the following macro might be helpful:
#define ALIGNMENT_OF( t ) offsetof( struct { char x; t test; }, test )
To determine the alignment of your foo structure, you can do:
ALIGNMENT_OF(foo);
If this isn't what you're ultimately trying to do, the macro might still help with whatever algorithm you do come up with.
You need to pad based on the alignment of the next field, and then pad after the last element based on the maximum alignment seen in the struct. Note that a field's actual alignment is the minimum of its natural alignment and the packing for that struct; i.e., if a struct is packed at 4 bytes, a double will be aligned to 4 bytes even though its natural alignment is 8.
You can make the inner loop faster with the closed form total_size += (align - total_size % align) % align, where align = min(packing, element). You can optimize it further when align is a power of two.
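A small sketch of both forms, where align stands for min(packing, element):
// round size up to the next multiple of align (align > 0)
int pad_to(int total_size, int align)
{
    return total_size + (align - total_size % align) % align;
}
// equivalent and cheaper when align is a power of two
int pad_to_pow2(int total_size, int align)
{
    return (total_size + align - 1) & ~(align - 1);
}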
If the problem is just that you want to guarantee a particular alignment, that is easy. For a particular alignment = 2^n:
void* p = malloc( sizeof( _foo ) + alignment - 1 );
p = (void*) ( ( (uintptr_t)p + alignment - 1 ) & ~(uintptr_t)(alignment - 1) );
I've neglected to save the original p returned from malloc. If you intend to free this memory, you need to save that pointer somewhere.
I'm not sure what you want to achieve here. As Pavel Minaev said, alignment is handled by a compiler which in turn is constrained by a platform's Application Binary Interface for data that is made accessible to code compiled by a different compiler. The following paper discusses the problem in the context of a compiler that needs to implement calling conventions:
Christian Lindig and Norman Ramsey. Declarative Composition of Stack Frames. In Evelyn Duesterwald, editor, Proceedings of the 14th International Conference on Compiler Construction (CC 2004), Springer, LNCS 2985, 2004.