memcpy with initialized variable and negative numbers with cast - c++

I have
QByteArray bytes; // Filled in earlier
char id_c = bytes[7];
int _id;
_id = 0; // If I comment this result would be different
memcpy(&_id, &id_c, 1);
int result = _id;
If I comment out "_id = 0", result is different when the byte holds a negative number. Why? Why does initializing _id with 0 make a difference?!
How can I get the same result as with "_id = 0", but without memcpy and unwanted casts?
This is not my code. I am interested in how to get the same result correctly, without stupid casts.

Correct.
Because this statement:
memcpy(&_id, &id_c, 1);
only copies a single byte from &id_c into the storage of a 4-byte integer, &_id. Only the first byte of memory occupied by _id gets anything copied into it. Without zeroing _id first, the remaining three bytes of that value are left uninitialized (presumably random garbage values off the stack).
What's wrong with an "unwanted cast"? A cast is perfectly fine here, and the compiler generates the most efficient code for it.
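QByteArray bytes; // Filled in earlier
int _id = (int)(bytes[7]);
int result = _id;
Note that plain char may be signed, in which case this cast sign-extends a byte such as 0xF0 to -16. If you want a sign-extended result regardless of the platform's char signedness, then this:
int _id = (signed char)(bytes[7]);
To get the same value as the zero-initialized memcpy (0 to 255, on a little-endian machine), convert through unsigned char instead:
int _id = (unsigned char)(bytes[7]);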
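To make the difference concrete, here is a minimal sketch contrasting the two conversions. The helper names zero_extended and sign_extended are made up for illustration and are not part of the question's code.

```cpp
// Hypothetical helpers for illustration; not part of the original code.

// Same value as memcpy into a zeroed int on a little-endian machine: 0..255.
int zero_extended(char c) {
    return static_cast<unsigned char>(c);
}

// Sign-extended value: -128..127 on the usual two's-complement platforms.
int sign_extended(char c) {
    return static_cast<signed char>(c);
}
```

With a QByteArray, something like int result = zero_extended(bytes.at(7)); should then give the same value as the "_id = 0" version, with no memcpy involved.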

_id = 0 assigns the value 0 to the variable _id. If you comment that out, there is no telling what is stored in _id, and the memcpy updates only one byte of it; since _id is of type int, it is more than one byte in size.

You might try these net/host byte order conversions:
on linux
on windows
The only difference is the header file to use. You can use preprocessor tricks to determine the platform and choose the proper header if cross-platform programming is intended. A better approach is to use the C++20 feature std::endian, but then you need to handle the conversion yourself:
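#include <bit>
#include <climits>
int int_cvt(int x){
    if constexpr (std::endian::native == std::endian::big)
        return x;
    // Work on an unsigned copy so the shifts are well defined for negative input.
    unsigned int u = static_cast<unsigned int>(x);
    unsigned int y = 0;
    // Reverse every byte; a fixed count keeps zero bytes from being dropped.
    for (unsigned i = 0; i < sizeof u; ++i){
        y = (y << CHAR_BIT) | (u & UCHAR_MAX);
        u >>= CHAR_BIT;
    }
    return static_cast<int>(y);
}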

Related

Why hash values are same for string?

I want to calculate a hash of a structure by passing it as a string. Although the vlanId values are different, the hash value is still the same. The StringHash() function calculates the hash value. I haven't assigned any value to portId and vsi.
#include <stdio.h>
#include <functional>
#include <cstring>
#include <string>
using namespace std;
unsigned long StringHash(unsigned char *Arr)
{
hash<string> str_hash;
string Str((const char *)Arr);
unsigned long str_hash_value = str_hash(Str);
printf("Hash=%lu\n", str_hash_value);
return str_hash_value;
}
typedef struct
{
unsigned char portId;
unsigned short vlanId;
unsigned short vsi;
}VlanConfig;
int main()
{
VlanConfig v1;
memset(&v1,0,sizeof(VlanConfig));
unsigned char *index = (unsigned char *)&v1 + sizeof(unsigned char);
v1.vlanId = 10;
StringHash(index);
StringHash((unsigned char *)&v1);
v1.vlanId = 12;
StringHash(index);
StringHash((unsigned char *)&v1);
return 0;
}
Output:
Hash=6142509188972423790
Hash=6142509188972423790
Hash=6142509188972423790
Hash=6142509188972423790
You pass the bytes of your structure to a function expecting a zero terminated string. Well, the first byte of your structure already is zero, so you calculate the same hash every time.
Now, that is the explanation why, but not the solution to your problem. Passing a random sequence of bytes to a function expecting a zero-terminated sequence of characters is going to fail spectacularly, no matter how you do it.
Find another way to hash your structure. You are already using hash<>, why not use it for your case:
namespace std
{
template<> struct hash<VlanConfig>
{
std::size_t operator()(VlanConfig const& c) const noexcept
{
std::size_t h1 = std::hash<unsigned char>{}(c.portId);
std::size_t h2 = std::hash<unsigned short>{}(c.vlanId);
std::size_t h3 = std::hash<unsigned short>{}(c.vsi);
return h1 ^ (h2 << 1) ^ (h3 << 2); // or use boost::hash_combine
}
};
}
Then you can do this:
VlanConfig myVariable;
// fill myVariable
std::cout << std::hash<VlanConfig>{}(myVariable) << std::endl;
I can't say for certain, but most likely your issue is structure padding. Unless explicitly told to pack members and ignore alignment, most compilers will lay out the struct as follows:
Byte 0: portId
Byte 1: padding
Bytes 2,3: vlanId
Bytes 4,5: vsi
So when you compute the address stored in index, it'll point to the padding byte, which your memset has set to zero. Thus you're always hashing an empty string.
You should be able to check this in a debugger by inspecting index and comparing it to the address of vlanId.
-- Edit --
After giving this some more thought, I have to say that in my extremely humble opinion, this isn't a good way to get a hash value. Trying to treat several numeric values that might, or might not, be contiguous in memory as a std::string, has too many possibilities for error.
Start with the fact that even if you do get the address correct, consider what happens when you hash two different configurations, one of which has vlanId set to 256, while the other has it set to 512. Assuming a little endian machine, both of those will have a zero byte as the first character of the string, and so you're right back here again.
Worse yet is the case when all four bytes in vlanId and vsi are non zero. In that case, you'll read right off the end of your struct, and keep on going, reading who knows what. There's no way that's going to end well.
One possible solution is to figure the size of data, and use the following ctor for std::string: string (char const *s, size_t n); which has the advantage of forcing the string to exactly the size you want.
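The explicit-length idea might look roughly like the sketch below. It copies the bytes of the two fields one by one instead of relying on the struct layout, so padding never enters the hash. The helper names field_bytes and hash_vlan_fields are assumptions made for illustration, and the struct is repeated only to keep the snippet self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <string>

struct VlanConfig {
    unsigned char portId;
    unsigned short vlanId;
    unsigned short vsi;
};

// Hypothetical helper: collect the raw bytes of vlanId and vsi with explicit
// lengths, so embedded zero bytes are kept and nothing past the fields is read.
std::string field_bytes(const VlanConfig& c) {
    std::string bytes;
    bytes.append(reinterpret_cast<const char*>(&c.vlanId), sizeof c.vlanId);
    bytes.append(reinterpret_cast<const char*>(&c.vsi), sizeof c.vsi);
    return bytes;
}

// Hypothetical helper: hash those bytes with the standard string hash.
std::size_t hash_vlan_fields(const VlanConfig& c) {
    return std::hash<std::string>{}(field_bytes(c));
}
```

Because the length is given explicitly, vlanId values such as 256 and 512 produce different byte strings even though each contains a zero byte.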

Taking an index out of const char* argument

I have the following code:
int some_array[256] = { ... };
int do_stuff(const char* str)
{
int index = *str;
return some_array[index];
}
Apparently the above code causes a bug on some platforms, because *str can in fact be negative.
So I thought of two possible solutions:
1. Casting the value on assignment (unsigned int index = (unsigned char)*str;).
2. Passing const unsigned char* instead.
Edit: The rest of this question did not get a treatment, so I moved it to a new thread.
The signedness of char is indeed platform-dependent, but what you do know is that there are as many values of char as there are of unsigned char, and the conversion is injective. So you can absolutely cast the value to associate a lookup index with each character:
unsigned char idx = *str;
return arr[idx];
You should of course make sure that the arr has at least UCHAR_MAX + 1 elements. (This may cause hilarious edge cases when sizeof(unsigned long long int) == 1, which is fortunately rare.)
Characters are allowed to be signed or unsigned, depending on the platform. An assumption of unsigned range is what causes your bug.
Your do_stuff code does not treat const char* as a string representation. It uses it as a sequence of byte-sized indexes into a look-up table. Therefore, there is nothing wrong with forcing unsigned char type on the characters of your string inside do_stuff (i.e. use your solution #1). This keeps re-interpretation of char as an index localized to the implementation of do_stuff function.
Of course, this assumes that other parts of your code do treat str as a C string.
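Put together, solution #1 could be sketched as follows. The table contents are elided in the question, so the zero-initialized some_array here is only a placeholder, sized UCHAR_MAX + 1 as suggested above.

```cpp
#include <climits>

// Placeholder table; the question elides its real contents.
static int some_array[UCHAR_MAX + 1] = {};

int do_stuff(const char* str) {
    // Convert through unsigned char so the index is always in 0..UCHAR_MAX,
    // whether plain char is signed or unsigned on this platform.
    const unsigned char index = static_cast<unsigned char>(*str);
    return some_array[index];
}
```

The signature still takes const char*, so callers that treat str as an ordinary C string are unaffected.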

wchar_t* to short int conversion

One of the functions in a 3rd-party class returns a wchar_t* that holds a resource ID (I don't know why it uses the wchar_t* type). I need to convert this pointer to a short int.
This method, using the AND operator, works for me, but it doesn't seem like the correct way. Is there a proper way to do this?
wchar_t* s;
short int b = (unsigned long)(s) & 0xFFFF;
wchar_t* s; // I assume this is what you meant
short int b = static_cast<short int>(reinterpret_cast<intptr_t>(s));
You could also replace short int b with auto b, and it will be deduced as short int from the type of the right-hand expression.
It returns the resource ID as a wchar_t* because that is the data type that Windows uses to carry resource identifiers. Resources can be identified by either numeric ID or by name. If numeric, the pointer itself contains the actual ID number encoded in its lower 16 bits. Otherwise it is a normal pointer to a null-terminated string elsewhere in memory. There is an IS_INTRESOURCE() macro to differentiate which is the actual case, eg:
wchar_t *s = ...;
if (IS_INTRESOURCE(s))
{
// s is a numeric ID...
WORD b = (WORD)(ULONG_PTR) s; // go through a pointer-sized integer before narrowing
...
}
else
{
// s is a null-terminated name string
...
}
Did you mean in your code wchar_t *s;?
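I'd make the conversion more explicit. A reinterpret_cast straight from a pointer to a narrower integer type does not compile, so go through a pointer-sized integer first:
short int b = static_cast<short int>(reinterpret_cast<std::uintptr_t>(s));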
If it fits your application's needs, I suggest using a data type with a fixed number of bits, e.g. uint16_t. Using short int means you only know for sure that your variable has at least 16 bits. An additional question: why do you not use unsigned short int instead of (signed) short int?
In general, knowing the exact number of bits makes things a little more predictable, and makes it easier to know exactly what happens when you cast or use bitmasks.
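Combining the suggestions above (a pointer-sized intermediate, an explicit mask and a fixed-width unsigned result), a portable sketch might look like this. The helper name low_word is hypothetical; on Windows you would still check IS_INTRESOURCE() first.

```cpp
#include <cstdint>

// Hypothetical helper: extract the low 16 bits of a pointer-encoded ID.
std::uint16_t low_word(const wchar_t* s) {
    // reinterpret_cast to a pointer-sized integer is allowed; narrowing
    // directly to a 16-bit type is not.
    const auto bits = reinterpret_cast<std::uintptr_t>(s);
    return static_cast<std::uint16_t>(bits & 0xFFFFu);
}
```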

Python's struct.pack/unpack equivalence in C++

I used struct.pack in Python to transform data into a serialized byte stream.
>>> import struct
>>> struct.pack('i', 1234)
'\xd2\x04\x00\x00'
What is the equivalent in C++?
You'll probably be better off in the long run using a third party library (e.g. Google Protocol Buffers), but if you insist on rolling your own, the C++ version of your example might be something like this:
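#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> // htonl()/ntohl() on POSIX systems; on Windows include <winsock2.h> instead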
int32_t myValueToPack = 1234; // or whatever
uint8_t myByteArray[sizeof(myValueToPack)];
int32_t bigEndianValue = htonl(myValueToPack); // convert the value to big-endian for cross-platform compatibility
memcpy(&myByteArray[0], &bigEndianValue, sizeof(bigEndianValue));
// At this point, myByteArray contains the "packed" data in network-endian (aka big-endian) format
The corresponding 'unpack' code would look like this:
// Assume at this point we have the packed array myByteArray, from before
int32_t bigEndianValue;
memcpy(&bigEndianValue, &myByteArray[0], sizeof(bigEndianValue));
int32_t theUnpackedValue = ntohl(bigEndianValue);
In real life you'd probably be packing more than one value, which is easy enough to do (by making the array size larger and calling htonl() and memcpy() in a loop -- don't forget to increase memcpy()'s first argument as you go, so that your second value doesn't overwrite the first value's location in the array, and so on).
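As a rough sketch of packing several values one after another, the snippet below appends to a std::vector so the write position advances on its own. Note that it swaps htonl() and memcpy() for explicit shifts, which give the same big-endian layout without needing a platform header; the function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append one 32-bit value to the buffer, most significant byte first.
void pack_u32_be(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Read one 32-bit big-endian value starting at `offset`.
// The caller must ensure offset + 4 <= in.size().
std::uint32_t unpack_u32_be(const std::vector<std::uint8_t>& in, std::size_t offset) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v = (v << 8) | in[offset + i];
    return v;
}
```

Calling pack_u32_be() once per value fills the buffer in order, and unpack_u32_be() with offsets 0, 4, 8 and so on reads the values back.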
You'd also probably want to pack (aka serialize) different data types as well. uint8_t's (aka chars) and booleans are simple enough as no endian-handling is necessary for them -- you can just copy each of them into the array verbatim as a single byte. uint16_t's you can convert to big-endian via htons(), and convert back to native-endian via ntohs(). Floating point values are a bit tricky, since there is no built-in htonf(), but you can roll your own that will work on IEEE754-compliant machines:
uint32_t htonf(float f)
{
uint32_t x;
memcpy(&x, &f, sizeof(float));
return htonl(x);
}
.... and the corresponding ntohf() to unpack them:
float ntohf(uint32_t nf)
{
float x;
nf = ntohl(nf);
memcpy(&x, &nf, sizeof(float));
return x;
}
Lastly for strings you can just add the bytes of the string to the buffer (including the NUL terminator) via memcpy:
const char * s = "hello";
size_t slen = strlen(s);
memcpy(myByteArray, s, slen+1); // +1 for the NUL byte; myByteArray must have room for slen+1 bytes
There isn't one. C++ doesn't have built-in serialization.
You would have to write individual objects to a byte array/vector, and being careful about endianness (if you want your code to be portable).
https://github.com/karkason/cppystruct
#include "cppystruct.h"
// icmp_header can be any type that supports std::size and std::data and holds bytes
auto [type, code, checksum, p_id, sequence] = pystruct::unpack(PY_STRING("bbHHh"), icmp_header);
int leet = 1337;
auto runtimePacked = pystruct::pack(PY_STRING(">2i10s"), leet, 20, "String!");
// runtimePacked is an std::array filled with "\x00\x00\x059\x00\x00\x00\x14String!\x00\x00\x00"
// The format is "compiled" and has zero overhead in runtime
constexpr auto packed = pystruct::pack(PY_STRING("<2i10s"), 10, 20, "String!");
// packed is an std::array filled with "\x0a\x00\x00\x00\x14\x00\x00\x00String!\x00\x00\x00"
You could check out Boost.Serialization, but I doubt you can get it to use the same format as Python's pack.
I was also looking for the same thing. Luckily I found https://github.com/mpapierski/struct
With a few additions you can add missing types to struct.hpp; I think it's the best so far.
To use it, just define your params like this
DEFINE_STRUCT(test,
((2, TYPE_UNSIGNED_INT))
((20, TYPE_CHAR))
((20, TYPE_CHAR))
)
Then just call this function, which will be generated at compile time
pack(unsigned int p1, unsigned int p2, const char * p3, const char * p4)
The number and type of parameters will depend on what you defined above.
The return type is a char* which contains your packed data.
There is also another unpack() function which you can use to read the buffer
You can use a union to get a different view of the same memory.
For example:
union Pack{
int i;
char c[sizeof(int)];
};
Pack p = {};
p.i = 1234;
std::string packed(p.c, sizeof(int)); // "\xd2\x04\x00\0"
As mentioned in the other answers, you have to notice the endianness.
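One caveat: reading a union member other than the one last written is not guaranteed by the C++ standard, although common compilers support it. A memcpy-based sketch of the same idea avoids the question entirely. The helper name pack_native_int is an assumption for illustration, and like struct.pack('i', ...) it yields the bytes in native order.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: the native-order bytes of an int, as a std::string.
std::string pack_native_int(int value) {
    char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);  // well-defined byte copy
    return std::string(raw, sizeof raw);     // explicit length keeps zero bytes
}
```

Unpacking is the mirror image: memcpy from the string's data() back into an int.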

How to convert char* to unsigned short in C++

I have a char* name which is a string representation of the short I want, such as "15" and need to output this as unsigned short unitId to a binary file. This cast must also be cross-platform compatible.
Is this the correct cast: unitId = unsigned short(temp);
Please note that I am at a beginner level in understanding binary.
I assume that your char* name contains a string representation of the short that you want, i.e. "15".
Do not cast a char* directly to a non-pointer type. Casts in C don't actually change the data at all (with a few exceptions); they just tell the compiler to treat a value of one type as another type. If you cast a char* to an unsigned short, you'll be taking the value of the pointer (which has nothing to do with the contents it points to) and chopping off everything that doesn't fit into a short. This is absolutely not what you want.
Instead use the std::strtoul function, which parses a string and gives you back the equivalent number:
unsigned short number = (unsigned short) strtoul(name, NULL, 0);
(You still need to use a cast, because strtoul returns an unsigned long. This cast is between two different integer types, however, and so is valid. The worst that can happen is that the number inside name is too big to fit into a short--a situation that you can check for elsewhere.)
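The range check mentioned above could be folded into a small wrapper, sketched below. The name parse_ushort is hypothetical, and the sketch assumes the input is decimal, so it passes base 10 rather than 0 (with base 0, a string like "015" would be read as octal).

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parse a decimal string into an unsigned short.
// Returns false (leaving `out` untouched) if the text is not a number
// or the value does not fit.
bool parse_ushort(const char* text, unsigned short& out) {
    char* end = nullptr;
    errno = 0;
    const unsigned long value = std::strtoul(text, &end, 10);
    if (end == text || *end != '\0') return false;  // no digits, or trailing characters
    if (errno == ERANGE) return false;              // does not fit in unsigned long
    if (value > std::numeric_limits<unsigned short>::max()) return false;
    out = static_cast<unsigned short>(value);
    return true;
}
```

One thing to be aware of: strtoul accepts a leading minus sign and wraps the value, so most negative inputs end up rejected by the range check rather than by the parsing step.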
#include <boost/lexical_cast.hpp>
unitId = boost::lexical_cast<unsigned short>(temp);
To convert a string to binary in C++ you can use stringstream.
#include <sstream>
. . .
int somefunction()
{
    unsigned short num;
    const char *name = "123"; // string literals are const in C++
    std::stringstream ss(name);
    ss >> num;
    if (ss.fail() == false)
    {
        // You can write out the binary value of num. Since you mention
        // cross platform in your question, be sure to enforce a byte order.
        return 0;
    }
    return -1; // the text could not be parsed as an unsigned short
}
That cast will give you a (truncated) integer version of the pointer, assuming temp is also a char*. This is almost certainly not what you want (and the syntax is wrong too).
Take a look at the function atoi; it may be what you need, e.g. unitId = (unsigned short)(atoi(temp));
Note that this assumes that (a) temp is pointing to a string of digits and (b) the digits represent a number that can fit into an unsigned short.
Is the pointer name the id, or the string of chars pointed to by name? That is if name contains "1234", do you need to output 1234 to the file? I will assume this is the case, since the other case, which you would do with unitId = unsigned short(name), is certainly wrong.
What you want then is the strtoul() function.
char * endp;
unitId = (unsigned short)strtoul(name, &endp, 0);
if (endp == name) {
/* The conversion failed. The string pointed to by name does not look like a number. */
}
Be careful about writing binary values to a file; the result of doing the obvious thing may work now but will likely not be portable.
If you have a string (char* in C) representation of a number you must use the appropriate function to convert that string to the numeric value it represents.
There are several functions for doing this. They are documented here:
http://www.cplusplus.com/reference/clibrary/cstdlib