char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
so if the memory looks like this:
0000 0000 0000 0000 0000 0000 0000 0001
The program outputs 0.
How do I make it output a uint value of the entire 32 bits?
Because of type promotion: a char promotes to int when it is read, and you'll get no diagnostic for this. So what you are doing is dereferencing the first element of your char array, which is 0, and assigning it to an int... which likewise ends up being 0.
What you want to do is technically undefined behavior but generally works. You want to do this:
unsigned int j = *reinterpret_cast<unsigned int*>(f);
At this point you'll be dealing with undefined behavior and with the endianness of the platform. You probably do not have the value you want recorded in your byte stream. You're treading in territory that requires intimate knowledge of your compiler and your target architecture.
Supposing your platform supports 32-bit integers, you can do the following to achieve the kind of cast you want:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
uint32_t j;
memcpy(&j,f,sizeof(j));
printf("%u\n", j);
Be aware of endianness in integer representation.
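For example, here is a minimal, self-contained sketch of the memcpy approach above (assuming 8-bit bytes and a 32-bit uint32_t); the same four bytes decode differently depending on host byte order:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char f[4] = {0, 0, 0, 1};
    uint32_t j;
    memcpy(&j, f, sizeof j);     /* copy the raw bytes into the integer */
    printf("%u\n", (unsigned)j); /* prints 16777216 on little endian, 1 on big endian */
    return 0;
}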
In order to ensure that your code works on both little endian and big endian systems, you could do the following:
char f[4] = {0,0,0,1};
int32_t j = *((int32_t *)f);
j=ntohl(j);
printf("%d", j);
This will print 1 on both little endian and big endian systems. Without using ntohl, 1 will only be printed on Big Endian systems.
The code works because f is being assigned values in the same way as in a Big Endian System. Since network order is also Big Endian, ntohl will correctly convert j. If the host is Big Endian, j will remain unchanged. If the host is Little Endian, the bytes in j will be reversed.
What happens in the line:
unsigned int j = *f;
is simply assigning the first element of f to the integer j. It is equivalent to:
unsigned int j = f[0];
and since f[0] is 0 it is really just assigning a 0 to the integer:
unsigned int j = 0;
You will have to convert the elements of f.
Reinterpretation will always cause undefined behavior. The following example shows such usage and it is always incorrect:
unsigned int j = *( unsigned int* )f;
Undefined behavior may produce any result, even apparently correct ones. Even if such code appears to produce correct results when you run it for the first time, this isn't proof that the program is defined. The program is still undefined, and may produce incorrect results at any time.
There is no such thing as "technically undefined behavior" or "generally works"; the program is either undefined or it is not. Relying on such statements is dangerous and irresponsible.
Luckily we don't have to rely on such bad code.
All you need to do is choose the representation of the integer that will be stored in f, and then convert it. It appears you want to store in big-endian, with at most 8 bits per element. This doesn't mean that the machine must be big-endian, only the representation of the integer you're encoding in f. Representation of integers on the machine is not important, as this method is completely portable.
This means the most significant byte will appear first. The most significant byte is f[0], and the least significant byte is f[3].
We will need an integer type capable of storing at least 32 bits, and unsigned long does this.
Type char is for storing characters, not integers. An unsigned integer type like unsigned char should be used.
Then only the conversion from big-endian encoded in f must be done:
unsigned char encoded[4] = { 0 , 0 , 0 , 1 };
unsigned long value = 0;
value = value | ( ( ( unsigned long )encoded[0] & 0xFF ) << 24 );
value = value | ( ( ( unsigned long )encoded[1] & 0xFF ) << 16 );
value = value | ( ( ( unsigned long )encoded[2] & 0xFF ) << 8 );
value = value | ( ( ( unsigned long )encoded[3] & 0xFF ) << 0 );
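For the bytes { 0 , 0 , 0 , 1 } this yields 1; a quick check of the result (assuming the snippet above):
printf( "%lu\n" , value ); /* prints 1 */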
Regarding the posted code:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
In C, the return type of malloc() is void*, which can be assigned to any other object pointer, so the cast just clutters the code and can be a problem when applying maintenance to the code.
The C standard defines sizeof(char) as 1, so that expression has absolutely no effect as part of the expression passed to malloc().
The size of an int is not necessarily 4 (think of small microcontrollers or 64-bit architectures).
The function calloc() will preset all the bytes to 0x00.
Which byte should be set to 0x01 depends on the endianness of the underlying architecture.
Let's assume, for now, your computer is a little-endian architecture (i.e. Intel or similar).
then the code should look similar to the following:
#include <stdio.h> // printf(), perror()
#include <stdlib.h> // calloc(), exit(), EXIT_FAILURE
int main( void )
{
    char *f = calloc( 1, sizeof(unsigned int) );
    if( !f )
    {
        perror( "calloc failed" );
        exit( EXIT_FAILURE );
    }
    // implied else, calloc successful
    // f[sizeof(unsigned int)-1] = 0x01; // if big Endian
    f[0] = 0x01; // assume little Endian/Intel x86 or similar
    unsigned int j = *(unsigned int*)f;
    printf("%u\n", j);
}
Which when compiled/linked, outputs the following:
1
Whenever I load a struct into memory the memory block seems to contain ffffff before certain bytes. After closer inspection I figured this occurs exactly at 0x80 (128 in dec).
#include <Windows.h>
#include <stdio.h>
typedef struct __tagMYSTRUCT {
    BYTE unused[4096];
} MYSTRUCT, *PMYSTRUCT;

int main() {
    MYSTRUCT myStruct;
    for (int i = 0; i < 4094; i++) {
        myStruct.unused[i] = 0x00;
    }
    myStruct.unused[4094] = 0x7F; /* No FFFFFF prepend */
    myStruct.unused[4095] = 0x80; /* FFFFFF prepend */
    MYSTRUCT *p = (MYSTRUCT*)malloc(4096);
    *p = myStruct;
    char *read = (char*)p;
    for (int i = 0; i < 4096; i++) {
        printf("%02x ", read[i]);
    }
    free(p);
    p = NULL;
    read = NULL;
    return 0;
}
Anyone know why this happens and/or how to 'fix' it? (I assume the bytes should be able to reach 0xff.) If I write these bytes to a file, as in fwrite(&myStruct, sizeof(myStruct), 1, [filestream]), it doesn't include the ffffff's.
Compiler used: Visual Studio 2015 Community
P.S. as stated in the title the same occurs when using VirtualAlloc
This has nothing to do with VirtualAlloc nor with malloc.
Note that the following details depend on your platform and different things might happen on different operating systems or compilers:
char is a signed type (on your platform). It has a range of -128 to 127. When you treat the number 128 as a char it wraps around and is actually stored as -128.
%02x tells printf to print an unsigned int, in hexadecimal, with at least two digits. But you are actually passing a char. The compiler will automatically convert it to an int (with the value -128), which printf will then misinterpret as an unsigned int. On your platform, -128 converted to an unsigned int will give the same value as 0xffffff80.
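A minimal fix along the lines of that explanation (one option; masking with 0xff works equally well) is to convert each byte to unsigned char before the default promotion to int, so the promoted value stays in the range 0..255:
for (int i = 0; i < 4096; i++) {
    printf("%02x ", (unsigned char)read[i]); /* 0x80 now prints as "80", not "ffffff80" */
}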
I have a buffer unsigned char table[512] that I want to convert quickly into a table of short int table[256], where every position is composed of two bytes of the original table.
I have a camera that gives me this buffer, which is the table for converting the disparity to the real depth.
unsigned char zDtable[512] = {0};
unsigned short int zDTableHexa[256]={0};
.. get the buffer data.....
for (int i = 0; i < 256; ++i) {
zDTableHexa[i]=zDtable[i*2]<<8 + zDtable[i*2+1];
}
These two approaches have problems converting the values correctly; the bytes come out reversed:
memcpy(zDTableHexa_ptr,zDtable,256*sizeof( unsigned short int));
unsigned short* zDTableHexa = (unsigned short*)zDtable;
Try something like this
short* zDTableHexa = (short*)zDtable;
It simply maps the memory space of the char array to an array of shorts. So if the memory looks like this:
(char0),(char1),(char2),(char3)
then it will be reinterpreted to be
(short0 = char0,char1),(short1 = char2,char3)
Beware that such direct reinterpretation depends on endianness and formally allows a sufficiently pedantic compiler to do ungood things, i.e., it's system- and compiler-specific.
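If the buffer's byte order does not match the host's, a portable alternative is to assemble each 16-bit value explicitly rather than reinterpreting the memory. A sketch, assuming the camera stores the high byte first (swap the two indices if it is the other way around):
for (int i = 0; i < 256; ++i) {
    zDTableHexa[i] = (unsigned short)((zDtable[i*2] << 8) | zDtable[i*2+1]);
}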
I have to read 10 bytes from a file and the last 4 bytes are an unsigned integer. But I've got an 11-byte-long char array / pointer. How do I convert the last 4 bytes (without the zero-terminating character at the end) to an unsigned integer?
//pesudo code
char *p = readBytesFromFile();
unsigned int myInt = 0;
for( int i = 6; i < 10; i++ )
myInt += (int)p[i];
Is that correct? Doesn't seem correct to me.
The following code might work:
myInt = *(reinterpret_cast<unsigned int*>(p + 6));
iff:
There are no alignment problems (e.g. in a GPU memory space this is very likely to blow up if some guarantees aren't provided).
You can guarantee that the system endianness is the same as the one used to store the data.
You can be sure that sizeof(int) == 4; this is not guaranteed everywhere.
If not, as Dietmar suggested, you should loop over your data (forward or reverse according to the endianness) and do something like
myInt = myInt << 8 | static_cast<unsigned char>(p[i])
this is alignment-safe (it should be on every system). Still pay attention to points 1 and 3.
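Spelled out for the four bytes at offset 6 in this question, that loop could look like this (a sketch assuming the file stores the most significant byte first; iterate backwards otherwise):
unsigned int myInt = 0;
for (int i = 6; i < 10; ++i)
    myInt = (myInt << 8) | static_cast<unsigned char>(p[i]);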
I agree with the previous answer but just want to add that this solution may not work 100% if the file was created with a different endianness.
I do not want to confuse you with extra information, but keep in mind that endianness may cause you problems when you cast directly from a file.
Here's a tutorial on endianness : http://www.codeproject.com/Articles/4804/Basic-concepts-on-Endianness
Try myInt = *(reinterpret_cast<unsigned int*>(p + 6));.
This takes the address of the 6th character, reinterprets as a pointer to an unsigned int, and then returns the (unsigned int) value it points to.
Maybe using a union is an option? I think this might work:
UPDATE: Yes, it works.
union intc32 {
    char c[4];
    int v;
};

int charsToInt(char a, char b, char c, char d) {
    intc32 r = { { a, b, c, d } };
    return r.v;
}
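Usage might look like this; note that the result depends on the host byte order, and that reading a union member other than the one last written is not formally sanctioned by the C++ standard, even though compilers commonly support it:
int x = charsToInt(0, 0, 0, 1); // 16777216 on little endian, 1 on big endian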
Despite the fact that big-endian computers are not very widely used, I want to store the double datatype in an endianness-independent format.
For int, this is really simple, since bit shifts make that very convenient.
int number;
int size=sizeof(number);
char bytes[size];
for (int i=0; i<size; ++i)
bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snippet stores the number in big-endian format, regardless of the machine it is being run on. What is the most elegant way to do this for double?
The best way for portability and taking format into account, is serializing/deserializing the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
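A round-trip sketch of the two snippets above (positive values only, and assuming unsigned long long is at least 64 bits; the actual byte-level serialization of exp and mant is left out):
#include <limits.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    double number = 3.141592653589793;

    /* serialize */
    int exp;
    unsigned long long mant =
        (unsigned long long)(ULLONG_MAX * frexp(number, &exp));

    /* deserialize */
    double result = ldexp((double)mant / ULLONG_MAX, exp);

    printf("%.17g -> %.17g\n", number, result); /* equal up to rounding in the mantissa scaling */
    return 0;
}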
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same. Any numeric object, including double, is ultimately just several bytes which are interpreted in a specific order according to endianness. So if you reverse the order of the bytes you'll get exactly the same value in the reversed endianness.
char *src_data;
char *dst_data;
for (i=0;i<N*sizeof(double);i++) *dst_data++=src_data[i ^ mask];
// where mask = 7, if native == low endian
// mask = 0, if native = big_endian
The elegance lies in mask, which also handles short and integer types: it's sizeof(elem)-1 if the target and source endianness differ, and 0 otherwise.
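A self-contained version of that idea, determining the mask at run time instead of hard-coding it (a sketch for double; the same pattern works for short and int with sizeof(elem)-1):
#include <stddef.h>

void to_big_endian(const double *src, unsigned char *dst, size_t count)
{
    const unsigned char *src_data = (const unsigned char *)src;
    unsigned one = 1;
    /* 7 on a little-endian host, 0 on a big-endian host */
    size_t mask = (*(unsigned char *)&one == 1) ? sizeof(double) - 1 : 0;
    for (size_t i = 0; i < count * sizeof(double); i++)
        *dst++ = src_data[i ^ mask];
}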
Not very portable and standards violating, but something like this:
#include <array>
#include <cstddef>
#include <cstdint>

std::array<unsigned char, 8> serialize_double( double const* d )
{
    std::array<unsigned char, 8> retval;
    char const* begin = reinterpret_cast<char const*>(d);
    char const* end = begin + sizeof(double);
    union
    {
        std::uint8_t  i8s[8];
        std::uint16_t i16s[4];
        std::uint32_t i32s[2];
        std::uint64_t i64s;
    } u;
    u.i64s = 0x0001020304050607ull; // one byte order
    // u.i64s = 0x0706050403020100ull; // the other byte order
    for (std::size_t index = 0; index < 8; ++index)
    {
        retval[ u.i8s[index] ] = begin[index];
    }
    return retval;
}
might handle a platform with 8 bit chars, 8 byte doubles, and any crazy-ass byte ordering (ie, big endian in words but little endian between words for 64 bit values, for example).
Now, this doesn't cover the endianness of doubles being different than that of 64 bit ints.
An easier approach might be to cast your double into a 64 bit unsigned value, then output that as you would any other int.
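That last approach might look like this (a sketch assuming an 8-bit char and a 64-bit double whose bit pattern fits a uint64_t; memcpy is used instead of a cast to stay clear of aliasing rules):
#include <cstdint>
#include <cstring>

void serialize_double_be(double d, unsigned char out[8])
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);               // reuse the double's bit pattern
    for (int i = 0; i < 8; ++i)
        out[7 - i] = (unsigned char)(bits >> (8 * i)); // most significant byte first
}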
#include <algorithm> // std::swap (also in <utility> since C++11)
#include <cstring>   // std::memcpy

void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
    const int size = sizeof(number);
    std::memcpy(bytes, &number, size);
    for (int i = 0; i < size/2; ++i)
        std::swap(bytes[i], bytes[size-i-1]);
}
I know C++11 has some standard facilities which would allow getting integral values from unaligned memory. How could something like this be written in a more standard way?
template <class R>
inline R get_unaligned_le(const unsigned char p[], const std::size_t s) {
    R r = 0;
    for (std::size_t i = 0; i < s; i++)
        r |= (*p++ & 0xff) << (i * 8); // take the first 8 bits of the char
    return r;
}
To take the values stored in little-endian order, you can then write:
uint_least16_t value1 = get_unaligned_le<uint_least16_t > (&buffer[0], 2);
uint_least32_t value2 = get_unaligned_le<uint_least32_t > (&buffer[2], 4);
How did the integral values get into the unaligned memory to begin with?
If they were memcpyed in, then you can use memcpy to get them out.
If they were read from a file or the network, you have to know their
format: how they were written to begin with. If they are four byte
big-endian 2s complement (the usual network format), then something
like:
// Supposes native int is at least 32 bits...
unsigned
getNetworkInt( unsigned char const* buffer )
{
    return buffer[0] << 24
         | buffer[1] << 16
         | buffer[2] << 8
         | buffer[3];
}
This will work for any unsigned type, provided the type you're aiming
for is at least as large as the type you input. For signed, it depends
on just how portable you want to be. If all of your potential target
machines are 2's complement, and will have an integral type with the
same size as your input type, then you can use exactly the same code as
above. If your native machine is a 1's complement 36 bit machine (e.g.
a Unisys mainframe), and you're reading signed network format integers
(32 bit 2's complement), you'll need some additional logic.
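A sketch of a signed 32-bit decode that stays portable without relying on the implementation-defined unsigned-to-signed conversion (the function name is illustrative):
#include <stdint.h>

int32_t getNetworkInt32( unsigned char const* buffer )
{
    uint32_t u = (uint32_t)buffer[0] << 24
               | (uint32_t)buffer[1] << 16
               | (uint32_t)buffer[2] <<  8
               | (uint32_t)buffer[3];
    /* map the 32-bit pattern onto the signed range explicitly */
    return u <= INT32_MAX ? (int32_t)u
                          : (int32_t)(u - INT32_MAX - 1) + INT32_MIN;
}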
As always, create the desired variable and populate it byte-wise:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

template <typename R>
R get(unsigned char * p, std::size_t len = sizeof(R))
{
    assert(len >= sizeof(R) && std::is_trivially_copyable<R>::value);
    R result;
    std::copy(p, p + sizeof(R), reinterpret_cast<unsigned char *>(&result));
    return result;
}
This only works universally for trivially copyable types, though you can probably use it for non-trivial types if you have additional guarantees from elsewhere.
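Usage is then simply (the bytes are copied as-is, so they must already be in the machine's native byte order):
unsigned char buffer[8] = {0};
unsigned int v = get<unsigned int>(buffer); // copies sizeof(unsigned int) bytes from buffer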