Whenever I load a struct into memory, the memory block seems to contain ffffff before certain bytes. After closer inspection I found that this occurs exactly at 0x80 (128 in decimal).
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct __tagMYSTRUCT {
    BYTE unused[4096];
} MYSTRUCT, *PMYSTRUCT;

int main() {
    MYSTRUCT myStruct;
    for (int i = 0; i < 4094; i++) {
        myStruct.unused[i] = 0x00;
    }
    myStruct.unused[4094] = 0x7F; /* No FFFFFF prepend */
    myStruct.unused[4095] = 0x80; /* FFFFFF prepend */
    MYSTRUCT *p = (MYSTRUCT*)malloc(4096);
    *p = myStruct;
    char *read = (char*)p;
    for (int i = 0; i < 4096; i++) {
        printf("%02x ", read[i]);
    }
    free(p);
    p = NULL;
    read = NULL;
    return 0;
}
Does anyone know why this happens and/or how to 'fix' it? (I assume the bytes should reach 0xff.) If I write these bytes to a file, as in fwrite(&myStruct, sizeof(myStruct), 1, [filestream]), it doesn't include the ffffff's.
Compiler used: Visual Studio 2015 Community
P.S. as stated in the title the same occurs when using VirtualAlloc
This has nothing to do with VirtualAlloc nor with malloc.
Note that the following details depend on your platform and different things might happen on different operating systems or compilers:
char is a signed type (on your platform). It has a range of -128 to 127. When you treat the number 128 as a char it wraps around and is actually stored as -128.
%02x tells printf to print an unsigned int, in hexadecimal, with at least two digits. But you are actually passing a char. The compiler will automatically convert it to an int (with the value -128), which printf will then misinterpret as an unsigned int. On your platform, -128 converted to an unsigned int will give the same value as 0xffffff80.
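A common fix (a minimal sketch of the two usual options, reusing the p and read variables from the question) is to make the byte unsigned before it reaches printf, either by reading through an unsigned char pointer or by casting:
/* Sketch: read through unsigned char so the promotion to int is zero-extended */
unsigned char *uread = (unsigned char*)p;
for (int i = 0; i < 4096; i++) {
    printf("%02x ", uread[i]); /* prints 80, not ffffff80 */
}
/* or, keeping the char pointer: */
printf("%02x ", (unsigned char)read[i]);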
Related
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
so if the memory looks like this:
0000 0000 0000 0000 0000 0000 0000 0001
The program outputs 0.
How do I make it output a uint value of the entire 32 bits?
Because of type promotion: a char promotes to int when it is used in an expression, and you'll get no diagnostic for this. So what you are doing is dereferencing the first element of your char array, which is 0, and assigning it to an int...which likewise ends up being 0.
What you want to do is technically undefined behavior but generally works. You want to do this:
unsigned int j = *reinterpret_cast<unsigned int*>(f);
At this point you'll be dealing with undefined behavior and with the endianness of the platform. You probably do not have the value you want recorded in your byte stream. You're treading in territory that requires intimate knowledge of your compiler and your target architecture.
Supposing your platform supports 32-bit integers, you can do the following to achieve the kind of cast you want:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
uint32_t j;
memcpy(&j,f,sizeof(j));
printf("%u\n", j);
Be aware of endianness in the integer representation.
In order to ensure that your code works on both little endian and big endian systems, you could do the following:
char f[4] = {0,0,0,1};
int32_t j = *((int32_t *)f);
j=ntohl(j);
printf("%d", j);
This will print 1 on both little endian and big endian systems. Without using ntohl, 1 will only be printed on Big Endian systems.
The code works because f is being assigned values in the same way as in a Big Endian System. Since network order is also Big Endian, ntohl will correctly convert j. If the host is Big Endian, j will remain unchanged. If the host is Little Endian, the bytes in j will be reversed.
What happens in the line:
unsigned int j = *f;
is simply assigning the first element of f to the integer j. It is equivalent to:
unsigned int j = f[0];
and since f[0] is 0 it is really just assigning a 0 to the integer:
unsigned int j = 0;
You will have to convert the elements of f.
Reinterpretation will always cause undefined behavior. The following example shows such usage and it is always incorrect:
unsigned int j = *( unsigned int* )f;
Undefined behavior may produce any result, even apparently correct ones. Even if such code appears to produce correct results when you run it for the first time, this isn't proof that the program is defined. The program is still undefined, and may produce incorrect results at any time.
There is no such thing as technically undefined behavior or generally works, the program is either undefined or not. Relying on such statements is dangerous and irresponsible.
Luckily we don't have to rely on such bad code.
All you need to do is choose the representation of the integer that will be stored in f, and then convert it. It appears you want to store in big-endian, with at most 8 bits per element. This doesn't mean that the machine must be big-endian, only the representation of the integer you're encoding in f. Representation of integers on the machine is not important, as this method is completely portable.
This means the most significant byte will appear first. The most significant byte is f[0], and the least significant byte is f[3].
We will need an integer capable of storing at least 32 bits and type unsigned long does this.
Type char is for storing characters, not integers. An unsigned integer type like unsigned char should be used.
Then only the conversion from big-endian encoded in f must be done:
unsigned char encoded[4] = { 0 , 0 , 0 , 1 };
unsigned long value = 0;
value = value | ( ( ( unsigned long )encoded[0] & 0xFF ) << 24 );
value = value | ( ( ( unsigned long )encoded[1] & 0xFF ) << 16 );
value = value | ( ( ( unsigned long )encoded[2] & 0xFF ) << 8 );
value = value | ( ( ( unsigned long )encoded[3] & 0xFF ) << 0 );
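For the example array { 0, 0, 0, 1 } this yields 1, which you can verify with a plain printf (a trivial check, assuming a hosted environment):
printf( "%lu\n" , value ); /* prints 1 */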
Regarding the posted code:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
in C, the return type from malloc() is void* which can be assigned to any other pointer, so casting just clutters the code and can be a problem when applying maintenance to the code.
The C standard defines sizeof(char) as 1, so that expression has absolutely no effect as a part of the expression passed to malloc()
the size of an int is not necessarily 4 (think of microprocessors or 64-bit architectures)
the function calloc() will pre-set all the bytes to 0x00
which byte should be set to 0x01 depends on the endianness of the underlying architecture
Let's assume, for now, that your computer is a little-endian architecture (i.e. Intel or similar).
then the code should look similar to the following:
#include <stdio.h> // printf(), perror()
#include <stdlib.h> // calloc(), exit(), EXIT_FAILURE
int main( void )
{
    char *f = calloc( 1, sizeof(unsigned int) );
    if( !f )
    {
        perror( "calloc failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, calloc successful

    // f[sizeof(unsigned int)-1] = 0x01; // if big Endian
    f[0] = 0x01; // assume little Endian/Intel x86 or similar

    unsigned int j = *(unsigned int*)f;
    printf("%u\n", j);
}
Which when compiled/linked, outputs the following:
1
I could not fully understand the consequences of what I read here: Casting an int pointer to a char ptr and vice versa
In short, would this work?
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;

    if ((uintmax_t)buffer % 4) {//misaligned
        for (int i = 0; i < 4; i++) {
            buffer[i] = 0xff;
        }
    } else {//4-byte alignment
        *((uint32_t*) buffer) = MASK;
    }
}
Edit
There was a long discussion (it was in the comments, which mysteriously got deleted) about what type the pointer should be casted to in order to check the alignment. The subject is now addressed here.
This conversion is safe if you are filling the same value in all 4 bytes. If the byte order matters, then this conversion is not safe.
When you use an integer to fill 4 bytes at a time, all 4 bytes are written, but their order depends on the endianness.
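For example (a small sketch; the commented output assumes a little-endian host such as x86), filling with an integer whose bytes differ makes the ordering visible:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
int main(void) {
    unsigned char buf[4];
    uint32_t v = 0x01020304u;
    memcpy(buf, &v, sizeof v); /* the byte order in buf depends on the host */
    for (int i = 0; i < 4; i++)
        printf("%02x ", buf[i]); /* 04 03 02 01 on little-endian, 01 02 03 04 on big-endian */
    return 0;
}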
No, it won't work in every case. Aside from endianness, which may or may not be an issue, you assume that the alignment of uint32_t is 4. But this quantity is implementation-defined (C11 Draft N1570 Section 6.2.8). You can use the _Alignof operator to get the alignment in a portable way.
Second, the effective type (ibid. Sec. 6.5) of the location pointed to by buffer may not be compatible to uint32_t (e.g. if buffer points to an unsigned char array). In that case you break strict aliasing rules once you try reading through the array itself or through a pointer of different type.
Assuming that the pointer actually points to an array of unsigned char, the following code will work
typedef union { unsigned char chr[sizeof(uint32_t)]; uint32_t u32; } conv_t;
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffffU;

    if ((uintptr_t)buffer % _Alignof(uint32_t)) {// misaligned
        for (size_t i = 0; i < sizeof(uint32_t); i++) {
            buffer[i] = 0xffU;
        }
    } else { // correct alignment
        conv_t *cnv = (conv_t *) buffer;
        cnv->u32 = MASK;
    }
}
This code might be of help to you. It shows a 32-bit number being built by assigning its contents a byte at a time, forcing misalignment. It compiles and works on my machine.
#include<stdint.h>
#include<stdio.h>
#include<inttypes.h>
#include<stdlib.h>
int main () {
    uint32_t *data = (uint32_t*)malloc(sizeof(uint32_t)*2);
    char *buf = (char*)data;
    uintptr_t addr = (uintptr_t)buf;
    int i,j;
    i = !(addr%4) ? 1 : 0;       /* skip one byte if the buffer happens to be aligned */
    uint32_t x = (1<<6)-1;
    for( j=0;j<4;j++ ) buf[i+j] = ((char*)&x)[j];
    printf("%" PRIu32 "\n",*((uint32_t*) (addr+i)) );
    free(data);
}
As mentioned by #Learner, endianness must be obeyed. The code above is not portable and would break on a big endian machine.
Note that my compiler throws the error "cast from ‘char*’ to ‘unsigned int’ loses precision [-fpermissive]" when trying to cast a char* to an unsigned int, as done in the original post. This post explains that uintptr_t should be used instead.
In addition to the endian issue, which has already been mentioned here:
CHAR_BIT - the number of bits per char - should also be considered.
It is 8 on most platforms, where for (int i=0; i<4; i++) should work fine.
A safer way of doing it would be for (int i=0; i<sizeof(uint32_t); i++).
Alternatively, you can include <limits.h> and use for (int i=0; i<32/CHAR_BIT; i++).
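Putting both suggestions together, a sketch of the byte-wise branch of the question's function with the portable bound (set4Bytes and the 0xff fill are taken from the question above):
#include <limits.h>
#include <stddef.h>
void set4Bytes(unsigned char* buffer) {
    /* one iteration per char-sized byte, however wide a char is on this platform */
    for (size_t i = 0; i < 32 / CHAR_BIT; i++) {
        buffer[i] = 0xff;
    }
}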
Use reinterpret_cast<>() if you want to ensure the underlying data does not "change shape".
As Learner has mentioned, when you store data in machine memory endianess becomes a factor. If you know how the data is stored correctly in memory (correct endianess) and you are specifically testing its layout as an alternate representation, then you would want to use reinterpret_cast<>() to test that memory, as a specific type, without modifying the original storage.
Below, I've modified your example to use reinterpret_cast<>():
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;

    if (reinterpret_cast<uintptr_t>(buffer) % 4) {//misaligned
        for (int i = 0; i < 4; i++) {
            buffer[i] = 0xff;
        }
    } else {//4-byte alignment
        *reinterpret_cast<uint32_t *>(buffer) = MASK;
    }
}
It should also be noted that your function sets the first 4 bytes (32 bits) of the buffer to 0xFFFFFFFF, regardless of which branch it takes.
Your code is perfect for working with any architecture, 32-bit and up. There is no issue with byte ordering since all your source bytes are 0xFF.
On x86 or x64 machines, the extra work necessary to deal with possibly unaligned access to RAM is managed by the CPU and transparent to the programmer (since the Pentium II), with some performance cost at each access. So, if you are just setting the first four bytes of a buffer a few times, you can simplify your function:
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;
    *((uint32_t *)buffer) = MASK;
}
Some readings:
A Linux kernel doc about UNALIGNED MEMORY ACCESSES
Intel Architecture Optimization Manual, section 3.4
Windows Data Alignment on IPF, x86, and x64
A Practical 'Aligned vs. unaligned memory access', by Alexander Sandler
I am wondering why the pointer value (324502) is in var signalLengthDebugVar1 instead of the expected integer value (2)?
struct ShmLengthOfSignalName {
    int signalLength;
};

//...
BYTE* pBuf = NULL;
//...

int main(void){
    //...
    pBuf = (BYTE*) MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, BUF_SIZE);
    //...
    JobaSignal sig1;
    printf("Value SignalLength: %d \r\n", pBuf[30]); // 2
    const ShmLengthOfSignalName * signalNameLengthPtr = (const ShmLengthOfSignalName *)(pBuf + 30);
    int signalLengthDebugVar1 = signalNameLengthPtr->signalLength; // content: 324502 maybe pointer?
    int signalLengthDebugVar2 = (int) pBuf[30]; // content 2
    sig1.setNameLength(signalLengthDebugVar2);
}
When you print the value, you're reading only the single byte at pBuf + 30:
// takes pBuf[30], converts that byte's value to int, and prints it
printf("Value SignalLength: %d \r\n", pBuf[30]); // 2
Later, when you cast the pointer and dereference it, you're accessing a full int, which is sizeof(int) bytes (likely 4). This occupies not just the byte at pBuf + 30 but also the subsequent bytes at pBuf + 31, etc., up to sizeof(int) bytes in total. It also interprets those bytes according to your platform's endianness (little-endian for Intel, big-endian on some other platforms).
// the signalLength struct member is an int
int signalLengthDebugVar1 = signalNameLengthPtr->signalLength; // content: 324502 maybe pointer?
Note also that the compiler is permitted to add padding within and after a struct. For a simple standard-layout struct like this one, signalLength will be at offset zero, but in general you can't assume a particular layout unless you use a compiler-specific #pragma pack. And even then, you can't control the endianness interpretation, so if the data was encoded as big-endian and you're on a little-endian machine like x86, the value you see will be wrong.
The bottom line is that in C++ this is not a safe way to decode binary data.
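A safer pattern (a sketch only; whether the field is really 2 or 4 bytes wide and which byte order the writer used has to come from the format's specification, not from a struct cast) is to copy exactly the bytes you expect and assemble them explicitly:
#include <cstdint>
// Hypothetical helper: the format is assumed here to store the length as a
// 2-byte little-endian value at offset 30.
std::uint16_t read_u16_le(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0])
         | static_cast<std::uint16_t>(p[1]) << 8;
}
// usage: int signalLength = read_u16_le(pBuf + 30);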
I am looking for a library or example for parsing a binary message in C++. Most people ask about reading a binary file, or data received on a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.
As an example:
9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440
where first 8 bits are a preamble, next 6 bits the msg ID (an integer from 0 to 63), next 212 bits are data, and final 24 bits are a CRC24.
So in this case, msg 26, I have to get this data from the 212 data bits:
4 bits integer value
4 bits integer value
A 9 bit float value from 0 to 63.875, where LSB is 0.125
4 bits integer value
And so on.
EDIT: I need to operate at bit level, so a memcpy is not a good solution, since it copies a number of bytes. To get the first 4-bit integer value I would have to take 2 bits from one byte and another 2 bits from the next byte, shift each pair and compose them. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and want a common solution to parse them at bit level.
Do you know of any library which can easily achieve this?
I also found other Q/As where static_cast is being used. I googled it, and for each person recommending this approach there is another one warning about endianness. Since I already have my message, I don't know if such a warning applies to me, or if it is just relevant for socket communications.
EDIT: boost:dynamic_bitset looks promising. Any help using it?
If you can't find a generic library to parse your data, use bitfields to get the data and memcpy() it into a variable of the struct type. See the link Bitfields. This will be more streamlined towards your application.
Don't forget to pack the structure.
Example:
#pragma pack(1)
#include "order32.h"

struct yourfields{
#if O32_HOST_ORDER == O32_BIG_ENDIAN
    unsigned int preamble:8;
    unsigned int msgid:6;
    unsigned data:212;   /* note: a single bit-field cannot be wider than its type; in practice this would have to be split into several fields */
    unsigned crc:24;
#else
    unsigned crc:24;
    unsigned data:212;   /* note: a single bit-field cannot be wider than its type; in practice this would have to be split into several fields */
    unsigned int msgid:6;
    unsigned int preamble:8;
#endif
}/*__attribute__((packed)) for gcc*/;
You can do a little compile-time check to determine whether your machine uses LITTLE ENDIAN or BIG ENDIAN byte order, and define it as a preprocessor symbol:
//order32.h
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)
#endif
Thanks to the code by Christoph here.
Example program for using bitfields and their outputs:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <memory.h>
using namespace std;
struct bitfields{
    unsigned opcode:5;
    unsigned info:3;
}__attribute__((packed));

struct bitfields opcodes;

/* info: 3bits; opcode: 5bits;*/
/* 001 10001 => 0x31*/
/* 010 10010 => 0x52*/

void set_data(unsigned char data)
{
    memcpy(&opcodes,&data,sizeof(data));
}

void print_data()
{
    cout << opcodes.opcode << ' ' << opcodes.info << endl;
}

int main(int argc, char *argv[])
{
    set_data(0x31);
    print_data(); //must print 17 1 on my little-endian machine
    set_data(0x52);
    print_data(); //must print 18 2
    cout << sizeof(opcodes); //must print 1
    return 0;
}
You can manipulate the bits on your own; for example, to parse a 4-bit integer value:
char byte_data[64];
size_t readPos = 3; //any byte
int value = 0;
int bits_to_read = 4;
for (int i = 0; i < bits_to_read; ++i) {
    // take bit i of the current byte (i.e. the low bits_to_read bits)
    value |= static_cast<unsigned char>(byte_data[readPos]) & (1 << i);
}
Floats are usually sent as string data:
std::string temp;
temp.assign(byte_data + readPos, 9);
float value = std::stof(temp);
If your data contains a custom float format then just extract the bits and do your math:
char byte_data[64];
size_t readPos = 3; //any byte
float value = 0;
int i = 0;
int bits_to_read = 9;
while (bits_to_read) {
    if (i > 7) {
        ++readPos;
        i = 0;
    }
    // take the i-th bit of the current byte, most significant bit first
    const int bit = (static_cast<unsigned char>(byte_data[readPos]) >> (7 - i)) & 1;
    //here your code
    ++i;
    --bits_to_read;
}
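One way the loop could look with the //here your code placeholder filled in (a sketch; it assumes the 9 bits arrive most significant bit first and uses the 0.125 LSB scaling from the question):
unsigned int raw = 0; // accumulates the 9 raw bits
while (bits_to_read) {
    if (i > 7) {
        ++readPos;
        i = 0;
    }
    const int bit = (static_cast<unsigned char>(byte_data[readPos]) >> (7 - i)) & 1;
    raw = (raw << 1) | bit; // most significant bit first
    ++i;
    --bits_to_read;
}
value = raw * 0.125f; // 9-bit raw value 0..511 -> 0.0 .. 63.875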
Here is a good article that describes several solutions to the problem.
It even contains the reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here - it might be what you need, though it's not popular and it's under GPL.
Anyway, the boost::dynamic_bitset might be the best choice as it's time-tested and community-proven. But I have no personal experience with it.
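Since nothing above shows boost::dynamic_bitset in action, here is a hedged sketch of how it could be applied to the message from the question; to_bits and read_field are hypothetical helpers, and bit index 0 is taken to be the first (most significant) bit of the first byte:
#include <boost/dynamic_bitset.hpp>
#include <cstddef>
#include <cstdio>
boost::dynamic_bitset<> to_bits(const unsigned char *msg, std::size_t nbytes) {
    boost::dynamic_bitset<> bits(nbytes * 8);
    for (std::size_t i = 0; i < nbytes * 8; ++i)
        bits[i] = (msg[i / 8] >> (7 - i % 8)) & 1; // copy the bytes bit by bit
    return bits;
}
unsigned long read_field(const boost::dynamic_bitset<> &bits,
                         std::size_t pos, std::size_t count) {
    unsigned long value = 0; // assemble `count` bits starting at `pos`, MSB first
    for (std::size_t i = 0; i < count; ++i)
        value = (value << 1) | bits[pos + i];
    return value;
}
int main() {
    const unsigned char msg[] = { 0x9A, 0x69, 0x0C, 0x12 }; // first bytes of the example message
    boost::dynamic_bitset<> bits = to_bits(msg, sizeof msg);
    std::printf("preamble=%lx id=%lu\n", read_field(bits, 0, 8), read_field(bits, 8, 6)); // id should print 26
}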
I want to read sizeof(int) bytes from a char* array.
a) In what scenario's do we need to worry if endianness needs to be checked?
b) How would you read the first 4 bytes either taking endianness into consideration or not.
EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.
What is the best approach to this problem?
Do you mean something like this?:
char* a;
int i;
memcpy(&i, a, sizeof(i));
You only have to worry about endianness if the source of the data is from a different platform, like a device.
a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.
You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."
In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:
int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data
Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.
To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:
inline unsigned short swap_16bit(unsigned short us)
{
return (unsigned short)(((us & 0xFF00) >> 8) |
((us & 0x00FF) << 8));
}
inline unsigned long swap_32bit(unsigned long ul)
{
return (unsigned long)(((ul & 0xFF000000) >> 24) |
((ul & 0x00FF0000) >> 8) |
((ul & 0x0000FF00) << 8) |
((ul & 0x000000FF) << 24));
}
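For instance, a big-endian 32-bit value that arrived over the network would be put into host order on a little-endian machine with:
unsigned long host_value = swap_32bit(0x01020304UL); // yields 0x04030201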
It depends on how you want to read them; I get the feeling you want to cast 4 bytes into an integer. Doing so over network-streamed data usually ends up with something like this:
int foo = *(int*)(stream+offset_in_stream);
The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.
However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.
Some example code that assumes sizeof(int) is the right number of bytes:
#include <limits.h>
int bytes_to_int_big_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result = (result << CHAR_BIT) + (unsigned char)bytes[i];
    return result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result += (unsigned char)bytes[i] << (i * CHAR_BIT);
    return result;
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
    const int correct = 0x01020304;
    const char little[] = "\x04\x03\x02\x01";
    const char big[] = "\x01\x02\x03\x04";

    printf("correct: %0x\n", correct);
    printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
    printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
    return 0;
}
#endif
How about
int int_from_bytes(const char * bytes, _Bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];

    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;

    return *(int *)(void *)tmp;
}
You'd use it like this:
int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
If you're on a system where casting void * to int * may result in alignment conflicts, you can use
int int_from_bytes(const char * bytes, _Bool reverse)
{
    int tmp;

    if(reverse)
    {
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));

    return tmp;
}
You shouldn't need to worry about endianess unless you are reading the bytes from a source created on a different machine, e.g. a network stream.
Given that, can't you just use a for loop?
void ReadBytes(char * stream) {
    for (int i = 0; i < sizeof(int); i++) {
        char foo = stream[i];
    }
}
Are you asking for something more complicated than that?
You need to worry about endianness only if the data you're reading is composed of numbers which are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the way in which a machine interprets a series of more than one byte as a numerical value.
Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
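A minimal sketch of that combination (memcpy to avoid alignment problems, then ntohl; it assumes the bytes on the wire are big-endian and that the value is 32 bits wide):
#include <arpa/inet.h> /* ntohl */
#include <stdint.h>
#include <string.h>
uint32_t read_be32(const char *bytes) {
    uint32_t v;
    memcpy(&v, bytes, sizeof v); /* copy without alignment or aliasing worries */
    return ntohl(v);             /* network (big-endian) order to host order */
}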
Why read when you can just compare?
bool AreEqual(int i, char *data)
{
    return memcmp(&i, data, sizeof(int)) == 0;
}
If you are worried about endianness, you need to convert all of the integers to some invariant form. htonl and ntohl are good examples.