endianness when using part of int array - c++

I'm trying to pull out values from a uint8_t array, but I'm having trouble understanding how they are represented in memory.
#include <cstdio>
#include <cstring>
#include <stdint.h>

int main(){
    uint8_t tmp1[2];
    uint16_t tmp2 = 511; // 0x01FF: high byte 0x01, low byte 0xFF
    tmp1[0] = 255; // 0xFF
    tmp1[1] = 1;   // 0x01
    fprintf(stderr, "memcmp = %d\n", memcmp(tmp1, &tmp2, 2));
    fprintf(stderr, "first elem in uint8 array = %u\n", (uint8_t) *(tmp1+0));
    fprintf(stderr, "second elem in uint8 array = %u\n", (uint8_t) *(tmp1+1));
    fprintf(stderr, "2xuint8_t as uint16_t = %u\n", (uint16_t) *tmp1);
    return 0;
}
So I have a 2-element array of type uint8_t, and a single uint16_t variable.
When I take the value 511 on my little-endian machine, I would assume it is laid out in memory as
0000 0001 1111 1111
But when I use memcmp it looks like it is actually represented as
1111 1111 0000 0001
So is little-endianness only used "within" each byte?
And since the single set bit in tmp1[1] counts as 256, even though it is further "right" in my stream, the per-byte (not per-bit) ordering is therefore big-endian? I'm a bit confused about this.
Also, if I want to coerce fprintf into printing my 2xuint8_t as a single uint16_t, how do I do this? The code below doesn't work; it only prints the first byte.
fprintf(stderr,"2x uint8_t as uint16_t = %u\n",(uint16_t) *tmp1);
Thanks in advance

Your expectation is backwards: what you observe is exactly how little-endian representation works. To answer your last question, it would look like this:
fprintf(stderr,"2x uint8_t as uint16_t = %u\n",*(uint16_t*)tmp1);

Don't think of endianness as something "within bytes"; think of it as byte ordering. The bit ordering inside a byte never matters in practice, because memory is only addressable down to the byte, and humans conventionally write values big-endian. If it helps to imagine that the bits are reversed on a little-endian machine, you can picture it that way (your example would then look like 1111 1111 1000 0000), but keep in mind that humans don't normally read numbers with the most significant digits to the right. Use that picture only if it helps you understand little-endian.
On a little-endian machine, 0xAABBCCDD appears in memory as the byte sequence 0xDD 0xCC 0xBB 0xAA, just as you are seeing. On a big-endian machine (such as a PPC box) the in-memory order matches the order in which you write out the 32-bit word.
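If you want to see this on your own machine, here is a small sketch of mine (it assumes only that uint32_t exists) that dumps the bytes of 0xAABBCCDD in address order:

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t word = 0xAABBCCDD;
    // Inspecting any object through unsigned char* is always allowed.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&word);
    for (unsigned i = 0; i < sizeof word; ++i)
        printf("%02X ", bytes[i]); // "DD CC BB AA" on little-endian, "AA BB CC DD" on big-endian
    printf("\n");
    return 0;
}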

First, if you want to be 100% sure that your variables are stored in the right order in memory, you should put them in a struct.
Then note that memcmp() treats the input you give it as a plain sequence of bytes, since it makes no assumptions about the nature of the data. Consider, for example, the following code:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(int argc, char** argv) {
    int32_t a, b;
    a = 1;
    b = -1;
    printf("%i\n", memcmp(&a, &b, sizeof(int32_t)));
    return 0;
}
It outputs -254 on my little-endian machine despite the fact that a > b. This is because memcmp has no idea what the memory actually holds, so it compares the two values as arrays of uint8_t: the first bytes are 0x01 and 0xFF, and 0x01 - 0xFF = -254.
If you actually want to visualize how the data is represented on your machine, you can fwrite a struct to a file and then open it with your favorite hex editor (in my experience, wxHexEditor is great at showing how the data looks when interpreted as an X-bit Y-endian integer). Here's the source:
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint8_t tmp1[2];
    uint16_t tmp2;
} mytmp;

int main(int argc, char** argv) {
    mytmp tmp;
    tmp.tmp1[0] = 255;
    tmp.tmp1[1] = 1;
    tmp.tmp2 = 511;
    FILE* file = fopen("struct-dump", "wb"); // "wb": binary mode matters on Windows
    fwrite(&tmp, sizeof(mytmp), 1, file);
    fclose(file);
    return 0;
}
As for treating an array of uint8_t as a uint16_t, you would probably want to declare a union or use pointer coercion.
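For example, the union version could look like the sketch below (my own example; note that reading a union member other than the one last written is explicitly allowed in C and supported as an extension by mainstream C++ compilers, but not guaranteed by the C++ standard):

#include <cstdint>
#include <cstdio>

int main() {
    union {
        uint8_t  bytes[2];
        uint16_t word;
    } u;
    u.bytes[0] = 255; // 0xFF
    u.bytes[1] = 1;   // 0x01
    printf("%u\n", (unsigned)u.word); // 511 on a little-endian machine
    return 0;
}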

Related

ntohl() returning 0 when reading from mmap()

Good evening. I am attempting to read some binary information from a .img file. I can retrieve 16-bit numbers (uint16_t) with ntohs(), but when I try to retrieve a 32-bit number from the same position using ntohl(), it gives me 0 instead.
Here are the critical pieces of my program.
#include <iostream>
#include <cstdio>
#include <cstdint>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <arpa/inet.h>
#include <cmath>

int fd;

struct blockInfo {
    long blockSize = 0;
    long blockCount = 0;
    long fatStart = 0;
    long fatBlocks = 0;
    long rootStart = 0;
    long rootBlocks = 0;
    long freeBlocks = 0;
    long resBlocks = 0;
    long alloBlocks = 0;
};

int main(int argc, char *argv[]) {
    fd = open(argv[1], O_RDWR);

    // Get file size
    struct stat buf{};
    stat(argv[1], &buf);
    size_t size = buf.st_size;

    // A struct to hold data retrieved from a big-endian image.
    blockInfo info;

    auto mapPointer = (char*) mmap(nullptr, size,
                                   (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd, 0);

    info.blockSize = ntohs((uint16_t) mapPointer[12]);
    long anotherBlockSize = ntohl((uint32_t) mapPointer[11]);

    printf("%ld", info.blockSize);   // == 512, correct
    printf("%ld", anotherBlockSize); // == 0, what?
}
I understand that blockSize and anotherBlockSize are not supposed to be equal, but anotherBlockSize should at least be non-zero, right?
Something else: I go to access data at ntohs(pointer[16]), which should return 2, but it also returns 0. What is going on here? Any help would be appreciated.
No, anotherBlockSize will not necessarily be non-zero.
info.blockSize = ntohs((uint16_t) mapPointer[12]);
This code reads a single char at offset 12 relative to mapPointer, casts it to uint16_t, and applies ntohs() to it.
long anotherBlockSize = ntohl((uint32_t) mapPointer[11]);
This code reads a single char at offset 11 relative to mapPointer, casts it to uint32_t, and applies ntohl() to it.
Obviously, you are reading non-overlapping data (different single chars) from the mapped memory, so you should not expect blockSize and anotherBlockSize to be connected.
If you are trying to read the same memory in different ways (as uint32_t and uint16_t), you must do some pointer casting:
info.blockSize = ntohs( *((uint16_t*)&mapPointer[12]));
Note that such code is generally platform dependent: a cast that works perfectly on x86 may fail on ARM, where unaligned accesses are not always allowed.
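If you want to sidestep the alignment problem entirely, you can memcpy the bytes into a properly aligned integer first. A sketch (the helper name read_u32_be is mine, not part of the original code):

#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// Read a big-endian 32-bit value from an arbitrary (possibly unaligned) offset.
uint32_t read_u32_be(const char* base, size_t offset) {
    uint32_t raw;
    std::memcpy(&raw, base + offset, sizeof raw); // safe for any alignment
    return ntohl(raw);                            // big-endian -> host order
}

With that helper, read_u32_be(mapPointer, 11) reads all four bytes instead of one.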
auto mapPointer = (char*) ...
This declares mapPointer to be a char *.
... ntohl((uint32_t) mapPointer[11]);
Your obvious intent here is to use mapPointer to retrieve a 32 bit value, a four-byte value, from this location.
Unfortunately, because mapPointer is a plain, garden-variety char *, the expression mapPointer[11] evaluates to a single, lonely char value: one byte, read from the mmap-ed memory block at offset 11 from its start. The (uint32_t) cast does not read a uint32_t from the address mapPointer+11. Instead, mapPointer[11] reads a single char value from mapPointer+11 (because mapPointer is a pointer to char), converts it to a uint32_t, and feeds it to ntohl().
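One way to make the intent explicit, with no casts of the mapped pointer at all, is to assemble the value from the four individual bytes. A sketch (again a hypothetical helper, not from the original program):

#include <cstdint>
#include <cstddef>

// Build a 32-bit value from four big-endian bytes at the given offset.
uint32_t read_be32(const char* p, size_t offset) {
    const unsigned char* b =
        reinterpret_cast<const unsigned char*>(p) + offset;
    return (uint32_t(b[0]) << 24) |
           (uint32_t(b[1]) << 16) |
           (uint32_t(b[2]) <<  8) |
            uint32_t(b[3]);
}

This needs no ntohl() at all, because the shifts express the byte order directly.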

What is the most suitable type of vector to keep the bytes of a file?

What is the most suitable type of vector to keep the bytes of a file?
I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted as 0!
The goal is to save this data (bytes) to a file and retrieve from this file later.
NOTE: The files contain null bytes ("00000000" in bits)!
I'm a bit lost here. Help me! =D Thanks!
UPDATE I:
To read the file I'm using this function:
char* readFileBytes(const char *name){
    std::ifstream fl(name);
    fl.seekg(0, std::ios::end);
    size_t len = fl.tellg();
    char *ret = new char[len];
    fl.seekg(0, std::ios::beg);
    fl.read(ret, len);
    fl.close();
    return ret;
}
NOTE I: I need to find a way to ensure that bits "00000000" can be recovered from the file!
NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?
NOTE III: When using a char array I had problems converting the bits "00000000" to that type.
Code Snippet:
int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7]     ) |
                (bit8Array[6] << 1) |
                (bit8Array[5] << 2) |
                (bit8Array[4] << 3) |
                (bit8Array[3] << 4) |
                (bit8Array[2] << 5) |
                (bit8Array[1] << 6) |
                (bit8Array[0] << 7);
UPDATE II:
Following #chqrlie's recommendations.
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstring>
#include <iterator>

std::vector<unsigned char> readFileBytes(const char* filename)
{
    // Open the file.
    std::ifstream file(filename, std::ios::binary);

    // Stop eating new lines in binary mode!
    file.unsetf(std::ios::skipws);

    // Get its size.
    std::streampos fileSize;
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // Reserve capacity.
    std::vector<unsigned char> unsignedCharVec;
    unsignedCharVec.reserve(fileSize);

    // Read the data.
    unsignedCharVec.insert(unsignedCharVec.begin(),
                           std::istream_iterator<unsigned char>(file),
                           std::istream_iterator<unsigned char>());

    return unsignedCharVec;
}

int main(){
    std::vector<unsigned char> unsignedCharVec;

    // txt file contents: "xz"
    unsignedCharVec = readFileBytes("xz.txt");

    // Letters -> UTF-8/hex -> bits:
    // x -> 78 -> 0111 1000
    // z -> 7a -> 0111 1010
    for(unsigned char c : unsignedCharVec){
        printf("%c\n", c);
        for(int o = 7; o >= 0; o--){
            printf("%i", ((c >> o) & 1));
        }
        printf("%s", "\n");
    }

    // Prints...
    // x
    // 01111000
    // z
    // 01111010
    return 0;
}
UPDATE III:
This is the code I am using to write to a binary file:
void writeFileBytes(const char* filename, std::vector<unsigned char>& fileBytes){
    std::ofstream file(filename, std::ios::out | std::ios::binary);
    file.write(fileBytes.size() ? (char*)&fileBytes[0] : 0,
               std::streamsize(fileBytes.size()));
}

writeFileBytes("xz.bin", fileBytesOutput);
UPDATE IV:
Further reading about UPDATE III:
c++ - Save the contents of a "std::vector<unsigned char>" to a file
CONCLUSION:
Definitely the solution to the problem of the "00000000" bits (1 byte) was to change the type that stores the bytes of the file to std::vector<unsigned char>, following the guidance of friends here. std::vector<unsigned char> is a universal type (it exists in all environments) and will accept any octet (unlike the char* in UPDATE I)!
In addition, changing from a char array to a vector of unsigned char was crucial for success! With a vector I manipulate my data more safely and completely independently of its content (with a char array I had problems with this).
Thanks a lot!
Use std::vector<unsigned char>. Don't use std::uint8_t: it won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it is usually the smallest addressable type the hardware supports, and it's required to be at least 8 bits wide, so if you're trafficking in 8-bit bytes, it will handle the bits that you need.
If you really, really, really like the fixed-width types, you might consider std::uint_least8_t, which always exists and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and its variants with the vaguely specified "least" and "fast" types may well get confusing.
There are 3 problems in your code:
1. You use the char type and return a char*, yet the return value is not a proper C string: you neither allocate an extra byte for the '\0' terminator nor null-terminate the array.
2. If the file may contain null bytes, you should probably use the type unsigned char or uint8_t to make it explicit that the array does not contain text.
3. You do not return the array size to the caller, so the caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new, as in the sketch below.
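As a sketch of what that could look like (my example, using istreambuf_iterator so that no bytes, including nulls, are skipped):

#include <fstream>
#include <iterator>
#include <vector>
#include <cstdint>

std::vector<uint8_t> readFileBytes(const char* filename) {
    std::ifstream file(filename, std::ios::binary);
    // istreambuf_iterator reads raw chars with no whitespace skipping,
    // so "00000000" bytes come through unchanged.
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(file),
                                std::istreambuf_iterator<char>());
}

The caller then gets the length from the vector itself via .size().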
uint8_t is the winner in my eyes:
it's exactly 8 bits, or 1 byte, long;
it's unsigned without requiring you to type unsigned every time;
it's exactly the same on all platforms;
it's a generic type that does not imply any specific use, unlike char / unsigned char, which is associated with characters of text even though it can technically be used for any purpose, just the same as uint8_t.
Bottom line: uint8_t is functionally equivalent to unsigned char, but does a better job of saying this is some data of unspecified nature in the source code.
So use std::vector<uint8_t>.
#include <stdint.h> to make the uint8_t definition available.
P.S. As pointed out in the comments, the C++ standard defines char as exactly 1 byte, and a byte is not, strictly speaking, required to be an octet (8 bits). On such a hypothetical system, char will still exist and will be 1 byte long, but uint8_t is defined as exactly 8 bits (an octet) and thus may not exist (due to implementation difficulty or overhead). So char is more portable, theoretically speaking, but uint8_t is stricter and gives wider guarantees about expected behavior.

Endian-ness in a char array containing binary characters

I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ASCII coding:
0x5249 0x4646
I read this first element using:
char *fileID = new char[4];
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
    union { int flip_int; char flip_char[4]; } flip;
    flip.flip_char[0] = input[3];
    flip.flip_char[1] = input[2];
    flip.flip_char[2] = input[1];
    flip.flip_char[3] = input[0];
    return flip.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
    union { int flip_int; char flip_char[4]; } flip;
    flip.flip_char[0] = input[0];
    flip.flip_char[1] = input[1];
    flip.flip_char[2] = input[2];
    flip.flip_char[3] = input[3];
    return flip.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of.
You are simply on a little-endian machine, and the "RIFF" string is just a string, thus neither little- nor big-endian, just a sequence of chars. You don't need to reverse the bytes of the size field on a little-endian machine, but you would need to on a big-endian one.
You need to figure out the endianness of your machine. #include <sys/param.h> will help you do that.
You could also use the fact that network byte order is big-endian (if my memory serves correctly; you should check). In that case, convert to big-endian and use the ntohs function. That should work on any machine you compile the code on.
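If you'd rather not rely on platform headers, a runtime check is also easy to write. A minimal sketch of mine:

#include <cstdint>
#include <cstring>

// Returns true if this machine stores the least significant byte first.
bool is_little_endian() {
    uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1); // look at the lowest-addressed byte
    return first == 1;
}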

Problem converting endianness

I'm following this tutorial for using OpenAL in C++: http://enigma-dev.org/forums/index.php?topic=730.0
As you can see in the tutorial, they leave a few methods unimplemented, and I am having trouble implementing file_read_int32_le(char*, FILE*) and file_read_int16_le(char*, FILE*). Apparently what it should do is load 4 bytes from the file (or 2 in the case of int16, I guess), convert them from little-endian to big-endian, and then return them as an unsigned integer. Here's the code:
static unsigned int file_read_int32_le(char* buffer, FILE* file) {
    size_t bytesRead = fread(buffer, 1, 4, file);
    printf("%x\n", (unsigned int)*buffer);
    unsigned int* newBuffer = (unsigned int*)malloc(4);
    *newBuffer = ((*buffer << 24) & 0xFF000000U) |
                 ((*buffer << 8)  & 0x00FF0000U) |
                 ((*buffer >> 8)  & 0x0000FF00U) |
                 ((*buffer >> 24) & 0x000000FFU);
    printf("%x\n", *newBuffer);
    return (unsigned int)*newBuffer;
}
When debugging (in XCode) it says that the hexadecimal value of *buffer is 0x72, which is only one byte. When I create newBuffer using malloc(4), I get a 4-byte buffer (*newBuffer is something like 0xC0000003) which then, after the operations, becomes 0x72000000. I assume the result I'm looking for is 0x00000027 (edit: actually 0x00000072), but how would I achieve this? Is it something to do with converting between the char* buffer and the unsigned int* newBuffer?
Yes, *buffer will read in Xcode's debugger as 0x72, because buffer is a pointer to a char.
If the first four bytes in the memory block pointed to by buffer are (hex) 72 00 00 00, then the return value should be 0x00000072, not 0x00000027. The bytes should get swapped, but not the two "nybbles" that make up each byte.
This code leaks the memory you malloc'd, and you don't need to malloc here anyway.
Your byte-swapping is correct on a PowerPC or 68K Mac, but not on an Intel Mac or ARM-based iOS. On those platforms, you don't have to do any byte-swapping because they're natively little-endian.
Core Foundation provides a way to do this all much more easily:
static uint32_t file_read_int32_le(char* buffer, FILE* file) {
    fread(buffer, 1, 4, file);          // Get four bytes from the file
    uint32_t val = *(uint32_t*)buffer;  // Turn them into a 32-bit integer
    // Swap on a big-endian Mac, do nothing on a little-endian Mac or iOS
    return CFSwapInt32LittleToHost(val);
}
There's a whole range of functions (htons/htonl/ntohs/ntohl) whose sole purpose in life is to convert between "host" and "network" byte order.
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
Each function has a reciprocal that does the opposite.
Now, these functions won't necessarily help you, because they intrinsically convert from your host's specific byte order, so please just use this answer as a starting point to find what you need. Generally, code should never make assumptions about what architecture it's on.
Intel == "Little Endian".
Network == "Big Endian".
Hope this starts you out on the right track.
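For a concrete example of the network-order functions in action (independent of this particular file format, which is little-endian), here is a hedged sketch of mine:

#include <arpa/inet.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Four bytes as they might arrive from a big-endian ("network order") source.
    unsigned char data[4] = {0x00, 0x00, 0x01, 0x02};
    uint32_t raw;
    std::memcpy(&raw, data, sizeof raw); // copy the bytes into an aligned integer
    printf("0x%X\n", ntohl(raw));        // prints 0x102 regardless of host byte order
    return 0;
}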
I've used the following for integral types. On some platforms, it's not safe for non-integral types.
#include <algorithm> // for std::reverse_copy

template <typename T> T byte_reverse(T in) {
    T out;
    char* in_c = reinterpret_cast<char *>(&in);
    char* out_c = reinterpret_cast<char *>(&out);
    std::reverse_copy(in_c, in_c + sizeof(T), out_c);
    return out;
}
So, to put that in your file reader (why are you passing the buffer in, when it appears it could be a temporary?):
static unsigned int file_read_int32_le(FILE* file) {
    unsigned int int_buffer;
    size_t bytesRead = fread(&int_buffer, 1, sizeof(int_buffer), file);
    /* An error or a short read (fewer than 4 bytes) should be checked here */
    return byte_reverse(int_buffer);
}

Bitwise operators and converting an int to 2 bytes and back again

My background is PHP, so entering the world of low-level stuff (chars are bytes, bytes are bits, bits are binary values, and so on) is taking some time to get the hang of.
What I am trying to do here is send some values from an Arduino board to openFrameworks (both are C++).
What this script currently does (and it works well for one sensor, I might add) when asked to send the data is:
int value_01 = analogRead(0); // outputs between 0-1023

unsigned char val1;
unsigned char val2;

// some complicated bitshift operation
val1 = value_01 & 0xFF;        // low byte
val2 = (value_01 >> 8) & 0xFF; // high byte

// send both bytes
Serial.print(val1, BYTE);
Serial.print(val2, BYTE);
Apparently this is the most reliable way of getting the data across.
So now that it is sent via the serial port, the bytes are added to a char string and converted back with:
int num = ( (unsigned char)bytesReadString[1] << 8 | (unsigned char)bytesReadString[0] );
So to recap, I'm trying to get four sensors' worth of data (which I am assuming will be 8 of those Serial.print calls?) and to end up with int num_01 through num_04 at the end of it all.
I'm assuming this (as with most things) might be quite easy for someone with experience with these concepts.
Write a function to abstract sending the data (I've gotten rid of your temporary variables because they don't add much value):
void send16(int value)
{
    // send both bytes, low byte first
    Serial.print(value & 0xFF, BYTE);
    Serial.print((value >> 8) & 0xFF, BYTE);
}
Now you can easily send any data you want:
send16(analogRead(0));
send16(analogRead(1));
...
Just send them one after the other.
Note that the serial driver lets you send one byte (8 bits) at a time. A value between 0 and 1023 inclusive (which looks like what you're getting) fits in 10 bits, so 1 byte is not enough; 2 bytes, i.e. 16 bits, are enough (there is some unused space, but unless transfer speed is an issue, you don't need to worry about it).
So the first two bytes carry the data for your first sensor, the next two bytes carry the data for the second sensor, the next two for the third sensor, and the last two for the last sensor.
I suggest you use the function that R Samuel Klatchko suggested on the sending side, and hopefully you can work out what you need to do on the receiving side.
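For reference, the receiving side could look something like this (a sketch of mine; it assumes the buffer holds the 8 bytes in the order they were sent, low byte first for each sensor):

// Decode four 16-bit sensor readings from an 8-byte buffer.
void decode4(const char* buf, int out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = (unsigned char)buf[2 * i]              // low byte
               | ((unsigned char)buf[2 * i + 1] << 8);  // high byte
    }
}

Called as decode4(bytesReadString, num), it fills num[0] through num[3] with the four sensor values.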
int num = ( (unsigned char)bytesReadString[1] << 8 |
            (unsigned char)bytesReadString[0] );
That code will not necessarily do what you expect.
If the shift is performed in an 8-bit type (rather than after promotion to a wider integer), the high bits are lost:
11111111 << 3 == 11111000
11111111 << 8 == 00000000
i.e. any unsigned char, shifted left 8 bits with the result stored back in 8 bits, must be zero.
You need something more like this:
typedef unsigned uint;
typedef unsigned char uchar;

uint num = (static_cast<uint>(static_cast<uchar>(bytesReadString[1])) << 8) |
            static_cast<uint>(static_cast<uchar>(bytesReadString[0]));
You might get the same result from:
typedef unsigned short ushort;
uint num = *reinterpret_cast<ushort *>(bytesReadString);
If the byte ordering is OK, that is. It should work on little-endian machines (x86 or x64), but not on big-endian ones (PPC, SPARC, Alpha, etc.).
To generalise the "Send" code a bit --
void SendBuff(const void *pBuff, size_t nBytes)
{
    const char *p = reinterpret_cast<const char *>(pBuff);
    for (size_t i = 0; i < nBytes; i++)
        Serial.print(p[i], BYTE);
}

template <typename T>
void Send(const T &t)
{
    SendBuff(&t, sizeof(T));
}