Writing bytes in files the right way in C / C++ [Endianness] - c++

I'm writing a program that creates MIDI files, and I'm trying to write the MIDI messages to a file.
I first tested creating the whole file from scratch with fputc(), writing it byte by byte, and that went well.
The problem came when I tried to write more than one byte at a time (e.g. writing a short int or an int to the file), because fwrite() put the bytes in reverse order.
For example:
FILE* midiFile;
midiFile = fopen("test.mid", "wb");
short msg = 0x0006;
fwrite(&msg, sizeof(msg), 1, midiFile);
fclose(midiFile);
The output written to the file is 0x06 then 0x00, not the expected 0x00, 0x06.
I read about that and found that it's caused by endianness; my Intel processor is little endian, so it stores variables bigger than 1 byte backwards (compared to a big endian machine).
I still need to correct that and write the bytes in the order I want so I can continue developing my program.
My compiler doesn't recognize functions like htonl() or similar (I don't know why), so I'm asking for a way to do it, or for how to write shorts and ints into char arrays (especially shorts).

Either write the bytes you want in order, one at a time...
or swap the bytes before you write them.
uint8_t msbyte = msg >> 8;
uint8_t lsbyte = msg & 0xFF;
uint8_t buffer[2];
// Big Endian
buffer[0] = msbyte;
buffer[1] = lsbyte;
/* Little endian
buffer[0] = lsbyte;
buffer[1] = msbyte;
*/
fwrite(&buffer[0], 1, sizeof(buffer), midiFile);
Swapping bytes:
uint16_t swap_bytes(const uint16_t value)
{
uint16_t result;
result = value >> 8;
result += (value & 0xFF) << 8;
return result;
}
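To tie the two approaches together, here is a minimal sketch of helpers that always emit big-endian bytes, which is what a MIDI file expects, no matter what the host's byte order is. The names put_be16, put_be32, write_be16 and write_be32 are made up for this example and are not from any library.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helpers: store a value most significant byte first.
// The shifts work on the value, not on memory, so host endianness is irrelevant.
inline void put_be16(std::uint8_t* out, std::uint16_t value)
{
    out[0] = static_cast<std::uint8_t>(value >> 8);
    out[1] = static_cast<std::uint8_t>(value & 0xFF);
}

inline void put_be32(std::uint8_t* out, std::uint32_t value)
{
    out[0] = static_cast<std::uint8_t>(value >> 24);
    out[1] = static_cast<std::uint8_t>((value >> 16) & 0xFF);
    out[2] = static_cast<std::uint8_t>((value >> 8) & 0xFF);
    out[3] = static_cast<std::uint8_t>(value & 0xFF);
}

// Write the big-endian bytes to an open file; returns true if all bytes were written.
inline bool write_be16(std::FILE* f, std::uint16_t value)
{
    std::uint8_t buf[2];
    put_be16(buf, value);
    return std::fwrite(buf, 1, sizeof buf, f) == sizeof buf;
}

inline bool write_be32(std::FILE* f, std::uint32_t value)
{
    std::uint8_t buf[4];
    put_be32(buf, value);
    return std::fwrite(buf, 1, sizeof buf, f) == sizeof buf;
}
```

With these, write_be16(midiFile, 0x0006) should put 0x00 followed by 0x06 into the file on both little-endian and big-endian machines.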

Related

how to store data from std::vector<short> in std::vector<uint8_t>

What I want to do is store the data in a std::vector<short> in a std::vector<uint8_t>, splitting each short into two uint8_t values. I need to do this because I have a network application that will only send std::vector<uint8_t>'s, so I need to convert to uint8_t to send and then convert back when I receive the uint8_t vector.
Normally what I would do (and what I saw when I looked up the problem) is:
std::vector<uint8_t> newVec(oldvec.begin(),oldvec.end());
However, if I understand correctly, this will take each individual short value, truncate it to the size of a uint8_t, and make a new vector with the same number of entries and half the amount of data, when what I want is the same amount of data with twice as many entries.
Solutions that include a way to reverse the process and that avoid copying as much as possible would help a lot. Thanks!
To split something at the 8-bit boundary, you can use right shifts and masks, i.e.
uint16_t val;
uint8_t low = val & 0xFF;
uint8_t high = (val >> 8) & 0xFF;
Now you can put high and low into the second vector in whichever order you need.
For splitting and merging, you would have the following:
unsigned short oldShort;
uint8_t char1 = oldShort & 0xFF; // lower byte
uint8_t char2 = oldShort >> 8; // upper byte
Then push the two parts onto the vector, and send it off to your network library. On the receiving end, during re-assembly, you would read the next two bytes off of the vector and combine them back into the short.
Note: Make sure the received vector has an even number of elements; an odd count means the data was truncated or modified in transit.
// Read off the next two characters and merge them again
unsigned short mergedShort = (char2 << 8) | char1;
"I need to do this because I have a network application (1) that will only send std::vector<uint8_t>'s"
Besides masking and bit shifting, you should take endianness into account when sending data over the wire.
The network representation of data is usually big endian. So you can always put the MSB first. Provide a simple function like:
std::vector<uint8_t> networkSerialize(const std::vector<uint16_t>& input) {
std::vector<uint8_t> output;
output.reserve(input.size() * sizeof(uint16_t)); // Pre-allocate for sake of
// performance
for(auto snumber : input) {
output.push_back((snumber & 0xFF00) >> 8); // Extract the MSB
output.push_back((snumber & 0xFF)); // Extract the LSB
}
return output;
}
and use it like
std::vector<uint8_t> newVec = networkSerialize(oldvec);
(1) Emphasis mine
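The question also asked for a way to reverse the process. A possible counterpart to networkSerialize, assuming the same big-endian (MSB first) layout, might look like the sketch below. The name networkDeserialize and the choice to throw on an odd byte count are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical inverse of networkSerialize: rebuilds 16-bit values from MSB-first byte pairs.
std::vector<std::uint16_t> networkDeserialize(const std::vector<std::uint8_t>& input)
{
    if (input.size() % 2 != 0) {
        // An odd number of bytes cannot be a whole sequence of 16-bit values.
        throw std::invalid_argument("odd number of bytes");
    }
    std::vector<std::uint16_t> output;
    output.reserve(input.size() / 2);
    for (std::size_t i = 0; i < input.size(); i += 2) {
        // First byte of each pair is the MSB, second is the LSB.
        output.push_back(static_cast<std::uint16_t>((input[i] << 8) | input[i + 1]));
    }
    return output;
}
```

Because both directions use shifts rather than memory copies, the result does not depend on the endianness of either machine.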
Disclaimer: People are talking about "network byte order". If you send something larger than 1 byte, of course you need to take network endianness into account. However, as far as I understand, the limitation "network application that will only send std::vector<uint8_t>" explicitly states "I don't want to mess with any of that endianness stuff". A uint8_t is already a single byte, and if you send a sequence of bytes in one order, you should get them back in exactly the same order. This is helpful: sending the array through a socket. The client and server machines can have different endianness, but the OP said nothing about that, so that is a different story...
Regarding the answer:
Assuming all "endianness" questions are closed.
If you just want to send a vector of shorts, I believe VTT's answer will perform the best. However, if std::vector<short> is just a particular case, you can use the pack() function from my answer to a similar question. It packs any iterable container, string, C-string and more into a vector of bytes and does not perform any endianness shenanigans. Just include byte_pack.h and then you can use it like this:
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <vector>
#include "byte_pack.h"
void cout_bytes(const std::vector<std::uint8_t>& bytes)
{
for(unsigned byte : bytes) {
std::cout << "0x" << std::setfill('0') << std::setw(2) << std::hex
<< byte << " ";
}
std::cout << std::endl;
}
int main()
{
std::vector<short> test = { (short) 0xaabb, (short) 0xccdd };
std::vector<std::uint8_t> test_result = pack(test);
cout_bytes(test_result); // -> 0xbb 0xaa 0xdd 0xcc (on a little-endian host; mind the endianness)
return 0;
}
Just copy everything in one go:
::std::vector<short> shorts;
// populate shorts...
::std::vector<uint8_t> bytes;
::std::size_t const bytes_count(shorts.size() * sizeof(short) / sizeof(uint8_t));
bytes.resize(bytes_count);
::memcpy(bytes.data(), shorts.data(), bytes_count);
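For the reverse direction, the same memcpy idea can be applied the other way round. This is only a sketch: the function names are invented here, and it assumes sender and receiver share the same sizeof(short) and the same byte order, since a raw memory copy preserves whatever layout the host uses.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the raw memory of the shorts into a byte vector.
std::vector<std::uint8_t> shorts_to_bytes(const std::vector<short>& shorts)
{
    std::vector<std::uint8_t> bytes(shorts.size() * sizeof(short));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), shorts.data(), bytes.size());
    }
    return bytes;
}

// Hypothetical helper: rebuild the shorts; a trailing partial value is ignored.
std::vector<short> bytes_to_shorts(const std::vector<std::uint8_t>& bytes)
{
    std::vector<short> shorts(bytes.size() / sizeof(short));
    if (!shorts.empty()) {
        std::memcpy(shorts.data(), bytes.data(), shorts.size() * sizeof(short));
    }
    return shorts;
}
```

If the two ends of the connection might differ in endianness, the shift-based serialization shown earlier in the thread is the safer choice.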

How can you read different sized bit values from a file?

I'm reading a bunch of values from a text file; they are in binary form because I stored them using fwrite. The problem is that the first value in the file is 5 bytes in size and the next 4800 values are 2 bytes in size. So when I try to cycle through the file and read the values, I get the wrong results, because my program does not know that it should take 5 bytes the first time and then 2 bytes the remaining 4800 times.
Here is how I'm cycling through the file:
long lSize;
unsigned short * buffer;
size_t result;
FILE * pFile = fopen("dataValues.txt", "rb");
fseek(pFile, 0, SEEK_END); // move to the end so ftell reports the file size
lSize = ftell(pFile);
rewind(pFile); // back to the start before reading
buffer = (unsigned short *) malloc (sizeof(unsigned short)*lSize);
size_t count = lSize/sizeof(short);
for(size_t i = 0; i < count; ++i)
{
result = fread(buffer+i, sizeof(unsigned short), 1, pFile);
printf("%u\n", buffer[i]);
}
I'm pretty sure I'm going to need to change my fread statement because the first value is of type time_t so I'll probably need a statement that looks like this:
result = fread(buffer+i, sizeof(time_t), 1, pFile);
However, this did not work when I tried it, and I think it's because I am not changing the starting position properly. I think that while I do read 5 bytes' worth of data, I don't move the starting position far enough.
Does anyone here have a good understanding of fread? Can you please let me know what I can change to make my program accomplish what I need?
EDIT:
This is how I'm writing to the file.
fwrite(&timer, sizeof(timer), 1, pFile);
fwrite(ptr, sizeof(unsigned short), rawData.size(), pFile);
EDIT2:
I tried to read the file using ifstream
int main()
{
time_t x;
ifstream infile;
infile.open("binaryValues.txt", ios::binary | ios::in);
infile.read((char *) &x, sizeof(x));
return 0;
}
However, now it doesn't compile and just gives me a bunch of "undefined reference to" errors for code that I haven't even written.
I don't see the problem:
uint8_t five_byte_buffer[5];
uint8_t two_byte_buffer[2];
//...
ifstream my_file(/*...*/);
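my_file.read(reinterpret_cast<char *>(&five_byte_buffer[0]), 5); // read() takes a char *
my_file.read(reinterpret_cast<char *>(&two_byte_buffer[0]), 2);
So, what is your specific issue?
Edit 1: Reading in a loop
while (my_file.read(reinterpret_cast<char *>(&five_byte_buffer[0]), 5))
{
my_file.read(reinterpret_cast<char *>(&two_byte_buffer[0]), 2); // only 2 bytes fit in this buffer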
Process_Data();
}
You can't. Streams are byte-oriented, and a byte is almost always an octet (8 bits).
You can easily enough build a bit-oriented stream on top of that. You just keep a few bytes in a buffer and keep track of which bit is current. Watch out for getting the last few bits, and attempts to mix byte access with bit access.
Untested but this is the general idea.
struct bitstream
{
unsigned long long rack; // 64 bits rack
FILE *fp; // file opened for reading
int rackpos; // 0 - 63, position of bits read.
};
int getbits(struct bitstream *bs, int Nbits)
{
unsigned long long mask = 0x8000000000000000ULL; // top bit of the rack
int answer = 0;
int i;
while(bs->rackpos > 8)
{
bs->rack <<= 8;
bs->rack |= fgetc(bs->fp);
bs->rackpos -= 8;
}
mask >>= bs->rackpos;
for(i=0;i<Nbits;i++)
{
answer <<= 1;
answer |= (bs->rack & mask) ? 1 : 0; // append the current bit
mask >>= 1;
}
bs->rackpos += Nbits;
return answer;
}
You need to decide how you know when the stream is terminated. As is, you'll corrupt the last few bits with the EOF returned by fgetc().
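One way to sidestep the end-of-stream problem is to load the bytes into memory first and read bits from there. The class below is a sketch of that variation, not the FILE-based design above: it reads MSB first from a byte vector and refuses a request when too few bits remain. The name BitReader and its interface are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit reader over an in-memory byte buffer.
// The buffer must outlive the reader, because only a reference is stored.
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& data)
        : data_(data), bitpos_(0) {}

    // Number of bits not yet consumed.
    std::size_t remaining() const { return data_.size() * 8 - bitpos_; }

    // Reads nbits (0..32) into out. Returns false and consumes nothing
    // if fewer than nbits bits remain.
    bool get(unsigned nbits, std::uint32_t& out)
    {
        if (nbits > 32 || nbits > remaining()) {
            return false;
        }
        std::uint32_t value = 0;
        for (unsigned i = 0; i < nbits; ++i) {
            const std::uint8_t byte = data_[bitpos_ / 8];
            // Bit 7 of each byte comes first.
            const unsigned bit = (byte >> (7 - bitpos_ % 8)) & 1u;
            value = (value << 1) | bit;
            ++bitpos_;
        }
        out = value;
        return true;
    }

private:
    const std::vector<std::uint8_t>& data_;
    std::size_t bitpos_;
};
```

Since the total size is known up front, the caller can tell a genuine zero bit from running off the end of the data.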

Problem converting endianness

I'm following this tutorial for using OpenAL in C++: http://enigma-dev.org/forums/index.php?topic=730.0
As you can see in the tutorial, they leave a few methods unimplemented, and I am having trouble implementing file_read_int32_le(char*, FILE*) and file_read_int16_le(char*, FILE*). Apparently what it should do is load 4 bytes from the file (or 2 in the case of int16 I guess..), convert it from little-endian to big endian and then return it as an unsigned integer. Here's the code:
static unsigned int file_read_int32_le(char* buffer, FILE* file) {
size_t bytesRead = fread(buffer, 1, 4, file);
printf("%x\n",(unsigned int)*buffer);
unsigned int* newBuffer = (unsigned int*)malloc(4);
*newBuffer = ((*buffer << 24) & 0xFF000000U) | ((*buffer << 8) & 0x00FF0000U) | ((*buffer >> 8) & 0x0000FF00U) | ((*buffer >> 24) & 0x000000FFU);
printf("%x\n", *newBuffer);
return (unsigned int)*newBuffer;
}
When debugging (in XCode) it says that the hexadecimal value of *buffer is 0x72, which is only one byte. When I create newBuffer using malloc(4), I get a 4-byte buffer (*newBuffer is something like 0xC0000003) which then, after the operations, becomes 0x72000000. I assume the result I'm looking for is 0x00000027 (edit: actually 0x00000072), but how would I achieve this? Is it something to do with converting between the char* buffer and the unsigned int* newBuffer?
Yes, *buffer will read in Xcode's debugger as 0x72, because buffer is a pointer to a char.
If the first four bytes in the memory block pointed to by buffer are (hex) 72 00 00 00, then the return value should be 0x00000072, not 0x00000027. The bytes should get swapped, but not the two "nybbles" that make up each byte.
This code leaks the memory you malloc'd, and you don't need to malloc here anyway.
Your byte-swapping is correct on a PowerPC or 68K Mac, but not on an Intel Mac or ARM-based iOS. On those platforms, you don't have to do any byte-swapping because they're natively little-endian.
Core Foundation provides a way to do this all much more easily:
static uint32_t file_read_int32_le(char* buffer, FILE* file) {
fread(buffer, 1, 4, file); // Get four bytes from the file
uint32_t val = *(uint32_t*)buffer; // Turn them into a 32-bit integer
// Swap on a big-endian Mac, do nothing on a little-endian Mac or iOS
return CFSwapInt32LittleToHost(val);
}
There's a whole family of functions (htons, htonl and friends) whose sole purpose in life is to convert from "host" to "network" byte order.
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
Each function has a reciprocal that does the opposite.
Now, these functions won't necessarily help you, because they intrinsically convert from your host's specific byte order, so please just use this answer as a starting point to find what you need. Generally, code should never make assumptions about what architecture it's on.
Intel == "Little Endian".
Network == "Big Endian".
Hope this starts you out on the right track.
I've used the following for integral types. On some platforms, it's not safe for non-integral types.
#include <algorithm> // std::reverse_copy
template <typename T> T byte_reverse(T in) {
T out;
char* in_c = reinterpret_cast<char *>(&in);
char* out_c = reinterpret_cast<char *>(&out);
std::reverse_copy(in_c, in_c+sizeof(T), out_c);
return out;
}
So, to put that in your file reader (why are you passing the buffer in, since it appears that it could be a temporary?):
static unsigned int file_read_int32_le(FILE* file) {
unsigned int int_buffer;
size_t bytesRead = fread(&int_buffer, 1, sizeof(int_buffer), file);
/* Error or less than 4 bytes should be checked */
return byte_reverse(int_buffer);
}
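Note that an unconditional byte_reverse gives the right value only on a big-endian host. A sketch that avoids the question of host byte order entirely is to assemble the value from individual bytes with shifts. The names load_le16, load_le32 and file_read_u32_le are invented for this example and are not from the tutorial.

```cpp
#include <cstdint>
#include <cstdio>

// Assemble a 32-bit value from four bytes stored least significant byte first.
// Shifts operate on values, so the result is the same on any host.
inline std::uint32_t load_le32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same idea for a 16-bit value.
inline std::uint16_t load_le16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical reader: returns false on a short read instead of returning garbage.
inline bool file_read_u32_le(std::FILE* file, std::uint32_t& out)
{
    unsigned char buf[4];
    if (std::fread(buf, 1, sizeof buf, file) != sizeof buf) {
        return false;
    }
    out = load_le32(buf);
    return true;
}
```

With the bytes 72 00 00 00 from the question, load_le32 yields 0x00000072.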

Bitwise operators and converting an int to 2 bytes and back again

My background is PHP, so entering the world of low-level stuff (chars are bytes, which are bits, which are binary values, etc.) is taking some time to get the hang of.
What I am trying to do here is send some values from an Arduino board to openFrameworks (both are C++).
What this script currently does (and works well for one sensor I might add) when asked for the data to be sent is:
int value_01 = analogRead(0); // which outputs between 0-1023
unsigned char val1;
unsigned char val2;
//some Complicated bitshift operation
val1 = value_01 &0xFF;
val2 = (value_01 >> 8) &0xFF;
//send both bytes
Serial.print(val1, BYTE);
Serial.print(val2, BYTE);
Apparently this is the most reliable way of getting the data across.
So now that it is sent via the serial port, the bytes are added to a char string and converted back by:
int num = ( (unsigned char)bytesReadString[1] << 8 | (unsigned char)bytesReadString[0] );
So to recap, I'm trying to get 4 sensors' worth of data (which I am assuming will be 8 of those Serial.print calls?) and to have int num_01 - num_04... at the end of it all.
I'm assuming this (as with most things) might be quite easy for someone with experience in these concepts.
Write a function to abstract sending the data (I've gotten rid of your temporary variables because they don't add much value):
void send16(int value)
{
//send both bytes
Serial.print(value & 0xFF, BYTE);
Serial.print((value >> 8) & 0xFF, BYTE);
}
Now you can easily send any data you want:
send16(analogRead(0));
send16(analogRead(1));
...
Just send them one after the other.
Note that the serial driver lets you send one byte (8 bits) at a time. A value between 0 and 1023 inclusive (which looks like what you're getting) fits in 10 bits. So 1 byte is not enough. 2 bytes, i.e. 16 bits, are enough (there is some extra space, but unless transfer speed is an issue, you don't need to worry about this wasted space).
So, the first two bytes can carry the data for your first sensor. The next two bytes carry the data for the second sensor, the next two bytes for the third sensor, and the last two bytes for the last sensor.
I suggest you use the function that R Samuel Klatchko suggested on the sending side, and hopefully you can work out what you need to do on the receiving side.
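For the receiving side, a sketch of the decoding step could look like this, assuming the eight bytes have already been collected in the order they were sent (low byte first for each sensor, as send16 does). The name decode_sensors is made up for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical receiver-side helper: decodes four 16-bit readings from
// eight bytes, each reading sent low byte first.
// `bytes` must point to at least 8 bytes.
std::array<int, 4> decode_sensors(const unsigned char* bytes)
{
    std::array<int, 4> values{};
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Low byte comes first, high byte second.
        values[i] = bytes[2 * i] | (bytes[2 * i + 1] << 8);
    }
    return values;
}
```

If the bytes sit in a plain char buffer, cast it with reinterpret_cast<const unsigned char*> before decoding so that values above 127 are not treated as negative.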
int num = ( (unsigned char)bytesReadString[1] << 8 |
(unsigned char)bytesReadString[0] );
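That code only works because of integer promotion.
If the shift were really carried out in 8 bits, you would lose the extra bits:
11111111 << 3 == 11111000
11111111 << 8 == 00000000
i.e. any 8-bit value, when shifted 8 bits, would be zero. In C and C++, however, an unsigned char operand is promoted to int before the shift, so the high byte survives.
Explicit casts make the intended width visible: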
typedef unsigned uint;
typedef unsigned char uchar;
uint num = (static_cast<uint>(static_cast<uchar>(bytesReadString[1])) << 8 ) |
static_cast<uint>(static_cast<uchar>(bytesReadString[0]));
You might get the same result from:
typedef unsigned short ushort;
uint num = *reinterpret_cast<ushort *>(bytesReadString);
This works only if the byte ordering matches: it should work on little-endian hosts (x86 or x64), but not on big-endian ones (PPC, SPARC, etc.).
To generalise the "Send" code a bit --
void SendBuff(const void *pBuff, size_t nBytes)
{
const char *p = reinterpret_cast<const char *>(pBuff);
for (size_t i=0; i<nBytes; i++)
Serial.print(p[i], BYTE);
}
template <typename T>
void Send(const T &t)
{
SendBuff(&t, sizeof(T));
}

C/C++ read a byte from an hexinput from stdin

I can't find a way to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code in order to read the input:
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine, but how do I split the 2 bytes into 1-byte uint8_t values (or maybe even read only the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing.
uint8_t matrix[50][50]
Since I'm not very skilled at formatting / reading input in C/C++ (and have only used scanf so far), any other ideas on how to do this easily (and fast, if possible) are greatly appreciated.
Edit: Found an even better method using the fread function, as it lets you specify how many bytes to read from the stream (stdin in this case) and save to a variable/array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
cplusplus ref
%x reads an unsigned int, not a uint16_t (though they may be the same on your particular platform).
To read only one byte, try this:
unsigned int byteTmp;
scanf("%2x", &byteTmp);
uint8_t byte = byteTmp;
This reads an unsigned int, but stops after reading two characters (two hex characters equals eight bits, or one byte).
You should be able to split the variable like this:
uint8_t LowerByte=twoBytes & 0xFF;
uint8_t HigherByte=twoBytes >> 8;
A couple of thoughts:
1) read it as characters and convert it manually - painful
2) If you know that there are a multiple of 4 hexits, you can just read in twoBytes and then convert to one-byte values with high = twoBytes >> 8; low = twoBytes & 0xFF;
3) %2x
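As an illustration of option 1 (reading characters and converting manually), here is a sketch that turns a string of hex digits into bytes two digits at a time. The function names are invented for the example, and stopping at the first invalid or incomplete pair is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the numeric value of one hex digit, or -1 if c is not a hex digit.
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: converts pairs of hex digits to bytes and stops at
// the first incomplete or invalid pair.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex)
{
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        const int hi = hex_value(hex[i]);
        const int lo = hex_value(hex[i + 1]);
        if (hi < 0 || lo < 0) {
            break;
        }
        // The first digit is the high nibble, the second the low nibble.
        bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return bytes;
}
```

The resulting bytes can then be copied into the matrix at whatever position is needed, for example matrix[row][col] = bytes[0].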