I am trying to build a four byte value (most significant byte first) that holds the total length of some data. I found a code snippet to compute this, but I don't get 4 bytes in the output. Instead I only get a 2 byte value.
char bytesLen[4] ;
unsigned int blockSize = 535;
bytesLen[0] = (blockSize & 0xFF);
bytesLen[1] = (blockSize >> 8) & 0xFF;
bytesLen[2] = (blockSize >> 16) & 0xFF;
bytesLen[3] = (blockSize >> 24) & 0xFF;
std::cout << "bytesLen: " << bytesLen << '\n';
Did I miss something in my code?
No, you didn't. You're outputting the array as a C string, which is null terminated. The third byte is zero, so only two characters will be shown.
This is not a rational way to output binary values.
Also, you're storing the least significant byte first, not the most significant. For most-significant-first you have to reverse the order of the bytes.
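For example, a minimal sketch (untested, reusing the names from your code) that stores the most significant byte first and prints the bytes as hex instead of as a C string:
#include <iostream>
#include <iomanip>

int main()
{
    unsigned char bytesLen[4];
    unsigned int blockSize = 535;

    // most significant byte first
    bytesLen[0] = (blockSize >> 24) & 0xFF;
    bytesLen[1] = (blockSize >> 16) & 0xFF;
    bytesLen[2] = (blockSize >> 8) & 0xFF;
    bytesLen[3] = blockSize & 0xFF;

    // print each byte as two hex digits rather than treating the array as a string
    std::cout << "bytesLen:" << std::hex << std::setfill('0');
    for (unsigned char b : bytesLen)
        std::cout << ' ' << std::setw(2) << (int)b;
    std::cout << '\n';   // prints: bytesLen: 00 00 02 17
}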
This shows how to do the same thing without shift operators and bitmasks.
#include <iostream>
#include <iomanip>
// C++11
#include <cstdint>

int main(void)
{
    // with union, the memory allocation is shared
    union {
        uint8_t bytes[4];
        uint32_t n;
    } length;

    // see htonl if needs to be in network byte order
    // or ntohl if from network byte order to host
    length.n = 535;

    std::cout << std::hex;
    for (int i = 0; i < 4; i++) {
        std::cout << (unsigned int)length.bytes[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}
If you want the most significant byte first, then you have the order of the bytes reversed.
You get incorrect output because you treat everything as a C string even though it is not. Get rid of the char type and fix the printing.
In C++, it would be like this:
#include <iostream>
#include <cstdint>

int main()
{
    uint8_t bytesLen[sizeof(uint32_t)];
    uint32_t blockSize = 535;

    bytesLen[3] = (blockSize >> 0) & 0xFF;
    bytesLen[2] = (blockSize >> 8) & 0xFF;
    bytesLen[1] = (blockSize >> 16) & 0xFF;
    bytesLen[0] = (blockSize >> 24) & 0xFF;

    bool removeZeroes = true;
    std::cout << "bytesLen: 0x";
    for (size_t i = 0; i < sizeof(bytesLen); i++)
    {
        if (bytesLen[i] != 0)
        {
            removeZeroes = false;
        }
        if (!removeZeroes)
        {
            std::cout << std::hex << (int)bytesLen[i];
        }
    }
    std::cout << std::endl;
    return 0;
}
Here's the fixed code [untested]. Note this won't compile as is. You'll need to reorder it slightly, but it should help:
unsigned char bytesLen[4];
unsigned int blockSize = 535;

// little endian
#if 0
bytesLen[0] = (blockSize & 0xFF);
bytesLen[1] = (blockSize >> 8) & 0xFF;
bytesLen[2] = (blockSize >> 16) & 0xFF;
bytesLen[3] = (blockSize >> 24) & 0xFF;
// big endian
#else
bytesLen[3] = (blockSize & 0xFF);
bytesLen[2] = (blockSize >> 8) & 0xFF;
bytesLen[1] = (blockSize >> 16) & 0xFF;
bytesLen[0] = (blockSize >> 24) & 0xFF;
#endif

char tmp[9];

char *
pretty_print(char *dst, unsigned char *src)
{
    const char *hex = "0123456789ABCDEF";  // const needed for a string literal in C++
    char *bp = dst;
    int chr;

    for (int idx = 0; idx <= 3; ++idx) {
        chr = src[idx];
        *bp++ = hex[(chr >> 4) & 0x0F];
        *bp++ = hex[(chr >> 0) & 0x0F];
    }
    *bp = 0;

    return dst;
}

std::cout << "bytesLen: " << pretty_print(tmp, bytesLen) << '\n';
UPDATE:
Based upon your followup question: to concatenate binary data, we can not use string-like functions such as sprintf, because the binary data may contain 0x00 bytes, which would cut the string transfer short. Also, if the binary data had no 0x00 in it, the string functions would run beyond the end of the array(s) looking for one, and bad things would happen. The string functions also assume plain char data, and when dealing with raw binary we want to use unsigned char.
Here's something to try:
unsigned char finalData[1000]; // size is just example
unsigned char bytesLen[4];
unsigned char blockContent[300];
unsigned char *dst;
dst = finalData;
memcpy(dst,bytesLen,sizeof(bytesLen));
dst += sizeof(bytesLen);
memcpy(dst,blockContent,sizeof(blockContent));
dst += sizeof(blockContent);
// append more if needed in similar way ...
Note: The above presupposes that blockContent is of fixed size. If it had a variable number of bytes, we'd need to replace sizeof(blockContent) with (e.g.) bclen, where bclen is the number of bytes actually in blockContent, as sketched below.
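A rough sketch of that variable-length case (bclen is a made-up name for the byte count, as noted above):
#include <cstring>

// finalData, bytesLen and blockContent as above; bclen holds the number of
// bytes actually stored in blockContent (hypothetical name, not fixed by the question)
size_t append_block(unsigned char *finalData, const unsigned char *bytesLen,
                    const unsigned char *blockContent, size_t bclen)
{
    unsigned char *dst = finalData;

    std::memcpy(dst, bytesLen, 4);          // 4-byte length prefix
    dst += 4;

    std::memcpy(dst, blockContent, bclen);  // only the bytes actually used
    dst += bclen;

    return dst - finalData;                 // total number of bytes written
}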
Related
This toy program splits an integer into 4 bytes and later combines those bytes to get back the input value, but it produces a wrong result. The program works for positive integers; however, I am interested in signed integers. Need help.
Expected Output: -12345
Actual Output: -57
#include <iostream>

int main()
{
    int j, i = -12345;
    char b[4];

    b[0] = (i >> 24) & 0xFF;
    b[1] = (i >> 16) & 0xFF;
    b[2] = (i >> 8) & 0xFF;
    b[3] = (i >> 0) & 0xFF;

    j = (int)((b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3] << 0));

    std::cout << j;
    return 0;
}
There are actually two problems that lead to your "error".
The first is that the result of e.g. b[0] << 24 will be an int. When you cast that to a char (and assuming that char is an 8-bit type) then you cut off the top 24 bits of the value, truncating it.
The second problem is that char could be unsigned (it's implementation-defined if char is signed or unsigned). If char is unsigned then the value -1 (0xffffffff) will become 255 (0x000000ff).
When you then bring all that together it will almost certainly result in wrong values.
In general, whenever you feel the need to do a C-style cast (like in (char)(b[0] << 24)) when programming in C++, you should take that as a sign that you're doing something wrong.
One possible way to solve your problem is to always work with explicitly unsigned data types.
First you need to copy the original int value to an unsigned int:
unsigned ui;
memcpy(&ui, &i, sizeof ui);
Then use ui instead of i when doing the "split". And explicitly use unsigned char:
unsigned char b[sizeof(unsigned)] = { 0 };
b[0] = (ui >> 24) & 0xFF;
b[1] = (ui >> 16) & 0xFF;
b[2] = (ui >> 8) & 0xFF;
b[3] = (ui >> 0) & 0xFF;
Then to put it all back, again use an explicit unsigned type, and copy it to the resulting variable:
unsigned uj = (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3] << 0);
memcpy(&j, &uj, sizeof j);
I suggest using unsigned data types here to avoid possible problems that can come from sign-extension during conversion.
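Putting those pieces together, a minimal complete sketch (untested; the cast on the first shifted term just keeps the arithmetic in unsigned) might look like:
#include <iostream>
#include <cstring>

int main()
{
    int j, i = -12345;

    unsigned ui;
    std::memcpy(&ui, &i, sizeof ui);   // reinterpret the bits of i as unsigned

    unsigned char b[sizeof(unsigned)] = { 0 };
    b[0] = (ui >> 24) & 0xFF;
    b[1] = (ui >> 16) & 0xFF;
    b[2] = (ui >> 8) & 0xFF;
    b[3] = (ui >> 0) & 0xFF;

    unsigned uj = ((unsigned)b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3] << 0);
    std::memcpy(&j, &uj, sizeof j);    // copy the bits back into the signed int

    std::cout << j << '\n';            // prints -12345
}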
Your code works only for positive numbers! "i" is negative, and by shifting it to the right b[0] becomes positive; the lost sign information then results in an error.
try
int main()
{
    int j, i = -12345;
    const char* bytes = reinterpret_cast<const char*>(&i);
    j = *reinterpret_cast<const int*>(bytes);
    std::cout << j;
    return 0;
}
I have a vector which holds byte data (chars) received from a socket. This data holds different data types I want to extract. E.g. the first 8 elements (8 bytes) of the vector are a uint64_t. Now I want to convert these first 8 bytes to a single uint64_t.
A workaround I've found is:
// recv_buffer is the vector containing the received Bytes
std::vector<uint64_t> frame_number(recv_buffer.begin(), recv_buffer.begin() + sizeof(uint64_t));
uint64_t frame_num = frame_number.at(0);
Is there a way to extract the data without creating a new vector?
This is an effective method:
C/C++:
uint64_t hexToUint64(char *data, int32_t offset) {
    uint64_t num = 0;
    for (int32_t i = offset; i < offset + 8; i++) {
        num = (num << 8) + (data[i] & 0xFF);
    }
    return num;
}
Java:
long hexToUint64(byte[] data, int offset) {
    return
        ((long)data[offset++] << 56 & 0xFF00000000000000L) |
        ((long)data[offset++] << 48 & 0xFF000000000000L) |
        ((long)data[offset++] << 40 & 0xFF0000000000L) |
        ((long)data[offset++] << 32 & 0xFF00000000L) |
        ((long)data[offset++] << 24 & 0xFF000000L) |
        ((long)data[offset++] << 16 & 0xFF0000L) |
        ((long)data[offset++] << 8 & 0xFF00L) |
        ((long)data[offset++] & 0xFFL);
}
JavaScript:
function hexToUint64(data, offset) {
    let num = 0;
    let multiple = 0x100000000000000;
    for (let i = offset; i < offset + 8; i++, multiple /= 0x100) {
        num += (data[i] & 0xFF) * multiple;
    }
    return num;
}
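For example, a small C++ test of the first version above, with a hypothetical 8-byte buffer holding a big-endian value:
#include <cstdint>
#include <iostream>
#include <vector>

uint64_t hexToUint64(char *data, int32_t offset) {   // same function as above
    uint64_t num = 0;
    for (int32_t i = offset; i < offset + 8; i++) {
        num = (num << 8) + (data[i] & 0xFF);
    }
    return num;
}

int main()
{
    // hypothetical received bytes: big-endian encoding of 535
    std::vector<char> recv_buffer = { 0, 0, 0, 0, 0, 0, 0x02, 0x17 };
    std::cout << hexToUint64(recv_buffer.data(), 0) << '\n';   // prints 535
}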
One normally uses memcpy or similar to a properly aligned structure, and then ntohl to convert a number from network byte order to computer byte order. ntohl is not part of the C++ specification, but exists in Linux and Windows and others regardless.
uint64_t frame_num;
std::copy(recv_buffer.begin(), recv_buffer.begin() + sizeof(uint64_t),
          reinterpret_cast<char*>(&frame_num));
// or: memcpy(&frame_num, recv_buffer.data(), sizeof(frame_num));
frame_num = be64toh(frame_num); // 64-bit counterpart of ntohl (Linux <endian.h>; ntohll on Windows)
It is tempting to do this for a struct that represents an entire network header, but since C++ compilers can inject padding bytes into structs, and it's undefined to write to the padding, it's better to do this one primitive at a time.
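As a sketch of that one-primitive-at-a-time idea, here is a made-up two-field header (a 16-bit type followed by a 32-bit length; the field names are inventions for illustration):
#include <cstdint>
#include <cstring>
#include <vector>
#include <arpa/inet.h>   // ntohs/ntohl on POSIX; on Windows use <winsock2.h>

struct Header {
    uint16_t type;    // hypothetical field
    uint32_t length;  // hypothetical field
};

Header parse_header(const std::vector<unsigned char>& buf)
{
    Header h{};
    uint16_t t;
    uint32_t l;
    std::memcpy(&t, buf.data(), sizeof t);             // copy each primitive separately,
    std::memcpy(&l, buf.data() + sizeof t, sizeof l);  // so struct padding never matters
    h.type = ntohs(t);
    h.length = ntohl(l);
    return h;
}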
You could perform the conversion byte by byte like this:
#include <iostream>
#include <cstdint>

int main()
{
    unsigned char bytesArray[8];
    bytesArray[0] = 0x05;
    bytesArray[1] = 0x00;
    bytesArray[2] = 0x00;
    bytesArray[3] = 0x00;
    bytesArray[4] = 0x00;
    bytesArray[5] = 0x00;
    bytesArray[6] = 0x00;
    bytesArray[7] = 0x00;

    uint64_t intVal = 0;
    intVal = (intVal << 8) + bytesArray[7];
    intVal = (intVal << 8) + bytesArray[6];
    intVal = (intVal << 8) + bytesArray[5];
    intVal = (intVal << 8) + bytesArray[4];
    intVal = (intVal << 8) + bytesArray[3];
    intVal = (intVal << 8) + bytesArray[2];
    intVal = (intVal << 8) + bytesArray[1];
    intVal = (intVal << 8) + bytesArray[0];

    std::cout << intVal;
    return 0;
}
I suggest doing the following:
uint64_t frame_num = *((uint64_t*)recv_buffer.data());
You should of course first verify that the amount of data you have in recv_buffer is at least sizeof(frame_num) bytes.
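For example, a sketch with that check made explicit (same cast as above, so the usual alignment and aliasing caveats still apply):
#include <cstdint>
#include <vector>

uint64_t read_frame_num(const std::vector<char>& recv_buffer)
{
    uint64_t frame_num = 0;
    if (recv_buffer.size() >= sizeof(frame_num))            // enough bytes received?
        frame_num = *((const uint64_t*)recv_buffer.data());
    return frame_num;
}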
I am trying to unpack mp3 frames using bitfields.
The header of mp3 frames starts with the syncword 0xFFF followed by 20 bits of header data. The structure of the header is represented as follows:
struct Mp3FrameRaw {
    unsigned short fff : 12;  // Should always be 0xFFF = 4095
    unsigned short mpeg_standard : 1;
    unsigned short layer : 2;
    unsigned short error_protection : 1;
    unsigned short bitrate : 4;
    unsigned short frequency : 2;
    unsigned short pad_bit : 1;
    unsigned short : 1;
    unsigned short mode : 2;
    unsigned short mode_extension : 2;
    unsigned short copyrighted : 1;
    unsigned short original : 1;
    unsigned short emphasis : 2;
};
In total the header is 32 bit long.
My program first finds the syncword:
size_t find_sync_word(std::vector<unsigned char> & input) {
    bool previous_was_ff = false;
    for (size_t offset = 0; offset < input.size(); ++offset) {
        if (previous_was_ff && (input[offset] & 0xF0 == 0xF0))
            return offset - 1;
        previous_was_ff = 0xFF == input[offset];
    }
    return -1;
}
And then tries to unpack the first header:
int parse(std::vector<unsigned char> & input) {
    size_t offset = find_sync_word(input);
    if (offset < 0) {
        std::cerr << "Not a valid Mp3 file" << std::endl;
        return -1;
    }
    Mp3FrameRaw *frame_ptr = reinterpret_cast<Mp3FrameRaw *>(input.data() + offset);
    std::cout << frame_ptr->fff << " (Should always be 4095)" << std::endl;
    std::cout << frame_ptr->layer << " (Should be 1)" << std::endl;
    std::cout << frame_ptr->bitrate << " (Should be 1-14)" << std::endl;
    return 0;
}
The main.cpp reads:
int main() {
    std::ifstream mp3_file("/path/to/file.mp3", std::ios::binary);
    std::vector<unsigned char> file_contents((std::istreambuf_iterator<char>(mp3_file)),
                                             std::istreambuf_iterator<char>());
    return parse(file_contents);
}
The result reads:
3071 (Should always be 4095)
3 (Should be 1 )
0 (Should be 1 - 14)
In contrast, if I unpack the fields manually bit by bit, everything works as expected, e.g.:
{
    size_t offset;
    Mp3FrameRaw frame;
    ...
    frame.fff = input[offset++];
    frame.fff = (frame.fff << 4) | (input[offset] >> 4);
    frame.mpeg_standard = (input[offset] >> 3) & 1;
    frame.layer = (input[offset] >> 1) & 0x3;
    frame.error_protection = (input[offset++]) & 0x1;
    frame.bitrate = input[offset] >> 4;
    ...
}
I assume the bit fields are not laid out the way I intuitively expect them to be. What am I doing wrong?
I am using gcc on Ubuntu 18.04.
In my project I'm using a huge set of short 7-bit ASCII strings and have to process (store, compare, search, etc.) them with maximum performance.
Basically, I build an index array of uint64_t, where each element stores up to 9 characters of a word, and I use that numeric value for any string comparison operation.
The current implementation is fast, but maybe it can be improved a bit.
This function converts up to 9 initial characters to a uint64_t value; comparing those numbers is equivalent to the standard strcmp function.
#include <cstdint>
#include <iostream>

uint64_t cnv(const char* str, size_t len)
{
    uint64_t res = 0;
    switch (len)    // each case deliberately falls through to the next
    {
    default:
    case 9: res = str[8];
    case 8: res |= uint64_t(str[7]) << 7;
    case 7: res |= uint64_t(str[6]) << 14;
    case 6: res |= uint64_t(str[5]) << 21;
    case 5: res |= uint64_t(str[4]) << 28;
    case 4: res |= uint64_t(str[3]) << 35;
    case 3: res |= uint64_t(str[2]) << 42;
    case 2: res |= uint64_t(str[1]) << 49;
    case 1: res |= uint64_t(str[0]) << 56;
    case 0: break;
    }
    return res;
}

int main()
{
    uint64_t v0 = cnv("000", 3);
    uint64_t v1 = cnv("0000000", 7);
    std::cout << (v1 < v0);
}
You may load 8 bytes of the original string at once and then condense them into the resulting integer (reversing them if your machine uses a little-endian representation).
#include <cstdint>
#include <iostream>
#include <string>

uint64_t ascii2ulong(const char *s, int len)
{
    uint64_t i = (*(uint64_t*)s);
    if (len < 8) i &= ((1UL << (len<<3))-1);
#ifndef BIG_ENDIAN
    i = (i&0x007f007f007f007fUL) | ((i & 0x7f007f007f007f00) >> 1);
    i = (i&0x00003fff00003fffUL) | ((i & 0x3fff00003fff0000) >> 2);
    i = ((i&0x000000000fffffffUL) << 7) | ((i & 0x0fffffff00000000) << (7-4));
    // Note: Previous line: an additional left shift of 7 is applied
    // to make room for the s[8] character
#else
    i = ((i&0x007f007f007f007fUL) << 7) | ((i & 0x7f007f007f007f00) >> 8);
    i = ((i&0x00003fff00003fffUL) << 14) | ((i & 0x3fff00003fff0000) >> 16);
    i = ((i&0x000000000fffffffUL) << (28+7)) | ((i & 0x0fffffff00000000) >> (32-7));
#endif
    if (len > 8) i |= ((uint64_t)s[8]);
    return i;
}

// Test
std::string ulong2str(uint64_t compressed) {
    std::string s;
    for (int i = 56; i >= 0; i -= 7)
        if (char nxt = (compressed>>i)&0x7f) s += nxt;
    return s;
}

int main() {
    std::cout << ulong2str(ascii2ulong("ABCDEFGHI", 9)) << std::endl;
    std::cout << ulong2str(ascii2ulong("ABCDE", 5)) << std::endl;
    std::cout << (ascii2ulong("AB", 2) < ascii2ulong("B", 1)) << std::endl;
    std::cout << (ascii2ulong("AB", 2) < ascii2ulong("A", 1)) << std::endl;
    return 0;
}
But note: doing it this way you formally read outside the allocated address range (if your original string has fewer than 8 bytes allocated). If you run the program with memory checking, it may produce a runtime error. To avoid this you may of course use memcpy to copy only as many bytes as you need, in place of uint64_t i = (*(uint64_t*)s);:
uint64_t i;
memcpy(&i,s,std::min(len,8));
If memcpy is hardware accelerated on your machine (which is likely), this may not be bad in terms of efficiency.
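A minimal sketch of that memcpy-based load, with the headers it needs (len capped at 8, as in the snippet above):
#include <cstdint>
#include <cstring>
#include <algorithm>

uint64_t load_first_bytes(const char *s, int len)
{
    uint64_t i = 0;
    std::memcpy(&i, s, std::min(len, 8));  // copy only the bytes that really exist
    return i;                              // condense/reorder afterwards as shown above
}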
I am trying to write some processor independent code to write some files in big endian. I have a sample of code below and I can't understand why it doesn't work. All it is supposed to do is let byte store each byte of data one by one in big endian order. In my actual program I would then write the individual byte out to a file, so I get the same byte order in the file regardless of processor architecture.
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    long bitmask = (0xFF << (sizeof(long) - 1) * 8);
    char byte = 0;

    for (long i = 0; i < sizeof(long); i++) {
        byte = data & bitmask;
        data <<= 8;
    }
    return 0;
}
For some reason byte always has the value of 0. This confuses me, I am looking at the debugger and see this:
data = 00010010001101000101011001111000
bitmask = 11111111000000000000000000000000
I would think that data & bitmask would give 00010010, but it just makes byte 00000000 every time! How can this be? I have written some code for the little endian order and this works great, see below:
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    long bitmask = 0xFF;
    char byte = 0;

    for (long i = 0; i < sizeof(long); i++) {
        byte = data & bitmask;
        data >>= 8;
    }
    return 0;
}
Why does the little endian one work and the big endian not? Thanks for any help :-)
You should use the standard functions ntohl() and kin for this. They operate on explicitly sized types (i.e. uint16_t and uint32_t) rather than the compiler-specific long, which is necessary for portability.
Some platforms provide 64-bit versions in <endian.h>
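For example, a small sketch of the 32-bit case using htonl (the 64-bit names vary by platform, e.g. htobe64 on Linux or htonll on Windows):
#include <cstdint>
#include <cstring>
#include <fstream>
#include <arpa/inet.h>   // htonl on POSIX; on Windows include <winsock2.h>

void write_be32(std::ofstream& out, uint32_t value)
{
    uint32_t be = htonl(value);           // host order -> big-endian (network) order
    char bytes[sizeof be];
    std::memcpy(bytes, &be, sizeof be);   // copy the bytes without aliasing tricks
    out.write(bytes, sizeof bytes);
}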
In your example, data is 0x12345678.
Your first assignment to byte is therefore:
byte = 0x12000000;
which won't fit in a byte, so it gets truncated to zero.
try:
byte = (data & bitmask) >> ((sizeof(long) - 1) * 8);
You're getting the shifting all wrong.
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    int shift = (sizeof(long) - 1) * 8;
    const unsigned long mask = 0xff;
    char byte = 0;

    for (long i = 0; i < sizeof(long); i++, shift -= 8) {
        byte = (data & (mask << shift)) >> shift;
    }
    return 0;
}
Now, I wouldn't recommend you do things this way. I would recommend instead writing some nice conversion functions. Many compilers have these as builtins. So you can write your functions to do it the hard way, then switch them to just forward to the compiler builtin when you figure out what it is.
#include <tr1/cstdint> // To get uint16_t, uint32_t and so on.

inline void to_bigendian(uint16_t val, char bytes[2])
{
    bytes[0] = (val >> 8) & 0xffu;
    bytes[1] = val & 0xffu;
}

inline void to_bigendian(uint32_t val, char bytes[4])
{
    bytes[0] = (val >> 24) & 0xffu;
    bytes[1] = (val >> 16) & 0xffu;
    bytes[2] = (val >> 8) & 0xffu;
    bytes[3] = val & 0xffu;
}
This code is simpler and easier to understand than your loop. It's also faster. And lastly, it is recognized by some compilers and automatically turned into the single byte swap operation that would be required on most CPUs.
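For example, a hypothetical use of the 32-bit overload when writing to a file:
#include <cstdint>
#include <fstream>

// assumes the to_bigendian overloads above are in scope
void write_length(std::ofstream& out, uint32_t blockSize)
{
    char bytes[4];
    to_bigendian(blockSize, bytes);   // most significant byte lands in bytes[0]
    out.write(bytes, sizeof bytes);
}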
Because you are masking off the top byte of the integer and then not shifting it back down 24 bits ...
Change your loop to:
for (long i = 0; i < sizeof(long); i++) {
    byte = (data & bitmask) >> 24;
    data <<= 8;
}