I need to read binary data that contain a column of numbers (time tags), using 8 bytes to record each number. I know that they are recorded in little-endian order. If read correctly they should be decoded as (example)
...
2147426467
2147426635
2147512936
...
I recognize that the above numbers are around the 2^31 - 1 threshold.
I try to read the data and invert the endianness with:
(length is the total number of bytes and buffer is a pointer to an array that contains the bytes)
unsigned long int tag;
//uint64_t tag;
for (int j=0; j<length; j=j+8)      // read the whole file in 8-byte blocks
{
    tag = 0;
    for (int i=0; i<=7; i++)        // read each block, byte by byte
    {
        tag ^= ((unsigned char)buffer[j+i])<<8*i;   // shift each byte to invert endianness and add them with ^=
    }
}
when run, the code gives:
...
2147426467
2147426635
18446744071562097256
similar big numbers
...
The last number is not (2^64 - 1 - correct value).
Same result using uint64_t tag.
The code succeeds with declaring tag as
unsigned int tag;
but fails for tags greater than 2^32 -1. At least this makes sense.
I suppose I need some kind of cast on buffer[j+i] but I don't know how to do it.
(static_cast<uint64_t>(buffer[j+i]))
also doesn't work.
I read a similar question but still need some help.
We assume that buffer[j+i] is a char, and that chars are signed on your platform. Casting to unsigned char converts buffer[j+i] into an unsigned type. However, when applying the << operator, the unsigned char value gets promoted to int, as long as an int can hold all values representable by unsigned char. The shift is therefore performed on a 32-bit int, so for i >= 4 the shift count (8*i) is at least the width of int, which is undefined behavior, and the high bytes are lost in any case.
Your attempt to cast buffer[j+i] directly to uint64_t fails because, if char is signed, a negative byte value is sign-extended as part of the conversion to the unsigned 64-bit type, which sets all of the high bits.
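A tiny sketch of that difference (assuming char is signed and 8 bits wide on this platform):
#include <cstdint>

char c = '\xF0';                                  // -16 where char is signed
uint64_t direct = static_cast<uint64_t>(c);       // sign-extended first: 0xFFFFFFFFFFFFFFF0
uint64_t viaUC  = static_cast<unsigned char>(c);  // zero-extended: 0x00000000000000F0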
A double cast may work (that is, cast to unsigned char and then to unsigned long), but using an unsigned long variable to hold the intermediate value should make the intention of the code more clear. For me, the code would look like:
decltype(tag) val = static_cast<unsigned char>(buffer[j+i]);
tag ^= val << 8*i;
You use a temporary value. Because of integer promotion, the (unsigned char) operand is promoted to int before the shift, so that temporary is only 32 bits wide. Once you shift the byte past the top of those 32 bits it is shifted into oblivion.
In order to fix this you need to explicitly store the value in a 64 bit integer first.
So instead of
{tag ^= ((unsigned char)buffer[j+i])<<8*i ;}
you should use something like this
{
unsigned long long tmp = (unsigned char)buffer[j+i];
tmp <<= 8*i;
tag ^= tmp;
}
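For reference, a minimal sketch of the whole decoding loop with that fix applied (assuming, as in the question, that length is a multiple of 8, buffer points to the raw bytes, and uint64_t comes from <cstdint>; | and ^ behave the same here because the shifted bytes never overlap):
uint64_t tag;
for (int j = 0; j < length; j += 8)               // one 8-byte little-endian value per block
{
    tag = 0;
    for (int i = 0; i < 8; i++)
    {
        uint64_t byteVal = (unsigned char)buffer[j + i];  // widen *before* shifting
        tag |= byteVal << (8 * i);                // byte i lands in bits 8*i .. 8*i+7
    }
    // use tag here (store it, print it, ...)
}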
Related
I know that an unsigned character's size is 8 bits, and the size of an integer is 32 bits.
But I want to know: if I perform an operation between two integers below 255, is it safe to say it is as fast as performing the same operation on two unsigned characters with the same values?
Example:
int Int2 = 0x10;
int Int1 = 0xff;
unsigned char Char0 = 0x10;
unsigned char Char1 = 0xff;
Int1 + Int2 ; // Is calculating this
Char0 + Char1; // Faster than this??
Update:
Let's put this in the context as someone suggested
for (unsigned char c=0;c!=256;c++){ // does this loop
std::cout<<c; // dont mind this line it can be any statement
}
for (int i=0;i!=256;i++){ // perform faster than this one??
std::cout<<i;// this too
}
I know that an unsigned character's size is 8 bits
This is not necessarily always the case in C++. But it may be true in a particular implementation of C++.
and the size of an integer is 32 bits.
There are several integer types in C++. In fact, character types are integer types as well.
Int1 + Int2 ; // Is calculating this
Char0 + Char1; // Faster than this??
Integers of lower rank than int are promoted to int (or unsigned int in rare cases) when used as operand of most binary operators. Both operators in the example operate on int after the promotion. You don't use the result at all, so there's no need for the compiler to produce any code, so they should be equally fast in this trivial example.
Whether one piece of code is faster than the other depends on many factors. It's not possible to accurately guess which way it would go without context.
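A quick way to see the promotion itself, rather than its speed implications (a small sketch; the static_assert only checks the type of the expression):
#include <type_traits>

unsigned char Char0 = 0x10;
unsigned char Char1 = 0xff;

// Both operands are promoted to int before the addition, so the sum has type int.
static_assert(std::is_same<decltype(Char0 + Char1), int>::value,
              "unsigned char operands are promoted to int");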
I originally had 2 WORD (that's 4 bytes). I have stored them in an unsigned int. How can I split this such that I have 2 (left-most) bytes in one unsigned short variable and the other 2 bytes in another unsigned short variable?
I hope my question is clear, otherwise please tell me and I will add more details! :)
Example: I have this hexadecimal stored in unsigned int: 4f07aabb
How can I turn this into two unsigned shorts so one of them holds 4f07 and the other holds aabb?
If you are sure that unsigned int has at least 4 bytes on your target system (this is not guaranteed!), you can do:
unsigned short one = static_cast<unsigned short>(original >> (2 * 8));
unsigned short two = static_cast<unsigned short>(original % (1 << (2 * 8)));
This is only guaranteed to work if the original value indeed only contains a 4-byte value (possibly with padding zeroes in front). If you're not fond of bitshifting, you could also do
#include <cstdint>   // uint32_t, uint16_t
#include <cstring>   // std::memcpy

uint32_t original = 0x4f07aabb; // guarantee 32 bits
uint16_t parts[2];
std::memcpy(&parts[0], &original, sizeof(uint32_t));
unsigned short one = static_cast<unsigned short>(parts[0]);
unsigned short two = static_cast<unsigned short>(parts[1]);
This will yield the two values depending on the target system's endianness; on a little-endian architecture, the two halves come out swapped relative to the bitshifting approach. You can check endianness with the upcoming C++20's std::endian::native.
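If C++20 is available, a small sketch of an endianness-aware split (the function name split is just for illustration):
#include <bit>       // std::endian (C++20)
#include <cstdint>
#include <cstring>
#include <utility>

// Returns {high half, low half} of a 32-bit value regardless of the native byte order.
std::pair<uint16_t, uint16_t> split(uint32_t original)
{
    uint16_t parts[2];
    std::memcpy(parts, &original, sizeof original);
    if (std::endian::native == std::endian::little)
        return { parts[1], parts[0] };   // the high half lands in parts[1] on little-endian
    return { parts[0], parts[1] };
}

// split(0x4f07aabb) yields {0x4f07, 0xaabb} on either byte order.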
I am dealing with a very large list of booleans in C++, around 2^N items of N booleans each. Because memory is critical in such a situation, i.e. with exponential growth, I would like to build an N-bit-long variable to store each element.
For small N, for example 24, I am just using unsigned long int. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with a built-in variable is unsigned long long int, but it takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much.
With a 36-bits variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.
I tried std::bitset, but std::bitset< N > creates an element of at least 8B.
So a list of std::bitset< 1 > is much greater than a list of unsigned long int.
It is because the std::bitset just change the representation, not the container.
I also tried boost::dynamic_bitset<> from Boost, but the result is even worse (at least 32B!), for the same reason.
I know an option is to write all elements as one chain of booleans, 2473901162496 bits (2^36*36), and then to store them in 38654705664 (2473901162496/64) unsigned long long int, which gives 288GB (38654705664*64/8/1024/1024/1024). Accessing an element is then just a matter of finding which words the 36 bits are stored in (it can be either one or two). But it means a lot of rewriting of the existing code (3000 lines), because straightforward mapping becomes impossible, and adding and deleting items during execution in some functions will surely be complicated, confusing and challenging, with a result that is most likely not efficient.
How to build a N-bits variable in C++?
How about a struct with 5 chars (and perhaps some fancy operator overloading as needed to keep it compatible with the existing code)? A struct with a long and a char probably won't work because of padding / alignment...
Basically your own mini BitSet optimized for size:
struct Bitset40 {
    unsigned char data[5];
    bool getBit(int index) const {
        return (data[index / 8] & (1 << (index % 8))) != 0;
    }
    void setBit(int index, bool newVal) {
        if (newVal) {
            data[index / 8] |= (1 << (index % 8));
        } else {
            data[index / 8] &= ~(1 << (index % 8));
        }
    }
};
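A quick usage sketch (the static_assert merely checks that the struct really occupies 5 bytes and no padding sneaks in):
#include <cassert>

static_assert(sizeof(Bitset40) == 5, "expected no padding");

void example()
{
    Bitset40 b{};             // zero-initialized
    b.setBit(35, true);
    assert(b.getBit(35));     // bit 35 is now set
    b.setBit(35, false);
    assert(!b.getBit(35));
}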
Edit: As geza has also pointed out in the comments, the "trick" here is to get as close as possible to the minimum number of bytes needed (without wasting memory by triggering alignment losses, padding or pointer indirection, see http://www.catb.org/esr/structure-packing/).
Edit 2: If you feel adventurous, you could also try a bit field (and please let us know how much space it actually consumes):
struct Bitset36 {
    unsigned long long data : 36;
};
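A quick way to answer the space question yourself (on common x86-64 compilers this typically prints 8, because the bit field is padded up to its underlying unsigned long long, so it usually saves nothing over a plain 64-bit integer):
#include <cstdio>

int main()
{
    std::printf("sizeof(Bitset36) = %zu\n", sizeof(Bitset36));   // typically 8 on x86-64
}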
I'm not an expert, but this is what I would "try". Find the size in bytes of the smallest type your compiler supports (it should be char). You can check with sizeof and you should get 1. That means 1 byte, so 8 bits.
So if you wanted a 24-bit type you would need 3 chars. For 36 you would need a 5-char array, and you would have 4 bits of wasted padding at the end. This can easily be accounted for.
i.e.
char typeSize[3] = {0}; // should hold 24 bits
Now make a bit mask to access each position of typeSize.
const unsigned char one = 0b0000'0001;
const unsigned char two = 0b0000'0010;
const unsigned char three = 0b0000'0100;
const unsigned char four = 0b0000'1000;
const unsigned char five = 0b0001'0000;
const unsigned char six = 0b0010'0000;
const unsigned char seven = 0b0100'0000;
const unsigned char eight = 0b1000'0000;
Now you can use the bit-wise or to set the values to 1 where needed..
typeSize[1] |= four;
typeSize[0] |= (four | five);
To turn off bits use the & operator..
typeSize[0] &= ~four;
typeSize[2] &= ~(four | five);
You can read the position of each bit with the & operator.
typeSize[0] & four
Bear in mind, I don't have a compiler handy to try this out so hopefully this is a useful approach to your problem.
Good luck ;-)
You can use an array of unsigned long int and store and retrieve the needed bit chains with bitwise operations. This approach avoids any space overhead.
Simplified example for unsigned byte array B[] and 12-bit variables V (represented as ushort):
Set V[0]:
B[0] = V & 0xFF; //low byte
B[1] = B[1] & 0xF0; // clear low nibble
B[1] = B[1] | (V >> 8); //fill low nibble of the second byte with the highest nibble of V
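Extending the same idea to the 36-bit case, here is a rough sketch of such a packed array (PackedArray36 is just an illustrative name; each element straddles at most two 64-bit words, and for 2^36 elements the vector works out to the 288GB mentioned in the question):
#include <cstddef>
#include <cstdint>
#include <vector>

class PackedArray36 {
    static constexpr unsigned BITS = 36;
    static constexpr uint64_t MASK = (uint64_t(1) << BITS) - 1;
    std::vector<uint64_t> words;
public:
    explicit PackedArray36(std::size_t count)
        : words((count * BITS + 63) / 64, 0) {}

    uint64_t get(std::size_t i) const {
        std::size_t bit = i * BITS, w = bit / 64, off = bit % 64;
        uint64_t value = words[w] >> off;
        if (off + BITS > 64)                         // element continues in the next word
            value |= words[w + 1] << (64 - off);
        return value & MASK;
    }

    void set(std::size_t i, uint64_t v) {
        v &= MASK;
        std::size_t bit = i * BITS, w = bit / 64, off = bit % 64;
        words[w] = (words[w] & ~(MASK << off)) | (v << off);
        if (off + BITS > 64) {                       // spill into the next word
            unsigned spill = unsigned(off + BITS - 64);        // bits living in words[w + 1]
            uint64_t spillMask = (uint64_t(1) << spill) - 1;
            words[w + 1] = (words[w + 1] & ~spillMask) | (v >> (64 - off));
        }
    }
};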
on my Arduino, the following code produces output I don't understand:
void setup(){
Serial.begin(9600);
int a = 250;
Serial.println(a, BIN);
a = a << 8;
Serial.println(a, BIN);
a = a >> 8;
Serial.println(a, BIN);
}
void loop(){}
The output is:
11111010
11111111111111111111101000000000
11111111111111111111111111111010
I do understand the first line: leading zeros are not printed to the serial terminal. However, after shifting the bits the data type of a seems to have changed from int to long (32 bits are printed). The expected behaviour is that bits are shifted to the left, and that bits which are shifted "out" of the 16 bits an int has are simply dropped. Shifting the bits back does not turn the "32bit" variable to "16bit" again.
Shifting by 7 or fewer positions does not show this effect.
I probably should say that I am not using the Arduino IDE, but the Makefile from https://github.com/sudar/Arduino-Makefile.
What is going on? I almost expect this to be "normal", but I don't get it. Or is it something in the printing routine which simply adds 16 "1"'s to the output?
Enno
In addition to other answers, integers might be stored in 16 bits or 32 bits depending on which Arduino you have.
The function printing numbers in Arduino is defined in /arduino-1.0.5/hardware/arduino/cores/arduino/Print.cpp
size_t Print::printNumber(unsigned long n, uint8_t base) {
    char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars plus zero byte.
    char *str = &buf[sizeof(buf) - 1];
    *str = '\0';
    // prevent crash if called with base == 1
    if (base < 2) base = 10;
    do {
        unsigned long m = n;
        n /= base;
        char c = m - base * n;
        *--str = c < 10 ? c + '0' : c + 'A' - 10;
    } while(n);
    return write(str);
}
All other functions rely on this one, so yes, your int gets promoted to an unsigned long when you print it, not when you shift it.
However, the library is correct. By shifting left 8 positions, the sign bit of the 16-bit integer becomes '1', so when the integer value is converted to unsigned long the runtime correctly pads it with 16 extra '1's instead of '0's.
If you are using such a value not as a number but to contain some flags, use unsigned int instead of int.
ETA: for completeness, I'll add further explanation for the second shifting operation.
Once you touch the 'negative bit' inside the int number, when you shift towards the right the runtime pads the number with '1's in order to preserve its negative value (an arithmetic shift). Shifting to the right by k positions corresponds to dividing the number by 2^k, and since the number is negative to start with, the result must remain negative.
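To reproduce the effect off the board, a small sketch using fixed-width types to mimic the AVR's 16-bit int (the bit patterns assume two's complement; right-shifting a negative value is implementation-defined but arithmetic on common compilers):
#include <cstdint>
#include <cstdio>

int main()
{
    int16_t a = 250;                             // 0000 0000 1111 1010
    a <<= 8;                                     // 1111 1010 0000 0000 -> -1536, sign bit now set
    uint32_t widened = (uint32_t)(int32_t)a;     // sign-extended: 0xFFFFFA00, i.e. 16 extra '1's
    std::printf("%d 0x%08X\n", a, (unsigned)widened);   // prints: -1536 0xFFFFFA00
    a >>= 8;                                     // arithmetic shift: 1111 1111 1111 1010 -> -6
    std::printf("%d\n", a);                      // same value as the third Arduino output line (shown there in binary)
}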
I'm programming with a PLC and I'm reading values out of it.
It gives me the data in unsigned char. That's fine, but the values in my PLC can be over 255. And since unsigned chars can't give a value over 255 I get the wrong information.
The structure I get from the library:
struct PlcVarValue
{
unsigned long ulTimeStamp ALIGNATTRIB;
unsigned char bQuality ALIGNATTRIB;
unsigned char byData[1] ALIGNATTRIB;
};
ulTimeStamp gives the time
bQuality gives true/false (be able to read it or not)
byData[1] gives the data.
Anyways I'm trying this now: (where ppValues is an object of PlcVarValue)
unsigned char* variableValue = ppValues[0]->byData;
int iVariableValue = *variableValue;
This works fine... until ppValues[0]->byData is > 255;
When I try the following when the number is for example 257:
unsigned char testValue = ppValues[0]->byData[0];
unsigned char testValue2 = ppValues[0]->byData[1];
the output is testValue = 1 and testValue2 = 1
that doesn't make sense to me.
So my question is, how can I get this solved so it gives me the correct number?
That actually looks like a variable-sized structure, where an array of size 1 at the end is a common way to declare it. See e.g. this tutorial about it.
In this case, both bytes being 1 for the value 257 is correct. Think of the two bytes as a 16-bit value and combine the bits. One byte becomes the high byte, where 1 corresponds to 256; then add the low byte, which is 1, and you have 256 + 1, which of course is equal to 257. Simple binary arithmetic.
Which byte is the high one and which is the low one we can't say, but it's easy to check if you can force a message that contains the value 258 instead, as then one byte will still be 1 but the other will be 2.
How to combine it into a single unsigned 16-bit value is also easy if you know the bitwise shift and or operators:
uint8_t high_byte = ...
uint8_t low_byte = ...
uint16_t word = high_byte << 8 | low_byte;
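Applied to the structure from the question it might look like this. This is only a sketch: whether byData[0] or byData[1] holds the high byte depends on how the PLC library fills the buffer, so verify it with the 258 test above and swap the indices if needed.
// Assumption: byData[1] is the high byte and byData[0] the low byte (uint16_t from <cstdint>).
uint16_t value = static_cast<uint16_t>(ppValues[0]->byData[1]) << 8
               | ppValues[0]->byData[0];
int iVariableValue = value;   // 257 for the example in the question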