Below is a constexpr CRC32 computation over a string literal.
I had to reinterpret the string literal characters from char to unsigned char. Because reinterpret_cast is not available in a constexpr function, my workaround is a small utility function that computes the two's complement manually, but I'm a little disappointed with it.
Is there a more elegant solution for that kind of manipulation?
#include <iostream>
#include <cstdint>

class Crc32Gen {
    uint32_t m_[256] {};

    static constexpr unsigned char reinterpret_cast_schar_to_uchar( char v ) {
        return v >= 0 ? v : ~(v - 1);
    }

public:
    // algorithm from http://create.stephan-brumme.com/crc32/#sarwate
    constexpr Crc32Gen() {
        constexpr uint32_t polynomial = 0xEDB88320;
        for (unsigned int i = 0; i <= 0xFF; i++) {
            uint32_t crc = i;
            for (unsigned int j = 0; j < 8; j++)
                crc = (crc >> 1) ^ (-int(crc & 1) & polynomial);
            m_[i] = crc;
        }
    }

    constexpr uint32_t operator()( const char* data ) const {
        uint32_t crc = ~0;
        while (auto c = reinterpret_cast_schar_to_uchar(*data++))
            crc = (crc >> 8) ^ m_[(crc & 0xFF) ^ c];
        return ~crc;
    }
};

constexpr Crc32Gen const crc32Gen_;

int main() {
    constexpr auto const val = crc32Gen_( "The character code for É is greater than 127" );
    std::cout << std::hex << val << std::endl;
}
Edit: in that case, static_cast<unsigned char>(*data++) is enough.
Two's complement is not guaranteed by the standard; in clause 3.9.1:
7 - [...] The representations of integral types shall define values by use of a pure binary numeration system. [Example: this International Standard permits 2's complement, 1's complement and signed magnitude representations for integral types. — end example]
So any code that assumes two's complement is going to have to perform the appropriate manipulations manually.
That said, your conversion function is unnecessary (and possibly incorrect); for signed-to-unsigned conversions you can just use the standard integral conversion (4.7):
2 - If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note]
Corrected code, using static_cast:
constexpr uint32_t operator()( const char* data ) const {
    uint32_t crc = ~0;
    while (auto c = static_cast<unsigned char>(*data++))
        crc = (crc >> 8) ^ m_[(crc & 0xFF) ^ c];
    return ~crc;
}
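As a quick compile-time sanity check (not part of the original post), the corrected operator() can be exercised against the well-known CRC-32 check value for "123456789", using the crc32Gen_ object from the question:

// Verifies the whole table generation and lookup in a constant expression.
static_assert(crc32Gen_("123456789") == 0xCBF43926, "CRC-32 check value of \"123456789\"");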
Related
I'm digging into the subtleties of CRCs. If I understand correctly, every CRC polynomial is provided in at least two representations, the normal one and the reversed one.
The normal one targets implementations where the content is processed from most significant bit to least significant bit and shifted to the left (as, for example, on this Wikipedia page).
The reversed one aims to handle LSb-to-MSb interfaces. If you process LSb to MSb with the reversed polynomial and shift to the right, you get the same CRC value (also encoded LSb to MSb). This is described, for example, here. This is convenient for LSb-to-MSb communication interfaces.
What I don't understand is what happens when you switch to software implementations. Why are there two variants of a software, i.e. byte-wise, implementation? (One for MSb-to-LSb, and one for the opposite bit order.)
You do not get the "same CRC value" (reflected or not) with the reflected calculation. It is an entirely different value, because the bits of the message are processed in the opposite order.
"when you switch": You simply use the CRC definition, reflected or not, that matches what the application is expecting. Whether the CRC is reflected is one of several parameters that define the CRC, along with the number of the bits in the CRC, the polynomial, the initial value, and the final exclusive or value. You can find the definition of over a hundred different CRCs here.
"why are there two": The forward implementation exists because that corresponds most closely to the mathematics, with the least significant term of the polynomial in the least significant bit of the binary representation of the polynomial. The reflected implementation exists because it was realized that it could be implemented in software a little more simply, with fewer instructions, but still have the same error-detection performance.
Here is an example for two common 32-bit CRCs with the same polynomial. Forward, CRC-32/BZIP2 bit-wise implementation:
uint32_t crc32bzip2_bit(uint32_t crc, void const *mem, size_t len) {
    unsigned char const *data = mem;
    if (data == NULL)
        return 0;
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (unsigned k = 0; k < 8; k++) {
            crc = crc & 0x80000000 ? (crc << 1) ^ 0x4c11db7 : crc << 1;
        }
    }
    crc = ~crc;
    return crc;
}
Reflected CRC-32/ZIP bit-wise:
uint32_t crc32iso_hdlc_bit(uint32_t crc, void const *mem, size_t len) {
    unsigned char const *data = mem;
    if (data == NULL)
        return 0;
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (unsigned k = 0; k < 8; k++) {
            crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
        }
    }
    crc = ~crc;
    return crc;
}
The main savings is one instruction, the shift up of the data byte, that you can get rid of with the reflected implementation. Also the constant that you & with (1 vs. 0x80000000) is smaller, which may also save an instruction or a register, or perhaps just result in a shorter instruction, depending on the size of immediate values supported in the instruction set.
The shift is avoided for byte-wise calculations as well:
uint32_t crc32bzip2_byte(uint32_t crc, void const *mem, size_t len) {
    unsigned char const *data = mem;
    if (data == NULL)
        return 0;
    for (size_t i = 0; i < len; i++) {
        crc = (crc << 8) ^
              table_byte[((crc >> 24) ^ data[i]) & 0xff];
    }
    return crc;
}
vs.
uint32_t crc32iso_hdlc_byte(uint32_t crc, void const *mem, size_t len) {
    unsigned char const *data = mem;
    if (data == NULL)
        return 0;
    for (size_t i = 0; i < len; i++) {
        crc = (crc >> 8) ^
              table_byte[(crc ^ data[i]) & 0xff];
    }
    return crc;
}
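The byte-wise routines above rely on a precomputed table_byte[256], which the answer does not show being built. A plausible sketch for the reflected (CRC-32/ISO-HDLC) variant, reusing the reflected bit-wise step, might look like this (the forward table would instead start from n shifted into the top byte and use the forward bit-wise step); this is an assumption for illustration, not code from the original answer:

#include <stdint.h>

static uint32_t table_byte[256];

/* Fill table_byte[] once before calling crc32iso_hdlc_byte(). Each entry is
   the CRC of the single byte n, computed with the reflected bit-wise step. */
static void crc32iso_hdlc_table_init(void) {
    for (unsigned n = 0; n < 256; n++) {
        uint32_t crc = n;
        for (unsigned k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
        table_byte[n] = crc;
    }
}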
I am really new to C++, so sorry if this question is bad or hard to understand.
I have an integer, for example int a; its value can be between 1 and 3500 (I get this value from a file). I also have an array of chars, unsigned char packet[MAX_PACKET_SIZE];. My goal is to place this integer value into the array between indexes packet[10] and packet[17], so that in the end it takes 8 bytes.
If the value of a is 1, I would like my packet array to be:
packet[10] = 30, packet[11] = 30, packet[12] = 30, packet[13] = 30, packet[14] = 30, packet[15] = 30, packet[16] = 30, packet[17] = 31
You have to look into binary representation and binary math to really understand what bit-wise operations do to values.
Note that 3500 easily fits into a 16-bit value; 3500 is less than 2^16.
If you want to use types with guaranteed sizes, you have to use uint8_t, uint16_t and similar.
Such operations require a careful approach in C++ if you want portable code. An int may have a different size and even a different byte order (endianness), but the bit-wise shift operators >> and << are agnostic to endianness: << always shifts toward more significant digits, >> always shifts toward less significant ones.
Note that in C++ the operands of bit-wise operations undergo integral promotion, so the result is at least as wide as an int (or unsigned int), and left-shifting negative signed values is undefined.
In a naive but safe variant of the required algorithm we have to do the following steps. (First, decide in which order we write the bytes into the buffer; let's assume we do that from least significant to most significant.)
1. Determine the first byte to write the value to, pointed to by p.
2. Determine the size of the written value in bytes; a pend pointer marks the byte after the written value.
3. "Cut" the first byte out of the original value using the AND operation (&) with a mask consisting of all 1's, and assign it to the location pointed to by p.
4. Remove the written byte from the value by shifting to the right (>>).
5. Increment p.
6. If (p != pend), go to step 3.
7. (Optional) Save p or pend for further purposes, e.g. for sequenced writes.
In a C-styled (but still valid C++) variant this would look like:
unsigned char * pack_int(unsigned char *p, unsigned value)
{
    unsigned char *pend = p + sizeof(value);
    while(p != pend)
    {
        // ~ is a bit-wise not, ~0 produces an int with all bits set
        *p = value & ((unsigned char)~0);
        value >>= CHAR_BIT;
        p++;
    }
    return p;
}
Using ((unsigned char)~0) instead of the literal 0xFF is simply a protection against bytes that aren't 8 bits wide; the compiler converts it into the correct value.
C++ allows this implementation to be made type-agnostic, e.g. one that still requires sequential iterators to address the output location:
template <class InIt, class T>
InIt pack(InIt p, T value)
{
    using target_t = std::make_unsigned_t<std::remove_reference_t<decltype(*p)>>;
    using src_t = std::make_unsigned_t<T>;
    InIt pend = p + sizeof(T);
    src_t val = static_cast<src_t>(value); // if T is signed, it would fit anyway.
    while(p != pend)
    {
        *p = (val & (target_t)~0);
        val >>= CHAR_BIT;
        p++;
    }
    return pend;
}
In C++17 one can use
InIt pend = p;
std::advance(pend, sizeof(T));
A better C++ implementation would generate the conversion sequence statically, at compile time, by applying recursive templates instead of using a run-time loop.
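For illustration only (this is an assumption, not code from the original answer), the same unrolling can be achieved with a C++17 fold expression over std::index_sequence rather than literal recursive templates:

#include <climits>
#include <cstddef>
#include <type_traits>
#include <utility>

// Expands to one store per byte at compile time; no run-time loop remains.
// Bytes are written least significant first, matching pack()/pack_int() above.
template <class OutIt, class T, std::size_t... I>
OutIt pack_unrolled_impl(OutIt p, T value, std::index_sequence<I...>)
{
    using src_t = std::make_unsigned_t<T>;
    src_t val = static_cast<src_t>(value);
    ((*p++ = static_cast<unsigned char>(val >> (I * CHAR_BIT))), ...);
    return p;
}

template <class OutIt, class T>
OutIt pack_unrolled(OutIt p, T value)
{
    return pack_unrolled_impl(p, value, std::make_index_sequence<sizeof(T)>{});
}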
Here is a fully functional program that uses both pack_int and the pack template:
#include <iostream>
#include <array>
#include <climits>
#include <type_traits>

// C-styled function
// packs value into buffer, returns pointer to the byte after its end.
unsigned char * pack_int(unsigned char *p, unsigned value)
{
    unsigned char *pend = p + sizeof(value);
    while(p != pend)
    {
        // ~ is a bit-wise not, ~0 produces an int with all bits set
        *p = value & ((unsigned char)~0);
        value >>= CHAR_BIT;
        p++;
    }
    return p;
}

// a type-agnostic template
template <class InIt, class T>
InIt pack(InIt p, T value)
{
    using target_t = std::make_unsigned_t<std::remove_reference_t<decltype(*p)>>;
    using src_t = std::make_unsigned_t<T>;
    InIt pend = p + sizeof(T);
    src_t val = static_cast<src_t>(value); // if T is signed, it would fit anyway.
    while(p != pend)
    {
        *p = (val & (target_t)~0);
        val >>= CHAR_BIT;
        p++;
    }
    return pend;
}

int main(int argc, char** argv)
{
    std::array<unsigned char, 16> buffer = {};

    auto ptr = pack_int(&(buffer[0]), 0xA4B3C2D1);
    ptr = pack(ptr, (long long)0xA4B3C2D1);
    pack(ptr, 0xA4B3C2D1);

    std::cout << std::hex;
    for( auto c : buffer)
        std::cout << +c << ", ";
    std::cout << "{end}\n";
}
Output of this would be
d1, c2, b3, a4, d1, c2, b3, a4, 0, 0, 0, 0, d1, c2, b3, a4, {end}
The sequence d1, c2, b3, a4, which appears three times in the output, is obviously the byte-reversed representation of the hex value 0xA4B3C2D1. On a little-endian system that matches the in-memory representation of an unsigned int. For 3500 (hex 0xDAC) it would be ac, d, 0, 0.
In communications, "network order", also known as "big-endian", is accepted as a standard: the most significant byte comes first. This requires a slight alteration to the algorithm above, as sketched below.
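As an illustration (an assumption, not part of the original answer), a network-order variant of pack_int only changes the direction in which the shift count moves, so the most significant byte is written first:

#include <climits>

// Packs value in big-endian ("network") order: most significant byte first.
unsigned char *pack_int_be(unsigned char *p, unsigned value)
{
    unsigned bits = sizeof(value) * CHAR_BIT;
    while (bits != 0)
    {
        bits -= CHAR_BIT;
        *p++ = (value >> bits) & ((unsigned char)~0);
    }
    return p;
}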
How can I convert a given bitset of length N (where 0 < N < 64) to a signed int? For instance, given:
std::bitset<13> b("1111111101100");
I would like to get back the value -20, not 8172.
My approach:
int t = (static_cast<int>(b.to_ullong()));
if(t > pow(2, 13)/2)
t -= pow(2, 13);
Is there a more generic way to approach this?
Edit: Also, the bitset is actually a std::bitset<64>, and N can be a run-time value passed by other means.
We can write a function template to do this for us:
template <size_t N, class = std::enable_if_t<(N > 0 && N < 64)>>
int64_t as_signed(const std::bitset<N>& b)
{
    int64_t v = b.to_ullong(); // safe since we know N < 64
    return b[N-1] ? v - (1LL << N) : v; // subtract 2^N when the sign bit is set
}
Perhaps it is best to let the compiler sign-extend it itself:
struct S { int64_t x:N; } s;
int64_t result = s.x = b.to_ullong();
The compiler will likely optimize s out.
This is safe since int64_t (where available) is required to use two's complement.
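Put together as a runnable sketch of the question's example (assuming N is the compile-time constant 13 here), the bit-field approach prints -20 on a two's-complement target; note that storing an out-of-range value into a signed bit-field is implementation-defined before C++20 and well-defined (modular) from C++20 on:

#include <bitset>
#include <cstdint>
#include <iostream>

int main() {
    std::bitset<13> b("1111111101100");
    struct S { std::int64_t x : 13; } s;   // 13-bit signed bit-field
    std::int64_t result = s.x = static_cast<std::int64_t>(b.to_ullong());
    std::cout << result << '\n';           // -20
}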
Edit: When the actual bit count to extend is only known at run time, the most portable algorithm uses a mask:
// Do this if bits above position N in b may be non-zero, to clear them.
int64_t x = b.to_ullong() & ((1ULL << N) - 1);
// Otherwise just
int64_t x = b.to_ullong();
int64_t const mask = 1ULL << (N - 1);
int64_t result = (x ^ mask) - mask;
A slightly faster but less portable method for dynamic bit counts uses bit shifts (it works when the architecture performs arithmetic right shifts on signed values):
int const shift = 64 - N;
int64_t result = ((int64_t)b.to_ullong() << shift) >> shift;
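As a usage sketch (an assumption about how the pieces fit together, not code from the original answer), applying the mask method to the question's example with a run-time N yields -20:

#include <bitset>
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    std::bitset<64> b(0x1FECu);   // same low bits as bitset<13>("1111111101100")
    std::size_t N = 13;           // bit count known only at run time

    // Mask off bits at and above position N, then sign-extend from bit N-1.
    int64_t x = static_cast<int64_t>(b.to_ullong() & ((1ULL << N) - 1));
    int64_t const mask = 1LL << (N - 1);
    int64_t result = (x ^ mask) - mask;

    std::cout << result << '\n';  // prints -20
}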
This seems strange; I have found a misunderstanding. I use gcc, where char is signed char. I always thought that in comparison expressions (and other expressions) a signed value converts to unsigned if necessary.
int a = -4;
unsigned int b = a;
std::cout << (b == a) << std::endl; // writes 1, Ok
but the problem is that
char a = -4;
unsigned char b = a;
std::cout << (b == a) << std::endl; // writes 0
What is the magic in the comparison operator, if it's not just bitwise?
According to the C++ Standard
6 - If both operands are of arithmetic or enumeration type, the usual arithmetic conversions are performed on both operands; each of the operators shall yield true if the specified relationship is true and false if it is false.
So in this expression
b == a
of the example
char a = -4;
unsigned char b = -a;
std::cout << (b == a) << std::endl; // writes 0
both operands are converted to type int. As a result, the signed char propagates its sign bit and the two values become unequal.
To demonstrate the effect, try running this simple example:
{
    char a = -4;
    unsigned char b = -a;
    std::cout << std::hex << "a = " << ( int )a << "'\tb = " << ( int )b << std::endl;
    if ( b > a ) std::cout << "b is greater than a, that is b is positive and a is negative\n";
}
The output is
a = fffffffc' 'b = 4
b is greater than a, that is b is positive and a is negative
Edit: Only now have I seen that the definitions of the variables should look like
char a = -4;
unsigned char b = a;
that is, the minus sign in the definition of b should not be present.
Since an (unsigned) int is at least 16 bits wide, let's use that for instructional purposes:
In the first case: a = 0xfffc, and b = (unsigned int) (a) = 0xfffc
Following the arithmetic conversion rules, the comparison is evaluated as:
((unsigned int) b == (unsigned int) a) or (0xfffc == 0xfffc), which is (1)
In the 2nd case: a = 0xfc, and b = (unsigned char) ((int) a) or:
b = (unsigned char) (0xfffc) = 0xfc i.e., sign-extended to (int) and truncated
Since an int can represent the range of both the signed char and unsigned char types, the comparison is evaluated as: (zero-extended vs. sign-extended)
((int) b == (int) a) or (0x00fc == 0xfffc), which is (0).
Note: The C and C++ integer conversion rules behave the same way in these cases. Of course, I'm assuming that the char types are 8 bit, which is typical, but only the minimum required.
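To make the promotions from the explanation above explicit, here is an illustrative sketch (assuming an 8-bit char and a platform where char is signed, as in the question):

#include <iostream>

int main() {
    char a = -4;
    unsigned char b = a;   // b holds 252 (0xFC)

    // Both operands are promoted to int before the comparison.
    std::cout << (b == a) << '\n';                                          // 0
    std::cout << (static_cast<int>(b) == static_cast<int>(a)) << '\n';      // 0, the same comparison
    std::cout << static_cast<int>(b) << ' ' << static_cast<int>(a) << '\n'; // 252 -4
}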
The second snippet outputs 0 because there the unsigned values get converted to a signed type (int), not the other way around (the opposite of what you assumed).
I was asked to get the internal binary representation of different types in C. My program currently works fine with 'int' but I would like to use it with "double" and "float". My code looks like this:
template <typename T>
string findBin(T x) {
    string binary;
    for(int i = 4096 ; i >= 1; i/=2) {
        if((x & i) != 0) binary += "1";
        else binary += "0";
    }
    return binary;
}
The program fails when I try to instantiate the template using a "double" or a "float".
Succinctly, you don't.
The bitwise operators do not make sense when applied to double or float, and the standard says that the bitwise operators (~, &, |, ^, >>, <<, and the assignment variants) do not accept double or float operands.
Both double and float have 3 sections - a sign bit, an exponent, and the mantissa. Suppose for a moment that you could shift a double right. The exponent, in particular, means that there is no simple translation to shifting a bit pattern right - the sign bit would move into the exponent, and the least significant bit of the exponent would shift into the mantissa, with completely non-obvious sets of meanings. In IEEE 754, there's an implied 1 bit in front of the actual mantissa bits, which also complicates the interpretation.
Similar comments apply to any of the other bit operators.
So, because there is no sane or useful interpretation of the bit operators to double values, they are not allowed by the standard.
From the comments:
I'm only interested in the binary representation. I just want to print it, not do anything useful with it.
This code was written several years ago for SPARC (big-endian) architecture.
#include <stdio.h>

union u_double
{
    double dbl;
    char   data[sizeof(double)];
};

union u_float
{
    float flt;
    char  data[sizeof(float)];
};

static void dump_float(union u_float f)
{
    int exp;
    long mant;

    printf("32-bit float: sign: %d, ", (f.data[0] & 0x80) >> 7);
    exp = ((f.data[0] & 0x7F) << 1) | ((f.data[1] & 0x80) >> 7);
    printf("expt: %4d (unbiassed %5d), ", exp, exp - 127);
    mant = ((((f.data[1] & 0x7F) << 8) | (f.data[2] & 0xFF)) << 8) | (f.data[3] & 0xFF);
    printf("mant: %16ld (0x%06lX)\n", mant, mant);
}

static void dump_double(union u_double d)
{
    int exp;
    long long mant;

    printf("64-bit float: sign: %d, ", (d.data[0] & 0x80) >> 7);
    exp = ((d.data[0] & 0x7F) << 4) | ((d.data[1] & 0xF0) >> 4);
    printf("expt: %4d (unbiassed %5d), ", exp, exp - 1023);
    mant = ((((d.data[1] & 0x0F) << 8) | (d.data[2] & 0xFF)) << 8) | (d.data[3] & 0xFF);
    mant = (mant << 32) | ((((((d.data[4] & 0xFF) << 8) | (d.data[5] & 0xFF)) << 8) | (d.data[6] & 0xFF)) << 8) | (d.data[7] & 0xFF);
    printf("mant: %16lld (0x%013llX)\n", mant, mant);
}

static void print_value(double v)
{
    union u_double d;
    union u_float  f;

    f.flt = v;
    d.dbl = v;

    printf("SPARC: float/double of %g\n", v);
    // image_print(stdout, 0, f.data, sizeof(f.data));
    // image_print(stdout, 0, d.data, sizeof(d.data));
    dump_float(f);
    dump_double(d);
}

int main(void)
{
    print_value(+1.0);
    print_value(+2.0);
    print_value(+3.0);
    print_value( 0.0);
    print_value(-3.0);
    print_value(+3.1415926535897932);
    print_value(+1e126);
    return(0);
}
The commented-out image_print() function prints an arbitrary set of bytes in hex, with various minor tweaks. Contact me if you want the code (see my profile).
If you're using Intel (little-endian), you'll probably need to tweak the code to deal with the reverse bit order. But it shows how you can do it - using a union.
You cannot directly apply bitwise operators to float or double, but you can still access the bits indirectly by putting the variable in a union with a character array of the appropriate size, then reading the bits from those characters. For example:
string BitsFromDouble(double value) {
    union {
        double doubleValue;
        char   asChars[sizeof(double)];
    };

    doubleValue = value; // Write to the union

    /* Extract the bits. */
    string result;
    for (size_t i = 0; i < sizeof(double); ++i)
        result += CharToBits(asChars[i]);
    return result;
}
You may need to adjust your routine to work on chars, which usually don't range up to 4096, and there may also be some weirdness with endianness here, but the basic idea should work. It won't be cross-platform compatible, since machines use different endianness and representations of doubles, so be careful how you use this.
Bitwise operators don't generally work with "binary representation" (also called object representation) of any type. Bitwise operators work with value representation of the type, which is generally different from object representation. That applies to int as well as to double.
If you really want to get at the internal binary representation of an object of any type, as you stated in your question, you need to reinterpret the object of that type as an array of unsigned char objects and then use the bitwise operators on those unsigned chars.
For example
double d = 12.34;
const unsigned char *c = reinterpret_cast<unsigned char *>(&d);
Now by accessing elements c[0] through c[sizeof(double) - 1] you will see the internal representation of type double. You can use bitwise operations on these unsigned char values, if you want to.
Note, again, that in general case in order to access internal representation of type int you have to do the same thing. It generally applies to any type other than char types.
Do a bit-wise cast of a pointer to the double to long long * and dereference.
Example:
inline double bit_and_d(double* d, long long mask) {
    long long t = (*(long long*)d) & mask;
    return *(double*)&t;
}
Edit: This is almost certainly going to run afoul of gcc's enforcement of strict aliasing. Use one of the various workarounds for that. (memcpy, unions, __attribute__((__may_alias__)), etc)
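For reference, here is a memcpy-based sketch of the same idea (an assumption, not the original answer's code) that sidesteps the strict-aliasing problem:

#include <cstdint>
#include <cstring>

// Copy the double's object representation into a 64-bit integer, apply the
// mask, and copy the result back. memcpy is the aliasing-safe way to do this.
inline double bit_and_d_safe(double d, std::uint64_t mask) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "unexpected double size");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    bits &= mask;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}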
Another solution is to take a pointer to the floating-point variable, cast it to a pointer to an integer type of the same size, and then read the integer that pointer points to. Now you have an integer variable with the same binary representation as the floating-point one, and you can use your bitwise operators.
string findBin(float f) {
    string binary;
    long x = * ( long * ) &f;   // reinterpret the float's bits as an integer
    for(long i = 4096 ; i >= 1; i/=2) {
        if((x & i) != 0) binary += "1";
        else binary += "0";
    }
    return binary;
}
But remember: you have to cast to a type of the same size, otherwise unpredictable things may happen (like buffer overflows, access violations, etc.).
As others have said, you can use a bitwise operator on a double by casting double* to long long* (or sometimes just long*).
#include <stdio.h>
#include <stdlib.h>

int main(){
    double * x = (double*)malloc(sizeof(double));
    *x = -5.12345;
    printf("%f\n", *x);
    *((long*)x) &= 0x7FFFFFFFFFFFFFFF;   // clear the sign bit
    printf("%f\n", *x);
    free(x);
    return 0;
}
On my computer, this code prints:
-5.123450
5.123450