I am calculating CRC on a large chunk of data every cycle in hardware (64B per cycle). In order to parallelize the CRC calculation, I want to calculate the CRC for small data chunks and then XOR them in parallel.
Approach:
We divide the data into small chunks (64B of data divided into 8 chunks of 8B each).
Then we calculate the CRCs of all the chunks individually (8 CRCs in parallel for the 8B chunks).
Finally, we calculate the CRC for the padded data. This answer points out that the CRC for padded data is obtained by multiplying the old CRC by x^n.
Hence, I am calculating the CRC for a small chunk of data, then multiply it with CRC of 0x1 shifted by 'i' times as shown below.
In short, I am trying to accomplish below:
For example: CRC-8 on this site:
Input Data=(0x05 0x07) CRC=0x54
Step-1: Data=0x5 CRC=0x1B
Step-2: Data=0x7 CRC=0x15
Step-3: Data=(0x1 0x0) CRC=0x15
Step-4: Multiply step-1 CRC and step-3 CRC with primitive polynomial 0x7. So, I calculate (0x1B).(0x15) = (0x1 0xC7) mod 0x7.
Step-5: Calculate CRC Data=(0x1 0xC7) CRC=0x4E (I assume this is same as (0x1 0xC7) mod 0x7)
Step-6: XOR the result to get the final CRC. 0x4E^0x15=0x5B
As we can see, the result in step-6 is not the correct result.
Can someone help me how to calculate the CRC for padded data? Or where am I going wrong in the above example?
Rather than calculating and then adjusting multiple CRCs, bytes of data can be carryless multiplied to form a set of 16 bit "folded" products, which are then xor'ed together, and a single modulo operation is performed on the xor'ed "folded" products. An optimized modulo operation uses two carryless multiplies, so it's avoided until all folded products have been generated and xor'ed together. A carryless multiply uses XOR instead of ADD and a borrowless divide uses XOR instead of SUB. Intel has a PDF about this using the XMM instruction PCLMULQDQ (carryless multiply), where 16 bytes are read at a time, split into two 8 byte groups, with each group folded into a 16 byte product, and the two 16 byte products are xor'ed to form a single 16 byte product. Using 8 XMM registers to hold folding products, 128 bytes at a time are processed (256 bytes at a time in the case of AVX512 and ZMM registers).
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
Assume your hardware can implement a carryless multiply that takes two 8 bit operands and produces a 16 bit (technically 15 bit) product.
Let message = M = 31 32 33 34 35 36 37 38. In this case CRC(M) = C7
pre-calculated constants (all values shown in hex):
2^38%107 = DF cycles forwards 0x38 bits
2^30%107 = 29 cycles forwards 0x30 bits
2^28%107 = 62 cycles forwards 0x28 bits
2^20%107 = 16 cycles forwards 0x20 bits
2^18%107 = 6B cycles forwards 0x18 bits
2^10%107 = 15 cycles forwards 0x10 bits
2^08%107 = 07 cycles forwards 0x08 bits
2^00%107 = 01 cycles forwards 0x00 bits
16 bit folded (cycled forward) products (can be calculated in parallel):
31·DF = 16CF
32·29 = 07E2
33·62 = 0AC6
34·16 = 03F8
35·6B = 0A17
36·15 = 038E
37·07 = 0085
38·01 = 0038
----
V = 1137 the xor of the 8 folded products
CRC(V) = 113700 % 107 = C7
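The folding arithmetic above can be checked in software. The following is a minimal sketch, not hardware code: the helper names (clmul, xpow_mod, fold_bytes, crc_of_fold) are made up for illustration, and the final modulo is done with a plain shift-and-XOR division rather than the optimized method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0x107u  /* x^8 + x^2 + x + 1, the polynomial used in this answer */

/* Carryless multiply: XOR a shifted copy of a for each set bit of b. */
static uint32_t clmul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (; b != 0; b >>= 1, a <<= 1)
        if (b & 1u)
            r ^= a;
    return r;
}

/* x^n mod POLY (written 2^n % 107 in the tables above). */
static uint8_t xpow_mod(int n)
{
    assert(n >= 0);
    uint32_t d = 1;
    for (int i = 0; i < n; i++) {
        d <<= 1;
        if (d & 0x100u)
            d ^= POLY;
    }
    return (uint8_t)d;
}

/* XOR of the folded products: byte i is cycled forward 8*(len-1-i) bits. */
static uint16_t fold_bytes(const uint8_t *msg, size_t len)
{
    uint16_t v = 0;
    for (size_t i = 0; i < len; i++)
        v ^= (uint16_t)clmul(msg[i], xpow_mod(8 * (int)(len - 1 - i)));
    return v;
}

/* CRC(V) = (V << 8) mod POLY, by bitwise borrowless division. */
static uint8_t crc_of_fold(uint16_t v)
{
    uint32_t r = (uint32_t)v << 8;
    for (int bit = 23; bit >= 8; bit--)
        if (r & (1u << bit))
            r ^= POLY << (bit - 8);
    return (uint8_t)r;
}
```

For the message 31 32 33 34 35 36 37 38, fold_bytes() should return 1137 and crc_of_fold(0x1137) should return C7, matching the table. In hardware the eight products are independent, so they can be formed in the same cycle.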
To avoid having to use borrowless divide for the modulo operation, CRC(V) can be computed using carryless multiply. For example
V = FFFE
CRC(V) = FFFE00 % 107 = 23.
Implementation, again all values in hex (hex 10 = decimal 16), ⊕ is XOR.
input:
V = FFFE
constants:
P = 107 polynomial
I = 2^10 / 107 = 107 "inverse" of polynomial
by coincidence, it's the same value
2^10 % 107 = 15 for folding right 16 bits
fold the upper 8 bits of FFFE00 16 bits to the right:
U = FF·15 ⊕ FE00 = 0CF3 ⊕ FE00 = F2F3 (check: F2F3%107 = 23 = CRC)
Q = ((U>>8)·I)>>8 = (F2·107)>>8 = ...
to avoid a 9 bit operand, split up 107 = 100 ⊕ 7
Q = ((F2·100) ⊕ (F2·07))>>8 = ((F2<<8) ⊕ (F2·07))>>8 = (F200 ⊕ 02DE)>>8 = F0DE>>8 = F0
X = Q·P = F0·107 = F0·100 ⊕ F0·07 = F0<<8 ⊕ F0·07 = F000 ⊕ 02D0 = F2D0
CRC = U ⊕ X = F2F3 ⊕ F2D0 = 23
Since the CRC is 8 bits, there's no need for the upper 8 bits in the last two steps, but it doesn't help that much for the overall calculation.
X = (Q·(P&FF))&FF = (F0·07)&FF = D0
CRC = (U&FF) ⊕ X = F3 ⊕ D0 = 23
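As a software check of the steps above, here is a small sketch of the multiply-only modulo. The function names are hypothetical; the constants 15, 107 and 07 are the ones derived in this answer. A bitwise division is included only as a reference to compare against.

```c
#include <assert.h>
#include <stdint.h>

/* Carryless multiply: XOR a shifted copy of a for each set bit of b. */
static uint32_t clmul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (; b != 0; b >>= 1, a <<= 1)
        if (b & 1u)
            r ^= a;
    return r;
}

/* CRC(V) = (V << 8) mod 0x107 using only carryless multiplies and XOR. */
static uint8_t crc_of_fold_clmul(uint16_t v)
{
    /* fold the upper byte 16 bits to the right: 2^10 % 107 = 15 */
    uint16_t u = (uint16_t)(clmul(v >> 8, 0x15u) ^ ((uint32_t)(v & 0xFFu) << 8));
    /* Q = ((U >> 8) * I) >> 8 with I = 107 */
    uint8_t q = (uint8_t)(clmul(u >> 8, 0x107u) >> 8);
    /* only the low byte of Q * P is needed: X = (Q * 07) & FF */
    uint8_t x = (uint8_t)clmul(q, 0x07u);
    return (uint8_t)((u & 0xFFu) ^ x);
}

/* Reference: the same remainder by bitwise borrowless division. */
static uint8_t crc_of_fold_ref(uint16_t v)
{
    uint32_t r = (uint32_t)v << 8;
    for (int bit = 23; bit >= 8; bit--)
        if (r & (1u << bit))
            r ^= 0x107u << (bit - 8);
    return (uint8_t)r;
}

/* The two worked values from this answer. */
void check_worked_examples(void)
{
    assert(crc_of_fold_clmul(0xFFFE) == 0x23);
    assert(crc_of_fold_clmul(0x1137) == 0xC7);
}
```

Comparing crc_of_fold_clmul() with crc_of_fold_ref() over all 65536 values of V is a cheap way to confirm that the quotient estimate is exact for this polynomial.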
Example program to generate 2^0x10 / 0x107 and powers of 2 % 0x107:
#include <stdio.h>
#include <stdint.h>

#define poly 0x107

uint16_t geninv(void)               /* generate 2^16 / 9 bit poly */
{
    uint16_t q = 0x0000u;           /* quotient */
    uint16_t d = 0x0001u;           /* initial dividend = 2^0 */
    for(int i = 0; i < 16; i++){
        d <<= 1;
        q <<= 1;
        if(d&0x0100){               /* if bit 8 set */
            q |= 1;                 /* set quotient bit */
            d ^= poly;              /* subtract (xor) poly */
        }
    }
    return q;                       /* return inverse */
}

uint8_t powmodpoly(int n)           /* generate 2^n % 9 bit poly */
{
    uint16_t d = 0x0001u;           /* initial dividend = 2^0 */
    for(int i = 0; i < n; i++){
        d <<= 1;                    /* shift dvnd left */
        if(d&0x0100){               /* if bit 8 set */
            d ^= poly;              /* subtract (xor) poly */
        }
    }
    return (uint8_t)d;              /* return remainder */
}

int main()
{
    printf("%04x\n", geninv());
    printf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
        powmodpoly(0x00), powmodpoly(0x08), powmodpoly(0x10), powmodpoly(0x18),
        powmodpoly(0x20), powmodpoly(0x28), powmodpoly(0x30), powmodpoly(0x38),
        powmodpoly(0x40), powmodpoly(0x48));
    printf("%02x\n", powmodpoly(0x77)); /* 0xd9, cycles crc backwards 8 bits */
    return 0;
}
Longhand example for 2^0x10 / 0x107.
                              100000111   quotient
                      -----------------
divisor   100000111 | 10000000000000000   dividend
                      100000111
                      ---------
                            111000000
                            100000111
                            ---------
                             110001110
                             100000111
                             ---------
                              100010010
                              100000111
                              ---------
                                  10101   remainder
I don't know how many registers you can have in your hardware design, but assume there are five 16 bit registers used to hold folded values, and either two or eight 8 bit registers (depending on how parallel the folding is done). Then following the Intel paper, you fold values for all 64 bytes, 8 bytes at a time, and only need one modulo operation. Register size, fold# = 16 bits, reg# = 8 bits. Note that powers of 2 modulo poly are pre-calculated constants.
foldv = prior buffer's folding value, equivalent to folded msg[-2 -1]
reg0 = foldv>>8
reg1 = foldv&0xFF
foldv = reg0·((2^0x18)%poly) advance by 3 bytes
foldv ^= reg1·((2^0x10)%poly) advance by 2 bytes
fold0 = msg[0 1] ^ foldv handling 2 bytes at a time
fold1 = msg[2 3]
fold2 = msg[4 5]
fold3 = msg[6 7]
for(i = 8; i < 64; i += 8){
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x48)%poly) advance by 9 bytes
fold0 ^= reg1·((2^0x40)%poly) advance by 8 bytes
fold0 ^= msg[i+0 i+1]
reg2 = fold1>>8 if not parallel, reg0
reg3 = fold1&ff and reg1
fold1 = reg2·((2^0x48)%poly) advance by 9 bytes
fold1 ^= reg3·((2^0x40)%poly) advance by 8 bytes
fold1 ^= msg[i+2 i+3]
...
fold3 ^= msg[i+6 i+7]
}
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x38)%poly) advance by 7 bytes
fold0 ^= reg1·((2^0x30)%poly) advance by 6 bytes
reg2 = fold1>>8 if not parallel, reg0
reg3 = fold1&ff and reg1
fold1 = reg2·((2^0x28)%poly) advance by 5 bytes
fold1 ^= reg3·((2^0x20)%poly) advance by 4 bytes
fold2 ... advance by 3 2 bytes
fold3 ... advance by 1 0 bytes
foldv = fold0^fold1^fold2^fold3
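The full-buffer folding above can be modeled in C to check the bookkeeping. This is only a sketch under a few assumptions: the loop is taken to cover all 64 bytes (i runs up to 56 inclusive), the constants are recomputed on the fly instead of being pre-calculated, and the names fold16, load16 and fold_block64 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0x107u

/* Carryless multiply: XOR a shifted copy of a for each set bit of b. */
static uint32_t clmul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (; b != 0; b >>= 1, a <<= 1)
        if (b & 1u)
            r ^= a;
    return r;
}

/* x^n mod POLY; a real design would use pre-calculated constants. */
static uint8_t xpow_mod(int n)
{
    assert(n >= 0);
    uint32_t d = 1;
    for (int i = 0; i < n; i++) {
        d <<= 1;
        if (d & 0x100u)
            d ^= POLY;
    }
    return (uint8_t)d;
}

/* Advance a 16-bit folded value by `bytes` bytes:
   high byte times x^(8*(bytes+1)), low byte times x^(8*bytes). */
static uint16_t fold16(uint16_t v, int bytes)
{
    return (uint16_t)(clmul(v >> 8, xpow_mod(8 * (bytes + 1))) ^
                      clmul(v & 0xFFu, xpow_mod(8 * bytes)));
}

static uint16_t load16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Fold one 64-byte buffer into the running value, using four 16-bit lanes. */
static uint16_t fold_block64(uint16_t foldv, const uint8_t *msg)
{
    uint16_t f[4];
    for (int k = 0; k < 4; k++)
        f[k] = load16(msg + 2 * k);
    f[0] ^= fold16(foldv, 2);            /* prior value lines up with msg[0 1] */
    for (int i = 8; i < 64; i += 8)
        for (int k = 0; k < 4; k++)      /* the four lanes are independent */
            f[k] = fold16(f[k], 8) ^ load16(msg + i + 2 * k);
    /* lanes now sit at bytes 56..63; bring them all to bytes 62..63 */
    return fold16(f[0], 6) ^ fold16(f[1], 4) ^ fold16(f[2], 2) ^ fold16(f[3], 0);
}

/* CRC(V) = (V << 8) mod POLY, by bitwise division (reference only). */
static uint8_t crc_of_fold(uint16_t v)
{
    uint32_t r = (uint32_t)v << 8;
    for (int bit = 23; bit >= 8; bit--)
        if (r & (1u << bit))
            r ^= POLY << (bit - 8);
    return (uint8_t)r;
}

/* Plain bit-at-a-time CRC-8 (init 0, no final xor) to compare against. */
static uint8_t crc8_bitwise(const uint8_t *p, size_t n)
{
    uint8_t crc = 0;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}
```

Starting from foldv = 0, calling fold_block64() once per 64-byte buffer and then crc_of_fold() on the result should agree with crc8_bitwise() over the same bytes.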
Say the final buffer has 5 bytes:
foldv = prior folding value, equivalent to folded msg[-2 -1]
reg0 = foldv>>8
reg1 = foldv&0xFF
foldv = reg0·((2^0x18)%poly) advance by 3 bytes
foldv ^= reg1·((2^0x10)%poly) advance by 2 bytes
fold0 = msg[0 1] ^ foldv
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x20)%poly) advance by 4 bytes
fold0 ^= reg1·((2^0x18)%poly) advance by 3 bytes
fold1 = msg[2 3]
reg2 = fold1>>8
reg3 = fold1&ff
fold1 = reg2·((2^0x10)%poly) advance by 2 bytes
fold1 ^= reg3·((2^0x08)%poly) advance by 1 bytes
fold2 = msg[4] just one byte loaded
fold3 = 0
foldv = fold0^fold1^fold2^fold3
now use the method above to calculate CRC(foldv)
As shown in your diagram, you need to calculate the CRC of 0x05 0x00, (A,0), and the CRC of 0x00 0x07, (0,B), and then exclusive-or those together. Calculating on the site you linked, you get 0x41 and 0x15 respectively. Exclusive-or those together, and, voila, you get 0x54, the CRC of 0x05 0x07.
There is a shortcut for (0,B), since for this CRC, the CRC of a string of zeros is zero. You can calculate the CRC of just 0x07 and get the same result as for 0x00 0x07, which is 0x15.
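To tie this back to the steps in the question, here is a small sketch (the function names are made up for illustration). Since CRC(A) is already A·x^8 mod P, appending one zero byte multiplies the CRC by x^8 mod P, which is 0x07 for this polynomial (the CRC of the single byte 0x01). As far as I can tell, the question's step 3 uses the CRC of 0x01 0x00 instead, which is x^16 mod P = 0x15 and shifts one byte too far, and step 5 then takes a CRC of the product, which shifts by yet another byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-8, polynomial 0x107, init 0, no final xor. */
static uint8_t crc8(const uint8_t *p, size_t n)
{
    uint8_t crc = 0;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}

/* (a * b) mod 0x107 with carryless arithmetic. */
static uint8_t mulmod(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (a & 0x80) ? (uint8_t)((a << 1) ^ 0x07) : (uint8_t)(a << 1);
        b >>= 1;
    }
    return r;
}

/* CRC of A followed by zero_bytes zero bytes, given crc_a = CRC(A).
   One multiply by x^8 mod P (= 0x07) per zero byte, so O(n). */
static uint8_t crc8_append_zeros(uint8_t crc_a, size_t zero_bytes)
{
    while (zero_bytes--)
        crc_a = mulmod(crc_a, 0x07);
    return crc_a;
}

/* CRC(A followed by B) from CRC(A), CRC(B) and the length of B in bytes. */
static uint8_t crc8_combine(uint8_t crc_a, uint8_t crc_b, size_t len_b)
{
    return (uint8_t)(crc8_append_zeros(crc_a, len_b) ^ crc_b);
}
```

With the question's data, crc8_append_zeros(0x1B, 1) gives 0x41, and crc8_combine(0x1B, 0x15, 1) gives 0x54. For long runs of zeros the per-byte loop would be replaced by a pre-calculated power of x, or by repeated squaring.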
See crcany for how to combine CRCs in general. crcany will generate C code to compute any specified CRC, including code to combine CRCs. It employs a technique that applies n zeros to a CRC in O(log(n)) time instead of O(n) time.
I have a problem with multiplying two registers (or just a register by a float constant). One register is of type __m128i and contains one channel of RGBA pixel color from 16 pixels (the array with 16 pixels is passed as a parameter to a C++ DLL). I want to multiply this register by a constant to get the grayscale value for this channel, and do the same for the other channels stored in __m128i registers.
I think a good way to use SIMD to convert an image to grayscale is this algorithm:
fY(R, G, B) = R x 0.29891 + G x 0.58661 + B x 0.11448
I have the following code. For now it only decomposes the image into channels and packs them back together to return as the src vector. Now I need to make it produce grayscale :)
The src variable is a pointer to an unsigned char array.
__m128i vecSrc = _mm_loadu_si128((__m128i*) &src[srcIndex]);
__m128i maskR = _mm_setr_epi16(1, 0, 0, 0, 1, 0, 0, 0);
__m128i maskG = _mm_setr_epi16(0, 1, 0, 0, 0, 1, 0, 0);
__m128i maskB = _mm_setr_epi16(0, 0, 1, 0, 0, 0, 1, 0);
__m128i maskA = _mm_setr_epi16(0, 0, 0, 1, 0, 0, 0, 1);
// Creating factors.
const __m128i factorR = _mm_set1_epi16((short)(0.29891 * 0x10000)); //8 coefficients - R scale factor.
const __m128i factorG = _mm_set1_epi16((short)(0.58661 * 0x10000)); //8 coefficients - G scale factor.
const __m128i factorB = _mm_set1_epi16((short)(0.11448 * 0x10000)); //8 coefficients - B scale factor.
__m128i zero = _mm_setzero_si128();
// Shifting higher part of src register to lower.
__m128i vectSrcLowInHighPart = _mm_cvtepu8_epi16(vecSrc);
__m128i vectSrcHighInHighPart = _mm_unpackhi_epi8(vecSrc, zero);
// Multiply the low half of the 16 x uint8 vector by the channel masks and keep the low 16 bits. This extracts each channel separately (in two parts, H and L).
__m128i vecR_L = _mm_mullo_epi16(vectSrcLowInHighPart, maskR);
__m128i vecG_L = _mm_mullo_epi16(vectSrcLowInHighPart, maskG);
__m128i vecB_L = _mm_mullo_epi16(vectSrcLowInHighPart, maskB);
__m128i vecA_L = _mm_mullo_epi16(vectSrcLowInHighPart, maskA);
// Multiply the high half of the 16 x uint8 vector by the channel masks and keep the low 16 bits.
__m128i vecR_H = _mm_mullo_epi16(vectSrcHighInHighPart, maskR);
__m128i vecG_H = _mm_mullo_epi16(vectSrcHighInHighPart, maskG);
__m128i vecB_H = _mm_mullo_epi16(vectSrcHighInHighPart, maskB);
__m128i vecA_H = _mm_mullo_epi16(vectSrcHighInHighPart, maskA);
// Lower and high masks using to packing.
__m128i maskLo = _mm_set_epi8(0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 14, 12, 10, 8, 6, 4, 2, 0);
__m128i maskHi = _mm_set_epi8(14, 12, 10, 8, 6, 4, 2, 0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);
// Pack the high and low parts into one 16 x 8-bit register per channel.
__m128i R = _mm_or_si128(_mm_shuffle_epi8(vecR_L, maskLo), _mm_shuffle_epi8(vecR_H, maskHi));
__m128i G = _mm_or_si128(_mm_shuffle_epi8(vecG_L, maskLo), _mm_shuffle_epi8(vecG_H, maskHi));
__m128i B = _mm_or_si128(_mm_shuffle_epi8(vecB_L, maskLo), _mm_shuffle_epi8(vecB_H, maskHi));
__m128i A = _mm_or_si128(_mm_shuffle_epi8(vecA_L, maskLo), _mm_shuffle_epi8(vecA_H, maskHi));
// Added all sub vectors to get in result one 128-bit vector with all edited channels.
__m128i resultVect = _mm_add_epi8(_mm_add_epi8(R, G), _mm_add_epi8(B, A));
// Put result vector into array to return as src pointer.
_mm_storel_epi64((__m128i*)&src[srcIndex], resultVect);
Thanks for your help! It's my first program with SIMD (SSE) instructions.
Based on the comments on my question, I created a solution, along with a project where I was learning how the registers actually behave when using SSE instructions.
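#include <iostream>
#include <string>
#include <smmintrin.h> // SSE4.1 (_mm_cvtepu8_epi16); also pulls in SSSE3 (_mm_shuffle_epi8, _mm_mulhrs_epi16)
using namespace std;

// Prints a register as 16 x uint8, preceded by a message.
void printRegister(__m128i registerToprint, const string &msg) {
    unsigned char tab_debug[16] = { 0 };
    unsigned char *dest = tab_debug;
    // Unaligned store: tab_debug is not guaranteed to be 16-byte aligned.
    _mm_storeu_si128((__m128i*)&dest[0], registerToprint);
    cout << msg << endl;
    cout << "\\/\\/\\/\\/ LO \\/\\/\\/\\/" << endl;
    for (int i = 0; i < 16; i++)
        cout << dec << (unsigned int)dest[i] << endl;
    cout << "/\\/\\/\\/\\ HI /\\/\\/\\/\\" << endl;
}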
int main()
{
    // Example array that fills one 128-bit register with 16 x uint8. It holds the channels of 4 pixels in BGRA order.
    unsigned char tab[] = { 100,200,250,255, 101,201,251,255, 102,202,252,255, 103,203,253,255 };
    // A pointer to the source array, simulating the DLL parameter.
    unsigned char *src = tab;
    // Start index into src.
    int srcIndex = 0;
    // Float coefficients expressed as 16-bit fixed-point integers (scaled by 32768 and rounded).
    const __m128i r_coef = _mm_set1_epi16((short)(0.2989*32768.0 + 0.5));
    const __m128i g_coef = _mm_set1_epi16((short)(0.5870*32768.0 + 0.5));
    const __m128i b_coef = _mm_set1_epi16((short)(0.1140*32768.0 + 0.5));
    // vecSrc - source vector (BGRA BGRA BGRA BGRA).
    // Load data from tab[] into a 128-bit register, starting at the address src points to (index 0, so all 16 x 8-bit elements are loaded).
    __m128i vecSrc = _mm_loadu_si128((__m128i*) &src[srcIndex]);
    // Shuffle to configuration A0A1A2A3_R0R1R2R3_G0G1G2G3_B0B1B2B3
    // _mm_set_epi8 takes its arguments from the highest byte (left) to the lowest byte (right); each value is a byte index into vecSrc, counted from its lowest byte.
    __m128i shuffleMask = _mm_set_epi8(15, 11, 7, 3, 14, 10, 6, 2, 13, 9, 5, 1, 12, 8, 4, 0);
    __m128i AAAA_R0RRR_G0GGG_B0BBB = _mm_shuffle_epi8(vecSrc, shuffleMask);
    // Put B0BBB in the lower part.
    __m128i B0_XXX = _mm_slli_si128(AAAA_R0RRR_G0GGG_B0BBB, 12);
    __m128i XXX_B0 = _mm_srli_si128(B0_XXX, 12);
    // Put G0GGG in the lower part.
    __m128i G0_B_XX = _mm_slli_si128(AAAA_R0RRR_G0GGG_B0BBB, 8);
    __m128i XXX_G0 = _mm_srli_si128(G0_B_XX, 12);
    // Put R0RRR in the lower part.
    __m128i R0_G_XX = _mm_slli_si128(AAAA_R0RRR_G0GGG_B0BBB, 4);
    __m128i XXX_R0 = _mm_srli_si128(R0_G_XX, 12);
    // Unpack uint8 elements to uint16 elements.
    // The sequence in uint8 is like (Hi) XXXX XXXX XXXX XXXX (Lo) where X represents a uint8.
    // In uint16 it is like (Hi) X_X_ X_X_ X_X_ X_X_ (Lo)
    __m128i B0BBB = _mm_cvtepu8_epi16(XXX_B0);
    __m128i G0GGG = _mm_cvtepu8_epi16(XXX_G0);
    __m128i R0RRR = _mm_cvtepu8_epi16(XXX_R0);
    // Multiply epi16 registers.
    __m128i B0BBB_mul = _mm_mulhrs_epi16(B0BBB, b_coef);
    __m128i G0GGG_mul = _mm_mulhrs_epi16(G0GGG, g_coef);
    __m128i R0RRR_mul = _mm_mulhrs_epi16(R0RRR, r_coef);
    __m128i BGR_gray = _mm_add_epi16(_mm_add_epi16(B0BBB_mul, G0GGG_mul), R0RRR_mul);
    // Copy each pixel's gray value (low byte of each 16-bit lane) into all four of its channels.
    __m128i grayMsk = _mm_setr_epi8(0, 0, 0, 0, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6);
    __m128i vectGray = _mm_shuffle_epi8(BGR_gray, grayMsk);
    printRegister(vectGray, "Gray");
}
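As a cross-check for the SIMD code above, a scalar model of the same fixed-point arithmetic can be handy. This is a sketch in plain C with made-up helper names (mulhrs16, gray_q15). It assumes non-negative inputs, for which one lane of _mm_mulhrs_epi16 behaves like (a * b + 0x4000) >> 15. It also shows why the coefficients are scaled by 32768 rather than by 0x10000 as in the question: 0.58661 * 0x10000 is about 38444, which does not fit in a signed 16-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of _mm_mulhrs_epi16 for non-negative inputs:
   multiply, add the rounding constant, keep bits 15 and up. */
static int16_t mulhrs16(int16_t a, int16_t b)
{
    assert(a >= 0 && b >= 0);
    return (int16_t)(((int32_t)a * b + 0x4000) >> 15);
}

/* Gray value of one pixel with the same Q15 coefficients as the SIMD code. */
static uint8_t gray_q15(uint8_t r, uint8_t g, uint8_t b)
{
    const int16_t r_coef = (int16_t)(0.2989 * 32768.0 + 0.5);  /* 9794  */
    const int16_t g_coef = (int16_t)(0.5870 * 32768.0 + 0.5);  /* 19235 */
    const int16_t b_coef = (int16_t)(0.1140 * 32768.0 + 0.5);  /* 3736  */
    return (uint8_t)(mulhrs16(r, r_coef) + mulhrs16(g, g_coef) + mulhrs16(b, b_coef));
}
```

For the first pixel of tab[] (B=100, G=200, R=250), gray_q15(250, 200, 100) gives 203, which is what the first four bytes printed by printRegister should show.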
How it works
The unsigned char tab[] contains 16 x uint8 elements, enough to fill one 128-bit register. This array simulates 4 pixels whose channels are in BGRA order.
void printRegister(__m128i registerToprint, const string &msg);
This function prints to the console, in decimal, the value of the register passed as a parameter.
If someone wants to test it, the full project is available on GitHub: Full project demo gitHub
I hope all the comments are valid; if not, please correct me :) Thanks for the support.
I am trying to read the pixel values of an image contained in a DICOM file in my simple C++ application using the Grassroots DICOM (GDCM) library. When reading the file metadata I get the following information about the picture:
Bits allocated: 16
Bits Stored: 16
High Bit: 15
Unsigned or signed: 1
Samples pr pixel: 1
Dimensions: 2
Dimension values: 256x256
Pixel Representation: 1
SamplesPerPixel: 1
ScalarType: INT16
PhotometricInterpretation: MONOCHROME2
Pixel buffer length: 131072
Given that the image has a resolution of 256x256 and is of MONOCHROME2 type, I expected the pixel buffer length to be 256x256=65536 elements, but it is in fact 131072 elements long.
If I use MATLAB instead to import the pixel data, I get exactly 65536 values in the range 0 - 850, where 0 is black and 850 is white.
When I look at the pixel buffer I get from the GDCM readout in my C++ application, the pixel buffer has 131072 elements, where every even-indexed element is in the range -128 to +127 and every odd-indexed element is in the range 0 - 3, like this:
Excerpt:
PixelBuffer[120] = -35
PixelBuffer[121] = 0
PixelBuffer[122] = 51
PixelBuffer[123] = 2
PixelBuffer[124] = 71
PixelBuffer[125] = 2
PixelBuffer[126] = 9
PixelBuffer[127] = 2
PixelBuffer[128] = -80
PixelBuffer[129] = 2
PixelBuffer[130] = 87
PixelBuffer[131] = 3
PixelBuffer[132] = 121
PixelBuffer[133] = 3
PixelBuffer[134] = -27
PixelBuffer[135] = 2
PixelBuffer[136] = 27
PixelBuffer[137] = 2
PixelBuffer[138] = -111
PixelBuffer[139] = 1
PixelBuffer[140] = 75
PixelBuffer[141] = 1
PixelBuffer[142] = 103
What does this arrangement of values mean? Is this some kind of typical pixel representation for monochrome images? I have been googling "image pixel structure" and similar but can't find what I am looking for. Is there some resource available that can help me understand this arrangement of values and how they correlate to each pixel?
I use this code to read a 16-bit MONOCHROME2 DICOM file:
// Requires: using System.Collections.Generic; using System.Linq;
byte[] signedData = new byte[2];
List<int> tempInt = new List<int>();
List<ushort> returnValue = new List<ushort>();
// Each pixel occupies two bytes, so iterate over pixels, not bytes.
for (int i = 0; i < PixelBuffer.Length / 2; ++i)
{
    int i1 = i * 2;
    signedData[0] = PixelBuffer[i1];
    signedData[1] = PixelBuffer[i1 + 1];
    // Combine the two bytes into one signed 16-bit value.
    short sVal = System.BitConverter.ToInt16(signedData, 0);
    int pixVal = (int)(sVal * rescaleSlope + rescaleIntercept);
    tempInt.Add(pixVal);
}
int minPixVal = tempInt.Min();
SignedImage = false;
if (minPixVal < 0) SignedImage = true;
foreach (int pixel in tempInt)
{
    ushort val;
    if (SignedImage)
        val = (ushort)(pixel - short.MinValue);
    else
    {
        if (pixel > ushort.MaxValue) val = ushort.MaxValue;
        else val = (ushort)(pixel);
    }
    returnValue.Add(val);
}
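The excerpt in the question is consistent with each 16-bit pixel being stored as two consecutive bytes, low byte first: 131072 bytes / 2 = 65536 pixels, and for example the pair (51, 2) becomes 51 + 2*256 = 563, which falls inside the 0 - 850 range reported by MATLAB. A minimal C sketch of that reassembly follows. The little-endian byte order is an assumption based on the excerpt, and pixel_at is a made-up helper name, not part of GDCM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassemble pixel i from a buffer of signed bytes, low byte first.
   Little-endian storage is assumed here. */
static int16_t pixel_at(const int8_t *buf, size_t i)
{
    assert(buf != NULL);
    uint16_t lo = (uint8_t)buf[2 * i];      /* -35 reads back as 221 */
    uint16_t hi = (uint8_t)buf[2 * i + 1];
    return (int16_t)(uint16_t)(lo | (hi << 8));
}
```

Applied to PixelBuffer[120..125] from the excerpt (-35, 0, 51, 2, 71, 2), this yields the pixel values 221, 563 and 583.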
Hello, I have been trying to develop a C++ function or algorithm that behaves like bit shifting. The function should always return 4 bytes (00 00 00 00) for any input number ranging from 0 to 99999999.
input (int) -> expected output (char or string)
0 -> 00 00 00 00
20 -> 00 00 20 00
200 -> 00 02 00 00
2000 -> 00 20 00 00
99999999-> 99 99 99 99
and can be reversed to return original numbers.
input (char or string)-> expected output (int/double)
00 00 20 00 -> 20
00 02 00 00 -> 200
00 20 00 00 -> 2000
99 99 99 99 -> 99999999
EDIT:
This is the code I have. It comes close to what I am looking for but is still a work in progress:
void Convert_to_Decimal(std::string str)
{
    double ret;
    ///insert . after string number 6.
    str.insert(5,1,'.');
    ///remove leading zeros
    str.erase(0, str.find_first_not_of('0'));
    ret = std::stod(str.c_str());
    printf("%g\n", ret);
}
Convert_to_Decimal("00020000");
I would appreciate any pointers or a solution for this. Thank you in advance.
Here is a simple solution:
#include <stdint.h>
#include <stdio.h>

/* encode a number given as a string into a 4 byte buffer */
void number_convert(unsigned char *dest, const char *str) {
    uint32_t v = 0;
    while (*str >= '0' && *str <= '9') {
        /* parse digits and encode as BCD */
        v = (v << 4) + (*str++ - '0');
    }
    /* make room for 2 decimal places */
    v <<= 8;
    if (*str == '.') {
        if (str[1] >= '0' && str[1] <= '9') {
            /* set number of tenths */
            v += (str[1] - '0') << 4;
            if (str[2] >= '0' && str[2] <= '9') {
                /* set number of hundredths */
                v += (str[2] - '0');
            }
        }
    }
    /* store the BCD value in big endian order */
    dest[0] = (v >> 24) & 255;
    dest[1] = (v >> 16) & 255;
    dest[2] = (v >> 8) & 255;
    dest[3] = (v >> 0) & 255;
}

void test(const char *str) {
    unsigned char buf[4];
    number_convert(buf, str);
    printf("%s -> %02X %02X %02X %02X\n", str, buf[0], buf[1], buf[2], buf[3]);
}

int main(void) {
    test("0");
    test("20");
    test("200");
    test("2000");
    test("123.1");
    test("999999.99");
    return 0;
}
EDIT
Your code uses a floating point variable. Your question is unclear: do you want to compute 4 bytes? To do this, you should use a byte array. Otherwise, please expand with a more precise explanation of what you are trying to achieve.
To perform the conversion from your 4 byte digit array back to a number, you can do this:
double convert_BCD_to_double(unsigned char *str) {
    long res = 0;
    for (int i = 0; i < 4; i++) {
        /* each byte holds two decimal digits: high nibble, then low nibble */
        res = res * 100 + (str[i] >> 4) * 10 + (str[i] & 15);
    }
    return (double)res / 100;
}
For integers, let us define shifting as multiplying or dividing the number by its representation base.
For decimal, a shift right:
300 --> 30
hexadecimal:
0x345 --> 0x34
binary:
1101 --> 110
For decimal, shifting right one digit requires dividing by 10. For hexadecimal, divide by 16 and for binary, divide by 2.
Shifting left is multiplying by the base: decimal - multiply by 10, hexadecimal by 16 and binary by 2.
When the shift goes beyond the edges of the number, you cannot restore the original number by shifting the other direction.
For example, shifting 345 right one digit yields 34. There is no way to get the 5 back by shifting left one digit. The common rule is when a number is shifted, a new digit of 0 is introduced. Thus 34 shifted left one digit yields 340.
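The definition above can be written down directly for any base. This is only an illustrative sketch; shift_right_digits and shift_left_digits are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Shift right by `digits` digits in the given base: divide by the base. */
static uint32_t shift_right_digits(uint32_t value, unsigned digits, unsigned base)
{
    assert(base >= 2);
    while (digits-- > 0)
        value /= base;      /* the lowest digit is dropped for good */
    return value;
}

/* Shift left by `digits` digits in the given base: multiply by the base. */
static uint32_t shift_left_digits(uint32_t value, unsigned digits, unsigned base)
{
    assert(base >= 2);
    while (digits-- > 0)
        value *= base;      /* a new digit of 0 is introduced */
    return value;
}
```

shift_right_digits(345, 1, 10) gives 34, and shift_left_digits(34, 1, 10) gives 340, not the original 345.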
Regarding your floating point. I don't see how the bytes 99 99 99 99 produces 999999.99. Is the last byte always to the right of the decimal point?
For shifting bytes, use the operators << and >>. You want to use the largest size integer that contains your byte quantity, such as uint32_t for 4 byte values. Also, use unsigned numbers because you don't want the signed representation to interfere with the shifting.
Edit 1: Example functions
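/* Shift left by repeatedly multiplying by the base (2). */
uint32_t Shift_Left_By_Multiplying(unsigned int value, unsigned int quantity)
{
    while (quantity > 0)
    {
        value = value * 2;
        --quantity;
    }
    return value;
}

/* Shift left with the built-in operator. */
uint32_t Shift_Left(unsigned value, unsigned int quantity)
{
    return value << quantity;
}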
For shifting by bytes, set quantity to 8 or multiples of 8 (8 bits per byte).
Could somebody explain how this actually works, for example with char input = 'a'?
I understand that << shifts the bits over by four places (for more than one character). But why add 9 in the second part? I know 0xf = 15... Am I missing something obvious?
result = result << 4 | *str + 9 & 0xf;
Here is my understanding so far:
char input = 'a' ascii value is 97. Add 9 is 106, 106 in binary is 01101010. 0xf = 15 (00001111), therefore 01101010 & 00001111 = 00001010, this gives the value of 10 and the result is then appended on to result.
Thanks in advance.
First, let's rewrite this with parentheses to make the order of operations clearer:
result = (result << 4) | ((*str + 9) & 0xf);
If result is 0 on input, then we have:
result = (0 << 4) | ((*str + 9) & 0xf);
Which simplifies to:
result = (0) | ((*str + 9) & 0xf);
And again to:
result = (*str + 9) & 0xf;
Now let's look at the hex and binary representations of a - f:
a = 0x61 = 01100001
b = 0x62 = 01100010
c = 0x63 = 01100011
d = 0x64 = 01100100
e = 0x65 = 01100101
f = 0x66 = 01100110
After adding 9, the & 0xf operation clears out the top 4 bits, so we don't need to worry about those. So we're effectively just adding 9 to the lower 4 bits. In the case of a, the lower 4 bits are 1, so adding 9 gives you 10, and similarly for the others.
As chux mentioned in his comment, a more straightforward way of achieving this is as follows:
result = *str - 'a' + 10;
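To see the two forms side by side, here is a small sketch assuming ASCII; the function names are made up for illustration. Note that the masked form also works for uppercase 'A' to 'F', but neither form is correct for the digits '0' to '9' ('0' + 9 masks to 9, not 0).

```c
#include <assert.h>
#include <stdint.h>

/* The masked form from the question: valid for 'a'..'f' and 'A'..'F'. */
static unsigned hex_letter_masked(char c)
{
    return ((unsigned char)c + 9u) & 0xFu;
}

/* The subtraction form: valid for lowercase 'a'..'f' only. */
static unsigned hex_letter_sub(char c)
{
    assert(c >= 'a' && c <= 'f');
    return (unsigned)(c - 'a' + 10);
}

/* Accumulate a string of hex letters, four bits per character. */
static uint32_t parse_hex_letters(const char *str)
{
    uint32_t result = 0;
    while (*str)
        result = (result << 4) | hex_letter_masked(*str++);
    return result;
}
```

For example, parse_hex_letters("beef") returns 0xBEEF, and both helpers map 'a' to 10.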