CRC calculation - bit by bit - C++

I need to create a program that calculates a CRC from a file. It needs to be done bit by bit.
The way I would like to read a file:
unsigned char byte;
ifstream file;
bool result;
int number;

file.open("test.txt", ios::binary);
while (true)
{
    byte = file.get();
    number = (int)byte;
    result = file.good();
    if (!result)
        break;
}
However, I don't know how to read it bit by bit.
My CRC's divisor (called a "polynomial") is 0x04C11DB7, and I need to pull in 1 new bit from the file each time I update my buffer.
My idea is to load the first 4 bytes into a variable (for, say, "1234" it would be 0x31323334), then remove the top bit (by shifting the number 1 bit to the left), but I don't know how to bring in a new bit from the next char.

Do you mean something along these lines?
The CRC calculation may vary, but the focus here is on getting the file content "bit by bit".
#include <iostream>
#include <fstream>

int main(int argc, char* argv[])
{
    unsigned char next;
    unsigned long crc = 0;
    if (argc < 2)
        return -1;
    // open in binary and read with get(), so whitespace bytes aren't skipped
    std::fstream fs(argv[1], std::fstream::in | std::fstream::binary);
    int c;
    while ((c = fs.get()) != std::fstream::traits_type::eof())
    {
        next = (unsigned char)c;
        for (int i = 0; i < 8; i++)
        {
            crc += next & 1; // placeholder "calculation": just counts set bits
            next >>= 1;
        }
    }
    std::cout << "CRC " << crc << std::endl;
    return 0;
}

The divisor is not just called a polynomial: each bit is the coefficient of a polynomial (of degree 32), and computing with polynomials therefore differs significantly from working with integers. You can add (and subtract, which is the same thing here) two polynomials with a simple XOR operation. Multiplying or dividing by X means shifting; whether to the right or to the left depends on the order in which the coefficients of the polynomial are written, and both directions (left to right and right to left) actually exist in practice. In the case of 0x04C11DB7, the coefficient of X^0 is bit 0 and the coefficient of X^31 is bit 31. Be aware that the popular implementation of the IEEE 802.3 CRC has the opposite bit order, so just copying the implementation of an Ethernet CRC will not work.
This means the next bit to process is always bit 31, so check for 0x80000000 before you shift. Remember whether that bit was set, shift the work register one bit to the left, and replace the 0 bit that comes in at the right with the next input bit using a binary OR (| in C++). If the bit you shifted out was set, XOR your polynomial into the register; this subtracts the polynomial from your work register (the shifted-out 1 cancels against the implicit X^32 term). You obtain the input bit in the same way: if you are reading byte by byte, your next bit is 1 or 0 depending on whether 0x80 is set in your input byte; then shift your input byte to the left.
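Putting those two paragraphs together, here is a minimal sketch (my own illustration, not the only correct variant: it assumes a zero initial value and no final XOR or bit reflection, details that differ between CRC specifications):

#include <cstdint>
#include <cstdio>
#include <cstddef>

// Bit-by-bit CRC over polynomial 0x04C11DB7, MSB-first bit order.
uint32_t crc32_bitwise(const unsigned char* data, std::size_t len)
{
    const uint32_t poly = 0x04C11DB7;
    uint32_t crc = 0; // work register
    for (std::size_t i = 0; i < len; ++i)
    {
        unsigned char byte = data[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            uint32_t top = crc & 0x80000000u;              // bit about to be shifted out
            crc = (crc << 1) | ((byte & 0x80u) ? 1u : 0u); // bring in the next input bit
            byte <<= 1;
            if (top)
                crc ^= poly; // subtract the polynomial; the shifted-out 1 cancels X^32
        }
    }
    return crc;
}

int main()
{
    const unsigned char msg[] = { '1', '2', '3', '4' };
    std::printf("%08X\n", crc32_bitwise(msg, sizeof msg));
    return 0;
}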

Related

How to convert boost multiprecision integers into big endian from little endian?

I am trying to fix this part of an abandonware program because I failed to find an alternative program.
As you can see, the data of PUSH instructions are in the wrong order, whereas Ethereum is a big-endian machine (addresses are correctly represented because they use a smaller type).
An alternative is to run porosity.exe --code '0x61004b60026319e44e32' --disassm
The u256 type is defined as
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
Here’s a minimal example to reproduce the bug:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>

using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;

int main() {
    std::stringstream stream;
    u256 data = 0xFEDEFA;
    for (int i = 0; i < 5; ++i) { // print only the first 5 bytes
        uint8_t dataByte = int(data & 0xFF);
        data >>= 8;
        stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
    }
    std::cout << stream.str();
}
So numbers are converted to a string with a space between each byte (and only the first bytes).
But then I ran into an endianness problem: bytes were printed in reverse order. I mean, for example, 131722 is written 8a 02 02 on my machine and 02 02 8a when compiled for a big-endian target.
So, as I don't know which Boost function to call, I modified the code:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>

using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;

int main() {
    std::stringstream stream;
    u256 data = 0xFEDEFA;
    for (int i = 0; i < 5; ++i) {
        uint8_t dataByte = int(data >> ((32 - i - 1) * 8));
        stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
    }
    std::cout << stream.str();
}
Now, why are my 256-bit integers printed mostly as series of 00 00 00 00 00?
BTW, this is not an endianness issue; you aren't doing byte accesses to the object-representation. You're operating on it as a 256-bit integer and simply asking for the low 8 bits at a time with data & 0xFF.
If you did know the endianness of the target C implementation, and the data layout of the boost object, you could efficiently loop over it in descending address order with unsigned char*.
You're introducing the idea of endianness only because it's associated with byte reversal, which is what you're trying to do. But byte-reversing ahead of time is really inefficient; just loop over the bytes of your bigint the other way.
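For instance, a minimal sketch of that simpler approach (my illustration: loop from the most significant byte downward with a shift and mask per byte, which is simple if not maximally fast):

#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>

using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;

int main() {
    u256 data = 0xFEDEFA;
    // Walk from byte 31 (most significant) down to byte 0; no reversal pass needed.
    for (int i = 31; i >= 0; --i) {
        unsigned byte = static_cast<unsigned>((data >> (i * 8)) & 0xFF);
        std::cout << std::hex << std::setfill('0') << std::setw(2) << byte << ' ';
    }
    std::cout << '\n';
}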
I'm hesitant to recommend a specific solution because I don't know what will compile efficiently. But you might want something like this instead of byte-reversing ahead of time:
for (int c = 0; c < 4; c++) { // outer loop over the 4 64-bit chunks
    uint64_t chunk = (uint64_t)(data >> (64*3)); // grab the highest 64-bit chunk
    data <<= 64; // and shift everything up
    // alternative: maybe keep a shift-count in a variable instead of modifying `data`
    // Then pick apart the chunk into its component bytes, in MSB-first order
    for (int i = 0; i < 8; i++) {
        unsigned tmp = (chunk >> 56) & 0xFF;
        // do something with it
        chunk <<= 8; // bring the next byte to the top
    }
}
In the inner loop, it can be more efficient to use a rotate instead of two shifts, bringing the high byte to the bottom (for the & 0xFF) at the same time as it shifts the lower bytes upward.
Best practices for circular shift (rotate) operations in C++
In the outer loop, I don't know if boost::multiprecision::number has any APIs for efficient indexing of chunks built in; if so, using that is probably more efficient.
I used nested loops because I assume data <<= 8 doesn't compile particularly efficiently, and neither would (data >> (256-8)) & 0xFF; but that's how you'd grab bytes from the top instead of the bottom.
Another option is the standard trick for converting numbers to strings: store characters into a buffer in descending order. A 256-bit (32-byte) number takes 64 hex digits, and you want another 32 bytes for the spaces between them.
For example:
// 97 = 32 * 2 + 32, plus 1 byte for an implicit-length C string terminator
// plus another 1 for an extra space
char buf[98];            // small enough to use automatic storage
char *outp = buf + 96;   // pointer to the end
*outp = 0;               // terminator
const char *hex_lut = "0123456789abcdef";

for (int i = 0; i < 32; i++) {
    uint8_t byte = uint8_t(data & 0xFF);
    // writing backwards, so store the low nibble first
    *--outp = hex_lut[byte & 0xF];
    *--outp = hex_lut[byte >> 4];
    *--outp = ' ';
    data >>= 8;
}
// outp points at an extra ' '
outp++;
// outp points at the first byte of a string like "12 ab cd"
stream << outp;
If you want to break that up into chunks to put a line break in there, you can do that too.
If you're interested in efficient conversion to hex for 8, 16 or 32 bytes of data at once, see How to convert a number to hex? for some x86 SIMD ways. The asm should port easily to C++ intrinsics. (You can use SIMD shuffles to handle putting bytes into MSB-first printing order after loading from little-endian integers.)
You could also use a SIMD shuffle to space-separate your pairs of hex digits before storing to memory like you apparently want here.
Bug in the code you added:
So I added this code before the loop above:
for(unsigned int i=0,data,_data;i<33;++i)
unsigned int i, data, _data declares new variables of type unsigned int that shadow the previous declarations of data and _data. That loop has zero effect on the data or _data outside its own scope. (It also contains UB, because it reads _data and data without initializing them.)
If those vars were actually still the u256 vars of the outer scope, I don't see an obvious problem other than efficiency, but maybe I'm missing the obvious too. I didn't look very hard, because using 64 256-bit shifts and 32 ORs seems like a horrible idea. It's possible it could optimize away completely, or into bswap byte-reverse instructions on ISAs that have them, but I doubt it, especially through the extra complication of the boost::multiprecision::number wrapper functions.

Converting 24 bit integer (2s complement) to 32 bit integer in C++

The dataFile.bin is a binary file with 6-byte records. The first 3 bytes of each record contain the latitude and the last 3 bytes contain the longitude. Each 24-bit value represents radians multiplied by 0x1FFFFF.
This is a task I've been working on. I haven't done C++ in years, so it's taking me way longer than I thought it would -_-. After googling around I saw this algorithm, which made sense to me.
int interpret24bitAsInt32(byte[] byteArray) {
    int newInt = (
        ((0xFF & byteArray[0]) << 16) |
        ((0xFF & byteArray[1]) << 8) |
        (0xFF & byteArray[2])
    );
    if ((newInt & 0x00800000) > 0) {
        newInt |= 0xFF000000;
    } else {
        newInt &= 0x00FFFFFF;
    }
    return newInt;
}
The problem is a syntax issue: I am restricted to working within the way the other guy programmed this. I am not understanding how I can store the CHAR "data" into an INT. Wouldn't it make more sense if "data" was an array? Since its receiving 24 integers of information stored into a BYTE.
double BinaryFile::from24bitToDouble(char *data) {
    int32_t iValue;
    // ****************************
    // Start code implementation
    // Task: Fill iValue with the 24bit integer located at data.
    // The first byte is the LSB.
    // ****************************
    //iValue +=
    // ****************************
    // End code implementation
    // ****************************
    return static_cast<double>(iValue) / FACTOR;
}

bool BinaryFile::readNext(DataRecord &record)
{
    const size_t RECORD_SIZE = 6;
    char buffer[RECORD_SIZE];
    m_ifs.read(buffer, RECORD_SIZE);
    if (m_ifs) {
        record.latitude = toDegrees(from24bitToDouble(&buffer[0]));
        record.longitude = toDegrees(from24bitToDouble(&buffer[3]));
        return true;
    }
    return false;
}

double BinaryFile::toDegrees(double radians) const
{
    static const double PI = 3.1415926535897932384626433832795;
    return radians * 180.0 / PI;
}
I appreciate any help or hints, even if you don't understand; a clue or hint will help me a lot. I just need to talk to someone.
I am not understanding how I can store the CHAR "data" into an INT.
Since char is a numeric type, there is no problem combining them into a single int.
Since its receiving 24 integers of information stored into a BYTE
It's 24 bits, not bytes, so there are only three integer values that need to be combined.
An easier way of producing the same result without using conditionals is as follows:
int interpret24bitAsInt32(byte[] byteArray) {
    return (
        (byteArray[0] << 24)
        | (byteArray[1] << 16)
        | (byteArray[2] << 8)
    ) >> 8;
}
The idea is to store the three bytes supplied as an input into the upper three bytes of the four-byte int, and then shift it down by one byte. This way the program would sign-extend your number automatically, avoiding conditional execution.
Note on portability: This code is not portable, because it assumes a 32-bit integer size. To make it portable, use <cstdint> types:
#include <array>
#include <cstdint>

int32_t interpret24bitAsInt32(const std::array<uint8_t, 3> byteArray) {
    return (
        (static_cast<int32_t>(byteArray[0]) << 24)
        | (static_cast<int32_t>(byteArray[1]) << 16)
        | (static_cast<int32_t>(byteArray[2]) << 8)
    ) >> 8;
}
It also assumes that the most significant byte of the 24-bit number is stored in the initial element of byteArray, then comes the middle element, and finally the least significant byte.
Note on sign extension: This code automatically takes care of sign extension by constructing the value in the upper three bytes and then shifting it to the right, as opposed to constructing the value in the lower three bytes right away. This additional shift operation ensures that C++ takes care of sign-extending the result for us.
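The asker's records are actually the other way around (the first byte is the LSB), but the same trick applies with the indices reversed. A minimal sketch (my illustration; the arithmetic right shift of a negative value is implementation-defined before C++20, though universally sign-extending in practice):

#include <cstdint>
#include <cstdio>

// The question's layout: data[0] is the least significant byte.
int32_t from24bitLsbFirst(const unsigned char* data)
{
    // Assemble the value in the upper three bytes of a 32-bit word,
    // then shift down so the 24-bit sign bit is extended automatically.
    uint32_t u = (uint32_t(data[2]) << 24)
               | (uint32_t(data[1]) << 16)
               | (uint32_t(data[0]) << 8);
    return int32_t(u) >> 8;
}

int main()
{
    const unsigned char neg[] = { 0xFF, 0xFF, 0xFF }; // -1 in 24-bit two's complement
    std::printf("%d\n", from24bitLsbFirst(neg));      // prints -1
    return 0;
}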
When an unsigned char is cast to an int, the higher-order bits are filled with 0's.
When a signed char is cast to an int, the sign bit is extended.
i.e.:
int x;
char y;
unsigned char z;

y = 0xFF;
z = 0xFF;
x = y; /* x will be 0xFFFFFFFF (assuming char is signed) */
x = z; /* x will be 0x000000FF */
So your algorithm uses 0xFF as a mask to remove C's sign extension, i.e.
0xFF == 0x000000FF
0xABCDEF10 & 0x000000FF == 0x00000010
Then it uses bit shifts and logical ANDs to put the bits in their proper place.
Lastly it checks the most significant bit ((newInt & 0x00800000) > 0) to decide whether to fill the highest byte with ones or zeros.
int32_t upperByte = ((int32_t) dataRx[0] << 24);
int32_t middleByte = ((int32_t) dataRx[1] << 16);
int32_t lowerByte = ((int32_t) dataRx[2] << 8);
int32_t ADCdata32 = (((int32_t) (upperByte | middleByte | lowerByte)) >> 8); // Right-shift of signed data maintains signed bit

FP number's exponent field is not what I expected, why?

I've been stumped on this one for days. I've written this program from a book called Write Great Code, Volume 1: Understanding the Machine, Chapter 4.
The project is to do floating point operations in C++. I plan to implement the other operations in C++ on my own; the book uses HLA (High Level Assembly) in the project for other operations like multiplication and division.
I wanted to display the exponent and other field values after they've been extracted from the FP number, for debugging. Yet I have a problem: when I look at these values in memory they are not what I think they should be. Key words: what I think. I believe I understand the IEEE FP format; it's fairly simple, and I understand all I've read so far in the book.
The big problem is that the Rexponent variable seems to be almost unpredictable; in this example with the given values it's 5. Why is that? By my guess it should be two, because the decimal point is two digits right of the implied one.
I've put the actual values that the program produces into comments in the code, so you don't have to run the program to get a sense of what's happening (at least in the important parts).
It is unfinished at this point; the entire project has not been recreated on my computer yet.
Here is the code (quoted from the file, which I copied from the book and then modified):
#include<iostream>

typedef long unsigned real; // typedef long unsigned int as "real" so we don't confuse it with other datatypes

using namespace std; // Just so I don't have to type out std::cout any more!

#define asreal(x) (*((float *) &x)) // Treat the bits of x as a float, so the compiler doesn't convert our FP values

inline int extractExponent(real from) {
    return ((from >> 23) & 0xFF) - 127; // Shift right 23 bits, mask with eight ones (0xFF == 1111_1111), and remove the bias by subtracting 127
}

void fpadd(real left, real right, real *dest) {
    // LEFT operand field containers
    long unsigned int Lexponent = 0;
    long unsigned Lmantissa = 0;
    int Lsign = 0;
    // RIGHT operand field containers
    long unsigned int Rexponent = 0;
    long unsigned Rmantissa = 0;
    int Rsign = 0;
    // Resulting operand field containers
    long int Dexponent = 0;
    long unsigned Dmantissa = 0;
    int Dsign = 0;

    std::cout << "Size of datatype: long unsigned int is: " << sizeof(long unsigned int); // For debugging

    // Properly initialize the above variables:
    // Left
    Lexponent = extractExponent(left);  // Zero. This value is NOT a flat zero when displayed because we subtract 127 from the exponent after extracting it! // Value is: 0xffffff81
    Lmantissa = extractMantissa(left);  // Zero. We don't do anything to this number except add a whole number one to it. // Value is: 0x00000000
    Lsign = extractSign(left);          // Simple.
    // Right
    Rexponent = extractExponent(right); // Value is: 0x00000005 <-- why???
    Rmantissa = extractMantissa(right);
    Rsign = extractSign(right);
}

int main(int argc, char *argv[]) {
    real a, b, c;
    asreal(a) = -0.0;
    asreal(b) = 45.67;
    fpadd(a, b, &c);
    printf("Sum of A and B is: %f", c);
    std::cin >> a;
    return 0;
}
Help would be much appreciated; I'm several days into this project and very frustrated!
in this example with the given values it's 5. Why is that?
The floating point number 45.67 is internally represented as
2^5 * 1.0110110101011100001010001111010111000010100011110110
which actually represents the number
45.6700000000000017053025658242404460906982421875
This is as close as you can get to 45.67 in binary floating point (the digits shown are for double precision).
If all you are interested in is the exponent of a number, simply compute its base 2 logarithm and round down. Since 45.67 is between 32 (2^5) and 64 (2^6), the exponent is 5.
Computers use binary representation for all numbers. Hence, the exponent is for base two, not base ten. int(log2(45.67)) = 5.
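To see both views agree, here is a small self-contained check (a sketch; it copies the float's bits out with memcpy rather than using the question's pointer-cast macro, which violates strict aliasing):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    float f = 45.67f;
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);             // copy the object representation
    int exponent = (int)((bits >> 23) & 0xFF) - 127; // unbias the 8-bit exponent field
    std::printf("field: %d, log2: %d\n",
                exponent, (int)std::floor(std::log2(45.67))); // both print 5
    return 0;
}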

bit pattern matching and replacing

I've come across a very tricky problem with bit manipulation.
As far as I know, the smallest unit a variable can hold is one byte of 8 bits, and the bit operations available in C/C++ operate on whole bytes at a time.
Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file is 10010001 (8 bits), stored in a char variable, part of it matches the 6-bit pattern and should therefore be replaced by the 5-bit signal, giving a result of 1000001 (7 bits).
I can use a mask to manipulate the bits within a byte and set the leftmost bits to 10000 (5 bits), but the rightmost 3 bits become very tricky to manipulate. I cannot shift the rightmost 3 bits of the original data to get the correct result 1000001 (7 bits) followed by 1 padding bit in that char variable, to be filled by the 1st bit of the next byte of input.
I wonder if C/C++ can actually do this sort of replacement of bit patterns whose length does not fit into a char (1 byte) variable or even an int (4 bytes). Can C/C++ do the trick, or do we have to go to assembly language to deal with single-bit manipulation?
I heard that Power Basic may be able to do bit-by-bit manipulation better than C/C++.
If time and space are not important, then you can convert the bits to a string representation and perform replacements on the string, then convert back when needed. Not an elegant solution, but one that works.
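A minimal sketch of that idea using the question's own example byte (my illustration; binary literals require C++14):

#include <bitset>
#include <iostream>
#include <string>

int main()
{
    // 6-bit pattern 100100 -> 5-bit signal 10000, done on the string form.
    std::string bits = std::bitset<8>(0b10010001u).to_string(); // "10010001"
    std::size_t pos = bits.find("100100");
    if (pos != std::string::npos)
        bits.replace(pos, 6, "10000");
    std::cout << bits << '\n'; // "1000001" (7 bits)
    return 0;
}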
<< shift left
>> shift right
^ XOR
~ one's complement
Using these operations, you can easily isolate the pieces you are interested in and compare them as integers.
Say you have the byte 01000100 and you want to check whether it contains 1000:
unsigned char k = 68; // 0b01000100
unsigned char c = 8;  // 0b1000, the 4-bit pattern to find
for (int i = 0; i <= 8 - 4; i++) {
    if (((k >> (8 - 4 - i)) & 0x0F) == c) {
        // pattern found at bit offset i (counting from the MSB)
        break;
    }
}
This is very sketchy code, just meant to be a demonstration.
I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes).
What about std::bitset?
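For instance, a minimal sketch of scanning for the question's 6-bit pattern with std::bitset (my illustration; binary literals require C++14):

#include <bitset>
#include <cstddef>
#include <iostream>

int main()
{
    std::bitset<8> input(0b10010001u); // the example byte from the question
    const unsigned pattern = 0b100100; // 6-bit pattern to find
    const unsigned patlen = 6;
    for (std::size_t pos = 0; pos + patlen <= input.size(); ++pos) {
        unsigned window = 0;
        for (unsigned i = 0; i < patlen; ++i) // build an MSB-first window
            window = (window << 1) | input[input.size() - 1 - pos - i];
        if (window == pattern)
            std::cout << "match at bit offset " << pos << '\n';
    }
    return 0;
}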
Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.
#include <iostream>
#include <sstream>
#include <cassert>

class BitReader {
public:
    typedef unsigned char BitBuffer;

    BitReader(std::istream &input) :
        input(input), buffer(0), bufferedBits(8) {
    }

    BitBuffer peekBits(int numBits) {
        assert(numBits <= 8);
        assert(numBits > 0);
        skipBits(0); // Make sure we have a non-empty buffer
        return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
    }

    void skipBits(int numBits) {
        assert(numBits >= 0);
        numBits += bufferedBits;
        while (numBits > 8) {
            buffer = input.get();
            numBits -= 8;
        }
        bufferedBits = numBits;
    }

    BitBuffer readBits(int numBits) {
        assert(numBits <= 8);
        assert(numBits > 0);
        BitBuffer ret = peekBits(numBits);
        skipBits(numBits);
        return ret;
    }

    bool eof() const {
        return input.eof();
    }

private:
    std::istream &input;
    BitBuffer buffer;
    int bufferedBits; // How many bits of 'buffer' have been consumed (8 = all of them, i.e. empty)
};
Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.
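A sketch of the loading step, assuming MSB-first bit order within each byte (the search itself could then use std::search on the vector):

#include <istream>
#include <vector>

// Pull a stream apart into single bits, most significant bit of each byte first.
std::vector<bool> readBits(std::istream& in)
{
    std::vector<bool> bits;
    char ch;
    while (in.get(ch))               // one byte at a time
        for (int i = 7; i >= 0; --i) // MSB first
            bits.push_back(((unsigned char)ch >> i) & 1);
    return bits;
}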
If I understood your question correctly, you have an input stream and an output stream, and you want to replace the 6 bits of the input with 5 bits in the output - and your output should still be a bit stream?
So, the most important programmer's rule can be applied: divide et impera!
You should split your component into three parts:
1. Input stream converter: convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly, your input "commands" are 8 bits long, so there is nothing special about this step.
2. Do the replacement on the ring buffer, replacing every matching 6-bit pattern with the 5-bit one, but "pad" the 5 bits with a leading zero so the total length is still 8 bits.
3. Write an output handler that reads from the ring buffer and writes only the 7 LSBs of each input byte to the output stream. Of course some bit manipulation is necessary again for this.
If your ring buffer size is divisible by both 8 and 7 (i.e. is a multiple of 56), you will have a clean buffer at the end and can start again with step 1.
The simplest way to implement this is to iterate over these 3 steps as long as input data is available.
If performance really matters and you are running on a multi-core CPU, you could even split the steps across 3 threads, but then you must carefully synchronize access to the ring buffer.
I think the following does what you want.
#include <stdint.h>

#define PATTERN_LEN 6
#define PATTERNMASK 0x3F // 6 bits
#define PATTERN     0x24 // b100100
#define REPLACE_LEN 5
#define REPLACEMENT 0x10 // b10000

void compress(uint8_t* inbits, uint8_t* outbits, int len)
{
    uint16_t accumulator = 0;
    int nbits = 0;
    uint8_t candidate;

    while (len--) // for all input bytes
    {
        // for each bit (msb first)
        for (int i = 7; i >= 0; i--)
        {
            // add 1 bit to the accumulator
            accumulator <<= 1;
            accumulator |= (*inbits >> i) & 1;
            nbits++;
            // check for the pattern
            candidate = accumulator & PATTERNMASK;
            if (nbits >= PATTERN_LEN && candidate == PATTERN)
            {
                // remove the pattern
                accumulator >>= PATTERN_LEN;
                // add the replacement
                accumulator <<= REPLACE_LEN;
                accumulator |= REPLACEMENT;
                nbits += (REPLACE_LEN - PATTERN_LEN);
            }
        }
        inbits++;
        // move the accumulator to the output to prevent overflow
        while (nbits > 8)
        {
            // copy the highest 8 bits
            nbits -= 8;
            *outbits++ = (accumulator >> nbits) & 0xFF;
            // clear them from the accumulator
            accumulator &= ~(0xFF << nbits);
        }
    }
    // copy the remainder of the accumulator to the output, zero-padded
    if (nbits > 0)
    {
        *outbits++ = (accumulator << (8 - nbits)) & 0xFF;
    }
}
You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacement, to ensure the replacement pattern is not re-checked for matches.
#include <iostream>
#include <cstring>

size_t matchCount(const char* str, size_t size, char pat, size_t bsize) noexcept
{
    if (bsize > 8) {
        return 0;
    }
    size_t bcount = 0; // curr bit number
    size_t pcount = 0; // curr bit in pattern char
    size_t totalm = 0; // total number of patterns matched
    const size_t limit = size * 8;
    while (bcount < limit)
    {
        auto offset = bcount % 8;
        char c = str[bcount / 8];
        c >>= offset;
        char tpat = pat >> pcount;
        if ((c & 1) == (tpat & 1))
        {
            ++pcount;
            if (pcount == bsize)
            {
                ++totalm;
                pcount = 0;
            }
        }
        else // mismatch
        {
            bcount -= pcount; // backtrack
            // reset
            pcount = 0;
        }
        ++bcount;
    }
    return totalm;
}

int main(int argc, char** argv)
{
    const char* str = "abcdefghiibcdiixyz";
    char pat = 'i';
    std::cout << "Num matches = " << matchCount(str, 18, pat, 7) << std::endl;
    return 0;
}

C/C++: Bitwise operators on dynamically allocated memory

In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes >>= 2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient: you need to specify the order of the bits relative to the order of the bytes. When the least significant bit falls out of one byte, does it go to the next higher or the next lower byte?
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (ie matching the normal writing conventions for both) a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There are obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time), but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check if (orig_l << 7) is non-zero. If your bytes are in an std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all else fails, roll your own class
Rough example:
#include <climits>
#include <cstddef>

typedef unsigned char byte;

// extract `bitcount` bits of `value`, starting at `startbit`
// (startbit 0 = the most significant bit)
byte extract(byte value, int startbit, int bitcount)
{
    byte result;
    result = (byte)(value << startbit);
    result = (byte)(result >> (CHAR_BIT - bitcount));
    return result;
}

// shift an nbytes-long buffer right by n bits (assumes 0 < n < CHAR_BIT)
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
    byte rollover = 0;
    for (size_t i = 0; i < nbytes; ++i) {
        // save the n bits that fall off this byte before overwriting it
        byte next_rollover = extract(bytes[i], CHAR_BIT - n, n);
        bytes[i] = (byte)((bytes[i] >> n) | (rollover << (CHAR_BIT - n)));
        rollover = next_rollover;
    }
    return &bytes[0];
}
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather than hardcode them.
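A sketch of that generalization (my illustration; assumes 0 < shift < 8 and the MSB-first byte order used in the question):

#include <cstddef>
#include <cstdio>

void shift_right(unsigned char* bytes, std::size_t n, unsigned shift)
{
    unsigned char rollover = 0;
    const unsigned char mask = (unsigned char)((1u << shift) - 1); // generated, not hardcoded
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char next = bytes[i] & mask; // bits that fall into the next byte
        bytes[i] = (unsigned char)((bytes[i] >> shift) | (rollover << (8 - shift)));
        rollover = next;
    }
}

int main()
{
    unsigned char bytes[3] = { 1, 1, 1 };
    shift_right(bytes, 3, 2);
    std::printf("%d %d %d\n", bytes[0], bytes[1], bytes[2]); // 0 64 64
    return 0;
}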
I'd look into something similar to this:
#define number_of_bytes 3

template<size_t num_bytes>
union MyUnion
{
    char bytes[num_bytes];
    __int64 ints[num_bytes / sizeof(__int64) + 1]; // __int64 is MSVC-specific
};

int main()
{
    MyUnion<number_of_bytes> mu;
    mu.bytes[0] = 1;
    mu.bytes[1] = 1;
    mu.bytes[2] = 1;
    mu.ints[0] >>= 2;
}
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.
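For instance, a sketch of such a wrapper (ByteArrayRef is a hypothetical name, and it assumes ShiftBytes is implemented as declared above):

#include <cstddef>

unsigned char* ShiftBytes(unsigned char* bytes, size_t count_of_bytes, int shift);

// Thin wrapper so the call site can read like built-in shift syntax.
class ByteArrayRef
{
public:
    ByteArrayRef(unsigned char* bytes, size_t count)
        : bytes_(bytes), count_(count) {}
    ByteArrayRef& operator>>=(int shift)
    {
        ShiftBytes(bytes_, count_, shift);
        return *this;
    }
private:
    unsigned char* bytes_;
    size_t count_;
};

// Usage: ByteArrayRef bits(bytes, 3); bits >>= 2;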