C++ Compressing eight booleans into a character - c++

I have a large mass of integers that I'm reading from a file. All of them will be either 0 or 1, so I have converted each read integer to a boolean.
What I need to do is take advantage of the space (8 bits) that a character provides by packing every 8 bits/booleans into a single character. How can I do this?
I have experimented with binary operations, but I'm not coming up with what I want.
int count = 7;
unsigned char compressedValue = 0x00;
while(/*Not end of file*/)
{
...
compressedValue |= booleanValue << count;
count--;
if (count == 0)
{
count = 7;
//write char to stream
compressedValue &= 0;
}
}
Update
I have updated the code to reflect some corrections suggested so far. My next question is, how should I initialize/clear the unsigned char?
Update
Reflected the changes to clear the character bits.
Thanks for the help, everyone.

Several notes:
while(!in.eof()) is wrong, you have to first try(!) to read something and if that succeeded, you can use the data.
Use an unsigned char to get an integer of at least eight bits. Alternatively, look into stdint.h and use uint8_t (or uint_least8_t).
The shift operation is in the wrong direction, use uint8_t(1) << count instead.
If you want to do something like that in memory, I'd use a bigger type, like 32 or 64 bits, because reading a byte is still a single RAM access even if much more than a byte could be read at once.
After writing a byte, don't forget to zero the temporary.

As Mooing Duck suggested, you can use a bitset.
The source code is only a proof of concept - especially the file-read has to be implemented.
#include <bitset>
#include <cstdint>
#include <iostream>
int main() {
char const fcontent[56] { "\0\001\0\001\0\001\0\001\0\001"
"\001\001\001\001\001\001\001\001\001\001\001\001"
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
"\0\001\0\001\0\001\0\001" };
for( int i { 0 }; i < 56; i += 8 ) {
std::bitset<8> const bs(fcontent+i, 8, '\0', '\001');
std::cout << bs.to_ulong() << " ";
}
std::cout << std::endl;
return 0;
}
Output:
85 127 252 0 0 1 84

The standard guaranties that vector<bool> is packed the way you want. Don't reinvent the wheel.more info here

Related

How to convert boost multiprecision integers into big endian from little endian?

I am trying to fix this part of an abandonware program because I failed to find an alternative program.
As you can see the data of PUSH instructions are in the wrong order whereas Ethereum is a big endian machine (address are correctly represented because they use a smaller type).
An alternative is to run porosity.exe --code '0x61004b60026319e44e32' --disassm
Theu256 type is defined as
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
Here’s a minimal example to reproduce the bug:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
int main() {
std::stringstream stream;
u256 data=0xFEDEFA;
for (int i = 0; i<5; ++i) { // print only the first 5 digits
uint8_t dataByte = int(data & 0xFF);
data >>= 8;
stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
}
std::cout << stream.str();
}
So numbers are converted to string with a space between each byte (and only the first bytes).
But then I ran into an endianness problem: bytes were printed in the reverse order. I mean for example, 31722 is written 8a 02 02 on my machine and 02 02 8a when compiled for a big endian target.
So as I don’t which boost function to call, I modified the code:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
int main() {
std::stringstream stream;
u256 data=0xFEDEFA;
for (int i = 0; i<5; ++i) {
uint8_t dataByte = int(data >> ((32 - i - 1) * 8));
stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
}
std::cout << stream.str();
}
Now, why are my 256 bits integers printed mostly as series of 00 00 00 00 00?
BTW, this is not an endianness issue; you aren't doing byte accesses to the object-representation. You're operating on it as a 256-bit integer and simply asking for the low 8 bits at a time with data & 0xFF.
If you did know the endianness of the target C implementation, and the data layout of the boost object, you could efficiently loop over it in descending address order with unsigned char*.
You're introducing the idea of endianness only because it's associated with byte-reversal, which is what you're trying to do. But that's really inefficient, just loop over the bytes of your bigint the other way.
I'm hesitant to recommend a specific solution because I don't know what will compile efficiently. But you might want something like this instead of byte-reversing ahead of time:
for (outer loop) {
uint64_t chunk = data >> (64*3); // grab the highest 64-bit chunk
data <<= 64; // and shift everything up
// alternative: maybe keep a shift-count in a variable instead of modifying `data`
// Then pick apart the chunk into its component bytes, in MSB first order
for (int = 0 ; i<8 ; i++) {
unsigned tmp = (chunk >> 56) & 0xFF;
// do something with it
chunk <<= 8; // bring the next byte to the top
}
}
In the inner loop, more efficient than using two shifts can be using a rotate to bring the high byte to the bottom (for & 0xFF) at the same time as shifting lower bytes upward.
Best practices for circular shift (rotate) operations in C++
In the outer loop, IDK if boost::multiprecision::number has any APIs for efficient indexing of chunks built in; if so using that is probably more efficient.
I used nested loops because I assume data <<= 8 doesn't compile particularly efficiently, and neither would (data >> (256-8)) & 0xFF. But that's how you'd grab bytes from the top instead of the bottom.
Another option is the standard trick for converting numbers to strings: store characters into a buffer in descending order. A 256-bit (32-byte) number will take 64 hex digits, and you want another 32 bytes of spaces between them.
For example:
// 97 = 32 * 2 + 32, plus 1 byte for an implicit-length C string terminator
// plus another 1 for an extra space
char buf[98]; // small enough to use automatic storage
char *outp = buf+96; // pointer to the end
*outp = 0; // terminator
const char *hex_lut = "0123456789abcdef";
for (int i=0 ; i<32 ; i++) {
uint8_t byte = data & 0xFF;
*--outp = hex_lut[byte >> 4];
*--outp = hex_lut[byte & 0xF];
*--outp = ' ';
data >>= 8;
}
// outp points at an extra ' '
outp++;
// outp points at the first byte of a string like "12 ab cd"
stream << outp;
If you want to break that up into chunks to put a line break in there, you can do that too.
If you're interested in efficient conversion to hex for 8, 16 or 32 bytes of data at once, see How to convert a number to hex? for some x86 SIMD ways. The asm should port easily to C++ intrinsics. (You can use SIMD shuffles to handle putting bytes into MSB-first printing order after loading from little-endian integers.)
You could also use a SIMD shuffle to space-separate your pairs of hex digits before storing to memory like you apparently want here.
Bug in the code you added:
So I added this code before the loop above:
for(unsigned int i=0,data,_data;i<33;++i)
unsigned i, data, _data declares new variables of type unsigned int that shadow the previous declarations of data and _data. That loop has zero effect on data or _data outside the scope of the loop. (And contains UB because you read _data and data without initializing them.)
If those vars are actually both still the u256 vars of the outer scope, I don't see an obvious problem other than efficiency, but maybe I'm missing the obvious too. I didn't look very hard because using 64x 256-bit shifts and 32x ORs seems like a horrible idea. It's possible it could optimize away completely, or into bswap byte-reverse instructions on ISAs that have them, but I doubt it. Especially not through the extra complication of the boost::multiprecision::number wrapper functions.

C hack for storing a bit that takes 1 bit space?

I have a long list of numbers between 0 and 67600. Now I want to store them using an array that is 67600 elements long. An element is set to 1 if a number was in the set and it is set to 0 if the number is not in the set. ie. each time I need only 1bit information for storing the presence of a number. Is there any hack in C/C++ that helps me achieve this?
In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is its a good idea to use unsigned integers when you are doing bit operations. << and >> are undefined when shifting negative integers. You will need to allocate arrays of some integral type like uint32_t. If you want to store N bits, it will take N/32 of these uint32_ts. Bit i is stored in the i % 32'th bit of the i / 32'th uint32_t. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this.) This kind of thing has been done to death and there are "good" solutions.
There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)
You should use std::bitset.
std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>, which I don't recommend because:
It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.
There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the doc, it stores it as efficiently as possible.
Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset
If you want to write it in C, have an array of char that is 67601 bits in length (67601/8 = 8451) and then turn on/off the appropriate bit for each value.
Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the size of a byte), and would be useful if you want to store a huge number of integers that will range from 0-3 or 0-15.
When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.
Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = malloc(sizeof(ret.buckets)*numbuckets);
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}

Parsing a binary message in C++. Any lib with examples?

I am looking for any library of example parsing a binary msg in C++. Most people asks for reading a binary file, or data received in a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.
As an example:
9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440
where first 8 bits are a preamble, next 6 bits the msg ID (an integer from 0 to 63), next 212 bits are data, and final 24 bits are a CRC24.
So in this case, msg 26, I have to get this data from the 212 data bits:
4 bits integer value
4 bits integer value
A 9 bit float value from 0 to 63.875, where LSB is 0.125
4 bits integer value
EDIT: I need to operate at bit level, so a memcpy is not a good solution, since it copies a number of bytes. To get first 4-bit integer value I should get 2 bits from a byte, and another 2 bits from the next byte, shift each pair and compose. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and wanted to reach a common solution to parse them at bit level.
And so on.
Do you know os any library which can easily achieve this?
I also found other Q/A where static_cast is being used. I googled about it, and for each person recommending this approach, there is another one warning about endians. Since I already have my message, I don't know if such a warning applies to me, or is just for socket communications.
EDIT: boost:dynamic_bitset looks promising. Any help using it?
If you can't find a generic library to parse your data, use bitfields to get the data and memcpy() it into an variable of the struct. See the link Bitfields. This will be more streamlined towards your application.
Don't forget to pack the structure.
Example:
#pragma pack
include "order32.h"
struct yourfields{
#if O32_HOST_ORDER == O32_BIG_ENDIAN
unsigned int preamble:8;
unsigned int msgid:6;
unsigned data:212;
unsigned crc:24;
#else
unsigned crc:24;
unsigned data:212;
unsigned int msgid:6;
unsigned int preamble:8;
#endif
}/*__attribute__((packed)) for gcc*/;
You can do a little compile time check to assert if your machine uses LITTLE ENDIAN or BIG ENDIAN format. After that define it into a PREPROCESSOR SYMBOL::
//order32.h
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
O32_LITTLE_ENDIAN = 0x03020100ul,
O32_BIG_ENDIAN = 0x00010203ul,
O32_PDP_ENDIAN = 0x01000302ul
};
static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
{ { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)
#endif
Thanks to code by Christoph # here
Example program for using bitfields and their outputs:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <memory.h>
using namespace std;
struct bitfields{
unsigned opcode:5;
unsigned info:3;
}__attribute__((packed));
struct bitfields opcodes;
/* info: 3bits; opcode: 5bits;*/
/* 001 10001 => 0x31*/
/* 010 10010 => 0x52*/
void set_data(unsigned char data)
{
memcpy(&opcodes,&data,sizeof(data));
}
void print_data()
{
cout << opcodes.opcode << ' ' << opcodes.info << endl;
}
int main(int argc, char *argv[])
{
set_data(0x31);
print_data(); //must print 17 1 on my little-endian machine
set_data(0x52);
print_data(); //must print 18 2
cout << sizeof(opcodes); //must print 1
return 0;
}
You can manipulate bits for your own, for example to parse 4 bit integer value do:
char[64] byte_data;
size_t readPos = 3; //any byte
int value = 0;
int bits_to_read = 4;
for (size_t i = 0; i < bits_to_read; ++i) {
value |= static_cast<unsigned char>(_data[readPos]) & ( 255 >> (7-i) );
}
Floats usually sent as string data:
std::string temp;
temp.assign(_data+readPos, 9);
flaot value = std::stof(temp);
If your data contains custom float format then just extract bits and do your math:
char[64] byte_data;
size_t readPos = 3; //any byte
float value = 0;
int i = 0;
int bits_to_read = 9;
while (bits_to_read) {
if (i > 8) {
++readPos;
i = 0;
}
const int bit = static_cast<unsigned char>(_data[readPos]) & ( 255 >> (7-i) );
//here your code
++i;
--bits_to_read;
}
Here is a good article that describes several solutions to the problem.
It even contains the reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here - it might be what you need, though it's not popular and it's under GPL.
Anyway, the boost::dynamic_bitset might be the best choice as it's time-tested and community-proven. But I have no personal experience with it.

bit pattern matching and replacing

I come across a very tricky problem with bit manipulation.
As far as I know, the smallest variable size to hold a value is one byte of 8 bits. The bit operations available in C/C++ apply to an entire unit of bytes.
Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file is 10010001 (8 bits) being stored in a char variable, part of it matches the 6 bit pattern and therefore be replaced by the 5 bit signal to give a result of 1000001 (7 bits).
I can use a mask to manipulate the bits within a byte to get a result of the left most bits to 10000 (5 bit) but the right most 3 bits become very tricky to manipulate. I cannot shift the right most 3 bits of the original data to get the correct result 1000001 (7 bit) followed by 1 padding bit in that char variable that should be filled by the 1st bit of next followed byte of input.
I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes). Can C/C++ do the trick or we have to go for other assembly languages that deal with single bits manipulations?
I heard that Power Basic may be able to do the bit-by-bit manipulation better than C/C++.
If time and space are not important then you can convert the bits to a string representation and perform replaces on the string, then convert back when needed. Not an elegant solution but one that works.
<< shiftleft
^ XOR
>> shift right
~ one's complement
Using these operations, you could easily isolate the pieces that you are interested in and compare them as integers.
say the byte 001000100 and you want to check if it contains 1000:
char k = (char)68;
char c = (char)8;
int i = 0;
while(i<5){
if((k<<i)>>(8-3-i) == c){
//do stuff
break;
}
}
This is very sketchy code, just meant to be a demonstration.
I wonder if C/C++ can actually do this
sort of replacement of bit patterns of
length that do not fit into a Char (1
byte) variable or even Int (4 bytes).
What about std::bitset?
Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.
#include <iostream>
#include <sstream>
#include <cassert>
class BitReader {
public:
typedef unsigned char BitBuffer;
BitReader(std::istream &input) :
input(input), bufferedBits(8) {
}
BitBuffer peekBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
skipBits(0); // Make sure we have a non-empty buffer
return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
}
void skipBits(int numBits) {
assert(numBits >= 0);
numBits += bufferedBits;
while (numBits > 8) {
buffer = input.get();
numBits -= 8;
}
bufferedBits = numBits;
}
BitBuffer readBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
BitBuffer ret = peekBits(numBits);
skipBits(numBits);
return ret;
}
bool eof() const {
return input.eof();
}
private:
std::istream &input;
BitBuffer buffer;
int bufferedBits; // How many bits are buffered into 'buffer' (0 = empty)
};
Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.
If I understood your questions correctly, you have an input stream and and output stream and you want to replace the 6bits of the input with 5 in the output - and your output still should be a bit stream?
So, the most important programmer's rule can be applied: Divide et impera!
You should split your component in three parts:
Input Stream converter: Convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly your input "commands" are 8bit long, so there is nothing special about this.
Do the replacement on the ring buffer in a way that you replace every matching 6-bit pattern with the 5bit one, but "pad" the 5 bit with a leading zero, so the total length is still 8bit.
Write an output handler that reads from the ring buffer and let this output handler write only the 7 LSB to the output stream from each input byte. Of course some bit manipulation is necessary again for this.
If your ring buffer size can be divided by 8 and 7 (= is a multiple of 56) you will have a clean buffer at the end and can start again with 1.
The most simplest way to implement this is to iterate over this 3 steps as long as input data is available.
If a performance really matters and you are running on a multi-core CPU you even could split the steps and 3 threads, but then you must carefully synchronize the access to the ring buffer.
I think the following does what you want.
PATTERN_LEN = 6
PATTERNMASK = 0x3F //6 bits
PATTERN = 0x24 //b100100
REPLACE_LEN = 5
REPLACEMENT = 0x10 //b10000
void compress(uint8* inbits, uint8* outbits, int len)
{
uint16 accumulator=0;
int nbits=0;
uint8 candidate;
while (len--) //for all input bytes
{
//for each bit (msb first)
for (i=7;i<=0;i--)
{
//add 1 bit to accumulator
accumulator<<=1;
accumulator|=(*inbits&(1<<i));
nbits++;
//check for pattern
candidate = accumulator&PATTERNMASK;
if (candidate==PATTERN)
{
//remove pattern
accumulator>>=PATTERN_LEN;
//add replacement
accumulator<<=REPLACE_LEN;
accumulator|=REPLACMENT;
nbits+= (REPLACE_LEN - PATTERN_LEN);
}
}
inbits++;
//move accumulator to output to prevent overflow
while (nbits>8)
{
//copy the highest 8 bits
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
//clear them from accumulator
accumulator&= ~(0xFF<<nbits);
}
}
//copy remainder of accumulator to output
while (nbits>0)
{
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
accumulator&= ~(0xFF<<nbits);
}
}
You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacment to ensure the replacement pattern is not re-checked for matches.
#include <iostream>
#include <cstring>
size_t matchCount(const char* str, size_t size, char pat, size_t bsize) noexcept
{
if (bsize > 8) {
return 0;
}
size_t bcount = 0; // curr bit number
size_t pcount = 0; // curr bit in pattern char
size_t totalm = 0; // total number of patterns matched
const size_t limit = size*8;
while (bcount < limit)
{
auto offset = bcount%8;
char c = str[bcount/8];
c >>= offset;
char tpat = pat >> pcount;
if ((c & 1) == (tpat & 1))
{
++pcount;
if (pcount == bsize)
{
++totalm;
pcount = 0;
}
}
else // mismatch
{
bcount -= pcount; // backtrack
//reset
pcount = 0;
}
++bcount;
}
return totalm;
}
int main(int argc, char** argv)
{
const char* str = "abcdefghiibcdiixyz";
char pat = 'i';
std::cout << "Num matches = " << matchCount(str, 18, pat, 7) << std::endl;
return 0;
}

C/C++: Bitwise operators on dynamically allocated memory

In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient. You need to specify the order of the bits relative to the order of the bytes - when the least significant bit falls out of one byte, does to go to the next higher or next lower byte.
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (ie matching the normal writing conventions for both) a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There's obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time) but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check if (orig_l << 7) is non-zero. If your bytes are in an std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
typedef unsigned char byte;
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << (startbit - 1));
result = (byte)(result >> (CHAR_BITS - bitcount));
return result;
}
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
byte rollover = 0;
for (int i = 0; i < nbytes; ++i) {
bytes[ i ] = (bytes[ i ] >> n) | (rollover < n);
byte rollover = extract(bytes[ i ], 0, n);
}
return &bytes[ 0 ];
}
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather then hardcode them.
I'd look into something similar to this:
#define number_of_bytes 3
template<size_t num_bytes>
union MyUnion
{
char bytes[num_bytes];
__int64 ints[num_bytes / sizeof(__int64) + 1];
};
void main()
{
MyUnion<number_of_bytes> mu;
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
mu.ints[0] >>= 2;
}
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.