bit pattern matching and replacing - c++

I come across a very tricky problem with bit manipulation.
As far as I know, the smallest variable size to hold a value is one byte of 8 bits. The bit operations available in C/C++ apply to an entire unit of bytes.
Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file is 10010001 (8 bits) being stored in a char variable, part of it matches the 6 bit pattern and therefore be replaced by the 5 bit signal to give a result of 1000001 (7 bits).
I can use a mask to manipulate the bits within a byte to get a result of the left most bits to 10000 (5 bit) but the right most 3 bits become very tricky to manipulate. I cannot shift the right most 3 bits of the original data to get the correct result 1000001 (7 bit) followed by 1 padding bit in that char variable that should be filled by the 1st bit of next followed byte of input.
I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes). Can C/C++ do the trick or we have to go for other assembly languages that deal with single bits manipulations?
I heard that Power Basic may be able to do the bit-by-bit manipulation better than C/C++.

If time and space are not important then you can convert the bits to a string representation and perform replaces on the string, then convert back when needed. Not an elegant solution but one that works.

<< shiftleft
^ XOR
>> shift right
~ one's complement
Using these operations, you could easily isolate the pieces that you are interested in and compare them as integers.
say the byte 001000100 and you want to check if it contains 1000:
char k = (char)68;
char c = (char)8;
int i = 0;
while(i<5){
if((k<<i)>>(8-3-i) == c){
//do stuff
break;
}
}
This is very sketchy code, just meant to be a demonstration.

I wonder if C/C++ can actually do this
sort of replacement of bit patterns of
length that do not fit into a Char (1
byte) variable or even Int (4 bytes).
What about std::bitset?

Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.
#include <iostream>
#include <sstream>
#include <cassert>
class BitReader {
public:
typedef unsigned char BitBuffer;
BitReader(std::istream &input) :
input(input), bufferedBits(8) {
}
BitBuffer peekBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
skipBits(0); // Make sure we have a non-empty buffer
return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
}
void skipBits(int numBits) {
assert(numBits >= 0);
numBits += bufferedBits;
while (numBits > 8) {
buffer = input.get();
numBits -= 8;
}
bufferedBits = numBits;
}
BitBuffer readBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
BitBuffer ret = peekBits(numBits);
skipBits(numBits);
return ret;
}
bool eof() const {
return input.eof();
}
private:
std::istream &input;
BitBuffer buffer;
int bufferedBits; // How many bits are buffered into 'buffer' (0 = empty)
};

Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.

If I understood your questions correctly, you have an input stream and and output stream and you want to replace the 6bits of the input with 5 in the output - and your output still should be a bit stream?
So, the most important programmer's rule can be applied: Divide et impera!
You should split your component in three parts:
Input Stream converter: Convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly your input "commands" are 8bit long, so there is nothing special about this.
Do the replacement on the ring buffer in a way that you replace every matching 6-bit pattern with the 5bit one, but "pad" the 5 bit with a leading zero, so the total length is still 8bit.
Write an output handler that reads from the ring buffer and let this output handler write only the 7 LSB to the output stream from each input byte. Of course some bit manipulation is necessary again for this.
If your ring buffer size can be divided by 8 and 7 (= is a multiple of 56) you will have a clean buffer at the end and can start again with 1.
The most simplest way to implement this is to iterate over this 3 steps as long as input data is available.
If a performance really matters and you are running on a multi-core CPU you even could split the steps and 3 threads, but then you must carefully synchronize the access to the ring buffer.

I think the following does what you want.
PATTERN_LEN = 6
PATTERNMASK = 0x3F //6 bits
PATTERN = 0x24 //b100100
REPLACE_LEN = 5
REPLACEMENT = 0x10 //b10000
void compress(uint8* inbits, uint8* outbits, int len)
{
uint16 accumulator=0;
int nbits=0;
uint8 candidate;
while (len--) //for all input bytes
{
//for each bit (msb first)
for (i=7;i<=0;i--)
{
//add 1 bit to accumulator
accumulator<<=1;
accumulator|=(*inbits&(1<<i));
nbits++;
//check for pattern
candidate = accumulator&PATTERNMASK;
if (candidate==PATTERN)
{
//remove pattern
accumulator>>=PATTERN_LEN;
//add replacement
accumulator<<=REPLACE_LEN;
accumulator|=REPLACMENT;
nbits+= (REPLACE_LEN - PATTERN_LEN);
}
}
inbits++;
//move accumulator to output to prevent overflow
while (nbits>8)
{
//copy the highest 8 bits
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
//clear them from accumulator
accumulator&= ~(0xFF<<nbits);
}
}
//copy remainder of accumulator to output
while (nbits>0)
{
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
accumulator&= ~(0xFF<<nbits);
}
}
You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacment to ensure the replacement pattern is not re-checked for matches.

#include <iostream>
#include <cstring>
size_t matchCount(const char* str, size_t size, char pat, size_t bsize) noexcept
{
if (bsize > 8) {
return 0;
}
size_t bcount = 0; // curr bit number
size_t pcount = 0; // curr bit in pattern char
size_t totalm = 0; // total number of patterns matched
const size_t limit = size*8;
while (bcount < limit)
{
auto offset = bcount%8;
char c = str[bcount/8];
c >>= offset;
char tpat = pat >> pcount;
if ((c & 1) == (tpat & 1))
{
++pcount;
if (pcount == bsize)
{
++totalm;
pcount = 0;
}
}
else // mismatch
{
bcount -= pcount; // backtrack
//reset
pcount = 0;
}
++bcount;
}
return totalm;
}
int main(int argc, char** argv)
{
const char* str = "abcdefghiibcdiixyz";
char pat = 'i';
std::cout << "Num matches = " << matchCount(str, 18, pat, 7) << std::endl;
return 0;
}

Related

How to do memory implementation of Huffman Agorithm?

I have a Huffman code algorithm that compresses characters into sequences of bits of arbitrary length, smaller than the default size of a char (8 bits on most modern platforms)
If the Huffman Code compresses an 8-bit character into 3 bits, how do I represent that 3-bit value in memory? To take this further, how do I combine multiple compressed characters into a compressed representation?
For example consider l which is "00000" (5x8 bits since 0 is also character). How do I represent l with 00000 (5 bits) instead of a character sequence?
A C or C++ implementation is preferred.
Now that this question is re-opened...
To make a variable that holds a variable number of bits, we just use use the lower bits of one unsigned int to store the bits, and use another unsigned int to remember how many bits we have stored.
When writing out a Huffman-compressed file, we wait until we have at least 8 bits stored. Then we write out a char using the top 8 bits and subtract 8 from the stored bit count.
Finally, at the end if you have any bits left to write out, you round up to an even multiple of 8 and write chars.
In C++, it's useful to encapsulate the output in some kind of BitOutputStream class, like:
class BitOutputStream
{
std::ostream m_out;
unsigned m_bitsPending;
unsigned m_numPending;
public:
BitOutputStream(const char *fileName)
:m_out(... /* you can do this part */)
{
m_bitsPending = 0;
m_numPending = 0;
}
// write out the lower <count> bits of <bits>
void write(unsigned bits, unsigned count)
{
if (count > 16)
{
//do it in two steps to prevent overflow
write(bits>>16, count-16);
count=16;
}
//make space for new bits
m_numPending += count;
m_bitsPending <<= count;
//store new bits
m_bitsPending |= (bits & ((1<<count)-1));
//write out any complete bytes
while(m_numPending >= 8)
{
m_numPending-=8;
m_out.put((char)(m_bitsPending >> m_numPending));
}
}
//write out any remaining bits
void flush()
{
if (m_numPending > 0)
{
m_out.put((char)(m_bitsPending << (8-m_numPending)));
}
m_bitsPending = m_numPending = 0;
m_out.flush();
}
}
If your Huffman coder returns an array of 1s and 0s representing the bits that should and should not be set in the output, you can shift these bits onto an unsigned char. Every eight shifts, you start writing to the next character, ultimately outputting an array of unsigned char. The number of these compressed characters that you will output is equal to the number of bits divided by eight, rounded up to the nearest natural number.
In C, this is a relatively simple function, consisting of a left shift (<<) and a bitwise OR (|). Here is the function, with an example to make it runnable. To see it with more extensive comments, please refer to this GitHub gist.
#include <stdlib.h>
#include <stdio.h>
#define BYTE_SIZE 8
size_t compress_code(const int *code, const size_t code_length, unsigned char **compressed)
{
if (code == NULL || code_length == 0 || compressed == NULL) {
return 0;
}
size_t compressed_length = (code_length + BYTE_SIZE - 1) / BYTE_SIZE;
*compressed = calloc(compressed_length, sizeof(char));
for (size_t char_counter = 0, i = 0; char_counter < compressed_length && i < code_length; ++i) {
if (i > 0 && (i % BYTE_SIZE) == 0) {
++char_counter;
}
// Shift the last bit to be set left by one
(*compressed)[char_counter] <<= 1;
// Put the next bit onto the end of the unsigned char
(*compressed)[char_counter] |= (code[i] & 1);
}
// Pad the remaining space with 0s on the right-hand-side
(*compressed)[compressed_length - 1] <<= compressed_length * BYTE_SIZE - code_length;
return compressed_length;
}
int main(void)
{
const int code[] = { 0, 1, 0, 0, 0, 0, 0, 1, // 65: A
0, 1, 0, 0, 0, 0, 1, 0 }; // 66: B
const size_t code_length = 16;
unsigned char *compressed = NULL;
size_t compressed_length = compress_code(code, code_length, &compressed);
for (size_t i = 0; i < compressed_length; ++i) {
printf("%c\n", compressed[i]);
}
return 0;
}
You can then just write the characters in the array to a file, or even copy the array's memory directly to a file, to write the compressed output.
Reading the compressed characters into bits, which will allow you to traverse your Huffman tree for decoding, is done with right shifts (>>) and checking the rightmost bit with bitwise AND (&).

Split up 32 bit value in C++ and concatenate the chunks in MATLAB

I'm working on a project where I have to send values of 32 bits over UART to MATLAB where I need to print them in the MATLAB terminal. I do this by splitting up the 32 bit value into 8 bit values like so (:
void Configurator::send(void) {
/**
* Split the 32 bits in chunks of 4 bytes of 8 bits
*/
union {
uint32_t data;
uint8_t bytes[4];
} splitData;
splitData.data = 1234587;
for (int n : splitData.bytes) {
XUartPs_SendByte(STDOUT_BASEADDRESS, splitData.bytes[n]);
}
}
In MATLAB I receive the following 4 bytes:
252
230
25
155
Now the question is, how do I restore the 1234587?
Am I correct in creating an array of size 4 as uint8_t? I would also like to note that I'm using union for readability. If I'm doing it wrong, I'd be happy to hear why!
You could use left shift to restore the value
uint32_t value = (byte[3]<<24) + (byte[2]<<16) + (byte[1]<<8) + (byte[0]<<0);
Try to avoid using unions for this sort of thing. It is not (in principle) portable, and can cause undefined behaviour. Instead write it like this:
void Configurator::send(void) {
/**
* Split the 32 bits in chunks of 4 bytes of 8 bits
*/
uint32_t data = 1234587;
for (int n = 0; n<4; n++) {
unsigned char octet = (data >> (n*8)) & 0xFF;
XUartPs_SendByte(STDOUT_BASEADDRESS, octet);
}
}
uint32_t recieveBytes(
{
uint32_t result = 0;
for (int n = 0; n<4; n++)
{
unsigned char octet = getOctet();
uint32_t octet32 = octet;
result != octet32 << (n*8);
}
return result;
}
The point is that by shifting out byte like this, you avoid any problems with endianness. The masking also means that if either end has 32-bit chars (such platforms exist), it all works anyway.

C++ Compressing eight booleans into a character

I have a large mass of integers that I'm reading from a file. All of them will be either 0 or 1, so I have converted each read integer to a boolean.
What I need to do is take advantage of the space (8 bits) that a character provides by packing every 8 bits/booleans into a single character. How can I do this?
I have experimented with binary operations, but I'm not coming up with what I want.
int count = 7;
unsigned char compressedValue = 0x00;
while(/*Not end of file*/)
{
...
compressedValue |= booleanValue << count;
count--;
if (count == 0)
{
count = 7;
//write char to stream
compressedValue &= 0;
}
}
Update
I have updated the code to reflect some corrections suggested so far. My next question is, how should I initialize/clear the unsigned char?
Update
Reflected the changes to clear the character bits.
Thanks for the help, everyone.
Several notes:
while(!in.eof()) is wrong, you have to first try(!) to read something and if that succeeded, you can use the data.
Use an unsigned char to get an integer of at least eight bits. Alternatively, look into stdint.h and use uint8_t (or uint_least8_t).
The shift operation is in the wrong direction, use uint8_t(1) << count instead.
If you want to do something like that in memory, I'd use a bigger type, like 32 or 64 bits, because reading a byte is still a single RAM access even if much more than a byte could be read at once.
After writing a byte, don't forget to zero the temporary.
As Mooing Duck suggested, you can use a bitset.
The source code is only a proof of concept - especially the file-read has to be implemented.
#include <bitset>
#include <cstdint>
#include <iostream>
int main() {
char const fcontent[56] { "\0\001\0\001\0\001\0\001\0\001"
"\001\001\001\001\001\001\001\001\001\001\001\001"
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
"\0\001\0\001\0\001\0\001" };
for( int i { 0 }; i < 56; i += 8 ) {
std::bitset<8> const bs(fcontent+i, 8, '\0', '\001');
std::cout << bs.to_ulong() << " ";
}
std::cout << std::endl;
return 0;
}
Output:
85 127 252 0 0 1 84
The standard guaranties that vector<bool> is packed the way you want. Don't reinvent the wheel.more info here

C/C++: Bitwise operators on dynamically allocated memory

In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient. You need to specify the order of the bits relative to the order of the bytes - when the least significant bit falls out of one byte, does to go to the next higher or next lower byte.
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (ie matching the normal writing conventions for both) a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There's obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time) but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check if (orig_l << 7) is non-zero. If your bytes are in an std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
typedef unsigned char byte;
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << (startbit - 1));
result = (byte)(result >> (CHAR_BITS - bitcount));
return result;
}
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
byte rollover = 0;
for (int i = 0; i < nbytes; ++i) {
bytes[ i ] = (bytes[ i ] >> n) | (rollover < n);
byte rollover = extract(bytes[ i ], 0, n);
}
return &bytes[ 0 ];
}
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather then hardcode them.
I'd look into something similar to this:
#define number_of_bytes 3
template<size_t num_bytes>
union MyUnion
{
char bytes[num_bytes];
__int64 ints[num_bytes / sizeof(__int64) + 1];
};
void main()
{
MyUnion<number_of_bytes> mu;
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
mu.ints[0] >>= 2;
}
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.

c++ stl: a good way to parse a sensor response

Attention please:
I already implemented this stuff, just not in any way generic or elegant. This question is motivated by my wanting to learn more tricks with the stl, not the problem itself.
This I think is clear in the way I stated that I already solved the problem, but many people have answered in their best intentions with solutions to the problem, not answers to the question "how to solve this the stl way". I am really sorry if I phrased this question in a confusing way. I hate to waste people's time.
Ok, here it comes:
I get a string full of encoded Data.
It comes in:
N >> 64 bytes
every 3 byte get decoded into an int value
after at most 64 byte (yes,not divisible by 3!) comes a byte as checksum
followed by a line feed.
and so it goes on.
It ends when 2 successive linefeeds are found.
It looks like a nice or at least ok data format, but parsing it elegantly
the stl way is a real bit**.
I have done the thing "manually".
But I would be interested if there is an elegant way with the stl- or maybe boost- magic that doesn't incorporate copying the thing.
Clarification:
It gets really big sometimes. The N >> 64byte was more like a N >>> 64 byte ;-)
UPDATE
Ok, the N>64 bytes seems to be confusing. It is not important.
The sensor takes M measurements as integers. Encodes each of them into 3 bytes. and sends them one after another
when the sensor has sent 64byte of data, it inserts a checksum over the 64 byte and an LF. It doesn't care if one of the encoded integers is "broken up" by that. It just continues in the next line.(That has only the effect to make the data nicely human readable but kindof nasty to parse elegantly.)
if it has finished sending data it inserts a checksum-byte and LFLF
So one data chunk can look like this, for N=129=43x3:
|<--64byte-data-->|1byte checksum|LF
|<--64byte-data-->|1byte checksum|LF
|<--1byte-data-->|1byte checksum|LF
LF
When I have M=22 measurements, this means I have N=66 bytes of data.
After 64 byte it inserts the checksum and LF and continues.
This way it breaks up my last measurement
which is encoded in byte 64, 65 and 66. It now looks like this: 64, checksum, LF, 65, 66.
Since a multiple of 3 divided by 64 carries a residue 2 out of 3 times, and everytime
another one, it is nasty to parse.
I had 2 solutions:
check checksum, concatenate data to one string that only has data bytes, decode.
run through with iterators and one nasty if construct to avoid copying.
I just thought there might be someting better. I mused about std::transform, but it wouldn't work because of the 3 byte is one int thing.
As much as I like STL, I don't think there's anything wrong with doing things manually, especially if the problem does not really fall into the cases the STL has been made for. Then again, I'm not sure why you ask. Maybe you need an STL input iterator that (checks and) discards the check sums and LF characters and emits the integers?
I assume the encoding is such that LF can only appear at those places, i.e., some kind of Base-64 or similar?
It seems to me that something as simple as the following should solve the problem:
string line;
while( getline( input, line ) && line != "" ) {
int val = atoi( line.substr(0, 3 ).c_str() );
string data = line.substr( 3, line.size() - 4 );
char csum = line[ line.size() - 1 ];
// process val, data and csum
}
In a real implementation you would want to add error checking, but the basic logic should remain the same.
As others have said, there is no silver bullet in stl/boost to elegantly solve your problem. If you want to parse your chunk directly via pointer arithmetic, perhaps you can take inspiration from std::iostream and hide the messy pointer arithmetic in a custom stream class. Here's a half-arsed solution I came up with:
#include <cctype>
#include <iostream>
#include <vector>
#include <boost/lexical_cast.hpp>
class Stream
{
public:
enum StateFlags
{
goodbit = 0,
eofbit = 1 << 0, // End of input packet
failbit = 1 << 1 // Corrupt packet
};
Stream() : state_(failbit), csum_(0), pos_(0), end_(0) {}
Stream(char* begin, char* end) {open(begin, end);}
void open(char* begin, char* end)
{state_=goodbit; csum_=0; pos_=begin, end_=end;}
StateFlags rdstate() const {return static_cast<StateFlags>(state_);}
bool good() const {return state_ == goodbit;}
bool fail() const {return (state_ & failbit) != 0;}
bool eof() const {return (state_ & eofbit) != 0;}
Stream& read(int& measurement)
{
measurement = readDigit() * 100;
measurement += readDigit() * 10;
measurement += readDigit();
return *this;
}
private:
int readDigit()
{
int digit = 0;
// Check if we are at end of packet
if (pos_ == end_) {state_ |= eofbit; return 0;}
/* We should be at least csum|lf|lf away from end, and we are
not expecting csum or lf here. */
if (pos_+3 >= end_ || pos_[0] == '\n' || pos_[1] == '\n')
{
state_ |= failbit;
return 0;
}
if (!getDigit(digit)) {return 0;}
csum_ = (csum_ + digit) % 10;
++pos_;
// If we are at checksum, check and consume it, along with linefeed
if (pos_[1] == '\n')
{
int checksum = 0;
if (!getDigit(checksum) || (checksum != csum_)) {state_ |= failbit;}
csum_ = 0;
pos_ += 2;
// If there is a second linefeed, we are at end of packet
if (*pos_ == '\n') {pos_ = end_;}
}
return digit;
}
bool getDigit(int& digit)
{
bool success = std::isdigit(*pos_);
if (success)
digit = boost::lexical_cast<int>(*pos_);
else
state_ |= failbit;
return success;
}
int csum_;
unsigned int state_;
char* pos_;
char* end_;
};
int main()
{
// Use (8-byte + csum + LF) fragments for this example
char data[] = "\
001002003\n\
300400502\n\
060070081\n\n";
std::vector<int> measurements;
Stream s(data, data + sizeof(data));
int meas = 0;
while (s.read(meas).good())
{
measurements.push_back(meas);
std::cout << meas << " ";
}
return 0;
}
Maybe you'll want to add extra StateFlags to determine if failure is due to checksum error or framing error. Hope this helps.
You should think of your communication protocol as being layered. Treat
|<--64byte-data-->|1byte checksum|LF
as fragments to be reassembled into larger packets of contiguous data. Once the larger packet is reconstituted, it is easier to parse its data contiguously (you don't have to deal with measurements being split up across fragments). Many existing network protocols (such as UDP/IP) does this sort of reassembly of fragments into packets.
It's possible to read the fragments directly into their proper "slot" in the packet buffer. Since your fragments have footers instead of headers, and there is no out-of-order arrival of your fragments, this should be fairly easy to code (compared to copyless IP reassembly algorithms). Once you receive an "empty" fragment (the duplicate LF), this marks the end of the packet.
Here is some sample code to illustrate the idea:
#include <vector>
#include <cassert>
class Reassembler
{
public:
// Constructs reassembler with given packet buffer capacity
Reassembler(int capacity) : buf_(capacity) {reset();}
// Returns bytes remaining in packet buffer
int remaining() const {return buf_.end() - pos_;}
// Returns a pointer to where the next fragment should be read
char* back() {return &*pos_;}
// Advances the packet's position cursor for the next fragment
void push(int size) {pos_ += size; if (size == 0) complete_ = true;}
// Returns true if an empty fragment was pushed to indicate end of packet
bool isComplete() const {return complete_;}
// Resets the reassembler so it can process a new packet
void reset() {pos_ = buf_.begin(); complete_ = false;}
// Returns a pointer to the accumulated packet data
char* data() {return &buf_[0];}
// Returns the size in bytes of the accumulated packet data
int size() const {return pos_ - buf_.begin();}
private:
std::vector<char> buf_;
std::vector<char>::iterator pos_;
bool complete_;
};
int readFragment(char* dest, int maxBytes, char delimiter)
{
// Read next fragment from source and save to dest pointer
// Return number of bytes in fragment, except delimiter character
}
bool verifyChecksum(char* fragPtr, int size)
{
// Returns true if fragment checksum is valid
}
void processPacket(char* data, int size)
{
// Extract measurements which are now stored contiguously in packet
}
int main()
{
const int kChecksumSize = 1;
Reassembler reasm(1000); // Use realistic capacity here
while (true)
{
while (!reasm.isComplete())
{
char* fragDest = reasm.back();
int fragSize = readFragment(fragDest, reasm.remaining(), '\n');
if (fragSize > 1)
assert(verifyChecksum(fragDest, fragSize));
reasm.push(fragSize - kChecksumSize);
}
processPacket(reasm.data(), reasm.size());
reasm.reset();
}
}
The trick will be making an efficient readFragment function that stops at every newline delimiter and stores the incoming data into the given destination buffer pointer. If you tell me how you acquire your sensor data, then I can perhaps give you more ideas.
An elegant solution this isn't. It would be more so by using a "transition matrix", and only reading one character at a time. Not my style. Yet this code has a minimum of redundant data movement, and it seems to do the job. Minimally C++, it really is just a C program. Adding iterators is left as an exercise for the reader. The data stream wasn't completely defined, and there was no defined destination for the converted data. Assumptions noted in comments. Lots of printing should show functionality.
// convert series of 3 ASCII decimal digits to binary
// there is a checksum byte at least once every 64 bytes - it can split a digit series
// if the interval is less than 64 bytes, it must be followd by LF (to identify it)
// if the interval is a full 64 bytes, the checksum may or may not be followed by LF
// checksum restricted to a simple sum modulo 10 to keep ASCII format
// checksum computations are only printed to allowed continuation of demo, and so results can be
// inserted back in data for testing
// there is no verification of the 3 byte sets of digits
// results are just printed, non-zero return indicates error
int readData(void) {
int binValue = 0, digitNdx = 0, sensorCnt = 0, lineCnt = 0;
char oneDigit;
string sensorTxt;
while( getline( cin, sensorTxt ) ) {
int i, restart = 0, checkSum = 0, size = sensorTxt.size()-1;
if(size < 0)
break;
lineCnt++;
if(sensorTxt[0] == '#')
continue;
printf("INPUT: %s\n", &sensorTxt[0]); // gag
while(restart<size) {
for(i=0; i<min(64, size); i++) {
oneDigit = sensorTxt[i+restart] & 0xF;
checkSum += oneDigit;
binValue = binValue*10 + oneDigit;
//printf("%3d-%X ", binValue, sensorTxt[i+restart]);
digitNdx++;
if(digitNdx == 3) {
sensorCnt++;
printf("READING# %d (LINE %d) = %d CKSUM %d\n",
sensorCnt, lineCnt, binValue, checkSum);
digitNdx = 0;
binValue = 0;
}
}
oneDigit = sensorTxt[i+restart] & 0x0F;
char compCheckDigit = (10-(checkSum%10)) % 10;
printf(" CKSUM at sensorCnt %d ", sensorCnt);
if((checkSum+oneDigit) % 10)
printf("ERR got %c exp %c\n", oneDigit|0x30, compCheckDigit|0x30);
else
printf("OK\n");
i++;
restart += i;
}
}
if(digitNdx)
return -2;
else
return 0;
}
The data definition was extended with comments, you you can use the following as is:
# normal 64 byte lines with 3 digit value split across lines
00100200300400500600700800901001101201301401501601701801902002105
22023024025026027028029030031032033034035036037038039040041042046
# short lines, partial values - remove checksum digit to combine short lines
30449
0451
0460479
0480490500510520530540550560570580590600610620630641
# long line with embedded checksums every 64 bytes
001002003004005006007008009010011012013014015016017018019020021052202302402502602702802903003103203303403503603703803904004104204630440450460470480490500510520530540550560570580590600610620630640
# dangling digit at end of file (with OK checksum)
37
Why are you concerned with copying? Is it the time overhead or the space overhead?
It sounds like you are reading all of the unparsed data into a big buffer, and now you want to write code that makes the big buffer of unparsed data look like a slightly smaller buffer of parsed data (the data minus the checksums and linefeeds), to avoid the space overhead involved in copying it into the slightly smaller buffer.
Adding a complicated layer of abstraction isn't going to help with the time overhead unless you only need a small portion of the data. And if that's the case, maybe you could just figure out which small portion you need, and copy that. (Then again, most of the abstraction layer may be already written for you, e.g. the Boost iterator library.)
An easier way to reduce the space overhead is to read the data in smaller chunks (or a line at a time), and parse it as you go. Then you only need to store the parsed version in a big buffer. (This assumes you're reading it from a file / socket / port, rather than being passed a large buffer that is not under your control.)
Another way to reduce the space overhead is to overwrite the data in place as you parse it. You will incur the cost of copying it, but you will only need one buffer. (This assumes that the 64-byte data doesn't grow when you parse it, i.e. it is not compressed.)