Split array of m bytes into chunks of n bytes

I'm working on a program that manipulates brain data. It receives a value that represents the current magnitude of 8 commonly recognized types of EEG (brain waves). This data value is output as a series of eight 3-byte unsigned integers in little-endian format.
Here is a piece of my code:
if (extendedCodeLevel == 0 && code == ASIC_EEG_POWER_CODE)
{
fprintf(arq4, "EXCODE level: %d CODE: 0x%02X vLength: %d\n", extendedCodeLevel, code, valueLength );
fprintf(arq4, "Data value(s):" );
for( i=0; i<valueLength; i++ ) fprintf(arq4, " %d", value[i] & 0xFF );
}
The value array is my output. It is the series of bytes that represents the brain waves. The current output file contains the following data:
EXCODE level: 0x00 CODE: 0x83 vLength: 24
Data value(s): 16 2 17 5 3 2 22 1 2 1 0 0 0 4 0 0 3 0 0 5 1 0 4 8
What I need is to divide the sequence of bytes above into 3-byte chunks, to identify each EEG band. The delta wave is represented by the first 3-byte sequence, theta by the next three bytes, and so on. How can I do it?

Assuming that you know that your input will always be exactly eight three-byte integers, all you need is a simple loop that reads three bytes from the input and writes them out as a four-byte value. The easiest way to do this is to treat the input as an array of bytes and then pull bytes off of this array in groups of three.
// Convert an array of eight 3-byte integers into an array
// of eight 4-byte integers.
void convert_3to4(const void* input, void* output)
{
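uint32_t tmp;
uint32_t* pOut = static_cast<uint32_t*>(output);
const uint8_t* pIn = static_cast<const uint8_t*>(input);
int i;
for (i=0; i<24; i+=3)
{
tmp = pIn[i];
tmp += (pIn[i+1] << 8);
tmp += (pIn[i+2] << 16);
pOut[i / 3] = tmp; // i is a multiple of 3, so i / 3 is the output index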
}
}

Like this? The trailing bytes are not printed if the length is not a multiple of 3. Do you need them?
for( i=0; i+2<valueLength; i+=3 ) fprintf(arq4, "%d %d %d - ", value[i] & 0xFF,
value[i+1] & 0xFF,
value[i+2] & 0xFF );

Converting eight 3-byte little-endian character streams into eight 4-byte integers is fairly trivial:
for( int i = 0; i < 24; ++i )
{
output[ i / 3 ] |= ( input[ i ] & 0xFF ) << ( 8 * ( i % 3 ) );
}
I think that (untested) code will do it, assuming input is a 24-entry char array and output is a zero-initialized eight-entry int array.
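For what it's worth, here is a minimal sketch of the same decoding written as a small function. The name decode_eeg_power is made up for illustration, and it assumes (as the question states) that the payload is exactly 24 bytes holding eight little-endian 3-byte values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode eight 3-byte little-endian unsigned integers
// (24 bytes in total) into eight 32-bit values.
std::array<std::uint32_t, 8> decode_eeg_power(const std::uint8_t* bytes)
{
    std::array<std::uint32_t, 8> bands{};  // zero-initialised
    for (std::size_t band = 0; band < bands.size(); ++band) {
        const std::uint8_t* p = bytes + 3 * band;  // start of this 3-byte chunk
        bands[band] = static_cast<std::uint32_t>(p[0])           // least significant byte
                    | (static_cast<std::uint32_t>(p[1]) << 8)
                    | (static_cast<std::uint32_t>(p[2]) << 16);  // most significant byte
    }
    return bands;
}
```

With the sample bytes from the question, the first chunk (16 2 17) would come out as 16 + 2*256 + 17*65536 = 1114640.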

You might try something like this:
union _32BitValue
{
uint8_t bytes[4];
uint32_t uval;
};
size_t extractUint32From3ByteSegemented(const std::vector<uint8_t>& rawData, size_t index, uint32_t& result)
{
// May be do some checks, if the vector size fits extracting the data from it,
// throwing exception or return 0 etc. ...
_32BitValue tmp;
tmp.bytes[0] = 0;
tmp.bytes[1] = rawData[index + 2];
tmp.bytes[2] = rawData[index + 1];
tmp.bytes[3] = rawData[index];
result = ntohl(tmp.uval);
return index + 3;
}
The code used to parse the values from the raw data array:
size_t index = 0;
std::vector<uint8_t> rawData = readRawData(); // Provide such method to read the raw data into the vector
std::vector<uint32_t> myDataValues;
while(index < rawData.size())
{
uint32_t extractedValue;
index = extractUint32From3ByteSegemented(rawData,index,extractedValue);
// Depending on what error detection you choose do check for index returned
// != 0, or catch exception ...
myDataValues.push_back(extractedValue);
}
// Continue with calculations on the extracted values ...
Using the left shift operator and addition as shown in other answers will do the trick as well. But IMHO this sample shows clearly what's going on. It fills the union's byte array with the value in big-endian (network) order and uses ntohl() to convert it portably to the host machine's byte order (big- or little-endian).

What I need is, instead of displaying the whole sequence of 24 bytes, I need to get the 3-byte sequences separately.
You can easily copy the 1d byte array to the desired 2d shape.
Example:
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
int main() {
/* make up data */
uint8_t bytes[] =
{ 16, 2, 17,
5, 3, 2,
22, 1, 2,
1, 0, 0,
0, 4, 0,
0, 3, 0,
0, 5, 1,
0, 4, 8 };
int32_t n_bytes = sizeof(bytes);
int32_t chunksize = 3;
int32_t n_chunks = n_bytes/chunksize + (n_bytes%chunksize ? 1 : 0);
/* chunkify */
uint8_t chunks[n_chunks][chunksize];
memset(chunks, 0, sizeof(uint8_t[n_chunks][chunksize]));
memcpy(chunks, bytes, n_bytes);
/* print result */
size_t i, j;
for (i = 0; i < n_chunks; i++)
{
for (j = 0; j < chunksize; j++)
printf("%02hhd ", chunks[i][j]);
printf("\n");
}
return 0;
}
The output is:
16 02 17
05 03 02
22 01 02
01 00 00
00 04 00
00 03 00
00 05 01
00 04 08

I used some of the examples here to come up with a solution, so I thought I'd share it. It could be a basis for an interface so that objects can transmit copies of themselves over a network with the hton and ntoh functions, which is actually what I am trying to do.
#include <iostream>
#include <string>
#include <exception>
#include <arpa/inet.h>
using namespace std;
void DispLength(string name, size_t var){
cout << "The size of " << name << " is : " << var << endl;
}
typedef int8_t byte;
class Bytes {
public:
Bytes(void* data_ptr, size_t size)
: size_(size)
{ this->bytes_ = (byte*)data_ptr; }
~Bytes(){ bytes_ = NULL; } // Caller is responsible for data deletion.
const byte& operator[] (int idx){
if(idx >= 0 && (size_t)idx < size_)
return bytes_[idx];
else
throw exception();
}
int32_t ret32(int idx) //-- Return a 32 bit value starting at index idx
{
int32_t* ret_ptr = (int32_t*)&((*this)[idx]);
int32_t ret = *ret_ptr;
return ret;
}
int64_t ret64(int idx) //-- Return a 64 bit value starting at index idx
{
int64_t* ret_ptr = (int64_t*)&((*this)[idx]);
int64_t ret = *ret_ptr;
return ret;
}
template <typename T>
T retVal(int idx) //-- Return a value of type T starting at index idx
{
T* T_ptr = (T*)&((*this)[idx]);
T T_ret = *T_ptr;
return T_ret;
}
protected:
Bytes() : bytes_(NULL), size_(0) {}
private:
byte* bytes_; //-- pointer used to scan for bytes
size_t size_;
};
int main(int argc, char** argv){
long double LDouble = 1.0;
Bytes bytes(&LDouble, sizeof(LDouble));
DispLength(string("LDouble"), sizeof(LDouble));
DispLength(string("bytes"), sizeof(bytes));
cout << "As a long double LDouble is " << LDouble << endl;
for( int i = 0; i < (int)sizeof(LDouble); i++){
cout << "Byte " << i << " : " << (int)bytes[i] << endl;
}
cout << "Through the eyes of bytes : " <<
(long double) bytes.retVal<long double>(0) << endl;
return 0;
}

You can use bit manipulation operators.
For example (a sketch, not the actual code):
for (int i = 0; i < 8; i++) {
unsigned temp = value & 0x7; // bitwise AND with binary 111 keeps the low 3 bits
value = value >> 3; // shift right to move on to the next group
}

Some self documenting, maintainable code might look something like this (untested).
typedef union
{
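struct {
uint8_t value[3]; // three data bytes, least significant first
uint8_t padding; // most significant byte on a little-endian host, stays zero
} raw;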
int32_t data;
} Measurement;
void convert(uint8_t* rawValues, int32_t* convertedValues, int convertedSize)
{
Measurement sample;
int i;
memset(&sample, '\0', sizeof(sample));
for(i=0; i<convertedSize; ++i)
{
memcpy(&sample.raw.value[0], &rawValues[i*sizeof(sample.raw.value)], sizeof(sample.raw.value));
convertedValues[i]=sample.data;
}
}

Related

Accessing 8-bit data as 7-bit

I have an array of 100 uint8_t's, which is to be treated as a stream of 800 bits, and dealt with 7 bits at a time. So in other words, if the first element of the 8-bit array holds 0b11001100 and the second holds 0b11110000, then when I come to read it in 7-bit format, the first element of the 7-bit array would be 0b1100110 and the second would be 0b0111100, with the remaining 2 bits being held in the 3rd.
The first thing I tried was a union...
struct uint7_t {
uint8_t i1:7;
};
union uint7_8_t {
uint8_t u8[100];
uint7_t u7[115];
};
but of course everything's byte aligned and I essentially end up simply losing the 8th bit of each element.
Does anyone have any ideas on how I can go about doing this?
Just to be clear, this is something of a visual representation of the result of the union:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx 32 bits of 7-bit data.
And this represents what it is that I want to do instead:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
xxxxxxx xxxxxxx xxxxxxx xxxxxxx xxxx 32 bits of 7-bit data.
I'm aware the last bits may be padded but that's fine, I just want some way of accessing each byte 7 bits at a time without losing any of the 800 bits. So far the only way I can think of is lots of bit shifting, which of course would work, but I'm sure there's a cleaner way of going about it(?)
Thanks in advance for any answers.

Not sure what you mean by "cleaner". Generally people who work on this sort of problem regularly consider shifting and masking to be the right primitive tool to use. One can do something like defining a bitstream abstraction with a method to read an arbitrary number of bits off the stream. This abstraction sometimes shows up in compression applications. The internals of the method of course do use shifting and masking.
One fairly clean approach is to write a function which extracts a 7-bit number at any bit index in an array of unsigned char's. Use a division to convert the bit index to a byte index, and modulus to get the bit index within the byte. Then shift and mask. The input bits can span two bytes, so you either have to glue together a 16-bit value before extraction, or do two smaller extractions and or them together to construct the result.
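A minimal sketch of that extraction function, assuming the bit stream starts at the most significant bit of the first byte (which is how the question's example reads). The name read7 and the choice to return zero bits past the end of the array are my own assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read 7-bit value number `index` from `data`, treating
// the array as a bit stream that starts at the most significant bit of data[0].
// `size` is the array length in bytes; bits past the end read as zero.
std::uint8_t read7(const std::uint8_t* data, std::size_t size, std::size_t index)
{
    const std::size_t bit  = index * 7;  // absolute bit offset of the value
    const std::size_t byte = bit / 8;    // byte that holds its first bit
    const unsigned    off  = bit % 8;    // bits already consumed in that byte
    if (byte >= size)
        return 0;

    // Glue two adjacent bytes into a 16-bit window so the value never
    // straddles the edge of the window.
    std::uint16_t window = static_cast<std::uint16_t>(data[byte] << 8);
    if (byte + 1 < size)
        window |= data[byte + 1];

    // The wanted 7 bits start `off` bits below the top of the window.
    return static_cast<std::uint8_t>((window >> (9 - off)) & 0x7F);
}
```

For the two bytes in the question (0b11001100, 0b11110000), index 0 gives 0b1100110 and index 1 gives 0b0111100.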
If I were aiming for something moderately performant, I'd likely take one of two approaches:
The first has two state variables saying how many bits to take from the current and next byte. It would use shifting, masking, and bitwise or, to produce the current output (a number between 0 and 127 as an int for example), then the loop would update both state variables via adding and modulus, and would increment the current byte pointers if all bits in the first byte were consumed.
The second approach is to load 56-bits (8 outputs worth of input) into a 64-bit integer and use a fully unrolled structure to extract each of the 8 outputs. Doing this without using unaligned memory reads requires constructing the 64-bit integer piecemeal. (56-bits is special because the starting bit position is byte aligned.)
To go real fast, I might try writing SIMD code in Halide. That's beyond scope here I believe. (And not clear it is going to win much actually.)
Designs which read more than one byte into a integer at a time will likely have to consider processor byte ordering.
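The 56-bit idea could look roughly like the sketch below. It is only an illustration: unpack8x7 is a made-up name, the stream is assumed to start at the most significant bit of the first byte, and the extraction is written as a loop that a compiler is free to unroll rather than as hand-unrolled code. Building the 64-bit value byte by byte sidesteps both unaligned reads and host byte order.

```cpp
#include <cstdint>

// Hypothetical helper: unpack eight 7-bit values from seven input bytes.
void unpack8x7(const std::uint8_t in[7], std::uint8_t out[8])
{
    // Assemble the 56 input bits piecemeal, first input byte on top.
    std::uint64_t bits = 0;
    for (int i = 0; i < 7; ++i)
        bits = (bits << 8) | in[i];

    // Value i occupies bits [55 - 7*i .. 49 - 7*i] of the 56-bit group.
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>((bits >> (49 - 7 * i)) & 0x7F);
}
```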

Process them in groups of 8 (since 8 x 7 bits rounds nicely to a whole number of bytes). Bitwise operators are the order of the day here. Handling the last (up to) 7 numbers is a little fiddly, but not impossible. (This code assumes these are unsigned 7-bit integers! Signed conversion would require you to consider setting the top bit if bit 6 is 1.)
// convert 8 x 7bit ints in one go
void extract8(const uint8_t input[7], uint8_t output[8])
{
output[0] = input[0] & 0x7F;
output[1] = (input[0] >> 7) | ((input[1] << 1) & 0x7F);
output[2] = (input[1] >> 6) | ((input[2] << 2) & 0x7F);
output[3] = (input[2] >> 5) | ((input[3] << 3) & 0x7F);
output[4] = (input[3] >> 4) | ((input[4] << 4) & 0x7F);
output[5] = (input[4] >> 3) | ((input[5] << 5) & 0x7F);
output[6] = (input[5] >> 2) | ((input[6] << 6) & 0x7F);
output[7] = input[6] >> 1;
}
// convert array of 7bit ints to 8bit
void seven_bit_to_8bit(const uint8_t* const input, uint8_t* const output, const size_t count)
{
size_t count8 = count >> 3;
for(size_t i = 0; i < count8; ++i)
{
extract8(input + 7 * i, output + 8 * i);
}
// handle remaining (upto) 7 bytes
const size_t countr = (count % 8);
if(countr)
{
// how many bytes do we need to copy from the input?
size_t remaining_bits = 7 * countr;
if(remaining_bits % 8)
{
// round to next nearest multiple of 8
remaining_bits += (8 - remaining_bits % 8);
}
remaining_bits /= 8;
{
uint8_t in[7] = {0}, out[8] = {0};
for(size_t i = 0; i < remaining_bits; ++i)
{
in[i] = input[count8 * 7 + i];
}
extract8(in, out);
for(size_t i = 0; i < countr; ++i)
{
output[count8 * 8 + i] = out[i];
}
}
}
}

Here is a solution that uses the vector bool specialization. It also uses a similar mechanism to allow access to the seven-bit elements via reference objects.
The member functions allow for the following operations:
uint7_t x{5}; // simple value
Arr<uint7_t> arr(10); // array of size 10
arr[0] = x; // set element
uint7_t y = arr[0]; // get element
arr.push_back(uint7_t{9}); // add element
arr.push_back(x); //
std::cout << "Array size is "
<< arr.size() << '\n'; // get size
for(auto&& i : arr)
std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
i = z++; // range-for to change values
auto&& v = arr[1]; // get reference to second element
v = 99; // change second element via reference
Full program:
#include <vector>
#include <iterator>
#include <iostream>
struct uint7_t {
unsigned int i : 7;
};
struct seven_bit_ref {
size_t begin;
size_t end;
std::vector<bool>& bits;
seven_bit_ref& operator=(const uint7_t& right)
{
auto it{bits.begin()+begin};
for(int mask{1}; mask != 1 << 7; mask <<= 1)
*it++ = right.i & mask;
return *this;
}
operator uint7_t() const
{
uint7_t r{};
auto it{bits.begin() + begin};
for(int i{}; i < 7; ++i)
r.i += *it++ << i;
return r;
}
seven_bit_ref operator*()
{
return *this;
}
void operator++()
{
begin += 7;
end += 7;
}
bool operator!=(const seven_bit_ref& right)
{
return !(begin == right.begin && end == right.end);
}
seven_bit_ref operator=(int val)
{
uint7_t temp{};
temp.i = val;
operator=(temp);
return *this;
}
};
template<typename T>
class Arr;
template<>
class Arr<uint7_t> {
public:
Arr(size_t size) : bits(size * 7, false) {}
seven_bit_ref operator[](size_t index)
{
return {index * 7, index * 7 + 7, bits};
}
size_t size()
{
return bits.size() / 7;
}
void push_back(uint7_t val)
{
for(int mask{1}; mask != 1 << 7; mask <<= 1){
bits.push_back(val.i & mask);
}
}
seven_bit_ref begin()
{
return {0, 7, bits};
}
seven_bit_ref end()
{
return {size() * 7, size() * 7 + 7, bits};
}
std::vector<bool> bits;
};
std::ostream& operator<<(std::ostream& os, uint7_t val)
{
os << val.i;
return os;
}
int main()
{
uint7_t x{5}; // simple value
Arr<uint7_t> arr(10); // array of size 10
arr[0] = x; // set element
uint7_t y = arr[0]; // get element
arr.push_back(uint7_t{9}); // add element
arr.push_back(x); //
std::cout << "Array size is "
<< arr.size() << '\n'; // get size
for(auto&& i : arr)
std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
i = z++; // range-for to change values
auto&& v = arr[1]; // get reference
v = 99; // change via reference
std::cout << "\nAfter changes:\n";
for(auto&& i : arr)
std::cout << i << '\n';
}

The following code works as you have asked for it, but first the output and live example on ideone.
Output:
Before changing values...:
7 bit representation: 1111111 0000000 0000000 0000000 0000000 0000000 0000000 0000000
8 bit representation: 11111110 00000000 00000000 00000000 00000000 00000000 00000000
After changing values...:
7 bit representation: 1000000 1001100 1110010 1011010 1010100 0000111 1111110 0000000
8 bit representation: 10000001 00110011 10010101 10101010 10000001 11111111 00000000
8 Bits: 11111111 to ulong: 255
7 Bits: 1111110 to ulong: 126
After changing values...:
7 bit representation: 0010000 0101010 0100000 0000000 0000000 0000000 0000000 0000000
8 bit representation: 00100000 10101001 00000000 00000000 00000000 00000000 00000000
It is very straightforward, using a std::bitset in a class called BitVector. I implement one getter and one setter. The getter returns a std::bitset of template argument size M for the given index selIdx. The index is multiplied by M to get the right position. The returned bitset can also be converted to numerical or string values.
The setter uses an uint8_t value as input and again the index selIdx. The bits will be shifted to the right position into the bitset.
Further, you can use the getter and setter with different sizes because of the template argument M, which means you can work with either the 7- or 8-bit representation, but also 3 bits or whatever you like.
I'm sure this code is not the best concerning speed, but I think it is a very clear and clean solution. It is also far from complete, as there are just one getter, one setter and two constructors. Remember to implement error checking concerning indexes and sizes.
Code:
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>
template <size_t N> class BitVector
{
private:
std::bitset<N> _data;
public:
BitVector (unsigned long num) : _data (num) { };
BitVector (const std::string& str) : _data (str) { };
template <size_t M>
std::bitset<M> getBits (size_t selIdx)
{
std::bitset<M> retBitset;
for (size_t idx = 0; idx < M; ++idx)
{
retBitset |= (_data[M * selIdx + idx] << (M - 1 - idx));
}
return retBitset;
}
template <size_t M>
void setBits (size_t selIdx, uint8_t num)
{
const unsigned char* curByte = reinterpret_cast<const unsigned char*> (&num);
for (size_t bitIdx = 0; bitIdx < 8; ++bitIdx)
{
bool bitSet = (1 == ((*curByte & (1 << (8 - 1 - bitIdx))) >> (8 - 1 - bitIdx)));
_data.set(M * selIdx + bitIdx, bitSet);
}
}
void print_7_8()
{
std:: cout << "\n7 bit representation: ";
for (size_t idx = 0; idx < (N / 7); ++idx)
{
std::cout << getBits<7>(idx) << " ";
}
std:: cout << "\n8 bit representation: ";
for (size_t idx = 0; idx < N / 8; ++idx)
{
std::cout << getBits<8>(idx) << " ";
}
}
};
int main ()
{
BitVector<56> num = 127;
std::cout << "Before changing values...:";
num.print_7_8();
num.setBits<8>(0, 0x81);
num.setBits<8>(1, 0b00110011);
num.setBits<8>(2, 0b10010101);
num.setBits<8>(3, 0xAA);
num.setBits<8>(4, 0x81);
num.setBits<8>(5, 0xFF);
num.setBits<8>(6, 0x00);
std::cout << "\n\nAfter changing values...:";
num.print_7_8();
std::cout << "\n\n8 Bits: " << num.getBits<8>(5) << " to ulong: " << num.getBits<8>(5).to_ulong();
std::cout << "\n7 Bits: " << num.getBits<7>(6) << " to ulong: " << num.getBits<7>(6).to_ulong();
num = BitVector<56>(std::string("1001010100000100"));
std::cout << "\n\nAfter changing values...:";
num.print_7_8();
return 0;
}

Here is one approach without the manual shifting. This is just a crude POC, but hopefully you will be able to get something out of it. I don't know if you are able to easily transform your input into a bitset, but I think it should be possible.
int bytes = 0x01234567;
bitset<32> bs(bytes);
cout << "Input: " << bs << endl;
for(int i = 0; i < 5; i++)
{
bitset<7> slice(bs.to_string().substr(i*7, 7));
cout << slice << endl;
}
Also, this is probably much less performant than the bit-shifting version, so I wouldn't recommend it for heavy lifting.

You can use this to get the index'th 7-bit element from in (note that it doesn't have proper end of array handling). Simple, fast.
int get7(const uint8_t *in, int index) {
int fidx = index*7;
int idx = fidx>>3;
int sidx = fidx&7;
return (in[idx]>>sidx|in[idx+1]<<(8-sidx))&0x7f;
}

You can use direct access or bulk bit packing/unpacking as in TurboPFor:Integer Compression
// Direct read access
// b : bit width 0-16 (7 in your case)
#define bzhi32(u,b) ((u) & ((1u <<(b))-1))
static inline unsigned bitgetx16(unsigned char *in,
unsigned idx,
unsigned b) {
unsigned bidx = b*idx;
return bzhi32( *(unsigned *)((uint16_t *)in+(bidx>>4)) >> (bidx& 0xf), b );
}

Using c++ is it possible to convert an Ascii character to Hex?

I have written a program that sets up a client/server TCP socket over which the user sends an integer value to the server through the use of a terminal interface. On the server side I am executing byte commands for which I need hex values stored in my array.
sprintf(mychararray, "%X", myintvalue);
This code takes my integer and prints it as a hex value into a char array. The only problem is when I use that array to set my commands it registers as an ascii char. So for example if I send an integer equal to 3000 it is converted to 0x0BB8 and then stored as 'B''B''8' which corresponds to 42 42 38 in hex. I have looked all over the place for a solution, and have not been able to come up with one.
Finally came up with a solution to my problem. First I created an array and stored all byte values from 0x00 to 0xFF in it.
char m_list[256]; //array defined in class
m_list[0] = 0x00; //set first array index to zero
int count = 1; //count variable to step through the array and set members
while (count < 256)
{
m_list[count] = m_list[count -1] + 0x01; //populate array with hex from 0x00 - 0xFF
count++;
}
Next I created a function that lets me group my hex values into individual bytes and store into the array that will be processing my command.
void parse_input(char hex_array[], int i, char ans_array[])
{
int n = 0;
int j = 0;
int idx = 0;
string hex_values;
while (n < i-1)
{
if (hex_array[n] == '\0')
{
hex_values = '0';
}
else
{
hex_values = hex_array[n];
}
if (hex_array[n+1] == '\0')
{
hex_values += '0';
}
else
{
hex_values += hex_array[n+1];
}
cout<<"This is the string being used in stoi: "<<hex_values; //statement for testing
idx = stoul(hex_values, nullptr, 16);
ans_array[j] = m_list[idx];
n = n + 2;
j++;
}
}
This function will be called right after my previous code.
sprintf(mychararray, "%X", myintvalue);
parse_input(arrayA, sizeof(arrayA), arrayB);
Example: arrayA = 8byte char array, and arrayB is a 4byte char array. arrayA should be double the size of arrayB since you are taking two ascii values and making a byte pair. e.g 'A' 'B' = 0xAB
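As an aside, if the goal is only to get the raw bytes of the integer (0x0B and 0xB8 for 3000), the round trip through a hex string can be skipped by shifting and masking. This is a different technique from the lookup-table approach above, shown as a sketch; to_big_endian16 is a made-up name and it assumes the value fits in 16 bits and that the command expects the most significant byte first.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 16-bit value into two bytes,
// most significant byte first.
std::array<std::uint8_t, 2> to_big_endian16(std::uint16_t v)
{
    return {{ static_cast<std::uint8_t>(v >> 8),      // high byte
              static_cast<std::uint8_t>(v & 0xFF) }}; // low byte
}
```

For 3000 (0x0BB8) this would give the two bytes 0x0B and 0xB8.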

While I was trying to understand your question I realized what you needed was more than a single variable. You needed a class, this is because you wished to have a string that represents the hex code to be printed out and also the number itself in the form of an unsigned 16 bit integer, which I deduced would be something like unsigned short int. So I created a class that did all this for you named hexset (I got the idea from bitset), here:
#include <iostream>
#include <string>
class hexset {
public:
hexset(int num) {
this->hexnum = (unsigned short int) num;
this->hexstring = hexset::to_string(num);
}
unsigned short int get_hexnum() {return this->hexnum;}
std::string get_hexstring() {return this->hexstring;}
private:
static std::string to_string(int decimal) {
int length = int_length(decimal);
std::string ret = "";
for (int i = (length > 1 ? int_length(decimal) - 1 : length); i >= 0; i--) {
ret = hex_arr[decimal%16]+ret;
decimal /= 16;
}
if (ret[0] == '0') {
ret = ret.substr(1,ret.length()-1);
}
return "0x"+ret;
}
static int int_length(int num) {
int ret = 1;
while (num > 10) {
num/=10;
++ret;
}
return ret;
}
static constexpr char hex_arr[16] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
unsigned short int hexnum;
std::string hexstring;
};
constexpr char hexset::hex_arr[16];
int main() {
int number_from_file = 3000; // This number is in all forms technically, hex is just another way to represent this number.
hexset hex(number_from_file);
std::cout << hex.get_hexstring() << ' ' << hex.get_hexnum() << std::endl;
return 0;
}
I assume you'll probably want to do some operator overloading to make it so you can add and subtract from this number or assign new numbers or do any kind of mathematical or bit shift operation.

Shifting arrays of bytes and skipping bits

I'm trying to make a function that would return N number of bits of a given memory chunk, and optionally skipping M bits.
Example:
unsigned char *data = malloc(3);
data[0] = 'A'; data[1] = 'B'; data[2] = 'C';
read(data, 8, 4);
would skip 4 bits and then read 8 bits from the data chunk "ABC".
"Skipping" bits means it would actually bitshift the entire array, carrying bits from the right to the left.
In this example ABC is
01000001 01000010 01000011
and the function would need to return
0001 0100
This question is a follow up of my previous question
Minimal compilable code
#include <ios>
#include <cmath>
#include <bitset>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <iostream>
using namespace std;
typedef unsigned char byte;
typedef struct bit_data {
byte *data;
size_t length;
} bit_data;
/*
Assume 0 <= skip_n_bits <= 8
*/
bit_data *read(size_t n_bits, size_t skip_n_bits) {
bit_data *bits = (bit_data *) malloc(sizeof(struct bit_data));
size_t bytes_to_read = ceil(n_bits / 8.0);
size_t bytes_to_read_with_skip = ceil(n_bits / 8.0) + ceil(skip_n_bits / 8.0);
bits->data = (byte *) calloc(1, bytes_to_read);
bits->length = n_bits;
/* Hardcoded for the sake of this example*/
byte *tmp = (byte *) malloc(3);
tmp[0] = 'A'; tmp[1] = 'B'; tmp[2] = 'C';
/*not working*/
if(skip_n_bits > 0){
unsigned char *tmp2 = (unsigned char *) calloc(1, bytes_to_read_with_skip);
size_t i;
for(i = bytes_to_read_with_skip - 1; i > 0; i--) {
tmp2[i] = tmp[i] << skip_n_bits;
tmp2[i - 1] = (tmp[i - 1] << skip_n_bits) | (tmp[i] >> (8 - skip_n_bits));
}
memcpy(bits->data, tmp2, bytes_to_read);
free(tmp2);
}else{
memcpy(bits->data, tmp, bytes_to_read);
}
free(tmp);
return bits;
}
int main(void) {
//Reading "ABC"
//01000001 01000010 01000011
bit_data *res = read(8, 4);
cout << bitset<8>(*res->data);
cout << " -> Should be '00010100'";
return 0;
}
The current code returns 00000000 instead of 00010100.
I feel like the error is something small, but I'm missing it. Where is the problem?

Your code is tagged as C++, and indeed you're already using C++ constructs like bitset, however it's very C-like. The first thing to do I think would be to use more C++.
Turns out bitset is pretty flexible already. My approach would be to create one to store all the bits in our input data, and then grab a subset of that based on the number you wish to skip, and return the subset:
template<size_t N, size_t M, typename T = unsigned char>
std::bitset<N> read(size_t skip_n_bits, const std::array<T, M>& data)
{
const size_t numBits = sizeof(T) * 8;
std::bitset<N> toReturn; // initially all zeros
// if we want to skip all bits, return all zeros
if (M*numBits <= skip_n_bits)
return toReturn;
// create a bitset to store all the bits represented in our data array
std::bitset<M*numBits> tmp;
// set bits in tmp based on data
// convert T into bit representations
size_t pos = M*numBits-1;
for (const T& element : data)
{
for (size_t i=0; i < numBits; ++i)
{
tmp.set(pos-i, (1 << (numBits - i-1)) & element);
}
pos -= numBits;
}
// grab just the bits we need
size_t startBit = tmp.size()-skip_n_bits-1;
for (size_t i = 0; i < N; ++i)
{
toReturn[N-i-1] = tmp[startBit];
tmp <<= 1;
}
return toReturn;
}
And now we can call it like so:
// return 8-bit bitset, skip 12 bits
std::array<unsigned char, 3> data{{'A', 'B', 'C'}};
auto&& returned = read<8>(12, data);
std::cout << returned << std::endl;
Prints
00100100
which is precisely our input 01000001 01000010 01000011 skipping the first twelve bits (from the left towards the right), and only grabbing the next 8 available.
I'd argue this is a bit easier to read than what you've got, esp. from a C++ programmer's point of view.
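For comparison, a plain shift-and-mask version is also fairly short. This is only a sketch under my own assumptions: read_bits is a made-up name, bits are counted from the most significant bit of the first byte, at most 32 bits are returned, and anything past the end of the buffer reads as zero.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: return `n_bits` (1..32) bits taken from `data` after
// skipping `skip_bits`, counting from the most significant bit of data[0].
std::uint32_t read_bits(const std::uint8_t* data, std::size_t size,
                        std::size_t skip_bits, unsigned n_bits)
{
    std::uint32_t result = 0;
    for (unsigned i = 0; i < n_bits; ++i) {
        const std::size_t pos = skip_bits + i;  // absolute bit position
        unsigned bit = 0;
        if (pos / 8 < size)                     // past the end: leave as zero
            bit = (data[pos / 8] >> (7 - pos % 8)) & 1u;
        result = (result << 1) | bit;           // append below the bits read so far
    }
    return result;
}
```

On the bytes 'A', 'B', 'C', skipping 4 bits and reading 8 gives 00010100, and skipping 12 bits gives 00100100.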

Reading 6 byte 8-bit integer from binary file

This is what my file looks like:
00 00 00 00 00 34 ....
I have already read it into an unsigned char array using fread, but I don't know how to turn it into an unsigned integer.
The array looks like this:
0, 0, 0, 0, 0, 52

This is how I got it to work:
unsigned char table_index[6];
fread(table_index, 1, 6, file);
unsigned long long tindex = 0;
tindex = (tindex << 8);
tindex = (tindex << 8);
tindex = (tindex << 8) + table_index[0];
tindex = (tindex << 8) + table_index[1];
tindex = (tindex << 8) + table_index[2];
tindex = (tindex << 8) + table_index[3];
tindex = (tindex << 8) + table_index[4];
tindex = (tindex << 8) + table_index[5];
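The same accumulation can be written as a loop. A minimal sketch, assuming the six bytes are stored most significant first as in the example above; read_be48 is a made-up name.

```cpp
#include <cstdint>

// Hypothetical helper: combine six bytes, most significant first,
// into one 64-bit unsigned value.
std::uint64_t read_be48(const unsigned char* bytes)
{
    std::uint64_t value = 0;
    for (int i = 0; i < 6; ++i)
        value = (value << 8) | bytes[i];  // make room, then append the next byte
    return value;
}
```

Because it works on values rather than on memory layout, the result does not depend on the byte order of the machine running it.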

You're starting with a 48 bit value but there's probably no 48 bit integer type on your system. There is probably a 64 bit type though, and it might be a "long long".
Assuming your 6 bytes are ordered most significant first and your machine is big-endian as well, and understanding that you need to fill out two extra bytes for a long long, you might do something such as:
long long myNumber;
char *ptr = (char *)&myNumber;
*ptr++ = 0; // pad the msb
*ptr++ = 0; // pad the 2nd msb
fread(ptr, 1, 6, fp);
Now you've got a value in myNumber

If the file is filled with 48-bit integers like I am assuming you are talking about, from the char array, you can do this:
char temp[8] = {0}; // the two leading padding bytes must be zero
unsigned char *data = //...
unsigned char *data_ptr = data;
vector<unsigned long long> numbers;
size_t sz = // Num of 48-bit numbers
for (size_t i = 0; i < sz; i++, data_ptr += 6)
{
memcpy(temp + 2, data_ptr, 6);
unsigned long long value;
memcpy(&value, temp, sizeof(value)); // reinterpret all 8 bytes, not just temp[0]
numbers.push_back(value);
}
This algorithm assumes that the numbers are all already encoded properly in the file. Because the padding goes in front of the six data bytes, it also assumes big-endian byte order, both in the file and on the host.

if you want to interpret 4 bytes of your uchar array as one uint do this :
unsigned char uchararray[totalsize];
unsigned int * uintarray = (unsigned int *)uchararray;
if you want one byte of your uchar array to be transformed to one uint do this :
unsigned char uchararray[totalsize];
unsigned int uintarray[totalsize];
for(int i = 0 ; i < totalsize; i++)
uintarray[i] = (unsigned int)uchararray[i];

Is this what you're talking about?
// long long because it's usually 8 bytes (and there's not usually a 6 byte int type)
vector<unsigned long long> numbers;
fstream infile("testfile.txt");
if (!infile) {
cout << "fail" << endl;
cin.get();
return 0;
}
while (true) {
stringstream numstr;
string tmp;
unsigned long long num;
for (int i = 0; i < 6 && infile >> tmp; ++i)
numstr << hex << tmp;
if (!infile)
break;
cout << numstr.str() << endl;
numstr >> num;
numbers.push_back(num);
}
I tested it with the input you gave (00 00 23 51 A4 D2) and the contents of the vector were 592553170.

base32 conversion in C++

does anybody know any commonly used library for C++ that provides methods for encoding and decoding numbers from base 10 to base 32 and viceversa?
Thanks,
Stefano

[Updated] Apparently, the C++ std::setbase() IO manipulator and the normal << and >> IO operators only handle bases 8, 10, and 16, and are therefore useless for handling base 32.
So to solve your issue of converting
strings with base 10/32 representation of numbers read from some input to integers in the program
integers in the program to strings with base 10/32 representations to be output
you will need to resort to other functions.
For converting C style strings containing base 2..36 representations to integers, you can use #include <cstdlib> and use the strtol(3) & Co. set of functions.
As for converting integers to strings with arbitrary base... I cannot find an easy answer. printf(3) style format strings only handle bases 8,10,16 AFAICS, just like std::setbase. Anyone?
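Lacking a standard function for this, a hand-rolled version is short. A minimal sketch, where to_base is a made-up name and the lowercase digit set 0-9 then a-z mirrors what strtol accepts for bases up to 36:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: format `value` in the given base (2..36) using the
// digits 0-9 followed by a-z. Returns an empty string for an invalid base.
std::string to_base(unsigned long value, unsigned base)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    if (base < 2 || base > 36)
        return std::string();

    std::string out;
    do {
        out.push_back(digits[value % base]);  // least significant digit first
        value /= base;
    } while (value != 0);

    std::reverse(out.begin(), out.end());     // most significant digit first
    return out;
}
```

Strings produced this way can be read back with strtoul(str, nullptr, base).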

Did you mean "base 10 to base 32", rather than integer to base32? The latter seems more likely and more useful; by default standard formatted I/O functions generate base 10 string format when dealing with integers.
For the base 32 to integer conversion the standard library strtol() function will do that. For the reciprocal, you don't need a library for something you can easily implement yourself (not everything is a lego brick).
Here's an example, not necessarily the most efficient, but simple;
#include <cstdlib>
#include <string>
long b32tol( std::string b32 )
{
return strtol( b32.c_str(), 0, 32 ) ;
}
std::string itob32( long i )
{
unsigned long u = static_cast<unsigned long>( i ) ;
std::string b32 ;
do
{
int d = u % 32 ;
if( d < 10 )
{
b32.insert( 0, 1, '0' + d ) ;
}
else
{
b32.insert( 0, 1, 'a' + d - 10 ) ;
}
u /= 32 ;
} while( u > 0 );
return b32 ;
}
#include <iostream>
int main()
{
long i = 32*32*11 + 32*20 + 5 ; // BK5 in base 32
std::string b32 = itob32( i ) ;
long ii = b32tol( b32 ) ;
std::cout << i << std::endl ; // Original
std::cout << b32 << std::endl ; // Converted to b32
std::cout << ii << std::endl ; // Converted back
return 0 ;
}

In direct answer to the original (and now old) question, I don't know of any common library for encoding byte arrays in base32, or for decoding them again afterward. However, I was presented last week with a need to decode SHA1 hash values represented in base32 into their original byte arrays. Here's some C++ code (with some notable Windows/little endian artifacts) that I wrote to do just that, and to verify the results.
Note that, in contrast with Clifford's code above, which (if I'm not mistaken) assumes the "base32hex" alphabet defined in RFC 4648, my code assumes the "base32" alphabet ("A-Z" and "2-7").
// This program illustrates how SHA1 hash values in base32 encoded form can be decoded
// and then re-encoded in base16.
#include "stdafx.h"
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
unsigned char Base16EncodeNibble( unsigned char value )
{
// value is unsigned, so only the upper bounds need checking
if( value <= 9 )
return value + '0';
else if( value <= 15 )
return (value-10) + 'A';
// cast so the number is printed rather than the raw character
cout << "Error: trying to convert value: " << static_cast<int>(value) << endl;
return 42; // sentinel ('*') for error condition
}
void Base32DecodeBase16Encode(const string & input, string & output)
{
// Here's the base32 decoding:
// The "Base 32 Encoding" section of http://tools.ietf.org/html/rfc4648#page-8
// shows that every 8 bytes of base32 encoded data must be translated back into 5 bytes
// of original data during a decoding process. The following code does this.
int input_len = input.length();
assert( input_len == 32 );
const char * input_str = input.c_str();
int output_len = (input_len*5)/8;
assert( output_len == 20 );
// Because input strings are assumed to be SHA1 hash values in base32, it is also assumed
// that they will be 32 characters (and bytes in this case) in length, and so the output
// string should be 20 bytes in length.
unsigned char *output_str = new unsigned char[output_len];
char curr_char, temp_char;
long long temp_buffer = 0; //formerly: __int64 temp_buffer = 0;
for( int i=0; i<input_len; i++ )
{
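curr_char = input_str[i];
if( curr_char >= 'A' && curr_char <= 'Z' )
temp_char = curr_char - 'A';
else if( curr_char >= '2' && curr_char <= '7' )
temp_char = curr_char - '2' + 26;
else
{
// anything outside the base32 alphabet would leave temp_char unset
cout << "Error: invalid base32 character: " << curr_char << endl;
delete [] output_str;
return;
}
temp_buffer <<= 5; // shifting an empty buffer is harmless, so no check is needed
temp_buffer |= temp_char;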
// if 8 encoded characters have been decoded into the temp location,
// then copy them to the appropriate section of the final decoded location
if( (i>0) && !((i+1) % 8) )
{
unsigned char * source = reinterpret_cast<unsigned char*>(&temp_buffer);
//strncpy(output_str+(5*(((i+1)/8)-1)), source, 5);
int start_index = 5*(((i+1)/8)-1);
int copy_index = 4;
for( int x=start_index; x<(start_index+5); x++, copy_index-- )
output_str[x] = source[copy_index];
temp_buffer = 0;
// I could be mistaken, but I'm guessing that the necessity of copying
// in "reverse" order results from temp_buffer's little endian byte order.
}
}
// Here's the base16 encoding (for human-readable output and the chosen validation tests):
// The "Base 16 Encoding" section of http://tools.ietf.org/html/rfc4648#page-10
// shows that every byte of original data must be encoded as two characters from the
// base16 alphabet - one character for the original byte's high nibble, and one for
// its low nibble.
unsigned char out_temp, chr_temp;
for( int y=0; y<output_len; y++ )
{
out_temp = Base16EncodeNibble( output_str[y] >> 4 ); //encode the high nibble
output.append( 1, static_cast<char>(out_temp) );
out_temp = Base16EncodeNibble( output_str[y] & 0xF ); //encode the low nibble
output.append( 1, static_cast<char>(out_temp) );
}
delete [] output_str;
}
int _tmain(int argc, _TCHAR* argv[])
{
//string input = "J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH";
vector<string> input_b32_strings, output_b16_strings, expected_b16_strings;
input_b32_strings.push_back("J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH");
expected_b16_strings.push_back("4EEC41C9238B127268B4392348FB165989BA4A27");
input_b32_strings.push_back("2HPUCIVW2EVBANIWCXOIQZX6N5NDIUSX");
expected_b16_strings.push_back("D1DF4122B6D12A10351615DC8866FE6F5A345257");
input_b32_strings.push_back("U4BDNCBAQFCPVDBL4FBG3AANGWVESI5J");
expected_b16_strings.push_back("A7023688208144FA8C2BE1426D800D35AA4923A9");
// Use the base conversion tool at http://darkfader.net/toolbox/convert/
// to verify that the above base32/base16 pairs are equivalent.
int num_input_strs = input_b32_strings.size();
for(int i=0; i<num_input_strs; i++)
{
string temp;
Base32DecodeBase16Encode(input_b32_strings[i], temp);
output_b16_strings.push_back(temp);
}
for(int j=0; j<num_input_strs; j++)
{
cout << input_b32_strings[j] << endl;
cout << output_b16_strings[j] << endl;
cout << expected_b16_strings[j] << endl;
if( output_b16_strings[j] != expected_b16_strings[j] )
{
cout << "Error in conversion for string " << j << endl;
}
}
return 0;
}
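The reverse-order copy above depends on the machine being little endian. As a hedged alternative, here is a minimal sketch that collects bits in an accumulator and emits each byte as soon as eight bits are available, so byte order never enters into it. The name base32_decode is hypothetical, and the sketch assumes unpadded, upper-case input in the RFC 4648 "base32" alphabet; it is not limited to 32-character SHA1 strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes unpadded, upper-case RFC 4648 "base32"
// text ("A-Z", "2-7") into raw bytes. Returns an empty vector if any
// character is outside that alphabet.
std::vector<unsigned char> base32_decode(const std::string& input)
{
    std::vector<unsigned char> out;
    std::uint32_t buffer = 0; // most recent bits; older ones fall off the top
    int bits = 0;             // how many low bits of buffer are still pending

    for (char c : input) {
        int v;
        if (c >= 'A' && c <= 'Z')      v = c - 'A';
        else if (c >= '2' && c <= '7') v = c - '2' + 26;
        else return std::vector<unsigned char>(); // invalid character

        // Append the 5 new bits below whatever is already pending.
        buffer = (buffer << 5) | static_cast<std::uint32_t>(v);
        bits += 5;

        // Once a whole byte is pending, emit its top 8 bits.
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
        assert(bits < 8); // invariant: never more than 7 bits left over
    }
    return out; // trailing bits that do not fill a byte are dropped
}
```

Run on the first test string above, J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH, this should yield 20 bytes beginning 4E EC 41 C9 23, in line with the expected base16 value. Note that an empty result is ambiguous between empty input and invalid input; a real implementation would probably report errors separately.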

I'm not aware of any commonly-used library devoted to base32 encoding but Crypto++ includes a public domain base32 encoder and decoder.
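If pulling in a library is not an option, the encoding side is also small enough to write by hand. The following is only a sketch under stated assumptions: base32_encode is a hypothetical name, it uses the RFC 4648 "base32" alphabet ("A-Z", "2-7"), and the pad parameter is my own addition for callers that do not want trailing '=' characters. It is not Crypto++'s interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes raw bytes with the RFC 4648 "base32"
// alphabet. When `pad` is true the output is padded with '=' to a
// multiple of 8 characters, as the RFC describes.
std::string base32_encode(const std::vector<unsigned char>& data, bool pad = true)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    std::string out;
    std::uint32_t buffer = 0; // most recent bits; older ones fall off the top
    int bits = 0;             // how many low bits of buffer are still pending

    for (unsigned char byte : data) {
        // Append 8 new bits, then emit as many 5-bit groups as possible.
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 5) {
            bits -= 5;
            out.push_back(alphabet[(buffer >> bits) & 0x1F]);
        }
        assert(bits < 5); // invariant: at most 4 bits left over
    }

    // Flush leftover bits, left-aligned within a final 5-bit group.
    if (bits > 0)
        out.push_back(alphabet[(buffer << (5 - bits)) & 0x1F]);

    while (pad && out.size() % 8 != 0)
        out.push_back('=');
    return out;
}
```

With the RFC 4648 test vectors, the bytes of "f" should encode to "MY======" and the bytes of "foobar" to "MZXW6YTBOI======".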

I don't normally use C++, so correct me if I'm wrong. I translated this code from C# to save an acquaintance the trouble. The original source I based these methods on is in a different post here on Stack Overflow:
https://stackoverflow.com/a/10981113/13766753
That being said, here's my solution:
#include <cstdlib> // std::abs
#include <iostream>
#include <string>
class Base32 {
public:
static std::string dict;
static std::string encode(int number) {
std::string result = "";
bool negative = number < 0;
number = std::abs(number);
do {
// Prepend the digit for the lowest five bits, then drop them.
result = Base32::dict[number % 32] + result;
number /= 32;
} while(number > 0);
if (negative) {
result = "-" + result;
}
return result;
}
static int decode(std::string str) {
int result = 0;
int negative = 1;
if (str.rfind("-", 0) == 0) {
negative = -1;
str = str.substr(1);
}
for(char& letter : str) {
// Shift the accumulated value up one digit, then add the new digit.
// Characters outside dict are not checked for here.
result = result * 32 + static_cast<int>(Base32::dict.find(letter));
}
return result * negative;
}
};
// Only the first 32 symbols are ever used for base 32.
std::string Base32::dict = "0123456789abcdefghijklmnopqrstuv";
int main() {
std::cout << Base32::encode(0) + "\n" << Base32::decode(Base32::encode(0)) << "\n";
return 0;
}
