encoding / decoding issue - c++

I have an issue with my code below, which encodes a vector of long into a string by storing the differences (deltas) of the sequence.
The encode/decode works fine as long as the values are at or below 2^30.
For any value above that, the logic fails. Note that sizeof(long) is 8 bytes.
static std::string encode(const std::vector<long>& path) {
long lastValue = 0L;
std::stringstream result;
for (long value : path) {
long delta = value - lastValue;
lastValue = value;
long var = 0;
// Shift the delta value left by 1 bit and encode each 5-bit chunk into a character
for (var = delta < 0 ? ~(delta << 1) : delta << 1; var >= 32L; var >>= 5) {
result << (char)((32L | var & 31L) + 63L); //char is getting written to result stringstream
}
// Encode the last 5-bit chunk into a character
result << (char)(var + 63L); // char is getting written to result stringstream
}
std::cout << std::endl;
return result.str();
}
static std::unique_ptr<std::vector<long>> decode(const std::string& encoded) {
auto decoded = std::make_unique<std::vector<long>>();
long last_val = 0;
int index = 0;
while (index < encoded.length()) {
int shift = 0;
long current = 1;
int c;
do {
c = encoded[index++] - 63 - 1;
current += c << shift;
shift += 5;
} while (c >= 31);
long v = ( (current & 1) == 0 ? current >> 1 : ~(current >> 1) );
last_val += v;
decoded->push_back(last_val);
}
return std::move(decoded);
}
Can someone please provide insight into what might be going wrong?

Inside the decode function, it was required to declare c as long and not as int: with c as int, the expression c << shift is evaluated in 32 bits, so for values above 2^30 the shifted chunks overflow and the high bits of the delta are lost.
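For reference, here is the inner loop of decode with that one change applied (nothing else needs to change):
long c; // was: int c; with int, c << shift is a 32-bit shift and overflows once shift reaches 30 and beyond
do {
    c = encoded[index++] - 63 - 1;
    current += c << shift; // now evaluated in 64 bits, so large deltas survive
    shift += 5;
} while (c >= 31);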

Unpacking bitfields with c++ produces wrong result

I am trying to unpack mp3 frames using bitfields.
The header of mp3 frames starts with the syncword 0xFFF followed by 20 bits of header data. The structure of the header is represented as follows:
struct Mp3FrameRaw {
unsigned short fff:12; // Should always be 0xFFF = 4095
unsigned short mpeg_standard : 1;
unsigned short layer : 2;
unsigned short error_protection : 1;
unsigned short bitrate : 4;
unsigned short frequency : 2;
unsigned short pad_bit : 1;
unsigned short : 1;
unsigned short mode :2;
unsigned short mode_extension :2;
unsigned short copyrighted : 1;
unsigned short original: 1;
unsigned short emphasis: 2;
};
In total the header is 32 bits long.
My program first finds the syncword:
size_t find_sync_word(std::vector<unsigned char> & input) {
bool previous_was_ff = false;
for (size_t offset = 0; offset < input.size(); ++offset) {
if (previous_was_ff && (input[offset] & 0xF0 == 0xF0))
return offset - 1;
previous_was_ff = 0xFF == input[offset];
}
return -1;
}
And then tries to unpack the first header:
int parse(std::vector<unsigned char> & input) {
size_t offset = find_sync_word(input);
if (offset < 0) {
std::cerr << "Not a valid Mp3 file" << std::endl;
return -1;
}
Mp3FrameRaw *frame_ptr = reinterpret_cast<Mp3FrameRaw * >(input.data() + offset);
std::cout << frame_ptr->fff << " (Should always be 4095)" << std::endl;
std::cout << frame_ptr->layer << " (Should be 1 )" << std::endl;
std::cout << frame_ptr->bitrate << " (Should be 1-14)" << std::endl;
return 0;
}
The main.cpp reads:
int main() {
std::ifstream mp3_file("/path/to/file.mp3", std::ios::binary);
std::vector<unsigned char> file_contents((std::istreambuf_iterator<char>(mp3_file)),
std::istreambuf_iterator<char>());
return parse(file_contents);
}
The result reads:
3071 (Should always be 4095)
3 (Should be 1 )
0 (Should be 1 - 14)
In contrast, if I unpack the fields manually bit by bit, everything works as expected, e.g.
{
size_t offset;
Mp3FrameRaw frame;
...
frame.fff = input[offset++];
frame.fff = (frame.fff << 4) | (input[offset] >> 4);
frame.mpeg_standard = (input[offset] >> 3) & 1;
frame.layer = (input[offset] >> 1) & 0x3;
frame.error_protection = (input[offset++]) & 0x1;
frame.bitrate = input[offset] >> 4;
...
}
I assume that the bitfields are not laid out the way they intuitively should be. What am I doing wrong?
I am using gcc on Ubuntu 18.04.

Converting many bits to Base 10

I am building a class in C++ which can be used to store arbitrarily large integers. I am storing them as binary in a vector. I need to be able to print this vector in base 10 so it is easier for a human to understand. I know that I could convert it to an int and then output that int. However, my numbers will be much larger than any primitive types. How can I convert this directly to a string?
Here is my code so far. I am new to C++ so if you have any other suggestions that would be great too. I need help filling in the string toBaseTenString() function.
class BinaryInt
{
private:
bool lastDataUser = true;
vector<bool> * data;
BinaryInt(vector<bool> * pointer)
{
data = pointer;
}
public:
BinaryInt(int n)
{
data = new vector<bool>();
while(n > 0)
{
data->push_back(n % 2);
n = n >> 1;
}
}
BinaryInt(const BinaryInt & from)
{
from.lastDataUser = false;
this->data = from.data;
}
~BinaryInt()
{
if(lastDataUser)
delete data;
}
string toBinaryString();
string toBaseTenString();
static BinaryInt add(BinaryInt a, BinaryInt b);
static BinaryInt mult(BinaryInt a, BinaryInt b);
};
BinaryInt BinaryInt::add(BinaryInt a, BinaryInt b)
{
int aSize = a.data->size();
int bSize = b.data->size();
int newDataSize = max(aSize, bSize);
vector<bool> * newData = new vector<bool>(newDataSize);
bool carry = 0;
for(int i = 0; i < newDataSize; i++)
{
int sum = (i < aSize ? a.data->at(i) : 0) + (i < bSize ? b.data->at(i) : 0) + carry;
(*newData)[i] = sum % 2;
carry = sum >> 1;
}
if(carry)
newData->push_back(carry);
return BinaryInt(newData);
}
string BinaryInt::toBinaryString()
{
stringstream ss;
for(int i = data->size() - 1; i >= 0; i--)
{
ss << (*data)[i];
}
return ss.str();
}
string BinaryInt::toBaseTenString()
{
//Not sure how to do this
}
I know you said in your OP that "my numbers will be much larger than any primitive types", but just hear me out on this.
In the past, I've used std::bitset to work with binary representations of numbers and converting back and forth from various other representations. std::bitset is basically a fancy, fixed-size std::vector<bool> with some added functionality. You can read more about it here if it sounds interesting, but here's some small stupid example code to show you how it could work:
std::bitset<8> myByte;
myByte |= 1; // mByte = 00000001
myByte <<= 4; // mByte = 00010000
myByte |= 1; // mByte = 00010001
std::cout << myByte.to_string() << '\n'; // Outputs '00010001'
std::cout << myByte.to_ullong() << '\n'; // Outputs '17'
You can access the bitset by standard array notation as well. By the way, that second conversion I showed (to_ullong) converts to an unsigned long long, which I believe has a max value of 18,446,744,073,709,551,615. If you need larger values than that, good luck!
Just iterate (backwards) over your vector<bool> and accumulate the corresponding value whenever the bit is true:
int base10(const std::vector<bool> &value)
{
int result = 0;
int bit = 1;
for (std::vector<bool>::const_reverse_iterator b = value.rbegin(), e = value.rend(); b != e; ++b, bit <<= 1)
result += (*b ? bit : 0);
return result;
}
Beware! This code is only a guide; you will need to take care of int overflow if the value is pretty big.
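If you do need arbitrarily large values, one approach (a minimal sketch, assuming the LSB-first vector<bool> layout from the question; the helper name double_and_add is illustrative) is to build the decimal string digit by digit: walk the bits from most significant to least significant and, for each bit, double the decimal number and add the bit.
#include <algorithm>
#include <string>
#include <vector>

// Multiply a decimal digit string (least significant digit first) by 2, then add `bit`.
static void double_and_add(std::string &digits, int bit)
{
    int carry = bit;
    for (char &d : digits) {
        int v = (d - '0') * 2 + carry;
        d = static_cast<char>('0' + v % 10);
        carry = v / 10;
    }
    if (carry)
        digits.push_back(static_cast<char>('0' + carry));
}

std::string toBaseTenString(const std::vector<bool> &data)
{
    std::string digits = "0";                              // least significant digit first
    for (auto it = data.rbegin(); it != data.rend(); ++it) // walk MSB -> LSB
        double_and_add(digits, *it ? 1 : 0);
    std::reverse(digits.begin(), digits.end());            // most significant digit first
    return digits;
}
Hooking this into the BinaryInt class is just a matter of passing *data, since the class stores a vector<bool>*.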
Hope it helps.

Base 64 Encoding Losing data

This is my fourth attempt at doing base64 encoding. My first tries work, but they aren't standard. They're also extremely slow!!! I used vectors and push_back and erase a lot.
So I decided to re-write it, and this is much much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64 encoding the compressed string. I'm using ZLib. The images are 1366 x 768 so yeah.
I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..
Anyway, my code is as follows below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
inline bool IsBase64(std::uint8_t C)
{
return (isalnum(C) || (C == '+') || (C == '/'));
}
std::string Copy(std::string Str, int FirstChar, int Count)
{
if (FirstChar <= 0)
FirstChar = 0;
else
FirstChar -= 1;
return Str.substr(FirstChar, Count);
}
std::string DecToBinStr(int Num, int Padding)
{
int Bin = 0, Pos = 1;
std::stringstream SS;
while (Num > 0)
{
Bin += (Num % 2) * Pos;
Num /= 2;
Pos *= 10;
}
SS.fill('0');
SS.width(Padding);
SS << Bin;
return SS.str();
}
int DecToBinStr(std::string DecNumber)
{
int Bin = 0, Pos = 1;
int Dec = strtol(DecNumber.c_str(), NULL, 10);
while (Dec > 0)
{
Bin += (Dec % 2) * Pos;
Dec /= 2;
Pos *= 10;
}
return Bin;
}
int BinToDecStr(std::string BinNumber)
{
int Dec = 0;
int Bin = strtol(BinNumber.c_str(), NULL, 10);
for (int I = 0; Bin > 0; ++I)
{
if(Bin % 10 == 1)
{
Dec += (1 << I);
}
Bin /= 10;
}
return Dec;
}
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
int PaddingAmount = ((-Result.size() * 3) & 3);
for (int I = 0; I < PaddingAmount; ++I)
Result += '=';
return Result;
}
std::string DecodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = Data.size(); I > 0; --I)
{
if (Data[I - 1] != '=')
{
std::string Characters = Copy(Data, 0, I);
for (std::size_t J = 0; J < Characters.size(); ++J)
Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
break;
}
}
for (std::size_t I = 0; I < Binary.size(); I += 8)
{
Result += (char)BinToDecStr(Copy(Binary, I, 8));
if (I == 0) ++I;
}
return Result;
}
I've been using the above like this:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); //IMG.677*604
std::cout<<DecodeBase64(Data); //Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); //IMG.1366*768
std::cout<<DecodeBase64(Data); //Prints IMG.1366*768
}
It prints correctly. I'm not sure what is going on at all or where to begin looking.
Just in-case anyone is curious and want to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l
The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
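For reference, a minimal sketch of deriving the pad count from the input length instead (using the question's Data/Result names; standard Base-64 uses zero, one or two '=' characters, never three):
std::size_t Pad = (3 - Data.size() % 3) % 3; // 0, 1 or 2 '=' characters
Result.append(Pad, '=');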
As far as comments on speed, I'd focus primarily on trying to reduce your usage of std::string. The way you are currently converting the data into a string with 0 and 1 is pretty inefficent considering that the source could be read directly with bitwise operators.
I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
std::stringstream ss;
ss << value;
return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
std::cout << Data << std::endl;
std::cout << DecodeBase64(Data) << std::endl; //Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
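A quick check along those lines could look like this (an illustrative helper, not part of the original code):
bool LooksLikeBase64(const std::string &s)
{
    if (s.empty() || s.size() % 4 != 0) // length must be a multiple of 4
        return false;
    std::size_t pads = 0;
    while (pads < s.size() && s[s.size() - 1 - pads] == '=')
        ++pads;
    return pads <= 2;                   // zero, one or two '=' only
}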
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encoded the null terminating the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
if (Binary.size() % 6)
{
Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
if (Result.size() % 4)
{
Result.resize(Result.size() + 4 - Result.size() % 4, '=');
}
return Result;
}
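With those fixes in place, encoding the previously failing input should match the reference encoding shown above (a small illustrative check, assuming the includes listed earlier in this answer):
int main()
{
    std::string Data = EncodeBase64("IMG.677*604");
    std::cout << Data << std::endl; // expected: SU1HLjY3Nyo2MDQ= (the reference encoding shown above)
}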

Convert integer to binary and store it in an integer array of specified size: C++

I want to convert an integer to a binary string and then store each bit of that string in an element of an integer array of a given size. I am sure that the input integer's binary representation won't exceed the size of the specified array. How can I do this in C++?
Pseudo code:
int value = ???? // assuming a 32 bit int
int i;
for (i = 0; i < 32; ++i) {
array[i] = (value >> i) & 1;
}
#include <climits>  // CHAR_BIT
#include <iostream>
#include <iterator> // std::begin, std::end
template<class output_iterator>
void convert_number_to_array_of_digits(const unsigned number,
output_iterator first, output_iterator last)
{
const unsigned number_bits = CHAR_BIT*sizeof(int);
//extract bits one at a time
for(unsigned i=0; i<number_bits && first!=last; ++i) {
const unsigned shift_amount = number_bits-i-1;
const unsigned this_bit = (number>>shift_amount)&1;
*first = this_bit;
++first;
}
//pad the rest with zeros
while(first != last) {
*first = 0;
++first;
}
}
int main() {
int number = 413523152;
int array[32];
convert_number_to_array_of_digits(number, std::begin(array), std::end(array));
for(int i=0; i<32; ++i)
std::cout << array[i] << ' ';
}
You could use C++'s bitset library, as follows.
#include<iostream>
#include<bitset>
using namespace std;
int main()
{
int N;//input number in base 10
cin>>N;
int O[32];//The output array
bitset<32> A=N;//A will hold the binary representation of N
for(int i=0,j=31;i<32;i++,j--)
{
//Assigning the bits one by one.
O[i]=A[j];
}
return 0;
}
A couple of points to note here:
First, the 32 in the bitset declaration tells the compiler that you want 32 bits to represent your number, so even if your number takes fewer bits to represent, the bitset variable will still have 32 bits, possibly with many leading zeroes.
Second, bitset is a really flexible way of handling binary: you can give it a string or a number as input, and you can use the bitset as an array or as a string. It's a really handy library.
You can print out the bitset variable A as
cout<<A;
and see how it works.
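As a small illustration of that flexibility (building on the same using namespace std setup), a bitset can also be constructed directly from a string of '0'/'1' characters:
bitset<32> B(string("1011")); // only the low four bits are set; the rest are zero-filled
cout << B.to_ulong() << "\n"; // prints 11
cout << B << "\n";            // prints 00000000000000000000000000001011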
You can do it like this:
int index = 0; // result[] is assumed to be large enough to hold all the bits
while (input != 0) {
if (input & 1)
result[index] = 1;
else
result[index] =0;
input >>= 1;// dividing by two
index++;
}
As Mat mentioned above, an int is already a bit-vector (using bitwise operations, you can check each bit). So, you can simply try something like this:
// Note: This depends on the endianess of your machine
int x = 0xdeadbeef; // Your integer?
int arr[sizeof(int)*CHAR_BIT];
for(int i = 0 ; i < sizeof(int)*CHAR_BIT ; ++i) {
arr[i] = (x & (0x01 << i)) ? 1 : 0; // Take the i-th bit
}
Decimal to Binary: size independent
Two ways; both store the binary representation (from MSB to LSB) in a dynamically allocated array named bits.
First Method:
#include<limits.h> // include for CHAR_BIT
#include<stdlib.h> // include for calloc
int* binary(int dec){
int* bits = (int*) calloc(sizeof(int) * CHAR_BIT, sizeof(int)); // cast needed in C++
if(bits == NULL) return NULL;
int i = 0;
// conversion
int left = sizeof(int) * CHAR_BIT - 1;
for(i = 0; left >= 0; left--, i++){
bits[i] = !!(dec & ( 1u << left ));
}
return bits;
}
Second Method:
#include<limits.h> // include for CHAR_BIT
#include<stdlib.h> // include for calloc
int* binary(unsigned int num)
{
unsigned int mask = 1u << ((sizeof(int) * CHAR_BIT) - 1);
//mask has only the most significant bit set
int* bits = (int*) calloc(sizeof(int) * CHAR_BIT, sizeof(int)); // cast needed in C++
if(bits == NULL) return NULL;
int i = 0;
//conversion
while(mask > 0){
if((num & mask) == 0 )
bits[i] = 0;
else
bits[i] = 1;
mask = mask >> 1 ; // Right Shift
i++;
}
return bits;
}
I know it doesn't add as many zeros as you might wish for positive numbers, but for negative binary numbers it works pretty well. I just wanted to post a solution for once :)
int BinToDec(int Value, int Padding = 8)
{
int Bin = 0;
for (int I = 1, Pos = 1; I < (Padding + 1); ++I, Pos *= 10)
{
Bin += ((Value >> I - 1) & 1) * Pos;
}
return Bin;
}
This is what I use; it also lets you specify the number of bits in the final vector, and fills any unused bits with leading 0s.
std::vector<int> to_binary(int num_to_convert_to_binary, int num_bits_in_out_vec)
{
std::vector<int> r;
// make binary vec of minimum size backwards (LSB at .end() and MSB at .begin())
while (num_to_convert_to_binary > 0)
{
//cout << " top of loop" << endl;
if (num_to_convert_to_binary % 2 == 0)
r.push_back(0);
else
r.push_back(1);
num_to_convert_to_binary = num_to_convert_to_binary / 2;
}
while(r.size() < num_bits_in_out_vec)
r.push_back(0);
return r;
}

base32 conversion in C++

Does anybody know of a commonly used library for C++ that provides methods for encoding and decoding numbers from base 10 to base 32 and vice versa?
Thanks,
Stefano
[Updated] Apparently, the C++ std::setbase() IO manipulator and the normal << and >> IO operators only handle bases 8, 10, and 16, and are therefore useless for handling base 32.
So to solve your issue of converting
strings with base 10/32 representation of numbers read from some input to integers in the program
integers in the program to strings with base 10/32 representations to be output
you will need to resort to other functions.
For converting C style strings containing base 2..36 representations to integers, you can use #include <cstdlib> and use the strtol(3) & Co. set of functions.
As for converting integers to strings with arbitrary base... I cannot find an easy answer. printf(3) style format strings only handle bases 8,10,16 AFAICS, just like std::setbase. Anyone?
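For the string-to-integer direction, strtol() with base 32 works out of the box, e.g. std::strtol("3v", nullptr, 32) yields 127. For the integer-to-string direction, here is a minimal hand-rolled sketch (repeated division by the base, using the 0-9, a-z digit set that strtol() also understands; the function name is illustrative):
#include <string>

std::string to_base(unsigned long value, unsigned base) // base in 2..36
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string result;
    do {
        result.insert(result.begin(), digits[value % base]);
        value /= base;
    } while (value > 0);
    return result;
}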
Did you mean "base 10 to base 32", rather than integer to base32? The latter seems more likely and more useful; by default standard formatted I/O functions generate base 10 string format when dealing with integers.
For the base 32 to integer conversion the standard library strtol() function will do that. For the reciprocal, you don't need a library for something you can easily implement yourself (not everything is a lego brick).
Here's an example, not necessarily the most efficient, but simple;
#include <cstdlib> // strtol
#include <string>
long b32tol( std::string b32 )
{
return strtol( b32.c_str(), 0, 32 ) ;
}
std::string itob32( long i )
{
unsigned long u = *reinterpret_cast<unsigned long*>( &i ) ;
std::string b32 ;
do
{
int d = u % 32 ;
if( d < 10 )
{
b32.insert( 0, 1, '0' + d ) ;
}
else
{
b32.insert( 0, 1, 'a' + d - 10 ) ;
}
u /= 32 ;
} while( u > 0 );
return b32 ;
}
#include <iostream>
int main()
{
long i = 32*32*11 + 32*20 + 5 ; // BK5 in base 32
std::string b32 = itob32( i ) ;
long ii = b32tol( b32 ) ;
std::cout << i << std::endl ; // Original
std::cout << b32 << std::endl ; // Converted to b32
std::cout << ii << std::endl ; // Converted back
return 0 ;
}
In direct answer to the original (and now old) question, I don't know of any common library for encoding byte arrays in base32, or for decoding them again afterward. However, I was presented last week with a need to decode SHA1 hash values represented in base32 into their original byte arrays. Here's some C++ code (with some notable Windows/little endian artifacts) that I wrote to do just that, and to verify the results.
Note that in contrast with Clifford's code above, which, if I'm not mistaken, assumes the "base32hex" alphabet mentioned on RFC 4648, my code assumes the "base32" alphabet ("A-Z" and "2-7").
// This program illustrates how SHA1 hash values in base32 encoded form can be decoded
// and then re-encoded in base16.
#include "stdafx.h"
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
unsigned char Base16EncodeNibble( unsigned char value )
{
if( value >= 0 && value <= 9 )
return value + 48;
else if( value >= 10 && value <= 15 )
return (value-10) + 65;
else //assert(false);
{
cout << "Error: trying to convert value: " << value << endl;
}
return 42; // sentinel for error condition
}
void Base32DecodeBase16Encode(const string & input, string & output)
{
// Here's the base32 decoding:
// The "Base 32 Encoding" section of http://tools.ietf.org/html/rfc4648#page-8
// shows that every 8 bytes of base32 encoded data must be translated back into 5 bytes
// of original data during a decoding process. The following code does this.
int input_len = input.length();
assert( input_len == 32 );
const char * input_str = input.c_str();
int output_len = (input_len*5)/8;
assert( output_len == 20 );
// Because input strings are assumed to be SHA1 hash values in base32, it is also assumed
// that they will be 32 characters (and bytes in this case) in length, and so the output
// string should be 20 bytes in length.
unsigned char *output_str = new unsigned char[output_len];
char curr_char, temp_char;
long long temp_buffer = 0; //formerly: __int64 temp_buffer = 0;
for( int i=0; i<input_len; i++ )
{
curr_char = input_str[i];
if( curr_char >= 'A' && curr_char <= 'Z' )
temp_char = curr_char - 'A';
if( curr_char >= '2' && curr_char <= '7' )
temp_char = curr_char - '2' + 26;
if( temp_buffer )
temp_buffer <<= 5; //temp_buffer = (temp_buffer << 5);
temp_buffer |= temp_char;
// if 8 encoded characters have been decoded into the temp location,
// then copy them to the appropriate section of the final decoded location
if( (i>0) && !((i+1) % 8) )
{
unsigned char * source = reinterpret_cast<unsigned char*>(&temp_buffer);
//strncpy(output_str+(5*(((i+1)/8)-1)), source, 5);
int start_index = 5*(((i+1)/8)-1);
int copy_index = 4;
for( int x=start_index; x<(start_index+5); x++, copy_index-- )
output_str[x] = source[copy_index];
temp_buffer = 0;
// I could be mistaken, but I'm guessing that the necessity of copying
// in "reverse" order results from temp_buffer's little endian byte order.
}
}
// Here's the base16 encoding (for human-readable output and the chosen validation tests):
// The "Base 16 Encoding" section of http://tools.ietf.org/html/rfc4648#page-10
// shows that every byte original data must be encoded as two characters from the
// base16 alphabet - one charactor for the original byte's high nibble, and one for
// its low nibble.
unsigned char out_temp, chr_temp;
for( int y=0; y<output_len; y++ )
{
out_temp = Base16EncodeNibble( output_str[y] >> 4 ); //encode the high nibble
output.append( 1, static_cast<char>(out_temp) );
out_temp = Base16EncodeNibble( output_str[y] & 0xF ); //encode the low nibble
output.append( 1, static_cast<char>(out_temp) );
}
delete [] output_str;
}
int _tmain(int argc, _TCHAR* argv[])
{
//string input = "J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH";
vector<string> input_b32_strings, output_b16_strings, expected_b16_strings;
input_b32_strings.push_back("J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH");
expected_b16_strings.push_back("4EEC41C9238B127268B4392348FB165989BA4A27");
input_b32_strings.push_back("2HPUCIVW2EVBANIWCXOIQZX6N5NDIUSX");
expected_b16_strings.push_back("D1DF4122B6D12A10351615DC8866FE6F5A345257");
input_b32_strings.push_back("U4BDNCBAQFCPVDBL4FBG3AANGWVESI5J");
expected_b16_strings.push_back("A7023688208144FA8C2BE1426D800D35AA4923A9");
// Use the base conversion tool at http://darkfader.net/toolbox/convert/
// to verify that the above base32/base16 pairs are equivalent.
int num_input_strs = input_b32_strings.size();
for(int i=0; i<num_input_strs; i++)
{
string temp;
Base32DecodeBase16Encode(input_b32_strings[i], temp);
output_b16_strings.push_back(temp);
}
for(int j=0; j<num_input_strs; j++)
{
cout << input_b32_strings[j] << endl;
cout << output_b16_strings[j] << endl;
cout << expected_b16_strings[j] << endl;
if( output_b16_strings[j] != expected_b16_strings[j] )
{
cout << "Error in conversion for string " << j << endl;
}
}
return 0;
}
I'm not aware of any commonly-used library devoted to base32 encoding but Crypto++ includes a public domain base32 encoder and decoder.
I don't use C++ myself, so correct me if I'm wrong. I wrote this code for the sake of translating it from C# to save my acquaintance the trouble. The original source, which I used to create these methods, is in a different post here on Stack Overflow:
https://stackoverflow.com/a/10981113/13766753
That being said, here's my solution:
#include <iostream>
#include <math.h>
class Base32 {
public:
static std::string dict;
static std::string encode(int number) {
std::string result = "";
bool negative = false;
if (number < 0) {
negative = true;
}
number = abs(number);
do {
result = Base32::dict[fmod(floor(number), 32)] + result;
number /= 32;
} while(number > 0);
if (negative) {
result = "-" + result;
}
return result;
}
static int decode(std::string str) {
int result = 0;
int negative = 1;
if (str.rfind("-", 0) == 0) {
negative = -1;
str = str.substr(1);
}
for(char& letter : str) {
result += Base32::dict.find(letter);
result *= 32;
}
return result / 32 * negative;
}
};
std::string Base32::dict = "0123456789abcdefghijklmnopqrstuvwxyz";
int main() {
std::cout << Base32::encode(0) + "\n" << Base32::decode(Base32::encode(0)) << "\n";
return 0;
}