Base 64 Encoding Losing data - c++

This is my fourth attempt at doing base64 encoding. My first tries work but they aren't standard. They're also extremely slow!!! I used vectors and push_back and erase a lot.
So I decided to re-write it and this is much much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64 encoding the compressed string. I'm using ZLib. The images are 1366 x 768 so yeah.
I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..
Anyway, my code is as follows below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
inline bool IsBase64(std::uint8_t C)
{
return (isalnum(C) || (C == '+') || (C == '/'));
}
std::string Copy(std::string Str, int FirstChar, int Count)
{
if (FirstChar <= 0)
FirstChar = 0;
else
FirstChar -= 1;
return Str.substr(FirstChar, Count);
}
std::string DecToBinStr(int Num, int Padding)
{
int Bin = 0, Pos = 1;
std::stringstream SS;
while (Num > 0)
{
Bin += (Num % 2) * Pos;
Num /= 2;
Pos *= 10;
}
SS.fill('0');
SS.width(Padding);
SS << Bin;
return SS.str();
}
int DecToBinStr(std::string DecNumber)
{
int Bin = 0, Pos = 1;
int Dec = strtol(DecNumber.c_str(), NULL, 10);
while (Dec > 0)
{
Bin += (Dec % 2) * Pos;
Dec /= 2;
Pos *= 10;
}
return Bin;
}
int BinToDecStr(std::string BinNumber)
{
int Dec = 0;
int Bin = strtol(BinNumber.c_str(), NULL, 10);
for (int I = 0; Bin > 0; ++I)
{
if(Bin % 10 == 1)
{
Dec += (1 << I);
}
Bin /= 10;
}
return Dec;
}
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
int PaddingAmount = ((-Result.size() * 3) & 3);
for (int I = 0; I < PaddingAmount; ++I)
Result += '=';
return Result;
}
std::string DecodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = Data.size(); I > 0; --I)
{
if (Data[I - 1] != '=')
{
std::string Characters = Copy(Data, 0, I);
for (std::size_t J = 0; J < Characters.size(); ++J)
Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
break;
}
}
for (std::size_t I = 0; I < Binary.size(); I += 8)
{
Result += (char)BinToDecStr(Copy(Binary, I, 8));
if (I == 0) ++I;
}
return Result;
}
I've been using the above like this:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); //IMG.677*604
std::cout<<DecodeBase64(Data); //Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); //IMG.1366*768
std::cout<<DecodeBase64(Data); //Prints IMG.1366*768
}
It prints correctly.. I'm not sure what is going on at all or where to begin looking.
Just in-case anyone is curious and want to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l

The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
As far as comments on speed, I'd focus primarily on trying to reduce your usage of std::string. The way you are currently converting the data into a string of 0 and 1 characters is pretty inefficient considering that the source could be read directly with bitwise operators.
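To make the padding rule concrete, here is a minimal sketch. The helper names PadCount and EncodedLength are made up for illustration and are not part of the question's code. Every 3 input bytes become 4 output characters, and the last group gets one = for each input byte it is missing, so the result can only ever end in zero, one or two = characters.

```cpp
#include <cstddef>

// Number of '=' characters needed for n input bytes:
// 0 when n is a multiple of 3, 2 when one byte is left over,
// 1 when two bytes are left over.
constexpr std::size_t PadCount(std::size_t n)
{
    return (3 - n % 3) % 3;
}

// Total length of the padded Base64 text for n input bytes.
constexpr std::size_t EncodedLength(std::size_t n)
{
    return 4 * ((n + 2) / 3);
}

// "IMG.677*604" is 11 bytes long: one '=' and 16 characters in total.
static_assert(PadCount(11) == 1, "11 bytes need one pad character");
static_assert(EncodedLength(11) == 16, "11 bytes encode to 16 characters");
```

With these two numbers known up front, the output string can be reserved once instead of growing character by character.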

I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
std::stringstream ss;
ss << value;
return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
std::cout << Data << std::endl;
std::cout << DecodeBase64(Data) << std::endl; //Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encoded the null terminating the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = data;
char *b64_data = buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
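For readers who would rather stay with std::string, the same bit manipulation might look roughly like the sketch below. The names EncodeBase64Bitwise and kBase64Table are invented for this example, and it only handles the standard alphabet with = padding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same alphabet as the Base64Chars string in the question.
static const char kBase64Table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

std::string EncodeBase64Bitwise(const std::string& data)
{
    std::string out;
    out.reserve(4 * ((data.size() + 2) / 3));  // one allocation up front

    std::size_t i = 0;
    // Full groups: 3 input bytes -> 24 bits -> 4 output characters.
    for (; i + 3 <= data.size(); i += 3) {
        const unsigned b0 = static_cast<unsigned char>(data[i]);
        const unsigned b1 = static_cast<unsigned char>(data[i + 1]);
        const unsigned b2 = static_cast<unsigned char>(data[i + 2]);
        out += kBase64Table[b0 >> 2];
        out += kBase64Table[((b0 & 0x03) << 4) | (b1 >> 4)];
        out += kBase64Table[((b1 & 0x0F) << 2) | (b2 >> 6)];
        out += kBase64Table[b2 & 0x3F];
    }

    const std::size_t rest = data.size() - i;  // 0, 1 or 2 trailing bytes
    if (rest == 1) {
        const unsigned b0 = static_cast<unsigned char>(data[i]);
        out += kBase64Table[b0 >> 2];
        out += kBase64Table[(b0 & 0x03) << 4];  // missing bits are zero
        out += "==";
    } else if (rest == 2) {
        const unsigned b0 = static_cast<unsigned char>(data[i]);
        const unsigned b1 = static_cast<unsigned char>(data[i + 1]);
        out += kBase64Table[b0 >> 2];
        out += kBase64Table[((b0 & 0x03) << 4) | (b1 >> 4)];
        out += kBase64Table[(b1 & 0x0F) << 2];  // missing bits are zero
        out += '=';
    }

    assert(out.size() % 4 == 0);  // always a whole number of quads
    return out;
}
```

Calling EncodeBase64Bitwise("IMG.677*604") gives SU1HLjY3Nyo2MDQ=, the same string my own routines produce above.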

std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
if (Binary.size() % 6)
{
Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
if (Result.size() % 4)
{
Result.resize(Result.size() + 4 - Result.size() % 4, '=');
}
return Result;
}
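None of the answers above shows the decoding direction done with bit operations, so here is a rough sketch of one way it could go. DecodeBase64Bitwise is a made-up name, and skipping characters outside the alphabet is an assumption of this sketch, not something the question asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string DecodeBase64Bitwise(const std::string& text)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Reverse lookup: character code -> 6-bit value, or -1 if not in the alphabet.
    int lookup[256];
    for (int i = 0; i < 256; ++i)
        lookup[i] = -1;
    for (int i = 0; i < 64; ++i)
        lookup[static_cast<unsigned char>(alphabet[i])] = i;

    std::string out;
    out.reserve(text.size() / 4 * 3);

    unsigned buffer = 0;  // most recently read bits live in the low end
    int bits = 0;         // how many bits in buffer are not yet emitted
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '=')
            break;  // padding marks the end of the data
        const int value = lookup[static_cast<unsigned char>(text[i])];
        if (value < 0)
            continue;  // assumption: skip anything outside the alphabet
        buffer = (buffer << 6) | static_cast<unsigned>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
        }
        assert(bits < 8);  // never more than one partial byte pending
    }
    return out;  // leftover bits (fewer than 8) are the zero padding
}
```

Decoding SU1HLjY3Nyo2MDQ= this way gives back IMG.677*604, including the final 4.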

Related

encoding / decoding issue

I have an issue with my code below, which encodes a vector of long into a string by storing the differences between consecutive values.
The encode / decode works fine as long as the values are at or below 2^30.
For any value above that, the logic fails. Note that sizeof(long) is 8 bytes.
static std::string encode(const std::vector<long>& path) {
long lastValue = 0L;
std::stringstream result;
for (long value : path) {
long delta = value - lastValue;
lastValue = value;
long var = 0;
// Shift the delta value left by 1 bit and encode each 5-bit chunk into a character
for (var = delta < 0 ? ~(delta << 1) : delta << 1; var >= 32L; var >>= 5) {
result << (char)((32L | var & 31L) + 63L); //char is getting written to result stringstream
}
// Encode the last 5-bit chunk into a character
result << (char)(var + 63L); // char is getting written to result stringstream
}
std::cout << std::endl;
return result.str();
}
static std::unique_ptr<std::vector<long>> decode(const std::string& encoded) {
auto decoded = std::make_unique<std::vector<long>>();
long last_val = 0;
int index = 0;
while (index < encoded.length()) {
int shift = 0;
long current = 1;
int c;
do {
c = encoded[index++] - 63 - 1;
current += c << shift;
shift += 5;
} while (c >= 31);
long v = ( (current & 1) == 0 ? current >> 1 : ~(current >> 1) );
last_val += v;
decoded->push_back(last_val);
}
return std::move(decoded);
}
Can someone please provide insight what might be going wrong ?
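Inside the decode function, c has to be declared as "long" and not as "int". With an int c, the expression c << shift is evaluated in int arithmetic (32 bits here) and overflows once shift gets to around 30, before the result is ever added to the 64-bit current.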
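As an illustration of keeping every intermediate value 64 bits wide, here is a sketch of the same scheme written with fixed-width types. EncodeDeltas and DecodeDeltas are hypothetical names, and the decoder masks each 5-bit chunk instead of using the "start at 1 and subtract 1" trick from the question, so it is a variation rather than a drop-in copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::string EncodeDeltas(const std::vector<std::int64_t>& path)
{
    std::string out;
    std::int64_t last = 0;
    for (std::size_t k = 0; k < path.size(); ++k) {
        const std::int64_t delta = path[k] - last;
        last = path[k];
        // Sign goes into bit 0; done in unsigned arithmetic so that no
        // negative value is ever shifted.
        std::uint64_t u = static_cast<std::uint64_t>(delta) << 1;
        if (delta < 0)
            u = ~u;
        for (; u >= 32; u >>= 5)
            out += static_cast<char>((32 | (u & 31)) + 63);  // more chunks follow
        out += static_cast<char>(u + 63);                    // last chunk
    }
    return out;
}

std::vector<std::int64_t> DecodeDeltas(const std::string& encoded)
{
    std::vector<std::int64_t> decoded;
    std::int64_t last = 0;
    std::size_t index = 0;
    while (index < encoded.size()) {
        std::uint64_t current = 0;
        int shift = 0;
        std::uint64_t c = 0;
        do {
            assert(index < encoded.size() && shift < 64);  // malformed input
            c = static_cast<std::uint64_t>(encoded[index++] - 63);
            current |= (c & 31) << shift;  // 64-bit shift, no int overflow
            shift += 5;
        } while (c >= 32);
        const std::uint64_t magnitude = current >> 1;
        const std::int64_t delta = (current & 1)
            ? static_cast<std::int64_t>(~magnitude)
            : static_cast<std::int64_t>(magnitude);
        last += delta;
        decoded.push_back(last);
    }
    return decoded;
}
```

Using std::int64_t instead of long also avoids depending on sizeof(long), which is 4 on some platforms.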

Windows API base64 encode/decode

I want to base64 a big file (500MB)
I use this code but it doesn't work for a large file
I test CryptStringToBinary but it doesn't work too
what should I do????
The issue is clearly that there is not enough memory to store a 500 megabyte string in a 32-bit application.
One solution is alluded to by this link, which writes the data to a string. Assuming that the code works correctly, it is not that hard to adjust it to write to a file stream.
#include <windows.h>
#include <fstream>
static const wchar_t *Base64Digits = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
int Base64Encode(const BYTE* pSrc, int nLenSrc, std::wostream& pDstStrm, int nLenDst)
{
wchar_t pDst[4];
int nLenOut = 0;
while (nLenSrc > 0) {
if (nLenDst < 4) return(0);
int len = 0;
BYTE s1 = pSrc[len++];
BYTE s2 = (nLenSrc > 1) ? pSrc[len++] : 0;
BYTE s3 = (nLenSrc > 2) ? pSrc[len++] : 0;
pSrc += len;
nLenSrc -= len;
//------------------ lookup the right digits for output
pDst[0] = Base64Digits[(s1 >> 2) & 0x3F];
pDst[1] = Base64Digits[(((s1 & 0x3) << 4) | ((s2 >> 4) & 0xF)) & 0x3F];
pDst[2] = Base64Digits[(((s2 & 0xF) << 2) | ((s3 >> 6) & 0x3)) & 0x3F];
pDst[3] = Base64Digits[s3 & 0x3F];
//--------- end of input handling
if (len < 3) { // less than 24 src bits encoded, pad with '='
pDst[3] = L'=';
if (len == 1)
pDst[2] = L'=';
}
nLenOut += 4;
// write the data to a file
pDstStrm.write(pDst,4);
nLenDst -= 4;
}
if (nLenDst > 0) *pDst = 0;
return (nLenOut);
}
The only change made was to write each group of 4 characters to a wide stream instead of appending the data to a string.
Here is an example call:
int main()
{
std::wofstream ofs(L"testfile.out");
Base64Encode((BYTE*)"This is a test", strlen("This is a test"), ofs, 1000);
}
The above produces a file with the base64 string VGhpcyBpcyBhIHRlc3Q=, which when decoded, produces This is a test.
Note that the parameter is std::wostream, which means any wide output stream class (such as std::wostringstream) will work also.
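For a 500MB input the source side matters as well: the function above still expects the whole input in memory as pSrc. A possible way around that, sketched below under the assumption that narrow streams are acceptable, is to read the input in chunks whose size is a multiple of 3, so that only the very last chunk can need = padding. Base64EncodeStream, Base64EncodeString and the triplesPerChunk parameter are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reads `in` chunk by chunk and writes the Base64 text to `out`, so the
// whole input never has to be held in memory at once.
void Base64EncodeStream(std::istream& in, std::ostream& out,
                        std::size_t triplesPerChunk = 16384)
{
    static const char digits[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(triplesPerChunk > 0);
    std::vector<char> chunk(3 * triplesPerChunk);    // raw input bytes
    std::vector<char> encoded(4 * triplesPerChunk);  // their Base64 text

    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        std::size_t o = 0;
        for (std::size_t i = 0; i < got; i += 3) {
            const std::size_t n = (got - i < 3) ? got - i : 3;  // bytes in group
            const unsigned s1 = static_cast<unsigned char>(chunk[i]);
            const unsigned s2 = n > 1 ? static_cast<unsigned char>(chunk[i + 1]) : 0;
            const unsigned s3 = n > 2 ? static_cast<unsigned char>(chunk[i + 2]) : 0;
            encoded[o++] = digits[s1 >> 2];
            encoded[o++] = digits[((s1 & 0x3) << 4) | (s2 >> 4)];
            encoded[o++] = n > 1 ? digits[((s2 & 0xF) << 2) | (s3 >> 6)] : '=';
            encoded[o++] = n > 2 ? digits[s3 & 0x3F] : '=';
        }
        out.write(encoded.data(), static_cast<std::streamsize>(o));
        if (got < chunk.size())
            break;  // a short read means the input is exhausted
    }
}

// Convenience wrapper for small inputs, mainly useful for trying it out.
std::string Base64EncodeString(const std::string& data)
{
    std::istringstream in(data);
    std::ostringstream out;
    Base64EncodeStream(in, out);
    return out.str();
}
```

For real files the two streams would be a std::ifstream and a std::ofstream, with the input opened using std::ios::binary.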

C++ Any faster method to write a large binary file?

Goal
My goal is to quickly create a file from a large binary string (a string that contains only 1 and 0).
Straight to the point
I need a function that can achieve my goal. If I am not clear enough, please read on.
Example
Test.exe is running...
.
Inputted binary string:
1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
Done!
File(Test.txt) In Byte(s):
0xFF, 0xAA
.
Test.exe executed successfully!
Explanation
First, Test.exe requested the user to input a binary string.
Then, it converted the inputted binary string to hexadecimal.
Finally, it wrote the converted value to a file called Test.txt.
I've tried
As a failed attempt to achieve my goal, I've created this simple (and possibly horrible) function (hey, at least I tried):
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr )
{
std::ofstream OutputFile( Destination, std::ofstream::binary );
for( ::UINT Index1 = 0, Dec = 0;
// 8-Bit binary.
Index1 != BinaryStr.length( )/8;
// Get the next set of binary value.
// Write the decimal value as unsigned char to file.
// Reset decimal value to 0.
++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
{
// Convert the 8-bit binary to hexadecimal using the
// positional notation method - this is how its done:
// http://www.wikihow.com/Convert-from-Binary-to-Decimal
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
}
OutputFile.close( );
};
Example of usage
#include "Global.h"
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr );
int main( void )
{
std::string Bin = "";
// Create a binary string that is a size of 9.53674 mb
// Note: The creation of this string will take awhile.
// However, I only start to calculate the speed of writing
// and converting after it is done generating the string.
// This string is just created for an example.
std::cout << "Generating...\n";
while( Bin.length( ) != 80000000 )
Bin += "10101010";
std::cout << "Writing...\n";
BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );
std::cout << "Done!\n";
#ifdef IS_DEBUGGING
std::cout << "Paused...\n";
::getchar( );
#endif
return( 0 );
};
Problem
Again, that was my failed attempt to achieve my goal. The problem is the speed. It is too slow. It took more than 7 minutes. Is there any method to quickly create a file from a large binary string?
Thanks in advance,
CLearner
I'd suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for each character that you process. Replace this code:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
Dec += Inc;
by something like:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
Dec += Inc;
The majority of your time is spent here:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
When I comment that out the file is written in seconds. I think you need to finetune your conversion.
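Putting the two suggestions above together, a sketch of the conversion without substr and with a single write at the end could look like this. PackBits is a made-up helper name, and the BinaryStrToFile signature is simplified compared to the one in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Packs a string of '0'/'1' characters into bytes, most significant bit
// first. Trailing characters that do not fill a whole byte are ignored,
// as in the original function.
std::string PackBits(const std::string& binaryStr)
{
    std::string bytes(binaryStr.size() / 8, '\0');
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned value = 0;
        for (std::size_t bit = 0; bit < 8; ++bit) {
            const char c = binaryStr[i * 8 + bit];  // direct indexing, no copy
            assert(c == '0' || c == '1');
            value = (value << 1) | (c == '1' ? 1u : 0u);
        }
        bytes[i] = static_cast<char>(value);
    }
    return bytes;
}

// Writes all packed bytes with one call instead of one << per byte.
bool BinaryStrToFile(const char* destination, const std::string& binaryStr)
{
    const std::string bytes = PackBits(binaryStr);
    std::ofstream out(destination, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

For the example input 1111111110101010 this produces the two bytes 0xFF, 0xAA.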
I think I'd consider something like this as a starting point:
#include <bitset>
#include <fstream>
#include <algorithm>
#include <iterator>
int main() {
std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
std::ofstream out("junk.bin", std::ios::binary | std::ios::out);
std::transform(std::istream_iterator<std::bitset<8> >(in),
std::istream_iterator<std::bitset<8> >(),
std::ostream_iterator<unsigned char>(out),
[](std::bitset<8> const &b) { return b.to_ulong();});
return 0;
}
Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you've mentioned in your question, my guess is this is adequate speed, and the simplicity is going to be hard to beat.
Something not entirely unlike this should be significantly faster:
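#include <climits>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <string>
void
text_to_binary_file(const std::string& text, const char *fname)
{
unsigned char wbuf[4096]; // 4k is a good size of "chunk to write to file"
unsigned int i = 0, j = 0;
std::filebuf fp; // dropping down to filebufs may well be faster
// for this problem
fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary);
memset(wbuf, 0, 4096);
for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
if (*p == '1') // only a '1' character sets the corresponding bit
wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
j++;
if (j == CHAR_BIT) {
j = 0;
i++;
}
if (i == 4096) {
if (fp.sputn(reinterpret_cast<const char *>(wbuf), 4096) != 4096)
abort();
memset(wbuf, 0, 4096);
i = 0;
j = 0;
}
}
// write what is left, counting a partially filled last byte
std::streamsize rest = i + (j != 0 ? 1 : 0);
if (fp.sputn(reinterpret_cast<const char *>(wbuf), rest) != rest)
abort();
fp.close();
}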
Proper error handling left as an exercise.
So instead of converting back and forth between std::strings, why not use a bunch of machine word-sized integers for fast access?
const size_t bufsz = 1000000;
uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);
int i;
for (i = 0; i < bufsz; i++) {
ofile << hex << setw(8) << setfill('0') << buf[i];
// or if you want raw binary data instead of formatted hex:
ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}
delete[] buf;
For me, this runs in a fraction of a second.
Even though late, I want to place my example for handling such strings.
Architecture specific optimizations may use unaligned loads of chars into multiple registers for 'squeezing' out the bits in parallel. This untested example code does not check the chars and avoids alignment and endianness requirements. It assumes the characters of that binary string to represent contiguous octets (bytes) with the most significant bit first, not words and double words, etc., where their specific representation in memory (and in that string) would require special treatment for portability.
//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.
//set up an ofstream with a 64KiB buffer
std::vector<char> buffer(65536);
std::ofstream ofs("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();
//You may treat cases, where (bits % 8 != 0) as error
//Collect the bits of each octet, most significant bit first
uint8_t byte = 0;
for(std::string::size_type i = 0;i < (bits & (~std::string::size_type(0x7)));++i,++cIt)
{
byte = uint8_t((byte << 1) | (uint8_t(*cIt) - uint8_t('0')));
if((i & 0x7) == 0x7) //bit 0 of the octet: write it and start the next one
{
ofs.put(char(byte));
byte = 0;
}
}
ofs.flush();
This makes a 76.2 MB (80,000,000 bytes) file of 1010101010101......
#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace std;
int main( void )
{
char Bin=0;
ofstream myfile;
myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
int c=0;
Bin = 0xAA;
while( c!= 80000000 ){
myfile.write(&Bin,1);
c++;
}
myfile.close();
cout << "Done!\n";
return( 0 );
};

C++ Bytes To Bits Conversion And Then Print

Code Taken From: Bytes to Binary in C Credit: BSchlinker
I modified the following code to take more than 1 byte at a time. I got it half working and then got really confused by my loops. :( I've spent the last day and a half trying to figure it out... but my C++ skills are not really that good (still learning!)
#include <iostream>
using namespace std;
char show_binary(unsigned char u, unsigned char *result,int len);
int main()
{
unsigned char p40[3] = {0x40, 0x00, 0x0a};
unsigned char bits[8*(sizeof(p40))];
int c;
c=sizeof(p40);
show_binary(*p40, bits, 3);
cout << "\n\n";
cout << "BIN = ";
do{
for (int i = 0; i < 8; i++)
printf("%d",bits[i+(8*c)]);
c++;
}while(c < 3);
cout << "\n";
int a;
cin >> a;
return 0;
}
char show_binary(unsigned char u, unsigned char *result, int len)
{
unsigned char mask = 1;
unsigned char bits[8*sizeof(result)];
int a,b,c;
a=0;
b=0;
c=len;
do{
for (int i = 0; i < 8; i++)
bits[i+(8*a)] = (u[&a] & (mask << i)) != 0;
a++;
}while(a < len);
//Need to reverse it?
do{
for (int i = 8; i != -1; i--)
result[i+(8*c)] = bits[i+(8*c)];
b++;
c--;
}while(b < len);
return *result;
}
After I spit out:
cout << "BIN = ";
do{
for (int i = 0; i < 8; i++)
printf("%d",bits[i+(8*c)]);
c++;
}while(c < 3);
I'd like to take bit[11] ~ bit[the end] and compute a BYTE every 8 bits. If that makes sense. But first the function should work. Any pro tips on how this should be done? And of course, rip my code apart. I like to learn.
Man, there is a lot going on in this code, so it's hard to know where to start. Suffice to say, you're trying a bit too hard. It sounds like you are trying to 1) pass in a byte array; 2) turn those bytes into a string representation of the binary; and 3) turn that string representation back into a value?
It just so happens I recently did something similar to this in C, which should still work using a C++ compiler.
#include <stdio.h>
#include <string.h>
/* A macro to get a substring */
#define substr(dest, src, dest_size, startPos, strLen) snprintf(dest, dest_size, "%.*s", strLen, src+startPos)
/* Pass in char* array of bytes, get binary representation as string in bitStr */
void str2bs(const char *bytes, size_t len, char *bitStr) {
size_t i;
char buffer[9] = "";
for(i = 0; i < len; i++) {
sprintf(buffer,
"%c%c%c%c%c%c%c%c",
(bytes[i] & 0x80) ? '1':'0',
(bytes[i] & 0x40) ? '1':'0',
(bytes[i] & 0x20) ? '1':'0',
(bytes[i] & 0x10) ? '1':'0',
(bytes[i] & 0x08) ? '1':'0',
(bytes[i] & 0x04) ? '1':'0',
(bytes[i] & 0x02) ? '1':'0',
(bytes[i] & 0x01) ? '1':'0');
strncat(bitStr, buffer, 8);
buffer[0] = '\0';
}
}
To get the string of binary back into a value, it can be done with bit shifting:
unsigned char bs2uc(char *bitStr) {
unsigned char val = 0;
int toShift = 0;
int i;
for(i = strlen(bitStr)-1; i >= 0; i--) {
if(bitStr[i] == '1') {
val = (1 << toShift) | val;
}
toShift++;
}
return val;
}
Once you had a binary string you could then take substrings of any arbitrary 8 bits (or less, I guess) and turn them back into bytes.
char *bitStr; /* Let's pretend this is populated with a valid string */
char byte[9] = "";
substr(byte, bitStr, 9, 4, 8);
/* This would create a substring of length 8 starting from index 4 of bitStr */
unsigned char b = bs2uc(byte);
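If C++ is available anyway, std::bitset can do both directions with less buffer handling. The following is only a sketch: BytesToBitString and BitsToByte are hypothetical names and are not part of the library mentioned in this answer.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Turns each byte into 8 characters, most significant bit first.
std::string BytesToBitString(const unsigned char* bytes, std::size_t len)
{
    std::string bits;
    bits.reserve(len * 8);
    for (std::size_t i = 0; i < len; ++i)
        bits += std::bitset<8>(bytes[i]).to_string();
    return bits;
}

// Reads 8 characters starting at `start` and turns them back into a byte.
unsigned char BitsToByte(const std::string& bits, std::size_t start)
{
    assert(start + 8 <= bits.size());
    return static_cast<unsigned char>(std::bitset<8>(bits, start, 8).to_ulong());
}
```

With the bytes 0x40, 0x00, 0x0a from the question, BytesToBitString gives 010000000000000000001010, and BitsToByte can then start at any bit offset, not just at multiples of 8.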
I've actually created a whole suite of value -> binary string -> value functions if you'd like to take a look at them. GitHub - binstr

base32 conversion in C++

does anybody know any commonly used library for C++ that provides methods for encoding and decoding numbers from base 10 to base 32 and viceversa?
Thanks,
Stefano
[Updated] Apparently, the C++ std::setbase() IO manipulator and normal << and >> IO operators only handle bases 8, 10, and 16, and is therefore useless for handling base 32.
So to solve your issue of converting
strings with base 10/32 representation of numbers read from some input to integers in the program
integers in the program to strings with base 10/32 representations to be output
you will need to resort to other functions.
For converting C style strings containing base 2..36 representations to integers, you can use #include <cstdlib> and use the strtol(3) & Co. set of functions.
As for converting integers to strings with arbitrary base... I cannot find an easy answer. printf(3) style format strings only handle bases 8,10,16 AFAICS, just like std::setbase. Anyone?
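One possible answer to the "arbitrary base" direction is a small hand-written loop. The sketch below uses made-up names (ToBase, FromBase) and the same digit set that strtol accepts, 0-9 followed by a-z. If a C++17 compiler is available, std::to_chars and std::from_chars in <charconv> also take a base argument from 2 to 36 and may be the better choice.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Formats `value` in any base from 2 to 36.
std::string ToBase(long value, int base)
{
    assert(base >= 2 && base <= 36);
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    // Work on the magnitude as unsigned so that LONG_MIN is handled too.
    unsigned long u = static_cast<unsigned long>(value);
    if (value < 0)
        u = 0UL - u;
    std::string out;
    do {
        out.insert(out.begin(), digits[u % static_cast<unsigned long>(base)]);
        u /= static_cast<unsigned long>(base);
    } while (u != 0);
    if (value < 0)
        out.insert(out.begin(), '-');
    return out;
}

// The reverse direction is just strtol with the base passed through.
long FromBase(const std::string& text, int base)
{
    return std::strtol(text.c_str(), nullptr, base);
}
```

For example, ToBase(11909, 32) gives bk5, and FromBase("bk5", 32) gives 11909 again.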
Did you mean "base 10 to base 32", rather than integer to base32? The latter seems more likely and more useful; by default standard formatted I/O functions generate base 10 string format when dealing with integers.
For the base 32 to integer conversion the standard library strtol() function will do that. For the reciprocal, you don't need a library for something you can easily implement yourself (not everything is a lego brick).
Here's an example, not necessarily the most efficient, but simple;
#include <cstdlib>
#include <string>
long b32tol( std::string b32 )
{
return strtol( b32.c_str(), 0, 32 ) ;
}
std::string itob32( long i )
{
unsigned long u = *reinterpret_cast<unsigned long*>( &i ) ;
std::string b32 ;
do
{
int d = u % 32 ;
if( d < 10 )
{
b32.insert( 0, 1, '0' + d ) ;
}
else
{
b32.insert( 0, 1, 'a' + d - 10 ) ;
}
u /= 32 ;
} while( u > 0 );
return b32 ;
}
#include <iostream>
int main()
{
long i = 32*32*11 + 32*20 + 5 ; // BK5 in base 32
std::string b32 = itob32( i ) ;
long ii = b32tol( b32 ) ;
std::cout << i << std::endl ; // Original
std::cout << b32 << std::endl ; // Converted to b32
std::cout << ii << std::endl ; // Converted back
return 0 ;
}
In direct answer to the original (and now old) question, I don't know of any common library for encoding byte arrays in base32, or for decoding them again afterward. However, I was presented last week with a need to decode SHA1 hash values represented in base32 into their original byte arrays. Here's some C++ code (with some notable Windows/little endian artifacts) that I wrote to do just that, and to verify the results.
Note that in contrast with Clifford's code above, which, if I'm not mistaken, assumes the "base32hex" alphabet described in RFC 4648, my code assumes the "base32" alphabet ("A-Z" and "2-7").
// This program illustrates how SHA1 hash values in base32 encoded form can be decoded
// and then re-encoded in base16.
#include "stdafx.h"
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
unsigned char Base16EncodeNibble( unsigned char value )
{
if( value >= 0 && value <= 9 )
return value + 48;
else if( value >= 10 && value <= 15 )
return (value-10) + 65;
else //assert(false);
{
cout << "Error: trying to convert value: " << value << endl;
}
return 42; // sentinel for error condition
}
void Base32DecodeBase16Encode(const string & input, string & output)
{
// Here's the base32 decoding:
// The "Base 32 Encoding" section of http://tools.ietf.org/html/rfc4648#page-8
// shows that every 8 bytes of base32 encoded data must be translated back into 5 bytes
// of original data during a decoding process. The following code does this.
int input_len = input.length();
assert( input_len == 32 );
const char * input_str = input.c_str();
int output_len = (input_len*5)/8;
assert( output_len == 20 );
// Because input strings are assumed to be SHA1 hash values in base32, it is also assumed
// that they will be 32 characters (and bytes in this case) in length, and so the output
// string should be 20 bytes in length.
unsigned char *output_str = new unsigned char[output_len];
char curr_char, temp_char;
long long temp_buffer = 0; //formerly: __int64 temp_buffer = 0;
for( int i=0; i<input_len; i++ )
{
curr_char = input_str[i];
if( curr_char >= 'A' && curr_char <= 'Z' )
temp_char = curr_char - 'A';
if( curr_char >= '2' && curr_char <= '7' )
temp_char = curr_char - '2' + 26;
if( temp_buffer )
temp_buffer <<= 5; //temp_buffer = (temp_buffer << 5);
temp_buffer |= temp_char;
// if 8 encoded characters have been decoded into the temp location,
// then copy them to the appropriate section of the final decoded location
if( (i>0) && !((i+1) % 8) )
{
unsigned char * source = reinterpret_cast<unsigned char*>(&temp_buffer);
//strncpy(output_str+(5*(((i+1)/8)-1)), source, 5);
int start_index = 5*(((i+1)/8)-1);
int copy_index = 4;
for( int x=start_index; x<(start_index+5); x++, copy_index-- )
output_str[x] = source[copy_index];
temp_buffer = 0;
// I could be mistaken, but I'm guessing that the necessity of copying
// in "reverse" order results from temp_buffer's little endian byte order.
}
}
// Here's the base16 encoding (for human-readable output and the chosen validation tests):
// The "Base 16 Encoding" section of http://tools.ietf.org/html/rfc4648#page-10
// shows that every byte of original data must be encoded as two characters from the
// base16 alphabet - one character for the original byte's high nibble, and one for
// its low nibble.
unsigned char out_temp, chr_temp;
for( int y=0; y<output_len; y++ )
{
out_temp = Base16EncodeNibble( output_str[y] >> 4 ); //encode the high nibble
output.append( 1, static_cast<char>(out_temp) );
out_temp = Base16EncodeNibble( output_str[y] & 0xF ); //encode the low nibble
output.append( 1, static_cast<char>(out_temp) );
}
delete [] output_str;
}
int _tmain(int argc, _TCHAR* argv[])
{
//string input = "J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH";
vector<string> input_b32_strings, output_b16_strings, expected_b16_strings;
input_b32_strings.push_back("J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH");
expected_b16_strings.push_back("4EEC41C9238B127268B4392348FB165989BA4A27");
input_b32_strings.push_back("2HPUCIVW2EVBANIWCXOIQZX6N5NDIUSX");
expected_b16_strings.push_back("D1DF4122B6D12A10351615DC8866FE6F5A345257");
input_b32_strings.push_back("U4BDNCBAQFCPVDBL4FBG3AANGWVESI5J");
expected_b16_strings.push_back("A7023688208144FA8C2BE1426D800D35AA4923A9");
// Use the base conversion tool at http://darkfader.net/toolbox/convert/
// to verify that the above base32/base16 pairs are equivalent.
int num_input_strs = input_b32_strings.size();
for(int i=0; i<num_input_strs; i++)
{
string temp;
Base32DecodeBase16Encode(input_b32_strings[i], temp);
output_b16_strings.push_back(temp);
}
for(int j=0; j<num_input_strs; j++)
{
cout << input_b32_strings[j] << endl;
cout << output_b16_strings[j] << endl;
cout << expected_b16_strings[j] << endl;
if( output_b16_strings[j] != expected_b16_strings[j] )
{
cout << "Error in conversion for string " << j << endl;
}
}
return 0;
}
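The decoding step above can also be written with a small bit accumulator, which sidesteps the byte order question because no multi-byte integer is ever reinterpreted as bytes. This is only a sketch under the same alphabet assumption ("A-Z" and "2-7"); Base32Decode is a made-up name and, unlike the code above, it does not insist on 32 input characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes RFC 4648 "base32" text. Returns false if a character outside
// the alphabet is found; '=' padding ends the data.
bool Base32Decode(const std::string& input, std::vector<unsigned char>& output)
{
    std::uint32_t buffer = 0;  // most recently read bits live in the low end
    int bits = 0;              // how many bits in buffer are not yet emitted
    output.clear();
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char ch = input[i];
        if (ch == '=')
            break;
        std::uint32_t value;
        if (ch >= 'A' && ch <= 'Z')
            value = static_cast<std::uint32_t>(ch - 'A');
        else if (ch >= '2' && ch <= '7')
            value = static_cast<std::uint32_t>(ch - '2' + 26);
        else
            return false;
        buffer = (buffer << 5) | value;
        bits += 5;
        if (bits >= 8) {
            bits -= 8;
            output.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
        assert(bits < 8);  // never more than one partial byte pending
    }
    return true;
}
```

Fed the first test string from above, J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH, this yields 20 bytes starting with 4E EC 41 C9 23, which can then go through the same base16 step.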
I'm not aware of any commonly-used library devoted to base32 encoding but Crypto++ includes a public domain base32 encoder and decoder.
I don't use cpp, so correct me if I'm wrong. I wrote this code for the sake of translating it from C# to save my acquaintance the trouble. The original source, that which I used to create these methods, is on a different post, here, on stackoverflow:
https://stackoverflow.com/a/10981113/13766753
That being said, here's my solution:
#include <iostream>
#include <string>
#include <cstdlib>
#include <math.h>
class Base32 {
public:
static std::string dict;
static std::string encode(int number) {
std::string result = "";
bool negative = false;
if (number < 0) {
negative = true;
}
number = abs(number);
do {
result = Base32::dict[number % 32] + result;
number /= 32;
} while(number > 0);
if (negative) {
result = "-" + result;
}
return result;
}
static int decode(std::string str) {
int result = 0;
int negative = 1;
if (str.rfind("-", 0) == 0) {
negative = -1;
str = str.substr(1);
}
for(char& letter : str) {
result += Base32::dict.find(letter);
result *= 32;
}
return result / 32 * negative;
}
};
std::string Base32::dict = "0123456789abcdefghijklmnopqrstuv";
int main() {
std::cout << Base32::encode(0) + "\n" << Base32::decode(Base32::encode(0)) << "\n";
return 0;
}