Does anybody know of a commonly used C++ library that provides methods for encoding and decoding numbers from base 10 to base 32 and vice versa?
Thanks,
Stefano
[Updated] Apparently, the C++ std::setbase() IO manipulator and the normal << and >> IO operators only handle bases 8, 10, and 16, and are therefore useless for handling base 32.
So to solve your issue of converting
strings with base 10/32 representation of numbers read from some input to integers in the program
integers in the program to strings with base 10/32 representations to be output
you will need to resort to other functions.
For converting C-style strings containing base 2..36 representations to integers, you can #include <cstdlib> and use the strtol(3) & Co. family of functions.
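For example, a quick sketch (strtol() accepts the digits 0-9 followed by a-v, case-insensitively, for base 32):
#include <cstdlib>
#include <iostream>

int main()
{
    // "bk5" in base 32 is 11*32*32 + 20*32 + 5 = 11909
    long value = std::strtol("bk5", nullptr, 32);
    std::cout << value << std::endl; // prints 11909
    return 0;
}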
As for converting integers to strings with arbitrary base... I cannot find an easy answer. printf(3) style format strings only handle bases 8,10,16 AFAICS, just like std::setbase. Anyone?
Did you mean "base 10 to base 32", rather than integer to base32? The latter seems more likely and more useful; by default standard formatted I/O functions generate base 10 string format when dealing with integers.
For the base 32 to integer conversion the standard library strtol() function will do that. For the reciprocal, you don't need a library for something you can easily implement yourself (not everything is a lego brick).
Here's an example; not necessarily the most efficient, but simple:
#include <cstdlib>   // strtol()
#include <string>

long b32tol( std::string b32 )
{
    return strtol( b32.c_str(), 0, 32 ) ;
}

std::string itob32( long i )
{
    unsigned long u = *reinterpret_cast<unsigned long*>( &i ) ;
    std::string b32 ;

    do
    {
        int d = u % 32 ;
        if( d < 10 )
        {
            b32.insert( 0, 1, '0' + d ) ;
        }
        else
        {
            b32.insert( 0, 1, 'a' + d - 10 ) ;
        }

        u /= 32 ;
    } while( u > 0 );

    return b32 ;
}
#include <iostream>

int main()
{
    long i = 32*32*11 + 32*20 + 5 ;   // "bk5" in base 32 (digit set 0-9, a-v)
    std::string b32 = itob32( i ) ;
    long ii = b32tol( b32 ) ;

    std::cout << i   << std::endl ;   // Original
    std::cout << b32 << std::endl ;   // Converted to b32
    std::cout << ii  << std::endl ;   // Converted back

    return 0 ;
}
In direct answer to the original (and now old) question, I don't know of any common library for encoding byte arrays in base32, or for decoding them again afterward. However, I was presented last week with a need to decode SHA1 hash values represented in base32 into their original byte arrays. Here's some C++ code (with some notable Windows/little endian artifacts) that I wrote to do just that, and to verify the results.
Note that in contrast with Clifford's code above, which, if I'm not mistaken, assumes the "base32hex" alphabet mentioned in RFC 4648, my code assumes the "base32" alphabet ("A-Z" and "2-7").
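For reference, the two RFC 4648 alphabets side by side (these constants are only an illustration, not part of the program below):
// RFC 4648 "base32" alphabet (the one assumed by the code below)
const char base32_alphabet[]    = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
// RFC 4648 "base32hex" alphabet (what itob32() above effectively emits, in lower case)
const char base32hex_alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";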
// This program illustrates how SHA1 hash values in base32 encoded form can be decoded
// and then re-encoded in base16.
#include "stdafx.h"
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
unsigned char Base16EncodeNibble( unsigned char value )
{
if( value <= 9 ) // value is unsigned, so no need to check >= 0
return value + 48;
else if( value >= 10 && value <= 15 )
return (value-10) + 65;
else //assert(false);
{
cout << "Error: trying to convert value: " << value << endl;
}
return 42; // sentinel for error condition
}
void Base32DecodeBase16Encode(const string & input, string & output)
{
// Here's the base32 decoding:
// The "Base 32 Encoding" section of http://tools.ietf.org/html/rfc4648#page-8
// shows that every 8 bytes of base32 encoded data must be translated back into 5 bytes
// of original data during a decoding process. The following code does this.
int input_len = input.length();
assert( input_len == 32 );
const char * input_str = input.c_str();
int output_len = (input_len*5)/8;
assert( output_len == 20 );
// Because input strings are assumed to be SHA1 hash values in base32, it is also assumed
// that they will be 32 characters (and bytes in this case) in length, and so the output
// string should be 20 bytes in length.
unsigned char *output_str = new unsigned char[output_len];
char curr_char, temp_char;
long long temp_buffer = 0; //formerly: __int64 temp_buffer = 0;
for( int i=0; i<input_len; i++ )
{
curr_char = input_str[i];
if( curr_char >= 'A' && curr_char <= 'Z' )
temp_char = curr_char - 'A';
if( curr_char >= '2' && curr_char <= '7' )
temp_char = curr_char - '2' + 26;
if( temp_buffer )
temp_buffer <<= 5; //temp_buffer = (temp_buffer << 5);
temp_buffer |= temp_char;
// if 8 encoded characters have been decoded into the temp location,
// then copy them to the appropriate section of the final decoded location
if( (i>0) && !((i+1) % 8) )
{
unsigned char * source = reinterpret_cast<unsigned char*>(&temp_buffer);
//strncpy(output_str+(5*(((i+1)/8)-1)), source, 5);
int start_index = 5*(((i+1)/8)-1);
int copy_index = 4;
for( int x=start_index; x<(start_index+5); x++, copy_index-- )
output_str[x] = source[copy_index];
temp_buffer = 0;
// I could be mistaken, but I'm guessing that the necessity of copying
// in "reverse" order results from temp_buffer's little endian byte order.
}
}
// Here's the base16 encoding (for human-readable output and the chosen validation tests):
// The "Base 16 Encoding" section of http://tools.ietf.org/html/rfc4648#page-10
// shows that every byte of original data must be encoded as two characters from the
// base16 alphabet - one character for the original byte's high nibble, and one for
// its low nibble.
unsigned char out_temp, chr_temp;
for( int y=0; y<output_len; y++ )
{
out_temp = Base16EncodeNibble( output_str[y] >> 4 ); //encode the high nibble
output.append( 1, static_cast<char>(out_temp) );
out_temp = Base16EncodeNibble( output_str[y] & 0xF ); //encode the low nibble
output.append( 1, static_cast<char>(out_temp) );
}
delete [] output_str;
}
int _tmain(int argc, _TCHAR* argv[])
{
//string input = "J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH";
vector<string> input_b32_strings, output_b16_strings, expected_b16_strings;
input_b32_strings.push_back("J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH");
expected_b16_strings.push_back("4EEC41C9238B127268B4392348FB165989BA4A27");
input_b32_strings.push_back("2HPUCIVW2EVBANIWCXOIQZX6N5NDIUSX");
expected_b16_strings.push_back("D1DF4122B6D12A10351615DC8866FE6F5A345257");
input_b32_strings.push_back("U4BDNCBAQFCPVDBL4FBG3AANGWVESI5J");
expected_b16_strings.push_back("A7023688208144FA8C2BE1426D800D35AA4923A9");
// Use the base conversion tool at http://darkfader.net/toolbox/convert/
// to verify that the above base32/base16 pairs are equivalent.
int num_input_strs = input_b32_strings.size();
for(int i=0; i<num_input_strs; i++)
{
string temp;
Base32DecodeBase16Encode(input_b32_strings[i], temp);
output_b16_strings.push_back(temp);
}
for(int j=0; j<num_input_strs; j++)
{
cout << input_b32_strings[j] << endl;
cout << output_b16_strings[j] << endl;
cout << expected_b16_strings[j] << endl;
if( output_b16_strings[j] != expected_b16_strings[j] )
{
cout << "Error in conversion for string " << j << endl;
}
}
return 0;
}
I'm not aware of any commonly-used library devoted to base32 encoding but Crypto++ includes a public domain base32 encoder and decoder.
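If pulling in Crypto++ is acceptable, the usual pipelining style would look roughly like this (a sketch from memory; check the current headers and class names before relying on it, and note that, if I recall correctly, Crypto++'s default base32 alphabet is not the RFC 4648 one):
#include <string>
#include <iostream>
#include <cryptopp/base32.h>   // CryptoPP::Base32Encoder / Base32Decoder
#include <cryptopp/filters.h>  // CryptoPP::StringSource / StringSink

int main()
{
    const std::string data = "hello world";
    std::string encoded, decoded;

    // Pump the input through the encoder into a string sink.
    CryptoPP::StringSource ss1(data, true,
        new CryptoPP::Base32Encoder(new CryptoPP::StringSink(encoded)));

    // And decode it back again.
    CryptoPP::StringSource ss2(encoded, true,
        new CryptoPP::Base32Decoder(new CryptoPP::StringSink(decoded)));

    std::cout << encoded << '\n' << decoded << '\n';
    return 0;
}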
I don't use C++, so correct me if I'm wrong. I wrote this code by translating it from C# to save my acquaintance the trouble. The original source I used to create these methods is in a different post here on Stack Overflow:
https://stackoverflow.com/a/10981113/13766753
That being said, here's my solution:
#include <iostream>
#include <string>
#include <cstdlib>  // std::abs()

class Base32 {
public:
    static std::string dict;

    static std::string encode(int number) {
        std::string result = "";
        bool negative = false;
        if (number < 0) {
            negative = true;
        }
        number = std::abs(number);
        do {
            result = Base32::dict[number % 32] + result;
            number /= 32;
        } while (number > 0);
        if (negative) {
            result = "-" + result;
        }
        return result;
    }

    static int decode(std::string str) {
        int result = 0;
        int negative = 1;
        if (str.rfind("-", 0) == 0) {
            negative = -1;
            str = str.substr(1);
        }
        for (char& letter : str) {
            result += Base32::dict.find(letter);
            result *= 32;
        }
        return result / 32 * negative;
    }
};

// Only 32 digit characters are needed for base 32.
std::string Base32::dict = "0123456789abcdefghijklmnopqrstuv";
int main() {
std::cout << Base32::encode(0) + "\n" << Base32::decode(Base32::encode(0)) << "\n";
return 0;
}
Related
I have written a program that sets up a client/server TCP socket over which the user sends an integer value to the server through the use of a terminal interface. On the server side I am executing byte commands for which I need hex values stored in my array.
sprintf(mychararray, "%X", myintvalue);
This code takes my integer and prints it as a hex value into a char array. The only problem is when I use that array to set my commands it registers as an ascii char. So for example if I send an integer equal to 3000 it is converted to 0x0BB8 and then stored as 'B''B''8' which corresponds to 42 42 38 in hex. I have looked all over the place for a solution, and have not been able to come up with one.
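As an aside: if what the command buffer really needs is the raw bytes of the integer rather than their ASCII spelling, one hypothetical way to extract them is with shifts (the big-endian byte order here is an assumption, not something from the original post):
// Store 'value' as 4 raw bytes, most significant byte first.
// e.g. 3000 (0x00000BB8) becomes { 0x00, 0x00, 0x0B, 0xB8 }.
void int_to_bytes(unsigned int value, unsigned char out[4])
{
    out[0] = (value >> 24) & 0xFF;
    out[1] = (value >> 16) & 0xFF;
    out[2] = (value >> 8)  & 0xFF;
    out[3] =  value        & 0xFF;
}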
Finally came up with a solution to my problem. First I created an array and stored all byte values from 0x00 to 0xFF in it.
char m_list[256]; //array defined in class
m_list[0] = 0x00; //set first array index to zero
int count = 1; //count variable to step through the array and set members
while (count < 256)
{
m_list[count] = m_list[count -1] + 0x01; //populate array with hex from 0x00 - 0xFF
count++;
}
Next I created a function that lets me group my hex values into individual bytes and store into the array that will be processing my command.
void parse_input(char hex_array[], int i, char ans_array[])
{
int n = 0;
int j = 0;
int idx = 0;
string hex_values;
while (n < i-1)
{
if (hex_array[n] == '\0')
{
hex_values = '0';
}
else
{
hex_values = hex_array[n];
}
if (hex_array[n+1] == '\0')
{
hex_values += '0';
}
else
{
hex_values += hex_array[n+1];
}
cout<<"This is the string being used in stoi: "<<hex_values; //statement for testing
idx = stoul(hex_values, nullptr, 16);
ans_array[j] = m_list[idx];
n = n + 2;
j++;
}
}
This function will be called right after my previous code.
sprintf(mychararray, "%X", myintvalue);
parse_input(arrayA, sizeof(arrayA), arrayB);
Example: arrayA is an 8-byte char array, and arrayB is a 4-byte char array. arrayA should be double the size of arrayB since you are taking two ASCII values and making a byte pair, e.g. 'A' 'B' = 0xAB.
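The same pairing can also be done without a 256-entry lookup table, for example by handing each two-character chunk to strtoul() directly (a sketch, assuming the input holds an even number of valid hex digits):
#include <cstdlib>
#include <cstring>

// Convert an ASCII hex string such as "0BB8" into raw bytes { 0x0B, 0xB8 }.
void hex_string_to_bytes(const char *hex_str, unsigned char *out)
{
    size_t len = std::strlen(hex_str);
    for (size_t n = 0; n < len; n += 2)
    {
        char pair[3] = { hex_str[n], hex_str[n + 1], '\0' };
        out[n / 2] = static_cast<unsigned char>(std::strtoul(pair, nullptr, 16));
    }
}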
While I was trying to understand your question I realized what you needed was more than a single variable. You needed a class, because you wished to have both a string that represents the hex code to be printed out and the number itself in the form of an unsigned 16-bit integer, which I deduced would be something like unsigned short int. So I created a class that does all this for you, named hexset (I got the idea from bitset), here:
#include <iostream>
#include <string>
class hexset {
public:
hexset(int num) {
this->hexnum = (unsigned short int) num;
this->hexstring = hexset::to_string(num);
}
unsigned short int get_hexnum() {return this->hexnum;}
std::string get_hexstring() {return this->hexstring;}
private:
static std::string to_string(int decimal) {
int length = int_length(decimal);
std::string ret = "";
for (int i = (length > 1 ? int_length(decimal) - 1 : length); i >= 0; i--) {
ret = hex_arr[decimal%16]+ret;
decimal /= 16;
}
if (ret[0] == '0') {
ret = ret.substr(1,ret.length()-1);
}
return "0x"+ret;
}
static int int_length(int num) {
int ret = 1;
while (num > 10) {
num/=10;
++ret;
}
return ret;
}
static constexpr char hex_arr[16] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
unsigned short int hexnum;
std::string hexstring;
};
constexpr char hexset::hex_arr[16];
int main() {
int number_from_file = 3000; // This number is in all forms technically, hex is just another way to represent this number.
hexset hex(number_from_file);
std::cout << hex.get_hexstring() << ' ' << hex.get_hexnum() << std::endl;
return 0;
}
I assume you'll probably want to do some operator overloading to make it so you can add and subtract from this number or assign new numbers or do any kind of mathematical or bit shift operation.
The following code (in C++) is supposed to get some data along with its size (in bytes) and return a string containing the hexadecimal code. size is the size of the memory block whose location is stored in val.
std::string byteToHexString(const unsigned char* val, unsigned long long size)
{
unsigned char temp;
std::string vf;
vf.resize(2 * size+1);
for(unsigned long long i= 0; i < size; i++)
{
temp = val[i] / 16;
vf[2*i] = (temp <= 9)? '0' + temp: 'A' + temp - 10; // i.e., (10 = 9 + 1)
temp = val[i] % 16;
vf[2*i+1] = (temp <= 9)? '0' + temp: 'A' + temp - 10; // i.e., (10 = 9 + 1)
}
vf[2*size] = '\0';
return (vf);
}
So on executing the above function the following way:
int main()
{
unsigned int a = 5555;
std::cout << byteToHexString((unsigned char*)(&a), 4);
return 0;
}
The output we obtain is:
B3150000
Shouldn't the output rather be 000015B3? So why is this displaying in reverse order? Is there something wrong with the code (I am using g++ compiler in Ubuntu)?
You are seeing the order in which bytes are stored for representing integers on your architecture, which happens to be little-endian. That means, the least-significant byte comes first.
If you want to display it in normal numeric form, you either need to detect the endianness of your architecture and switch the code accordingly, or just use a string stream:
#include <sstream>  // std::ostringstream
#include <iomanip>  // std::setfill, std::setw

unsigned int a = 5555;
std::ostringstream ss;
ss << std::setfill( '0' ) << std::setw( sizeof(a)*2 ) << std::hex << a;
std::cout << ss.str() << std::endl;
Goal
My goal is to quickly create a file from a large binary string (a string that contains only 1 and 0).
Straight to the point
I need a function that can achieve my goal. If I am not clear enough, please read on.
Example
Test.exe is running...
.
Inputted binary string:
1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
Done!
File(Test.txt) In Byte(s):
0xFF, 0xAA
.
Test.exe executed successfully!
Explanation
First, Test.exe requested the user to input a binary string.
Then, it converted the inputted binary string to hexadecimal.
Finally, it wrote the converted value to a file called Test.txt.
I've tried
As a failed attempt to achieve my goal, I created this simple (and possibly horrible) function (hey, at least I tried):
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr )
{
std::ofstream OutputFile( Destination, std::ofstream::binary );
for( ::UINT Index1 = 0, Dec = 0;
// 8-Bit binary.
Index1 != BinaryStr.length( )/8;
// Get the next set of binary value.
// Write the decimal value as unsigned char to file.
// Reset decimal value to 0.
++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
{
// Convert the 8-bit binary to hexadecimal using the
// positional notation method - this is how its done:
// http://www.wikihow.com/Convert-from-Binary-to-Decimal
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
}
OutputFile.close( );
};
Example of usage
#include "Global.h"
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr );
int main( void )
{
std::string Bin = "";
// Create a binary string that is a size of 9.53674 mb
// Note: The creation of this string will take awhile.
// However, I only start to calculate the speed of writing
// and converting after it is done generating the string.
// This string is just created for an example.
std::cout << "Generating...\n";
while( Bin.length( ) != 80000000 )
Bin += "10101010";
std::cout << "Writing...\n";
BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );
std::cout << "Done!\n";
#ifdef IS_DEBUGGING
std::cout << "Paused...\n";
::getchar( );
#endif
return( 0 );
};
Problem
Again, that was my failed attempt to achieve my goal. The problem is the speed: it is too slow. It took more than 7 minutes. Is there any method to quickly create a file from a large binary string?
Thanks in advance,
CLearner
I'd suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for each character that you process. Replace this code:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
Dec += Inc;
by something like:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
Dec += Inc;
The majority of your time is spent here:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
When I comment that out the file is written in seconds. I think you need to fine-tune your conversion.
I think I'd consider something like this as a starting point:
#include <bitset>
#include <fstream>
#include <iterator>   // std::istream_iterator / std::ostream_iterator
#include <algorithm>

int main() {
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);
    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong();});
    return 0;
}
Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you've mentioned in your question, my guess is this is adequate speed, and the simplicity is going to be hard to beat.
Something not entirely unlike this should be significantly faster:
#include <fstream>
#include <string>
#include <cstring>   // memset()
#include <climits>   // CHAR_BIT
#include <cstdlib>   // abort()

void
text_to_binary_file(const std::string& text, const char *fname)
{
    unsigned char wbuf[4096]; // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp; // dropping down to filebufs may well be faster
                     // for this problem
    fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary);
    memset(wbuf, 0, 4096);
    for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
        if (*p == '1')
            wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == 4096) {
            if (fp.sputn(reinterpret_cast<char *>(wbuf), 4096) != 4096)
                abort();
            memset(wbuf, 0, 4096);
            i = 0;
            j = 0;
        }
    }
    // write out whatever is left in the buffer, including a partly filled byte
    std::streamsize tail = i + (j ? 1 : 0);
    if (fp.sputn(reinterpret_cast<char *>(wbuf), tail) != tail)
        abort();
    fp.close();
}
Proper error handling left as an exercise.
So instead of converting back and forth between std::strings, why not use a bunch of machine word-sized integers for fast access?
const size_t bufsz = 1000000;
uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);
int i;
for (i = 0; i < bufsz; i++) {
ofile << hex << setw(8) << setfill('0') << buf[i];
// or if you want raw binary data instead of formatted hex:
ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}
delete[] buf;
For me, this runs in a fraction of a second.
Even though it's late, I want to add my example for handling such strings.
Architecture-specific optimizations may use unaligned loads of chars into multiple registers to "squeeze" out the bits in parallel. This untested example code does not check the chars and avoids alignment and endianness requirements. It assumes the characters of that binary string represent contiguous octets (bytes) with the most significant bit first, not words and double words, etc., whose specific representation in memory (and in that string) would require special treatment for portability.
//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.
//set up an ofstream with a 64KiB buffer
std::vector<char> buffer(65536);
std::ofstream ofs("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();
//You may treat cases, where (bits % 8 != 0) as error
//Pack eight characters at a time into one octet, most significant bit first
uint8_t byte = 0;
for(std::string::size_type i = 0; i < (bits & (~std::string::size_type(0x7))); ++i, ++cIt)
{
    if((i & 0x7) != 0x7) //bits 7 ... 1 of the current octet
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        byte <<= 1;
    }
    else //bit 0: write the completed octet and start the next one
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        ofs.put(char(byte));
        byte = 0;
    }
}
ofs.flush();
This makes a 76.2 MB (80,000,000 byte) file of 1010101010101......
#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace std;
int main( void )
{
char Bin=0;
ofstream myfile;
myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
int c=0;
Bin = 0xAA;
while( c!= 80000000 ){
myfile.write(&Bin,1);
c++;
}
myfile.close();
cout << "Done!\n";
return( 0 );
};
This is my fourth attempt at doing base64 encoding. My first tries work, but they aren't standard. They're also extremely slow!!! I used vectors and push_back and erase a lot.
So I decided to re-write it and this is much much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64 encoding the compressed string. I'm using ZLib. The images are 1366 x 768 so yeah.
I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..
Anyway, my code is as follows below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
inline bool IsBase64(std::uint8_t C)
{
return (isalnum(C) || (C == '+') || (C == '/'));
}
std::string Copy(std::string Str, int FirstChar, int Count)
{
if (FirstChar <= 0)
FirstChar = 0;
else
FirstChar -= 1;
return Str.substr(FirstChar, Count);
}
std::string DecToBinStr(int Num, int Padding)
{
int Bin = 0, Pos = 1;
std::stringstream SS;
while (Num > 0)
{
Bin += (Num % 2) * Pos;
Num /= 2;
Pos *= 10;
}
SS.fill('0');
SS.width(Padding);
SS << Bin;
return SS.str();
}
int DecToBinStr(std::string DecNumber)
{
int Bin = 0, Pos = 1;
int Dec = strtol(DecNumber.c_str(), NULL, 10);
while (Dec > 0)
{
Bin += (Dec % 2) * Pos;
Dec /= 2;
Pos *= 10;
}
return Bin;
}
int BinToDecStr(std::string BinNumber)
{
int Dec = 0;
int Bin = strtol(BinNumber.c_str(), NULL, 10);
for (int I = 0; Bin > 0; ++I)
{
if(Bin % 10 == 1)
{
Dec += (1 << I);
}
Bin /= 10;
}
return Dec;
}
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
int PaddingAmount = ((-Result.size() * 3) & 3);
for (int I = 0; I < PaddingAmount; ++I)
Result += '=';
return Result;
}
std::string DecodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = Data.size(); I > 0; --I)
{
if (Data[I - 1] != '=')
{
std::string Characters = Copy(Data, 0, I);
for (std::size_t J = 0; J < Characters.size(); ++J)
Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
break;
}
}
for (std::size_t I = 0; I < Binary.size(); I += 8)
{
Result += (char)BinToDecStr(Copy(Binary, I, 8));
if (I == 0) ++I;
}
return Result;
}
I've been using the above like this:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); //IMG.677*604
std::cout<<DecodeBase64(Data); //Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); //IMG.1366*768
std::cout<<DecodeBase64(Data); //Prints IMG.1366*768
}
It prints correctly.. I'm not sure what is going on at all or where to begin looking.
Just in-case anyone is curious and want to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l
The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
As far as comments on speed, I'd focus primarily on trying to reduce your usage of std::string. The way you are currently converting the data into a string of 0s and 1s is pretty inefficient considering that the source could be read directly with bitwise operators.
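For instance, one full group of three input bytes can be packed into a 24-bit value and sliced into four 6-bit indices directly, with no intermediate string of '0'/'1' characters (a rough sketch of the idea, not a drop-in replacement for your functions):
#include <cstdint>

static const char B64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one complete group of three input bytes into four output characters.
inline void EncodeGroup(const unsigned char in[3], char out[4])
{
    std::uint32_t packed = (std::uint32_t(in[0]) << 16) |
                           (std::uint32_t(in[1]) << 8)  |
                            std::uint32_t(in[2]);
    out[0] = B64Alphabet[(packed >> 18) & 0x3F];
    out[1] = B64Alphabet[(packed >> 12) & 0x3F];
    out[2] = B64Alphabet[(packed >> 6)  & 0x3F];
    out[3] = B64Alphabet[ packed        & 0x3F];
}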
I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
std::stringstream ss;
ss << value;
return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
std::cout << Data << std::endl;
std::cout << DecodeBase64(Data) << std::endl; //Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encoded the null that terminates the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
if (Binary.size() % 6)
{
Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
if (Result.size() % 4)
{
Result.resize(Result.size() + 4 - Result.size() % 4, '=');
}
return Result;
}
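If the rest of the original helper functions are left unchanged, this padded version should reproduce the reference encoding shown earlier, e.g.:
int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
    std::cout << Data << std::endl;               // expected: SU1HLjY3Nyo2MDQ=
    std::cout << DecodeBase64(Data) << std::endl; // expected: IMG.677*604 (the zero
                                                  // padding may add a stray trailing NUL)
}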
How do I detect the length of an integer? In case I had, e.g., int test(234567545);
How do I know how long the int is? Like telling me there are 9 digits inside it???
I have tried:
char buffer_length[100];
// assign directly to a string.
sprintf(buffer_length, "%d\n", 234567545);
string sf = buffer_length;
cout <<sf.length()-1 << endl;
But there must be a simpler or cleaner way of doing it...
How about division:
int length = 1;
int x = 234567545;
while ( x /= 10 )
length++;
or use the log10 method from <math.h>.
Note that log10 returns a double, so you'll have to adjust the result.
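For example (a sketch; x must be greater than zero for log10() to be usable, so 0 needs a special case, and negative values would need abs() first):
#include <cmath>

int x = 234567545;
int length = (x > 0) ? (int)std::log10((double)x) + 1 : 1; // 9 for 234567545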
Make a function :
int count_numbers ( int num) {
int count =0;
while (num !=0) {
count++;
num/=10;
}
return count;
}
Nobody seems to have mentioned converting it to a string, and then getting the length. Not the most performant, but it definitely does it in one line of code :)
int num = -123456;
int len = to_string(abs(num)).length();
cout << "LENGTH of " << num << " is " << len << endl;
// prints "LENGTH of 123456 is 6"
You can use stringstream for this as shown below
stringstream ss;
int i = 234567545;
ss << i;
cout << ss.str().size() << endl;
if "i" is the integer, then
int len ;
char buf[33] ;
itoa (i, buf, 10) ; // or maybe 16 if you want base-16 ?
len = strlen(buf) ;
if(i < 0)
len-- ; // maybe if you don't want to include "-" in length ?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
int i=2384995;
char buf[100];
itoa(i, buf, 10); // 10 is the base decimal
printf("Lenght: %d\n", strlen(buf));
return 0;
}
Beware that itoa is not a standard function, even if it is supported by many compilers.
len = 1 + floor(log10(n)); // C++ <cmath>
Looking across the internet, it's common to see the mistake of initializing the counter variable to 0 and then entering a pre-condition loop that runs for as long as the number is not 0 (which yields a count of 0 for the input 0). A do-while loop is perfect to avoid this.
unsigned udc(unsigned u) //unsigned digit count
{
unsigned c = 0;
do
++c;
while ((u /= 10) != 0);
return c;
}
it's probably cheaper to test whether u is less than 10 to avoid the unnecessary division, increment, and cmp instructions for cases where u < 10.
But while on that subject (optimization), you could simply test u against constant powers of ten.
unsigned udc(unsigned u) //unsigned digit count
{
if (u < 10) return 1;
if (u < 100) return 2;
if (u < 1000) return 3;
//...
return 0; //number was not supported
}
which saves you 3 instructions per digit, but is less adaptable to different radixes, in addition to being less attractive and tedious to write by hand; in that case you'd rather write a routine to generate the routine before inserting it into your program. Because C only supports finite widths (64-bit, 32-bit, 16-bit, 8-bit), you could simply generate the routine for the maximum width and it will cover all sizes.
To account for negative numbers, you'd simply negate the value if it is less than 0 before counting the number of digits, of course first making the routine accept signed numbers (or negating through unsigned arithmetic, as in the sketch below).
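For instance, a small signed wrapper over udc() above might look like this (negating through unsigned arithmetic so that even the most negative value is handled):
unsigned sdc(int n) // signed digit count, ignoring any '-' sign
{
    unsigned u = (n < 0) ? 0u - (unsigned)n : (unsigned)n;
    return udc(u);
}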
If you know that u < 1000, it's probably easier to just write the comparisons inline instead of writing a routine:
if (u > 99) len = 3;
else
if (u > 9) len = 2;
else len = 1;
Here are a few different C++ implementations* of a function named digits() which takes a size_t as argument and returns its number of digits. If your number is negative, you are going to have to pass its absolute value to the function in order for it to work properly:
The While Loop
int digits(size_t i)
{
int count = 1;
while (i /= 10) {
count++;
}
return count;
}
The Exhaustive Optimization Technique
int digits(size_t i) {
if (i > 9999999999999999999ull) return 20;
if (i > 999999999999999999ull) return 19;
if (i > 99999999999999999ull) return 18;
if (i > 9999999999999999ull) return 17;
if (i > 999999999999999ull) return 16;
if (i > 99999999999999ull) return 15;
if (i > 9999999999999ull) return 14;
if (i > 999999999999ull) return 13;
if (i > 99999999999ull) return 12;
if (i > 9999999999ull) return 11;
if (i > 999999999ull) return 10;
if (i > 99999999ull) return 9;
if (i > 9999999ull) return 8;
if (i > 999999ull) return 7;
if (i > 99999ull) return 6;
if (i > 9999ull) return 5;
if (i > 999ull) return 4;
if (i > 99ull) return 3;
if (i > 9ull) return 2;
return 1;
}
The Recursive Way
int digits(size_t i) { return i < 10 ? 1 : 1 + digits(i / 10); }
Using snprintf() as a Character Counter
⚠ Requires #include <stdio.h> and may incur a significant performance penalty compared to other solutions. This method capitalizes on the fact that snprintf() counts the characters it discards when the buffer is full. Therefore, with the right arguments and format specifiers, we can force snprintf() to give us the number of digits of any size_t.
int digits(size_t i) { return snprintf (NULL, 0, "%llu", i); }
The Logarithmic Way
⚠ Requires #include <cmath> and is unreliable for unsigned integers with more than 14 digits.
// WARNING! There is a silent implicit conversion precision loss that happens
// when we pass a large int to log10() which expects a double as argument.
int digits(size_t i) { return !i? 1 : 1 + log10(i); }
Driver Program
You can use this program to test any function that takes a size_t as argument and returns its number of digits. Just replace the definition of the function digits() in the following code:
#include <iostream>
#include <stdio.h>
#include <cmath>
using std::cout;
// REPLACE this function definition with the one you want to test.
int digits(size_t i)
{
int count = 1;
while (i /= 10) {
count++;
}
return count;
}
// driver code
int main ()
{
const int max = digits(-1ull);
size_t i = 0;
int d;
do {
d = digits(i);
cout << i << " has " << d << " digits." << '\n';
i = d < max ? (!i ? 9 : 10 * i - 1) : -1;
cout << i << " has " << digits(i) << " digits." << '\n';
} while (++i);
}
* Everything was tested on a Windows 10 (64-bit) machine using GCC 12.2.0 in Visual Studio Code.
As long as you are mixing C stdio and C++ iostream, you can use the snprintf NULL 0 trick to get the number of digits in the integer representation of the number. Specifically, per man 3 printf, if the string exceeds the size parameter provided and is truncated, snprintf() will return
... the number of characters (excluding the terminating null byte)
which would have been written to the final string if enough space
had been available.
This allows snprintf() to be called with the str parameter NULL and the size parameter 0, e.g.
int ndigits = snprintf (NULL, 0, "%d", 234567545);
In your case where you simply wish to output the number of digits required for the representation, you can simply output the return, e.g.
#include <iostream>
#include <cstdio>
int main() {
std::cout << "234567545 is " << snprintf (NULL, 0, "%d", 234567545) <<
" characters\n";
}
Example Use/Output
$ ./bin/snprintf_trick
234567545 is 9 characters
note: the downside to using the snprintf() trick is that you must provide the conversion specifier, which will limit the number of digits representable. E.g. "%d" will limit to int values while "%lld" would allow space for long long values. The C++ approach using std::stringstream, while still limited to numeric conversion using the << operator, handles the different integer types without manually specifying the conversion. Something to consider.
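For example, a small template along those lines (a sketch; the count includes a leading '-' for negative values):
#include <sstream>
#include <cstddef>

// Count the characters produced by streaming any integer type with operator<<.
template <typename T>
std::size_t stream_length(T value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str().size();
}
// e.g. stream_length(234567545) == 9, stream_length(234567545LL) == 9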
second note: you shouldn't dangle the "\n" at the end of your sprintf() conversion. Add the new line as part of your output and you don't have to subtract 1 from the length...